Searching with ONE
ONE provides a search method for finding sessions of interest. The available search terms can be listed with:
[19]:
from one.api import ONE
one = ONE(base_url='https://openalyx.internationalbrainlab.org')
print(one.search_terms())
('dataset', 'date_range', 'laboratory', 'number', 'projects', 'subject', 'task_protocol')
We can search for sessions within a specified date range (inclusive):
[20]:
from pprint import pprint
eids = one.search(date_range=['2021-01-01', '2021-01-01'])
pprint(eids)
['ef91b4d0-02a3-48c4-b6ad-610d346e5f68', 'b4e3383c-6cdb-49af-81a1-39b8f88aa5fd']
A single date can be provided instead of a range. To define only the upper or lower date bound, set the other element to None.
[21]:
assert one.search(date_range=['2021-01-01', '2021-01-01']) == one.search(date_range='2021-01-01')
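The date range semantics described above (inclusive bounds, a single date standing in for a one-day range, and None leaving a bound open) can be illustrated without a database connection. This is a minimal sketch of those rules only, not ONE's implementation; the helper in_date_range is hypothetical.

```python
from datetime import date


def in_date_range(day, date_range):
    """Hypothetical helper: True if `day` falls within `date_range` (inclusive)."""
    # A single date is treated as a range that starts and ends on that day
    if isinstance(date_range, (str, date)):
        date_range = [date_range, date_range]
    # Convert ISO date strings to date objects; None is passed through unchanged
    lower, upper = (date.fromisoformat(d) if isinstance(d, str) else d for d in date_range)
    # None leaves the corresponding bound open
    return (lower is None or day >= lower) and (upper is None or day <= upper)


day = date(2021, 1, 1)
print(in_date_range(day, '2021-01-01'))          # single date
print(in_date_range(day, ['2020-12-01', None]))  # lower bound only
print(in_date_range(day, [None, '2020-12-31']))  # upper bound only
```

With ONE itself, the equivalent open-ended queries would take the form one.search(date_range=['2020-12-01', None]).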
To get more information about the sessions we can add a details=True flag:
[22]:
eids, details = one.search(date_range=['2021-01-01', '2021-01-01'], details=True)
pprint(details)
[{'date': datetime.date(2021, 1, 1),
'lab': 'wittenlab',
'number': 2,
'projects': 'witten_learning_dop',
'subject': 'fip_12',
'task_protocol': '_iblrig_tasks_FPChoiceWorld6.4.2'},
{'date': datetime.date(2021, 1, 1),
'lab': 'wittenlab',
'number': 2,
'projects': 'witten_learning_dop',
'subject': 'fip_11',
'task_protocol': '_iblrig_tasks_FP_biasedChoiceWorld6.4.2'}]
Multiple search terms can also be combined. For example, we can search for any sessions from the subject SWC_043 that contain the datasets spikes.times and spikes.clusters:
[23]:
eids = one.search(subject='SWC_043', dataset=['spikes.times', 'spikes.clusters'])
pprint(eids)
['4ecb5d24-f5cc-402c-be28-9d0f7cb14b3a',
'c6db3304-c906-400c-aa0f-45dd3945b2ea',
'88d24c31-52e4-49cc-9f32-6adbeb9eba87',
'6fb1e12c-883b-46d1-a745-473cde3232c8',
'695a6073-eae0-49e0-bb0f-e9e57a9275b9',
'6f09ba7e-e3ce-44b0-932b-c003fb44fb89',
'f3ce3197-d534-4618-bf81-b687555d1883']
More search terms are available when making remote queries (using the remote Alyx database instead of the local cache). You can view all the remote search terms by passing ‘remote’ to search_terms:
[24]:
one.search_terms('remote')
eids = one.search(performance_gte=70, query_type='remote')
Warning.
Remote search queries behave slightly differently. See “Gotchas” below.
Search term arguments may be shortened, so long as they are not ambiguous:
[25]:
assert one.search(task_protocol='training') == one.search(task='training')
assert one.search(project='brainwide') == one.search(proj='brainwide')
assert one.search(date_range='2021-01-01') == one.search(date='2021-01-01')
one.search(dat='2020-01-01') will raise a ValueError as ‘dat’ could be short for both ‘date_range’ and ‘dataset’.
Warning.
Remote mode has more search terms, so more abbreviations are ambiguous. For example, ‘data’ can match both ‘dataset’ and ‘datasets’ in remote mode.
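The abbreviation rule can be sketched as a prefix lookup that fails when more than one term matches. This is an illustration under that assumption rather than ONE's actual code; the helper resolve_term is hypothetical, and the tuple of terms is the local-mode list printed by one.search_terms() earlier.

```python
def resolve_term(abbreviation, search_terms):
    """Hypothetical helper: expand an abbreviated search term if it is unambiguous."""
    if abbreviation in search_terms:
        return abbreviation  # a complete term name needs no expansion
    matches = [term for term in search_terms if term.startswith(abbreviation)]
    if len(matches) != 1:
        raise ValueError(f'{abbreviation!r} matches {len(matches)} search terms: {matches}')
    return matches[0]


# Local-mode search terms, as listed by one.search_terms()
terms = ('dataset', 'date_range', 'laboratory', 'number', 'projects', 'subject', 'task_protocol')
print(resolve_term('task', terms))
print(resolve_term('proj', terms))
print(resolve_term('date', terms))
try:
    resolve_term('dat', terms)  # could be 'dataset' or 'date_range'
except ValueError as err:
    print(err)
```

Adding the remote-only term ‘datasets’ to the tuple makes ‘data’ ambiguous in the same way.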
To find out more about the one.search method we can use the help function:
[26]:
help(one.search)
Help on method search in module one.api:
search(details=False, query_type=None, **kwargs) method of one.api.OneAlyx instance
Searches sessions matching the given criteria and returns a list of matching eids.
For a list of search terms, use the method
one.search_terms(query_type='remote')
For all of the search parameters, a single value or list may be provided. For `dataset`,
the sessions returned will contain all listed datasets. For the other parameters,
the session must contain at least one of the entries.
For all but `date_range` and `number`, any field that contains the search string is
returned. Wildcards are not permitted, however if wildcards property is True, regular
expressions may be used (see notes and examples).
Parameters
----------
dataset : str
A (partial) dataset name. Returns sessions containing matching datasets.
A dataset matches if it contains the search string e.g. 'wheel.position' matches
'_ibl_wheel.position.npy'. C.f. `datasets` argument.
date_range : str, list, datetime.datetime, datetime.date, pandas.timestamp
A single date to search or a list of 2 dates that define the range (inclusive). To
define only the upper or lower date bound, set the other element to None.
lab : str, list
A str or list of lab names, returns sessions from any of these labs (can be partial,
i.e. any task protocol containing that str will be found).
number : str, int
Number of session to be returned, i.e. number in sequence for a given date.
subject : str, list
A list of subject nicknames, returns sessions for any of these subjects (can be
partial, i.e. any task protocol containing that str will be found).
task_protocol : str, list
The task protocol name (can be partial, i.e. any task protocol containing that str
will be found).
project(s) : str, list
The project name (can be partial, i.e. any task protocol containing that str
will be found).
performance_lte / performance_gte : float
Search only for sessions whose performance is less equal or greater equal than a
pre-defined threshold as a percentage (0-100).
users : str, list
A list of users.
location : str, list
A str or list of lab location (as per Alyx definition) name.
Note: this corresponds to the specific rig, not the lab geographical location per se.
dataset_types : str, list
One or more of dataset_types.
datasets : str, list
One or more (exact) dataset names. Returns insertions containing all of these datasets.
details : bool
If true also returns a dict of dataset details.
query_type : str, None
Query cache ('local') or Alyx database ('remote').
limit : int
The number of results to fetch in one go (if pagination enabled on server).
Returns
-------
list
List of eids.
(list of dicts)
If details is True, also returns a list of dictionaries, each entry corresponding to a
matching session.
Examples
--------
Search for sessions with 'training' in the task protocol.
>>> eids = one.search(task='training')
Search for sessions by subject 'MFD_04'.
>>> eids = one.search(subject='MFD_04')
Do an exact search for sessions by subject 'FD_04'.
>>> assert one.wildcards is True, 'the wildcards flag must be True for regex expressions'
>>> eids = one.search(subject='^FD_04$', query_type='local')
Search for sessions on a given date, in a given lab, containing trials and spike data.
>>> eids = one.search(date='2023-01-01', lab='churchlandlab', dataset=['trials', 'spikes'])
Notes
-----
- In default and local mode, most queries are case-sensitive partial matches. When lists
are provided, the search is a logical OR, except for `datasets`, which is a logical AND.
- All search terms are true for a session to be returned, i.e. subject matches AND project
matches, etc.
- In remote mode most queries are case-insensitive partial matches.
- In default and local mode, when the one.wildcards flag is True (default), queries are
interpreted as regular expressions. To turn this off set one.wildcards to False.
- In remote mode regular expressions are only supported using the `django` argument.
Advanced searching
By default, most ONE search terms function as a LIKE OR expression, returning results that contain any of the search values as a substring. For example, one.search(subject=['foo', 'bar']) returns all sessions where the subject name contains ‘foo’ or contains ‘bar’. The exception is the dataset search term, which is a LIKE AND expression, i.e. the session must contain one or more dataset names containing ‘foo’ AND one or more datasets containing ‘bar’. Note that all expressions are case-sensitive in auto/local mode and case-insensitive in remote mode.
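The difference between the two lookups can be made concrete with plain substring tests. The sketch below is only an illustration of the logic: the functions like_or and like_and are hypothetical, the toy sessions are invented, and regular expression handling is ignored.

```python
def like_or(value, queries):
    """True if `value` contains ANY of the query substrings (e.g. subject)."""
    return any(q in value for q in queries)


def like_and(values, queries):
    """True if EVERY query is a substring of at least one value (e.g. dataset)."""
    return all(any(q in v for v in values) for q in queries)


# Toy sessions, invented for illustration only
sessions = {
    'session_a': {'subject': 'foo_01', 'datasets': ['spikes.times.npy', 'spikes.clusters.npy']},
    'session_b': {'subject': 'bar_02', 'datasets': ['spikes.times.npy']},
    'session_c': {'subject': 'baz_03', 'datasets': ['trials.table.pqt']},
}

by_subject = [k for k, s in sessions.items() if like_or(s['subject'], ['foo', 'bar'])]
by_dataset = [k for k, s in sessions.items()
              if like_and(s['datasets'], ['spikes.times', 'spikes.clusters'])]
print(by_subject)  # sessions a and b: the subject contains 'foo' OR 'bar'
print(by_dataset)  # session a only: it has datasets matching BOTH names
```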
For more precise searches, regular expressions (a.k.a. regex) can be used. This is on by default and can be deactivated by setting the wildcards flag: one.wildcards = False (note that this also affects the list and load methods; see the advanced loading section of Loading with ONE for more details).
Regex allows one to make exact searches by asserting the start and end of the string:
[27]:
eids = one.search(subject='FD_04') # includes sessions with subject 'MFD_04'
assert one.wildcards is True, 'the wildcards flag must be True for regex expressions'
eids = one.search(subject='^FD_04$') # exact subject name match
Likewise, to search for sessions that include one dataset OR another, we can use the | character in our regex:
[28]:
# Sessions containing either leftCamera.times OR rightCamera.times
# (raw strings avoid invalid escape sequence warnings for '\.'):
eids = one.search(proj='brainwide', dataset=r'leftCamera\.times|rightCamera\.times')
# XOR expressions are also possible:
eids = one.search(proj='brainwide', dataset=r'(leftCamera\.times|rightCamera\.times){1}')
C:\Users\User\Documents\Github\ONE\one\api.py:460: UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
return all(any(x.str.contains(y, regex=self.wildcards) & exists) for y in dsets)
Note that the wildcards flag causes certain characters to be interpreted differently (e.g. . matches any character). To avoid this, either set the wildcards flag to False or escape the string using re.escape:
[29]:
import re
subject = 'NYU-14.1'
if one.wildcards:
subject = re.escape(subject)
eids = one.search(subject=subject) # 'NYU\\-14\\.1'
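The effect of anchoring and escaping can be checked directly with Python's re module, which the section on regex systems below says is used outside remote mode. This is a small standalone sketch; the name 'NYU-1401' is invented to show an unintended match.

```python
import re

# Without anchors, the pattern matches anywhere in the string
print(bool(re.search('FD_04', 'MFD_04')))
# With start (^) and end ($) anchors, only the exact name matches
print(bool(re.search('^FD_04$', 'MFD_04')))
print(bool(re.search('^FD_04$', 'FD_04')))

# An unescaped '.' matches any character, so a different name also matches
print(bool(re.search('NYU-14.1', 'NYU-1401')))
# After escaping, the '.' is matched literally
escaped = re.escape('NYU-14.1')
print(bool(re.search(escaped, 'NYU-1401')))
print(bool(re.search(escaped, 'NYU-14.1')))
```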
Gotchas
ONE.search strikes a balance between usability, functionality, and stability. We have been careful to ensure that the results of the search function remain consistent across versions; however, this has left some confusing and unintuitive behaviours…
Difference between search term behaviours
As mentioned above, different search terms perform differently. Below are the search terms and their approximate SQL equivalents:
Term | Lookup |
---|---|
dataset | LIKE AND |
dataset_qc_lte | <= |
number | EXACT |
date_range | BETWEEN |
subject, etc. | LIKE OR |
Combinations of terms form a logical AND. For example, one.search(subject=['foo', 'bar'], project='baz') searches for sessions where the subject name contains foo OR bar, AND the project contains baz. NB: When dataset_qc_lte is provided together with dataset(s), sessions are returned where ALL matching datasets have a QC value less than or equal to the given one. When dataset_qc_lte is provided alone, sessions are returned where ANY of the datasets have a QC value less than or equal to the given one.
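The ALL versus ANY behaviour of dataset_qc_lte can be sketched as follows. This is an illustration of the rule as stated above, not ONE's implementation: the function qc_filter is hypothetical, and the numeric QC values are invented stand-ins for the real QC levels.

```python
def qc_filter(datasets, qc_lte, names=None):
    """Hypothetical sketch of how dataset_qc_lte combines with dataset names.

    datasets: mapping of dataset name -> numeric QC value (toy values).
    names: optional list of partial dataset names to restrict the check to.
    """
    if names:
        # With dataset(s): ALL matching datasets must satisfy the QC bound
        matching = [qc for name, qc in datasets.items() if any(n in name for n in names)]
        return bool(matching) and all(qc <= qc_lte for qc in matching)
    # Alone: ANY dataset satisfying the QC bound is enough
    return any(qc <= qc_lte for qc in datasets.values())


# One toy session, with invented QC values (lower is better)
session = {'spikes.times.npy': 10, 'spikes.clusters.npy': 40, 'trials.table.pqt': 10}
print(qc_filter(session, 30))                    # some dataset is <= 30
print(qc_filter(session, 30, names=['spikes']))  # one spikes dataset exceeds 30
print(qc_filter(session, 30, names=['trials']))  # the only trials dataset is <= 30
```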
Difference between remote mode search terms
Many search terms behave differently in auto/local mode and in remote mode. Most notably, remote search queries are case-insensitive.
The dataset, datasets and dataset_types remote arguments
In remote mode there are three ways to search for datasets:
dataset - a partial, case-insensitive match of a single dataset (multiple datasets not supported).
datasets - an exact, case-sensitive match of one or more datasets. All datasets must be present. If dataset_qc is provided, this criterion applies only to these datasets.
dataset_types - an exact, case-sensitive match of one or more dataset types. All dataset types must be present.
Regex systems between modes
Regex searches can be made in remote mode by using special Django queries, for example,
eids = one.search(django='subject__nickname__regex,^FD_04$', query_type='remote')
Regular expression syntax is different between modes, however: in remote mode the regex is parsed by a PostgreSQL database, while in other modes it is done using Python’s re.search.
Searching data with a release tag
Datasets belonging to a given paper and/or data release are associated with a tag. You can list the available release tags like so:
[30]:
assert not one.offline, 'ONE must be online to query tags'
tags = one.alyx.rest('tags', 'list')
for tag in tags:
print('%s: %s' % (tag['name'], tag['description']))
2021_Q1_IBL_et_al_Behaviour: https://doi.org/10.7554/eLife.63711
2021_Q2_PreRelease: https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522/3
2021_Q2_Varol_et_al: https://doi.org/10.1109/ICASSP39728.2021.9414145
2021_Q3_Whiteway_et_al: https://doi.org/10.1371/journal.pcbi.1009439
2022_Q2_IBL_et_al_RepeatedSite: https://doi.org/10.1101/2022.05.09.491042
2022_Q3_IBL_et_al_DAWG: https://doi.org/10.1101/827873
2022_Q4_IBL_et_al_BWM: https://figshare.com/articles/preprint/Data_release_-_Brainwide_map_-_Q4_2022/21400815
2023_Q1_Biderman_Whiteway_et_al:
2023_Q1_Mohammadi_et_al:
You can download a cache table for any given release tag, allowing you to restrict your ONE.search queries to a given tag in auto/local mode. See FAQ section ‘How do I download the datasets cache for a specific IBL paper release?’ for more information.
To search for sessions containing datasets associated with a given release tag in remote mode, you can use the following query parameter:
[31]:
tag = '2021_Q1_IBL_et_al_Behaviour'
eids = one.search(django='data_dataset_session_related__tags__name,' + tag, query_type='remote')
Searching insertions
A session may contain multiple insertions recording different brain areas. To find data associated with a specific brain area, it is useful to search by insertion instead of by session.
The OneAlyx.search_insertions method takes similar arguments to the remote search method, and returns a list of probe UUIDs (pIDs), which can be interconverted with session IDs (eIDs) for loading data.
Note.
The search_insertions method is only available in remote mode.
[32]:
one.search_terms('remote', 'insertions')
pids = one.search_insertions(atlas_acronym=['STR', 'CA3'], query_type='remote')
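To go from the returned pIDs back to sessions, each pID can be converted in turn. The sketch below assumes that `one` is an online OneAlyx instance and that one.pid2eid(pid) returns an (eid, probe name) pair; the wrapper sessions_for_insertions is a hypothetical convenience, not part of ONE.

```python
def sessions_for_insertions(one, pids):
    """Map each probe insertion ID to its (session ID, probe name) pair.

    Assumes one.pid2eid(pid) returns an (eid, probe_name) tuple.
    """
    return {pid: one.pid2eid(pid) for pid in pids}


# Example usage (requires a connection to the Alyx database):
# pids = one.search_insertions(atlas_acronym=['STR', 'CA3'], query_type='remote')
# for pid, (eid, probe_name) in sessions_for_insertions(one, pids).items():
#     print(pid, eid, probe_name)
```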
For searching insertions associated with a given release tag, see the method examples by typing help(one.search_insertions).