Searching with ONE

ONE provides a search method for finding sessions of interest. The available search terms can be listed using one.search_terms:

[7]:

from one.api import ONE
one = ONE(base_url='https://openalyx.internationalbrainlab.org')

print(one.search_terms())

('auto_datetime', 'limit', 'django', 'end_time', 'performance_gte', 'parent_session', 'project', 'dataset_qc_lte', 'id', 'start_time', 'histology', 'offset', 'performance_lte', 'procedures', 'json', 'narrative', 'name', 'atlas_name', 'nickname', 'number', 'type', 'extended_qc', 'atlas_id', 'date_range', 'n_trials', 'n_correct_trials', 'projects', 'laboratory', 'qc', 'task_protocol', 'datasets', 'users', 'dataset_types', 'subject', 'location', 'tag', 'atlas_acronym')

We can search for sessions within a specified date range (inclusive):

[8]:

from pprint import pprint
eids = one.search(date_range=['2023-10-25', '2023-10-25'])
pprint(list(eids))

[UUID('024113d9-245e-4b67-afdb-9a7213d94446'),
 UUID('9119c8d3-9894-4a88-9420-a6fa8ea05a0e'),
 UUID('329f5a74-93ec-48de-a569-cf40a386f712')]

A single date can be provided instead of a range. To define only the upper or lower date bound, set the other element to None.

[9]:

assert one.search(date_range=['2023-10-25', '2023-10-25']) == one.search(date_range='2023-10-25')
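The cell above covers the single-date form only. As a rough illustration of what inclusive bounds with a None element mean, here is a small self-contained sketch in plain Python. The helper in_date_range is hypothetical and written for this example; it is not part of ONE and does not reflect how ONE implements the filter.

```python
from datetime import date


def in_date_range(day, date_range):
    """Return True if `day` falls within `date_range` (inclusive).

    `date_range` is a (lower, upper) pair; either bound may be None,
    meaning that side is unbounded. Hypothetical helper, not part of ONE.
    """
    lower, upper = date_range
    after_lower = lower is None or day >= lower
    before_upper = upper is None or day <= upper
    return after_lower and before_upper


session_date = date(2023, 10, 25)
# Both bounds given: the end points themselves are included
print(in_date_range(session_date, (date(2023, 10, 25), date(2023, 10, 25))))
# Only a lower bound: any date on or after it matches
print(in_date_range(session_date, (date(2023, 1, 1), None)))
# Only an upper bound: the session date is one day too late
print(in_date_range(session_date, (None, date(2023, 10, 24))))
```

Going by the description above, the corresponding ONE query for an open-ended range would look something like one.search(date_range=['2023-10-25', None]).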

To get more information about the sessions, we can add the details=True flag:

[10]:

eids, details = one.search(date_range=['2023-10-25', '2023-10-25'], details=True)
pprint(details)

[{'date': datetime.date(2023, 10, 25),
  'id': '024113d9-245e-4b67-afdb-9a7213d94446',
  'lab': 'wittenlab',
  'number': 1,
  'projects': ['witten_learning_dop'],
  'start_time': '2023-10-25T11:01:50.579936',
  'subject': 'fip_912',
  'task_protocol': '_iblrig_tasks_FPROptoChoiceWorld6.4.2',
  'url': 'https://openalyx.internationalbrainlab.org/sessions/024113d9-245e-4b67-afdb-9a7213d94446'},
 {'date': datetime.date(2023, 10, 25),
  'id': '9119c8d3-9894-4a88-9420-a6fa8ea05a0e',
  'lab': 'wittenlab',
  'number': 1,
  'projects': ['witten_learning_dop'],
  'start_time': '2023-10-25T09:55:24.599600',
  'subject': 'fip_911',
  'task_protocol': '_iblrig_tasks_FPROptoChoiceWorld6.4.2',
  'url': 'https://openalyx.internationalbrainlab.org/sessions/9119c8d3-9894-4a88-9420-a6fa8ea05a0e'},
 {'date': datetime.date(2023, 10, 25),
  'id': '329f5a74-93ec-48de-a569-cf40a386f712',
  'lab': 'wittenlab',
  'number': 2,
  'projects': ['witten_learning_dop'],
  'start_time': '2023-10-25T08:57:09.804297',
  'subject': 'fip_910',
  'task_protocol': '_iblrig_tasks_FPLOptoChoiceWorld6.4.2',
  'url': 'https://openalyx.internationalbrainlab.org/sessions/329f5a74-93ec-48de-a569-cf40a386f712'}]
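In the output above, the detail entries appear in the same order as the eids, so the two lists can be paired up. The sketch below works on a trimmed copy of that output (only a few keys kept) rather than a live query, so it runs without a connection to Alyx.

```python
from uuid import UUID

# Trimmed copy of the search output shown above (only some keys kept)
eids = [
    UUID('024113d9-245e-4b67-afdb-9a7213d94446'),
    UUID('9119c8d3-9894-4a88-9420-a6fa8ea05a0e'),
    UUID('329f5a74-93ec-48de-a569-cf40a386f712'),
]
details = [
    {'id': '024113d9-245e-4b67-afdb-9a7213d94446', 'subject': 'fip_912',
     'number': 1, 'start_time': '2023-10-25T11:01:50.579936'},
    {'id': '9119c8d3-9894-4a88-9420-a6fa8ea05a0e', 'subject': 'fip_911',
     'number': 1, 'start_time': '2023-10-25T09:55:24.599600'},
    {'id': '329f5a74-93ec-48de-a569-cf40a386f712', 'subject': 'fip_910',
     'number': 2, 'start_time': '2023-10-25T08:57:09.804297'},
]

# Pair each eid with the subject from the matching details entry
subject_by_eid = {eid: d['subject'] for eid, d in zip(eids, details)}

# Order the sessions chronologically; ISO 8601 strings sort correctly as text
by_start = sorted(details, key=lambda d: d['start_time'])
print([d['subject'] for d in by_start])
```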

Multiple search terms can also be combined. For example, we can search for any sessions from the subject SWC_043 that contain the datasets spikes.times and spikes.clusters:

[11]:

eids = one.search(subject='SWC_043', datasets=['spikes.times.npy', 'spikes.clusters.npy'])
pprint(eids)

[UUID('4ecb5d24-f5cc-402c-be28-9d0f7cb14b3a'),
 UUID('c6db3304-c906-400c-aa0f-45dd3945b2ea'),
 UUID('88d24c31-52e4-49cc-9f32-6adbeb9eba87'),
 UUID('6fb1e12c-883b-46d1-a745-473cde3232c8'),
 UUID('695a6073-eae0-49e0-bb0f-e9e57a9275b9'),
 UUID('6f09ba7e-e3ce-44b0-932b-c003fb44fb89'),
 UUID('f3ce3197-d534-4618-bf81-b687555d1883')]

More search terms are available when making remote queries (using the remote Alyx database instead of the local cache). You can view all the remote search terms with the ‘remote’ arg:

[12]:

one.search_terms('remote')
eids = one.search(performance_gte=70, query_type='remote')

Warning.

Remote search queries behave slightly differently. See “Gotchas” below.

Search term arguments may be shortened, so long as they are not ambiguous:

[13]:

assert one.search(task_protocol='training', date_range='2020-03-23') == one.search(task='training', date='2020-03-23')

one.search(dat='2020-01-01') will raise a ValueError as ‘dat’ could be short for both ‘date_range’ and ‘dataset’.

Warning.

There are more search terms when using remote mode, for example ‘data’ can match both ‘dataset’ and ‘datasets’ in remote mode.
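The shortening rule can be pictured as a unique-prefix lookup. The sketch below is a simplified stand-in, not ONE's actual implementation: the resolve_term helper, its error messages, and the short list of terms are all made up for illustration. It also assumes that a name typed in full is accepted even when it is a prefix of another term.

```python
def resolve_term(abbrev, terms):
    """Expand an abbreviated search term to the single full term it prefixes.

    Hypothetical helper for illustration; not ONE's actual implementation.
    """
    if abbrev in terms:
        return abbrev  # assumption: an exact name is never treated as ambiguous
    matches = [t for t in terms if t.startswith(abbrev)]
    if not matches:
        raise ValueError(f'Unknown search term "{abbrev}"')
    if len(matches) > 1:
        raise ValueError(f'Ambiguous search term "{abbrev}": {matches}')
    return matches[0]


# A made-up subset of terms, chosen to mirror the examples in the text
terms = ('dataset', 'date_range', 'subject', 'task_protocol', 'projects')

print(resolve_term('task', terms))  # expands to task_protocol
print(resolve_term('date', terms))  # expands to date_range
try:
    resolve_term('dat', terms)  # prefix of both dataset and date_range
except ValueError as err:
    print(err)
```

A longer list of terms leaves fewer safe abbreviations, which is why a prefix that works in one mode can become ambiguous in another.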

To find out more about the one.search method, we can use the help function:

[14]:

help(one.search)

Help on method search in module one.api:

search(details=False, query_type=None, **kwargs) method of one.api.OneAlyx instance
    Searches sessions matching the given criteria and returns a list of matching eids.

    For a list of search terms, use the method

        one.search_terms(query_type='remote')

    For all search parameters, a single value or list may be provided.  For `dataset`,
    the sessions returned will contain all listed datasets.  For the other parameters,
    the session must contain at least one of the entries.

    For all but `date_range` and `number`, any field that contains the search string is
    returned.  Wildcards are not permitted, however if wildcards property is True, regular
    expressions may be used (see notes and examples).

    Parameters
    ----------
    datasets : str, list
        One or more (exact) dataset names. Returns sessions containing all of these datasets.
    date_range : str, list, datetime.datetime, datetime.date, pandas.timestamp
        A single date to search or a list of 2 dates that define the range (inclusive).  To
        define only the upper or lower date bound, set the other element to None.
    lab : str, list
        A str or list of lab names, returns sessions from any of these labs (can be partial,
        i.e. any task protocol containing that str will be found).
    number : str, int
        Number of session to be returned, i.e. number in sequence for a given date.
    subject : str, list
        A list of subject nicknames, returns sessions for any of these subjects (can be
        partial, i.e. any task protocol containing that str will be found).
    task_protocol : str, list
        The task protocol name (can be partial, i.e. any task protocol containing that str
        will be found).
    project(s) : str, list
        The project name (can be partial, i.e. any task protocol containing that str
        will be found).
    performance_lte / performance_gte : float
        Search only for sessions whose performance is less equal or greater equal than a
        pre-defined threshold as a percentage (0-100).
    users : str, list
        A list of users.
    location : str, list
        A str or list of lab location (as per Alyx definition) name.
        Note: this corresponds to the specific rig, not the lab geographical location per se.
    dataset_types : str, list
        One or more of dataset_types. Unlike with `datasets`, the dataset types for the
        sessions returned may not be reachable (i.e. for recent sessions the datasets may not
        yet be available).
    dataset_qc_lte : int, str, one.alf.spec.QC
        The maximum QC value for associated datasets. NB: Without `datasets`, not all
        associated datasets with the matching QC values are guarenteed to be reachable.
    details : bool
        If true also returns a dict of dataset details.
    query_type : str, None
        Query cache ('local') or Alyx database ('remote').
    limit : int
        The number of results to fetch in one go (if pagination enabled on server).

    Returns
    -------
    list of UUID
        List of eids.
    (list of dicts)
        If details is True, also returns a list of dictionaries, each entry corresponding to a
        matching session.

    Examples
    --------
    Search for sessions with 'training' in the task protocol.

    >>> eids = one.search(task='training')

    Search for sessions by subject 'MFD_04'.

    >>> eids = one.search(subject='MFD_04')

    Do an exact search for sessions by subject 'FD_04'.

    >>> assert one.wildcards is True, 'the wildcards flag must be True for regex expressions'
    >>> eids = one.search(subject='^FD_04$', query_type='local')

    Search for sessions on a given date, in a given lab, containing trials and spike data.

    >>> eids = one.search(date='2023-01-01', lab='churchlandlab', dataset=['trials', 'spikes'])

    Notes
    -----
    - In default and local mode, most queries are case-sensitive partial matches. When lists
      are provided, the search is a logical OR, except for `datasets`, which is a logical AND.
    - All search terms are true for a session to be returned, i.e. subject matches AND project
      matches, etc.
    - In remote mode most queries are case-insensitive partial matches.
    - In default and local mode, when the one.wildcards flag is True (default), queries are
      interpreted as regular expressions. To turn this off set one.wildcards to False.
    - In remote mode regular expressions are only supported using the `django` argument.
    - In remote mode, only the `datasets` argument returns sessions where datasets are
      registered *and* exist. Using `dataset_types` or `dataset_qc_lte` without `datasets`
      will not check that the datasets are reachable.

Advanced searching

By default, most ONE search terms function as a LIKE OR expression, returning results that contain any of the search values as a substring. For example, one.search(subject=['foo', 'bar']) returns all sessions where the subject name contains ‘foo’ or contains ‘bar’. The exception is the dataset search term, which is a LIKE AND expression, i.e. the session must contain one or more dataset names containing ‘foo’ AND one or more dataset names containing ‘bar’. Note that all expressions are case-sensitive in local mode and case-insensitive in remote mode.
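To make the LIKE OR and LIKE AND distinction concrete, here is a minimal sketch over a made-up list of sessions. The like_or and like_and helpers are hypothetical and use plain case-sensitive substring tests; they only illustrate the logic described above and are not how ONE performs the search.

```python
# Made-up sessions for illustration; not real data
sessions = [
    {'subject': 'foo_01', 'datasets': ['spikes.times.npy', 'spikes.clusters.npy']},
    {'subject': 'my_bar', 'datasets': ['spikes.times.npy']},
    {'subject': 'baz_02', 'datasets': ['spikes.times.npy', 'spikes.clusters.npy']},
]


def like_or(value, queries):
    """True if `value` contains ANY of the query strings (LIKE OR)."""
    return any(q in value for q in queries)


def like_and(values, queries):
    """True if EVERY query string is contained in at least one value (LIKE AND)."""
    return all(any(q in v for v in values) for q in queries)


# Subject is LIKE OR: the name may contain 'foo' or 'bar'
by_subject = [s['subject'] for s in sessions
              if like_or(s['subject'], ['foo', 'bar'])]
# Datasets are LIKE AND: both kinds of dataset must be present
by_dataset = [s['subject'] for s in sessions
              if like_and(s['datasets'], ['spikes.times', 'spikes.clusters'])]

print(by_subject)
print(by_dataset)
```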

For more precise searches, regular expressions (a.k.a. regex) can be used in local mode. This is on by default and can be deactivated by setting the wildcards flag: one.wildcards = False (note that this also affects the list and load methods, see the advanced loading section of Loading with ONE for more details).

Regex allows one to make exact searches by asserting the start and end of the string:

[15]:

eids = one.search(subject='FD_04', query_type='local')  # includes sessions with subject 'MFD_04'
assert one.wildcards is True, 'the wildcards flag must be True for regex expressions'
eids = one.search(subject='^FD_04$', query_type='local')  # exact subject name match
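The effect of the anchors can be checked without a cache by applying Python's re.search directly to a few subject names. The list below is made up for illustration ('FD_041' in particular is not a real subject).

```python
import re

# Made-up subject names for illustration
subjects = ['FD_04', 'MFD_04', 'FD_041']

# Unanchored pattern: matches anywhere within the name
partial = [s for s in subjects if re.search('FD_04', s)]
# Anchored pattern: ^ and $ pin the match to the start and end of the name
exact = [s for s in subjects if re.search('^FD_04$', s)]

print(partial)
print(exact)
```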

Likewise, to search for sessions that include one dataset OR another, we can use the | character in our regex:

[16]:

# Sessions containing either leftCamera.times OR rightCamera.times
# (raw strings keep the backslashes intact for the regex engine):
eids = one.search(proj='brainwide', datasets=r'leftCamera\.times|rightCamera\.times', query_type='local')
# XOR expressions are also possible:
eids = one.search(proj='brainwide', datasets=r'(leftCamera\.times|rightCamera\.times){1}', query_type='local')

Note that the wildcards flag causes certain characters to be interpreted differently (e.g. . matches any character). To avoid this, either set the wildcards flag to False or escape the string using re.escape:

[17]:

import re
subject = 'NYU-14.1'
if one.wildcards:
    subject = re.escape(subject)
eids = one.search(subject=subject, query_type='local')  # 'NYU\\-14\\.1'
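To see why escaping matters, the following sketch applies re.search to a made-up lookalike name that differs from the subject only where the dot is. It uses only the standard library and does not query ONE.

```python
import re

subject = 'NYU-14.1'
lookalike = 'NYU-1431'  # made-up name; differs only at the position of the '.'

# Unescaped, the '.' acts as a wildcard and matches the '3'
unescaped_hit = re.search(subject, lookalike) is not None

# Escaped, the '.' only matches a literal dot
escaped = re.escape(subject)
escaped_hit = re.search(escaped, lookalike) is not None
still_matches_itself = re.search(escaped, subject) is not None

print(unescaped_hit, escaped_hit, still_matches_itself)
```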

Gotchas

ONE.search strikes a balance between usability, functionality, and stability. We have been careful to keep the results of the search function consistent across versions; however, this has left some confusing and unintuitive behaviours.

Difference between search term behaviours

As mentioned above, different search terms perform differently. Below are the search terms and their approximate SQL equivalents:

Term	Lookup
dataset	LIKE AND
dataset_qc_lte	<=
number	EXACT
date_range	BETWEEN
subject, etc.	LIKE OR

Combinations of terms form a logical AND. For example, one.search(subject=['foo', 'bar'], project='baz') searches for sessions where the subject name contains foo OR bar, AND the project contains baz. NB: When dataset_qc_lte is provided together with datasets, sessions are returned where ALL matching datasets have a QC value less than or equal to the threshold. When dataset_qc_lte is provided alone, sessions are returned where ANY of the datasets have a QC value less than or equal to the threshold.
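The two dataset_qc_lte cases are easy to mix up, so here is a plain-Python restatement over a single made-up session. The qc_match helper and the numeric QC scale (lower meaning better) are assumptions for illustration only; they do not reflect ONE's implementation or its QC enumeration.

```python
# Made-up session: dataset name -> QC value on a hypothetical scale (lower is better)
session_qc = {
    'trials.table.pqt': 10,
    'spikes.times.npy': 30,
    'spikes.clusters.npy': 40,
}


def qc_match(dataset_qc, qc_lte, datasets=None):
    """Hypothetical restatement of the two dataset_qc_lte cases; not ONE's code."""
    if datasets is None:
        # Threshold alone: ANY dataset at or below the threshold is enough
        return any(qc <= qc_lte for qc in dataset_qc.values())
    for query in datasets:
        matching = [qc for name, qc in dataset_qc.items() if query in name]
        # Each queried dataset must be present (LIKE AND), and ALL of its
        # matches must be at or below the threshold
        if not matching or any(qc > qc_lte for qc in matching):
            return False
    return True


alone = qc_match(session_qc, 30)
with_spikes = qc_match(session_qc, 30, datasets=['spikes'])
with_trials = qc_match(session_qc, 30, datasets=['trials'])
print(alone, with_spikes, with_trials)
```

In this example the threshold alone matches because at least one dataset is at or below 30, whereas adding the 'spikes' filter fails because one of the two spikes datasets sits above the threshold.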

Difference between remote mode search terms

Many search terms behave differently between local mode and remote mode. In particular, search queries are case-insensitive in remote mode.

The dataset, datasets and dataset_types remote arguments

In remote mode there are two ways to search for datasets:

datasets - an exact, case-sensitive match of one or more datasets. All datasets must be present. If dataset_qc_lte is provided, this criterion applies only to these datasets.
dataset_types - an exact, case-sensitive match of one or more dataset types. All dataset types must be present.

Regex systems between modes

Regex searches can be made in remote mode by using special Django queries, for example,

eids = one.search(django='subject__nickname__regex,^FD_04$', query_type='remote')

Regular expression syntax is different between modes, however: in remote mode the regex is parsed by a PostgreSQL database, while in other modes it is done using Python’s re.search.

Searching data with a release tag

Datasets associated with a given paper and/or data release are associated with a tag. You can list the available release tags like so:

[18]:

assert not one.offline, 'ONE must be online to query tags'
tags = one.alyx.rest('tags', 'list')
for tag in tags:
    print('%s: %s' % (tag['name'], tag['description']))

2021_Q1_IBL_et_al_Behaviour: https://doi.org/10.7554/eLife.63711
2021_Q2_PreRelease: https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522/3
2021_Q2_Varol_et_al: https://doi.org/10.1109/ICASSP39728.2021.9414145
2021_Q3_Whiteway_et_al: https://doi.org/10.1371/journal.pcbi.1009439
2022_Q2_IBL_et_al_RepeatedSite: https://doi.org/10.1101/2022.05.09.491042
2022_Q3_IBL_et_al_DAWG: https://doi.org/10.1101/827873
2022_Q4_IBL_et_al_BWM: https://figshare.com/articles/preprint/Data_release_-_Brainwide_map_-_Q4_2022/21400815
2023_Q1_Biderman_Whiteway_et_al:
2023_Q1_Mohammadi_et_al:
2023_Q3_Findling_Hubert_et_al: https://doi.org/10.1101/2023.07.04.547684
2023_Q4_Bruijns_et_al:
2023_Q4_IBL_et_al_BWM_2:
2023_Q4_IBL_et_al_BWM_passive:
2024_Q2_Blau_et_al:
2024_Q2_IBL_et_al_BWM_iblsort: Spike sorting output with ibl-sorter 1.7.0 for BWM
2024_Q2_IBL_et_al_RepeatedSite: https://doi.org/10.1101/2022.05.09.491042
2024_Q3_Pan_Vazquez_et_al:
2025_Q1_IBL_et_al_BWM_wheel_patch: 62 patched sessions with reversed wheel polarity
audio sync FPGA patch: For a number of important ephys sessions the audio was somehow not wired into the FPGA, however
everything else was present and the Bpod recorded these TTLs so we decided to use the bpod2fpga
interpolation to recover the audio TTLs in FPGA time. These were then added to the _spikeglx_sync
object and the trials were re-extracted. These data were patched and the _spikeglx_sync datasets
were protected so that they would not be overwritten in the future.
Brainwidemap:
RepeatedSite:

You can download a cache table for any given release tag, allowing you to restrict your ONE.search queries to a given tag in local mode. See FAQ section ‘How do I download the datasets cache for a specific IBL paper release?’ for more information.

To search for sessions containing datasets associated with a given release tag in remote mode, you can use the following query parameter:

[19]:

tag = '2021_Q1_IBL_et_al_Behaviour'
eids = one.search(django='data_dataset_session_related__tags__name,' + tag, query_type='remote')

Searching insertions

A session may contain multiple insertions recording different brain areas. To find data associated with a specific brain area, it is useful to search by insertion instead of by session.

The OneAlyx.search_insertions method takes similar arguments to the remote search method, and returns a list of probe UUIDs (pIDs), which can be interconverted with session IDs (eIDs) for loading data.

Note.

The search_insertions method is only available in remote mode.

[20]:

one.search_terms('remote', 'insertions')
pids = one.search_insertions(atlas_acronym=['STR', 'CA3'], query_type='remote')

For searching insertions associated with a given release tag, see the method examples by typing help(one.search_insertions).