one.api
Classes for searching, listing and (down)loading Alyx files.
Module attributes
The number of download threads.
Functions
ONE API factory.
Classes
An API for searching and loading data on a local filesystem.
An API for searching and loading data through the Alyx database.
- ONE(*, mode='remote', wildcards=True, **kwargs)[source]
ONE API factory.
Determine which class to instantiate depending on parameters passed.
- Parameters:
mode (str) – Query mode, options include ‘auto’, ‘local’ (offline) and ‘remote’ (online only). Most methods have a query_type parameter that can override the class mode.
wildcards (bool) – If true all methods use unix shell style pattern matching, otherwise regular expressions are used.
cache_dir (str, pathlib.Path) – Path to the data files. If Alyx parameters have been set up for this location, an OneAlyx instance is returned. If data_dir and base_url are None, the default location is used.
tables_dir (str, pathlib.Path) – An optional location of the cache tables. If None, the tables are assumed to be in the cache_dir.
base_url (str) – An Alyx database URL. The URL must start with ‘http’.
username (str) – An Alyx database login username.
password (str) – An Alyx database password.
cache_rest (str) – If not in ‘local’ mode, this determines which http request types to cache. Default is ‘GET’. Use None to deactivate cache (not recommended).
- Returns:
A One instance if mode is ‘local’, otherwise a OneAlyx instance.
- Return type:
One, OneAlyx
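To make the wildcards flag concrete, here is a standalone sketch (standard library only, not the ONE implementation) of how the same query behaves under unix shell-style matching versus a regular expression:

```python
import fnmatch
import re

filenames = ['_ibl_wheel.position.npy', '_ibl_wheel.timestamps.npy', 'spikes.times.npy']

# wildcards=True: unix shell-style patterns, e.g. '*wheel*'
shell_hits = [f for f in filenames if fnmatch.fnmatch(f, '*wheel*')]

# wildcards=False: the same query written as a regular expression
regex_hits = [f for f in filenames if re.search(r'.*wheel.*', f)]

assert shell_hits == regex_hits == ['_ibl_wheel.position.npy', '_ibl_wheel.timestamps.npy']
```

The shell-style form is more forgiving (no escaping of dots), which is why it is the default.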
- class One(cache_dir=None, mode='local', wildcards=True, tables_dir=None)[source]
Bases:
ConversionMixin
An API for searching and loading data on a local filesystem.
- uuid_filenames = None
Whether datasets on disk have a UUID in their filename.
- Type:
bool
- property offline
True if mode is local or no Web client set.
- Type:
bool
- search_terms(query_type=None) → tuple [source]
List the search term keyword args for use in the search method.
- load_cache(tables_dir=None, **kwargs)[source]
Load parquet cache files from a local directory.
- Parameters:
tables_dir (str, pathlib.Path) – An optional directory location of the parquet files, defaults to One._tables_dir.
- save_cache(save_dir=None, force=False)[source]
Save One._cache attribute into parquet tables if recently modified.
- Parameters:
save_dir (str, pathlib.Path) – The directory path into which the tables are saved. Defaults to cache directory.
force (bool) – If True, the cache is saved regardless of modification time.
- refresh_cache(mode='auto')[source]
Check and reload cache tables.
- Parameters:
mode ({'local', 'refresh', 'auto', 'remote'}) – Options are ‘local’ (don’t reload); ‘refresh’ (reload); ‘auto’ (reload if expired); ‘remote’ (don’t reload).
- Returns:
Loaded timestamp.
- Return type:
datetime.datetime
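The ‘auto’ option reloads the tables only when they have expired. A minimal sketch of that decision, assuming a stored load timestamp and a hypothetical 24-hour expiry (the actual expiry is determined by the cache metadata):

```python
from datetime import datetime, timedelta

def cache_expired(loaded_time, expiry=timedelta(hours=24), now=None):
    """True if the cache tables are older than the expiry period (illustrative only)."""
    now = now or datetime.now()
    return (now - loaded_time) > expiry

loaded = datetime(2024, 1, 1, 12, 0)
assert cache_expired(loaded, now=datetime(2024, 1, 3))          # two days old -> reload
assert not cache_expired(loaded, now=datetime(2024, 1, 1, 13))  # one hour old -> keep
```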
- save_loaded_ids(sessions_only=False, clear_list=True)[source]
Save list of UUIDs corresponding to datasets or sessions where datasets were loaded.
- Parameters:
sessions_only (bool) – If true, save list of experiment IDs, otherwise the full list of dataset IDs.
clear_list (bool) – If true, clear the current list of loaded dataset IDs after saving.
- Returns:
list of str – List of UUIDs.
pathlib.Path – The file path of the saved list.
- search(details=False, query_type=None, **kwargs)[source]
Search for sessions matching the given criteria and return a list of matching eids.
For a list of search terms, use the method
one.search_terms()
For all search parameters, a single value or list may be provided. For dataset, the sessions returned will contain all listed datasets. For the other parameters, the session must contain at least one of the entries.
For all but date_range and number, any field that contains the search string is returned. Wildcards are not permitted; however, if the wildcards property is True, regular expressions may be used (see notes and examples).
- Parameters:
dataset (str, list) – One or more dataset names. Returns sessions containing all these datasets. A dataset matches if it contains the search string e.g. ‘wheel.position’ matches ‘_ibl_wheel.position.npy’.
dataset_qc_lte (str, int, one.alf.spec.QC) – A dataset QC value, returns sessions with datasets at or below this QC value, including those with no QC set. If dataset not passed, sessions with any passing QC datasets are returned, otherwise all matching datasets must have the QC value or below.
date_range (str, list, datetime.datetime, datetime.date, pandas.timestamp) – A single date to search or a list of 2 dates that define the range (inclusive). To define only the upper or lower date bound, set the other element to None.
lab (str) – A str or list of lab names, returns sessions from any of these labs.
number (str, int) – The number of the session to be returned, i.e. its sequence number for a given date.
subject (str, list) – A list of subject nicknames, returns sessions for any of these subjects.
task_protocol (str) – The task protocol name (can be partial, i.e. any task protocol containing that str will be found).
projects (str, list) – The project name(s) (can be partial, i.e. any project containing that str will be found).
details (bool) – If true also returns a dict of dataset details.
query_type (str, None) – Query cache (‘local’) or Alyx database (‘remote’).
- Returns:
list – A list of eids.
(list) – (If details is True) a list of dictionaries, each entry corresponding to a matching session.
Examples
Search for sessions with ‘training’ in the task protocol.
>>> eids = one.search(task='training')
Search for sessions by subject ‘MFD_04’.
>>> eids = one.search(subject='MFD_04')
Do an exact search for sessions by subject ‘FD_04’.
>>> assert one.wildcards is True, 'the wildcards flag must be True for regex expressions'
>>> eids = one.search(subject='^FD_04$')
Search for sessions on a given date, in a given lab, containing trials and spike data.
>>> eids = one.search(date='2023-01-01', lab='churchlandlab', dataset=['trials', 'spikes'])
Search for sessions containing trials and spike data where QC for both are WARNING or less.
>>> eids = one.search(dataset_qc_lte='WARNING', dataset=['trials', 'spikes'])
Search for sessions with any datasets that have a QC of PASS or NOT_SET.
>>> eids = one.search(dataset_qc_lte='PASS')
Notes
In default and local mode, most queries are case-sensitive partial matches. When lists are provided, the search is a logical OR, except for datasets, which is a logical AND.
If dataset_qc_lte and dataset are defined, the QC criterion applies only to the provided datasets, and all must pass for a session to be returned.
All search terms are true for a session to be returned, i.e. subject matches AND project matches, etc.
In remote mode most queries are case-insensitive partial matches.
In default and local mode, when the one.wildcards flag is True (default), queries are interpreted as regular expressions. To turn this off set one.wildcards to False.
In remote mode regular expressions are only supported using the django argument.
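The OR/AND semantics described in these notes can be illustrated with a toy, pure-Python filter (an illustration only; the real search operates on the parquet cache tables):

```python
# Toy sessions: each has a subject and a set of dataset names.
sessions = [
    {'eid': 'a', 'subject': 'MFD_04', 'datasets': {'trials.intervals', 'spikes.times'}},
    {'eid': 'b', 'subject': 'MFD_05', 'datasets': {'trials.intervals'}},
    {'eid': 'c', 'subject': 'NYU_01', 'datasets': {'spikes.times'}},
]

def search(sessions, subject=None, dataset=None):
    """Subjects are a logical OR (any match); datasets are a logical AND (all present)."""
    hits = []
    for s in sessions:
        if subject and not any(q in s['subject'] for q in subject):
            continue  # none of the subject queries matched (partial, case-sensitive)
        if dataset and not all(any(q in d for d in s['datasets']) for q in dataset):
            continue  # at least one requested dataset is missing
        hits.append(s['eid'])
    return hits

assert search(sessions, subject=['MFD']) == ['a', 'b']          # OR over the subject list
assert search(sessions, dataset=['trials', 'spikes']) == ['a']  # AND over datasets
```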
- get_details(eid: str | Path | UUID, full: bool = False)[source]
Return session details for a given session ID.
- Parameters:
eid (str, UUID, pathlib.Path, dict) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
full (bool) – If True, returns a DataFrame of session and dataset info
- Returns:
A session record or full DataFrame with dataset information if full is True
- Return type:
pd.Series, pd.DataFrame
- list_subjects() → List[str] [source]
List all subjects in the database.
- Returns:
Sorted list of subject names
- Return type:
list
- list_datasets(eid=None, filename=None, collection=None, revision=None, qc=QC.FAIL, ignore_qc_not_set=False, details=False, query_type=None, default_revisions_only=False, keep_eid_index=False) → ndarray | DataFrame [source]
Given an eid, return the datasets for that session.
If no eid is provided, a list of all datasets is returned. When details is false, a sorted array of unique datasets is returned (their relative paths).
- Parameters:
eid (str, UUID, pathlib.Path, dict) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
filename (str, dict, list) – Filters datasets and returns only the ones matching the filename. Supports lists and asterisks as wildcards. May be a dict of ALF parts.
collection (str, list) – The collection to which the object belongs, e.g. ‘alf/probe01’. This is the relative path of the file from the session root. Supports asterisks as wildcards.
revision (str) – Filters datasets and returns only the ones matching the revision. Supports asterisks as wildcards.
qc (str, int, one.alf.spec.QC) – Returns datasets at or below this QC level. Integer values should correspond to the QC enumeration NOT the qc category column codes in the pandas table.
ignore_qc_not_set (bool) – When true, do not return datasets for which QC is NOT_SET.
details (bool) – When true, a pandas DataFrame is returned, otherwise a numpy array of relative paths (collection/revision/filename) - see one.alf.spec.describe for details.
query_type (str) – Query cache (‘local’) or Alyx database (‘remote’).
default_revisions_only (bool) – When true, only matching datasets that are considered default revisions are returned. If no ‘default_revision’ column is present, an ALFError is raised.
keep_eid_index (bool) – If details is true, this determines whether the returned data frame contains the eid in the index. When false (default) the returned data frame index is the dataset id only, otherwise the index is a MultiIndex with levels (eid, id).
- Returns:
Slice of datasets table or numpy array if details is False.
- Return type:
np.ndarray, pd.DataFrame
Examples
List all unique datasets in ONE cache
>>> datasets = one.list_datasets()
List all datasets for a given experiment
>>> datasets = one.list_datasets(eid)
List all datasets for an experiment that match a collection name
>>> probe_datasets = one.list_datasets(eid, collection='*probe*')
List datasets for an experiment that have ‘wheel’ in the filename
>>> datasets = one.list_datasets(eid, filename='*wheel*')
List datasets for an experiment that are part of a ‘wheel’ or ‘trial(s)’ object
>>> datasets = one.list_datasets(eid, {'object': ['wheel', 'trial?']})
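The qc and ignore_qc_not_set filters keep datasets whose QC outcome is at or below the requested level. A sketch using a stand-in enumeration (the member values here are illustrative, not necessarily those of one.alf.spec.QC):

```python
from enum import IntEnum

class QC(IntEnum):
    # Stand-in for one.alf.spec.QC; the ordering matters more than the exact values.
    NOT_SET = 0
    PASS = 10
    WARNING = 30
    FAIL = 40
    CRITICAL = 50

datasets = {
    'trials.intervals.npy': QC.PASS,
    'spikes.times.npy': QC.WARNING,
    'camera.times.npy': QC.CRITICAL,
    'wheel.position.npy': QC.NOT_SET,
}

def filter_qc(datasets, qc=QC.FAIL, ignore_qc_not_set=False):
    """Keep datasets at or below the given QC level, optionally dropping NOT_SET."""
    keep = {k: v for k, v in datasets.items() if v <= qc}
    if ignore_qc_not_set:
        keep = {k: v for k, v in keep.items() if v != QC.NOT_SET}
    return sorted(keep)

assert filter_qc(datasets, qc=QC.WARNING) == [
    'spikes.times.npy', 'trials.intervals.npy', 'wheel.position.npy']
assert filter_qc(datasets, qc=QC.WARNING, ignore_qc_not_set=True) == [
    'spikes.times.npy', 'trials.intervals.npy']
```

Note that NOT_SET sorts below every passing level, which is why it is included unless explicitly ignored.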
- list_collections(eid=None, filename=None, collection=None, revision=None, details=False, query_type=None) → ndarray | dict [source]
List the collections for a given experiment.
If no experiment ID is given, all collections are returned.
- Parameters:
eid (str, UUID, Path, dict) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
filename (str, dict, list) – Filters datasets and returns only the collections containing matching datasets. Supports lists and asterisks as wildcards. May be a dict of ALF parts.
collection (str, list) – Filter by a given pattern. Supports asterisks as wildcards.
revision (str) – Filters collections and returns only the ones with the matching revision. Supports asterisks as wildcards.
details (bool) – If true a dict of pandas datasets tables is returned with collections as keys, otherwise a numpy array of unique collections.
query_type (str) – Query cache (‘local’) or Alyx database (‘remote’).
- Returns:
A list of unique collections or dict of datasets tables
- Return type:
list, dict
Examples
List all unique collections in ONE cache
>>> collections = one.list_collections()
List all collections for a given experiment
>>> collections = one.list_collections(eid)
List all collections for a given experiment and revision
>>> revised = one.list_collections(eid, revision='2020-01-01')
List all collections that have ‘probe’ in the name.
>>> collections = one.list_collections(eid, collection='*probe*')
List collections for an experiment that have datasets with ‘wheel’ in the name
>>> collections = one.list_collections(eid, filename='*wheel*')
List collections for an experiment that contain numpy datasets
>>> collections = one.list_collections(eid, {'extension': 'npy'})
- list_revisions(eid=None, filename=None, collection=None, revision=None, details=False, query_type=None)[source]
List the revisions for a given experiment.
If no experiment ID is given, all revisions are returned.
- Parameters:
eid (str, UUID, Path, dict) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
filename (str, dict, list) – Filters datasets and returns only the revisions containing matching datasets. Supports lists and asterisks as wildcards. May be a dict of ALF parts.
collection (str, list) – Filter by a given collection. Supports asterisks as wildcards.
revision (str, list) – Filter by a given pattern. Supports asterisks as wildcards.
details (bool) – If true a dict of pandas datasets tables is returned with revisions as keys, otherwise a numpy array of unique revisions.
query_type (str) – Query cache (‘local’) or Alyx database (‘remote’).
- Returns:
A list of unique revisions or dict of datasets tables.
- Return type:
list, dict
Examples
List all revisions in ONE cache
>>> revisions = one.list_revisions()
List all revisions for a given experiment
>>> revisions = one.list_revisions(eid)
List all revisions for a given experiment that contain the trials object
>>> revisions = one.list_revisions(eid, filename={'object': 'trials'})
List all revisions for a given experiment that start with 2020 or 2021
>>> revisions = one.list_revisions(eid, revision=['202[01]*'])
- load_object(eid: str | Path | UUID, obj: str, collection: str | None = None, revision: str | None = None, query_type: str | None = None, download_only: bool = False, check_hash: bool = True, **kwargs) → AlfBunch | List[ALFPath] [source]
Load all attributes of an ALF object from a Session ID and an object name.
Any datasets with matching object name will be loaded.
- Parameters:
eid (str, UUID, pathlib.Path, dict) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
obj (str) – The ALF object to load. Supports asterisks as wildcards.
collection (str) – The collection to which the object belongs, e.g. ‘alf/probe01’. This is the relative path of the file from the session root. Supports asterisks as wildcards.
revision (str) – The dataset revision (typically an ISO date). If no exact match, the previous revision (ordered lexicographically) is returned. If None, the default revision is returned (usually the most recent revision). Regular expressions/wildcards not permitted.
query_type (str) – Query cache (‘local’) or Alyx database (‘remote’).
download_only (bool) – When true the data are downloaded and the file path is returned. NB: The order of the file path list is undefined.
check_hash (bool) – Consider dataset missing if local file hash does not match. In online mode, the dataset will be re-downloaded.
kwargs – Additional filters for datasets, including namespace and timescale. For full list see the
one.alf.spec.describe()
function.
- Returns:
An ALF bunch or if download_only is True, a list of one.alf.path.ALFPath objects.
- Return type:
one.alf.io.AlfBunch, list
Examples
>>> load_object(eid, 'moves')
>>> load_object(eid, 'trials')
>>> load_object(eid, 'spikes', collection='*probe01')   # wildcards is True
>>> load_object(eid, 'spikes', collection='.*probe01')  # wildcards is False
>>> load_object(eid, 'spikes', namespace='ibl')
>>> load_object(eid, 'spikes', timescale='ephysClock')
Load specific attributes:
>>> load_object(eid, 'spikes', attribute=['times*', 'clusters'])
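The revision fallback rule above (exact match, otherwise the previous revision in lexicographic order) can be sketched as follows (an illustration, not the library's implementation):

```python
def resolve_revision(available, requested):
    """Return the requested revision if present, otherwise the latest revision
    that sorts lexicographically before it; None if nothing precedes it."""
    candidates = [r for r in sorted(available) if r <= requested]
    return candidates[-1] if candidates else None

revisions = ['2020-01-08', '2020-08-31', '2021-07-06']
assert resolve_revision(revisions, '2020-08-31') == '2020-08-31'  # exact match
assert resolve_revision(revisions, '2021-01-01') == '2020-08-31'  # previous revision
assert resolve_revision(revisions, '2019-12-31') is None          # nothing earlier
```

Because revisions are typically ISO dates, lexicographic order coincides with chronological order.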
- load_dataset(eid: str | Path | UUID, dataset: str, collection: str | None = None, revision: str | None = None, query_type: str | None = None, download_only: bool = False, check_hash: bool = True) → Any [source]
Load a single dataset for a given session id and dataset name.
- Parameters:
eid (str, UUID, pathlib.Path, dict) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
dataset (str, dict) – The ALF dataset to load. May be a string or dict of ALF parts. Supports asterisks as wildcards.
collection (str) – The collection to which the object belongs, e.g. ‘alf/probe01’. This is the relative path of the file from the session root. Supports asterisks as wildcards.
revision (str) – The dataset revision (typically an ISO date). If no exact match, the previous revision (ordered lexicographically) is returned. If None, the default revision is returned (usually the most recent revision). Regular expressions/wildcards not permitted.
query_type (str) – Query cache (‘local’) or Alyx database (‘remote’)
download_only (bool) – When true the data are downloaded and the file path is returned.
check_hash (bool) – Consider dataset missing if local file hash does not match. In online mode, the dataset will be re-downloaded.
- Returns:
Dataset or an ALFPath object if download_only is true.
- Return type:
np.ndarray, one.alf.path.ALFPath
Examples
>>> intervals = one.load_dataset(eid, '_ibl_trials.intervals.npy')
Load dataset without specifying extension
>>> intervals = one.load_dataset(eid, 'trials.intervals')  # wildcard mode only
>>> intervals = one.load_dataset(eid, '.*trials.intervals.*')  # regex mode only
>>> intervals = one.load_dataset(eid, dict(object='trials', attribute='intervals'))
>>> filepath = one.load_dataset(eid, '_ibl_trials.intervals.npy', download_only=True)
>>> spike_times = one.load_dataset(eid, 'spikes.times.npy', collection='alf/probe01')
>>> old_spikes = one.load_dataset(eid, 'spikes.times.npy',
...                               collection='alf/probe01', revision='2020-08-31')
>>> old_spikes = one.load_dataset(eid, 'alf/probe01/#2020-08-31#/spikes.times.npy')
- Raises:
ValueError – When a relative path is provided (e.g. ‘collection/#revision#/object.attribute.ext’), the collection and revision keyword arguments must be None.
one.alf.exceptions.ALFObjectNotFound – The dataset was not found in the cache or on disk.
one.alf.exceptions.ALFMultipleCollectionsFound – The dataset provided exists in multiple collections or matched multiple different files. Provide a specific collection to load, and make sure any wildcard/regular expressions are specific enough.
Warning
- UserWarning
When a relative path is provided (e.g. ‘collection/#revision#/object.attribute.ext’), wildcards/regular expressions must not be used. To use wildcards, pass the collection and revision as separate keyword arguments.
- load_datasets(eid: str | Path | UUID, datasets: List[str], collections: str | None = None, revisions: str | None = None, query_type: str | None = None, assert_present=True, download_only: bool = False, check_hash: bool = True) → Any [source]
Load datasets for a given session id.
Returns two lists the length of datasets. The first is the data (or file paths if download_only is true), the second is a list of metadata Bunches. If assert_present is false, missing data will be returned as None.
- Parameters:
eid (str, UUID, pathlib.Path, dict) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
datasets (list of strings) – The ALF datasets to load. May be a string or dict of ALF parts. Supports asterisks as wildcards.
collections (str, list) – The collection(s) to which the object(s) belong, e.g. ‘alf/probe01’. This is the relative path of the file from the session root. Supports asterisks as wildcards.
revisions (str, list) – The dataset revision (typically an ISO date). If no exact match, the previous revision (ordered lexicographically) is returned. If None, the default revision is returned (usually the most recent revision). Regular expressions/wildcards not permitted.
query_type (str) – Query cache (‘local’) or Alyx database (‘remote’)
assert_present (bool) – If true, missing datasets raise an error, otherwise None is returned.
download_only (bool) – When true the data are downloaded and the file path is returned.
check_hash (bool) – Consider dataset missing if local file hash does not match. In online mode, the dataset will be re-downloaded.
- Returns:
list – A list of data (or file paths) the length of datasets.
list – A list of meta data Bunches. If assert_present is False, missing data will be None.
Notes
There are four ways the datasets may be formatted: the object.attribute; the file name (including namespace and extension); the ALF components as a dict; the dataset path relative to the session path, e.g. collection/object.attribute.ext.
When relative paths are provided (e.g. ‘collection/#revision#/object.attribute.ext’), wildcards/regular expressions must not be used. To use wildcards, pass the collection and revision as separate keyword arguments.
To ensure you are loading the correct revision, use the revisions kwarg instead of relative paths.
To load an exact revision (i.e. not the last revision before a given date), pass in a list of relative paths or a data frame.
- Raises:
ValueError – When a relative path is provided (e.g. ‘collection/#revision#/object.attribute.ext’), the collection and revision keyword arguments must be None.
ValueError – If a list of collections or revisions is provided, it must match the number of datasets passed in.
TypeError – The datasets argument must be a non-string iterable.
one.alf.exceptions.ALFObjectNotFound – One or more of the datasets was not found in the cache or on disk. To suppress this error and return None for missing datasets, use assert_present=False.
one.alf.exceptions.ALFMultipleCollectionsFound – One or more of the dataset(s) provided exist in multiple collections. Provide the specific collections to load, and if using wildcards/regular expressions, make sure the expression is specific enough.
Warning
- UserWarning
When providing a list of relative dataset paths, this warning occurs if one or more of the datasets are not marked as default revisions. Avoid such warnings by explicitly passing in the required revisions with the revisions keyword argument.
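The relative dataset path format described in the notes above (‘collection/#revision#/object.attribute.ext’) can be decomposed with a short, illustrative parser (a sketch; the library's actual parsing lives in one.alf.path and one.alf.spec):

```python
def parse_relative_path(path):
    """Split a session-relative dataset path into collection, revision and filename.
    Both the collection and the '#revision#' folder are optional."""
    *dirs, filename = path.split('/')  # last component is always the file name
    revision = None
    if dirs and dirs[-1].startswith('#') and dirs[-1].endswith('#'):
        revision = dirs.pop().strip('#')  # revision folders are wrapped in hashes
    return {'collection': '/'.join(dirs) or None,
            'revision': revision,
            'filename': filename}

assert parse_relative_path('alf/probe01/#2020-08-31#/spikes.times.npy') == {
    'collection': 'alf/probe01', 'revision': '2020-08-31',
    'filename': 'spikes.times.npy'}
assert parse_relative_path('trials.intervals.npy') == {
    'collection': None, 'revision': None, 'filename': 'trials.intervals.npy'}
```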
- load_dataset_from_id(dset_id: str | UUID, download_only: bool = False, details: bool = False, check_hash: bool = True) → Any [source]
Load a dataset given a dataset UUID.
- Parameters:
dset_id (uuid.UUID, str) – A dataset UUID to load.
download_only (bool) – If true the dataset is downloaded (if necessary) and the filepath returned.
details (bool) – If true a pandas Series is returned in addition to the data.
check_hash (bool) – Consider dataset missing if local file hash does not match. In online mode, the dataset will be re-downloaded.
- Returns:
Dataset data (or filepath if download_only) and dataset record if details is True.
- Return type:
np.ndarray, one.alf.path.ALFPath
- load_collection(eid: str | Path | UUID, collection: str, object: str | None = None, revision: str | None = None, query_type: str | None = None, download_only: bool = False, check_hash: bool = True, **kwargs) → Bunch | List[ALFPath] [source]
Load all objects in an ALF collection from a Session ID. Any datasets with matching object name(s) will be loaded. Returns a bunch of objects.
- Parameters:
eid (str, UUID, pathlib.Path, dict) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
collection (str) – The collection to which the object belongs, e.g. ‘alf/probe01’. This is the relative path of the file from the session root. Supports asterisks as wildcards.
object (str) – The ALF object to load. Supports asterisks as wildcards.
revision (str) – The dataset revision (typically an ISO date). If no exact match, the previous revision (ordered lexicographically) is returned. If None, the default revision is returned (usually the most recent revision). Regular expressions/wildcards not permitted.
query_type (str) – Query cache (‘local’) or Alyx database (‘remote’)
download_only (bool) – When true the data are downloaded and the file path is returned.
check_hash (bool) – Consider dataset missing if local file hash does not match. In online mode, the dataset will be re-downloaded.
kwargs – Additional filters for datasets, including namespace and timescale. For full list see the one.alf.spec.describe function.
- Returns:
A Bunch of objects or if download_only is True, a list of ALFPath objects.
- Return type:
Bunch of one.alf.io.AlfBunch, list of one.alf.path.ALFPath
Examples
>>> alf_collection = load_collection(eid, 'alf')
>>> load_collection(eid, '*probe01', object=['spikes', 'clusters'])  # wildcards is True
>>> files = load_collection(eid, '', download_only=True)  # Base session dir
- Raises:
alferr.ALFError – No datasets exist for the provided session collection.
alferr.ALFObjectNotFound – No datasets match the object, attribute or revision filters for this collection.
- static setup(cache_dir=None, silent=False, **kwargs)[source]
Set up One cache tables for a given data directory.
- Parameters:
cache_dir (pathlib.Path, str) – A path to the ALF data directory.
silent (bool) – When False (the default), prompts for cache_dir if cache_dir is None and asks before overwriting an existing cache. When True, uses the cwd for cache_dir if cache_dir is None and does not prompt.
kwargs – Optional arguments to pass to one.alf.cache.make_parquet_db.
- Returns:
An instance of One for the provided cache directory.
- Return type:
One
- class OneAlyx(username=None, password=None, base_url=None, cache_dir=None, mode='remote', wildcards=True, tables_dir=None, **kwargs)[source]
Bases:
One
An API for searching and loading data through the Alyx database.
- load_cache(tables_dir=None, clobber=False, tag=None)[source]
Load parquet cache files. If the local cache is sufficiently old, this method will query the database for the location and creation date of the remote cache. If newer, it will be downloaded and loaded.
Note: Unlike refresh_cache, this will always reload the local files at least once.
- Parameters:
tables_dir (str, pathlib.Path) – An optional directory location of the parquet files, defaults to One._tables_dir.
clobber (bool) – If True, query Alyx for a newer cache even if current (local) cache is recent.
tag (str) – An optional Alyx dataset tag for loading cache tables containing a subset of datasets.
Examples
To load the cache tables for a given release tag
>>> one.load_cache(tag='2022_Q2_IBL_et_al_RepeatedSite')
To reset the cache tables after loading a tag
>>> ONE.cache_clear()
>>> one = ONE()
- property alyx
The Alyx Web client
- Type:
- property cache_dir
The location of the downloaded file cache
- Type:
pathlib.Path
- search_terms(query_type=None, endpoint=None)[source]
Returns a list of search terms to be passed as kwargs to the search method.
- Parameters:
query_type (str) – If ‘remote’, the search terms are largely determined by the REST endpoint used.
endpoint (str) – If ‘remote’, specify the endpoint to search terms for.
- Returns:
Tuple of search strings.
- Return type:
tuple
- describe_dataset(dataset_type=None)[source]
Print a dataset type description.
NB: This requires an Alyx database connection.
- Parameters:
dataset_type (str) – A dataset type or dataset name.
- Returns:
The Alyx dataset type record.
- Return type:
dict
- list_datasets(eid=None, filename=None, collection=None, revision=None, qc=QC.FAIL, ignore_qc_not_set=False, details=False, query_type=None, default_revisions_only=False, keep_eid_index=False) → ndarray | DataFrame [source]
Given an eid, return the datasets for that session.
If no eid is provided, a list of all datasets is returned. When details is false, a sorted array of unique datasets is returned (their relative paths).
- Parameters:
eid (str, UUID, pathlib.Path, dict) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
filename (str, dict, list) – Filters datasets and returns only the ones matching the filename. Supports lists and asterisks as wildcards. May be a dict of ALF parts.
collection (str, list) – The collection to which the object belongs, e.g. ‘alf/probe01’. This is the relative path of the file from the session root. Supports asterisks as wildcards.
revision (str) – Filters datasets and returns only the ones matching the revision. Supports asterisks as wildcards.
qc (str, int, one.alf.spec.QC) – Returns datasets at or below this QC level. Integer values should correspond to the QC enumeration NOT the qc category column codes in the pandas table.
ignore_qc_not_set (bool) – When true, do not return datasets for which QC is NOT_SET.
details (bool) – When true, a pandas DataFrame is returned, otherwise a numpy array of relative paths (collection/revision/filename) - see one.alf.spec.describe for details.
query_type (str) – Query cache (‘local’) or Alyx database (‘remote’).
default_revisions_only (bool) – When true, only matching datasets that are considered default revisions are returned. If no ‘default_revision’ column is present, an ALFError is raised.
keep_eid_index (bool) – If details is true, this determines whether the returned data frame contains the eid in the index. When false (default) the returned data frame index is the dataset id only, otherwise the index is a MultiIndex with levels (eid, id).
- Returns:
Slice of datasets table or numpy array if details is False.
- Return type:
np.ndarray, pd.DataFrame
Examples
List all unique datasets in ONE cache
>>> datasets = one.list_datasets()
List all datasets for a given experiment
>>> datasets = one.list_datasets(eid)
List all datasets for an experiment that match a collection name
>>> probe_datasets = one.list_datasets(eid, collection='*probe*')
List datasets for an experiment that have ‘wheel’ in the filename
>>> datasets = one.list_datasets(eid, filename='*wheel*')
List datasets for an experiment that are part of a ‘wheel’ or ‘trial(s)’ object
>>> datasets = one.list_datasets(eid, {'object': ['wheel', 'trial?']})
- list_aggregates(relation: str, identifier: str = None, dataset=None, revision=None, assert_unique=False)[source]
List datasets aggregated over a given relation.
- Parameters:
relation (str) – The thing over which the data were aggregated, e.g. ‘subjects’ or ‘tags’.
identifier (str) – The ID of the datasets, e.g. for data over subjects this would be lab/subject.
dataset (str, dict, list) – Filters datasets and returns only the ones matching the filename. Supports lists and asterisks as wildcards. May be a dict of ALF parts.
revision (str) – Filters datasets and returns only the ones matching the revision. Supports asterisks as wildcards.
assert_unique (bool) – When true an error is raised if multiple collections or datasets are found.
- Returns:
The matching aggregate dataset records.
- Return type:
pandas.DataFrame
Examples
List datasets aggregated over a specific subject’s sessions
>>> trials = one.list_aggregates('subjects', 'SP026')
- load_aggregate(relation: str, identifier: str, dataset=None, revision=None, download_only=False)[source]
Load a single aggregated dataset for a given string identifier.
Loads data aggregated over a relation such as subject, project or tag.
- Parameters:
relation (str) – The thing over which the data were aggregated, e.g. ‘subjects’ or ‘tags’.
identifier (str) – The ID of the datasets, e.g. for data over subjects this would be lab/subject.
dataset (str, dict, list) – Filters datasets and returns only the ones matching the filename. Supports lists and asterisks as wildcards. May be a dict of ALF parts.
revision (str) – Filters datasets and returns only the ones matching the revision. Supports asterisks as wildcards.
download_only (bool) – When true the data are downloaded and the file path is returned.
- Returns:
Dataset or an ALFPath object if download_only is true.
- Return type:
pandas.DataFrame, one.alf.path.ALFPath
- Raises:
alferr.ALFObjectNotFound – No datasets match the object, attribute or revision filters for this relation and identifier, or the matching dataset was not found on disk (neither on the remote repository nor locally).
Examples
Load a dataset aggregated over a specific subject’s sessions
>>> trials = one.load_aggregate('subjects', 'SP026', '_ibl_subjectTraining.table')
- pid2eid(pid: str, query_type=None) → (str, str) [source]
Given an Alyx probe UUID string, returns the session id string and the probe label (i.e. the ALF collection).
NB: Requires a connection to the Alyx database.
- Parameters:
pid (str, uuid.UUID) – A probe UUID.
query_type (str) – Query mode - options include ‘remote’, and ‘refresh’.
- Returns:
str – Experiment ID (eid).
str – Probe label.
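Since pid2eid requires a database connection, it can be worth validating the probe UUID string locally before querying. A stdlib-only sketch (the UUID below is a made-up example, not a real probe ID):

```python
import uuid

# Validate a probe ID string locally before passing it to one.pid2eid.
# The UUID below is a made-up example, not a real probe ID.
def is_valid_pid(pid: str) -> bool:
    try:
        return str(uuid.UUID(pid)) == pid.lower()
    except ValueError:
        return False

print(is_valid_pid("b749446c-18e3-4987-820a-50649ab0f826"))  # True
print(is_valid_pid("not-a-uuid"))  # False
```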
- eid2pid(eid, query_type=None, details=False)[source]
Given an experiment UUID (eID), returns the probe IDs and the probe labels (i.e. the ALF collection).
NB: Requires a connection to the Alyx database.
- Parameters:
eid (str, UUID, pathlib.Path, dict) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
query_type (str) – Query mode - options include ‘remote’, and ‘refresh’.
details (bool) – Additionally return the complete Alyx records from insertions endpoint.
- Returns:
list of str – Probe UUIDs (pID).
list of str – Probe labels, e.g. ‘probe00’.
list of dict (optional) – If details is true, returns the Alyx records from insertions endpoint.
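The first two return values are parallel lists, so a common follow-up is to zip them into a label-to-pid mapping. A sketch with hypothetical return values:

```python
# eid2pid returns parallel lists of probe UUIDs and labels; pairing them
# gives a convenient lookup table.  The values below are hypothetical
# stand-ins for real return values.
pids = ["3675290c-8134-4598-b924-83edb7940269",
        "27bac116-ea57-4512-ad35-714a62d259cd"]
labels = ["probe00", "probe01"]

by_label = dict(zip(labels, pids))
print(by_label["probe00"])  # 3675290c-8134-4598-b924-83edb7940269
```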
- search_insertions(details=False, query_type=None, **kwargs)[source]
Searches insertions matching the given criteria and returns a list of matching probe IDs.
For a list of search terms, use the method
one.search_terms(query_type='remote', endpoint='insertions')
All of the search parameters, apart from dataset and dataset type, require a single value. For dataset and dataset type, a single value or a list can be provided. Insertions returned will contain all listed datasets.
- Parameters:
session (str) – A session eid, returns insertions associated with the session.
name (str) – An insertion label, returns insertions with specified name.
lab (str) – A lab name, returns insertions associated with the lab.
subject (str) – A subject nickname, returns insertions associated with the subject.
task_protocol (str) – A task protocol name (can be partial, i.e. any task protocol containing that str will be found).
project(s) (str) – The project name (can be partial, i.e. any project name containing that str will be found).
dataset (str) – A (partial) dataset name. Returns sessions containing matching datasets. A dataset matches if it contains the search string e.g. ‘wheel.position’ matches ‘_ibl_wheel.position.npy’. C.f. datasets argument.
datasets (str, list) – One or more exact dataset names. Returns insertions containing all these datasets.
dataset_qc_lte (int, str, one.alf.spec.QC) – The maximum QC value for associated datasets.
dataset_types (str, list) – One or more dataset_types (exact matching).
details (bool) – If true also returns a dict of dataset details.
query_type (str, None) – Query cache (‘local’) or Alyx database (‘remote’).
limit (int) – The number of results to fetch in one go (if pagination enabled on server).
- Returns:
list – List of probe IDs (pids).
(list of dicts) – If details is True, also returns a list of dictionaries, each entry corresponding to a matching insertion.
Notes
This method does not use the local cache and therefore cannot work in ‘local’ mode.
Examples
List the insertions associated with a given data release
>>> tag = '2022_Q2_IBL_et_al_RepeatedSite'
>>> ins = one.search_insertions(django='datasets__tags__name,' + tag)
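The django argument in the example above takes a comma-separated 'field,value' filter. A small helper for composing such strings (a sketch; only the 'datasets__tags__name' field path is taken from the example above, and other field paths are assumed to follow the same double-underscore convention):

```python
# Compose a Django-style filter string as used in the example above.
# The 'datasets__tags__name' field path comes from that example; other
# field paths are assumed to follow the same double-underscore convention.
def django_filter(field: str, value: str) -> str:
    return f"{field},{value}"

tag = "2022_Q2_IBL_et_al_RepeatedSite"
print(django_filter("datasets__tags__name", tag))
# datasets__tags__name,2022_Q2_IBL_et_al_RepeatedSite
```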
- search(details=False, query_type=None, **kwargs)[source]
Searches sessions matching the given criteria and returns a list of matching eids.
For a list of search terms, use the method
one.search_terms(query_type='remote')
For all search parameters, a single value or list may be provided. For dataset, the sessions returned will contain all listed datasets. For the other parameters, the session must contain at least one of the entries.
For all but date_range and number, any field that contains the search string is returned. Wildcards are not permitted; however, if the wildcards property is True, regular expressions may be used (see notes and examples).
- Parameters:
dataset (str) – A (partial) dataset name. Returns sessions containing matching datasets. A dataset matches if it contains the search string e.g. ‘wheel.position’ matches ‘_ibl_wheel.position.npy’. C.f. datasets argument.
date_range (str, list, datetime.datetime, datetime.date, pandas.timestamp) – A single date to search or a list of 2 dates that define the range (inclusive). To define only the upper or lower date bound, set the other element to None.
lab (str, list) – A lab name or list of lab names; returns sessions from any of these labs (can be partial, i.e. any lab name containing that str will be found).
number (str, int) – The number of the session to be returned, i.e. the number in sequence for a given date.
subject (str, list) – A subject nickname or list of nicknames; returns sessions for any of these subjects (can be partial, i.e. any subject nickname containing that str will be found).
task_protocol (str, list) – The task protocol name (can be partial, i.e. any task protocol containing that str will be found).
project(s) (str, list) – The project name (can be partial, i.e. any project name containing that str will be found).
performance_lte, performance_gte (float) – Search only for sessions whose performance is less than or equal / greater than or equal to a pre-defined threshold as a percentage (0-100).
users (str, list) – A list of users.
location (str, list) – A lab location name or list of names (as per the Alyx definition). Note: this corresponds to the specific rig, not the lab’s geographical location per se.
dataset_types (str, list) – One or more dataset types.
datasets (str, list) – One or more (exact) dataset names. Returns sessions containing all of these datasets.
dataset_qc_lte (int, str, one.alf.spec.QC) – The maximum QC value for associated datasets.
details (bool) – If true also returns a dict of dataset details.
query_type (str, None) – Query cache (‘local’) or Alyx database (‘remote’).
limit (int) – The number of results to fetch in one go (if pagination enabled on server).
- Returns:
list – List of eids.
(list of dicts) – If details is True, also returns a list of dictionaries, each entry corresponding to a matching session.
Examples
Search for sessions with ‘training’ in the task protocol.
>>> eids = one.search(task='training')
Search for sessions by subject ‘MFD_04’.
>>> eids = one.search(subject='MFD_04')
Do an exact search for sessions by subject ‘FD_04’.
>>> assert one.wildcards is True, 'the wildcards flag must be True for regex expressions'
>>> eids = one.search(subject='^FD_04$', query_type='local')
Search for sessions on a given date, in a given lab, containing trials and spike data.
>>> eids = one.search(date='2023-01-01', lab='churchlandlab', dataset=['trials', 'spikes'])
Notes
In default and local mode, most queries are case-sensitive partial matches. When lists are provided, the search is a logical OR, except for datasets, which is a logical AND.
All search terms are true for a session to be returned, i.e. subject matches AND project matches, etc.
In remote mode most queries are case-insensitive partial matches.
In default and local mode, when the one.wildcards flag is True (default), queries are interpreted as regular expressions. To turn this off set one.wildcards to False.
In remote mode regular expressions are only supported using the django argument.
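The partial dataset matching described above is a plain substring test: a dataset matches when it contains the search string. A stdlib sketch of that rule (per the notes above, matching is case-sensitive in default and local mode):

```python
# Partial dataset matching, as described above: a dataset matches when it
# contains the search string as a substring (case-sensitive in local mode).
def dataset_matches(search: str, dataset_name: str) -> bool:
    return search in dataset_name

print(dataset_matches("wheel.position", "_ibl_wheel.position.npy"))  # True
print(dataset_matches("wheel.position", "_ibl_trials.table.pqt"))    # False
```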
- static setup(base_url=None, **kwargs)[source]
Set up OneAlyx for a given database
- Parameters:
base_url (str) – An Alyx database URL. If None, the current default database is used.
kwargs – Optional arguments to pass to one.params.setup.
- Returns:
An instance of OneAlyx for the newly set up database URL.
- Return type:
OneAlyx
- eid2path(eid, query_type=None) ALFPath | Sequence[ALFPath] [source]
From an experiment ID, gets the local session path
- Parameters:
eid (str, UUID, pathlib.Path, dict, list) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
query_type (str) – If set to ‘remote’, will force database connection.
- Returns:
A session path or list of session paths.
- Return type:
one.alf.path.ALFPath, list
- path2eid(path_obj: str | Path, query_type=None) str | Sequence[str] [source]
From a local path, gets the experiment ID
- Parameters:
path_obj (str, pathlib.Path, list) – Local path or list of local paths.
query_type (str) – If set to ‘remote’, will force database connection.
- Returns:
An eid or list of eids.
- Return type:
str, list
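eid2path and path2eid convert between eids and local session paths. Assuming the standard ALF session-path layout (…/<subject>/<yyyy-mm-dd>/<nnn>), the session parts can be parsed locally; a stdlib sketch (the example path is hypothetical):

```python
import re

# Parse the trailing subject/date/number part of an ALF-style session path.
# The layout assumed here (.../<subject>/<yyyy-mm-dd>/<nnn>) follows the
# standard ALF convention; the example path below is hypothetical.
SESSION_RE = re.compile(
    r"(?P<subject>[^/]+)/(?P<date>\d{4}-\d{2}-\d{2})/(?P<number>\d{3})$")

def session_parts(path: str):
    m = SESSION_RE.search(path.replace("\\", "/"))
    return m.groupdict() if m else None

parts = session_parts("/data/cortexlab/Subjects/KS005/2019-04-02/001")
print(parts)  # {'subject': 'KS005', 'date': '2019-04-02', 'number': '001'}
```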
- path2url(filepath, query_type=None) str [source]
Given a local file path, returns the URL of the remote file.
- Parameters:
filepath (str, pathlib.Path) – A local file path
query_type (str) – If set to ‘remote’, will force database connection
- Returns:
A URL string
- Return type:
str
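The real path2url resolves the remote URL through the Alyx database. Conceptually, however, it maps the cache-relative part of a local file path onto a data-repository base URL; a stdlib sketch of that mapping (the base URL and paths here are hypothetical):

```python
from pathlib import PurePath, PurePosixPath

# Conceptual sketch only: the real path2url resolves the remote URL through
# the Alyx database.  Here we simply join the cache-relative part of a local
# file path onto a hypothetical HTTP data-server base URL.
def local_to_url(filepath: str, cache_dir: str, base_url: str) -> str:
    rel = PurePath(filepath).relative_to(cache_dir)
    return base_url.rstrip("/") + "/" + PurePosixPath(*rel.parts).as_posix()

url = local_to_url(
    "/data/cortexlab/Subjects/KS005/2019-04-02/001/alf/_ibl_wheel.position.npy",
    "/data",
    "https://example.org/data",  # hypothetical data server
)
print(url)
```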
- type2datasets(eid, dataset_type, details=False)[source]
Get list of datasets belonging to a given dataset type for a given session
- Parameters:
eid (str, UUID, pathlib.Path, dict) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
dataset_type (str, list) – An Alyx dataset type, e.g. camera.times or a list of dtypes
details (bool) – If True, a datasets DataFrame is returned
- Returns:
A numpy array of data, or a DataFrame if details is true
- Return type:
np.ndarray, pd.DataFrame
- dataset2type(dset) str [source]
Return dataset type from dataset.
NB: Requires an Alyx database connection
- Parameters:
dset (str, np.ndarray, tuple) – A dataset name, dataset uuid or dataset integer id
- Returns:
The dataset type
- Return type:
str
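dataset2type performs a database lookup, but ALF filenames already encode an object.attribute pair that dataset names are matched against (e.g. ‘wheel.position’ in ‘_ibl_wheel.position.npy’, as described under search). A heuristic stdlib sketch of extracting that pair; this is not the database lookup the method performs:

```python
import re

# Heuristic sketch, not the database lookup that dataset2type performs:
# ALF filenames such as '_ibl_wheel.position.npy' embed an optional
# _namespace_ prefix, an object.attribute pair and a file extension.
ALF_RE = re.compile(r"^(?:_[a-zA-Z0-9]+_)?(?P<object>\w+)\.(?P<attribute>[\w.]+?)\.\w+$")

def object_attribute(filename: str):
    m = ALF_RE.match(filename)
    return f"{m['object']}.{m['attribute']}" if m else None

print(object_attribute("_ibl_wheel.position.npy"))  # wheel.position
print(object_attribute("spikes.times.npy"))         # spikes.times
```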
- describe_revision(revision, full=False)[source]
Print description of a revision
- Parameters:
revision (str) – The name of the revision (without ‘#’)
full (bool) – If true, returns the matching record
- Returns:
None if full is false or no record is found; otherwise the record as a dict
- Return type:
None, dict
- get_details(eid: str, full: bool = False, query_type=None)[source]
Return session details for a given session.
- Parameters:
eid (str, UUID, pathlib.Path, dict, list) – Experiment session identifier; may be a UUID, URL, experiment reference string, details dict or Path.
full (bool) – If True, returns a DataFrame of session and dataset info.
query_type ({'local', 'refresh', 'auto', 'remote'}) – The query mode - if ‘local’ the details are taken from the cache tables; if ‘remote’ the details are returned from the sessions REST endpoint; if ‘auto’ uses whichever mode ONE is in; if ‘refresh’ reloads the cache before querying.
- Returns:
In local mode, a session record, or a full DataFrame with dataset information if full is True; in remote mode, a full or partial session dict.
- Return type:
pd.Series, pd.DataFrame, dict
- Raises:
ValueError – Invalid experiment ID (failed to parse into eid string).
requests.exceptions.HTTPError – [Errno 404] Remote session not found on Alyx.