Loading with ONE
The datasets are organized into directory trees by subject, date and session number. For a given session there are data files grouped by object (e.g. ‘trials’), each with a specific attribute (e.g. ‘rewardVolume’). The dataset name follows the pattern ‘object.attribute’, for example ‘trials.rewardVolume’. For more information, see the ALF documentation.
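For example, dataset names can be checked against this ‘object.attribute’ pattern with the one.alf.spec module (a short sketch; the specific filenames here are illustrative):
from one.alf import spec
# These names follow the object.attribute(.extension) ALF pattern
assert spec.is_valid('trials.rewardVolume.npy')
assert spec.is_valid('_ibl_trials.rewardVolume.npy')  # optional namespace prefix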
An experiment ID (eid) is a string that uniquely identifies a session, for example a combination of subject, date and number (e.g. KS023/2019-12-10/001), a file path (e.g. C:\Users\Subjects\KS023\2019-12-10\001), or a UUID (e.g. aad23144-0e52-4eac-80c5-c4ee2decb198).
If the data don’t exist locally, they will be downloaded, then loaded.
[36]:
from pprint import pprint
from one.api import ONE
import one.alf.io as alfio
one = ONE(base_url='https://openalyx.internationalbrainlab.org', silent=True)
# To load all the data for a given object, use the load_object method:
eid = 'KS023/2019-12-10/001' # subject/date/number
trials = one.load_object(eid, 'trials') # Returns a dict-like object of numpy arrays
The attributes of the returned object mirror the datasets:
[37]:
print(trials.keys())
# The data can be accessed with dot syntax
print(trials.rewardVolume[:5])
# ... or dictionary syntax
print(trials['rewardVolume'][:5])
dict_keys(['contrastLeft', 'intervals', 'response_times', 'stimOff_times', 'goCueTrigger_times', 'itiDuration', 'goCue_times', 'contrastRight', 'intervals_bpod', 'feedbackType', 'stimOn_times', 'choice', 'firstMovement_times', 'rewardVolume', 'feedback_times', 'probabilityLeft'])
[1.5 1.5 1.5 0. 1.5]
[1.5 1.5 1.5 0. 1.5]
All arrays in the object have the same length (the size of the first dimension) and can therefore be converted to a DataFrame:
[38]:
trials.to_df().head()
# For analysis you can assert that the dimensions match using the check_dimensions property:
assert trials.check_dimensions == 0
If we only want to load certain attributes of an object, we can use the following:
[39]:
trials = one.load_object(eid, 'trials', attribute=['intervals', 'rewardVolume', 'probabilityLeft'])
print(trials.keys())
dict_keys(['intervals', 'intervals_bpod', 'rewardVolume', 'probabilityLeft'])
Datasets can be individually downloaded using the load_dataset
method. This function takes an experiment ID and a dataset name as positional args.
[40]:
reward_volume = one.load_dataset(eid, '_ibl_trials.rewardVolume.npy') # c.f. load_object, above
We can use the load_datasets method to load multiple datasets at once. This method returns two lists: the first contains the data for each dataset and the second contains meta-information about the data.
Note.
When the assert_present flag is set to False, None is returned in place of any dataset that doesn’t exist, instead of an exception being raised.
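For example (a minimal sketch; the second dataset name below is hypothetical and assumed not to exist for this session):
data, info = one.load_datasets(eid, datasets=['_ibl_trials.rewardVolume.npy',
                                              '_ibl_trials.doesNotExist.npy'],  # hypothetical, missing dataset
                               assert_present=False)
assert data[1] is None  # the missing dataset is returned as None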
[41]:
data, info = one.load_datasets(eid, datasets=['_ibl_trials.rewardVolume.npy',
'_ibl_trials.probabilityLeft.npy'])
pprint(info[0])
{'exists': True,
'file_size': 5256.0,
'hash': '819ae9cc4643cc7ed6cf8453e6cec339',
'id_0': 8593347991464373244,
'id_1': -3444378546711777370,
'rel_path': 'alf/_ibl_trials.rewardVolume.npy',
'revision': '',
'session_path': 'public/cortexlab/Subjects/KS023/2019-12-10/001'}
Collections
For any given session there may be multiple datasets with the same name that are organized into separate subfolders called collections. For example there may be spike times for two probes, one in ‘alf/probe00/spikes.times.npy’, the other in ‘alf/probe01/spikes.times.npy’. In IBL, the ‘alf’ directory (for ALyx Files) contains the main datasets that people use. Raw data is in other directories.
In this case you must specify the collection when multiple matching datasets are found:
[42]:
probe1_spikes = one.load_dataset(eid, 'spikes.times.npy', collection='alf/probe01')
It is also possible to load datasets from different collections at the same time. For example, if we want to simultaneously load a trials dataset and a clusters dataset we would type:
[43]:
data, info = one.load_datasets(eid, datasets=['_ibl_trials.rewardVolume.npy', 'clusters.waveforms.npy'],
collections=['alf', 'alf/probe01'])
Revisions
Revisions provide an optional way to organize data by version. The version label is arbitrary; however, the folder name must start and end with pound signs and is typically an ISO date, e.g. “#2021-01-01#”. Revisions are ordered lexicographically, and unlike collections, if an exact match for the specified revision is not found, the previous revision is returned.
intervals = one.load_dataset(eid, 'trials.intervals.npy', revision='2021-03-15a')
Download only
By default the load methods will download any missing data, then load and return the data. When the ‘download_only’ kwarg is true, the data are not loaded. Instead a list of file paths is returned, and any missing datasets are represented by None.
[44]:
files = one.load_object(eid, 'trials', download_only=True)
You can load objects and datasets from a file path:
[45]:
trials = one.load_object(files[0], 'trials')
contrast_left = one.load_dataset(files[0], files[0].name)
Advanced loading
The load methods typically require an exact match, therefore when loading ‘_ibl_wheel.position.npy’, one.load_dataset(eid, 'wheel.position.npy') will raise an exception because the namespace is missing. Likewise one.load_object(eid, 'trial') will fail because ‘trial’ != ‘trials’.
Loading can be done using unix shell style wildcards, allowing you to load objects and datasets that match a particular pattern, e.g. one.load_dataset(eid, '*wheel.position.npy').
By default wildcard mode is on. In this mode, the extension may be omitted, e.g. one.load_dataset(eid, 'spikes.times'). This is equivalent to ‘spikes.times.*’. Note that an exception will be raised if datasets with more than one extension are found (such as ‘spikes.times.npy’ and ‘spikes.times.csv’). When loading a dataset with extra parts, the extension (or wildcard) is explicitly required: ‘spikes.times.part1.*’.
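For example (a short sketch reusing the session loaded above):
# Extension omitted: equivalent to 'spikes.times.*'
spike_times = one.load_dataset(eid, 'spikes.times', collection='alf/probe01')
# The leading '*' matches the '_ibl_' namespace of '_ibl_wheel.position.npy'
wheel_position = one.load_dataset(eid, '*wheel.position.npy')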
If you set the wildcards property of One to False, loading will be done using regular expressions, allowing for more powerful pattern matching.
Below is a table showing how to express unix shell-style wildcards as regular expressions:
| Regex | Wildcard | Description | Example |
|---|---|---|---|
| .* | * | Match zero or more chars | spikes.times.* |
| .? | ? | Match one char | timestamps.?sv |
| [] | [] | Match a range of chars | obj.attr.part[0-9].npy |
NB: In regex ‘.’ means ‘any character’; to match ‘.’ exactly, escape it with a backslash
Examples: spikes.times.* (regex), spikes.times* (wildcard) matches…
spikes.times.npy
spikes.times
spikes.times_ephysClock.npy
spikes.times.bin
clusters.uuids..?sv (regex), clusters.uuids.?sv (wildcard) matches...
clusters.uuids.ssv
clusters.uuids.csv
alf/probe0[0-5] (regex), alf/probe0[0-5] (wildcard) matches...
alf/probe00
alf/probe01
[...]
alf/probe05
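Such patterns can also be passed to list_datasets to preview what a load call would match (a sketch assuming the filename keyword accepts these patterns):
# Wildcard mode (the default): list spikes.times datasets across all collections
print(one.list_datasets(eid, filename='spikes.times*'))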
Filtering attributes
To download and load only a subset of attributes, you can provide a list to the attribute kwarg.
[46]:
spikes = one.load_object(eid, 'spikes', collection='alf/probe01', attribute=['time*', 'clusters'])
assert 'amps' not in spikes
Loading with file name parts
You may also specify individual parts of the filename for finer filtering. A list of options is treated as a logical OR.
Note.
All fields accept wildcards.
[47]:
dataset = dict(object='spikes', attribute='times', extension=['npy', 'bin'])
probe1_spikes = one.load_dataset(eid, dataset, collection='alf/probe01')
More regex examples
one.wildcards = False
Load specific attributes from an object (‘|’ represents a logical OR in regex)
spikes = one.load_object(eid, 'spikes', collection='alf/probe01', attribute='times|clusters')
assert 'amps' not in spikes
Load a dataset ignoring any namespace or extension:
spike_times = one.load_dataset(eid, '.*spikes.times.*', collection='alf/probe01')
List all datasets in any probe collection (‘[0-9]*’ matches zero or more digits)
dsets = one.list_datasets(eid, collection='alf/probe[0-9]*')
Load object attributes that are not delimited text files (i.e. tsv, ssv, csv, etc.)
files = one.load_object(eid, 'clusters', extension='[^sv]*', download_only=True)
assert not any(str(x).endswith('csv') for x in files)
Load spike times from a probe UUID
pid = 'b749446c-18e3-4987-820a-50649ab0f826'
session, probe = one.pid2eid(pid)
spikes_times = one.load_dataset(session, 'spikes.times.npy', collection=f'alf/{probe}')
List all probes for a session
print([x for x in one.list_collections(session) if 'alf/probe' in x])
Loading with relative paths
You may also provide the complete dataset path, relative to the session path. When doing this the path must be complete (i.e. without wildcards) and the collection and revision arguments must be None.
Note.
To ensure you’re loading the default revision (usually the most recent and correct data), do not explicitly provide the relative path or revision, and ONE will return the default automatically.
spikes_times = one.load_dataset(eid, 'alf/probe00/spikes.times.npy')
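For comparison, loading the same dataset by name and collection lets ONE resolve the default revision automatically:
# No relative path or revision given; the default revision is returned
spikes_times = one.load_dataset(eid, 'spikes.times.npy', collection='alf/probe00')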
Download all the raw data for a given session*
dsets = one.list_datasets(eid, collection='raw_*_data')
one.load_datasets(eid, dsets, download_only=True)
*NB: This will download all revisions of the same data; for this reason it is better to load objects and collections individually, or to provide dataset names instead of relative paths.
Loading with timeseries
For loading a dataset along with its timestamps, alf.io.read_ts can be used. It requires a filepath as input.
[48]:
files = one.load_object(eid, 'spikes', collection='alf/probe01', download_only=True)
ts, clusters = alfio.read_ts(files[1])
Loading collections
You can load whole collections with the load_collection
method. For example to load the spikes and clusters objects for probe01:
[ ]:
probe01 = one.load_collection(eid, '*probe01', object=['spikes', 'clusters'])
probe01.spikes.times[:5]
The download_only flag here provides a simple way to download all datasets within a collection:
one.load_collection(eid, 'alf/probe01', download_only=True)
More information about these methods can be found using the help command:
[49]:
help(one.load_dataset)
Help on method load_dataset in module one.api:
load_dataset(eid: Union[str, pathlib.Path, uuid.UUID], dataset: str, collection: Union[str, NoneType] = None, revision: Union[str, NoneType] = None, query_type: Union[str, NoneType] = None, download_only: bool = False, **kwargs) -> Any method of one.api.OneAlyx instance
Load a single dataset for a given session id and dataset name
Parameters
----------
eid : str, UUID, pathlib.Path, dict
Experiment session identifier; may be a UUID, URL, experiment reference string
details dict or Path.
dataset : str, dict
The ALF dataset to load. May be a string or dict of ALF parts. Supports asterisks as
wildcards.
collection : str
The collection to which the object belongs, e.g. 'alf/probe01'.
This is the relative path of the file from the session root.
Supports asterisks as wildcards.
revision : str
The dataset revision (typically an ISO date). If no exact match, the previous
revision (ordered lexicographically) is returned. If None, the default revision is
returned (usually the most recent revision). Regular expressions/wildcards not
permitted.
query_type : str
Query cache ('local') or Alyx database ('remote')
download_only : bool
When true the data are downloaded and the file path is returned.
Returns
-------
Dataset or a Path object if download_only is true.
Examples
--------
intervals = one.load_dataset(eid, '_ibl_trials.intervals.npy')
# Load dataset without specifying extension
intervals = one.load_dataset(eid, 'trials.intervals') # wildcard mode only
intervals = one.load_dataset(eid, '*trials.intervals*') # wildcard mode only
filepath = one.load_dataset(eid, '_ibl_trials.intervals.npy', download_only=True)
spike_times = one.load_dataset(eid, 'spikes.times.npy', collection='alf/probe01')
old_spikes = one.load_dataset(eid, 'spikes.times.npy',
collection='alf/probe01', revision='2020-08-31')
Loading aggregate datasets
All raw and preprocessed data are stored at the session level, however some datasets are aggregated over a subject, project, or tag (called a ‘relation’). Such datasets can be loaded using the load_aggregate
method.
Note.
This method is only available in ‘remote’ mode.
[ ]:
subject = 'SWC_043'
subject_trials = one.load_aggregate('subjects', subject, '_ibl_subjectTrials.table')