Loading with ONE

The datasets are organized into directory trees by subject, date and session number. For a given session there are data files grouped by object (e.g. ‘trials’), each with a specific attribute (e.g. ‘rewardVolume’). The dataset name follows the pattern ‘object.attribute’, for example ‘trials.rewardVolume’. For more information, see the ALF documentation.

An experiment ID (eid) is a string that uniquely identifies a session, for example a combination of subject, date and number (e.g. KS023/2019-12-10/001), a file path (e.g. C:\Users\Subjects\KS023\2019-12-10\001), or a UUID (e.g. aad23144-0e52-4eac-80c5-c4ee2decb198).

If the data don’t exist locally, they will be downloaded, then loaded.

[36]:
from pprint import pprint
from one.api import ONE
import one.alf.io as alfio

one = ONE(base_url='https://openalyx.internationalbrainlab.org', silent=True)

# To load all the data for a given object, use the load_object method:
eid = 'KS023/2019-12-10/001'  # subject/date/number
trials = one.load_object(eid, 'trials')  # Returns a dict-like object of numpy arrays

The attributes of the returned object mirror the dataset names:

[37]:
print(trials.keys())
# The data can be accessed with dot syntax
print(trials.rewardVolume[:5])
# ... or dictionary syntax
print(trials['rewardVolume'][:5])
dict_keys(['contrastLeft', 'intervals', 'response_times', 'stimOff_times', 'goCueTrigger_times', 'itiDuration', 'goCue_times', 'contrastRight', 'intervals_bpod', 'feedbackType', 'stimOn_times', 'choice', 'firstMovement_times', 'rewardVolume', 'feedback_times', 'probabilityLeft'])
[1.5 1.5 1.5 0.  1.5]
[1.5 1.5 1.5 0.  1.5]

All arrays in the object have the same length (the size of the first dimension) and can therefore be converted to a DataFrame:

[38]:
trials.to_df().head()

# For analysis you can assert that the dimensions match using the check_dimensions property:
assert trials.check_dimensions == 0
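
Once converted, the usual pandas operations apply. For example, a quick sketch summarizing reward volume by stimulus probability (the column names are taken from the keys printed above):

df = trials.to_df()
# Mean reward volume within each probabilityLeft block
print(df.groupby('probabilityLeft')['rewardVolume'].mean())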

If we only want to load certain attributes of an object, we can use the following:

[39]:
trials = one.load_object(eid, 'trials', attribute=['intervals', 'rewardVolume', 'probabilityLeft'])
print(trials.keys())
dict_keys(['intervals', 'intervals_bpod', 'rewardVolume', 'probabilityLeft'])

Datasets can be individually downloaded using the load_dataset method. This function takes an experiment ID and a dataset name as positional args.

[40]:
reward_volume = one.load_dataset(eid, '_ibl_trials.rewardVolume.npy')  # c.f. load_object, above

We can use the load_datasets method to load multiple datasets at once. This method returns two lists: the first contains the data for each dataset and the second contains meta information about the data.

Note.

The assert_present flag can be set to false, in which case None is returned for any dataset that doesn’t exist, instead of an exception being raised.

[41]:
data, info = one.load_datasets(eid, datasets=['_ibl_trials.rewardVolume.npy',
                                              '_ibl_trials.probabilityLeft.npy'])
pprint(info[0])
{'exists': True,
 'file_size': 5256.0,
 'hash': '819ae9cc4643cc7ed6cf8453e6cec339',
 'id_0': 8593347991464373244,
 'id_1': -3444378546711777370,
 'rel_path': 'alf/_ibl_trials.rewardVolume.npy',
 'revision': '',
 'session_path': 'public/cortexlab/Subjects/KS023/2019-12-10/001'}
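
For example, with assert_present=False a request for a non-existent dataset returns None rather than raising (a sketch; 'trials.doesNotExist.npy' is a hypothetical dataset name that is not expected to exist):

data, info = one.load_datasets(eid, datasets=['_ibl_trials.rewardVolume.npy',
                                              'trials.doesNotExist.npy'],
                               assert_present=False)
assert data[1] is None  # the missing dataset is returned as None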

Collections

For any given session there may be multiple datasets with the same name that are organized into separate subfolders called collections. For example there may be spike times for two probes, one in ‘alf/probe00/spikes.times.npy’, the other in ‘alf/probe01/spikes.times.npy’. In IBL, the ‘alf’ directory (for ALyx Files) contains the main datasets that people use. Raw data is in other directories.

In this case you must specify the collection when multiple matching datasets are found:

[42]:
probe1_spikes = one.load_dataset(eid, 'spikes.times.npy', collection='alf/probe01')
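
If you’re unsure which collections a session contains, you can list them before loading (a sketch using the list_collections method, also shown later in this guide):

# List the collections available for this session, then pick the probe of interest
collections = one.list_collections(eid)
print([c for c in collections if 'probe' in c])  # e.g. ['alf/probe00', 'alf/probe01']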

It is also possible to load datasets from different collections at the same time. For example, if we want to simultaneously load a trials dataset and a clusters dataset we would type:

[43]:
data, info = one.load_datasets(eid, datasets=['_ibl_trials.rewardVolume.npy', 'clusters.waveforms.npy'],
                               collections=['alf', 'alf/probe01'])

Revisions

Revisions provide an optional way to organize data by version. The version label is arbitrary; however, the folder name must start and end with pound signs and is typically an ISO date, e.g. “#2021-01-01#”. Unlike collections, if a specified revision is not found, the previous revision (ordered lexicographically) will be returned.

intervals = one.load_dataset(eid, 'trials.intervals.npy', revision='2021-03-15a')
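
Before loading, the available revisions can be listed (a minimal sketch, assuming the list_revisions method is available on your ONE instance):

# List the dataset revisions recorded for this session
print(one.list_revisions(eid))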

Download only

By default the load methods will download any missing data, then load and return the data. When the ‘download_only’ kwarg is true, the data are not loaded. Instead a list of file paths is returned, and any missing datasets are represented by None.

[44]:
files = one.load_object(eid, 'trials', download_only=True)
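
Because missing datasets are represented by None, it can be worth filtering the returned list before use (a small sketch):

# Keep only the datasets that were actually found and downloaded
local_files = [f for f in files if f is not None]
print(local_files[0])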

You can load objects and datasets from a file path

[45]:
trials = one.load_object(files[0], 'trials')
contrast_left = one.load_dataset(files[0], files[0].name)

Advanced loading

The load methods typically require an exact match; therefore when loading ‘_ibl_wheel.position.npy’, one.load_dataset(eid, 'wheel.position.npy') will raise an exception because the namespace is missing. Likewise one.load_object(eid, 'trial') will fail because ‘trial’ != ‘trials’.

Loading can be done using unix shell style wildcards, allowing you to load objects and datasets that match a particular pattern, e.g. one.load_dataset(eid, '*wheel.position.npy').

By default wildcard mode is on. In this mode, the extension may be omitted, e.g. one.load_dataset(eid, 'spikes.times'). This is equivalent to ‘spikes.times.*’. Note that an exception will be raised if datasets with more than one extension are found (such as ‘spikes.times.npy’ and ‘spikes.times.csv’). When loading a dataset with extra parts, the extension (or wildcard) is explicitly required: ‘spikes.times.part1.*’.
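
For example, in the default wildcard mode the following two calls both load the reward volume dataset used earlier (a sketch mirroring the examples in the load_dataset docstring shown at the end of this section):

# Extension omitted: equivalent to 'trials.rewardVolume.*' in wildcard mode
reward_volume = one.load_dataset(eid, 'trials.rewardVolume')
# Namespace and extension both covered by explicit wildcards
reward_volume = one.load_dataset(eid, '*trials.rewardVolume*')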

If you set the wildcards property of One to False, loading will be done using regular expressions, allowing for more powerful pattern matching.

Below is a table showing how to express unix shell style wildcards as regular expressions:

Regex    Wildcard    Description                 Example
.*       *           Match zero or more chars    spikes.times.*
.?       ?           Match one char              timestamps.?sv
[]       []          Match a range of chars      obj.attr.part[0-9].npy

NB: In regex ‘.’ means ‘any character’; to match ‘.’ exactly, escape it with a backslash

Examples: spikes.times.* (regex), spikes.times* (wildcard) matches…

    spikes.times.npy
    spikes.times
    spikes.times_ephysClock.npy
    spikes.times.bin

clusters.uuids..?sv (regex), clusters.uuids.?sv (wildcard) matches...

    clusters.uuids.ssv
    clusters.uuids.csv

alf/probe0[0-5] (regex), alf/probe0[0-5] (wildcard) matches...

    alf/probe00
    alf/probe01
    [...]
    alf/probe05

Filtering attributes

To download and load only a subset of attributes, you can provide a list to the attribute kwarg.

[46]:
spikes = one.load_object(eid, 'spikes', collection='alf/probe01', attribute=['time*', 'clusters'])
assert 'amps' not in spikes
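
The filtering applies both to the download and to the returned object, so only the matching attributes are present (a sketch; the exact keys depend on which datasets exist for this probe):

print(spikes.keys())  # only attributes matching 'time*' or 'clusters' are present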

Loading with file name parts

You may also specify individual parts of the filename for finer-grained filtering. Here a list of options is treated as a logical OR:

Note.

All fields accept wildcards.

[47]:
dataset = dict(object='spikes', attribute='times', extension=['npy', 'bin'])
probe1_spikes = one.load_dataset(eid, dataset, collection='alf/probe01')

More regex examples

one.wildcards = False

Load specific attributes from an object (‘|’ represents a logical OR in regex)

spikes = one.load_object(eid, 'spikes', collection='alf/probe01', attribute='times|clusters')
assert 'amps' not in spikes

Load a dataset ignoring any namespace or extension:

spike_times = one.load_dataset(eid, '.*spikes.times.*', collection='alf/probe01')

List all datasets in any probe collection (the trailing ‘[0-9]*’ matches zero or more digits)

dsets = one.list_datasets(eid, collection='alf/probe[0-9]*')

Load object attributes that are not delimited text files (e.g. tsv, ssv, csv)

files = one.load_object(eid, 'clusters', extension='[^sv]*', download_only=True)
assert not any(str(x).endswith('csv') for x in files)

Load spike times from a probe UUID

pid = 'b749446c-18e3-4987-820a-50649ab0f826'
session, probe = one.pid2eid(pid)
spikes_times = one.load_dataset(session, 'spikes.times.npy', collection=f'alf/{probe}')

List all probes for a session

print([x for x in one.list_collections(session) if 'alf/probe' in x])

Loading with relative paths

You may also provide the complete dataset path, relative to the session path. When doing this the path must be complete (i.e. without wildcards) and the collection and revision arguments must be None.

Note.

To ensure you’re loading the default revision (usually the most recent and correct data), do not explicitly provide the relative path or revision, and ONE will return the default automatically.

spikes_times = one.load_dataset(eid, 'alf/probe00/spikes.times.npy')

Download all the raw data for a given session*

dsets = one.list_datasets(eid, collection='raw_*_data')
one.load_datasets(eid, dsets, download_only=True)

*NB: This will download all revisions of the same data; for this reason it is better to load objects and collections individually, or to provide dataset names instead of relative paths.
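
A sketch of the first alternative mentioned above: download the raw collections one at a time with load_collection, so that only the default revision of each dataset is fetched (this assumes the raw collection names follow the ‘raw_*_data’ pattern used above):

# Download each raw collection separately; with revision unspecified,
# only the default revision of each dataset is fetched
for collection in one.list_collections(eid):
    if collection.startswith('raw_'):
        one.load_collection(eid, collection, download_only=True)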

Loading with timeseries

For loading a dataset along with its timestamps, alf.io.read_ts can be used. It requires a filepath as input.

[48]:
files = one.load_object(eid, 'spikes', collection='alf/probe01', download_only=True)
ts, clusters = alfio.read_ts(files[1])

Loading collections

You can load whole collections with the load_collection method. For example to load the spikes and clusters objects for probe01:

[ ]:
probe01 = one.load_collection(eid, '*probe01', object=['spikes', 'clusters'])
probe01.spikes.times[:5]
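
The returned value is a dict-like bunch keyed by object name, so the clusters object is available alongside the spikes (a sketch; the exact keys depend on which datasets are present):

print(probe01.keys())           # the objects that were loaded
print(probe01.clusters.keys())  # attributes of the clusters object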

The download_only flag here provides a simple way to download all datasets within a collection:

one.load_collection(eid, 'alf/probe01', download_only=True)

More information about these methods can be found using the help command

[49]:
help(one.load_dataset)
Help on method load_dataset in module one.api:

load_dataset(eid: Union[str, pathlib.Path, uuid.UUID], dataset: str, collection: Union[str, NoneType] = None, revision: Union[str, NoneType] = None, query_type: Union[str, NoneType] = None, download_only: bool = False, **kwargs) -> Any method of one.api.OneAlyx instance
    Load a single dataset for a given session id and dataset name

    Parameters
    ----------
    eid : str, UUID, pathlib.Path, dict
        Experiment session identifier; may be a UUID, URL, experiment reference string
        details dict or Path.
    dataset : str, dict
        The ALF dataset to load.  May be a string or dict of ALF parts.  Supports asterisks as
        wildcards.
    collection : str
        The collection to which the object belongs, e.g. 'alf/probe01'.
        This is the relative path of the file from the session root.
        Supports asterisks as wildcards.
    revision : str
        The dataset revision (typically an ISO date).  If no exact match, the previous
        revision (ordered lexicographically) is returned.  If None, the default revision is
        returned (usually the most recent revision).  Regular expressions/wildcards not
        permitted.
    query_type : str
        Query cache ('local') or Alyx database ('remote')
    download_only : bool
        When true the data are downloaded and the file path is returned.

    Returns
    -------
    Dataset or a Path object if download_only is true.

    Examples
    --------
    intervals = one.load_dataset(eid, '_ibl_trials.intervals.npy')
    # Load dataset without specifying extension
    intervals = one.load_dataset(eid, 'trials.intervals')  # wildcard mode only
    intervals = one.load_dataset(eid, '*trials.intervals*')  # wildcard mode only
filepath = one.load_dataset(eid, '_ibl_trials.intervals.npy', download_only=True)
spike_times = one.load_dataset(eid, 'spikes.times.npy', collection='alf/probe01')
    old_spikes = one.load_dataset(eid, 'spikes.times.npy',
                                  collection='alf/probe01', revision='2020-08-31')

Loading aggregate datasets

All raw and preprocessed data are stored at the session level; however, some datasets are aggregated over a subject, project, or tag (called a ‘relation’). Such datasets can be loaded using the load_aggregate method.

Note.

This method is only available in ‘remote’ mode.

[ ]:
subject = 'SWC_043'
subject_trials = one.load_aggregate('subjects', subject, '_ibl_subjectTrials.table')
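
Assuming the aggregate table is loaded as a pandas DataFrame (typical for ‘.table’ datasets), it can then be inspected with the usual pandas methods (a sketch):

# Each row corresponds to a trial aggregated across the subject's sessions (assumption)
print(subject_trials.shape)
subject_trials.head()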