Listing with ONE
ONE contains a number of list methods that can be used to explore the datasets available.
To list all available datasets we can use
[1]:
from one.api import ONE
one = ONE(base_url='https://openalyx.internationalbrainlab.org')
# All datasets in the database
dsets = one.list_datasets()
Note.
Calling list_* methods with no arguments in remote mode will not hit the database.
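If you need an up-to-date list straight from the database, one option is to query the relevant endpoint directly through the Alyx REST client (a sketch, assuming a database connection; the limit parameter simply caps the number of records returned):
[ ]:
# Query the remote database directly via the Alyx REST client
# (returns full dataset records rather than just names)
remote_dsets = one.alyx.rest('datasets', 'list', limit=5)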
If you are connected to a database (i.e. not using ONE with just a local cache directory) you can find out information about a specific dataset by typing,
[2]:
one.describe_dataset(dsets[3])
contrast of left-side stimulus (0...1) nan if trial is on other side
Out[2]:
{'id': '979f9f7c-7d67-48d5-9042-a9000a8e66a2',
'name': 'trials.contrastLeft',
'created_by': None,
'description': 'contrast of left-side stimulus (0...1) nan if trial is on other side',
'filename_pattern': '*trials.contrastLeft.*'}
To find the datasets associated with a specific experiment we can pass in an eid argument
[3]:
eid = 'KS023/2019-12-10/001'
# All datasets for specific session
one.list_datasets(eid)
Out[3]:
['alf/_ibl_bodyCamera.times.npy',
'alf/_ibl_leftCamera.times.npy',
'alf/_ibl_rightCamera.times.npy',
'alf/_ibl_trials.choice.npy',
'alf/_ibl_trials.contrastLeft.npy',
'alf/_ibl_trials.contrastRight.npy',
'alf/_ibl_trials.feedbackType.npy',
'alf/_ibl_trials.feedback_times.npy',
'alf/_ibl_trials.firstMovement_times.npy',
'alf/_ibl_trials.goCueTrigger_times.npy',
'alf/_ibl_trials.goCue_times.npy',
'alf/_ibl_trials.intervals.npy',
'alf/_ibl_trials.intervals_bpod.npy',
'alf/_ibl_trials.itiDuration.npy',
'alf/_ibl_trials.probabilityLeft.npy',
'alf/_ibl_trials.response_times.npy',
'alf/_ibl_trials.rewardVolume.npy',
'alf/_ibl_trials.stimOff_times.npy',
'alf/_ibl_trials.stimOn_times.npy',
'alf/_ibl_wheel.position.npy',
'alf/_ibl_wheel.times.npy',
'alf/_ibl_wheel.timestamps.npy',
'alf/_ibl_wheelMoves.intervals.npy',
'alf/_ibl_wheelMoves.peakAmplitude.npy',
'alf/probe01/_kilosort_whitening.matrix.npy',
'alf/probe01/_phy_spikes_subset.channels.npy',
'alf/probe01/_phy_spikes_subset.spikes.npy',
'alf/probe01/_phy_spikes_subset.waveforms.npy',
'alf/probe01/channels.brainLocationIds_ccf_2017.npy',
'alf/probe01/channels.localCoordinates.npy',
'alf/probe01/channels.mlapdv.npy',
'alf/probe01/channels.rawInd.npy',
'alf/probe01/clusters.amps.npy',
'alf/probe01/clusters.brainLocationAcronyms_ccf_2017.npy',
'alf/probe01/clusters.brainLocationIds_ccf_2017.npy',
'alf/probe01/clusters.channels.npy',
'alf/probe01/clusters.depths.npy',
'alf/probe01/clusters.metrics.pqt',
'alf/probe01/clusters.mlapdv.npy',
'alf/probe01/clusters.peakToTrough.npy',
'alf/probe01/clusters.uuids.csv',
'alf/probe01/clusters.waveforms.npy',
'alf/probe01/clusters.waveformsChannels.npy',
'alf/probe01/spikes.amps.npy',
'alf/probe01/spikes.clusters.npy',
'alf/probe01/spikes.depths.npy',
'alf/probe01/spikes.samples.npy',
'alf/probe01/spikes.templates.npy',
'alf/probe01/spikes.times.npy',
'alf/probe01/templates.amps.npy',
'alf/probe01/templates.waveforms.npy',
'alf/probe01/templates.waveformsChannels.npy',
'alf/probes.description.json',
'alf/probes.trajectory.json',
'raw_behavior_data/_iblrig_ambientSensorData.raw.jsonable',
'raw_behavior_data/_iblrig_codeFiles.raw.zip',
'raw_behavior_data/_iblrig_encoderEvents.raw.ssv',
'raw_behavior_data/_iblrig_encoderPositions.raw.ssv',
'raw_behavior_data/_iblrig_encoderTrialInfo.raw.ssv',
'raw_behavior_data/_iblrig_taskData.raw.jsonable',
'raw_behavior_data/_iblrig_taskSettings.raw.json',
'raw_ephys_data/_spikeglx_ephysData_g0_t0.nidq.cbin',
'raw_ephys_data/_spikeglx_ephysData_g0_t0.nidq.ch',
'raw_ephys_data/_spikeglx_ephysData_g0_t0.nidq.meta',
'raw_ephys_data/_spikeglx_ephysData_g0_t0.nidq.wiring.json',
'raw_ephys_data/_spikeglx_sync.channels.npy',
'raw_ephys_data/_spikeglx_sync.polarities.npy',
'raw_ephys_data/_spikeglx_sync.times.npy',
'raw_ephys_data/probe01/_iblqc_ephysSpectralDensityAP.freqs.npy',
'raw_ephys_data/probe01/_iblqc_ephysSpectralDensityAP.power.npy',
'raw_ephys_data/probe01/_iblqc_ephysSpectralDensityLF.freqs.npy',
'raw_ephys_data/probe01/_iblqc_ephysSpectralDensityLF.power.npy',
'raw_ephys_data/probe01/_iblqc_ephysTimeRmsAP.rms.npy',
'raw_ephys_data/probe01/_iblqc_ephysTimeRmsAP.timestamps.npy',
'raw_ephys_data/probe01/_iblqc_ephysTimeRmsLF.rms.npy',
'raw_ephys_data/probe01/_iblqc_ephysTimeRmsLF.timestamps.npy',
'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.ap.cbin',
'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.ap.ch',
'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.ap.meta',
'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.lf.cbin',
'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.lf.ch',
'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.lf.meta',
'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.sync.npy',
'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.timestamps.npy',
'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.wiring.json',
'raw_ephys_data/probe01/_spikeglx_sync.channels.probe01.npy',
'raw_ephys_data/probe01/_spikeglx_sync.polarities.probe01.npy',
'raw_ephys_data/probe01/_spikeglx_sync.times.probe01.npy',
'raw_video_data/_iblrig_bodyCamera.raw.mp4',
'raw_video_data/_iblrig_bodyCamera.timestamps.ssv',
'raw_video_data/_iblrig_leftCamera.raw.mp4',
'raw_video_data/_iblrig_leftCamera.timestamps.ssv',
'raw_video_data/_iblrig_rightCamera.raw.mp4',
'raw_video_data/_iblrig_rightCamera.timestamps.ssv',
'spike_sorters/ks2_matlab/probe01/_kilosort_raw.output.tar']
We can also list the available collections by using,
[4]:
# All collections in database
collections = one.list_collections()
# All collections for specific session
one.list_collections(eid)
Out[4]:
['raw_ephys_data/probe01',
'alf',
'alf/probe01',
'raw_video_data',
'raw_behavior_data',
'raw_ephys_data',
'spike_sorters/ks2_matlab/probe01']
Revisions can be listed in a similar way,
[5]:
# All revisions in database
revisions = one.list_revisions()
# All revisions for specific session
revisions = one.list_revisions(eid)
The final useful list method allows you to search for subjects in the database,
[6]:
# All subjects in the database
subjects = one.list_subjects()
For more examples of file organization, including the use of dataset revisions, see this guide.
Filtering lists
list_collections, list_revisions and list_datasets can be called with various filter arguments. Collections and datasets may be filtered using wildcards, or regular expressions when the wildcard flag is set to False. For more information on using wildcards, see the ONE load guide.
Below are some examples of filtering with wildcards:
[7]:
# All datasets for specific session in alf/probe01 collection:
datasets = one.list_datasets(eid, collection='alf/probe01')
# All datasets for a specific session in any probe collection:
datasets = one.list_datasets(eid, collection='*probe*')
# All collections that contain datasets with 'spikes' in the name:
collections = one.list_collections(eid, filename='*spikes*')
# All datasets with 'raw' in the name:
datasets = one.list_datasets(eid, '*raw*')
# All datasets with a QC value less than or equal to 'WARNING' (i.e. 'PASS' and 'NOT_SET' are also included):
datasets = one.list_datasets(eid, qc='WARNING')
# All QC'd datasets with a value less than or equal to 'WARNING' (i.e. 'WARNING' or 'PASS'):
datasets = one.list_datasets(eid, qc='WARNING', ignore_qc_not_set=True)
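The same filters can also be written as regular expressions. A minimal sketch, assuming the wildcards flag is set at instantiation as described above:
[ ]:
# With wildcards=False, filter patterns are interpreted as regular expressions
one_regex = ONE(base_url='https://openalyx.internationalbrainlab.org', wildcards=False)
# All datasets with either 'spikes' or 'clusters' in the name, as a single regex
datasets = one_regex.list_datasets(eid, filename='.*(spikes|clusters).*')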
Note that for list_datasets and list_collections, a provided revision name can’t include wildcards.
For list_datasets, the resulting list will include only the datasets in the given revision, or in the previous revision (alphabetically) if the provided revision doesn’t exist.
For list_collections, the resulting list will include only the collections containing the given revision, or the previous revision (alphabetically) if the provided revision doesn’t exist.
For example, to list collections that start with the word ‘raw’ and contain revisions on or before ‘2020-01-01’:
[8]:
collections = one.list_collections(eid, collection='raw*', revision='2020-01-01')
For list_revisions, the provided revision keyword argument works like the collection argument and may include wildcards.
For example, to list revisions for a given session that begin with ‘2020’ or ‘2021’:
[9]:
revisions = one.list_revisions(eid, revision='202[01]')
The dataset and collection filters may be a list of strings, constituting a logical OR. For example, to list datasets containing either ‘spikes’ or ‘clusters’:
[10]:
one.list_datasets(eid, ['*spikes*', '*clusters*'])
Out[10]:
['alf/probe01/_phy_spikes_subset.channels.npy',
'alf/probe01/_phy_spikes_subset.spikes.npy',
'alf/probe01/_phy_spikes_subset.waveforms.npy',
'alf/probe01/clusters.amps.npy',
'alf/probe01/clusters.brainLocationAcronyms_ccf_2017.npy',
'alf/probe01/clusters.brainLocationIds_ccf_2017.npy',
'alf/probe01/clusters.channels.npy',
'alf/probe01/clusters.depths.npy',
'alf/probe01/clusters.metrics.pqt',
'alf/probe01/clusters.mlapdv.npy',
'alf/probe01/clusters.peakToTrough.npy',
'alf/probe01/clusters.uuids.csv',
'alf/probe01/clusters.waveforms.npy',
'alf/probe01/clusters.waveformsChannels.npy',
'alf/probe01/spikes.amps.npy',
'alf/probe01/spikes.clusters.npy',
'alf/probe01/spikes.depths.npy',
'alf/probe01/spikes.samples.npy',
'alf/probe01/spikes.templates.npy',
'alf/probe01/spikes.times.npy']
Dataset/filename filters can be either a string or a dict of ALF parts, each part containing either a string or a list of strings. This allows very specific part matching. For example, to filter datasets that have either the ‘intervals’ or ‘timestamps’ attribute and are npy files:
[11]:
one.list_datasets(eid, filename={'attribute': ['timestamps', 'intervals'], 'extension': 'npy'})
Out[11]:
['alf/_ibl_trials.intervals.npy',
'alf/_ibl_trials.intervals_bpod.npy',
'alf/_ibl_wheel.timestamps.npy',
'alf/_ibl_wheelMoves.intervals.npy',
'raw_ephys_data/probe01/_iblqc_ephysTimeRmsAP.timestamps.npy',
'raw_ephys_data/probe01/_iblqc_ephysTimeRmsLF.timestamps.npy']
Combining with load methods
The list methods are useful in combination with the load methods. For example, the output of the list_datasets method can be passed directly to the load_datasets method. Here we load all spike and cluster datasets where the QC is either PASS or NOT_SET:
[12]:
datasets = one.list_datasets(eid, ['*spikes*', '*clusters*'], qc='PASS', ignore_qc_not_set=False)
data, records = one.load_datasets(eid, datasets)
100%|██████████| 3/3.0 [00:02<00:00, 1.02it/s]
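The returned data and records follow the order of the input list, so the loaded arrays can be paired with their dataset paths, for example in a dict (the variable name here is illustrative):
[ ]:
# Map each dataset path to its loaded array; load_datasets preserves input order
data_by_dataset = dict(zip(datasets, data))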
Likewise with collections; for example, to load all data within the ‘alf/probe*’ collections:
[14]:
collections = one.list_collections(eid, collection='alf/probe*')
# Build a dictionary of collections containing bunches of objects
data = {key: one.load_collection(eid, key) for key in collections}
print(data.keys())
print(data['alf/probe01'].keys())
100%|██████████| 3/3.0 [00:03<00:00, 1.09s/it]
dict_keys(['alf/probe01'])
dict_keys(['spikes', 'channels', 'whitening', 'clusters', 'templates', 'spikes_subset'])
Here we load the spike data for a collection that also contains cluster data:
[ ]:
collections = one.list_collections(eid, filename='clusters*')
spikes = one.load_object(eid, 'spikes', collection=collections[0])
Listing aggregate datasets
All raw and preprocessed data are stored at the session level; however, some datasets are aggregated over a subject, project, or tag (called a ‘relation’). Such datasets are known as aggregates and can be listed and filtered using the list_aggregates method. Unlike list_datasets, list_aggregates always returns a pandas DataFrame object.
Note.
This method is only available in ‘remote’ mode.
[ ]:
subject = 'SWC_043'
subject_aggregates = one.list_aggregates('subjects', subject)
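Because the result is a pandas DataFrame, it can be refined with ordinary pandas operations. A sketch, assuming the table has a 'rel_path' column like the session dataset tables (the column name is an assumption here):
[ ]:
# Keep only trials-related aggregate datasets ('rel_path' column assumed)
trials_aggregates = subject_aggregates[subject_aggregates['rel_path'].str.contains('trials')]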