Datasets and their types
A dataset typically contains a single signal or data source, either values or times. When creating a new dataset, first familiarize yourself with the ALF specification.
[ ]:
from pprint import pprint
from one.alf import spec
from one.alf.path import ALFPath
Datasets
Print information about ALF objects
[14]:
spec.describe('object')
(lab/Subjects/)subject/date/number/(collection/)(#revision#/)_namespace_object.attribute_timescale.extra.extension
                                                                        ^^^^^^
                                                                        OBJECT
Every file describing a given object has the same number of rows (i.e. the 1st dimension of an npy
file, number of frames in a video file, etc). You can therefore think of the files for an object
as together defining a table, with column headings given by the attribute in the file names, and
values given by the file contents. Object names should be in Haskell case and pluralized, e.g.
"wheelMoves", "sparseNoise", "trials".
Encoding of relations between objects can be achieved by a simplified relational model. If the
attribute name of one file matches the object name of a second, then the first file is guaranteed
to contain integers referring to the rows of the second. For example, "spikes.clusters.npy" would
contain integer references to the rows of "clusters.brain_location.json" and "clusters.probes.npy";
and "clusters.probes.npy" would contain integer references to "probes.insertion.json".
Be careful of plurals ("clusters.probe.npy" would not correspond to "probes.insertion.json") and
remember we count arrays starting from 0.
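The table and relational model described above can be illustrated in plain Python. This is a minimal sketch: the lists stand in for the contents of npy files, and all values and the helper name check_same_rows are made up for illustration.

```python
# Hypothetical file contents; in practice these would be loaded from
# spikes.times.npy, spikes.clusters.npy and clusters.probes.npy.
spikes = {
    'times': [0.01, 0.05, 0.07, 0.12],  # one row per spike
    'clusters': [0, 2, 2, 1],           # row indices into the clusters object
}
clusters = {
    'probes': [0, 0, 1],                # row indices into the probes object
}


def check_same_rows(obj):
    """Return the shared row count, raising if the attributes disagree."""
    lengths = {len(values) for values in obj.values()}
    if len(lengths) != 1:
        raise ValueError(f'attributes have differing row counts: {lengths}')
    return lengths.pop()


n_spikes = check_same_rows(spikes)
# Follow the references spike -> cluster -> probe (indices start at 0).
spike_probes = [clusters['probes'][c] for c in spikes['clusters']]
print(n_spikes, spike_probes)
```

Because every attribute of an object has the same number of rows, the attributes can be treated as columns of one table, and an integer column such as spikes.clusters acts as a key into another object's rows.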
Check the file name is ALF compliant
[15]:
assert spec.is_valid('spikes.times.npy')
Safely construct an ALF dataset filename using the ‘to_alf’ function. This ensures the correct case and format.
[16]:
filename = spec.to_alf('spikes', 'times', 'npy',
                       namespace='ibl', timescale='ephys clock', extra='raw')
Parsing a new filename into its constituent parts lets you check that the dataset name is correct
[ ]:
parts = ALFPath('_ibl_spikes.times_ephysClock.raw.npy').parse_alf_name()
pprint(parts)
OrderedDict([('namespace', 'ibl'),
('object', 'spikes'),
('attribute', 'times'),
('timescale', 'ephysClock'),
('extra', 'raw'),
('extension', 'npy')])
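To make the filename structure concrete, the parts listed above can be pulled out with a regular expression. The pattern below is a simplified sketch written for this example only; it is an assumption, not the expression used by one.alf.spec, and the function name parse_name is hypothetical.

```python
import re

# Simplified pattern for illustration only (an assumption); the library's own
# pattern covers more cases, such as multi-part timescales.
ALF_NAME = re.compile(
    r'^(?:_(?P<namespace>[a-zA-Z0-9]+)_)?'   # optional _namespace_
    r'(?P<object>[a-zA-Z0-9]+)'              # object
    r'\.(?P<attribute>[a-zA-Z0-9]+)'         # .attribute
    r'(?:_(?P<timescale>[a-zA-Z0-9]+))?'     # optional _timescale
    r'(?:\.(?P<extra>[a-zA-Z0-9.]+))?'       # optional .extra parts
    r'\.(?P<extension>[a-zA-Z0-9]+)$'        # .extension
)


def parse_name(filename):
    """Return a dict of filename parts, or None if the name does not match."""
    match = ALF_NAME.match(filename)
    return match.groupdict() if match else None


parts = parse_name('_ibl_spikes.times_ephysClock.raw.npy')
print(parts)
```

Optional parts that are absent from a filename come back as None, so the same function can be used to check which fields a name actually sets.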
Dataset types
Note.
Dataset types are only necessary when using a remote Alyx database.
A dataset type includes wildcards in its filename pattern so that you can search over datasets with the same content but different formats, etc. For example, you could create a new dataset type called ‘raw log’ with the filename pattern *log.raw*.
When you register a file such as _rig1_log.raw.txt or log.raw.rtf, it will automatically be part of the ‘raw log’ dataset type. The main purpose of this is to use the dataset type description field to document what the files are and how to work with them.
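The wildcard matching can be tried out locally with the standard library. This sketch assumes the filename pattern behaves like a Unix shell-style wildcard; how the Alyx server actually evaluates patterns may differ.

```python
from fnmatch import fnmatchcase

# Filename pattern of the example 'raw log' dataset type.
pattern = '*log.raw*'
filenames = ['_rig1_log.raw.txt', 'log.raw.rtf', 'spikes.times.npy']

# Keep only the filenames that the wildcard pattern matches (case-sensitive).
matches = [name for name in filenames if fnmatchcase(name, pattern)]
print(matches)
```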
Warning.
When registering files, each file must match exactly one dataset type.
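The one-match rule can be checked before registration with a small helper. This is a sketch under assumptions: the DATASET_TYPES table is hypothetical (the ‘raw data’ entry is invented to show a clash), the function names are made up, and shell-style matching is assumed rather than taken from Alyx.

```python
from fnmatch import fnmatchcase

# Hypothetical dataset types mapped to filename patterns. 'raw data' is
# invented here so that one filename matches two types.
DATASET_TYPES = {
    'raw log': '*log.raw*',
    'spikes.times': 'spikes.times*.npy',
    'raw data': '*.raw.*',
}


def matching_types(filename, dataset_types=DATASET_TYPES):
    """Return the sorted names of all dataset types whose pattern matches."""
    return sorted(name for name, pattern in dataset_types.items()
                  if fnmatchcase(filename, pattern))


def resolve_type(filename, dataset_types=DATASET_TYPES):
    """Return the single matching dataset type, raising if there is not exactly one."""
    matches = matching_types(filename, dataset_types)
    if len(matches) != 1:
        raise ValueError(f'{filename!r} matches {len(matches)} dataset types: {matches}')
    return matches[0]


print(resolve_type('spikes.times.npy'))
print(matching_types('_rig1_log.raw.txt'))
```

A filename that matches no pattern or more than one pattern raises an error here, which mirrors the requirement that patterns be written so they do not overlap.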
[18]:
from one.api import ONE
one = ONE(base_url='https://openalyx.internationalbrainlab.org')
one.describe_dataset('spikes.times') # Requires online version (an Alyx database connection)
[nspi]. Times of spikes (seconds, relative to experiment onset). Note this includes spikes from all probes, merged together
Out[18]:
{'id': '1427b6ba-6535-4f8f-9058-e3df63f0261e',
'name': 'spikes.times',
'created_by': None,
'description': '[nspi]. Times of spikes (seconds, relative to experiment onset). Note this includes spikes from all probes, merged together',
'filename_pattern': 'spikes.times*.npy'}
Datasets and their types can be interconverted using the following functions (online mode only):
[19]:
eid = 'KS023/2019-12-10/001'
dataset_type = one.dataset2type('_ibl_leftCamera.times.npy')
datasets = one.type2datasets(eid, 'camera.times')
dset_list = '", "'.join(datasets)
print(f'the dataset type "{dataset_type}" for {eid} comprises the datasets: \n"{dset_list}"')
the dataset type "camera.times" for KS023/2019-12-10/001 comprises the datasets:
"alf/_ibl_bodyCamera.times.npy", "alf/_ibl_leftCamera.times.npy", "alf/_ibl_rightCamera.times.npy"