ALyx Filenames (ALF)

This package is concerned with parsing and loading files that follow the Alyx file name specification. Files should be organized in folders by subject name, date and session number, for example:

mouse_001/2021-05-27/001

Optionally the lab may also be present in the folder structure, for example:

lab_name/Subjects/mouse_001/2021-05-27/001
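
The folder layout above can be checked with a regular expression. The following is a minimal sketch, not this package's own parser: the name `parse_session_path` is hypothetical, and it assumes that lab and subject names contain only letters, digits, underscores and hyphens, and that the session number has at most three digits.

```python
import re

# Assumed pattern: optional "<lab>/Subjects/" prefix, then subject/date/number.
SESSION_PATTERN = re.compile(
    r'(?:(?P<lab>[\w-]+)/Subjects/)?'   # optional lab part
    r'(?P<subject>[\w-]+)/'             # subject name
    r'(?P<date>\d{4}-\d{2}-\d{2})/'     # ISO date, e.g. 2021-05-27
    r'(?P<number>\d{1,3})'              # session number, e.g. 001
)


def parse_session_path(path):
    """Return the session parts as a dict, or None if the layout does not match."""
    match = SESSION_PATTERN.fullmatch(path.strip('/'))
    return match.groupdict() if match else None


print(parse_session_path('mouse_001/2021-05-27/001'))
print(parse_session_path('lab_name/Subjects/mouse_001/2021-05-27/001'))
```

When the lab is absent, the `lab` entry of the returned dict is `None`.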

The file names themselves should have at least two components: an object and an attribute, separated by a period. For example, a file called trials.intervals represents the trials object with an intervals attribute. The full file path would be:

lab_name/Subjects/mouse_001/2021-05-27/001/trials.intervals

Objects and attributes should be in Haskell case, for example sparseNoise.xyPos, although acronyms are also supported, e.g. RFMapStim.intervals, ROIMotionEnergy.position. Underscores, hyphens and spaces are not supported. The one exception is an underscore used with ‘times’, ‘timestamps’ and ‘intervals’, which have a special meaning:

trials.goCue_times

Optional components

There are other optional parts to the file path that are used to convey other information.

Collections

Within a session folder the data may be placed in any number of sub-folders. Each sub-folder is referred to as a collection, and collections may be used to sort identical datasets by device or preprocessing software. For example, spikes collected on two different probes may be in different numbered probe collections:

mouse_001/2021-05-27/001/probe00/spikes.times.npy
mouse_001/2021-05-27/001/probe01/spikes.times.npy

Perhaps for analysis the spikes were sorted using two different spike sorters, one with Kilosort, the other with Yass:

mouse_001/2021-05-27/001/probe00/ks2.1/spikes.times.npy
mouse_001/2021-05-27/001/probe01/yass/spikes.times.npy
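
As a rough illustration of how collections separate identically named datasets, the sketch below groups paths (given relative to the session folder) by their collection. The function name `group_by_collection` is hypothetical, and the sketch treats every folder before the filename as part of the collection, so it does not handle revision folders.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_collection(relative_paths):
    """Map each collection (folders before the filename) to its filenames."""
    groups = defaultdict(list)
    for rel in relative_paths:
        path = PurePosixPath(rel)
        # Everything except the final part is taken as the collection;
        # files directly in the session folder get an empty collection.
        collection = '/'.join(path.parts[:-1])
        groups[collection].append(path.name)
    return dict(groups)


datasets = group_by_collection([
    'probe00/ks2.1/spikes.times.npy',
    'probe01/yass/spikes.times.npy',
    'trials.intervals.npy',
])
print(datasets)
```

Both spikes.times.npy files are kept apart because their collections differ.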

Revisions

If the data require pre-processing in a different manner, a revision folder may be used so that the original data are not overwritten. This can be used as a form of versioning. The folder name should be a date surrounded by pound signs (#), e.g.

mouse_001/2021-05-27/001/#2021-06-01#/spikes.times.npy

Unlike collections, revisions can be searched in lexicographical order, so that a user can load a revision before or after a certain date. If multiple revisions exist for a given date, letters may be appended to preserve ordering:

mouse_001/2021-05-27/001/#2021-06-01#/spikes.times.npy
mouse_001/2021-05-27/001/#2021-06-01a#/spikes.times.npy
mouse_001/2021-05-27/001/#2021-06-01b#/spikes.times.npy
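
The lexicographical search can be sketched with plain string comparison. This is an illustration under stated assumptions rather than the package's own implementation: `revision_of` and `latest_revision` are hypothetical names, and the revision is taken to be the first folder wrapped in pound signs.

```python
import re

# A revision folder is assumed to look like "#<name>#".
REVISION_FOLDER = re.compile(r'#([^#/]+)#')


def revision_of(path):
    """Return the revision name found in a path, or None if there is none."""
    match = REVISION_FOLDER.search(path)
    return match.group(1) if match else None


def latest_revision(revisions, not_after):
    """Return the last revision sorting at or before `not_after`, else None."""
    candidates = [r for r in revisions if r <= not_after]
    return max(candidates, default=None)


paths = [
    'mouse_001/2021-05-27/001/#2021-06-01b#/spikes.times.npy',
    'mouse_001/2021-05-27/001/#2021-06-01#/spikes.times.npy',
    'mouse_001/2021-05-27/001/#2021-06-01a#/spikes.times.npy',
]
revisions = sorted(revision_of(p) for p in paths)
print(revisions)
print(latest_revision(revisions, '2021-06-30'))
```

Because the comparison is made on strings, 2021-06-01a sorts after 2021-06-01, so a query with `not_after='2021-06-01'` would return only the unlettered revision.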

Namespace

For datasets that are not intended to be standard in the community, a namespace may be added to the start of the filename. This must be surrounded by underscores:

_ibl_wheel.position
_ss_gratingID.laserOn.npy

Timescale

Datasets containing timestamp data are expected to be in the same common timescale (usually seconds from experiment start). For datasets in a different timescale, the clock name should be appended to the attribute part with an underscore:

spikes.times_ephysClock.npy
trials.intervals_bpod.ssv
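
Separating the clock name from the attribute needs some care, because an underscore may also appear before ‘times’, ‘timestamps’ and ‘intervals’. The sketch below shows one way to do it. The function `split_timescale` and its regular expression are assumptions made for illustration, not the pattern used by this package.

```python
import re

# Attribute: alphanumerics, optionally followed by one of the special
# suffixes; anything after a further underscore is read as the clock name.
ATTRIBUTE_PATTERN = re.compile(
    r'(?P<attribute>[A-Za-z0-9]+(?:_(?:times|timestamps|intervals))?)'
    r'(?:_(?P<timescale>[A-Za-z0-9]+))?'
)


def split_timescale(attribute_part):
    """Split e.g. 'times_ephysClock' into ('times', 'ephysClock')."""
    match = ATTRIBUTE_PATTERN.fullmatch(attribute_part)
    if match is None:
        # Not in the expected form; return it unchanged with no timescale.
        return attribute_part, None
    return match['attribute'], match['timescale']


for part in ('times_ephysClock', 'intervals_bpod', 'goCue_times',
             'goCue_times_bpodClock'):
    print(part, '->', split_timescale(part))
```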

Extension

The extension should be self-explanatory. Although extensions are optional in the ALF spec, it’s preferable to include the format in the filename, and to use formats that are well supported in MATLAB and Python:

spikes.times.npy
spikes.times.csv
spikes.times.mat

Extra

Any number of extra parts, separated by periods, can be added after the attribute. Examples include UUIDs for ensuring the filename is unique, or part numbers for datasets split across several files. NB: The text after the final period is expected to be the file extension.

trials.intervals.9198edcd-e8a4-4e8a-994f-d68a2e300380.npy
2p.raw.part01.tiff
2p.raw.part02.tiff
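
Putting the namespace, extra and extension rules together, a filename can be split on its periods. This is a minimal sketch under stated assumptions: `split_dataset_name` is a hypothetical helper, the attribute is returned with any timescale still attached, and when only an object and attribute are present the extension is taken to be absent.

```python
def split_dataset_name(filename):
    """Split an ALF filename into namespace, object, attribute, extra, extension."""
    namespace = None
    if filename.startswith('_'):
        # A leading "_<namespace>_" is stripped before splitting on periods.
        namespace, _, filename = filename[1:].partition('_')
    parts = filename.split('.')
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError('expected at least an object and an attribute')
    obj, attribute, *rest = parts
    # With more than two parts, the final one is read as the extension
    # and whatever lies in between as the extra parts.
    extension = rest.pop() if rest else None
    extra = '.'.join(rest) or None
    return {'namespace': namespace, 'object': obj, 'attribute': attribute,
            'extra': extra, 'extension': extension}


print(split_dataset_name('_ibl_wheel.position'))
print(split_dataset_name('2p.raw.part01.tiff'))
print(split_dataset_name(
    'trials.intervals.9198edcd-e8a4-4e8a-994f-d68a2e300380.npy'))
```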

Relations

ALF objects can be related through their attributes. If the attribute name of one file matches the object name of a second, then the first file is guaranteed to contain integers referring to the rows of the second. For example, spikes.clusters.npy would contain integer references to the rows of clusters.brain_location.json and clusters.probes.npy; and clusters.probes.npy would contain integer references to the rows of probes.insertion.json.
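
The chain of references can be followed with ordinary row indexing. The sketch below uses small made-up lists in place of the loaded files; all values, and the 'label' key, are invented for illustration only.

```python
# Stand-ins for loaded datasets (values are illustrative, not real data).
spikes_clusters = [0, 2, 2, 1]        # spikes.clusters: one cluster row per spike
clusters_probes = [0, 0, 1]           # clusters.probes: one probe row per cluster
probes_insertion = [                  # probes.insertion: one entry per probe
    {'label': 'probe00'},
    {'label': 'probe01'},
]

# spikes -> clusters -> probes: each integer selects a row of the next object.
spike_probes = [clusters_probes[c] for c in spikes_clusters]
spike_labels = [probes_insertion[p]['label'] for p in spike_probes]

print(spike_probes)
print(spike_labels)
```

With NumPy arrays the same lookup can be written as integer array indexing, e.g. `clusters_probes[spikes_clusters]`.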

Glossary

Dataset name

A filename with at least an object and attribute. Some examples of valid ALF datasets:

spikes.times
spikes.times.npy
_ibl_trials.goCue_times_bpodClock.csv

Dataset type

In Alyx, datasets are grouped by type, and each dataset should belong to exactly one dataset type. The type is determined by a filename pattern. A dataset type groups datasets that have the same content but differ in format, etc., and includes a description of the dataset. For example, the following datasets belong to the ‘spikes.times’ dataset type:

spikes.times
_spikeglx_spikes.times_ephysClock.npy
spikes.times.9198edcd-e8a4-4e8a-994f-d68a2e300380.npy
spikes.times.cbin
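
Matching by filename pattern could look like the sketch below, which uses shell-style wildcards from Python's `fnmatch` module. The patterns and the helper `dataset_type_of` are hypothetical, and the patterns are deliberately loose; the patterns actually stored in Alyx may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical dataset types mapped to illustrative filename patterns.
DATASET_TYPES = {
    'spikes.times': '*spikes.times*',
    'trials.intervals': '*trials.intervals*',
}


def dataset_type_of(filename, dataset_types=DATASET_TYPES):
    """Return the single dataset type whose pattern matches the filename."""
    matches = [name for name, pattern in dataset_types.items()
               if fnmatchcase(filename, pattern)]
    if len(matches) != 1:
        # A dataset should belong to exactly one dataset type.
        raise ValueError(f'{filename!r} matched {len(matches)} dataset types')
    return matches[0]


for name in ('spikes.times',
             '_spikeglx_spikes.times_ephysClock.npy',
             'spikes.times.9198edcd-e8a4-4e8a-994f-d68a2e300380.npy',
             'spikes.times.cbin'):
    print(name, '->', dataset_type_of(name))
```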

Session path

The part of the path that includes the subject name, date and number. Optionally a lab name may also be part of the session path:

mouse_001/2021-05-27/001
cortexlab/Subjects/mouse_001/2021-05-27/1

Relative path

Everything that comes after the session path. In other words, the filename plus any optional collection and revision folders:

alf/probe00/spikes.times.npy
trials.intervals.npy
#2021-06-01#/trials.intervals.npy

ALF path

The full file path, including the session path and relative path, e.g.

cortexlab/Subjects/mouse_001/2021-05-27/1/alf/probe00/spikes.times.npy
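
The glossary terms can be tied together by splitting an ALF path into its session path and relative path, and the relative path into collection, revision and filename. The following is a sketch under stated assumptions, not the package's own parser: `split_alf_path` is a hypothetical name, folder names are assumed to contain only letters, digits, underscores and hyphens, and a revision folder, if present, is assumed to sit directly above the file.

```python
import re
from pathlib import PurePosixPath

# Session path (optional "<lab>/Subjects/" prefix), then the relative path.
ALF_PATH = re.compile(
    r'^(?P<session>(?:[\w-]+/Subjects/)?[\w-]+/\d{4}-\d{2}-\d{2}/\d{1,3})'
    r'/(?P<relative>.+)$'
)


def split_alf_path(path):
    """Split a full ALF path into session path, collection, revision, filename."""
    match = ALF_PATH.match(path)
    if match is None:
        raise ValueError(f'not a recognised ALF path: {path!r}')
    relative = PurePosixPath(match['relative'])
    folders = list(relative.parts[:-1])
    revision = None
    # Assumption: the revision folder is the last folder before the file.
    if folders and re.fullmatch(r'#[^#]+#', folders[-1]):
        revision = folders.pop().strip('#')
    return {
        'session_path': match['session'],
        'collection': '/'.join(folders) or None,
        'revision': revision,
        'filename': relative.name,
    }


print(split_alf_path(
    'cortexlab/Subjects/mouse_001/2021-05-27/1/alf/probe00/spikes.times.npy'))
print(split_alf_path(
    'mouse_001/2021-05-27/001/#2021-06-01#/trials.intervals.npy'))
```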