Get to know the datasets and folder structure
Useful links
To quickly view and browse the datasets in a recording session, open this Flatiron web page in a web browser.
Note:
The size of each dataset is shown in the Size column of the web page. The raw video data, for example, is particularly large (several GB).
The definitions and formats of the datasets are described in detail in this sheet and this document.
What is the folder structure for experiment sessions
Generally speaking, all the data for a single experiment session fits into one folder, whose path is made up of: the lab name / the Subjects folder / the name of the subject (i.e. the mouse nickname) / the date / the session number.
For example, mainenlab/Subjects/ZFM-01576/2020-12-01/001, whose data can be browsed here.
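Because the path components always appear in the same order, a session path can be split back into its parts mechanically. The sketch below assumes the layout lab/Subjects/subject/YYYY-MM-DD/NNN described above; `parse_session_path` and `SessionPath` are hypothetical helpers written for illustration, not part of any IBL library.

```python
import re
from datetime import date
from typing import NamedTuple


class SessionPath(NamedTuple):
    """Components of a session folder: lab/Subjects/subject/date/number."""
    lab: str
    subject: str
    date: date
    number: int


# Assumed layout: lab/Subjects/subject/YYYY-MM-DD/NNN (trailing slash optional)
SESSION_RE = re.compile(
    r"(?P<lab>[^/]+)/Subjects/(?P<subject>[^/]+)/"
    r"(?P<date>\d{4}-\d{2}-\d{2})/(?P<number>\d{3})/?$"
)


def parse_session_path(path: str) -> SessionPath:
    """Split a session path (possibly with a leading root folder) into its parts."""
    match = SESSION_RE.search(path.replace("\\", "/"))
    if match is None:
        raise ValueError(f"not a session path: {path!r}")
    return SessionPath(
        lab=match["lab"],
        subject=match["subject"],
        date=date.fromisoformat(match["date"]),
        number=int(match["number"]),  # "001" -> 1
    )
```

For instance, `parse_session_path("mainenlab/Subjects/ZFM-01576/2020-12-01/001")` yields lab `mainenlab`, subject `ZFM-01576`, date 2020-12-01 and session number 1.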
Note:
A lab can host multiple subjects, e.g. the Churchland lab hosts CSHL047 and CSHL049.
A subject can have multiple sessions on the same day, in which case the session number increases from 001 to 002, 003, etc.
Sometimes the valuable data is found only in a later session of the day (for example after a restart), so it is not uncommon to see days for which only the 003 session folder is saved.
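Since session numbers can have gaps (a day may contain only 003), it is safer to list the session folders that actually exist than to assume 001 is present. The sketch below assumes a local copy of a subject/date folder; `session_numbers` and `latest_session` are hypothetical helpers, and picking the highest-numbered session is only a heuristic, not a rule that the last session holds the valuable data.

```python
from pathlib import Path
from typing import List, Optional


def session_numbers(day_folder: Path) -> List[int]:
    """Return the sorted session numbers (001 -> 1) found in a subject/date folder.

    Only three-digit numeric subfolders count as sessions, so gaps such as
    a day containing only 003 are handled naturally.
    """
    numbers = [
        int(child.name)
        for child in day_folder.iterdir()
        if child.is_dir() and len(child.name) == 3 and child.name.isdigit()
    ]
    return sorted(numbers)


def latest_session(day_folder: Path) -> Optional[Path]:
    """Return the highest-numbered session folder of the day, or None if there is none."""
    numbers = session_numbers(day_folder)
    if not numbers:
        return None
    return day_folder / f"{numbers[-1]:03d}"
```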
How are data files organised within a session folder
Overview
Within a session folder (such as mainenlab/Subjects/ZFM-01576/2020-12-01/001/) there will be multiple folders containing different kinds of data as explained below. An example layout of the folder structure is shown here:
subject/
├─ 2020-12-01/
│ ├─ 001/
│ │ ├─ alf/
│ │ │ ├─ probeXX/
│ │ │ │ ├─ pykilosort/
│ │ ├─ raw_ephys_data/
│ │ │ ├─ probeXX/
│ │ ├─ raw_video_data/
Generally speaking, the subfolders contain:
alf/: the extracted data, to be used in analysis (notably the spike sorting, trials and DLC data)
raw_ephys_data/: the raw ephys data (in this case, Neuropixels data)
raw_video_data/: the raw video data
raw_passive_data/: the raw passive data (events that occur during the replay of task stimuli)
raw_behavior_data/: the raw behavior data (events that occur during the trials)
spike_sorters/: the raw output data of each spike sorter used
logs/: logged information
Note:
Analysis is conducted mainly on the data contained in the first three subfolders, i.e. alf/, raw_ephys_data/ and raw_video_data/, whose content is detailed below.
Some data (e.g. raw_passive_data/) may not be available for all sessions.
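Because not every session contains every subfolder, a quick check of what is present locally can save a failed load later. This is a minimal sketch assuming a session folder has already been downloaded to disk; the `SESSION_SUBFOLDERS` mapping simply restates the list above, and `available_subfolders` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict

# Top-level subfolders of a session folder and what they hold (restating the list above).
SESSION_SUBFOLDERS: Dict[str, str] = {
    "alf": "extracted data for analysis (spike sorting, trials, DLC)",
    "raw_ephys_data": "raw ephys data (Neuropixels)",
    "raw_video_data": "raw video data",
    "raw_passive_data": "raw passive data (replay of task stimuli)",
    "raw_behavior_data": "raw behavior data (trial events)",
    "spike_sorters": "raw output of each spike sorter",
    "logs": "logged information",
}


def available_subfolders(session: Path) -> Dict[str, bool]:
    """Report which of the known subfolders exist in a local session folder."""
    return {name: (session / name).is_dir() for name in SESSION_SUBFOLDERS}
```

For example, `[name for name, ok in available_subfolders(session).items() if not ok]` lists the subfolders missing from a given session.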
Processed data: alf folder
Everything contained within the alf folder is processed data, with all times synchronised to a common clock. This folder contains the data required for most analyses. All files in the alf folders follow the Alyx File (ALF) naming convention, where related datasets are grouped into a common object, e.g. trials or wheel.
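To make the object grouping concrete, the sketch below parses file names of the form (_namespace_)object.attribute(.extra).extension and groups attributes by object. This pattern is a simplified assumption for illustration: the official ALF specification has further rules (e.g. timescale suffixes on attributes) that this sketch does not separate out, and `parse_alf_name` / `group_by_object` are hypothetical helpers.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Simplified ALF file name pattern: (_namespace_)object.attribute(.extra).extension
# Timescale suffixes are NOT split out of the attribute in this sketch.
ALF_RE = re.compile(
    r"^(?:_(?P<namespace>[^_.]+)_)?"   # optional namespace, e.g. _ibl_
    r"(?P<object>[^._][^.]*)"          # object, e.g. trials
    r"\.(?P<attribute>[^.]+)"          # attribute, e.g. choice
    r"(?:\.(?P<extra>.+))?"            # optional extra parts
    r"\.(?P<extension>[^.]+)$"         # file extension, e.g. npy
)


def parse_alf_name(filename: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named parts of an ALF-style file name, or None if it does not match."""
    match = ALF_RE.match(filename)
    return match.groupdict() if match else None


def group_by_object(filenames: List[str]) -> Dict[str, List[str]]:
    """Group ALF file names by object, listing the attributes found for each."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        parts = parse_alf_name(name)
        if parts is not None:
            groups[parts["object"]].append(parts["attribute"])
    return dict(groups)
```

With this grouping, `_ibl_trials.choice.npy` and `_ibl_trials.feedbackType.npy` both fall under the object `trials`, while a non-ALF file such as `README.txt` is skipped.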
The alf/ folder notably contains:
the probe folders (probe00/ or probe01/), which hold the processed spike sorting output to be used for analysis
the processed behavior trials data
the processed DLC data (for each camera used, here body, left and right)
the processed wheel data
the processed passive protocol data
Below is a breakdown of the dataset objects contained in each folder.
Download an example alf folder
A sample alf folder, showing the files it typically contains, can be downloaded by clicking here.
Datasets in alf
The alf folder contains the processed behaviour, wheel and video data. Browse the documentation detailing these datasets and how to load them:
trials -> behavioural task related data
wheel, wheelMoves -> wheel data recorded during the task
leftCamera, rightCamera, bodyCamera -> DLC features extracted during the task
Datasets in alf/probeXX/pykilosort
The alf/probeXX/ folder contains the spike sorted data. Typically only one version of the spike sorting output is available, stored in a subfolder named pykilosort (see example).
Note:
A folder alf/probeXX/ can contain the output of multiple spike sorters. In such a case, the output of one spike sorter is stored directly in the folder itself (this is the case if, e.g., cluster datasets are present directly in alf/probeXX/), and a secondary version is stored in a subfolder (here the subfolder named pykilosort).
When multiple spike sorting versions are available, the data loading methods use the pykilosort version by default (see loading example).
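The default described above can be mimicked on a local copy of the data: prefer the pykilosort subfolder, and fall back to datasets stored directly in alf/probeXX/. This is a sketch under stated assumptions; `spike_sorting_folder` is a hypothetical helper, and detecting the top-level output by the presence of files starting with `clusters.` is an assumption based on the note, not the logic of the official loaders.

```python
from pathlib import Path
from typing import Optional


def spike_sorting_folder(alf_probe: Path, preferred: str = "pykilosort") -> Optional[Path]:
    """Pick the folder holding the spike sorting output for one probe.

    Prefers the `preferred` subfolder when it exists; otherwise falls back to
    cluster datasets stored directly in alf/probeXX/. Returns None if neither exists.
    """
    candidate = alf_probe / preferred
    if candidate.is_dir():
        return candidate
    # Assumption: top-level output is recognised by files named clusters.*
    if any(alf_probe.glob("clusters.*")):
        return alf_probe
    return None
```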
Browse the documentation detailing the spike sorting datasets and how to load them:
spikes, clusters, channels -> spikesorted output of neuropixel data
spikes_subset -> sample of waveforms extracted during spikesorting
Raw data
Folders with the prefix raw contain the original data collected from each recording device (e.g. a Neuropixels probe or a camera). Data in these folders are in the clock of the recording device and are not synchronised.
Notably, the raw electrophysiology data that has been used to obtain the spikesorting and the raw video data that has been used to extract DLC features are available for most sessions.
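Because raw data stay in the device clock, times must be mapped onto a common clock before they can be compared across devices. A generic way to do this, shown here as a sketch and not as the IBL synchronisation procedure, is piecewise-linear interpolation between sync pulses seen by both the device and a reference. The function `to_common_clock` is hypothetical and assumes the two pulse lists are already matched one-to-one and strictly increasing.

```python
from bisect import bisect_right
from typing import List, Sequence


def to_common_clock(
    times: Sequence[float],
    device_sync: Sequence[float],
    reference_sync: Sequence[float],
) -> List[float]:
    """Map device-clock times onto a reference clock.

    device_sync[i] and reference_sync[i] are the times of the same sync pulse
    as seen by the device and by the reference (strictly increasing). Times
    between pulses are linearly interpolated; times outside the pulse range
    are extrapolated using the first or last segment.
    """
    if len(device_sync) != len(reference_sync) or len(device_sync) < 2:
        raise ValueError("need at least two matched sync pulses")
    out = []
    last = len(device_sync) - 2
    for t in times:
        # index i of the segment [device_sync[i], device_sync[i + 1]] used for t
        i = min(max(bisect_right(device_sync, t) - 1, 0), last)
        x0, x1 = device_sync[i], device_sync[i + 1]
        y0, y1 = reference_sync[i], reference_sync[i + 1]
        out.append(y0 + (t - x0) * (y1 - y0) / (x1 - x0))
    return out
```

Interpolating segment by segment, rather than fitting a single offset, also absorbs slow drift between the two clocks.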
A summary of the data contained in each folder is given below.
Datasets in raw_ephys_data
The raw_ephys_data folder contains the synchronisation data recorded from the NIDAQ device via the software SpikeGLX.
Note:
For recordings obtained with Neuropixels 3A probes, this folder is empty and the synchronisation pulses are stored in the raw_ephys_data/probeXX folder.
Browse the documentation detailing the SpikeGLX datasets and how to load them:
_spikeglx_ephysData*.nidq -> raw synchronisation pulses from NIDAQ recorded using spikeglx
_spikeglx_sync -> extracted synchronisation pulses from NIDAQ
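Extracting synchronisation pulses essentially means finding the samples where each digital line changes state. The sketch below assumes the sync lines have already been read as a sequence of integer words (one per sample, one bit per line) and returns, for each edge, its time, its polarity (+1 rising, -1 falling) and the line (bit) index. The function `extract_sync_edges` and its output layout are illustrative assumptions, not the extraction code that produced `_spikeglx_sync`.

```python
from typing import List, Sequence, Tuple


def extract_sync_edges(
    samples: Sequence[int], fs: float, n_bits: int = 16
) -> Tuple[List[float], List[int], List[int]]:
    """Detect transitions on each bit of a sampled digital sync word.

    Returns (times, polarities, channels): the time in seconds of each
    transition, +1 for a rising or -1 for a falling edge, and the bit index.
    """
    times: List[float] = []
    polarities: List[int] = []
    channels: List[int] = []
    for i in range(1, len(samples)):
        changed = samples[i] ^ samples[i - 1]  # bits that flipped at sample i
        if not changed:
            continue
        for bit in range(n_bits):
            if changed >> bit & 1:
                times.append(i / fs)
                polarities.append(1 if samples[i] >> bit & 1 else -1)
                channels.append(bit)
    return times, polarities, channels
```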
Datasets in raw_ephys_data/probeXX
The raw_ephys_data/probeXX folder contains the raw electrophysiology data acquired on a given probe.
Note:
These datasets are very large!
It is possible to download only (smaller) chunks of the raw ephys data rather than the whole file at once (see the loading example below).
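Chunked access works because a time window maps to a contiguous byte range. The sketch below illustrates this on an uncompressed, SpikeGLX-style binary file of interleaved little-endian int16 samples; it assumes a local uncompressed file, whereas remotely served files may be stored compressed, in which case the library's own chunked loading should be used. `read_chunk` is a hypothetical helper; the sampling rate and channel count are parameters (e.g. 30 kHz and 385 channels for a Neuropixels 1.0 AP band file).

```python
import sys
from array import array
from pathlib import Path
from typing import List


def read_chunk(
    bin_path: Path, t0: float, t1: float, fs: float, n_channels: int
) -> List[array]:
    """Read samples between t0 and t1 (seconds) from an uncompressed binary file
    of interleaved little-endian int16 samples, one frame = n_channels samples.

    Only the bytes of the requested window are read. Returns one array('h') per channel.
    """
    bytes_per_frame = 2 * n_channels  # int16 = 2 bytes per channel sample
    first = int(round(t0 * fs))
    n_frames = max(int(round(t1 * fs)) - first, 0)
    with open(bin_path, "rb") as f:
        f.seek(first * bytes_per_frame)
        raw = f.read(n_frames * bytes_per_frame)
    data = array("h")
    data.frombytes(raw[: len(raw) - len(raw) % bytes_per_frame])  # whole frames only
    if sys.byteorder == "big":
        data.byteswap()  # the file is assumed little-endian
    # De-interleave: sample k of channel c sits at index k * n_channels + c
    return [data[c::n_channels] for c in range(n_channels)]
```

At 30 kHz with 385 channels, one second of data is 30000 × 385 × 2 bytes, about 23 MB, which is why reading a short window instead of the whole file makes a large difference.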
Browse the documentation detailing the raw ephys datasets and how to load them:
_spikeglx_ephysData*.ap, _spikeglx_ephysData*.lf -> raw ephys data in AP and LFP band recorded using spikeglx
ephysTimeRmsAP, ephysTimeRmsLF -> rms noise in AP and LFP band across recording
ephysSpectralDensityAP, ephysSpectralDensityLF -> power spectrum in AP and LFP band
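The RMS datasets summarise how much a channel's signal fluctuates over the course of the recording. As a rough illustration of the quantity involved, the sketch below computes a mean-subtracted RMS over consecutive, non-overlapping windows of one channel. The window length, filtering and any other details of how ephysTimeRmsAP and ephysTimeRmsLF were actually computed are not described here, so `windowed_rms` is a hypothetical helper, not a reimplementation of those datasets.

```python
import math
from typing import List, Sequence


def windowed_rms(signal: Sequence[float], window: int) -> List[float]:
    """RMS of a single-channel signal over consecutive non-overlapping windows.

    Each window's mean is removed first so the value reflects fluctuation
    (noise) rather than a DC offset. A trailing partial window is dropped.
    """
    out = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        mean = sum(chunk) / window
        out.append(math.sqrt(sum((x - mean) ** 2 for x in chunk) / window))
    return out
```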
Datasets in raw_video_data
The raw_video_data folder contains the raw camera data for each camera (e.g. Left, Right, Body).
Note:
These datasets are very large!
It is possible to download only selected frames of the raw video data rather than the whole file at once (see the loading example below).
You can view the raw video data in the browser by clicking on it, e.g. _iblrig_leftCamera.raw.
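To download only the frames covering a period of interest, the frame indices for that period are needed first. The sketch below assumes sorted per-frame timestamps are already available (for example from the processed camera data in alf) and uses binary search to find the frames inside a time window; `frames_in_window` is a hypothetical helper, not a library function.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def frames_in_window(timestamps: Sequence[float], t0: float, t1: float) -> range:
    """Indices of the video frames whose timestamps fall within [t0, t1].

    timestamps must be sorted, one per frame. Returns a range, so only the
    selected frames need to be requested.
    """
    return range(bisect_left(timestamps, t0), bisect_right(timestamps, t1))
```

For a camera recording at 60 frames per second starting at time 0, the window from 1 s to 2 s corresponds to frames 60 to 120 inclusive.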
Browse the documentation detailing the raw video datasets and how to load them:
bodyCamera, leftCamera, rightCamera -> raw video files