Get to know the datasets and folder structure

What is the folder structure for experiment sessions

Generally speaking, all the data for a single experiment session is stored in one folder, whose path is built from: the lab name / a folder named Subjects / the name of the subject (i.e. the mouse nickname) / the date / the session number.

For example: mainenlab/Subjects/ZFM-01576/2020-12-01/001, whose data can be browsed here.

Note:

  • A lab can host multiple subjects, e.g. the Churchland lab hosts CSHL047 and CSHL049.

  • A subject can have multiple sessions on the same day; in that case the session number increases from 001 to 002, 003, etc.

  • Sometimes the valuable data is found only in a later session of the day (for example after a restart), so it is not uncommon to see days for which only the 003 folder is saved.
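
The naming scheme above can be sketched in Python. This is a minimal illustration rather than an official helper: the function names `session_path` and `parse_session_path` are hypothetical, and only the path pattern itself (lab / Subjects / subject / date / number) comes from the description above.

```python
from pathlib import PurePosixPath


def session_path(lab, subject, date, number):
    """Build lab/Subjects/subject/date/number; the number is zero-padded to 3 digits."""
    return PurePosixPath(lab, "Subjects", subject, date, f"{number:03d}")


def parse_session_path(path):
    """Split a session path back into its components (hypothetical helper)."""
    parts = PurePosixPath(path).parts
    if len(parts) < 5 or parts[-4] != "Subjects":
        raise ValueError(f"not a session path: {path}")
    lab, _, subject, date, number = parts[-5:]
    return {"lab": lab, "subject": subject, "date": date, "number": int(number)}


first = session_path("mainenlab", "ZFM-01576", "2020-12-01", 1)
print(first)  # mainenlab/Subjects/ZFM-01576/2020-12-01/001
print(parse_session_path(first))
```

A second session on the same day would be `session_path("mainenlab", "ZFM-01576", "2020-12-01", 2)`, which ends in 002.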

How are data files organised within a session folder

Overview

Within a session folder (such as mainenlab/Subjects/ZFM-01576/2020-12-01/001/) there will be multiple folders containing different kinds of data as explained below. An example layout of the folder structure is shown here:

subject/
├─ 2020-12-01/
│  ├─ 001/
│  │  ├─ alf/
│  │  │  ├─ probeXX/
│  │  │  │  ├─ pykilosort/
│  │  ├─ raw_ephys_data/
│  │  │  ├─ probeXX/
│  │  ├─ raw_video_data/

Generally speaking, the following subfolders will contain:

  • alf/ : The extracted data, to be used in analysis (notably the spike sorting, trials and DLC data).

  • raw_ephys_data/ : The raw ephys data (in this case, Neuropixels data)

  • raw_video_data/: The raw video data

  • raw_passive_data/: The raw passive data (events that occur during the replay of task stimuli)

  • raw_behavior_data/ : The raw behavior data (events that occur during trials)

  • spike_sorters/ : The raw output data for each spike sorter used

  • logs/ : logged information

Note:

  • Analysis is conducted mainly on the data contained in the first three subfolders, i.e. alf/, raw_ephys_data/ and raw_video_data/, whose content is detailed below.

  • Some data (e.g. raw_passive_data/) may not be available for all sessions.
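
Because not every subfolder exists for every session, it can help to check a local copy of a session before analysing it. The sketch below is an assumption-laden illustration: `available_subfolders` is a hypothetical helper, and it only inspects folders already present on the local disk (here a temporary directory stands in for a session folder).

```python
import tempfile
from pathlib import Path

# Subfolder names taken from the list above; not every session has all of them.
EXPECTED_SUBFOLDERS = (
    "alf",
    "raw_ephys_data",
    "raw_video_data",
    "raw_passive_data",
    "raw_behavior_data",
    "spike_sorters",
    "logs",
)


def available_subfolders(session_dir):
    """Return a {name: bool} mapping saying which subfolders exist locally."""
    session_dir = Path(session_dir)
    return {name: (session_dir / name).is_dir() for name in EXPECTED_SUBFOLDERS}


# Demo on a throwaway directory that only contains alf/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "alf").mkdir()
    status = available_subfolders(tmp)
    print(status["alf"], status["raw_passive_data"])  # True False
```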

Processed data: alf folder

Everything contained within the alf folder is processed data, with all times synchronised to a common clock. This folder contains the data required for most analyses. All files in the alf folders follow the Alyx File naming convention, where related datasets are grouped into a common object, e.g. trials or wheel.
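
To illustrate how datasets group into objects, the sketch below splits file names of the form `object.attribute.extension`, with an optional `_namespace_` prefix, and groups them by object. The file names and the helper `group_by_object` are illustrative assumptions; this simplified pattern ignores optional parts of the naming convention and is not the official parser.

```python
import re
from collections import defaultdict

# Simplified pattern: optional _namespace_ prefix, then object.attribute.extension.
ALF_PATTERN = re.compile(
    r"^(?:_(?P<namespace>[A-Za-z0-9]+)_)?"
    r"(?P<object>[A-Za-z0-9]+)\."
    r"(?P<attribute>[A-Za-z0-9_]+)\."
    r"(?P<extension>[A-Za-z0-9]+)$"
)


def group_by_object(filenames):
    """Map each object name to the list of its attributes, skipping non-matching names."""
    grouped = defaultdict(list)
    for name in filenames:
        match = ALF_PATTERN.match(name)
        if match is None:
            continue
        grouped[match["object"]].append(match["attribute"])
    return dict(grouped)


# Illustrative file names (assumed, not listed on this page).
example_files = [
    "_ibl_trials.intervals.npy",
    "_ibl_trials.choice.npy",
    "_ibl_wheel.position.npy",
    "_ibl_wheel.timestamps.npy",
]
print(group_by_object(example_files))
# {'trials': ['intervals', 'choice'], 'wheel': ['position', 'timestamps']}
```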

The alf/ folder notably contains:

  • the probe folder (probe00/ or probe01/), which contains the processed output of the spike sorting to be used for analysis

  • the processed behavior trials data

  • the processed DLC data (for each camera used, here body, left and right)

  • the processed wheel data

  • the processed passive protocol data

Below is a breakdown of the dataset objects contained in each folder.

Download an example alf folder

An example of the files contained in a sample alf folder can be downloaded by clicking here

Datasets in alf

The alf folder contains the processed behaviour, wheel and video data. Browse the documentation detailing these datasets and how to load them:

Datasets in alf/probeXX/pykilosort

The alf/probeXX/ folder contains the spike sorted data. Typically, only one version of the spike sorting output is available, and it is stored in a folder named pykilosort (see example).

Note:

  • A folder alf/probeXX/ can contain the output of multiple spike sorters.

  • In such a case, the output of the first spike sorter is stored directly in the folder itself (recognisable by cluster datasets sitting directly in alf/probeXX/), and the second version is stored in a subfolder (here, the subfolder named pykilosort).

  • When multiple spike sorting versions are available, the data loading methods use the pykilosort version by default (see loading example).
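
On a local copy of the data, the layout described in these notes can be handled with a small helper. This is a sketch only: `spike_sorting_dir` is a hypothetical function, not part of the IBL loading tools, and it simply prefers the pykilosort subfolder when it exists and otherwise falls back to the probe folder itself.

```python
import tempfile
from pathlib import Path


def spike_sorting_dir(probe_dir, preferred="pykilosort"):
    """Return probe_dir/<preferred> if that subfolder exists, else probe_dir itself."""
    probe_dir = Path(probe_dir)
    candidate = probe_dir / preferred
    return candidate if candidate.is_dir() else probe_dir


# Demo on a throwaway directory mimicking alf/probe00/pykilosort/.
with tempfile.TemporaryDirectory() as tmp:
    probe = Path(tmp) / "alf" / "probe00"
    (probe / "pykilosort").mkdir(parents=True)
    print(spike_sorting_dir(probe).name)  # pykilosort
```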

Browse the documentation detailing the spike sorting datasets and how to load them:

Raw data

Folders with the prefix raw contain the original data collected from each recording device (e.g. a Neuropixels probe or a camera). Data in these folders are on the clock of the recording device and are not synchronised.

Notably, the raw electrophysiology data used to obtain the spike sorting and the raw video data used to extract DLC features are available for most sessions.

A summary of the data contained in each folder is given below.

Datasets in raw_ephys_data

The raw_ephys_data folder contains the synchronisation data recorded from the NIDAQ device via the software SpikeGLX.

Note:

  • For recordings obtained with Neuropixels 3A probes, this folder is empty and the synchronisation pulses are stored in the raw_ephys_data/probeXX folder.

Browse the documentation detailing the SpikeGLX datasets and how to load them:

Datasets in raw_ephys_data/probeXX

The raw_ephys_data/probeXX folder contains the raw electrophysiology data acquired on a given probe.

Note:

  • These datasets are very large!

  • It is possible to download only (smaller-sized) chunks of the raw ephys data, rather than the whole file at once (cf. the loading example below).
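
To give a sense of scale, the following back-of-the-envelope sketch estimates the uncompressed size of a recording and the sample range covered by a time window. The figures used (385 channels, 30 kHz sampling, 16-bit samples) are assumptions typical of a Neuropixels action-potential band recording, not values stated on this page; the files actually stored may be compressed and therefore smaller.

```python
N_CHANNELS = 385        # assumed: 384 recording channels + 1 sync channel
SAMPLING_RATE = 30_000  # assumed: samples per second
BYTES_PER_SAMPLE = 2    # assumed: 16-bit integer samples


def uncompressed_bytes(duration_s):
    """Approximate uncompressed size, in bytes, of duration_s seconds of data."""
    return int(duration_s * SAMPLING_RATE) * N_CHANNELS * BYTES_PER_SAMPLE


def sample_range(t0, t1):
    """Convert a time window in seconds to (first_sample, last_sample_exclusive)."""
    return int(t0 * SAMPLING_RATE), int(t1 * SAMPLING_RATE)


print(uncompressed_bytes(3600) / 1e9)  # 83.16 (GB for one hour)
print(uncompressed_bytes(1) / 1e6)     # 23.1 (MB for a one-second chunk)
print(sample_range(600, 601))          # (18000000, 18030000)
```

Under these assumptions, a one-second chunk is several thousand times smaller than an hour-long file, which is why chunked access is usually preferable for exploration.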

Browse the documentation detailing the raw ephys datasets and how to load them:

Datasets in raw_video_data

The raw_video_data folder contains the raw camera data for each camera (e.g. left, right, body).

Note:

  • These datasets are very large!

  • It is possible to download only selected frames of the raw video data, rather than the whole file at once (cf. the loading example below).

  • You can view the raw video data in the browser by clicking on it, e.g. _iblrig_leftCamera.raw.

Browse the documentation detailing the raw video datasets and how to load them: