Data Release - Reproducible Ephys Paper

Understanding whole-brain-scale electrophysiological recordings will rely on the collective work of multiple labs. Because two labs recording from the same brain area often reach different conclusions, it is critical to quantify and control for features that decrease reproducibility. To address these issues, we formed a multi-lab collaboration using a shared, open-source behavioral task and experimental apparatus. We repeatedly inserted Neuropixels multi-electrode probes targeting the same brain locations (called the repeated site, including posterior parietal cortex, hippocampus, and thalamus) in mice performing the behavioral task. For a full description of the study, please see the associated article.

Overview of the Data

As part of this paper we have released data from 60 Neuropixels recording sessions, across 8 laboratories, obtained at the repeated site location during the IBL task. The release comprises both processed and raw data, arranged into collections (folders).

Metadata for an additional 23 sessions have also been released. These metadata contain information that was used for the histology targeting analysis. The processed and raw data for these sessions have not been released because the recordings did not pass our quality control metrics.

For sessions that contain data, an example of the folder structure of the data is shown below.

├─ 2021-06-30/
│  ├─ 001/
│  │  ├─ alf/
│  │  │  ├─ probeXX/
│  │  │  │  ├─ pykilosort/
│  │  ├─ raw_ephys_data/
│  │  │  ├─ probeXX/
│  │  ├─ raw_video_data/
An example of the files contained in a sample alf folder can be downloaded by clicking here

Processed Data

Everything contained within the alf folder is processed data, with all times synchronised to a common clock. This folder contains the data required for most analyses. All files in the alf folders follow the Alyx File (ALF) naming convention, in which related datasets are grouped into a common object, e.g. trials or wheel.
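To make the grouping of datasets into objects concrete, the sketch below splits a file name into its parts. It is a simplified illustration that handles only an optional `_namespace_` prefix followed by `object.attribute.extension`; the full convention allows further optional parts that this pattern rejects. `parse_alf_name` is a hypothetical helper, and the ALF utilities shipped with ONE-api should be preferred in real code.

```python
import re

# Simplified pattern (assumption): optional _namespace_ prefix,
# then object.attribute.extension with no additional parts.
ALF_PATTERN = re.compile(
    r"^(?:_(?P<namespace>[^_.]+)_)?"
    r"(?P<object>[^.]+)\."
    r"(?P<attribute>[^.]+)\."
    r"(?P<extension>[^.]+)$"
)


def parse_alf_name(filename):
    """Split a simple ALF-style file name into its named parts."""
    match = ALF_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Not a simple ALF-style name: {filename!r}")
    return match.groupdict()


# Both datasets below belong to an object ('wheel' and 'spikes' respectively)
for name in ("_ibl_wheel.timestamps.npy", "spikes.times.npy"):
    parts = parse_alf_name(name)
    print(name, "->", parts["object"], parts["attribute"])
```

Datasets that share the same object name, such as all `wheel` files, are the ones that get loaded together as a single object.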

Below is a breakdown of the objects contained in each folder.


Processed behaviour, wheel and video data


Spikesorted data

Raw data

Folders with the prefix raw contain the original data collected from each recording device (e.g. a Neuropixels probe or a camera). Data in these folders are in the clock of the recording device and are not synchronised. For this paper we have released the raw electrophysiology data that was used for spike sorting and the raw video data that was used to extract DLC features.
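As a generic illustration of what moving from a device clock to a common clock involves, the sketch below fits a straight line to matched synchronisation pulse times and uses it to convert a device timestamp. This is not the procedure used by the IBL pipeline; the function names are hypothetical and the pulse times are made-up values chosen only to show a small offset and drift.

```python
def fit_clock_mapping(device_times, common_times):
    """Least-squares line mapping device-clock times onto the common clock.

    Returns (slope, intercept) such that common = slope * device + intercept.
    """
    n = len(device_times)
    if n < 2 or n != len(common_times):
        raise ValueError("Need at least two matched pulse times")
    mean_d = sum(device_times) / n
    mean_c = sum(common_times) / n
    var_d = sum((d - mean_d) ** 2 for d in device_times)
    if var_d == 0:
        raise ValueError("Device pulse times must not all be identical")
    cov = sum((d - mean_d) * (c - mean_c)
              for d, c in zip(device_times, common_times))
    slope = cov / var_d
    return slope, mean_c - slope * mean_d


def to_common_clock(t_device, slope, intercept):
    """Convert one device-clock timestamp to the common clock."""
    return slope * t_device + intercept


# Made-up pulse times: the same pulses seen on the device and the common clock
device_pulses = [0.0, 1.0, 2.0, 3.0]
common_pulses = [5.0, 6.0001, 7.0002, 8.0003]

slope, intercept = fit_clock_mapping(device_pulses, common_pulses)
print(to_common_clock(1.5, slope, intercept))
```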

A summary of the data contained in each folder is given below.


Synchronisation data recorded from NIDAQ (Note, for recordings obtained with 3A Neuropixel probes this folder is empty and synchronisation pulses are stored in the raw_ephys_data/probeXX folder)


Electrophysiology data


Camera data



To use IBL data you will need a Python environment with Python > 3.7. To create a new environment from scratch, install Anaconda and follow the instructions below (more information can also be found here):

conda create --name ibl python=3.9

Make sure to always activate this environment before installing packages or working with the IBL data:

conda activate ibl

Install packages

To use IBL data you will need to install the ONE-api package. We also recommend installing ibllib. These can be installed via pip.

pip install ONE-api
pip install ibllib
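After installation, a quick check from Python can confirm that the environment is usable. The sketch below assumes that "python > 3.7" means Python 3.8 or later, and that the packages are importable as `one` (as in `from one.api import ONE`) and `ibllib`. `check_environment` is a hypothetical helper, not part of either package.

```python
import importlib.util
import sys


def check_environment(packages=("one", "ibllib"), min_version=(3, 8)):
    """Report whether the Python version and required packages are available.

    Hypothetical helper. min_version=(3, 8) is an assumed reading of
    the "python > 3.7" requirement.
    """
    report = {"python_ok": sys.version_info[:2] >= min_version}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_environment())
```

If any entry is False, activate the ibl environment and repeat the pip commands above.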

Setting up credentials

Credentials can be set up in a Python terminal as follows:

[ ]:
from one.api import ONE
pw = 'international'
one = ONE(base_url='', password=pw, silent=True)

Getting Started

Exploring data

To get a feel for the structure of the data we recommend first downloading the alf data for a single repeated site session and exploring how it is stored locally on disk. An example alf folder can be downloaded from here. Alternatively, we can use the ONE-api to search for sessions released as part of this paper and download the data that way.

The sessions that contain data released as part of the reproducible ephys paper can be found as follows:

[ ]:
from one.api import ONE
one = ONE(base_url='')

# Find sessions that have data and are tagged for the repeated site paper
rep_site_sessions = one.alyx.rest('sessions', 'list', dataset_types='spikes.times', tag='2022_Q2_IBL_et_al_RepeatedSite')
# Take the first session
example_sess = rep_site_sessions[0]
# Each session has a unique experiment id
eid = example_sess['id']

We can download all files in the alf collection

[ ]:
# Download all data in alf collection
files = one.load_collection(eid, 'alf', download_only=True)

# Show where files have been downloaded to
print(f'Files downloaded to {files[0].parent}')
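Once the files are on disk, it can help to list what was downloaded and how large each file is. The sketch below assumes `files` is a list of paths, as the cell above suggests by reading `files[0].parent`. `summarise_files` is a hypothetical helper, and the demonstration uses small stand-in files in a temporary folder rather than real IBL data.

```python
import tempfile
from pathlib import Path


def summarise_files(files):
    """Return (file name, size in bytes) pairs, sorted by file name."""
    return sorted((Path(f).name, Path(f).stat().st_size) for f in files)


# Demonstration with stand-in files; with real data, pass the list
# returned by the download call instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_files = [
        Path(tmp) / "_ibl_wheel.timestamps.npy",
        Path(tmp) / "_ibl_wheel.position.npy",
    ]
    for path, payload in zip(demo_files, (b"1234", b"12")):
        path.write_bytes(payload)
    summary = summarise_files(demo_files)

for name, size in summary:
    print(f"{name}: {size} bytes")
```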

To download the spike sorting data we need to find out which probe label (probeXX above) was used for this session. This can be done by finding the probe insertion associated with the session:

[ ]:
insertion = one.alyx.rest('insertions', 'list', session=eid)[0]
probe_label = insertion['name']
files = one.load_collection(eid, f'alf/{probe_label}/pykilosort', download_only=True)

# Show where files have been downloaded to
print(f'Files downloaded to {files[0].parent}')

To load the data we can use loading methods such as the following:

[ ]:
# Load in all trials datasets
trials = one.load_object(eid, 'trials', collection='alf')

# Load in a single wheel dataset
wheel_times = one.load_dataset(eid, '_ibl_wheel.timestamps.npy')

Loading different objects

Examples for loading different objects can be found in the following tutorials here

More information on ONE

To get a better understanding of the ONE-api and the various methods available, we recommend working through these tutorials

Advanced examples

Example 1: Searching for sessions from a specific lab

If you want to use only the data associated with a given lab, you can query for the whole dataset as shown above and filter rep_site_sessions on the key “lab”, for example:

[ ]:
lab_name = 'mrsicflogellab'
sessions_lab = [item for item in rep_site_sessions if item['lab'] == lab_name]
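The same session records can be used to see how sessions are distributed across labs before choosing one. The sketch below is a minimal illustration that assumes each record is a dictionary with a 'lab' key, as in the filter above. `sessions_per_lab` is a hypothetical helper, and the records are stand-ins with placeholder ids and one placeholder lab name, not real query results.

```python
from collections import Counter


def sessions_per_lab(sessions):
    """Count how many session records belong to each lab."""
    return Counter(item['lab'] for item in sessions)


# Stand-in records; with real data, pass rep_site_sessions instead
mock_sessions = [
    {'id': 'eid-0', 'lab': 'mrsicflogellab'},
    {'id': 'eid-1', 'lab': 'placeholder_lab'},
    {'id': 'eid-2', 'lab': 'mrsicflogellab'},
]

counts = sessions_per_lab(mock_sessions)
for lab, n_sessions in counts.most_common():
    print(f'{lab}: {n_sessions} session(s)')
```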

However, if you want to query only the data for a given lab, it is better to first retrieve the list of all available labs, select a lab name from it, and then query the sessions for that lab.

To get this list, use

[ ]:
# List of labs (and all associated metadata)
labs = one.alyx.rest('labs', 'list')
# Note: a django filter can be passed to narrow this list; it differs from the filter used when searching over 'sessions'

# Example lab name
lab_name = labs[0]['name']  # e.g. 'mrsicflogellab'

# Searching for RS sessions with specific lab name
sessions_lab = one.alyx.rest('sessions', 'list', dataset_types='spikes.times', lab=lab_name,
                             tag='2022_Q2_IBL_et_al_RepeatedSite')

Information and troubleshooting