Installation
Environment
To use IBL data you will need a Python environment with a Python version above 3.10; Python 3.12 is recommended. To create a new environment from scratch, install Anaconda and follow the instructions below (more information can also be found here).
conda create --name ibl python=3.12
Always activate this environment before installing packages or working with the IBL data.
conda activate ibl
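Once the environment is active, a quick check of the interpreter version can catch a wrong environment early. The sketch below is illustrative only: `meets_minimum` is a hypothetical helper, and treating 3.10 itself as acceptable is an assumption, since the text above only asks for a version above 3.10.

```python
import sys

# Assumed threshold: the text asks for a Python version above 3.10;
# accepting 3.10 itself is an assumption of this sketch.
MINIMUM = (3, 10)
RECOMMENDED = (3, 12)


def meets_minimum(version, minimum=MINIMUM):
    """Return True if the (major, minor, ...) version tuple is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


current = tuple(sys.version_info[:2])
if not meets_minimum(current):
    print(f"Python {current[0]}.{current[1]} is older than the assumed minimum")
elif current < RECOMMENDED:
    print(f"Python {current[0]}.{current[1]} works, but 3.12 is recommended")
else:
    print(f"Python {current[0]}.{current[1]} is fine")
```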
Install packages
To use IBL data you will need to install the ONE-api package. We also recommend installing ibllib. These can be installed via pip.
pip install ONE-api
pip install ibllib
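To confirm that both installs landed in the active environment, you can check whether the modules are importable without actually importing them. This is a minimal sketch: `missing_modules` is a hypothetical helper, and the module names `one` and `ibllib` are assumed to be the import names of the two packages (the first matches the `from one.api import ONE` import used on this page).

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Assumed import names for the ONE-api and ibllib packages
missing = missing_modules(["one", "ibllib"])
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("ONE-api and ibllib are importable")
```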
Setting up credentials
Credentials can be set up from a Python terminal as follows:
[2]:
from one.api import ONE
ONE.setup(base_url='https://openalyx.internationalbrainlab.org', silent=True)
one = ONE(password='international')
Connected to https://openalyx.internationalbrainlab.org as user "intbrainlab"
Explore and download data using the ONE-api
Useful links
To get a good understanding of the ONE-api and the various methods available we recommend working through these tutorials.
Further examples are given below.
Launch the ONE-api
Before searching for or downloading any data, you need to instantiate ONE:
[3]:
from one.api import ONE
one = ONE(base_url='https://openalyx.internationalbrainlab.org')
List all sessions available
Once ONE is instantiated, you can use the REST ONE-api to list all sessions publicly available:
[4]:
sessions = one.search()
Each session is given a unique identifier (eID); this eID is what you will use to download data for a given session:
[5]:
# Each session is represented by a unique experiment id (eID)
print(sessions[0])
c3f58136-2198-4a39-bde0-e2a8cf112a56
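The eID printed above has the form of a UUID string. If you pass eIDs around between scripts, a small sanity check can catch truncated or mistyped identifiers before they reach a query. This is a sketch using only the standard library; `is_valid_eid` is a hypothetical helper, not part of the ONE-api.

```python
import uuid


def is_valid_eid(eid):
    """Return True if `eid` is a hyphenated UUID string like the one printed above."""
    try:
        # Round-trip through uuid.UUID so that only the canonical hyphenated form passes
        return str(uuid.UUID(eid)) == eid.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(is_valid_eid("c3f58136-2198-4a39-bde0-e2a8cf112a56"))
print(is_valid_eid("c3f58136-2198"))
```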
Find recordings of a specific brain region
If we are interested in a given brain region, we can use the search_insertions method to find all recordings associated with that region. For example, the query below finds all recordings associated with the Rhomboid Nucleus (RH) region of the thalamus.
[6]:
# this is the query that yields the few recordings for the Rhomboid Nucleus (RH) region
insertions_rh = one.search_insertions(atlas_acronym='RH', datasets='spikes.times.npy', project='brainwide')
# if we want to extend the search to all thalamic regions, we can do the following
insertions_th = one.search_insertions(atlas_acronym='TH', datasets='spikes.times.npy', project='brainwide')
# the Allen brain region parcellation is hierarchical, so searching for the Thalamus (TH) also returns recordings in its child regions, including the Rhomboid Nucleus (RH)
assert set(insertions_rh).issubset(set(insertions_th))
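The subset relationship asserted above follows from the hierarchy: a search on a parent region matches every insertion in any of its descendants. The toy example below illustrates the idea offline. The parent map is deliberately simplified and hypothetical (the real Allen atlas has intermediate levels), and the insertion ids and helper names are made up for illustration.

```python
# Toy child -> parent map. Simplified and hypothetical: the real Allen atlas
# has intermediate levels between these regions.
PARENT = {"RH": "TH", "CM": "TH", "TH": "root", "CA1": "HPF", "HPF": "root"}

# Hypothetical insertion ids mapped to the region they were recorded in
INSERTION_REGION = {"ins-1": "RH", "ins-2": "CM", "ins-3": "CA1"}


def is_within(region, ancestor, parent=PARENT):
    """Return True if `region` is `ancestor` or one of its descendants."""
    while region is not None:
        if region == ancestor:
            return True
        region = parent.get(region)  # None once we walk past the root
    return False


def search_by_region(acronym):
    """Return the insertions recorded in `acronym` or any of its child regions."""
    return {ins for ins, region in INSERTION_REGION.items() if is_within(region, acronym)}


insertions_rh = search_by_region("RH")
insertions_th = search_by_region("TH")
print(sorted(insertions_rh), sorted(insertions_th))
```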
Find a session that has a dataset of interest
Not all sessions will have all the datasets available. As such, it may be important for you to filter and search for only sessions with particular datasets of interest. The detailed list of datasets can be found in this document.
In the example below, we want to find all sessions that have spikes.times data:
[7]:
# Find sessions that have spikes.times datasets
sessions_with_spikes = one.search(project='brainwide', dataset='spikes.times')
Click here for a complete guide to searching using ONE.
Find data associated with a release or publication
Datasets are often associated with a publication and are tagged as such to make analyses easier to reproduce. You can list all tags and their associated publications like this:
[8]:
# List and print all tags in the public database
tags = {t['name']: t['description'] for t in one.alyx.rest('tags', 'list') if t['public']}
for key, value in tags.items():
    print(f"{key}\n{value}\n")
2021_Q1_IBL_et_al_Behaviour
https://doi.org/10.7554/eLife.63711
2021_Q2_PreRelease
https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522/3
2021_Q2_Varol_et_al
https://doi.org/10.1109/ICASSP39728.2021.9414145
2021_Q3_Whiteway_et_al
https://doi.org/10.1371/journal.pcbi.1009439
2022_Q2_IBL_et_al_RepeatedSite
https://doi.org/10.1101/2022.05.09.491042
2022_Q3_IBL_et_al_DAWG
https://doi.org/10.1101/827873
2022_Q4_IBL_et_al_BWM
https://figshare.com/articles/preprint/Data_release_-_Brainwide_map_-_Q4_2022/21400815
2023_Q1_Biderman_Whiteway_et_al
2023_Q1_Mohammadi_et_al
2023_Q3_Findling_Hubert_et_al
https://doi.org/10.1101/2023.07.04.547684
2023_Q4_Bruijns_et_al
2023_Q4_IBL_et_al_BWM_2
2023_Q4_IBL_et_al_BWM_passive
2024_Q2_Blau_et_al
2024_Q2_IBL_et_al_BWM_iblsort
Spike sorting output with ibl-sorter 1.7.0 for BWM
2024_Q2_IBL_et_al_RepeatedSite
https://doi.org/10.1101/2022.05.09.491042
2024_Q3_Pan_Vazquez_et_al
Brainwidemap
RepeatedSite
You can use the tag to restrict your searches to a specific data release and as a filter when browsing the public database:
[9]:
%%capture
# Note that tags are associated with datasets originally
# You can load a local index of sessions and datasets associated with a specific data release
one.load_cache(tag='2022_Q2_IBL_et_al_RepeatedSite')
sessions_rep_site = one.search() # All sessions used in the repeated site paper
# Find insertions that are tagged
# (you do not have access to the tag endpoint from the insertion list, so you need to create a django query)
ins_str_query = 'datasets__tags__name,2022_Q2_IBL_et_al_RepeatedSite'
insertions_rep_site = one.alyx.rest('insertions', 'list', django=ins_str_query)
# To return to the full cache containing an index of all IBL experiments
ONE.cache_clear()
one = ONE(base_url='https://openalyx.internationalbrainlab.org')
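The django query above is a plain string of the form `field,value`. If you build such strings in several places, a small helper keeps the formatting consistent. This is only a sketch: `django_filter` is a hypothetical helper, not part of the ONE-api, and it assumes that additional lookups are appended as further comma-separated `field,value` pairs.

```python
def django_filter(**lookups):
    """Build a 'field1,value1,field2,value2' string from keyword arguments.

    Values that themselves contain commas are not handled by this sketch.
    """
    parts = []
    for field, value in lookups.items():
        parts.extend([field, str(value)])
    return ",".join(parts)


# Reproduces the query string used for the tagged insertions above
ins_str_query = django_filter(datasets__tags__name="2022_Q2_IBL_et_al_RepeatedSite")
print(ins_str_query)
```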
Downloading data using the ONE-api
Once sessions of interest are identified with the unique identifier (eID), all files ready for analysis are found in the alf collection:
[10]:
# Find an example session with data
eid, *_ = one.search(project='brainwide', dataset='alf/')
# List datasets associated with a session, in the alf collection
datasets = one.list_datasets(eid, collection='alf*')
# Download all data in alf collection
files = one.load_collection(eid, 'alf', download_only=True)
# Show where files have been downloaded to
print(f'Files downloaded to {files[0].parent}')
Files downloaded to /home/runner/Downloads/ONE/openalyx.internationalbrainlab.org/churchlandlab_ucla/Subjects/MFD_09/2023-10-20/001/alf
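The download location printed above encodes the lab, subject, date and session number of the session. The sketch below pulls those parts out of such a path with the standard library only. `session_parts` is a hypothetical helper, and the layout it assumes (lab, then a `Subjects` folder, then subject, date and number) is inferred from the example output rather than taken from a specification.

```python
from pathlib import PurePosixPath


def session_parts(path):
    """Split a downloaded-file path into lab, subject, date, number and collection.

    Assumes the layout .../<lab>/Subjects/<subject>/<date>/<number>/<collection>.
    """
    parts = PurePosixPath(path).parts
    i = parts.index("Subjects")  # raises ValueError if the layout is different
    subject, date, number = parts[i + 1:i + 4]
    return {
        "lab": parts[i - 1],
        "subject": subject,
        "date": date,
        "number": number,
        "collection": "/".join(parts[i + 4:]),
    }


example = ("/home/runner/Downloads/ONE/openalyx.internationalbrainlab.org/"
           "churchlandlab_ucla/Subjects/MFD_09/2023-10-20/001/alf")
print(session_parts(example))
```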
To download the spike sorting data we need to find out which probe label (probeXX) was used for this session. This can be done by finding the probe insertion associated with this session.
[11]:
# Find an example session with spike data
# Note: Restricting by task and project makes searching for data much quicker
eid, *_ = one.search(project='brainwide', dataset='spikes', task='ephys')
# Data for each probe insertion are stored in the alf/probeXX folder.
datasets = one.list_datasets(eid, collection='alf/probe*')
probe_labels = set(d.split('/')[1] for d in datasets) # List the insertions
# You can find full details of a session's insertions using the following database query:
insertions = one.alyx.rest('insertions', 'list', session=eid)
probe_labels = [ins['name'] for ins in insertions]
files = one.load_collection(eid, f'alf/{probe_labels[0]}/pykilosort', download_only=True)
# Show where files have been downloaded to
print(f'Files downloaded to {files[0].parent}')
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/one/util.py:543: ALFWarning: Multiple revisions: "", "2024-05-06"
warnings.warn(f'Multiple revisions: {rev_list}', alferr.ALFWarning)
Files downloaded to /home/runner/Downloads/ONE/openalyx.internationalbrainlab.org/churchlandlab_ucla/Subjects/MFD_09/2023-10-19/001/alf/probe00/pykilosort
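The probe labels in the cell above are taken from the second path component of each dataset name. The offline example below shows that step on a small, hypothetical list of dataset paths shaped like those returned for the `alf/probe*` collections.

```python
# Hypothetical dataset paths, shaped like the entries listed for 'alf/probe*'
datasets = [
    "alf/probe00/pykilosort/spikes.times.npy",
    "alf/probe00/pykilosort/spikes.clusters.npy",
    "alf/probe01/pykilosort/spikes.times.npy",
]

# The probe label is the second component of each path; a set removes duplicates
probe_labels = sorted({d.split("/")[1] for d in datasets})
print(probe_labels)
```

Sorting the set gives a stable order, which matters if you later index into the labels (for example with `probe_labels[0]`).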
Loading different objects
To load in the data we can use some of the following loading methods.
[12]:
# Load in all trials datasets
trials = one.load_object(eid, 'trials', collection='alf')
# Load in a single wheel dataset
wheel_times = one.load_dataset(eid, '_ibl_wheel.timestamps.npy')
Examples of loading different objects can be found in the tutorials here.
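Dataset names such as `_ibl_wheel.timestamps.npy` follow the ALF pattern of an optional namespace, an object, an attribute and a file extension, which is why `load_object` can gather every dataset of one object (for example `trials`). The sketch below splits a name into those parts with a regular expression. `parse_alf_name` is a hypothetical helper that covers only this simple pattern and ignores optional ALF parts such as timescales.

```python
import re

# Simplified pattern: [_namespace_]object.attribute.extension
ALF_NAME = re.compile(
    r"^(?:_(?P<namespace>[^_.]+)_)?"
    r"(?P<object>[^.]+)\.(?P<attribute>[^.]+)\.(?P<extension>[^.]+)$"
)


def parse_alf_name(name):
    """Return a dict of namespace/object/attribute/extension, or None if no match."""
    match = ALF_NAME.match(name)
    return match.groupdict() if match else None


print(parse_alf_name("_ibl_wheel.timestamps.npy"))
print(parse_alf_name("spikes.times.npy"))
```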
Advanced examples
Example 1: Searching for sessions from a specific lab
Let’s imagine you are interested in the data from a given lab that was part of the Reproducible Ephys data release. If you want to use data associated with a given lab only, you could simply query for the whole dataset as shown above and filter sessions_rep_site for a given value of the key “lab”, for example:
[13]:
%%capture
one.load_cache(tag='2022_Q2_IBL_et_al_RepeatedSite')
sessions_lab = one.search(lab='mrsicflogellab')
However, if you want to query only the data for a given lab, it may be better to first list all available labs, pick a lab name from that list, and then query the sessions for that lab.
[14]:
# List details of all sessions (returns a list of dictionaries)
_, det = one.search(details=True)
labs = set(d['lab'] for d in det) # Get the set of unique labs
# Example lab name
lab_name = list(labs)[0]
# Searching for RS sessions with specific lab name
sessions_lab = one.search(dataset='spikes', lab=lab_name)
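The step that collects unique lab names from the session details can be tried offline on a small list of dictionaries. The records below are hypothetical stand-ins for the details returned by the search; only the `lab` key is taken from the cell above.

```python
# Hypothetical session details; only the 'lab' key mirrors the cell above
det = [
    {"lab": "mrsicflogellab", "subject": "subject_a"},
    {"lab": "churchlandlab_ucla", "subject": "subject_b"},
    {"lab": "mrsicflogellab", "subject": "subject_c"},
]

# Sorting the set makes the chosen example lab reproducible between runs,
# whereas list(set(...))[0] may return a different lab each time
labs = sorted({d["lab"] for d in det})
lab_name = labs[0]

# Keep only the sessions recorded in that lab
sessions_lab = [d for d in det if d["lab"] == lab_name]
print(lab_name, len(sessions_lab))
```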
You can also get the list of labs using one.alyx.rest; however, this is a little slower.
[15]:
# List of labs (and all metadata information associated)
labs = one.alyx.rest('labs', 'list',
                     django='session__data_dataset_session_related__tags__name,2022_Q2_IBL_et_al_RepeatedSite')
# Note the change in the django filter compared to searching over 'sessions'
# Example lab name
lab_name = labs[0]['name'] # e.g. 'mrsicflogellab'
# Searching for RS sessions with specific lab name
sessions_lab = one.alyx.rest('sessions', 'list', dataset_types='spikes.times', lab=lab_name,
                             tag='2022_Q2_IBL_et_al_RepeatedSite')