# Data Release - Spike sorting benchmark datasets

Spike sorting is the process of extracting spike information (times, waveforms) from raw electrophysiology signals. In our Brainwide map dataset, these signals are heterogeneous and change from one brain region to another. By examining recordings from widespread anatomical regions, we found that the current spike sorting method has a limited range of applicability. Specifically, we uncovered two major issues. First, there is no one-size-fits-all solution: an algorithm that is well tailored to extract signals for a given region or electrode type may perform poorly for another. Second, and most importantly, we had very limited means of assessing the performance of such algorithms beyond visual inspection of the data and qualitative judgements.

To facilitate the development of spike sorting algorithms, we aim to provide benchmark datasets (for a full explanation, see our proposal Spike Net). Here, we explain how to download these datasets, which were chosen to represent various anatomical regions from our Brainwide map.

## Overview of the Data

We have selected 13 recordings for our benchmarks.

The insertion IDs are:

    pids = [
        '1a276285-8b0e-4cc9-9f0a-a3a002978724',
        '1e104bf4-7a24-4624-a5b2-c2c8289c0de7',
        '5d570bf6-a4c6-4bf1-a14b-2c878c84ef0e',
        '5f7766ce-8e2e-410c-9195-6bf089fea4fd',
        '6638cfb3-3831-4fc2-9327-194b76cf22e1',
        '749cb2b7-e57e-4453-a794-f6230e4d0226',
        'd7ec0892-0a6c-4f4f-9d8f-72083692af5c',
        'da8dfec1-d265-44e8-84ce-6ae9c109b8bd',
        'dab512bd-a02d-4c1f-8dbc-9155a163efc0',
        'dc7e9403-19f7-409f-9240-05ee57cb7aea',
        'e8f9fba4-d151-4b00-bee7-447f0f3e752c',
        'eebcaf65-7fa4-4118-869d-a084e84530e2',
        'fe380793-8035-414e-b000-09bfe5ece92a',
    ]


(Coming soon) In 2023, we will provide 20-30 min chunks of raw electrophysiology data, processed in a standardised fashion, together with manual spike annotations and quality metrics for spike detection recall. These will serve as benchmarks for spike sorting algorithm development. In the meantime, you can familiarise yourself with the heterogeneity of the data by looking at the whole recordings.
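To make the notion of spike detection recall concrete, here is a minimal sketch of one common way to compute it: the fraction of manually annotated spikes that have a detected spike within a small time tolerance. The function name `detection_recall`, the 1 ms default tolerance, and the nearest-neighbour matching rule are all assumptions made for illustration; the metric used in the released benchmarks may be defined differently.

```python
from bisect import bisect_left


def detection_recall(annotated_times, detected_times, tolerance=0.001):
    """Fraction of annotated spikes with a detected spike within `tolerance` seconds.

    Hypothetical helper: it matches each annotated spike to its nearest detected
    spike and does not enforce one-to-one matching.
    """
    if not annotated_times:
        raise ValueError("need at least one annotated spike")
    detected = sorted(detected_times)
    hits = 0
    for t in annotated_times:
        i = bisect_left(detected, t)
        # Only the detected spikes just before and just after t can be the nearest.
        neighbours = detected[max(i - 1, 0):i + 1]
        if any(abs(t - d) <= tolerance for d in neighbours):
            hits += 1
    return hits / len(annotated_times)
```

A stricter variant would remove each detected spike once it has been matched, so that one detection cannot account for two annotated spikes.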

## View the data

You can view the whole electrophysiology recordings for the insertions listed above. The organisation of the data follows the standard IBL data structure.

• Follow the IBL instructions for downloading an example dataset for one session to become familiar with the data structure.

• Note that probe insertion IDs (`pid`) are provided here, not session IDs (`eid`).

• You will be most interested in the folder `raw_ephys_data/probeXX` for the raw ephys data, and in `alf/probeXX/pykilosort` if you are using the pyKilosort spike sorting output.

• Follow the ONE API instructions to learn how to search for and download the released datasets.
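The points above can be sketched in Python as follows: convert a probe insertion ID into a session ID and probe label, then list the datasets in the raw and spike-sorted folders for that probe. This is a sketch under stated assumptions, not an official recipe. The helper names `probe_collections` and `list_probe_datasets` are hypothetical, and the public server URL and password are assumed to be those documented for IBL's public database. The calls `ONE.pid2eid` and `ONE.list_datasets` come from the ONE-api package, which must be installed separately.

```python
def probe_collections(probe_name, sorter="pykilosort"):
    """Return the raw-data and spike-sorting folders (collections) for one probe."""
    return {
        "raw": f"raw_ephys_data/{probe_name}",
        "sorted": f"alf/{probe_name}/{sorter}",
    }


def list_probe_datasets(pid, one=None):
    """List the dataset paths of one insertion, grouped by folder.

    Needs network access and the ONE-api package unless a `one` object is passed in.
    """
    if one is None:
        # Imported lazily so that the helper above works without ONE-api installed.
        from one.api import ONE
        # Assumption: public IBL server and public password.
        one = ONE(base_url="https://openalyx.internationalbrainlab.org",
                  password="international", silent=True)
    # An insertion ID (pid) maps to a session ID (eid) plus a probe label such as "probe00".
    eid, probe_name = one.pid2eid(pid)
    return {
        kind: one.list_datasets(eid, collection=collection)
        for kind, collection in probe_collections(probe_name).items()
    }
```

For example, calling `list_probe_datasets('1a276285-8b0e-4cc9-9f0a-a3a002978724')` would query the server for the first insertion in the list. Listing datasets before downloading is a deliberate choice here, since the raw ephys files are large.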