Using ONE with a local filesystem

ONE can be set up to search for files on your local filesystem (as opposed to searching a remote database or HTTP file server). During setup, a pair of cache tables is generated. You can share these tables when publishing your data so that other users can search and load the data easily.

Before building the cache tables, you need to organize your data files into directories with a specific naming convention:

lab/subject/date/number

For example, ‘UCL/mouse_001/2021-03-30/001’. The file names should follow the ALyx File (ALF) convention, using the pattern ‘object.attribute’, e.g. ‘trials.intervals’. To make loading easier, file names should also include an extension, e.g. ‘trials.intervals.npy’. For more details, see the ALF specification documentation.
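
For illustration, a session directory and dataset path following these conventions might look like the snippet below (the lab, subject and file names are hypothetical):

from pathlib import Path

root = Path('path/to/my/data/folder')
session = root / 'UCL' / 'mouse_001' / '2021-03-30' / '001'  # lab/subject/date/number
dataset = session / 'trials.intervals.npy'                   # object.attribute.extension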

To build the cache tables, run One.setup:

from one.api import One
One.setup('path/to/my/data/folder')

The cells below demonstrate this by downloading an example IBL dataset (a zip file of NumPy files), building the cache tables, then loading some data through the ONE API.

[ ]:
from pathlib import Path

from one.api import One
from one.params import CACHE_DIR_DEFAULT

# Data locations:
# The data_url is the location of the remote example dataset.  This will be downloaded so we
# have something to build a cache from on our local computer.
data_url = 'https://ndownloader.figshare.com/files/21623715'

# The cache_dir is the location of the example dataset.  By default this will be
# ~/Downloads/ONE/my_example but could be set to anything.
cache_dir = Path(CACHE_DIR_DEFAULT, 'my_example')

The following cell will download some example data from FigShare. The dataset is around 74,000 behaviour files (~218 MB) from the beta data release for the IBL behaviour paper. The code in this cell is not important; it simply fetches and extracts the example data.

[ ]:
# Imports for downloading example data
import requests
from io import BytesIO
import zipfile

# Download data if necessary
if not (cache_dir.exists() and any(cache_dir.iterdir())):
    cache_dir.parent.mkdir(exist_ok=True, parents=True)  # Create destination dir
    print(f'Downloading data from {data_url.split(".", maxsplit=1)[-1]}...')
    response = requests.get(data_url)  # Download data into memory (~300MB)
    with zipfile.ZipFile(BytesIO(response.content)) as zipped:
        print(f'Extracting into {cache_dir}...')
        zipped.extractall(path=cache_dir.parent)  # Decompress into destination dir
    Path(cache_dir.parent, 'ibl-behavioral-data-Dec2019').rename(cache_dir)  # Rename
    cache_dir.joinpath('one_example.py').unlink(missing_ok=True)  # Delete outdated example
    del response  # Free resources

The following cell will index the files and build the cache tables (~3 min). This only needs to be done once. For a data release, these small cache files would typically be provided with the data.

By default, a hash of each file is stored in the cache table so that changes to a dataset can be detected. Hashing is slow, so for this example we skip it by setting hash_files=False.

[ ]:
print('Building ONE cache from filesystem...')
One.setup(cache_dir, hash_files=False)
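
The cache tables are saved as parquet files in the root of the cache directory; for a data release, these are the files you would ship alongside the data. As a quick sanity check, you can list them (the .pqt file names here are an assumption based on the ONE cache format):

# List the generated cache tables (expect e.g. datasets.pqt and sessions.pqt)
for table in sorted(cache_dir.glob('*.pqt')):
    print(table.name)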

Now the files can be searched and loaded via the API:

[ ]:
one = One(cache_dir=cache_dir)

## Search for behaviour experiments by subject
subject = 'NYU-01'
eids = one.search(subject=subject, dataset=['_ibl_trials.intervals.npy',])
print(f'There are {len(eids)} behaviour sessions for subject "{subject}"')

## Regex is supported in the search
subject = 'NYU-.*'
eids = one.search(subject=subject, dataset=['_ibl_trials.intervals.npy',])
print(f'There are {len(eids)} behaviour sessions with {subject} subjects')
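
Before loading anything, you can inspect which datasets were indexed for a given session. A minimal sketch using one.list_datasets:

# Show the first few datasets recorded for the first matching session
datasets = one.list_datasets(eids[0])
print(datasets[:5])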

[ ]:
# Loading the data
eid, *eids = eids  # Assign first experiment ID in list to `eid` var

# Load the trials object
trials = one.load_object(eid, 'trials')

# Load an individual dataset
goCue_times = one.load_dataset(eid, '_ibl_trials.goCue_times.npy')
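
The returned trials object is dict-like, with one NumPy array per attribute. As a brief example of working with it (this assumes the intervals attribute is present as an (N, 2) array of trial start and end times):

print(trials.keys())  # attributes loaded for the trials object

# Trial durations from the start/end times in trials.intervals
durations = trials.intervals[:, 1] - trials.intervals[:, 0]
print(f'Mean trial duration: {durations.mean():.3f} s')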