one.alf.io

I/O functions for ALyx Files.

Provides support for time-series reading and interpolation as per the specifications. For a full overview of the scope of the format, see:

https://int-brain-lab.github.io/ONE/alf_intro.html

Functions

check_dimensions

Test for consistency of dimensions as per ALF specs in a dictionary.

dataframe

Converts a Bunch conforming to size conventions into a pandas DataFrame.

exists

Test if an ALF object and optionally specific attributes exist in the given path

filter_by

Given a path and optional filters, returns all ALF files and their associated parts.

find_variants

Find variant datasets.

iter_datasets

Iterate over all files in a session, and yield relative dataset paths.

iter_sessions

Recursively iterate over session paths in a given directory.

load_file_content

Returns content of files.

load_object

Reads all files sharing the same object name.

next_num_folder

Return the next number for a session given a session_date_folder.

read_ts

Load time-series from ALF format

remove_empty_folders

Iteratively remove any empty child folders.

remove_uuid_file

(DEPRECATED) Renames a file without the UUID and returns the new pathlib.Path object.

remove_uuid_recursive

(DEPRECATED) Within a folder, recursive renaming of all files to remove UUID.

save_metadata

Writes a metadata file matching an existing ALF file object.

save_object_npy

Saves a dictionary in ALF format using object as object name and dictionary keys as attribute names.

ts2vec

Interpolate a continuous timeseries from a (2, 2) array of anchor points

Classes

AlfBunch

A dict-like object that supports dot indexing and conversion to DataFrame

class AlfBunch(*args, **kwargs)[source]

Bases: Bunch

A dict-like object that supports dot indexing and conversion to DataFrame

property check_dimensions

0 for consistent dimensions, 1 for inconsistent dimensions

Type:

int

append(b, inplace=False)[source]

Appends one bunch to another, key by key

Parameters:
  • b (Bunch, dict) – A Bunch of data to append

  • inplace (bool) – If true, the data are appended in place, otherwise a copy is returned

Returns:

An AlfBunch with the data appended, or None if inplace is True

Return type:

AlfBunch, None
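
Examples

A minimal sketch using hypothetical spike arrays

>>> import numpy as np
>>> from one.alf.io import AlfBunch
>>> a = AlfBunch({'times': np.array([0.1, 0.2]), 'clusters': np.array([3, 4])})
>>> b = AlfBunch({'times': np.array([0.5]), 'clusters': np.array([7])})
>>> c = a.append(b)  # new AlfBunch; c['times'] is array([0.1, 0.2, 0.5])
>>> a.append(b, inplace=True)  # appends to `a` directly and returns None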

to_df() → DataFrame[source]

Return DataFrame with data keys as columns

static from_df(df) → AlfBunch[source]

Build an AlfBunch from a pandas DataFrame.

dataframe(adict)[source]

Converts a Bunch conforming to size conventions into a pandas DataFrame. For 2-D arrays, stops at 10 columns per attribute.

Parameters:

adict (dict, Bunch) – A dict-like object of data to convert to DataFrame

Returns:

A pandas DataFrame of data

Return type:

pd.DataFrame
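
Examples

A minimal sketch using hypothetical arrays of equal length

>>> import numpy as np
>>> from one.alf.io import dataframe
>>> bunch = {'times': np.array([0.1, 0.2, 0.3]), 'depths': np.array([10., 20., 30.])}
>>> df = dataframe(bunch)  # columns 'times' and 'depths', one row per sample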

read_ts(filename)[source]

Load time-series from ALF format

Parameters:

filename (str, pathlib.Path) – An ALF path of the time-series data to load

Returns:

  • numpy.ndarray – An array of timestamps belonging to the ALF path object

  • numpy.ndarray – An array of values in filename

Examples

>>> t, d = read_ts(filename)

ts2vec(ts: ndarray, n_samples: int) → ndarray[source]

Interpolate a continuous timeseries from a (2, 2) array of anchor points

Parameters:
  • ts (numpy.ndarray) – A 2x2 numpy array of (sample, timestamp) pairs

  • n_samples (int) – Number of samples; i.e. the size of the resulting vector

Returns:

A vector of interpolated timestamps

Return type:

numpy.ndarray
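
Examples

A sketch assuming each row of ts is a (sample index, timestamp) anchor pair as described above

>>> import numpy as np
>>> from one.alf.io import ts2vec
>>> ts = np.array([[0, 0.0], [99, 9.9]])  # sample 0 at t = 0.0 s, sample 99 at t = 9.9 s
>>> t = ts2vec(ts, n_samples=100)  # one linearly interpolated timestamp per sample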

check_dimensions(dico)[source]

Test for consistency of dimensions as per ALF specs in a dictionary.

ALF broadcasting rules: only consistent dimensions are accepted for a given axis. A dimension is consistent with another if it is empty, 1, or equal to the other array's dimension. For example, [a, 1], [1, b] and [a, b] are all consistent, while [c, 1] is not.

Parameters:

dico (AlfBunch, dict) – Dictionary containing data

Returns:

Status 0 for consistent dimensions, 1 for inconsistent dimensions

Return type:

int
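
Examples

A minimal sketch of the rule, using hypothetical arrays

>>> import numpy as np
>>> from one.alf.io import check_dimensions
>>> status = check_dimensions({'times': np.zeros(50), 'clusters': np.zeros((50, 2))})  # 0: first axes agree
>>> status = check_dimensions({'times': np.zeros(50), 'clusters': np.zeros((40, 2))})  # 1: 40 != 50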

load_file_content(fil)[source]

Returns the content of a file. Designed for very generic file formats; so far the supported contents are json, npy, csv, (h)tsv, ssv and jsonable.

Parameters:

fil (str, pathlib.Path) – File to read

Returns:

Array/json/pandas dataframe depending on format

Return type:

Any
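
Examples

A sketch with hypothetical file names; the return type follows the extension

>>> from one.alf.io import load_file_content
>>> times = load_file_content('spikes.times.npy')  # numpy array
>>> meta = load_file_content('clusters.metadata.json')  # dict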

iter_sessions(root_dir, pattern='*')[source]

Recursively iterate over session paths in a given directory.

Parameters:
  • root_dir (str, pathlib.Path) – The folder to look for sessions.

  • pattern (str) – Glob pattern to use. Default searches all folders. Providing a more specific pattern makes this more performant (see examples).

Yields:

pathlib.Path – The next session path in lexicographical order.

Examples

Efficient iteration when root_dir contains <lab>/Subjects folders

>>> sessions = list(iter_sessions(root_dir, pattern='*/Subjects/*/????-??-??/*'))

Efficient iteration when root_dir contains subject folders

>>> sessions = list(iter_sessions(root_dir, pattern='*/????-??-??/*'))

iter_datasets(session_path)[source]

Iterate over all files in a session, and yield relative dataset paths.

Parameters:

session_path (str, pathlib.Path) – The folder to look for datasets.

Yields:

pathlib.Path – The next dataset path (relative to the session path) in lexicographical order.
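
Examples

A sketch with a hypothetical session path

>>> from one.alf.io import iter_datasets
>>> for dataset in iter_datasets('/data/lab/Subjects/subject/2021-01-01/001'):
...     print(dataset)  # e.g. alf/spikes.times.npy, relative to the session path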

exists(alfpath, object, attributes=None, **kwargs) → bool[source]

Test if an ALF object and optionally specific attributes exist in the given path

Parameters:
  • alfpath (str, pathlib.Path) – The folder to look into

  • object (str) – ALF object name

  • attributes (str, list) – Wanted attributes

  • wildcards (bool) – If true, uses unix shell-style pattern matching, otherwise uses regular expressions

  • kwargs – Other ALF parts to filter by

Returns:

For multiple attributes, returns True only if all attributes are found

Return type:

bool
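
Examples

Check two attributes of a hypothetical 'spikes' object; True only if both are found

>>> from one.alf.io import exists
>>> ok = exists(alfpath, 'spikes', attributes=['times', 'clusters'])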

load_object(alfpath, object=None, short_keys=False, **kwargs)[source]

Reads all files sharing the same object name.

For example, if the file provided to the function is spikes.times, the function will load spikes.times, spikes.clusters, spikes.depths and spikes.amps into a dictionary whose keys will be times, clusters, depths and amps.

Full Reference here: https://int-brain-lab.github.io/ONE/alf_intro.html

Simplified example: _namespace_object.attribute_timescale.part1.part2.extension

Parameters:
  • alfpath (str, pathlib.Path, list) – Any ALF path pertaining to the object OR directory containing ALFs OR list of paths.

  • object (str, list, None) – The ALF object(s) to filter by. If a directory is provided and object is None, all valid ALF files are returned.

  • short_keys (bool) – By default the output dictionary keys are compounds of the attribute, timescale and any extra parts, separated by a dot. Use True to shorten the keys to just the attribute and timescale.

  • wildcards (bool) – If true, uses unix shell-style pattern matching, otherwise uses regular expressions.

  • kwargs – Other ALF parts to filter by.

Returns:

An AlfBunch (dict-like) of all attributes pertaining to the object.

Return type:

AlfBunch

Examples

Load ‘spikes’ object

>>> spikes = load_object('full/path/to/my/alffolder/', 'spikes')

Load ‘trials’ object under the ‘ibl’ namespace

>>> trials = load_object('/subject/2021-01-01/001', 'trials', namespace='ibl')

save_object_npy(alfpath, dico, object, parts=None, namespace=None, timescale=None) → list[source]

Saves a dictionary in ALF format using object as object name and dictionary keys as attribute names. Dimensions have to be consistent.

Simplified ALF example: _namespace_object.attribute.part1.part2.extension

Parameters:
  • alfpath (str, pathlib.Path) – Path of the folder to save data to

  • dico (dict) – Dictionary to save to npy; keys correspond to ALF attributes

  • object (str) – Name of the object to save

  • parts (str, list, None) – Extra parts to the ALF name

  • namespace (str, None) – The optional namespace of the object

  • timescale (str, None) – The optional timescale of the object

Returns:

List of written files

Return type:

list

Examples

>>> spikes = {'times': np.arange(50), 'depths': np.random.random(50)}
>>> files = save_object_npy('/path/to/my/alffolder/', spikes, 'spikes')

save_metadata(file_alf, dico) → None[source]

Writes a metadata file matching an existing ALF file object.

For example, given an ALF file clusters.ccfLocation.ssv, this will write a dictionary in JSON format to clusters.ccfLocation.metadata.json.

Reserved keywords:
  • columns: column names for binary tables.

  • row: row names for binary tables.

  • unit

Parameters:
  • file_alf (str, pathlib.Path) – Full path to the ALF file

  • dico (dict, AlfBunch) – Dictionary containing the metadata
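
Examples

A sketch based on the clusters.ccfLocation.ssv file mentioned above, with hypothetical column names

>>> from one.alf.io import save_metadata
>>> meta = {'columns': ['ap', 'dv', 'ml']}
>>> save_metadata('/path/to/my/alffolder/clusters.ccfLocation.ssv', meta)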

remove_uuid_file(file_path, dry=False) → Path[source]

(DEPRECATED) Renames a file without the UUID and returns the new pathlib.Path object.

Parameters:
  • file_path (str, pathlib.Path) – An ALF path containing a UUID in the file name.

  • dry (bool) – If False, the file is renamed on disk.

Returns:

The new file path without the UUID in the file name.

Return type:

pathlib.Path

remove_uuid_recursive(folder, dry=False) → None[source]

(DEPRECATED) Within a folder, recursive renaming of all files to remove UUID.

Parameters:
  • folder (str, pathlib.Path) – A folder to recursively iterate, removing UUIDs from the file names.

  • dry (bool) – If False, the files are renamed on disk.

next_num_folder(session_date_folder: str | Path) → str[source]

Return the next number for a session given a session_date_folder.
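
Examples

A sketch assuming the date folder already contains session '001'

>>> from one.alf.io import next_num_folder
>>> num = next_num_folder(session_date_folder)  # e.g. '002' if only '001' exists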

remove_empty_folders(folder: str | Path) → None[source]

Iteratively remove any empty child folders.

filter_by(alf_path, wildcards=True, **kwargs)[source]

Given a path and optional filters, returns all ALF files and their associated parts.

The filters constitute a logical AND. For all but extra, if a list is provided, one or more elements must match (a logical OR).

Parameters:
  • alf_path (str, pathlib.Path) – A path to a folder containing ALF datasets

  • wildcards (bool) – If true, kwargs are matched as unix-style patterns, otherwise as regular expressions

  • object (str, list) – Filter by a given object (e.g. ‘spikes’)

  • attribute (str, list) – Filter by a given attribute (e.g. ‘intervals’)

  • extension (str, list) – Filter by extension (e.g. ‘npy’)

  • namespace (str, list) – Filter by a given namespace (e.g. ‘ibl’) or None for files without one

  • timescale (str, list) – Filter by a given timescale (e.g. ‘bpod’) or None for files without one

  • extra (str, list) – Filter by extra parameters (e.g. ‘raw’) or None for files without extra parts. NB: wildcards are not permitted here.

Returns:

  • alf_files (list of str) – A list of ALF files matching the given filters

  • attributes (list of dicts) – A list of parsed file parts

Examples

Filter files with universal timescale

>>> filter_by(alf_path, timescale=None)

Filter files by a given ALF object

>>> filter_by(alf_path, object='wheel')

Filter using wildcard, e.g. ‘wheel’ and ‘wheelMoves’ ALF objects

>>> filter_by(alf_path, object='wh*')

Filter all intervals that are in bpod time

>>> filter_by(alf_path, attribute='intervals', timescale='bpod')

Filter all files containing either ‘intervals’ OR ‘timestamps’ attributes

>>> filter_by(alf_path, attribute=['intervals', 'timestamps'])

Filter all files using a regular expression

>>> filter_by(alf_path, object='^wheel.*', wildcards=False)
>>> filter_by(alf_path, object=['^wheel$', '.*Moves'], wildcards=False)

find_variants(file_list, namespace=True, timescale=True, extra=True, extension=True)[source]

Find variant datasets.

Finds any datasets on disk that are considered a variant of the input datasets. At minimum, a dataset is uniquely defined by session path, collection, object and attribute. Therefore, datasets with the same name and collection in a different revision folder are considered a variant. If any of the keyword arguments are set to False, those parts are ignored when comparing datasets.

Parameters:
  • file_list (list of str, list of pathlib.Path) – A list of ALF paths to find variants of.

  • namespace (bool) – If true, treat datasets with a different namespace as unique.

  • timescale (bool) – If true, treat datasets with a different timescale as unique.

  • extra (bool) – If true, treat datasets with different extra parts as unique.

  • extension (bool) – If true, treat datasets with a different extension as unique.

Returns:

A map of input file paths to a list of variant dataset paths.

Return type:

Dict[pathlib.Path, list of pathlib.Path]

Raises:

ValueError – One or more input file paths are not valid ALF datasets.

Examples

Find all datasets with an identical name and collection in a different revision folder

>>> find_variants(['/sub/2020-10-01/001/alf/#2020-01-01#/obj.attr.npy'])
{Path('/sub/2020-10-01/001/alf/#2020-01-01#/obj.attr.npy'): [
    Path('/sub/2020-10-01/001/alf/obj.attr.npy')
]}

Find all datasets with different namespace or revision

>>> find_variants(['/sub/2020-10-01/001/alf/#2020-01-01#/obj.attr.npy'], namespace=False)
{Path('/sub/2020-10-01/001/alf/#2020-01-01#/obj.attr.npy'): [
    Path('/sub/2020-10-01/001/alf/#2020-01-01#/_ns_obj.attr.npy'),
    Path('/sub/2020-10-01/001/alf/obj.attr.npy'),
]}