one.alf.io
I/O functions for ALyx Files.
Provides support for time-series reading and interpolation as per the specifications. For a full overview of the scope of the format, see:
https://int-brain-lab.github.io/ONE/alf_intro.html
Functions
check_dimensions – Test for consistency of dimensions as per ALF specs in a dictionary.
dataframe – Converts a Bunch conforming to size conventions into a pandas DataFrame.
exists – Test if an ALF object and optionally specific attributes exist in the given path.
filter_by – Given a path and optional filters, returns all ALF files and their associated parts.
find_variants – Find variant datasets.
iter_datasets – Iterate over all files in a session, and yield relative dataset paths.
iter_sessions – Recursively iterate over session paths in a given directory.
load_file_content – Returns the content of a file.
load_object – Reads all files sharing the same object name.
next_num_folder – Return the next number for a session given a session_date_folder.
read_ts – Load time-series from ALF format.
remove_empty_folders – Iteratively remove any empty child folders.
remove_uuid_recursive – (DEPRECATED) Within a folder, recursive renaming of all files to remove UUID.
save_metadata – Writes a metadata file matching a current ALF file object.
save_object_npy – Saves a dictionary in ALF format using object as object name and dictionary keys as attribute names.
ts2vec – Interpolate a continuous timeseries of the shape (2, 2).
Classes
AlfBunch – A dict-like object that supports dot indexing and conversion to DataFrame.
- class AlfBunch(*args, **kwargs)[source]
Bases:
Bunch
A dict-like object that supports dot indexing and conversion to DataFrame
- property check_dimensions
0 for consistent dimensions, 1 for inconsistent dimensions
- Type:
int
- append(b, inplace=False)[source]
Appends one bunch to another, key by key
- Parameters:
b (Bunch, dict) – A Bunch of data to append
inplace (bool) – If true, the data are appended in place, otherwise a copy is returned
- Returns:
An AlfBunch with the data appended, or None if inplace is True
- Return type:
AlfBunch, None
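Examples
A minimal sketch of appending one bunch to another (data hypothetical, assuming matching keys):
>>> import numpy as np
>>> from one.alf.io import AlfBunch
>>> a = AlfBunch(times=np.arange(5), amps=np.ones(5))
>>> b = AlfBunch(times=np.arange(5, 10), amps=np.ones(5))
>>> c = a.append(b)  # a new AlfBunch with 10 samples per attribute; a is unchanged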
- dataframe(adict)[source]
Converts a Bunch conforming to size conventions into a pandas DataFrame. For 2-D arrays, stops at 10 columns per attribute.
- Parameters:
adict (dict, Bunch) – A dict-like object of data to convert to DataFrame
- Returns:
A pandas DataFrame of data
- Return type:
pd.DataFrame
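Examples
A hedged sketch, assuming attributes share the same first dimension (data hypothetical):
>>> import numpy as np
>>> from one.alf.io import dataframe
>>> spikes = {'times': np.random.random(50), 'clusters': np.random.randint(0, 10, 50)}
>>> df = dataframe(spikes)  # pandas DataFrame with 'times' and 'clusters' columns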
- read_ts(filename)[source]
Load time-series from ALF format
- Parameters:
filename (str, pathlib.Path) – An ALF path whose values are to be loaded
- Returns:
numpy.ndarray – An array of timestamps belonging to the ALF path object
numpy.ndarray – An array of values in filename
Examples
>>> t, d = read_ts(filename)
- ts2vec(ts: ndarray, n_samples: int) ndarray [source]
Interpolate a continuous timeseries of the shape (2, 2)
- Parameters:
ts (numpy.array) – a 2x2 numpy array of the form (sample, ts)
n_samples (int) – Number of samples; i.e. the size of the resulting vector
- Returns:
A vector of interpolated timestamps
- Return type:
numpy.ndarray
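Examples
A hedged sketch, assuming the rows of ts are (sample, timestamp) pairs (values hypothetical):
>>> import numpy as np
>>> from one.alf.io import ts2vec
>>> ts = np.array([[0, 0.0], [99, 9.9]])  # samples 0 and 99 occur at 0.0 s and 9.9 s
>>> t = ts2vec(ts, 100)  # a vector of 100 interpolated timestamps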
- check_dimensions(dico)[source]
Test for consistency of dimensions as per ALF specs in a dictionary.
ALF broadcasting rules: only consistent dimensions are accepted. For a given axis, a dimension is consistent with another if it is empty, 1, or equal to the other array's dimension. For example, [a, 1], [1, b] and [a, b] are all consistent with one another, whereas [c, 1] is not.
- Parameters:
dico (ALFBunch, dict) – Dictionary containing data
- Returns:
Status 0 for consistent dimensions, 1 for inconsistent dimensions
- Return type:
int
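Examples
A minimal illustration of the broadcasting rule above (shapes hypothetical; expected statuses given as comments):
>>> import numpy as np
>>> from one.alf.io import check_dimensions
>>> status = check_dimensions({'x': np.zeros((3, 1)), 'y': np.zeros((1, 4)), 'z': np.zeros((3, 4))})  # expected 0: consistent
>>> status = check_dimensions({'x': np.zeros((3, 4)), 'y': np.zeros((5, 1))})  # expected 1: first axis differs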
- load_file_content(fil)[source]
Returns the content of a file. Designed for very generic file formats; the formats supported so far are json, npy, csv, (h)tsv, ssv and jsonable.
- Parameters:
fil (str, pathlib.Path) – File to read
- Returns:
Array/json/pandas dataframe depending on format
- Return type:
Any
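Examples
Hypothetical file names; the return type follows the extension (e.g. an array for npy, parsed JSON for json):
>>> from one.alf.io import load_file_content
>>> times = load_file_content('spikes.times.npy')
>>> meta = load_file_content('clusters.metadata.json')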
- iter_sessions(root_dir, pattern='*')[source]
Recursively iterate over session paths in a given directory.
- Parameters:
root_dir (str, pathlib.Path) – The folder to look for sessions.
pattern (str) – Glob pattern to use. Default searches all folders. Providing a more specific pattern makes this more performant (see examples).
- Yields:
pathlib.Path – The next session path in lexicographical order.
Examples
Efficient iteration when root_dir contains <lab>/Subjects folders
>>> sessions = list(iter_sessions(root_dir, pattern='*/Subjects/*/????-??-??/*'))
Efficient iteration when root_dir contains subject folders
>>> sessions = list(iter_sessions(root_dir, pattern='*/????-??-??/*'))
- iter_datasets(session_path)[source]
Iterate over all files in a session, and yield relative dataset paths.
- Parameters:
session_path (str, pathlib.Path) – The folder to look for datasets.
- Yields:
one.alf.path.ALFPath – The next dataset path (relative to the session path) in lexicographical order.
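Examples
A minimal sketch (session path hypothetical):
>>> from one.alf.io import iter_datasets
>>> for rel_path in iter_datasets('/data/lab/Subjects/subject/2021-01-01/001'):
...     print(rel_path)  # e.g. alf/spikes.times.npy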
- exists(alfpath, object, attributes=None, **kwargs) bool [source]
Test if an ALF object, and optionally specific attributes, exist in the given path
- Parameters:
alfpath (str, pathlib.Path) – The folder to look into
object (str) – ALF object name
attributes (str, list) – Wanted attributes
wildcards (bool) – If true uses unix shell style pattern matching, otherwise uses regular expressions
kwargs – Other ALF parts to filter by
- Returns:
For multiple attributes, returns True only if all attributes are found
- Return type:
bool
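Examples
A hedged sketch (path and attributes hypothetical); with multiple attributes, True is returned only if all are found:
>>> from one.alf.io import exists
>>> ok = exists('/data/subject/2021-01-01/001/alf', 'spikes', attributes=['times', 'clusters'])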
- load_object(alfpath, object=None, short_keys=False, **kwargs)[source]
Reads all files sharing the same object name.
For example, if the file provided to the function is spikes.times, the function will load spikes.times, spikes.clusters, spikes.depths and spikes.amps into a dictionary whose keys will be times, clusters, depths and amps
Full reference: https://int-brain-lab.github.io/ONE/alf_intro.html
Simplified example: _namespace_object.attribute_timescale.part1.part2.extension
- Parameters:
alfpath (str, pathlib.Path, list) – Any ALF path pertaining to the object OR directory containing ALFs OR list of paths.
object (str, list, None) – The ALF object(s) to filter by. If a directory is provided and object is None, all valid ALF files are returned.
short_keys (bool) – By default, the output dictionary keys will be compounds of attributes, timescale and any eventual parts separated by a dot. Use True to shorten the keys to the attribute and timescale.
wildcards (bool) – If true uses unix shell style pattern matching, otherwise uses regular expressions.
kwargs – Other ALF parts to filter by.
- Returns:
An ALFBunch (dict-like) of all attributes pertaining to the object.
- Return type:
AlfBunch
Examples
Load ‘spikes’ object
>>> spikes = load_object('full/path/to/my/alffolder/', 'spikes')
Load ‘trials’ object under the ‘ibl’ namespace
>>> trials = load_object('/subject/2021-01-01/001', 'trials', namespace='ibl')
- save_object_npy(alfpath, dico, object, parts=None, namespace=None, timescale=None) list [source]
Saves a dictionary in ALF format using object as object name and dictionary keys as attribute names. Dimensions have to be consistent.
Simplified ALF example: _namespace_object.attribute.part1.part2.extension.
- Parameters:
alfpath (str, pathlib.Path) – Path of the folder to save data to.
dico (dict) – Dictionary to save to npy; keys correspond to ALF attributes.
object (str) – Name of the object to save.
parts (str, list, None) – Extra parts to the ALF name.
namespace (str, None) – The optional namespace of the object.
timescale (str, None) – The optional timescale of the object.
- Returns:
List of written files.
- Return type:
list of one.alf.path.ALFPath
Examples
>>> spikes = {'times': np.arange(50), 'depths': np.random.random(50)}
>>> files = save_object_npy('/path/to/my/alffolder/', spikes, 'spikes')
- save_metadata(file_alf, dico) ALFPath [source]
Writes a metadata file matching a current ALF file object.
For example, given an alf file clusters.ccfLocation.ssv, this will write a dictionary in JSON format to clusters.ccfLocation.metadata.json
- Reserved keywords:
columns: column names for binary tables.
row: row names for binary tables.
unit
- Parameters:
file_alf (str, pathlib.Path) – Full path to the alf object
dico (dict, ALFBunch) – Dictionary containing meta-data
- Returns:
The saved metadata file path.
- Return type:
one.alf.path.ALFPath
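Examples
A hedged sketch using the reserved 'columns' keyword (paths hypothetical); per the description above, this writes clusters.ccfLocation.metadata.json alongside the data file:
>>> from one.alf.io import save_metadata
>>> meta = {'columns': ['x', 'y', 'z']}
>>> save_metadata('/path/to/alf/clusters.ccfLocation.ssv', meta)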
- remove_uuid_recursive(folder, dry=False) None [source]
(DEPRECATED) Within a folder, recursive renaming of all files to remove UUID.
- Parameters:
folder (str, pathlib.Path) – A folder to recursively iterate, removing UUIDs from the file names.
dry (bool) – If False, renames the files on disk.
- next_num_folder(session_date_folder: str | Path) str [source]
Return the next number for a session given a session_date_folder.
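Examples
A hedged sketch, assuming the date folder already contains a session '001' (path hypothetical; zero-padded output assumed):
>>> from one.alf.io import next_num_folder
>>> next_num_folder('/data/subject/2021-01-01')
'002'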
- filter_by(alf_path, wildcards=True, **kwargs)[source]
Given a path and optional filters, returns all ALF files and their associated parts.
The filters constitute a logical AND. For all but extra, if a list is provided, one or more elements must match (a logical OR).
- Parameters:
alf_path (str, pathlib.Path) – A path to a folder containing ALF datasets.
wildcards (bool) – If true, kwargs are matched as unix-style patterns, otherwise as regular expressions.
object (str, list) – Filter by a given object (e.g. ‘spikes’).
attribute (str, list) – Filter by a given attribute (e.g. ‘intervals’).
extension (str, list) – Filter by extension (e.g. ‘npy’).
namespace (str, list) – Filter by a given namespace (e.g. ‘ibl’) or None for files without one.
timescale (str, list) – Filter by a given timescale (e.g. ‘bpod’) or None for files without one.
extra (str, list) – Filter by extra parameters (e.g. 'raw') or None for files without extra parts. NB: wildcards are not permitted here.
- Returns:
alf_files (list of one.alf.path.ALFPath) – A list of ALF file paths that match the filters.
attributes (list of dicts) – A list of parsed file parts.
Examples
Filter files with universal timescale
>>> filter_by(alf_path, timescale=None)
Filter files by a given ALF object
>>> filter_by(alf_path, object='wheel')
Filter using wildcard, e.g. ‘wheel’ and ‘wheelMoves’ ALF objects
>>> filter_by(alf_path, object='wh*')
Filter all intervals that are in bpod time
>>> filter_by(alf_path, attribute='intervals', timescale='bpod')
Filter all files containing either ‘intervals’ OR ‘timestamps’ attributes
>>> filter_by(alf_path, attribute=['intervals', 'timestamps'])
Filter all files using a regular expression
>>> filter_by(alf_path, object='^wheel.*', wildcards=False)
>>> filter_by(alf_path, object=['^wheel$', '.*Moves'], wildcards=False)
- find_variants(file_list, namespace=True, timescale=True, extra=True, extension=True)[source]
Find variant datasets.
Finds any datasets on disk that are considered a variant of the input datasets. At minimum, a dataset is uniquely defined by session path, collection, object and attribute. Therefore, datasets with the same name and collection in a different revision folder are considered a variant. If any of the keyword arguments are set to False, those parts are ignored when comparing datasets.
- Parameters:
file_list (list of str, list of pathlib.Path) – A list of ALF paths to find variants of.
namespace (bool) – If true, treat datasets with a different namespace as unique.
timescale (bool) – If true, treat datasets with a different timescale as unique.
extra (bool) – If true, treat datasets with different extra parts as unique.
extension (bool) – If true, treat datasets with a different extension as unique.
- Returns:
A map of input file paths to a list of variant dataset paths.
- Return type:
Dict[pathlib.Path, list of pathlib.Path]
- Raises:
ValueError – One or more input file paths are not valid ALF datasets.
Examples
Find all datasets with an identical name and collection in a different revision folder
>>> find_variants(['/sub/2020-10-01/001/alf/#2020-01-01#/obj.attr.npy'])
{Path('/sub/2020-10-01/001/alf/#2020-01-01#/obj.attr.npy'): [
    Path('/sub/2020-10-01/001/alf/obj.attr.npy')]}
Find all datasets with different namespace or revision
>>> find_variants(['/sub/2020-10-01/001/alf/#2020-01-01#/obj.attr.npy'], namespace=False)
{Path('/sub/2020-10-01/001/#2020-01-01#/obj.attr.npy'): [
    Path('/sub/2020-10-01/001/#2020-01-01#/_ns_obj.attr.npy'),
    Path('/sub/2020-10-01/001/obj.attr.npy')]}