one.alf.io
I/O functions for ALyx Files.
Provides support for time-series reading and interpolation as per the specifications. For a full overview of the scope of the format, see:
https://int-brain-lab.github.io/ONE/alf_intro.html
Functions
check_dimensions – Test for consistency of dimensions as per ALF specs in a dictionary.
dataframe – Converts a Bunch conforming to size conventions into a pandas DataFrame.
exists – Test if an ALF object and optionally specific attributes exist in the given path.
filter_by – Given a path and optional filters, returns all ALF files and their associated parts.
find_variants – Find variant datasets.
iter_datasets – Iterate over all files in a session, and yield relative dataset paths.
iter_sessions – Recursively iterate over session paths in a given directory.
load_file_content – Returns the content of a file.
load_object – Reads all files sharing the same object name.
next_num_folder – Return the next number for a session given a session_date_folder.
read_ts – Load time-series from ALF format.
remove_empty_folders – Iteratively remove any empty child folders.
remove_uuid_recursive – (DEPRECATED) Within a folder, recursive renaming of all files to remove UUID.
save_metadata – Writes a metadata file matching a current ALF file object.
save_object_npy – Saves a dictionary in ALF format using object as object name and dictionary keys as attribute names.
ts2vec – Interpolate a continuous timeseries of the shape (2, 2).
Classes
AlfBunch – A dict-like object that supports dot indexing and conversion to DataFrame.
- class AlfBunch(*args, **kwargs)[source]
Bases:
Bunch
A dict-like object that supports dot indexing and conversion to DataFrame
- property check_dimensions
0 for consistent dimensions, 1 for inconsistent dimensions
- Type:
int
- append(b, inplace=False)[source]
Appends one bunch to another, key by key
- Parameters:
b (Bunch, dict) – A Bunch of data to append
inplace (bool) – If true, the data are appended in place, otherwise a copy is returned
- Returns:
An AlfBunch with the data appended, or None if inplace is True
- Return type:
AlfBunch, None
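Examples
A minimal sketch of appending one bunch to another (data hypothetical, assuming matching keys):
>>> import numpy as np
>>> from one.alf.io import AlfBunch
>>> a = AlfBunch(times=np.arange(5), amps=np.ones(5))
>>> b = AlfBunch(times=np.arange(5, 10), amps=np.ones(5))
>>> c = a.append(b)  # a new AlfBunch with 10 samples per attribute; a is unchanged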
- dataframe(adict)[source]
Converts a Bunch conforming to size conventions into a pandas DataFrame. For 2-D arrays, stops at 10 columns per attribute.
- Parameters:
adict (dict, Bunch) – A dict-like object of data to convert to DataFrame
- Returns:
A pandas DataFrame of data
- Return type:
pd.DataFrame
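Examples
A hedged sketch, assuming attributes share the same first dimension (data hypothetical):
>>> import numpy as np
>>> from one.alf.io import dataframe
>>> spikes = {'times': np.random.random(50), 'clusters': np.random.randint(0, 10, 50)}
>>> df = dataframe(spikes)  # pandas DataFrame with 'times' and 'clusters' columns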
- read_ts(filename)[source]
Load time-series from ALF format
- Parameters:
filename (str, pathlib.Path) – An ALF path whose values are to be loaded
- Returns:
numpy.ndarray – An array of timestamps belonging to the ALF path object
numpy.ndarray – An array of values in filename
Examples
>>> t, d = read_ts(filename)
- ts2vec(ts: ndarray, n_samples: int) ndarray [source]
Interpolate a continuous timeseries of the shape (2, 2)
- Parameters:
ts (numpy.array) – a 2x2 numpy array of the form (sample, ts)
n_samples (int) – Number of samples; i.e. the size of the resulting vector
- Returns:
A vector of interpolated timestamps
- Return type:
numpy.ndarray
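Examples
A hedged sketch, assuming the rows of ts are (sample, timestamp) pairs (values hypothetical):
>>> import numpy as np
>>> from one.alf.io import ts2vec
>>> ts = np.array([[0, 0.0], [99, 9.9]])  # samples 0 and 99 occur at 0.0 s and 9.9 s
>>> t = ts2vec(ts, 100)  # a vector of 100 interpolated timestamps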
- check_dimensions(dico)[source]
Test for consistency of dimensions as per ALF specs in a dictionary.
ALF broadcasting rules: only consistent dimensions are accepted. For a given axis, a dimension is consistent with another if it is empty, 1, or equal to the other array's dimension. For example, [a, 1], [1, b] and [a, b] are all consistent with one another, whereas [c, 1] is not.
- Parameters:
dico (ALFBunch, dict) – Dictionary containing data
- Returns:
Status 0 for consistent dimensions, 1 for inconsistent dimensions
- Return type:
int
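Examples
A minimal illustration of the broadcasting rule above (shapes hypothetical; expected statuses given as comments):
>>> import numpy as np
>>> from one.alf.io import check_dimensions
>>> status = check_dimensions({'x': np.zeros((3, 1)), 'y': np.zeros((1, 4)), 'z': np.zeros((3, 4))})  # expected 0: consistent
>>> status = check_dimensions({'x': np.zeros((3, 4)), 'y': np.zeros((5, 1))})  # expected 1: first axis differs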
- load_file_content(fil)[source]
Returns the content of a file. Designed for very generic file formats; the formats supported so far are json, npy, csv, (h)tsv, ssv and jsonable.
- Parameters:
fil (str, pathlib.Path) – File to read
- Returns:
Array/json/pandas dataframe depending on format
- Return type:
Any
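Examples
Hypothetical file names; the return type follows the extension (e.g. an array for npy, parsed JSON for json):
>>> from one.alf.io import load_file_content
>>> times = load_file_content('spikes.times.npy')
>>> meta = load_file_content('clusters.metadata.json')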
- iter_sessions(root_dir, pattern='*')[source]
Recursively iterate over session paths in a given directory.
- Parameters:
root_dir (str, pathlib.Path) – The folder to look for sessions.
pattern (str) – Glob pattern to use. Default searches all folders. Providing a more specific pattern makes this more performant (see examples).
- Yields:
pathlib.Path – The next session path in lexicographical order.
Examples
Efficient iteration when root_dir contains <lab>/Subjects folders
>>> sessions = list(iter_sessions(root_dir, pattern='*/Subjects/*/????-??-??/*'))
Efficient iteration when root_dir contains subject folders
>>> sessions = list(iter_sessions(root_dir, pattern='*/????-??-??/*'))
- iter_datasets(session_path)[source]
Iterate over all files in a session, and yield relative dataset paths.
- Parameters:
session_path (str, pathlib.Path) – The folder to look for datasets.
- Yields:
one.alf.path.ALFPath – The next dataset path (relative to the session path) in lexicographical order.
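Examples
A minimal sketch (session path hypothetical):
>>> from one.alf.io import iter_datasets
>>> for rel_path in iter_datasets('/data/lab/Subjects/subject/2021-01-01/001'):
...     print(rel_path)  # e.g. alf/spikes.times.npy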
- exists(alfpath, object, attributes=None, **kwargs) bool [source]
Test if an ALF object, and optionally specific attributes, exist in the given path
- Parameters:
alfpath (str, pathlib.Path) – The folder to look into
object (str) – ALF object name
attributes (str, list) – Wanted attributes
wildcards (bool) – If true uses unix shell style pattern matching, otherwise uses regular expressions
kwargs – Other ALF parts to filter by
- Returns:
For multiple attributes, returns True only if all attributes are found
- Return type:
bool
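Examples
A hedged sketch (path and attributes hypothetical); with multiple attributes, True is returned only if all are found:
>>> from one.alf.io import exists
>>> ok = exists('/data/subject/2021-01-01/001/alf', 'spikes', attributes=['times', 'clusters'])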
- load_object(alfpath, object=None, short_keys=False, **kwargs)[source]
Reads all files sharing the same object name.
For example, if the file provided to the function is spikes.times, the function will load spikes.times, spikes.clusters, spikes.depths and spikes.amps into a dictionary whose keys will be times, clusters, depths and amps
Full reference: https://int-brain-lab.github.io/ONE/alf_intro.html
Simplified example: _namespace_object.attribute_timescale.part1.part2.extension
- Parameters:
alfpath (str, pathlib.Path, list) – Any ALF path pertaining to the object OR directory containing ALFs OR list of paths.
object (str, list, None) – The ALF object(s) to filter by. If a directory is provided and object is None, all valid ALF files are returned.
short_keys (bool) – By default, the output dictionary keys will be compounds of attributes, timescale and any eventual parts separated by a dot. Use True to shorten the keys to the attribute and timescale.
wildcards (bool) – If true uses unix shell style pattern matching, otherwise uses regular expressions.
kwargs – Other ALF parts to filter by.
- Returns:
An ALFBunch (dict-like) of all attributes pertaining to the object.
- Return type:
AlfBunch
Examples
Load ‘spikes’ object
>>> spikes = load_object('full/path/to/my/alffolder/', 'spikes')
Load ‘trials’ object under the ‘ibl’ namespace
>>> trials = load_object('/subject/2021-01-01/001', 'trials', namespace='ibl')
- save_object_npy(alfpath, dico, object, parts=None, namespace=None, timescale=None) list [source]
Saves a dictionary in ALF format using object as object name and dictionary keys as attribute names. Dimensions have to be consistent.
Simplified ALF example: _namespace_object.attribute.part1.part2.extension.
- Parameters:
alfpath (str, pathlib.Path) – Path of the folder to save data to.
dico (dict) – Dictionary to save to npy; keys correspond to ALF attributes.
object (str) – Name of the object to save.
parts (str, list, None) – Extra parts to the ALF name.
namespace (str, None) – The optional namespace of the object.
timescale (str, None) – The optional timescale of the object.
- Returns:
List of written files.
- Return type:
list of one.alf.path.ALFPath
Examples
>>> spikes = {'times': np.arange(50), 'depths': np.random.random(50)}
>>> files = save_object_npy('/path/to/my/alffolder/', spikes, 'spikes')
- save_metadata(file_alf, dico) ALFPath [source]
Writes a metadata file matching a current ALF file object.
For example, given an alf file clusters.ccfLocation.ssv, this will write a dictionary in JSON format to clusters.ccfLocation.metadata.json
- Reserved keywords:
columns: column names for binary tables.
row: row names for binary tables.
unit
- Parameters:
file_alf (str, pathlib.Path) – Full path to the alf object
dico (dict, ALFBunch) – Dictionary containing meta-data
- Returns:
The saved metadata file path.
- Return type:
one.alf.path.ALFPath
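Examples
A hedged sketch using the reserved 'columns' keyword (paths hypothetical); per the description above, this writes clusters.ccfLocation.metadata.json alongside the data file:
>>> from one.alf.io import save_metadata
>>> meta = {'columns': ['x', 'y', 'z']}
>>> save_metadata('/path/to/alf/clusters.ccfLocation.ssv', meta)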
- remove_uuid_recursive(folder, dry=False) None [source]
(DEPRECATED) Within a folder, recursive renaming of all files to remove UUID.
- Parameters:
folder (str, pathlib.Path) – A folder to recursively iterate, removing UUIDs from the file names.
dry (bool) – If False, renames the files on disk.
- next_num_folder(session_date_folder: str | Path) str [source]
Return the next number for a session given a session_date_folder.
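Examples
A hedged sketch, assuming the date folder already contains a session '001' (path hypothetical; zero-padded output assumed):
>>> from one.alf.io import next_num_folder
>>> next_num_folder('/data/subject/2021-01-01')
'002'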
- filter_by(alf_path, wildcards=True, **kwargs)[source]
Given a path and optional filters, returns all ALF files and their associated parts.
The filters constitute a logical AND. For all but extra, if a list is provided, one or more elements must match (a logical OR).
- Parameters:
alf_path (str, pathlib.Path) – A path to a folder containing ALF datasets.
wildcards (bool) – If true, kwargs are matched as unix-style patterns, otherwise as regular expressions.
object (str, list) – Filter by a given object (e.g. ‘spikes’).
attribute (str, list) – Filter by a given attribute (e.g. ‘intervals’).
extension (str, list) – Filter by extension (e.g. ‘npy’).
namespace (str, list) – Filter by a given namespace (e.g. ‘ibl’) or None for files without one.
timescale (str, list) – Filter by a given timescale (e.g. ‘bpod’) or None for files without one.
extra (str, list) – Filter by extra parameters (e.g. 'raw') or None for files without extra parts. NB: wildcards are not permitted here.
- Returns:
alf_files (list of one.alf.path.ALFPath) – A list of ALF file paths that match the filters.
attributes (list of dicts) – A list of parsed file parts.
Examples
Filter files with universal timescale
>>> filter_by(alf_path, timescale=None)
Filter files by a given ALF object
>>> filter_by(alf_path, object='wheel')
Filter using wildcard, e.g. ‘wheel’ and ‘wheelMoves’ ALF objects
>>> filter_by(alf_path, object='wh*')
Filter all intervals that are in bpod time
>>> filter_by(alf_path, attribute='intervals', timescale='bpod')
Filter all files containing either ‘intervals’ OR ‘timestamps’ attributes
>>> filter_by(alf_path, attribute=['intervals', 'timestamps'])
Filter all files using a regular expression
>>> filter_by(alf_path, object='^wheel.*', wildcards=False)
>>> filter_by(alf_path, object=['^wheel$', '.*Moves'], wildcards=False)
- find_variants(file_list, namespace=True, timescale=True, extra=True, extension=True)[source]
Find variant datasets.
Finds any datasets on disk that are considered a variant of the input datasets. At minimum, a dataset is uniquely defined by session path, collection, object and attribute. Therefore, datasets with the same name and collection in a different revision folder are considered a variant. If any of the keyword arguments are set to False, those parts are ignored when comparing datasets.
- Parameters:
file_list (list of str, list of pathlib.Path) – A list of ALF paths to find variants of.
namespace (bool) – If true, treat datasets with a different namespace as unique.
timescale (bool) – If true, treat datasets with a different timescale as unique.
extra (bool) – If true, treat datasets with different extra parts as unique.
extension (bool) – If true, treat datasets with a different extension as unique.
- Returns:
A map of input file paths to a list of variant dataset paths.
- Return type:
Dict[pathlib.Path, list of pathlib.Path]
- Raises:
ValueError – One or more input file paths are not valid ALF datasets.
Examples
Find all datasets with an identical name and collection in a different revision folder
>>> find_variants(['/sub/2020-10-01/001/alf/#2020-01-01#/obj.attr.npy'])
{Path('/sub/2020-10-01/001/alf/#2020-01-01#/obj.attr.npy'): [
    Path('/sub/2020-10-01/001/alf/obj.attr.npy')]}
Find all datasets with different namespace or revision
>>> find_variants(['/sub/2020-10-01/001/alf/#2020-01-01#/obj.attr.npy'], namespace=False)
{Path('/sub/2020-10-01/001/#2020-01-01#/obj.attr.npy'): [
    Path('/sub/2020-10-01/001/#2020-01-01#/_ns_obj.attr.npy'),
    Path('/sub/2020-10-01/001/obj.attr.npy')]}