one.alf.spec
The complete ALF specification descriptors and validators.
Module attributes
SPEC_DESCRIPTION: The ALF part names and their definitions.
SESSION_SPEC: The session specification pattern.
COLLECTION_SPEC: The collection and revision specification pattern.
FILE_SPEC: The filename specification pattern.
REL_PATH_SPEC: The collection, revision and filename specification pattern.
FULL_SPEC: The full ALF path specification pattern.
Functions
describe: Print a description of an ALF part.
is_session_path: Checks if the syntax corresponds to a session path.
is_uuid_string: Bool test for randomly generated hexadecimal uuid validity; checks hyphen separation.
is_uuid: Bool test for randomly generated hexadecimal uuid validity; accepts UUID objects.
is_valid: Returns True if a given file name is a valid ALF file name, otherwise False.
path_pattern: Returns a template string representing where the ALF parts lie in an ALF path.
readableALF: Convert camel case string to space separated string.
regex: Construct a regular expression pattern for parsing or validating an ALF path.
to_alf: Given a set of ALF file parts, return a valid ALF file name.
Classes
QC: Data QC outcomes.
- SPEC_DESCRIPTION = {'Subjects': 'An optional directory to indicate that the experiment data are divided by subject. If organizing by lab, this directory is required.', 'attribute': 'Together with the object, the attribute represents the type of data in the file, for example "times", "amplitudes", "clusters". The names should be in Haskell case, however the following three attributes may be separated by an underscore, e.g. "stimOn_times".\nThe attribute "times" is reserved for discrete event times and comprises a numerical array containing times of the events in seconds, relative to a universal timescale common to all files.\nThe attribute "intervals" should have two columns, indicating the start and end times of each interval relative to the universal timescale.\nContinuous timeseries are represented by the "timestamps" attribute. The file may contain a vector of times in universal seconds if unevenly sampled, or two rows each representing a synchronization point, the first column giving the sample number (counting from 0), and the second column giving the corresponding time in universal seconds. The times corresponding to all samples are then found by linear interpolation. NB: the "timestamps" file is an exception to the rule that all files representing a continuous timeseries object must have one row per sample, as it will often have substantially fewer.', 'collection': 'An optional folder to group data by modality, device, etc. This is necessary when a session contains multiple measurements of the same type, for example spike times from multiple probes. Label examples include "probe00", "raw_video_data".', 'date': 'The date on which the experiment session took place, in ISO format, i.e. yyyy-mm-dd', 'extension': 'ALF can deal with any sort of file, as long as it has a concept of a number of rows (or primary dimension). The type of file is recognized by its extension.\nPreferred choices:\n\n.npy: numpy array file. This is recommended over flat binary since datatype and shape are stored in the file. If you have an array of 3 or more dimensions, the first dimension counts as the number of rows.\n\n.tsv: tab-delimited text file. This is recommended over comma-separated files since text fields often have commas in. All rows should have the same number of columns. The first row contains tab-separated names for each column.\n\n.bin: flat binary file. It’s better to use .npy for storing binary data but some recording systems save in flat binary. Rather than convert them, you can ALFize a flat binary file by adding a metadata file, which specifies the number of columns (as the size of the "columns" array) and the binary datatype as a top-level key "dtype", using numpy naming conventions.', 'extra': 'File names could have as many optional parts as you like: "object.attribute.x1.x2.[…].xN.extension". The extra name parts play no formal role, but can serve several additional purposes. For example, it could be a UUID or file hash for archiving purposes. If there are multiple files with the same object, attribute, and extensions but different extra parts, these should be treated as files to be concatenated, for example to allow multiple-part tif files as produced by scanimage to be encoded in ALF. The concatenation would happen in hierarchical lexicographical order: i.e. by lexicographic order of x1, then x2, etc.', 'lab': 'The name of the lab where the data were collected (optional).', 'namespace': 'An optional filename prefix for data that are not expected to be a community standard, for example task specific events. The namespace may also be used to indicate data unique to a given piece of hardware or software, and is identified by underscores, e.g. "_iblrig_", "_phy_".', 'number': 'The sequential session number of the day, optionally zero-padded to three digits, e.g. 001, 002, etc.', 'object': 'Every file describing a given object has the same number of rows (i.e. the 1st dimension of an npy file, number of frames in a video file, etc). You can therefore think of the files for an object as together defining a table, with column headings given by the attribute in the file names, and values given by the file contents. Object names should be in Haskell case and pluralized, e.g. "wheelMoves", "sparseNoise", "trials".\nEncoding of relations between objects can be achieved by a simplified relational model. If the attribute name of one file matches the object name of a second, then the first file is guaranteed to contain integers referring to the rows of the second. For example, "spikes.clusters.npy" would contain integer references to the rows of "clusters.brain_location.json" and "clusters.probes.npy"; and "clusters.probes.npy" would contain integer references to "probes.insertion.json".\nBe careful of plurals ("clusters.probe.npy" would not correspond to "probes.insertion.json") and remember we count arrays starting from 0.', 'revision': 'An optional folder to organize data by version. The version label is arbitrary, however the folder must start and end with pound signs, e.g. "#v1.0.0#". Unlike collections, if a specified revision is not found, the previous revision will be returned. The revisions are ordered lexicographically.', 'subject': 'The subject name, typically an arbitrary label', 'timescale': 'If you want to represent times relative to another (non-universal) timescale, a timescale can be appended after an underscore e.g. "spikes.times_ephysClock.npy", "trials.intervals_nidaq", "wheel.timestamps_bpod.csv".'}
The ALF part names and their definitions.
- Type:
dict
- SESSION_SPEC = '({lab}/Subjects/)?{subject}/{date}/{number}'
The session specification pattern
- Type:
str
- COLLECTION_SPEC = '({collection}/)?(#{revision}#/)?'
The collection and revision specification pattern
- Type:
str
- FILE_SPEC = '_?{namespace}?_?{object}\\.{attribute}(?:_{timescale})?(?:\\.{extra})*\\.{extension}$'
The filename specification pattern
- Type:
str
- REL_PATH_SPEC = '({collection}/)?(#{revision}#/)?_?{namespace}?_?{object}\\.{attribute}(?:_{timescale})?(?:\\.{extra})*\\.{extension}$'
The collection, revision and filename specification pattern
- Type:
str
- FULL_SPEC = '({lab}/Subjects/)?{subject}/{date}/{number}/({collection}/)?(#{revision}#/)?_?{namespace}?_?{object}\\.{attribute}(?:_{timescale})?(?:\\.{extra})*\\.{extension}$'
The full ALF path specification pattern
- Type:
str
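The broader specs are simple concatenations of the narrower ones. A minimal sketch, restating the literal values documented above, makes the composition explicit:

```python
# Sketch: the spec constants as documented on this page.
SESSION_SPEC = '({lab}/Subjects/)?{subject}/{date}/{number}'
COLLECTION_SPEC = '({collection}/)?(#{revision}#/)?'
FILE_SPEC = (r'_?{namespace}?_?{object}\.{attribute}'
             r'(?:_{timescale})?(?:\.{extra})*\.{extension}$')

# The relative-path spec is collection/revision plus the filename spec...
REL_PATH_SPEC = COLLECTION_SPEC + FILE_SPEC
# ...and the full spec prepends the session part.
FULL_SPEC = SESSION_SPEC + '/' + REL_PATH_SPEC
```

This is why overriding one part (e.g. a collection pattern) propagates consistently to every spec that contains it.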
- class QC(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
IntEnum
Data QC outcomes.
This enumeration is used by the Alyx database. NB: Pandas cache tables use different codes.
- CRITICAL = 50
Dataset practically unusable, e.g. clock can’t be aligned; data missing or inaccurate.
- FAIL = 40
Dataset does not meet expected standards, e.g. trial event timings different to protocol.
- WARNING = 30
Dataset has minor quality issues, e.g. relatively high SNR, that should not affect most analyses.
- NOT_SET = 0
Dataset quality has not been assessed.
- PASS = 10
Dataset considered ‘gold-standard’, e.g. tight trial event timings, low recorded SNR.
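Because QC derives from IntEnum, outcomes compare and sort as plain integers, which is convenient for aggregating the worst outcome across several datasets. A minimal local sketch mirroring the documented member values:

```python
from enum import IntEnum


class QC(IntEnum):
    """Local sketch mirroring the documented one.alf.spec.QC values."""
    CRITICAL = 50
    FAIL = 40
    WARNING = 30
    PASS = 10
    NOT_SET = 0


# IntEnum members behave like ints, so the worst outcome is simply the max.
worst = max(QC.PASS, QC.WARNING)  # QC.WARNING
```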
- path_pattern() str [source]
Returns a template string representing where the ALF parts lie in an ALF path. Brackets denote optional parts. This is used for documentation purposes only.
- describe(part=None, width=99)[source]
Print a description of an ALF part. Prints the path pattern along with a description of the given ALF part (or all parts if None).
- Parameters:
part (str) – ALF part to describe. One from SPEC_DESCRIPTION.keys(). If None, all parts are described.
width (int) – The max line length.
- Return type:
None
Examples
>>> describe()
>>> describe('collection')
>>> describe('extension', width=120)
- regex(spec: str = '({lab}/Subjects/)?{subject}/{date}/{number}/({collection}/)?(#{revision}#/)?_?{namespace}?_?{object}\\.{attribute}(?:_{timescale})?(?:\\.{extra})*\\.{extension}$', **kwargs) Pattern [source]
Construct a regular expression pattern for parsing or validating an ALF path.
- Parameters:
spec (str) – The spec string to construct the regular expression from
kwargs (dict[str]) – Optional patterns to replace the defaults
- Returns:
A regular expression Pattern object
- Return type:
re.Pattern
Examples
Regex for a filename
>>> pattern = regex(spec=FILE_SPEC)
Regex for a complete path (including root)
>>> pattern = '.*' + regex(spec=FULL_SPEC).pattern
Regex pattern for specific object name
>>> pattern = regex(object='trials')
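To illustrate how the spec's named fields become named capture groups, the sketch below substitutes stand-in patterns into FILE_SPEC. The library defines its own default pattern for each part; the sub-patterns here are simplified assumptions for illustration only:

```python
import re

# The filename spec as documented above.
FILE_SPEC = (r'_?{namespace}?_?{object}\.{attribute}'
             r'(?:_{timescale})?(?:\.{extra})*\.{extension}$')

# Simplified stand-in patterns (assumptions, not the library defaults).
# The namespace lookbehind ensures it only matches after an underscore.
parts = dict(
    namespace=r'(?P<namespace>(?<=_)[a-zA-Z0-9]+)',
    object=r'(?P<object>\w+)',
    attribute=r'(?P<attribute>[a-zA-Z0-9]+)',
    timescale=r'(?P<timescale>\w+)',
    extra=r'(?P<extra>[.\w-]+)',
    extension=r'(?P<extension>\w+)',
)
pattern = re.compile(FILE_SPEC.format(**parts))

match = pattern.match('_ibl_spikes.times_ephysClock.npy')
# match.group('namespace') == 'ibl', match.group('object') == 'spikes', etc.
```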
- is_valid(filename)[source]
Returns True if the given file name is a valid ALF file name, otherwise returns False.
- Parameters:
filename (str) – The name of the file to evaluate
- Returns:
True if filename is valid ALF
- Return type:
bool
Examples
>>> is_valid('trials.feedbackType.npy')
True
>>> is_valid('_ns_obj.attr1.2622b17c-9408-4910-99cb-abf16d9225b9.metadata.json')
True
>>> is_valid('spike_train.npy')
False
>>> is_valid('channels._phy_ids.csv')  # WARNING: attribute level namespaces are deprecated
True
- is_session_path(path_object)[source]
Checks if the syntax corresponds to a session path. Note that there is no physical check of existence or contents.
- Parameters:
path_object (str, pathlib.Path) – The path object to validate
- Returns:
True if the path is a valid ALF session path
- Return type:
bool
- is_uuid_string(string: str) bool [source]
Bool test for randomly generated hexadecimal uuid validity. NB: unlike is_uuid, is_uuid_string checks that the uuid is correctly hyphen-separated.
- is_uuid(uuid: str | int | bytes | UUID, versions=(4,)) bool [source]
Bool test for randomly generated hexadecimal uuid validity. Unlike is_uuid_string, this function accepts UUID objects.
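The hyphen-separation check that distinguishes is_uuid_string can be sketched with the standard library's uuid module; this is a simplified illustration, not the library's implementation:

```python
from uuid import UUID


def is_uuid_string_sketch(value):
    """Sketch: the string must be a correctly hyphen-separated UUID,
    not merely parseable as one (UUID() also accepts bare hex)."""
    if not isinstance(value, str) or len(value) != 36:
        return False
    try:
        # Canonical form round-trips to the same hyphenated string.
        return str(UUID(value)) == value.lower()
    except ValueError:
        return False
```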
- to_alf(object, attribute, extension, namespace=None, timescale=None, extra=None)[source]
Given a set of ALF file parts, return a valid ALF file name. Essential periods and underscores are added by the function.
- Parameters:
object (str) – The ALF object name
attribute (str) – The ALF object attribute name
extension (str) – The file extension
namespace (str) – An optional namespace
timescale (str, tuple) – An optional timescale
extra (str, tuple) – One or more optional extra ALF attributes
- Returns:
A file name string built from the ALF parts
- Return type:
str
Examples
>>> to_alf('spikes', 'times', 'ssv')
'spikes.times.ssv'
>>> to_alf('spikes', 'times', 'ssv', namespace='ibl')
'_ibl_spikes.times.ssv'
>>> to_alf('spikes', 'times', 'ssv', namespace='ibl', timescale='ephysClock')
'_ibl_spikes.times_ephysClock.ssv'
>>> to_alf('spikes', 'times', 'ssv', namespace='ibl', timescale=('ephys clock', 'minutes'))
'_ibl_spikes.times_ephysClock_minutes.ssv'
>>> to_alf('spikes', 'times', 'npy', namespace='ibl', timescale='ephysClock', extra='raw')
'_ibl_spikes.times_ephysClock.raw.npy'
>>> to_alf('wheel', 'timestamps', 'npy', 'ibl', 'bpod', ('raw', 'v12'))
'_ibl_wheel.timestamps_bpod.raw.v12.npy'
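The assembly performed by to_alf can be sketched as below. This simplified version reproduces the documented examples but omits the per-part validation the real function performs:

```python
def to_alf_sketch(obj, attribute, extension, namespace=None,
                  timescale=None, extra=None):
    """Sketch of ALF file name assembly (no part validation)."""
    def camel(s):
        # 'ephys clock' -> 'ephysClock'
        first, *rest = s.split()
        return first + ''.join(w.capitalize() for w in rest)

    # A timescale (string or tuple) is appended to the attribute with '_'.
    if timescale:
        scales = (timescale,) if isinstance(timescale, str) else timescale
        attribute += '_' + '_'.join(camel(s) for s in scales)
    name = f'{obj}.{attribute}'
    # A namespace is wrapped in underscores and prefixed.
    if namespace:
        name = f'_{namespace}_{name}'
    # Extra parts (string or tuple) are joined with periods.
    if extra:
        extras = (extra,) if isinstance(extra, str) else extra
        name += '.' + '.'.join(extras)
    return f'{name}.{extension}'
```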
- readableALF(name: str, capitalize: bool = False) str [source]
Convert camel case string to space separated string.
Given an ALF object name or attribute, return a string where the camel case words are space separated. Acronyms/initialisms are preserved.
- Parameters:
name (str) – The ALF part to format (e.g. object name or attribute).
capitalize (bool) – If true, return with the first letter capitalized.
- Returns:
The name formatted for display, with spaces and capitalization.
- Return type:
str
Examples
>>> readableALF('sparseNoise') == 'sparse noise'
>>> readableALF('someROIDataset') == 'some ROI dataset'
>>> readableALF('someROIDataset', capitalize=True) == 'Some ROI dataset'
See also
_dromedary
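The acronym-preserving camel-case split can be sketched with a single regular expression. This is a simplified stand-in for readableALF, not the actual implementation:

```python
import re


def readable_alf_sketch(name, capitalize=False):
    """Sketch: split camel case into words, preserving acronyms."""
    # Tokens: a run of capitals not followed by lowercase (an acronym),
    # a capitalized word, a lowercase run, or a run of digits.
    tokens = re.findall(r'[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+', name)
    # Keep multi-letter all-caps tokens (acronyms); lowercase the rest.
    words = [t if t.isupper() and len(t) > 1 else t.lower() for t in tokens]
    out = ' '.join(words)
    return out[:1].upper() + out[1:] if capitalize else out
```

The lookahead `(?![a-z])` is what stops an acronym one capital early, so the final capital starts the next word ('someROIDataset' splits as some / ROI / Dataset).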