one.alf.cache
Construct Parquet database from local file system.
NB: If using a remote Alyx instance, it is advisable to generate the cache via the Alyx one_cache management command; otherwise the resulting cache UUIDs will not match those on the database.
Examples
>>> from one.api import One
>>> from one.alf.cache import make_parquet_db
>>> cache_dir = 'path/to/data'
>>> make_parquet_db(cache_dir)
>>> one = One(cache_dir=cache_dir)
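The note above warns that locally generated UUIDs will not match those on a remote database. The following is a minimal sketch of why, using only the standard library. It is not the library's implementation: the helper `path_to_uuid`, the namespace and the `system` name are all assumptions made for illustration.

```python
import uuid
from pathlib import PurePosixPath

# Hypothetical namespace, chosen only for this illustration
NAMESPACE = uuid.NAMESPACE_URL


def path_to_uuid(rel_path, system='my-computer'):
    """Derive a deterministic UUID from a system name and a relative path."""
    key = f'{system}/{PurePosixPath(rel_path).as_posix()}'
    return uuid.uuid5(NAMESPACE, key)


a = path_to_uuid('lab/Subjects/subj/2020-01-01/001')
b = path_to_uuid('lab/Subjects/subj/2020-01-01/001')
c = uuid.uuid4()  # stands in for an ID assigned independently by a database

print(a == b)  # True: the same system and path always give the same ID
print(a == c)  # False: an independently assigned ID will not match
```

IDs derived this way are reproducible on one machine, but they carry no relationship to IDs that a database assigned on its own, which is why a cache built locally cannot be matched against a remote instance by UUID.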
Functions
make_parquet_db | Given a data directory, index the ALF datasets and save the generated cache tables.
remove_missing_datasets | Remove dataset files and session folders that are not in the provided cache.
- make_parquet_db(root_dir, out_dir=None, hash_ids=True, hash_files=False, lab=None)
Given a data directory, index the ALF datasets and save the generated cache tables.
- Parameters:
root_dir (str, pathlib.Path) – The file directory to index.
out_dir (str, pathlib.Path) – Optional output directory to save cache tables. If None, the files are saved into the root directory.
hash_ids (bool) – If True, experiment and dataset IDs will be UUIDs generated from the system and relative paths (required for use with ONE API).
hash_files (bool) – If True, an MD5 hash is computed for each dataset and stored in the datasets table. This will substantially increase cache generation time.
lab (str) – An optional lab name to associate with the data. If the folder structure contains ‘lab/Subjects’, the lab name will be taken from the folder name.
- Returns:
pathlib.Path – The full path of the saved sessions parquet table.
pathlib.Path – The full path of the saved datasets parquet table.
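To give a sense of why `hash_files=True` slows cache generation, here is a rough sketch of computing an MD5 hash for one file with the standard library. It is an illustration under assumptions, not the code `make_parquet_db` runs: the helper `md5_of_file`, the chunk size and the file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # Read until f.read returns an empty bytes object (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder file standing in for a dataset
    dataset = Path(tmp) / 'spikes.times.npy'
    dataset.write_bytes(b'example bytes')
    file_hash = md5_of_file(dataset)

print(len(file_hash))  # 32 hexadecimal characters
```

Every byte of every dataset has to be read to produce its hash, so the cost grows with the total size of the data rather than with the number of files.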
- remove_missing_datasets(cache_dir, tables=None, remove_empty_sessions=True, dry=True)
Remove dataset files and session folders that are not in the provided cache.
NB: This does not remove entries from the cache tables that are missing on disk. Non-ALF files are not removed. Empty sessions that exist in the sessions table are not removed.
- Parameters:
cache_dir (str, pathlib.Path) – The directory containing the data to check against the cache tables.
tables (dict[str, pandas.DataFrame], optional) – A dict with keys (‘sessions’, ‘datasets’), containing the cache tables as DataFrames.
remove_empty_sessions (bool) – Attempt to remove session folders that are empty and not in the sessions table.
dry (bool) – If True, nothing is removed; the list of paths that would be removed is still returned.
- Returns:
A sorted list of paths to be removed.
- Return type:
list
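The dry-run behaviour described above can be sketched with the standard library alone. This is a simplified illustration, not the function's implementation: `find_unlisted_files` and the folder layout are hypothetical, and the sketch neither skips non-ALF files nor handles empty session folders.

```python
import tempfile
from pathlib import Path


def find_unlisted_files(root_dir, known_rel_paths, dry=True):
    """Return a sorted list of files under root_dir that are not listed.

    Files are deleted only when dry is False.
    """
    root_dir = Path(root_dir)
    known = {Path(p).as_posix() for p in known_rel_paths}
    to_remove = sorted(
        p for p in root_dir.rglob('*')
        if p.is_file() and p.relative_to(root_dir).as_posix() not in known
    )
    if not dry:
        for p in to_remove:
            p.unlink()
    return to_remove


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    session = root / 'subj' / '2020-01-01' / '001' / 'alf'
    session.mkdir(parents=True)
    (session / 'spikes.times.npy').touch()
    (session / 'spikes.clusters.npy').touch()
    # Only one of the two files is listed, as if taken from a datasets table
    listed = ['subj/2020-01-01/001/alf/spikes.times.npy']

    preview = find_unlisted_files(root, listed, dry=True)   # nothing deleted
    still_there = (session / 'spikes.clusters.npy').exists()
    removed = find_unlisted_files(root, listed, dry=False)  # deletes the file
    gone = not (session / 'spikes.clusters.npy').exists()
    kept = (session / 'spikes.times.npy').exists()
```

Running with `dry=True` first and inspecting the returned list before repeating the call with `dry=False` is a cautious way to use any function of this kind, since deletions cannot be undone.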