{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Ke27mmuJ_i0R" }, "source": [ "# Recording data access\n", "When working with huge data repositories it can be worthwhile to record the subset of data used\n", "for a given analysis. ONE can keep track of which datasets were loaded via the `load_*` methods.\n", "\n", "Only datasets that were successfully loaded are recorded; missing datasets are ignored.\n", "\n", "## How to set up and save\n", "At the top of your analysis script, after instantiating ONE, simply set the `record_loaded`\n", "attribute to True:\n", "```python\n", "one.record_loaded = True\n", "```\n", "\n", "At the end of your analysis script, you can save the data by calling `one.save_loaded_ids()`.\n", "By default this will save the dataset UUIDs to a CSV file in the root of your cache directory\n", "and will clear the list of dataset UUIDs. The `sessions_only` kwarg will save the\n", "[eids](./experiment_ids) instead.\n", "\n", "
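\n",
"For instance, the saved CSVs from two analysis runs can be compared to confirm that both runs loaded the same data. A minimal sketch with pandas (the UUIDs and in-memory CSVs below are hypothetical stand-ins for files produced by `one.save_loaded_ids()`):\n",

```python
import io
import pandas as pd

# Hypothetical contents of two loaded_dataset_uuids CSV files,
# as produced by one.save_loaded_ids() on two separate runs
run_a = pd.read_csv(io.StringIO('dataset_uuid\naaa\nbbb'))
run_b = pd.read_csv(io.StringIO('dataset_uuid\nbbb\nccc'))

# Datasets loaded in one run but not the other
diff = set(run_a['dataset_uuid']) ^ set(run_b['dataset_uuid'])
print(sorted(diff))  # ['aaa', 'ccc']
```

"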
<div class=\"alert alert-info\">\n", "\n", "Note.\n", "\n", "Within a Python session, calling ONE again with the same arguments (from any location) will return\n", "the previous object; therefore, if you want to stop recording dataset UUIDs you must explicitly set\n", "`record_loaded` to False, e.g. `ONE().record_loaded = False`.\n", "\n", "</div>
\n", "\n", "## Example" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "F:\\FlatIron\\openalyx.internationalbrainlab.org\\2022-02-24T13-37-07_loaded_dataset_uuids.csv\n", " dataset_uuid\n", "0 0bc9607d-0a72-4c5c-8b9d-e239a575ff67\n", "1 16c81eaf-a032-49cd-9823-09c0c7350fd2\n", "2 2f4cc220-55b9-4fb3-9692-9aaa5362288f\n", "3 4ee1110f-3ff3-4e26-87b0-41b687f75ce3\n", "4 63aa7dea-1ee2-4a0c-88bc-00b5cba6b8b0\n", "5 69236a5d-1e4a-4bea-85e9-704492756848\n", "6 6b94f568-9bb6-417c-9423-a84559f403d5\n", "7 82237144-41bb-4e7f-9ef4-cabda4381d9f\n", "8 91f08c6d-7ee0-487e-adf5-9c751769af06\n", "9 b77d2665-876e-41e7-ac57-aa2854c5d5cd\n", "10 c14d8683-3706-4e44-a8d2-cd0e2bfd4579\n", "11 c8cd43a7-b443-4342-8c37-aa93a2067447\n", "12 d078bfc8-214d-4682-8621-390ad74dd6d5\n", "13 d11d7b33-3a96-4ea6-849f-5448a97d3fc1\n", "14 d73f567a-5799-4051-9bc8-6f0fd6bb478b\n", "15 e1793e9d-cd96-4cb6-9fd7-a6b662c41971\n", "16 fceb8cfe-77b4-4177-a6af-44fbf51b33d0\n", "\n", "F:\\FlatIron\\openalyx.internationalbrainlab.org\\2022-02-24T13-37-07_loaded_session_uuids.csv\n", " session_uuid\n", "0 4b7fbad4-f6de-43b4-9b15-c7c7ef44db4b\n", "1 aad23144-0e52-4eac-80c5-c4ee2decb198\n" ] } ], "source": [ "import pandas as pd\n", "from one.api import ONE\n", "one = ONE(base_url='https://openalyx.internationalbrainlab.org')\n", "\n", "# Turn on recording of loaded dataset UUIDs\n", "one.record_loaded = True\n", "\n", "# Load some trials data\n", "eid = 'KS023/2019-12-10/001'\n", "dsets = one.load_object(eid, 'trials')\n", "\n", "# Load another dataset\n", "eid = 'CSHL049/2020-01-08/001'\n", "dset = one.load_dataset(eid, 'probes.description')\n", "\n", "# Save the dataset IDs to file\n", "dataset_uuids, filename = one.save_loaded_ids(clear_list=False)\n", "print(filename)\n", "print(pd.read_csv(filename), end='\\n\\n')\n", "\n", "# Save the session IDs\n", "session_uuids, 
filename = one.save_loaded_ids(sessions_only=True)\n", "print(filename)\n", "print(pd.read_csv(filename))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data reproducibility\n", "\n", "The Alyx database may be periodically updated with revised datasets. Data revisions occur when a session is preprocessed with a better algorithm. Typically the newest revision will be considered the default one to load. Therefore, to ensure the results of an analysis don't change unexpectedly, either of the below two methods can be used to 'freeze' the data.\n", "\n", "## Saving the data access tables\n", "\n", "When search and load queries are made, the results are stored in memory. These access tables can be saved to disk then used by ONE in [local mode](./one_modes).\n", "First, run your analysis code with the various search and load queries to build up the access tables in memory, then save them using the `save_cache` method.\n", "\n", "
<div class=\"alert alert-info\">\n", "\n", "Note.\n", "\n", "Unlike the `record_loaded` option above, these access tables contain the results of all session and dataset queries, not just a list of loaded datasets.\n", "\n", "</div>
\n", "\n", "### Example 1: Saving the access tables" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from one.api import ONE\n", "one = ONE()\n", "\n", "... # Run one.search and one.load_* methods here\n", "\n", "# Save the tables to disk\n", "one.save_cache()\n", "\n", "# To use these tables in a new session, initialize ONE in local mode\n", "one = ONE(mode='local')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When the access tables are saved in the default location they will be loaded automatically each time ONE is instantiated; however, they won't be used unless the mode is set to 'local' or a method is called with `query_type='local'`. The access tables can be reset at any time by calling `one.reset_cache`. Additionally, when saving, if tables already exist on disk they will be merged and updated; to fully overwrite the tables on disk, pass `clobber=True` to the `save_cache` method.\n", "\n", "### Example 2: Saving tables to a different location" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from one.api import ONE\n", "one = ONE()\n", "\n", "... # Run one.search and one.load_* methods here\n", "\n", "# Save the tables to disk, overwriting any existing tables\n", "one.save_cache(one.cache_dir / 'my_analysis_tables', clobber=True)\n", "\n", "# To use these tables in a new session, initialize ONE in local mode\n", "one = ONE(mode='local')  # tables_dir may also be passed as a kwarg here\n", "# The tables must be loaded manually as they are not in the default location\n", "one.load_cache(one.cache_dir / 'my_analysis_tables')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The saved access tables are .pqt (parquet) files that can also be shared manually with other users.\n", "\n", "
<div class=\"alert alert-info\">\n", "\n", "Note.\n", "\n", "If a dataset is deleted from the database it will only be loadable if the file still exists locally.\n", "\n", "</div>
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting a revision date\n", "\n", "Another way to reproduce data access on a certain date is to use the `ONE_REVISION_LAST_BEFORE` environment variable. With this variable set to an ISO formatted date,\n", "any datasets with a revision newer than this date are ignored. Similarly, a user can pass this date into the load methods using the `revision` kwarg.\n", "\n", "### Example:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['alf/probe00/pykilosort/#2024-05-06#/drift.times.npy', 'alf/probe00/pykilosort/drift.times.npy']\n", "alf\\probe00\\pykilosort\\drift.times.npy\n" ] } ], "source": [ "import os\n", "os.environ['ONE_REVISION_LAST_BEFORE'] = '2024-04-01'\n", "\n", "from one.api import ONE\n", "one = ONE()\n", "\n", "eid = 'b52182e7-39f6-4914-9717-136db589706e'\n", "dsets = one.list_datasets(eid, filename='drift.times.npy', collection='alf/probe00/pykilosort')\n", "print(dsets) # shows at least two versions of this dataset\n", "\n", "file = one.load_dataset(eid, 'drift.times.npy', collection='alf/probe00/pykilosort', download_only=True)\n", "print(file.relative_to_session()) # shows the older version of this dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
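\n",
"The `revision` kwarg mentioned above works on a last-before principle: revisions newer than the requested date are ignored and the latest remaining one is returned. A minimal sketch of that selection rule over revision folder names (the folder names below are hypothetical):\n",

```python
# Revisions are ISO-date folder names; '' stands for the unrevised original.
# ISO dates compare correctly as plain strings, so lexicographic order works.
revisions = ['', '2023-11-02', '2024-05-06']
cutoff = '2024-04-01'

# Drop revisions newer than the cutoff, then take the latest remaining one
eligible = [r for r in revisions if r <= cutoff]
selected = max(eligible)
print(selected)  # 2023-11-02
```

"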
<div class=\"alert alert-warning\">\n", "\n", "Warning.\n", "\n", "If a dataset's QC is set to CRITICAL it will no longer be loadable, even with `ONE_REVISION_LAST_BEFORE` set. Additionally, if a dataset is removed from the database it will no longer be loadable. In these cases, saving the data access tables is more robust.\n", "\n", "</div>
" ] } ], "metadata": { "colab": { "name": "Accessing and sharing data with ONE local mode", "provenance": [] }, "kernelspec": { "display_name": "iblenv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 0 }