{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Accessing and sharing data with ONE local mode", "provenance": [] }, "kernelspec": { "name": "python3", "language": "python", "display_name": "Python 3 (ipykernel)" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "Ke27mmuJ_i0R" }, "source": [ "# Releasing data with ONE\n", "ONE can operate in [two modes](./one_modes.html). For very large collections, such as the main IBL\n", "data, it operates in 'remote mode', downloading data from a remote store only as required.\n", "However, it can also be used in 'local mode', in which all data files are stored on the user's\n", "local machine. This is simpler, and allows access with no internet connection.\n", "To access data in local mode, a user uses exactly the same commands as in remote mode.\n", "\n", "ONE stores an index of the local filesystem in two parquet files in the root directory,\n", "called 'sessions.pqt' and 'datasets.pqt'.\n", "\n", "The easiest way for data producers to release ONE-standard data is thus to create a directory\n", "containing your data files, collect them all into a .zip file, and put it on a website along\n", "with the index files.\n", "\n", "Users can then download and unzip your data files, and access them using ONE local mode.\n", "\n", "For information about creating and validating ONE-standard data (a.k.a. ALFs) see the [guide on\n", "datasets](./datasets_and_types.html).\n", "\n", "This guide is for releasing data without an Alyx database. 
If you are using an Alyx database instance and\n", "wish to share dataset UUIDs with others, see [recording data access](./recording_data_access.html).\n", "\n", "In this tutorial we will show how to create the index files.\n", "\n", "First [install ONE](../one_installation.html):\n", "\n", "```\n", "pip install ONE-api\n", "```\n", "\n", "## Downloading example data\n", "Next we are going to download an example collection of behavioural data files. The following\n", "commands download a zip file from figshare into a cache directory, and unzip it. It is of course\n", "also possible to download and unzip the file manually, in any directory of the user's choice.\n", "This should take around a minute.\n", "\n", "The dataset comprises around 74,000 behaviour files (~218MB) from the beta data release for the [IBL\n", "behaviour paper](https://elifesciences.org/articles/63711). The code in this cell is not important.\n", "\n", "
\n", "Note.\n", "\n", "The zip file is for demonstrating how to prepare data for ONE and therefore doesn't yet contain\n", "index files. Normally when releasing data you would include the index files. It is also not\n", "a complete set of behaviour sessions used in the accompanying paper.\n", "
" ] }, { "cell_type": "code", "metadata": { "id": "XtHUmp7I7lpy" }, "source": [ "from pathlib import Path\n", "\n", "from one.api import One\n", "from one.params import CACHE_DIR_DEFAULT\n", "import requests\n", "from io import BytesIO\n", "import zipfile\n", "\n", "# Data locations:\n", "# The data_url is the location of the remote example dataset. This will be downloaded so we\n", "# have something to build a cache from on our local computer.\n", "data_url = 'https://ndownloader.figshare.com/files/21623715'\n", "\n", "# The cache_dir is the location of the example dataset. By default this will be\n", "# ~/Downloads/ONE/my_example but could be set to anything.\n", "cache_dir = Path(CACHE_DIR_DEFAULT, 'my_example')\n", "\n", "# Download data if not already downloaded\n", "if not (cache_dir.exists() and any(cache_dir.iterdir())):\n", " cache_dir.parent.mkdir(exist_ok=True, parents=True) # Create destination dir\n", " print(f'Downloading data from {data_url.split(\".\", maxsplit=1)[-1]}...')\n", " request = requests.get(data_url) # Download data into memory (~300MB)\n", " with zipfile.ZipFile(BytesIO(request.content)) as zipped:\n", " print(f'Extracting into {cache_dir}...')\n", " zipped.extractall(path=cache_dir.parent) # Decompress into destination dir\n", " Path(cache_dir.parent, 'ibl-behavioral-data-Dec2019').rename(cache_dir) # Rename\n", " cache_dir.joinpath('one_example.py').unlink() # Delete outdated example\n", " del request # Free resources\n" ], "execution_count": 1, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading data from figshare.com/files/21623715...\n", "Extracting into C:\\Users\\User\\Downloads\\ONE\\my_example...\n" ] } ] }, { "cell_type": "markdown", "metadata": { "id": "Iwd2--7yx10s" }, "source": [ "## Validating your data\n", "Before building the cache tables, you need to organize your data files into directories with a\n", "specific naming convention:\n", "\n", "`lab/Subjects/subject/date/number`\n", "\n", "
\n", "Note.\n", "\n", "If releasing data from a single lab, the `lab/Subjects` directories are not necessary. The\n", "`subject/date/number` structure, however, is always required.\n", "
\n", "\n", "Now let's have a look inside the data directory. For the first recording made from Zador lab\n", "subject `CSH_ZAD_003` on 11 August 2019, the files are in the path\n", "`zadorlab/Subjects/CSH_ZAD_003/2019-08-11/001/alf`.\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "GM5QM6O678HX", "outputId": "ac668bdc-c160-40cd-d1f2-45507ba25c65" }, "source": [ "import os\n", "print(os.listdir(cache_dir))\n", "session_path = cache_dir.joinpath('zadorlab/Subjects/CSH_ZAD_003/2019-08-11/001/alf')\n", "print(os.listdir(session_path))" ], "execution_count": 8, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['angelakilab', 'churchlandlab', 'cortexlab', 'danlab', 'datasets.pqt', 'hoferlab', 'mainenlab', 'mrsicflogellab', 'README.txt', 'sessions.pqt', 'wittenlab', 'zadorlab']\n", "['_ibl_trials.choice.npy', '_ibl_trials.contrastLeft.npy', '_ibl_trials.contrastRight.npy', '_ibl_trials.feedbackType.npy', '_ibl_trials.feedback_times.npy', '_ibl_trials.goCueTrigger_times.npy', '_ibl_trials.goCue_times.npy', '_ibl_trials.included.npy', '_ibl_trials.intervals.npy', '_ibl_trials.probabilityLeft.npy', '_ibl_trials.repNum.npy', '_ibl_trials.response_times.npy', '_ibl_trials.rewardVolume.npy', '_ibl_trials.stimOnTrigger_times.npy', '_ibl_trials.stimOn_times.npy']\n" ] } ] }, { "cell_type": "markdown", "source": [ "Within this directory, the datasets are named with standard [ONE file naming conventions](../alf_intro.html).\n", "Files that are not within the right directory structure or that don't have a valid filename\n", "will not be added to the cache tables. 
Before building the cache you should check that your\n", "datasets are ALF compliant:" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 9, "outputs": [], "source": [ "from one.alf.spec import is_valid\n", "assert all(is_valid(x) for x in os.listdir(session_path))" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "metadata": { "id": "p3b_stuRy21_" }, "source": [ "## Generating the cache\n", "Now let's see how you would release data in ONE standard if you are a data producer. Before\n", "zipping and uploading, you need to create the index files, which is done with one line of code.\n", "This should take about 3 minutes for the current behavioral data.\n", "\n", "
\n", "Note.\n", "\n", "The option `hash_files` creates a hash of each file, which allows ONE to detect when files have\n", "changed. This is good for dynamic data stored on an Alyx database but is not necessary for release of\n", "zipped data on a website.\n", "
" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "TShKLIL67vxY", "outputId": "d61c9dca-902d-4a73-a2f0-42a055e33a4a" }, "source": [ "print('Building ONE cache from filesystem...')\n", "One.setup(cache_dir, hash_files=False)" ], "execution_count": 3, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Building ONE cache from filesystem...\n" ] }, { "data": { "text/plain": "One (offline, C:\\Users\\User\\Downloads\\ONE\\my_example)" }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ] }, { "cell_type": "markdown", "source": [ "## Checking the cache\n", "Now the files are indexed you can use ONE search and load commands as you would with\n", "remote data. Let's search for all sessions of subject NYU-01, that have behavioral trial data,\n", "then make a plot of reaction time vs. trial number which we obtain by subtracting the time of the\n", "go cue onset from the feedback (i.e. reward or failure) time.\n", "\n", "For more information on listing, searching and loading data see [the API guides](../index.html#basic-usage)" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 6, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 98 behaviour sessions for subject \"NYU-01\"\n" ] }, { "data": { "text/plain": "
", "image/png": "\n" }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "one = One(cache_dir=cache_dir)\n", "\n", "# Searching for behaviour experiment by subject\n", "subject = 'NYU-01'\n", "eids = one.search(subject=subject, dataset=['trials.goCue_times', 'trials.feedback_times'])\n", "print(f'There are {len(eids)} behaviour sessions for subject \"{subject}\"')\n", "\n", "# Load the trials object for this session\n", "trials = one.load_object(eids.pop(), 'trials')\n", "\n", "# Make the plot\n", "plt.plot(trials.feedback_times - trials.goCue_times, '.')\n", "plt.ylabel('reaction time / s')\n", "plt.xlabel('trial #');" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Now to release the data, just zip up the directory and upload to your favourite web hosting\n", "service!\n", "\n", "
\n", "Note.\n", "\n", "If you are using an Alyx database instance, this is not the correct way to release data or generate\n", "cache files as the UUIDs will not match. Instead, generate the cache through the database\n", "`one_cache` management command. If you are using Alyx and wish to share accessed dataset UUIDs\n", "with others, see [recording data access](./recording_data_access.html).\n", "
" ], "metadata": { "collapsed": false } } ] }