{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true, "pycharm": { "name": "#%% md\n" } }, "source": [ "# Datasets and their types\n", "A dataset typically contains a single signal or data source, either values or times. When\n", "creating a new dataset, first familiarize yourself with the [ALF specification](../alf_intro.html)." ] }, { "cell_type": "code", "execution_count": 13, "outputs": [], "source": [ "from pprint import pprint\n", "from one.alf import spec\n", "from one.alf.files import filename_parts" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "## Datasets\n", "\n", "Print information about ALF objects" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 14, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(lab/Subjects/)subject/date/number/(collection/)(#revision#/)_namespace_object.attribute_timescale.extra.extension\n", " ^^^^^^ \n", "\n", "OBJECT\n", "Every file describing a given object has the same number of rows (i.e. the 1st dimension of an npy\n", "file, number of frames in a video file, etc). You can therefore think of the files for an object\n", "as together defining a table, with column headings given by the attribute in the file names, and\n", "values given by the file contents. Object names should be in Haskell case and pluralized, e.g.\n", "\"wheelMoves\", \"sparseNoise\", \"trials\".\n", "Encoding of relations between objects can be achieved by a simplified relational model. If the\n", "attribute name of one file matches the object name of a second, then the first file is guaranteed\n", "to contain integers referring to the rows of the second. 
For example, \"spikes.clusters.npy\" would\n", "contain integer references to the rows of \"clusters.brain_location.json\" and \"clusters.probes.npy\";\n", "and \"clusters.probes.npy\" would contain integer references to \"probes.insertion.json\".\n", "Be careful of plurals (\"clusters.probe.npy\" would not correspond to \"probes.insertion.json\") and\n", "remember we count arrays starting from 0.\n" ] } ], "source": [ "spec.describe('object')" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Check that the file name is ALF compliant" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 15, "outputs": [], "source": [ "assert spec.is_valid('spikes.times.npy')" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Safely construct an ALF dataset name using the 'to_alf' function, which ensures the correct\n", "case and format" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 16, "outputs": [], "source": [ "filename = spec.to_alf('spikes', 'times', 'npy',\n", " namespace='ibl', timescale='ephys clock', extra='raw')" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Parsing a file name into its constituent parts ensures the dataset name is valid" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 17, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "OrderedDict([('namespace', 'ibl'),\n", " ('object', 'spikes'),\n", " ('attribute', 'times'),\n", " ('timescale', 'ephysClock'),\n", " ('extra', 'raw'),\n", " ('extension', 'npy')])\n" ] } ], "source": [ "parts = filename_parts('_ibl_spikes.times_ephysClock.raw.npy', as_dict=True, assert_valid=True)\n", "pprint(parts)" ], "metadata": { 
"collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "## Dataset types\n", "