{ "cells": [ { "cell_type": "markdown", "source": [ "# Listing with ONE\n", "ONE contains a number of list methods that can be used to explore the datasets available." ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "To list all available datasets we can use" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 1, "outputs": [], "source": [ "from one.api import ONE\n", "one = ONE(base_url='https://openalyx.internationalbrainlab.org')\n", "\n", "# All datasets in the database\n", "dsets = one.list_datasets()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "
\n", "Note.\n", "\n", "Calling list_* methods with no arguments in remote mode will not hit the database\n", "
\n", "\n", "If you are connected to a database (e.g not using ONE with a local cache directory) you can find out\n", "information about a specific dataset by typing," ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 2, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "contrast of left-side stimulus (0...1) nan if trial is on other side\n" ] }, { "data": { "text/plain": "{'id': '979f9f7c-7d67-48d5-9042-a9000a8e66a2',\n 'name': 'trials.contrastLeft',\n 'created_by': None,\n 'description': 'contrast of left-side stimulus (0...1) nan if trial is on other side',\n 'filename_pattern': '*trials.contrastLeft.*'}" }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "one.describe_dataset(dsets[3])" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "To find the datasets associated with a specific experiment we can pass in an eid argument" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 3, "outputs": [ { "data": { "text/plain": "['alf/_ibl_bodyCamera.times.npy',\n 'alf/_ibl_leftCamera.times.npy',\n 'alf/_ibl_rightCamera.times.npy',\n 'alf/_ibl_trials.choice.npy',\n 'alf/_ibl_trials.contrastLeft.npy',\n 'alf/_ibl_trials.contrastRight.npy',\n 'alf/_ibl_trials.feedbackType.npy',\n 'alf/_ibl_trials.feedback_times.npy',\n 'alf/_ibl_trials.firstMovement_times.npy',\n 'alf/_ibl_trials.goCueTrigger_times.npy',\n 'alf/_ibl_trials.goCue_times.npy',\n 'alf/_ibl_trials.intervals.npy',\n 'alf/_ibl_trials.intervals_bpod.npy',\n 'alf/_ibl_trials.itiDuration.npy',\n 'alf/_ibl_trials.probabilityLeft.npy',\n 'alf/_ibl_trials.response_times.npy',\n 'alf/_ibl_trials.rewardVolume.npy',\n 'alf/_ibl_trials.stimOff_times.npy',\n 'alf/_ibl_trials.stimOn_times.npy',\n 'alf/_ibl_wheel.position.npy',\n 'alf/_ibl_wheel.times.npy',\n 'alf/_ibl_wheel.timestamps.npy',\n 'alf/_ibl_wheelMoves.intervals.npy',\n 'alf/_ibl_wheelMoves.peakAmplitude.npy',\n 'alf/probe01/_kilosort_whitening.matrix.npy',\n 'alf/probe01/_phy_spikes_subset.channels.npy',\n 'alf/probe01/_phy_spikes_subset.spikes.npy',\n 'alf/probe01/_phy_spikes_subset.waveforms.npy',\n 'alf/probe01/channels.brainLocationIds_ccf_2017.npy',\n 'alf/probe01/channels.localCoordinates.npy',\n 'alf/probe01/channels.mlapdv.npy',\n 'alf/probe01/channels.rawInd.npy',\n 'alf/probe01/clusters.amps.npy',\n 'alf/probe01/clusters.brainLocationAcronyms_ccf_2017.npy',\n 'alf/probe01/clusters.brainLocationIds_ccf_2017.npy',\n 'alf/probe01/clusters.channels.npy',\n 'alf/probe01/clusters.depths.npy',\n 'alf/probe01/clusters.metrics.pqt',\n 'alf/probe01/clusters.mlapdv.npy',\n 'alf/probe01/clusters.peakToTrough.npy',\n 'alf/probe01/clusters.uuids.csv',\n 'alf/probe01/clusters.waveforms.npy',\n 'alf/probe01/clusters.waveformsChannels.npy',\n 'alf/probe01/spikes.amps.npy',\n 'alf/probe01/spikes.clusters.npy',\n 'alf/probe01/spikes.depths.npy',\n 'alf/probe01/spikes.samples.npy',\n 'alf/probe01/spikes.templates.npy',\n 'alf/probe01/spikes.times.npy',\n 'alf/probe01/templates.amps.npy',\n 'alf/probe01/templates.waveforms.npy',\n 'alf/probe01/templates.waveformsChannels.npy',\n 'alf/probes.description.json',\n 'alf/probes.trajectory.json',\n 'raw_behavior_data/_iblrig_ambientSensorData.raw.jsonable',\n 'raw_behavior_data/_iblrig_codeFiles.raw.zip',\n 'raw_behavior_data/_iblrig_encoderEvents.raw.ssv',\n 'raw_behavior_data/_iblrig_encoderPositions.raw.ssv',\n 'raw_behavior_data/_iblrig_encoderTrialInfo.raw.ssv',\n 
'raw_behavior_data/_iblrig_taskData.raw.jsonable',\n 'raw_behavior_data/_iblrig_taskSettings.raw.json',\n 'raw_ephys_data/_spikeglx_ephysData_g0_t0.nidq.cbin',\n 'raw_ephys_data/_spikeglx_ephysData_g0_t0.nidq.ch',\n 'raw_ephys_data/_spikeglx_ephysData_g0_t0.nidq.meta',\n 'raw_ephys_data/_spikeglx_ephysData_g0_t0.nidq.wiring.json',\n 'raw_ephys_data/_spikeglx_sync.channels.npy',\n 'raw_ephys_data/_spikeglx_sync.polarities.npy',\n 'raw_ephys_data/_spikeglx_sync.times.npy',\n 'raw_ephys_data/probe01/_iblqc_ephysSpectralDensityAP.freqs.npy',\n 'raw_ephys_data/probe01/_iblqc_ephysSpectralDensityAP.power.npy',\n 'raw_ephys_data/probe01/_iblqc_ephysSpectralDensityLF.freqs.npy',\n 'raw_ephys_data/probe01/_iblqc_ephysSpectralDensityLF.power.npy',\n 'raw_ephys_data/probe01/_iblqc_ephysTimeRmsAP.rms.npy',\n 'raw_ephys_data/probe01/_iblqc_ephysTimeRmsAP.timestamps.npy',\n 'raw_ephys_data/probe01/_iblqc_ephysTimeRmsLF.rms.npy',\n 'raw_ephys_data/probe01/_iblqc_ephysTimeRmsLF.timestamps.npy',\n 'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.ap.cbin',\n 'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.ap.ch',\n 'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.ap.meta',\n 'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.lf.cbin',\n 'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.lf.ch',\n 'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.lf.meta',\n 'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.sync.npy',\n 'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.timestamps.npy',\n 'raw_ephys_data/probe01/_spikeglx_ephysData_g0_t0.imec1.wiring.json',\n 'raw_ephys_data/probe01/_spikeglx_sync.channels.probe01.npy',\n 'raw_ephys_data/probe01/_spikeglx_sync.polarities.probe01.npy',\n 'raw_ephys_data/probe01/_spikeglx_sync.times.probe01.npy',\n 'raw_video_data/_iblrig_bodyCamera.raw.mp4',\n 'raw_video_data/_iblrig_bodyCamera.timestamps.ssv',\n 'raw_video_data/_iblrig_leftCamera.raw.mp4',\n 'raw_video_data/_iblrig_leftCamera.timestamps.ssv',\n 'raw_video_data/_iblrig_rightCamera.raw.mp4',\n 'raw_video_data/_iblrig_rightCamera.timestamps.ssv',\n 'spike_sorters/ks2_matlab/probe01/_kilosort_raw.output.tar']" }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eid = 'KS023/2019-12-10/001'\n", "\n", "# All datasets for specific session\n", "one.list_datasets(eid)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "We can also list the available collections by using," ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 4, "outputs": [ { "data": { "text/plain": "['raw_ephys_data/probe01',\n 'alf',\n 'alf/probe01',\n 'raw_video_data',\n 'raw_behavior_data',\n 'raw_ephys_data',\n 'spike_sorters/ks2_matlab/probe01']" }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# All collections in database\n", "collections = one.list_collections()\n", "\n", "# All collections for specific session\n", "one.list_collections(eid)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Revisions can be listed in a similar way," ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 5, "outputs": [], "source": [ "# All revisions in database\n", "revisions = one.list_revisions()\n", "\n", "# All revisions for specific session\n", "revisions = one.list_revisions(eid)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" 
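} } }, { "cell_type": "markdown", "source": [ "By default these methods return names (for datasets, paths relative to the session folder). If you want the full records instead, a `details` flag may be available depending on your ONE version; the sketch below assumes it is and that it returns a pandas DataFrame:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "# Sketch only: assumes the `details` keyword is supported by the installed ONE version,\n", "# in which case a pandas DataFrame of dataset records is returned instead of a list of paths\n", "dset_details = one.list_datasets(eid, details=True)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n"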
} } }, { "cell_type": "markdown", "source": [ "The final useful list method allows you to search for subjects in the database," ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 6, "outputs": [], "source": [ "# All subjects in the database\n", "subjects = one.list_subjects()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "For more examples of file organization, including the use of dataset revisions, see [this guide](../alyx_files).\n" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "## Filtering lists\n", "`list_collections`, `list_revisions` and `list_datasets` can be called with various filter\n", "arguments. Collections and datasets may be filtered using wildcards, or regular expressions when\n", "the wildcard flag is set to False. For more information on using wildcards,\n", "see the [ONE load guide](../one_load/one_load.html#Advanced-loading).\n", "\n", "Below are some examples of filtering with wildcards:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 7, "outputs": [], "source": [ "# All datasets for specific session in alf/probe01 collection:\n", "datasets = one.list_datasets(eid, collection='alf/probe01')\n", "\n", "# All datasets for a specific session in any probe collection:\n", "datasets = one.list_datasets(eid, collection='*probe*')\n", "\n", "# All collections that contain datasets with 'spikes' in the name:\n", "collections = one.list_collections(eid, filename='*spikes*')\n", "\n", "# All datasets with 'raw' in the name:\n", "datasets = one.list_datasets(eid, '*raw*')\n", "\n", "# All datasets with a QC value less than or equal to 'WARNING' (i.e. includes 'PASS', 'NOT_SET' also):\n", "datasets = one.list_datasets(eid, qc='WARNING')\n", "\n", "# All QC'd datasets with a value less than or equal to 'WARNING' (i.e. 
'WARNING' or 'PASS'):\n", "datasets = one.list_datasets(eid, qc='WARNING', ignore_qc_not_set=True)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Note that for `list_datasets` and `list_collections` a provided revision name can't include\n", "wildcards.\n", "\n", "For `list_datasets`, the resulting list will include only the datasets in the given\n", "revision, or the previous revision (alphabetically) if the provided revision doesn't exist.\n", "\n", "For `list_collections`, the resulting list will include only the collections containing the given\n", "revision, or the previous revision (alphabetically) if the provided revision doesn't exist.\n", "\n", "For examples, to list collections that start with the word 'raw' and contain revisions on or before\n", "'2020-01-01'" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 8, "outputs": [], "source": [ "collections = one.list_collections(eid, collection='raw*', revision='2020-01-01')" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "For `list_revisions`, the provided revision keyword arg works like the collections arg and may\n", "include wildcards.\n", "\n", "For example, to list revisions for a given session that begin with '2020' or '2021':" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 9, "outputs": [], "source": [ "revisions = one.list_revisions(eid, revision='202[01]')" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "The dataset and collection filters may be a list of strings, constituting a logical OR. For\n", "example to list datasets containing either 'spikes' or 'clusters':" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 10, "outputs": [ { "data": { "text/plain": "['alf/probe01/_phy_spikes_subset.channels.npy',\n 'alf/probe01/_phy_spikes_subset.spikes.npy',\n 'alf/probe01/_phy_spikes_subset.waveforms.npy',\n 'alf/probe01/clusters.amps.npy',\n 'alf/probe01/clusters.brainLocationAcronyms_ccf_2017.npy',\n 'alf/probe01/clusters.brainLocationIds_ccf_2017.npy',\n 'alf/probe01/clusters.channels.npy',\n 'alf/probe01/clusters.depths.npy',\n 'alf/probe01/clusters.metrics.pqt',\n 'alf/probe01/clusters.mlapdv.npy',\n 'alf/probe01/clusters.peakToTrough.npy',\n 'alf/probe01/clusters.uuids.csv',\n 'alf/probe01/clusters.waveforms.npy',\n 'alf/probe01/clusters.waveformsChannels.npy',\n 'alf/probe01/spikes.amps.npy',\n 'alf/probe01/spikes.clusters.npy',\n 'alf/probe01/spikes.depths.npy',\n 'alf/probe01/spikes.samples.npy',\n 'alf/probe01/spikes.templates.npy',\n 'alf/probe01/spikes.times.npy']" }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "one.list_datasets(eid, ['*spikes*', '*clusters*'])" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Dataset/filename filters can either be a string or a dict of ALF parts, each containing either\n", " a string or list of strings. This allows very specific part matching. 
For example, to filter\n", " datasets that have either the 'intervals' or 'timestamps' attributes, and are npy files:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 11, "outputs": [ { "data": { "text/plain": "['alf/_ibl_trials.intervals.npy',\n 'alf/_ibl_trials.intervals_bpod.npy',\n 'alf/_ibl_wheel.timestamps.npy',\n 'alf/_ibl_wheelMoves.intervals.npy',\n 'raw_ephys_data/probe01/_iblqc_ephysTimeRmsAP.timestamps.npy',\n 'raw_ephys_data/probe01/_iblqc_ephysTimeRmsLF.timestamps.npy']" }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "one.list_datasets(eid, filename={'attribute': ['timestamps', 'intervals'], 'extension': 'npy'})" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "## Combining with load methods\n", "The list methods are useful in combination with the load methods. For example, the output of\n", "the `list_datasets` method can be a direct input of the `load_datasets` method. Here we load all\n", "spike and cluster datasets where the QC is either PASS or NOT_SET:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 12, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 3/3.0 [00:02<00:00, 1.02it/s]\n" ] } ], "source": [ "datasets = one.list_datasets(eid, ['*spikes*', '*clusters*'], qc='PASS', ignore_qc_not_set=False)\n", "data, records = one.load_datasets(eid, datasets)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Likewise with collections, for example to load all data within the 'alf/probe*' collections:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 14, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 3/3.0 [00:03<00:00, 1.09s/it]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "dict_keys(['alf/probe01'])\n", "dict_keys(['spikes', 'channels', 'whitening', 'clusters', 'templates', 'spikes_subset'])\n" ] } ], "source": [ "collections = one.list_collections(eid, collection='alf/probe*')\n", "# Build a dictionary of collections containing bunches of objects\n", "data = {key: one.load_collection(eid, key) for key in collections}\n", "\n", "print(data.keys())\n", "print(data['alf/probe01'].keys())" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Here we load the spike data for a collection that also contains cluster data" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "collections = one.list_collections(eid, filename='clusters*')\n", "spikes = one.load_object(eid, 'spikes', collection=collections[0])" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "## Listing aggregate datasets\n", "All raw and preprocessed data are stored at the session level, however some datasets are aggregated\n", "over a subject, project, or tag (called a 'relation'). Such datasets are known as aggregates and\n", "can be listed and filtered using the `list_aggregates` method. Unlike `list_datasets`, `list_aggregates`\n", "always returns a pandas DataFrame object.\n", "\n", "
\n", "Note.\n", "\n", "NB: This method is only available in 'remote' mode.\n", "
" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "subject = 'SWC_043'\n", "subject_aggregates = one.list_aggregates('subjects', subject)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }