ibllib.oneibl.data_handlers

Downloading of task dependent datasets and registration of task output datasets.

The DataHandler class is used by the pipes.tasks.Task class to ensure dependent datasets are present and to register and upload the output datasets. For examples of how to run a task using a specific data handler, see ibllib.pipes.tasks.

Classes

DataHandler

LocalDataHandler

RemoteAwsDataHandler

RemoteGlobusDataHandler

Data handler for running tasks on remote compute node.

RemoteHttpDataHandler

SDSCDataHandler

Data handler for running tasks on SDSC compute node

ServerDataHandler

ServerGlobusDataHandler

class DataHandler(session_path, signature, one=None)[source]

Bases: ABC

setUp()[source]

Function to optionally overload to download required data to run task.

getData(one=None)[source]

Finds the datasets required for task based on input signatures.

getOutputFiles()[source]

uploadData(outputs, version)[source]

Function to optionally overload to upload and register data

Parameters:
  • outputs – output files from task to register

  • version – ibllib version

Returns:

cleanUp()[source]

Function to optionally overload to clean up files after running task.
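The hooks above suggest a simple lifecycle: set up, run, upload, clean up. The following is a minimal sketch of that pattern and does not use ibllib itself. SketchDataHandler, LoggingHandler and run_with_handler are hypothetical stand-ins, the signature dictionary keys are an assumption, and calling cleanUp in a finally block is an assumption about how a task might drive a handler.

```python
from abc import ABC
from pathlib import Path


class SketchDataHandler(ABC):
    """Hypothetical stand-in mirroring the documented DataHandler hooks."""

    def __init__(self, session_path, signature, one=None):
        self.session_path = Path(session_path)
        self.signature = signature
        self.one = one

    def setUp(self):
        """Optionally overload to fetch the data required by the task."""

    def uploadData(self, outputs, version):
        """Optionally overload to upload and register output data."""
        return None

    def cleanUp(self):
        """Optionally overload to remove files once the task has run."""


class LoggingHandler(SketchDataHandler):
    """Example subclass that records each lifecycle call it receives."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = []

    def setUp(self):
        self.calls.append('setUp')

    def uploadData(self, outputs, version):
        self.calls.append('uploadData')
        # Pretend registration: one record per output file.
        return [{'name': Path(f).name, 'version': version} for f in outputs]

    def cleanUp(self):
        self.calls.append('cleanUp')


def run_with_handler(handler, run, version):
    """Assumed lifecycle: set up, run, upload, then always clean up."""
    handler.setUp()
    try:
        outputs = run()
        return handler.uploadData(outputs, version)
    finally:
        handler.cleanUp()


handler = LoggingHandler('/tmp/session', {'input_files': [], 'output_files': []})
records = run_with_handler(handler, lambda: ['/tmp/session/alf/a.npy'], '1.0.0')
```

Because every hook has a default no-op body, a subclass only needs to override the steps that differ for its environment.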

class LocalDataHandler(session_path, signatures, one=None)[source]

Bases: DataHandler

class ServerDataHandler(session_path, signatures, one=None)[source]

Bases: DataHandler

uploadData(outputs, version, clobber=False, **kwargs)[source]

Upload and/or register output data.

This is typically called by ibllib.pipes.tasks.Task.register_datasets().

Parameters:
  • outputs (list of pathlib.Path) – A set of ALF paths to register to Alyx.

  • version (str, list of str) – The version of ibllib used to generate these output files.

  • clobber (bool) – If True, re-upload outputs that have already been passed to this method.

  • kwargs – Optional keyword arguments for one.registration.RegistrationClient.register_files.

Returns:

A list of newly created Alyx dataset records or the registration data if dry.

Return type:

list of dicts, dict

cleanUp()[source]

Empties and returns the processed dataset map.
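The interaction between clobber and the processed dataset map can be illustrated with a small sketch. This is not the ibllib implementation: ProcessedMapSketch is a hypothetical class, the record dictionaries are placeholders for Alyx dataset records, and the assumption is that the map is keyed by output path.

```python
from pathlib import Path


class ProcessedMapSketch:
    """Hypothetical handler that remembers which outputs it has handled."""

    def __init__(self):
        self.processed = {}  # maps output path -> placeholder record

    def uploadData(self, outputs, version, clobber=False):
        # Accept one version for all outputs, or one version per output.
        if isinstance(version, str):
            versions = [version] * len(outputs)
        else:
            versions = list(version)
        if len(versions) != len(outputs):
            raise ValueError('expected one version per output')
        new_records = []
        for path, ver in zip(map(Path, outputs), versions):
            if path in self.processed and not clobber:
                continue  # already passed to this method; skip it
            record = {'name': path.name, 'version': ver}
            self.processed[path] = record
            new_records.append(record)
        return new_records

    def cleanUp(self):
        # Empty the map and hand the previous contents back to the caller.
        processed, self.processed = self.processed, {}
        return processed


sketch = ProcessedMapSketch()
first = sketch.uploadData(['alf/spikes.times.npy'], '2.0')
repeat = sketch.uploadData(['alf/spikes.times.npy'], '2.0')
forced = sketch.uploadData(['alf/spikes.times.npy'], '2.1', clobber=True)
returned_map = sketch.cleanUp()
```

In this sketch the second call returns an empty list because the file was already handled, while the call with clobber=True processes it again.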

class ServerGlobusDataHandler(session_path, signatures, one=None)[source]

Bases: DataHandler

setUp()[source]

Function to download necessary data to run tasks using globus-sdk.

uploadData(outputs, version, **kwargs)[source]

Function to upload and register data of completed task

Parameters:
  • outputs – output files from task to register

  • version – ibllib version

Returns:

output info of registered datasets

cleanUp()[source]

Clean up, remove the files that were downloaded from Globus once task has completed.

class RemoteHttpDataHandler(session_path, signature, one=None)[source]

Bases: DataHandler

setUp()[source]

Function to download necessary data to run tasks using ONE.

uploadData(outputs, version, **kwargs)[source]

Function to upload and register data of completed task via FTP patcher

Parameters:
  • outputs – output files from task to register

  • version – ibllib version

Returns:

output info of registered datasets

class RemoteAwsDataHandler(task, session_path, signature, one=None)[source]

Bases: DataHandler

setUp()[source]

Function to download necessary data to run tasks using AWS boto3.

uploadData(outputs, version, **kwargs)[source]

Function to upload and register data of completed task via FTP patcher

Parameters:
  • outputs – output files from task to register

  • version – ibllib version

Returns:

output info of registered datasets

cleanUp()[source]

Clean up, remove the files that were downloaded from AWS once the task has completed.

class RemoteGlobusDataHandler(session_path, signature, one=None)[source]

Bases: DataHandler

Data handler for running tasks on remote compute node. Will download missing data using Globus.

Parameters:
  • session_path – path to session

  • signature – input and output file signatures

  • one – ONE instance

setUp()[source]

Function to download necessary data to run tasks using globus.

uploadData(outputs, version, **kwargs)[source]

Function to upload and register data of completed task via FTP patcher

Parameters:
  • outputs – output files from task to register

  • version – ibllib version

Returns:

output info of registered datasets

class SDSCDataHandler(task, session_path, signatures, one=None)[source]

Bases: DataHandler

Data handler for running tasks on SDSC compute node

Parameters:
  • session_path – path to session

  • signatures – input and output file signatures

  • one – ONE instance

setUp()[source]

Function to create symlinks to necessary data to run tasks.

uploadData(outputs, version, **kwargs)[source]

Function to upload and register data of completed task via SDSC patcher

Parameters:
  • outputs – output files from task to register

  • version – ibllib version

Returns:

output info of registered datasets

cleanUp()[source]

Function to clean up symlinks created to run task.
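The symlink approach described for setUp and cleanUp can be sketched with the standard library alone. The helpers link_inputs and unlink_inputs, and the archive and scratch directory names, are hypothetical and only illustrate the idea; the actual SDSC paths and logic are not shown in this page.

```python
import tempfile
from pathlib import Path


def link_inputs(source_root, work_root, relative_paths):
    """Create one symlink per input file inside a scratch session folder."""
    links = []
    for rel in relative_paths:
        target = Path(source_root) / rel
        link = Path(work_root) / rel
        link.parent.mkdir(parents=True, exist_ok=True)
        link.symlink_to(target)
        links.append(link)
    return links


def unlink_inputs(links):
    """Remove only the symlinks; the files they point to are left alone."""
    for link in links:
        if link.is_symlink():
            link.unlink()


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / 'archive'   # stands in for the read-only data store
    work = Path(tmp) / 'scratch'     # stands in for the task working folder
    dataset = Path('alf') / 'spikes.times.npy'
    (source / 'alf').mkdir(parents=True)
    (source / dataset).write_bytes(b'data')

    links = link_inputs(source, work, [dataset])
    linked_ok = links[0].read_bytes() == b'data'

    unlink_inputs(links)
    source_intact = (source / dataset).exists()
    link_removed = not links[0].exists()
```

Linking rather than copying avoids duplicating large files, and removing only the links leaves the original data untouched.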