Datajoint Introductory Tutorial

In this tutorial we will use datajoint to replicate the analysis we conducted in the ONE tutorial

This tutorial assumes that you have setup the unified ibl environment IBL python environment and set up your Datajoint credentials.

First let’s import datajoint

[1]:
import datajoint as dj
# for the purposes of tutorial limit the table print output to 5
dj.config['display.limit'] = 5

We can access datajoint tables by importing schemas from the IBL-pipeline. Let’s import the subject schema

[2]:
from ibl_pipeline import subject
Connecting mayofaulkner@datajoint.internationalbrainlab.org:3306

Within this schema there is a datajoint table called Subject. This holds all the information about subjects registered on Alyx under IBL projects. Let’s access this table and look at the first couple of entries

[3]:
subjects = subject.Subject()
subjects
Out[3]:

subject_uuid

subject_nickname

nickname

sex

sex

subject_birth_date

birth date

ear_mark

ear mark

subject_line

name

subject_source

name of source

protocol_number

protocol number

subject_description

subject_ts

subject_strain

0026c82d-39e4-4c6b-acb3-303eb4b24f05 IBL_32 M 2018-04-23 None C57BL/6NCrl Charles River 1 None 2019-08-06 21:30:42 C57BL/6N
00778394-c956-408d-8a6c-ca3b05a611d5 KS019 F 2019-04-30 None C57BL/6J None 2 None 2019-08-13 17:07:33 C57BL/6J
00c60db3-74c3-4ee2-9df9-2c84acf84e92 ibl_witten_10 F 2018-11-13 notag C57BL/6J Jax 3 None 2019-09-25 01:33:25 C57BL/6J
0124f697-16ce-4f59-b87c-e53fcb3a27ac 6867 M 2018-06-25 None Thy1-GCaMP6s CCU - Margarida colonies 1 None 2019-09-25 01:33:25 C57BL/6J
019a22c1-b944-4494-9e38-0e45ae6697bf SWC_022 M 2019-06-18 NA (Front HP) C57BL/6J Charles River 4 ID: 990762 2019-09-25 01:33:25 C57BL/6J

...

Total: 856

Next, we will find the entry in the table for the same subject that we looked at in the ONE tutorial, KS022. To do this we will restrict the Subject table by the subject nickname

[4]:
subjects & 'subject_nickname = "KS022"'
Out[4]:

subject_uuid

subject_nickname

nickname

sex

sex

subject_birth_date

birth date

ear_mark

ear mark

subject_line

name

subject_source

name of source

protocol_number

protocol number

subject_description

subject_ts

subject_strain

b57c1934-f9d1-4dc4-a474-e2cb4acdf918 KS022 M 2019-06-25 None C57BL/6J None 3 None 2019-09-20 04:28:14 C57BL/6J

Total: 1

We now want to find information about the behavioural sessions. This information is stored in a table Session defined in the acquisition schema. Let’s import this schema, access the table and display the first few entries

[5]:
from ibl_pipeline import acquisition
sessions = acquisition.Session()
sessions
Out[5]:

subject_uuid

session_start_time

start time

session_uuid

session_number

number

session_end_time

end time

session_lab

name of lab

session_location

name of the location

task_protocol

session_type

type

session_narrative

session_ts

00778394-c956-408d-8a6c-ca3b05a611d5 2019-08-07 10:49:41 0955e23e-90cf-4250-a092-82fabdf67a24 1 2019-08-07 11:23:14 cortexlab _iblrig_cortexlab_behavior_3 _iblrig_tasks_habituationChoiceWorld5.2.5 Experiment None 2019-08-13 17:26:00
00778394-c956-408d-8a6c-ca3b05a611d5 2019-08-08 15:32:40 996167fa-07bc-4afe-b7b9-4aee0ba75c1f 1 2019-08-08 16:19:23 cortexlab _iblrig_cortexlab_behavior_3 _iblrig_tasks_habituationChoiceWorld5.2.5 Experiment None 2019-08-13 17:26:01
00778394-c956-408d-8a6c-ca3b05a611d5 2019-08-09 10:37:49 13e38e6c-0cd6-41f9-a2ab-8d6021fea035 1 2019-08-09 11:40:12 cortexlab _iblrig_cortexlab_behavior_3 _iblrig_tasks_habituationChoiceWorld5.2.7 Experiment None 2019-08-13 17:26:00
00778394-c956-408d-8a6c-ca3b05a611d5 2019-08-10 11:24:59 fb9bdf18-76be-452b-ac4e-21d5de3a6f9f 1 2019-08-10 12:11:04 cortexlab _iblrig_cortexlab_behavior_3 _iblrig_tasks_trainingChoiceWorld5.2.7 Experiment None 2019-08-13 17:26:06
00778394-c956-408d-8a6c-ca3b05a611d5 2019-08-12 09:21:03 d47e9a4c-18dc-4d4d-991c-d30059ec2cbd 1 2019-08-12 10:07:19 cortexlab _iblrig_cortexlab_behavior_3 _iblrig_tasks_trainingChoiceWorld5.2.7 Experiment None 2019-08-14 12:37:53

...

Total: 26634

If we look at the primary keys (columns with black headings) in the Subjects and Sessions table, we will notice that both contain subject_uuid as a primary key. This means that these two tables can be joined using *****.

We want to find information about all the sessions that KS022 did in the training phase of the IBL training pipeline. When combining the tables we will therefore restrict the Subject table by the subject nickname (as we did before) and the Sessions table by the task protocol

[6]:
(subjects & 'subject_nickname = "KS022"') * (sessions & 'task_protocol LIKE "%training%"')
Out[6]:

subject_uuid

session_start_time

start time

subject_nickname

nickname

sex

sex

subject_birth_date

birth date

ear_mark

ear mark

subject_line

name

subject_source

name of source

protocol_number

protocol number

subject_description

subject_ts

subject_strain

session_uuid

session_number

number

session_end_time

end time

session_lab

name of lab

session_location

name of the location

task_protocol

session_type

type

session_narrative

session_ts

b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-23 13:43:40 KS022 M 2019-06-25 None C57BL/6J None 3 None 2019-09-20 04:28:14 C57BL/6J 242ed7aa-6e02-4d55-b003-874b71357074 1 2019-09-23 14:28:22 cortexlab _iblrig_cortexlab_behavior_3 _iblrig_tasks_trainingChoiceWorld5.3.0 Experiment None 2019-09-24 04:51:41
b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-24 10:41:18 KS022 M 2019-06-25 None C57BL/6J None 3 None 2019-09-20 04:28:14 C57BL/6J be7f7832-006f-4f79-8079-6dea549c90c0 1 2019-09-24 11:27:11 cortexlab _iblrig_cortexlab_behavior_3 _iblrig_tasks_trainingChoiceWorld5.3.0 Experiment None 2019-09-25 04:51:58
b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-25 09:37:26 KS022 M 2019-06-25 None C57BL/6J None 3 None 2019-09-20 04:28:14 C57BL/6J c4f55950-26ac-4cb6-be38-57d9e604f5fc 1 2019-09-25 10:23:20 cortexlab _iblrig_cortexlab_behavior_3 _iblrig_tasks_trainingChoiceWorld5.3.0 Experiment None 2019-09-26 04:53:06
b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-26 14:43:24 KS022 M 2019-06-25 None C57BL/6J None 3 None 2019-09-20 04:28:14 C57BL/6J 39a0993b-ea91-4d7d-aa8b-84365ae77a81 1 2019-09-26 15:31:48 cortexlab _iblrig_cortexlab_behavior_3 _iblrig_tasks_trainingChoiceWorld5.3.0 Experiment None 2019-09-27 04:53:22
b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-27 10:21:16 KS022 M 2019-06-25 None C57BL/6J None 3 None 2019-09-20 04:28:14 C57BL/6J f1eb053b-3cc2-45df-bbd5-2f5f54f05d23 1 2019-09-27 11:06:09 cortexlab _iblrig_cortexlab_behavior_3 _iblrig_tasks_trainingChoiceWorld5.3.0 Experiment None 2019-09-28 04:53:37

...

Total: 19

There is a lot of information in this table and we are not interested in all of it for the purposes of our analysis. Let’s just use the proj method to restrict the data presented. We do not want any columns from the Subject table (apart from the primary keys) and only want session_uuid from the Sesssions table. So we can write,

[7]:
((subjects & 'subject_nickname = "KS022"').proj() *
 (sessions & 'task_protocol LIKE "%training%"')).proj('session_uuid')
Out[7]:

subject_uuid

session_start_time

start time

session_uuid

b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-23 13:43:40 242ed7aa-6e02-4d55-b003-874b71357074
b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-24 10:41:18 be7f7832-006f-4f79-8079-6dea549c90c0
b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-25 09:37:26 c4f55950-26ac-4cb6-be38-57d9e604f5fc
b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-26 14:43:24 39a0993b-ea91-4d7d-aa8b-84365ae77a81
b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-27 10:21:16 f1eb053b-3cc2-45df-bbd5-2f5f54f05d23

...

Total: 19

Note

In the above expression we first used proj and then joined the tables using ** * . We could have also joined the tables first and then usedproj**,

 ((subjects & 'subject_nickname = "KS022"') * (sessions & 'task_protocol LIKE "%training%"')
).proj('session_uuid')

If we look back to the ONE tutorial you will notice that we have the same number of training sessions and that session_uuid corresponds to what we previously defined as the eID

Up until now we have been inspecting the content of the tables but do not actually have access to the content. This is because we have not actually read the data from the tables into memory yet. For this we would need to use the fetch command. Let’s fetch the session uuid information into a pandas dataframe

[8]:
eids = ((subjects & 'subject_nickname = "KS022"').proj() *
        (sessions & 'task_protocol LIKE "%training%"').proj('session_uuid')).fetch(format='frame')
[9]:
eids
Out[9]:
session_uuid
subject_uuid session_start_time
b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-23 13:43:40 242ed7aa-6e02-4d55-b003-874b71357074
2019-09-24 10:41:18 be7f7832-006f-4f79-8079-6dea549c90c0
2019-09-25 09:37:26 c4f55950-26ac-4cb6-be38-57d9e604f5fc
2019-09-26 14:43:24 39a0993b-ea91-4d7d-aa8b-84365ae77a81
2019-09-27 10:21:16 f1eb053b-3cc2-45df-bbd5-2f5f54f05d23
2019-10-01 11:01:44 a07e255f-cf31-47aa-9667-5c62b93443e0
2019-10-02 10:02:36 fc7b7ea7-55c4-48fd-bb7b-2422c741fe8e
2019-10-03 14:15:32 22f33af3-5e77-4517-94b9-2cfefea27a10
2019-10-04 10:52:19 7e508dce-e08c-4481-b634-1a9358673aa5
2019-10-07 10:59:27 548ca39d-8898-460c-a930-785083b4127d
2019-10-08 11:19:34 3f815a05-a114-4a16-96ef-b398bd5a0b88
2019-10-10 11:00:46 f6c9daee-3e1e-4a5d-b372-7be8fcda5512
2019-10-11 11:01:47 2f7e07d4-d713-46a1-8bce-6b497af405fc
2019-10-14 11:30:31 228b29b2-fb77-40c9-bd32-953fd5297896
2019-10-15 10:32:56 9ee8bd6b-1e5e-4440-bc2e-7c001abe7b60
2019-10-17 11:32:30 777ecfb9-616f-4ff9-a8d9-eb38f897a19f
2019-10-18 11:52:53 4e626e22-747e-4324-96f1-9827b7ca950d
2019-10-21 11:44:49 dd4ad209-fcff-4252-9e38-44cba17f7603
2019-10-22 11:48:05 b4ac5904-5693-4d98-adb3-362e612836be

Now that we have access to our list of session eIDs, we can get trial information associated with these sessions. The output from the trials dataset is stored in a table called PyschResults. We can import this from the analyses.behaviour schema

[10]:
from ibl_pipeline.analyses import behavior
trials = behavior.PsychResults()
trials
Connected to https://alyx.internationalbrainlab.org as mayo
Out[10]:

subject_uuid

session_start_time

start time

performance

percentage correct in this session

performance_easy

percentage correct of easy trials in this session

signed_contrasts

contrasts used in this session, negative when on the left

n_trials_stim

number of trials for each contrast

n_trials_stim_right

number of reporting "right" trials for each contrast

prob_choose_right

probability of choosing right, same size as contrasts

threshold

bias

lapse_low

lapse_high

00778394-c956-408d-8a6c-ca3b05a611d5 2019-08-10 11:24:59 0.367347 0.367347 =BLOB= =BLOB= =BLOB= =BLOB= 29.7063 -58.2102 0.823603 0.409092
00778394-c956-408d-8a6c-ca3b05a611d5 2019-08-12 09:21:03 0.4 0.4 =BLOB= =BLOB= =BLOB= =BLOB= 21.0607 -58.2391 0.609229 0.181818
00778394-c956-408d-8a6c-ca3b05a611d5 2019-08-13 10:28:45 0.408072 0.408072 =BLOB= =BLOB= =BLOB= =BLOB= 9.4945 -0.13251 0.684564 0.364865
00778394-c956-408d-8a6c-ca3b05a611d5 2019-08-14 09:37:17 0.4 0.4 =BLOB= =BLOB= =BLOB= =BLOB= 3.28746 -67.9662 0.772727 0.454545
00778394-c956-408d-8a6c-ca3b05a611d5 2019-08-14 11:35:16 0.463668 0.463668 =BLOB= =BLOB= =BLOB= =BLOB= 10.4665 99.3649 0.72314 0.0373897

...

Total: 23408

Let’s get the trial information for the first day KS022 trained. We will restrict the Sessions table by the day 1 eID and combine this with the trials table

[11]:
eid_day1 = dict(session_uuid=eids['session_uuid'][0])
trials_day1 = ((sessions & eid_day1).proj() * trials).fetch(format='frame')
trials_day1
Out[11]:
performance performance_easy signed_contrasts n_trials_stim n_trials_stim_right prob_choose_right threshold bias lapse_low lapse_high
subject_uuid session_start_time
b57c1934-f9d1-4dc4-a474-e2cb4acdf918 2019-09-23 13:43:40 0.515432 0.515432 [-1.0, -0.5, 0.5, 1.0] [81, 90, 73, 80] [41, 51, 50, 38] [0.5061728395061729, 0.5666666666666667, 0.684... 7.23087 86.8559 0.581967 0.525546

Notice how a lot of the metrics that we manually computed from the trials dataset in the previous ONE tutorial have already been computed for us and are available in the trials table. This is one advantage of Datajoint, common computations such as performance can be computed when the data is ingested and stored in tables.

We can find out which visual stimulus contrasts were presented to KS022 on day 1 and how many of each contrast by looking at the signed_contrasts and n_trials_stim attributes

[12]:
contrasts = trials_day1['signed_contrasts'].to_numpy()[0]
n_contrasts = trials_day1['n_trials_stim'].to_numpy()[0]
print(f"Visual stimulus contrasts on day 1 = {contrasts * 100}")
print(f"No. of each contrast on day 1 = {n_contrasts}")
Visual stimulus contrasts on day 1 = [-100.  -50.   50.  100.]
No. of each contrast on day 1 = [81 90 73 80]

We can easily extract the performance by typing

[13]:
print(f"Correct = { trials_day1['performance'].to_numpy()[0] * 100} %")
Correct = 51.5432 %

We can plot the peformance at each contrast by looking at the prob_choose_right attribute. The results stored in Datajoint are already expressed in terms of rightward choice, so we don’t need worry about converting any computations

[14]:
contrast_performance = trials_day1['prob_choose_right'].to_numpy()[0]

import matplotlib.pyplot as plt
plt.plot(contrasts * 100, contrast_performance * 100, 'o-', lw=3, ms=10)
plt.ylim([0, 100])
plt.xticks([*(contrasts * 100)])
plt.xlabel('Signed contrast (%)')
plt.ylabel('Rightward choice (%)')

print(contrast_performance)
[0.50617284 0.56666667 0.68493151 0.475     ]
../../_images/notebooks_dj_intro_dj_intro_28_1.png

Let’s now repeat this for day 15 of training

[15]:
eid_day15 = dict(session_uuid=eids['session_uuid'][14])
trials_day15 = ((sessions & eid_day15).proj() * trials).fetch(format='frame')
contrasts = trials_day15['signed_contrasts'].to_numpy()[0]
n_contrasts = trials_day15['n_trials_stim'].to_numpy()[0]
print(f"Visual stimulus contrasts on day 1 = {contrasts * 100}")
print(f"No. of each contrast on day 1 = {n_contrasts}")
Visual stimulus contrasts on day 1 = [-100.   -50.   -25.   -12.5   12.5   25.    50.   100. ]
No. of each contrast on day 1 = [ 94  65 107  16  21  75  78  64]
[16]:
print(f"Correct = { trials_day15['performance'].to_numpy()[0] * 100} %")
Correct = 88.2692 %
[17]:
contrast_performance = trials_day15['prob_choose_right'].to_numpy()[0]

plt.plot(contrasts * 100, contrast_performance * 100, 'o-', lw=3, ms=10)
plt.ylim([0, 100])
plt.xticks([*(contrasts * 100)])
plt.xlabel('Signed contrast (%)')
plt.ylabel('Rightward choice (%)')
plt.xticks(rotation=90)

print(contrast_performance)
[0.09574468 0.12307692 0.22429907 0.1875     1.         0.93333333
 0.91025641 0.921875  ]
../../_images/notebooks_dj_intro_dj_intro_32_1.png

If we compare the results with the ONE tutorial, we will find that we have replicated those results using Datajoint, congratulations!

You should now be comfortable with the basics of exploring the Datajoint IBL pipeline. More Datajoint tutorials can be found in the IBL-Pipeline github or on the Datajoint jupyter.