Datajoint Introductory Tutorial

In this tutorial we will use datajoint to create a simple psychometric behavior plot.

This tutorial assumes that you have setup the unified ibl environment IBL python environment and set up your Datajoint credentials.

First let’s import datajoint

[1]:

import datajoint as dj
# for the purposes of tutorial limit the table print output to 5
dj.config['display.limit'] = 5

We can access datajoint tables by importing schemas from the IBL-pipeline. Let’s import the subject schema

[2]:

from ibl_pipeline import subject

Connecting mayofaulkner@datajoint.internationalbrainlab.org:3306

Within this schema there is a datajoint table called Subject. This holds all the information about subjects registered on Alyx under IBL projects. Let’s access this table and look at the first couple of entries

[3]:

subjects = subject.Subject()
subjects

Out[3]:

subject_uuid	subject_nickname nickname	sex sex	subject_birth_date birth date	ear_mark ear mark	subject_line name	subject_source name of source	protocol_number protocol number	subject_description	subject_ts	subject_strain
0026c82d-39e4-4c6b-acb3-303eb4b24f05	IBL_32	M	2018-04-23	None	C57BL/6NCrl	Charles River	1	None	2019-08-06 21:30:42	C57BL/6N
00778394-c956-408d-8a6c-ca3b05a611d5	KS019	F	2019-04-30	None	C57BL/6J	None	2	None	2019-08-13 17:07:33	C57BL/6J
00c60db3-74c3-4ee2-9df9-2c84acf84e92	ibl_witten_10	F	2018-11-13	notag	C57BL/6J	Jax	3	None	2019-09-25 01:33:25	C57BL/6J
0124f697-16ce-4f59-b87c-e53fcb3a27ac	6867	M	2018-06-25	None	Thy1-GCaMP6s	CCU - Margarida colonies	1	None	2019-09-25 01:33:25	C57BL/6J
019444e5-2192-4953-855d-084226fb965a	dop_48	M	2021-11-23	709	Dat-Cre	Princeton	2	None	2022-04-15 03:37:28	C57BL/6J

...

Total: 1318

Next, we will find the entry in the table for the same subject that we looked at in the ONE tutorial, KS022. To do this we will restrict the Subject table by the subject nickname

[4]:

subjects & 'subject_nickname = "KS022"'

Out[4]:

subject_uuid	subject_nickname nickname	sex sex	subject_birth_date birth date	ear_mark ear mark	subject_line name	subject_source name of source	protocol_number protocol number	subject_description	subject_ts	subject_strain
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	KS022	M	2019-06-25	None	C57BL/6J	None	3	None	2019-09-20 04:28:14	C57BL/6J

Total: 1

We now want to find information about the behavioural sessions. This information is stored in a table Session defined in the acquisition schema. Let’s import this schema, access the table and display the first few entries

[5]:

from ibl_pipeline import acquisition
sessions = acquisition.Session()
sessions

Out[5]:

subject_uuid	session_start_time start time	session_uuid	session_number number	session_end_time end time	session_lab name of lab	session_location name of the location	task_protocol	session_type type	session_narrative	session_ts
00778394-c956-408d-8a6c-ca3b05a611d5	2019-08-07 10:49:41	0955e23e-90cf-4250-a092-82fabdf67a24	1	2019-08-07 11:23:14	cortexlab	_iblrig_cortexlab_behavior_3	_iblrig_tasks_habituationChoiceWorld5.2.5	Experiment	None	2019-08-13 17:26:00
00778394-c956-408d-8a6c-ca3b05a611d5	2019-08-08 15:32:40	996167fa-07bc-4afe-b7b9-4aee0ba75c1f	1	2019-08-08 16:19:23	cortexlab	_iblrig_cortexlab_behavior_3	_iblrig_tasks_habituationChoiceWorld5.2.5	Experiment	None	2019-08-13 17:26:01
00778394-c956-408d-8a6c-ca3b05a611d5	2019-08-09 10:37:49	13e38e6c-0cd6-41f9-a2ab-8d6021fea035	1	2019-08-09 11:40:12	cortexlab	_iblrig_cortexlab_behavior_3	_iblrig_tasks_habituationChoiceWorld5.2.7	Experiment	None	2019-08-13 17:26:00
00778394-c956-408d-8a6c-ca3b05a611d5	2019-08-10 11:24:59	fb9bdf18-76be-452b-ac4e-21d5de3a6f9f	1	2019-08-10 12:11:04	cortexlab	_iblrig_cortexlab_behavior_3	_iblrig_tasks_trainingChoiceWorld5.2.7	Experiment	None	2019-08-13 17:26:06
00778394-c956-408d-8a6c-ca3b05a611d5	2019-08-12 09:21:03	d47e9a4c-18dc-4d4d-991c-d30059ec2cbd	1	2019-08-12 10:07:19	cortexlab	_iblrig_cortexlab_behavior_3	_iblrig_tasks_trainingChoiceWorld5.2.7	Experiment	None	2019-08-14 12:37:53

...

Total: 51591

If we look at the primary keys (columns with black headings) in the Subjects and Sessions table, we will notice that both contain subject_uuid as a primary key. This means that these two tables can be joined using *****.

We want to find information about all the sessions that KS022 did in the training phase of the IBL training pipeline. When combining the tables we will therefore restrict the Subject table by the subject nickname (as we did before) and the Sessions table by the task protocol

[6]:

(subjects & 'subject_nickname = "KS022"') * (sessions & 'task_protocol LIKE "%training%"')

Out[6]:

subject_uuid	session_start_time start time	subject_nickname nickname	sex sex	subject_birth_date birth date	ear_mark ear mark	subject_line name	subject_source name of source	protocol_number protocol number	subject_description	subject_ts	subject_strain	session_uuid	session_number number	session_end_time end time	session_lab name of lab	session_location name of the location	task_protocol	session_type type	session_narrative	session_ts
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-23 13:43:40	KS022	M	2019-06-25	None	C57BL/6J	None	3	None	2019-09-20 04:28:14	C57BL/6J	242ed7aa-6e02-4d55-b003-874b71357074	1	2019-09-23 14:28:22	cortexlab	_iblrig_cortexlab_behavior_3	_iblrig_tasks_trainingChoiceWorld5.3.0	Experiment	None	2019-09-24 04:51:41
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-24 10:41:18	KS022	M	2019-06-25	None	C57BL/6J	None	3	None	2019-09-20 04:28:14	C57BL/6J	be7f7832-006f-4f79-8079-6dea549c90c0	1	2019-09-24 11:27:11	cortexlab	_iblrig_cortexlab_behavior_3	_iblrig_tasks_trainingChoiceWorld5.3.0	Experiment	None	2019-09-25 04:51:58
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-25 09:37:26	KS022	M	2019-06-25	None	C57BL/6J	None	3	None	2019-09-20 04:28:14	C57BL/6J	c4f55950-26ac-4cb6-be38-57d9e604f5fc	1	2019-09-25 10:23:20	cortexlab	_iblrig_cortexlab_behavior_3	_iblrig_tasks_trainingChoiceWorld5.3.0	Experiment	None	2019-09-26 04:53:06
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-26 14:43:24	KS022	M	2019-06-25	None	C57BL/6J	None	3	None	2019-09-20 04:28:14	C57BL/6J	39a0993b-ea91-4d7d-aa8b-84365ae77a81	1	2019-09-26 15:31:48	cortexlab	_iblrig_cortexlab_behavior_3	_iblrig_tasks_trainingChoiceWorld5.3.0	Experiment	None	2019-09-27 04:53:22
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-27 10:21:16	KS022	M	2019-06-25	None	C57BL/6J	None	3	None	2019-09-20 04:28:14	C57BL/6J	f1eb053b-3cc2-45df-bbd5-2f5f54f05d23	1	2019-09-27 11:06:09	cortexlab	_iblrig_cortexlab_behavior_3	_iblrig_tasks_trainingChoiceWorld5.3.0	Experiment	None	2019-09-28 04:53:37

...

Total: 19

There is a lot of information in this table and we are not interested in all of it for the purposes of our analysis. Let’s just use the proj method to restrict the data presented. We do not want any columns from the Subject table (apart from the primary keys) and only want session_uuid from the Sesssions table. So we can write,

[7]:

((subjects & 'subject_nickname = "KS022"').proj() *
 (sessions & 'task_protocol LIKE "%training%"')).proj('session_uuid')

Out[7]:

subject_uuid	session_start_time start time	session_uuid
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-23 13:43:40	242ed7aa-6e02-4d55-b003-874b71357074
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-24 10:41:18	be7f7832-006f-4f79-8079-6dea549c90c0
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-25 09:37:26	c4f55950-26ac-4cb6-be38-57d9e604f5fc
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-26 14:43:24	39a0993b-ea91-4d7d-aa8b-84365ae77a81
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-27 10:21:16	f1eb053b-3cc2-45df-bbd5-2f5f54f05d23

...

Total: 19

Note

In the above expression we first used proj and then joined the tables using ** * . We could have also joined the tables first and then usedproj**,

 ((subjects & 'subject_nickname = "KS022"') * (sessions & 'task_protocol LIKE "%training%"')
).proj('session_uuid')

Note that the session_uuid corresponds to the eID string in ONE.

Up until now we have been inspecting the content of the tables but do not actually have access to the content. This is because we have not actually read the data from the tables into memory yet. For this we would need to use the fetch command. Let’s fetch the session uuid information into a pandas dataframe

[8]:

eids = ((subjects & 'subject_nickname = "KS022"').proj() *
        (sessions & 'task_protocol LIKE "%training%"').proj('session_uuid')).fetch(format='frame')

[9]:

eids

Out[9]:

		session_uuid
subject_uuid	session_start_time
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-23 13:43:40	242ed7aa-6e02-4d55-b003-874b71357074
	2019-09-24 10:41:18	be7f7832-006f-4f79-8079-6dea549c90c0
	2019-09-25 09:37:26	c4f55950-26ac-4cb6-be38-57d9e604f5fc
	2019-09-26 14:43:24	39a0993b-ea91-4d7d-aa8b-84365ae77a81
	2019-09-27 10:21:16	f1eb053b-3cc2-45df-bbd5-2f5f54f05d23
	2019-10-01 11:01:44	a07e255f-cf31-47aa-9667-5c62b93443e0
	2019-10-02 10:02:36	fc7b7ea7-55c4-48fd-bb7b-2422c741fe8e
	2019-10-03 14:15:32	22f33af3-5e77-4517-94b9-2cfefea27a10
	2019-10-04 10:52:19	7e508dce-e08c-4481-b634-1a9358673aa5
	2019-10-07 10:59:27	548ca39d-8898-460c-a930-785083b4127d
	2019-10-08 11:19:34	3f815a05-a114-4a16-96ef-b398bd5a0b88
	2019-10-10 11:00:46	f6c9daee-3e1e-4a5d-b372-7be8fcda5512
	2019-10-11 11:01:47	2f7e07d4-d713-46a1-8bce-6b497af405fc
	2019-10-14 11:30:31	228b29b2-fb77-40c9-bd32-953fd5297896
	2019-10-15 10:32:56	9ee8bd6b-1e5e-4440-bc2e-7c001abe7b60
	2019-10-17 11:32:30	777ecfb9-616f-4ff9-a8d9-eb38f897a19f
	2019-10-18 11:52:53	4e626e22-747e-4324-96f1-9827b7ca950d
	2019-10-21 11:44:49	dd4ad209-fcff-4252-9e38-44cba17f7603
	2019-10-22 11:48:05	b4ac5904-5693-4d98-adb3-362e612836be

Now that we have access to our list of session eIDs, we can get trial information associated with these sessions. The output from the trials dataset is stored in a table called PyschResults. We can import this from the analyses.behaviour schema

[10]:

from ibl_pipeline.analyses import behavior
trials = behavior.PsychResults()
trials

C:\Users\Mayo\iblenv\IBL-pipeline\ibl_pipeline\behavior.py:14: UserWarning: ONE not installed, cannot use populate
  warnings.warn('ONE not installed, cannot use populate')

Out[10]:

subject_uuid	session_start_time start time	performance percentage correct in this session	performance_easy percentage correct of easy trials in this session	signed_contrasts contrasts used in this session, negative when on the left	n_trials_stim number of trials for each contrast	n_trials_stim_right number of reporting "right" trials for each contrast	prob_choose_right probability of choosing right, same size as contrasts	threshold	bias	lapse_low	lapse_high
00778394-c956-408d-8a6c-ca3b05a611d5	2019-08-10 11:24:59	0.367347	0.367347	=BLOB=	=BLOB=	=BLOB=	=BLOB=	29.7063	-58.2102	0.823603	0.409092
00778394-c956-408d-8a6c-ca3b05a611d5	2019-08-12 09:21:03	0.4	0.4	=BLOB=	=BLOB=	=BLOB=	=BLOB=	21.0607	-58.2391	0.609229	0.181818
00778394-c956-408d-8a6c-ca3b05a611d5	2019-08-13 10:28:45	0.408072	0.408072	=BLOB=	=BLOB=	=BLOB=	=BLOB=	9.4945	-0.13251	0.684564	0.364865
00778394-c956-408d-8a6c-ca3b05a611d5	2019-08-14 09:37:17	0.4	0.4	=BLOB=	=BLOB=	=BLOB=	=BLOB=	3.28746	-67.9662	0.772727	0.454545
00778394-c956-408d-8a6c-ca3b05a611d5	2019-08-14 11:35:16	0.463668	0.463668	=BLOB=	=BLOB=	=BLOB=	=BLOB=	10.4665	99.3649	0.72314	0.0373897

...

Total: 41489

Let’s get the trial information for the first day KS022 trained. We will restrict the Sessions table by the day 1 eID and combine this with the trials table

[11]:

eid_day1 = dict(session_uuid=eids['session_uuid'][0])
trials_day1 = ((sessions & eid_day1).proj() * trials).fetch(format='frame')
trials_day1

Out[11]:

		performance	performance_easy	signed_contrasts	n_trials_stim	n_trials_stim_right	prob_choose_right	threshold	bias	lapse_low	lapse_high
subject_uuid	session_start_time
b57c1934-f9d1-4dc4-a474-e2cb4acdf918	2019-09-23 13:43:40	0.515432	0.515432	[-1.0, -0.5, 0.5, 1.0]	[81, 90, 73, 80]	[41, 51, 50, 38]	[0.5061728395061729, 0.5666666666666667, 0.684...	7.23087	86.8559	0.581967	0.525546

Notice how a lot of the metrics that we manually computed from the trials dataset in the previous ONE tutorial have already been computed for us and are available in the trials table. This is one advantage of Datajoint, common computations such as performance can be computed when the data is ingested and stored in tables.

We can find out which visual stimulus contrasts were presented to KS022 on day 1 and how many of each contrast by looking at the signed_contrasts and n_trials_stim attributes

[12]:

contrasts = trials_day1['signed_contrasts'].to_numpy()[0]
n_contrasts = trials_day1['n_trials_stim'].to_numpy()[0]
print(f"Visual stimulus contrasts on day 1 = {contrasts * 100}")
print(f"No. of each contrast on day 1 = {n_contrasts}")

Visual stimulus contrasts on day 1 = [-100.  -50.   50.  100.]
No. of each contrast on day 1 = [81 90 73 80]

We can easily extract the performance by typing

[13]:

print(f"Correct = { trials_day1['performance'].to_numpy()[0] * 100} %")

Correct = 51.5432 %

We can plot the peformance at each contrast by looking at the prob_choose_right attribute. The results stored in Datajoint are already expressed in terms of rightward choice, so we don’t need worry about converting any computations

[14]:

contrast_performance = trials_day1['prob_choose_right'].to_numpy()[0]

import matplotlib.pyplot as plt
plt.plot(contrasts * 100, contrast_performance * 100, 'o-', lw=3, ms=10)
plt.ylim([0, 100])
plt.xticks([*(contrasts * 100)])
plt.xlabel('Signed contrast (%)')
plt.ylabel('Rightward choice (%)')

print(contrast_performance)

[0.50617284 0.56666667 0.68493151 0.475     ]

../../_images/notebooks_dj_intro_dj_intro_28_1.png

Let’s now repeat this for day 15 of training

[15]:

eid_day15 = dict(session_uuid=eids['session_uuid'][14])
trials_day15 = ((sessions & eid_day15).proj() * trials).fetch(format='frame')
contrasts = trials_day15['signed_contrasts'].to_numpy()[0]
n_contrasts = trials_day15['n_trials_stim'].to_numpy()[0]
print(f"Visual stimulus contrasts on day 1 = {contrasts * 100}")
print(f"No. of each contrast on day 1 = {n_contrasts}")

Visual stimulus contrasts on day 1 = [-100.   -50.   -25.   -12.5   12.5   25.    50.   100. ]
No. of each contrast on day 1 = [ 94  65 107  16  21  75  78  64]

[16]:

print(f"Correct = { trials_day15['performance'].to_numpy()[0] * 100} %")

Correct = 88.2692 %

[17]:

contrast_performance = trials_day15['prob_choose_right'].to_numpy()[0]

plt.plot(contrasts * 100, contrast_performance * 100, 'o-', lw=3, ms=10)
plt.ylim([0, 100])
plt.xticks([*(contrasts * 100)])
plt.xlabel('Signed contrast (%)')
plt.ylabel('Rightward choice (%)')
plt.xticks(rotation=90)

print(contrast_performance)

[0.09574468 0.12307692 0.22429907 0.1875     1.         0.93333333
 0.91025641 0.921875  ]

../../_images/notebooks_dj_intro_dj_intro_32_1.png

If we compare the results with the ONE tutorial, we will find that we have replicated those results using Datajoint, congratulations!

You should now be comfortable with the basics of exploring the Datajoint IBL pipeline. More Datajoint tutorials can be found in the IBL-Pipeline github or on the Datajoint jupyter.