brainbox.metrics.single_units

Computes metrics for assessing quality of single units.

Run the following to set-up the workspace to run the docstring examples: >>> import brainbox as bb >>> import alf.io as aio >>> import numpy as np >>> import matplotlib.pyplot as plt >>> import ibllib.ephys.spikes as e_spks # (*Note, if there is no ‘alf’ directory, make ‘alf’ directory from ‘ks2’ output directory): >>> e_spks.ks2_to_alf(path_to_ks_out, path_to_alf_out) # Load the alf spikes bunch and clusters bunch, and get a units bunch. >>> spks_b = aio.load_object(path_to_alf_out, ‘spikes’) >>> clstrs_b = aio.load_object(path_to_alf_out, ‘clusters’) >>> units_b = bb.processing.get_units_bunch(spks_b) # may take a few mins to compute

Functions

average_drift

Computes the cumulative drift (normalized by the total number of spikes) of a spike feature array.

compute_labels

From a dataframe or a dictionary of unit metrics, compute a lablel :param r: dictionary or pandas dataframe containing unit qcs :param return_details: False (returns a full dictionary of metrics) :return: vector of proportion of qcs passed between 0 and 1, where 1 denotes an all pass

contamination

An estimate of the contamination of the unit (i.e.

contamination_alt

An estimate of the contamination of the unit (i.e.

firing_rate_coeff_var

Computes the coefficient of variation of the firing rate: the ratio of the standard deviation to the mean.

firing_rate_fano_factor

Computes the fano factor of the firing rate: the ratio of the variance to the mean.

missed_spikes_est

Computes the approximate fraction of spikes missing from a spike feature distribution for a given unit, assuming the distribution is symmetric.

noise_cutoff

A metric to determine whether a unit’s amplitude distribution is cut off (at floor), without assuming a Gaussian distribution.

pres_ratio

Computes the presence ratio of spike counts: the number of bins where there is at least one spike, over the total number of bins, given a specified bin width.

ptp_over_noise

For specified channels, for specified timestamps, computes the mean (peak-to-peak amplitudes / the MADs of the background noise).

quick_unit_metrics

Computes single unit metrics from only the spike times, amplitudes, and depths for a set of units.

slidingRP_viol

A binary metric which determines whether there is an acceptable level of refractory period violations by using a sliding refractory period:

spike_sorting_metrics

Computes: - cell level metrics (cf quick_unit_metrics) - label the metrics according to quality thresholds - estimates drift as a function of time :param times: vector of spike times :param clusters: :param amplitudes: :param depths: :param cluster_ids (optional): set of clusters (if None the output datgrame will match the unique set of clusters represented in spike clusters) :param params: dict (optional) parameters for qc computation ( see constant at the top of the module for default values and keys) :return: data_frame of metrics (cluster records, columns are qc attributes)| :return: dictionary of recording qc (keys ‘time_scale’ and ‘drift_um’)

unit_stability

Computes the probability that the empirical spike feature distribution(s), for specified feature(s), for all units, comes from a specific theoretical distribution, based on a specified statistical test.

wf_similarity

Computes a unit normalized spatiotemporal similarity score between two sets of waveforms.

unit_stability(units_b, units=None, feat_names=['amps'], dist='norm', test='ks')[source]

Computes the probability that the empirical spike feature distribution(s), for specified feature(s), for all units, comes from a specific theoretical distribution, based on a specified statistical test. Also computes the coefficients of variation of the spike feature(s) for all units.

Parameters
  • units_b (bunch) – A units bunch containing fields with spike information (e.g. cluster IDs, times, features, etc.) for all units.

  • units (array-like (optional)) – A subset of all units for which to create the bar plot. (If None, all units are used)

  • feat_names (list of strings (optional)) – A list of names of spike features that can be found in spks to specify which features to use for calculating unit stability.

  • dist (string (optional)) – The type of hypothetical null distribution for which the empirical spike feature distributions are presumed to belong to.

  • test (string (optional)) – The statistical test used to compute the probability that the empirical spike feature distributions come from dist.

Returns

  • p_vals_b (bunch) – A bunch with feat_names as keys, containing a ndarray with p-values (the probabilities that the empirical spike feature distribution for each unit comes from dist based on test) for each unit for all feat_names.

  • cv_b (bunch) – A bunch with feat_names as keys, containing a ndarray with the coefficients of variation of each unit’s empirical spike feature distribution for all features.

See also

plot.feat_vars()

Examples

1) Compute 1) the p-values obtained from running a one-sample ks test on the spike amplitudes for each unit, and 2) the variances of the empirical spike amplitudes distribution for each unit. Create a histogram of the variances of the spike amplitudes for each unit, color-coded by depth of channel of max amplitudes. Get cluster IDs of those units which have variances greater than 50.

>>> p_vals_b, variances_b = bb.metrics.unit_stability(units_b)
# Plot histograms of variances color-coded by depth of channel of max amplitudes
>>> fig = bb.plot.feat_vars(units_b, feat_name='amps')
# Get all unit IDs which have amps variance > 50
>>> var_vals = np.array(tuple(variances_b['amps'].values()))
>>> bad_units = np.where(var_vals > 50)
missed_spikes_est(feat, spks_per_bin=20, sigma=5, min_num_bins=50)[source]

Computes the approximate fraction of spikes missing from a spike feature distribution for a given unit, assuming the distribution is symmetric. Inspired by metric described in Hill et al. (2011) J Neurosci 31: 8699-8705.

Parameters
  • feat (ndarray) – The spikes’ feature values (e.g. amplitudes)

  • spks_per_bin (int (optional)) – The number of spikes per bin from which to compute the spike feature histogram.

  • sigma (int (optional)) – The standard deviation for the gaussian kernel used to compute the pdf from the spike feature histogram.

  • min_num_bins (int (optional)) – The minimum number of bins used to compute the spike feature histogram.

Returns

  • fraction_missing (float) – The fraction of missing spikes (0-0.5). *Note: If more than 50% of spikes are missing, an accurate estimate isn’t possible.

  • pdf (ndarray) – The computed pdf of the spike feature histogram.

  • cutoff_idx (int) – The index for pdf at which point pdf is no longer symmetrical around the peak. (This is returned for plotting purposes).

See also

plot.feat_cutoff()

Examples

1) Determine the fraction of spikes missing from unit 1 based on the recorded unit’s spike amplitudes, assuming the distribution of the unit’s spike amplitudes is symmetric.

# Get unit 1 amplitudes from a unit bunch, and compute fraction spikes missing. >>> feat = units_b[‘amps’][‘1’] >>> fraction_missing = bb.plot.feat_cutoff(feat)

wf_similarity(wf1, wf2)[source]

Computes a unit normalized spatiotemporal similarity score between two sets of waveforms. This score is based on how waveform shape correlates for each pair of spikes between the two sets of waveforms across space and time. The shapes of the arrays of the two sets of waveforms must be equal.

Parameters
  • wf1 (ndarray) – An array of shape (#spikes, #samples, #channels).

  • wf2 (ndarray) – An array of shape (#spikes, #samples, #channels).

Returns

s – The unit normalized spatiotemporal similarity score.

Return type

float

See also

io.extract_waveforms(), plot.single_unit_wf_comp()

Examples

1) Compute the similarity between the first and last 100 waveforms for unit1, across the 20 channels around the channel of max amplitude.

# Get the channels around the max amp channel for the unit, two sets of timestamps for the # unit, and the two corresponding sets of waveforms for those two sets of timestamps. # Then compute s. >>> max_ch = clstrs_b[‘channels’][1] >>> if max_ch < 10: # take only channels greater than max_ch. >>> ch = np.arange(max_ch, max_ch + 20) >>> elif (max_ch + 10) > 385: # take only channels less than max_ch. >>> ch = np.arange(max_ch - 20, max_ch) >>> else: # take n_c_ch around max_ch. >>> ch = np.arange(max_ch - 10, max_ch + 10) >>> ts1 = units_b[‘times’][‘1’][:100] >>> ts2 = units_b[‘times’][‘1’][-100:] >>> wf1 = bb.io.extract_waveforms(path_to_ephys_file, ts1, ch) >>> wf2 = bb.io.extract_waveforms(path_to_ephys_file, ts2, ch) >>> s = bb.metrics.wf_similarity(wf1, wf2)

TODO check s calculation: take median of waveforms xcorr all waveforms with median, and divide by autocorr of all waveforms profile for two sets of units: xcorr(cl1, cl2) / (sqrt autocorr(cl1) * autocorr(cl2))

firing_rate_coeff_var(ts, hist_win=0.01, fr_win=0.5, n_bins=10)[source]

Computes the coefficient of variation of the firing rate: the ratio of the standard deviation to the mean.

Parameters
  • ts (ndarray) – The spike timestamps from which to compute the firing rate.

  • hist_win (float (optional)) – The time window (in s) to use for computing spike counts.

  • fr_win (float (optional)) – The time window (in s) to use as a moving slider to compute the instantaneous firing rate.

  • n_bins (int (optional)) – The number of bins in which to compute a coefficient of variation of the firing rate.

Returns

  • cv (float) – The mean coefficient of variation of the firing rate of the n_bins number of coefficients computed.

  • cvs (ndarray) – The coefficients of variation of the firing for each bin of n_bins.

  • fr (ndarray) – The instantaneous firing rate over time (in hz).

See also

singlecell.firing_rate(), plot.firing_rate()

Examples

1) Compute the coefficient of variation of the firing rate for unit 1 from the time of its first to last spike, and compute the coefficient of variation of the firing rate for unit 2 from the first to second minute.

>>> ts_1 = units_b['times']['1']
>>> ts_2 = units_b['times']['2']
>>> ts_2 = np.intersect1d(np.where(ts_2 > 60)[0], np.where(ts_2 < 120)[0])
>>> cv, cvs, fr = bb.metrics.firing_rate_coeff_var(ts_1)
>>> cv_2, cvs_2, fr_2 = bb.metrics.firing_rate_coeff_var(ts_2)
firing_rate_fano_factor(ts, hist_win=0.01, fr_win=0.5, n_bins=10)[source]

Computes the fano factor of the firing rate: the ratio of the variance to the mean. (Almost identical to coeff. of variation)

Parameters
  • ts (ndarray) – The spike timestamps from which to compute the firing rate.

  • hist_win (float) – The time window (in s) to use for computing spike counts.

  • fr_win (float) – The time window (in s) to use as a moving slider to compute the instantaneous firing rate.

  • n_bins (int (optional)) – The number of bins in which to compute a fano factor of the firing rate.

Returns

  • ff (float) – The mean fano factor of the firing rate of the n_bins number of factors computed.

  • ffs (ndarray) – The fano factors of the firing for each bin of n_bins.

  • fr (ndarray) – The instantaneous firing rate over time (in hz).

See also

singlecell.firing_rate(), plot.firing_rate()

Examples

1) Compute the fano factor of the firing rate for unit 1 from the time of its first to last spike, and compute the fano factor of the firing rate for unit 2 from the first to second minute.

>>> ts_1 = units_b['times']['1']
>>> ts_2 = units_b['times']['2']
>>> ts_2 = np.intersect1d(np.where(ts_2 > 60)[0], np.where(ts_2 < 120)[0])
>>> ff, ffs, fr = bb.metrics.firing_rate_fano_factor(ts_1)
>>> ff_2, ffs_2, fr_2 = bb.metrics.firing_rate_fano_factor(ts_2)
average_drift(feat, times)[source]

Computes the cumulative drift (normalized by the total number of spikes) of a spike feature array.

Parameters

feat (ndarray) – The spike feature values from which to compute the maximum drift. Usually amplitudes

Returns

cd – The cumulative drift of the unit.

Return type

float

See also

max_drift()

Examples

  1. Get the cumulative depth drift for unit 1.
    >>> unit_idxs = np.where(spks_b['clusters'] == 1)[0]
    >>> depths = spks_b['depths'][unit_idxs]
    >>> amps = spks_b['amps'][unit_idxs]
    >>> depth_cd = bb.metrics.cum_drift(depths)
    >>> amp_cd = bb.metrics.cum_drift(amps)
    
pres_ratio(ts, hist_win=10)[source]

Computes the presence ratio of spike counts: the number of bins where there is at least one spike, over the total number of bins, given a specified bin width.

Parameters
  • ts (ndarray) – The spike timestamps from which to compute the presence ratio.

  • hist_win (float (optional)) – The time window (in s) to use for computing the presence ratio.

Returns

  • pr (float) – The presence ratio.

  • spks_bins (ndarray) – The number of spks in each bin.

See also

plot.pres_ratio()

Examples

  1. Compute the presence ratio for unit 1, given a window of 10 s.
    >>> ts = units_b['times']['1']
    >>> pr, pr_bins = bb.metrics.pres_ratio(ts)
    
ptp_over_noise(ephys_file, ts, ch, t=2.0, sr=30000, n_ch_probe=385, dtype='int16', offset=0, car=True)[source]

For specified channels, for specified timestamps, computes the mean (peak-to-peak amplitudes / the MADs of the background noise).

Parameters
  • ephys_file (string) – The file path to the binary ephys data.

  • ts (ndarray_like) – The timestamps (in s) of the spikes.

  • ch (ndarray_like) – The channels on which to extract the waveforms.

  • t (numeric (optional)) – The time (in ms) of the waveforms to extract to compute the ptp.

  • sr (int (optional)) – The sampling rate (in hz) that the ephys data was acquired at.

  • n_ch_probe (int (optional)) – The number of channels of the recording.

  • dtype (str (optional)) – The datatype represented by the bytes in ephys_file.

  • offset (int (optional)) – The offset (in bytes) from the start of ephys_file.

  • car (bool (optional)) – A flag to perform common-average-referencing before extracting waveforms.

Returns

ptp_sigma – An array containing the mean ptp_over_noise values for the specified ts and ch.

Return type

ndarray

Examples

1) Compute ptp_over_noise for all spikes on 20 channels around the channel of max amplitude for unit 1.

>>> ts = units_b['times']['1']
>>> max_ch = max_ch = clstrs_b['channels'][1]
>>> if max_ch < 10:  # take only channels greater than `max_ch`.
>>>     ch = np.arange(max_ch, max_ch + 20)
>>> elif (max_ch + 10) > 385:  # take only channels less than `max_ch`.
>>>     ch = np.arange(max_ch - 20, max_ch)
>>> else:  # take `n_c_ch` around `max_ch`.
>>>     ch = np.arange(max_ch - 10, max_ch + 10)
>>> p = bb.metrics.ptp_over_noise(ephys_file, ts, ch)
contamination_alt(ts, rp=0.002)[source]

An estimate of the contamination of the unit (i.e. a pseudo false positive measure) based on the number of spikes, number of isi violations, and time between the first and last spike. (see Hill et al. (2011) J Neurosci 31: 8699-8705).

Parameters
  • ts (ndarray_like) – The timestamps (in s) of the spikes.

  • rp (float (optional)) – The refractory period (in s).

Returns

ce – An estimate of the fraction of contamination.

Return type

float

Examples

  1. Compute contamination estimate for unit 1.
    >>> ts = units_b['times']['1']
    >>> ce = bb.metrics.contamination(ts)
    
contamination(ts, min_time, max_time, rp=0.002, min_isi=0.0001)[source]

An estimate of the contamination of the unit (i.e. a pseudo false positive measure) based on the number of spikes, number of isi violations, and time between the first and last spike. (see Hill et al. (2011) J Neurosci 31: 8699-8705).

Modified by Dan Denman from cortex-lab/sortingQuality GitHub by Nick Steinmetz.

Parameters
  • ts (ndarray_like) – The timestamps (in s) of the spikes.

  • min_time (float) – The minimum time (in s) that a potential spike occurred.

  • max_time (float) – The maximum time (in s) that a potential spike occurred.

  • rp (float (optional)) – The refractory period (in s).

  • min_isi (float (optional)) – The minimum interspike-interval (in s) for counting duplicate spikes.

Returns

  • ce (float) – An estimate of the contamination. A perfect unit has a ce = 0 A unit with some contamination has a ce < 0.5 A unit with lots of contamination has a ce > 1.0

  • num_violations (int) – The total number of isi violations.

See also

contamination()

Examples

1) Compute contamination estimate for unit 1, with a minimum isi for counting duplicate spikes of 0.1 ms.

>>> ts = units_b['times']['1']
>>> ce = bb.metrics.contamination_alt(ts, min_isi=0.0001)
slidingRP_viol(ts, bin_size=0.25, thresh=0.1, acceptThresh=0.1)[source]

A binary metric which determines whether there is an acceptable level of refractory period violations by using a sliding refractory period:

This takes into account the firing rate of the neuron and computes a maximum acceptable level of contamination at different possible values of the refractory period. If the unit has less than the maximum contamination at any of the possible values of the refractory period, the unit passes.

A neuron will always fail this metric for very low firing rates, and thus this metric takes into account both firing rate and refractory period violations.

Parameters
  • ts (ndarray_like) – The timestamps (in s) of the spikes.

  • bin_size (float) – The size of binning for the autocorrelogram.

  • thresh (float) –

    Spike rate used to generate poisson distribution (to compute maximum

    acceptable contamination, see _max_acceptable_cont)

  • acceptThresh (float) –

    The fraction of contamination we are willing to accept (default value

    set to 0.1, or 10% contamination)

Returns

didpass – 0 if unit didn’t pass 1 if unit did pass

Return type

int

See also

contamination()

Examples

1) Compute whether a unit has too much refractory period contamination at any possible value of a refractory period, for a 0.25 ms bin, with a threshold of 10% acceptable contamination

>>> ts = units_b['times']['1']
>>> didpass = bb.metrics.slidingRP_viol(ts, bin_size=0.25, thresh=0.1,
                                        acceptThresh=0.1)
noise_cutoff(amps, quartile_length=0.2, n_bins=100, n_low_bins=2)[source]

A metric to determine whether a unit’s amplitude distribution is cut off (at floor), without assuming a Gaussian distribution.

This metric takes the amplitude distribution, computes the mean and std of an upper quartile of the distribution, and determines how many standard deviations away from that mean a lower quartile lies.

Parameters
  • amps (ndarray_like) – The amplitudes (in uV) of the spikes.

  • quartile_length (float) – The size of the upper quartile of the amplitude distribution.

  • n_bins (int) – The number of bins used to compute a histogram of the amplitude distribution.

  • n_low_bins (int) – The number of bins used in the lower part of the distribution (where cutoff is determined).

Returns

cutoff – Number of standard deviations that the lower mean is outside of the mean of the upper quartile.

Return type

float

Examples

  1. Compute whether a unit’s amplitude distribution is cut off
    >>> amps = spks_b['amps'][unit_idxs]
    >>> cutoff = bb.metrics.noise_cutoff(amps, quartile_length=.2,
                                         n_bins=100, n_low_bins=2)
    
spike_sorting_metrics(times, clusters, amps, depths, cluster_ids=None, params={'RPslide_thresh': 0.1, 'acceptable_contamination': 0.1, 'bin_size': 0.25, 'med_amp_thresh_uv': 50, 'min_isi': 0.0001, 'min_num_bins_for_missed_spks_est': 50, 'nc_bins': 100, 'nc_n_low_bins': 2, 'nc_quartile_length': 0.2, 'nc_thresh': 20, 'presence_window': 10, 'refractory_period': 0.0015, 'spks_per_bin_for_missed_spks_est': 10, 'std_smoothing_kernel_for_missed_spks_est': 4})[source]

Computes: - cell level metrics (cf quick_unit_metrics) - label the metrics according to quality thresholds - estimates drift as a function of time :param times: vector of spike times :param clusters: :param amplitudes: :param depths: :param cluster_ids (optional): set of clusters (if None the output datgrame will match

the unique set of clusters represented in spike clusters)

Parameters

params – dict (optional) parameters for qc computation ( see constant at the top of the module for default values and keys)

Returns

data_frame of metrics (cluster records, columns are qc attributes)|

Returns

dictionary of recording qc (keys ‘time_scale’ and ‘drift_um’)

quick_unit_metrics(spike_clusters, spike_times, spike_amps, spike_depths, params={'RPslide_thresh': 0.1, 'acceptable_contamination': 0.1, 'bin_size': 0.25, 'med_amp_thresh_uv': 50, 'min_isi': 0.0001, 'min_num_bins_for_missed_spks_est': 50, 'nc_bins': 100, 'nc_n_low_bins': 2, 'nc_quartile_length': 0.2, 'nc_thresh': 20, 'presence_window': 10, 'refractory_period': 0.0015, 'spks_per_bin_for_missed_spks_est': 10, 'std_smoothing_kernel_for_missed_spks_est': 4}, cluster_ids=None, tbounds=None)[source]

Computes single unit metrics from only the spike times, amplitudes, and depths for a set of units.

Metrics computed:

‘amp_max’, ‘amp_min’, ‘amp_median’, ‘amp_std_dB’, ‘contamination’, ‘contamination_alt’, ‘drift’, ‘missed_spikes_est’, ‘noise_cutoff’, ‘presence_ratio’, ‘presence_ratio_std’, ‘slidingRP_viol’, ‘spike_count’

spike_clustersndarray_like

A vector of the unit ids for a set of spikes.

spike_timesndarray_like

A vector of the timestamps for a set of spikes.

spike_ampsndarray_like

A vector of the amplitudes for a set of spikes.

spike_depthsndarray_like

A vector of the depths for a set of spikes.

clusters_id: (optional) lists of cluster ids. If not all clusters are represented in the spikes_clusters (ie. cluster has no spike), this will ensure the output size is consistent with the input arrays. tbounds: (optional) list or 2 elements array containing a time-selection to perform the

metrics computation on.

paramsdict (optional)
Parameters used for computing some of the metrics in the function:
‘presence_window’: float

The time window (in s) used to look for spikes when computing the presence ratio.

‘refractory_period’: float

The refractory period used when computing isi violations and the contamination estimate.

‘min_isi’: float

The minimum interspike-interval (in s) for counting duplicate spikes when computing the contamination estimate.

‘spks_per_bin_for_missed_spks_est’: int

The number of spikes per bin used to compute the spike amplitude pdf for a unit, when computing the missed spikes estimate.

‘std_smoothing_kernel_for_missed_spks_est’: float

The standard deviation for the gaussian kernel used to compute the spike amplitude pdf for a unit, when computing the missed spikes estimate.

‘min_num_bins_for_missed_spks_est’: int

The minimum number of bins used to compute the spike amplitude pdf for a unit, when computing the missed spikes estimate.

Returns

r – A bunch whose keys are the computed spike metrics.

Return type

bunch

Notes

This function is called by ephysqc.unit_metrics_ks2 which is called by spikes.ks2_to_alf during alf extraction of an ephys dataset in the ibl ephys extraction pipeline.

Examples

  1. Compute quick metrics from a ks2 output directory:
    >>> from ibllib.ephys.ephysqc import phy_model_from_ks2_path
    >>> m = phy_model_from_ks2_path(path_to_ks2_out)
    >>> cluster_ids = m.spike_clusters
    >>> ts = m.spike_times
    >>> amps = m.amplitudes
    >>> depths = m.depths
    >>> r = bb.metrics.quick_unit_metrics(cluster_ids, ts, amps, depths)
    
compute_labels(r, params={'RPslide_thresh': 0.1, 'acceptable_contamination': 0.1, 'bin_size': 0.25, 'med_amp_thresh_uv': 50, 'min_isi': 0.0001, 'min_num_bins_for_missed_spks_est': 50, 'nc_bins': 100, 'nc_n_low_bins': 2, 'nc_quartile_length': 0.2, 'nc_thresh': 20, 'presence_window': 10, 'refractory_period': 0.0015, 'spks_per_bin_for_missed_spks_est': 10, 'std_smoothing_kernel_for_missed_spks_est': 4}, return_details=False)[source]

From a dataframe or a dictionary of unit metrics, compute a lablel :param r: dictionary or pandas dataframe containing unit qcs :param return_details: False (returns a full dictionary of metrics) :return: vector of proportion of qcs passed between 0 and 1, where 1 denotes an all pass