HDF5 format specification

This page documents the on-disk layout of .h5 files written by lfpack.

Layout (current — multi-recording, pyramidal)

<file>.h5
└─ <recording>/           # one group per recording (e.g. "probe00")
   └─ <scale_2digit>/     # zero-padded scale index (e.g. "00", "01")
      ├─ meta             # scalar/array attributes (see below)
      └─ chunks/
         ├─ 0/            # first compressed chunk
         │   ├─ U_scaled       float32  (nc, r)     shuffle+gzip
         │   ├─ vh_indices     int32    (n_kept,)    gzip
         │   └─ vh_values      float32  (n_kept,)    gzip
         ├─ 1/
         ...
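
The hierarchy above can be walked with ordinary mapping operations. The sketch below is illustrative only: `iter_chunk_paths` is a hypothetical helper, not part of lfpack, and it assumes chunk groups are named with unpadded decimal integers, as the tree suggests. It relies only on `keys()` and `[]`, which `h5py.File` and `h5py.Group` also provide, so a nested dict stands in for an open file here.

```python
def iter_chunk_paths(root):
    """Yield (recording, scale, chunk_index, path) for every chunk group.

    `root` is any mapping-like object with the documented hierarchy,
    for example an open h5py.File or a nested dict.
    """
    for rec_name in sorted(root.keys()):
        rec = root[rec_name]
        # Scale names are zero-padded, so a plain text sort is already numeric.
        for scale_name in sorted(rec.keys()):
            chunks = rec[scale_name]["chunks"]
            # Chunk names are not padded: sort as integers so "10" follows "2".
            for chunk_name in sorted(chunks.keys(), key=int):
                path = f"{rec_name}/{scale_name}/chunks/{chunk_name}"
                yield rec_name, scale_name, int(chunk_name), path


# Stand-in for an open file: nested dicts with the same keys.
demo = {
    "probe00": {
        "00": {"meta": {}, "chunks": {str(i): {} for i in (10, 2, 0, 1)}},
    }
}
chunk_order = [index for _, _, index, _ in iter_chunk_paths(demo)]
paths = [path for *_, path in iter_chunk_paths(demo)]
```

Sorting chunk names as text would place "10" before "2", which is why the inner loop sorts with `key=int`.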

`meta` attributes

Attribute	Type	Description
`nc`	int	Number of channels
`ns_total`	int	Total number of samples
`fs`	float	Nominal sample rate (Hz)
`fs_sync`	float	Sync-corrected sample rate (Hz); `NaN` when not available
`t0_sync`	float	Session-clock time in seconds at sample 0; `NaN` when not available
`epsilon`	float	SVD noise-floor threshold
`alpha`	float	Wavelet-packet threshold multiplier
`compress_chunk`	int	Samples per compressed chunk
`geometry_x`	float32 array	Channel x positions (µm)
`geometry_y`	float32 array	Channel y positions (µm)
`sglx_meta`	JSON string	Original SpikeGLX metadata

fs_sync and t0_sync are written by compress_to_h5 when the caller supplies the corresponding keyword arguments. LFPackReader.fs returns fs_sync when it is finite, otherwise falls back to fs. LFPackReader.t0 returns t0_sync when finite, otherwise NaN. LFPackReader.times produces the full session-clock time vector using these values.
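
The fallback rules described above can be sketched in a few lines. This is not the LFPackReader implementation: the function names are hypothetical, `meta` is any mapping holding the documented attributes, and the time formula `t[i] = t0 + i / fs` is an assumption about how the time vector is built.

```python
import math


def effective_fs(meta):
    """Return fs_sync when it is finite, otherwise the nominal fs."""
    fs_sync = meta["fs_sync"]
    return fs_sync if math.isfinite(fs_sync) else meta["fs"]


def effective_t0(meta):
    """Return t0_sync when it is finite, otherwise NaN."""
    t0_sync = meta["t0_sync"]
    return t0_sync if math.isfinite(t0_sync) else math.nan


def sample_times(meta):
    """Session-clock time of each sample (assumed: t[i] = t0 + i / fs)."""
    fs = effective_fs(meta)
    t0 = effective_t0(meta)
    return [t0 + i / fs for i in range(meta["ns_total"])]


synced = {"fs": 2500.0, "fs_sync": 2000.0, "t0_sync": 12.0, "ns_total": 4}
unsynced = {"fs": 2500.0, "fs_sync": math.nan, "t0_sync": math.nan, "ns_total": 4}

times = sample_times(synced)
```

Under this assumed formula, a file without sync information yields an all-NaN time vector, because NaN propagates through the addition.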

Chunk datasets

Each chunk group (chunks/<i>/) stores three datasets.

`U_scaled`

Shape (nc, r), dtype float32. Stores U[:, :r] * sv[:r], that is, the first r left singular vectors, each scaled by its singular value. Written with the HDF5 shuffle filter followed by gzip (level 4). The shuffle filter regroups bytes by their position within each element (the first byte of every value, then the second byte of every value, …), which improves gzip compression on dense float32 data by ~10–20%.
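
To make the scaling concrete, the following sketch builds a small synthetic block, takes its SVD with NumPy, and forms the scaled factor. The data and variable names other than `U_scaled` are made up for illustration, and the wavelet-packet step applied to the right singular vectors is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
nc, ns, r = 6, 40, 2

# Build a block of exact rank r so that truncating at r loses nothing.
X = (rng.standard_normal((nc, r)) @ rng.standard_normal((r, ns))).astype(np.float32)

U, sv, Vh = np.linalg.svd(X, full_matrices=False)

# Broadcasting multiplies column j of U by sv[j].
U_scaled = (U[:, :r] * sv[:r]).astype(np.float32)

# The singular values are folded into U_scaled, so the block is recovered
# from two factors only.
X_hat = U_scaled @ Vh[:r]
```

Folding the singular values into the left factor means a reader needs no separate singular-value array to rebuild a chunk.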

`vh_indices` and `vh_values`

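Shape (n_kept,), dtype int32 and float32 respectively. Together they store the non-zero wavelet-packet coefficients of the right singular vectors in sparse form: vh_indices holds flat (row-major) positions into the dense array, and vh_values holds the coefficient at each position. The dense array can be reconstructed as: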

import numpy as np
# Start from zeros, then scatter the kept coefficients to their flat positions.
Vh_hat = np.zeros(vh_shape, dtype=np.float32)
Vh_hat.flat[vh_indices] = vh_values

vh_shape is stored as an attribute on the chunk group. Storing indices and values explicitly, rather than writing a dense, mostly-zero array and relying on gzip to compress the runs of zeros, gives a smaller file and faster reads.
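
The encoding direction is not spelled out in this page. A minimal round-trip sketch, assuming the indices are simply the flat positions of the non-zero entries, could look like this (`to_sparse` and `to_dense` are hypothetical names, not lfpack functions):

```python
import numpy as np


def to_sparse(dense):
    """Split a mostly-zero 2-D array into flat indices, values and shape."""
    flat = np.ascontiguousarray(dense).ravel()  # row-major flattening
    indices = np.flatnonzero(flat).astype(np.int32)
    values = flat[indices].astype(np.float32)
    return indices, values, dense.shape


def to_dense(indices, values, shape):
    """Inverse of to_sparse: scatter the values into a zero-filled array."""
    out = np.zeros(shape, dtype=np.float32)
    out.flat[indices] = values
    return out


coeffs = np.zeros((3, 8), dtype=np.float32)
coeffs[0, 1] = 2.5
coeffs[2, 7] = -1.0

vh_indices, vh_values, vh_shape = to_sparse(coeffs)
restored = to_dense(vh_indices, vh_values, vh_shape)
```

Note that int32 indices limit the dense array to fewer than 2**31 elements, which is comfortably above typical chunk sizes.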

Chunk group attributes

Attribute	Type	Description
`r`	int	SVD rank for this chunk
`ns`	int	Number of samples in this chunk
`ns_extended`	int	Samples including guard bands
`vh_shape`	2-tuple	Shape of the dense Vh_hat array

HDF5 version compatibility

All files are written with libver=('earliest', 'v110'), which restricts the writer to features available in HDF5 1.10 (first released in 2016). Any HDF5 ≥ 1.10, and therefore any h5py ≥ 3.0 built against such a library, can read lfpack files.
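
As an illustration of these settings in h5py, the sketch below writes a single `U_scaled` dataset with the pinned `libver` bounds and the shuffle + gzip (level 4) filters described earlier. It is not lfpack's writer: the function name, file name and data are made up, and the in-memory `core` driver is used only so that nothing touches the disk.

```python
import h5py
import numpy as np


def write_demo_chunk(file_name="demo_lfpack.h5"):
    """Write one U_scaled dataset to an in-memory file and report its settings."""
    u_scaled = np.arange(24, dtype=np.float32).reshape(8, 3)
    # driver="core" with backing_store=False keeps the file in memory only.
    with h5py.File(file_name, "w", driver="core", backing_store=False,
                   libver=("earliest", "v110")) as f:
        # Intermediate groups in the path are created automatically.
        chunk = f.create_group("probe00/00/chunks/0")
        dset = chunk.create_dataset(
            "U_scaled", data=u_scaled,
            shuffle=True, compression="gzip", compression_opts=4,
        )
        return {
            "libver": f.libver,
            "shuffle": dset.shuffle,
            "compression": dset.compression,
            "compression_opts": dset.compression_opts,
            "roundtrip_ok": bool(np.array_equal(dset[...], u_scaled)),
        }


info = write_demo_chunk()
```

The `'v110'` bound is only accepted when h5py is linked against a sufficiently recent HDF5 1.10 or later library.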

Warning

Do not change this to libver='latest'. That setting resolves to whatever the installed library considers current at write time, so files written on different machines end up in different formats. Reading them elsewhere then fails with cryptic low-level errors (bad version number for layout message) with no indication that a version mismatch is the cause.

Legacy flat layout

Files written by lfpack ≤ 0.0.x use a flat layout with meta and chunks/ at the root (no recording/scale hierarchy). LFPackReader detects this layout automatically (by checking for a meta key at root level), so these files remain fully readable. Writing this format is no longer supported.
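
The detection rule can be sketched as follows. The helper names are hypothetical and this is not the LFPackReader code; it only mirrors the check described above. Any mapping-like root works, including an open `h5py.File`, so plain dicts are used here as stand-ins.

```python
def is_legacy_layout(root):
    """True when meta sits at the root, i.e. the file uses the flat layout."""
    return "meta" in root


def resolve_scale_group(root, recording="probe00", scale=0):
    """Return the group that directly contains meta and chunks/."""
    if is_legacy_layout(root):
        # Flat layout: there is no recording/scale hierarchy to descend into.
        return root
    # Current layout: <recording>/<zero-padded two-digit scale>/
    return root[recording][f"{scale:02d}"]


legacy = {"meta": {"nc": 4}, "chunks": {"0": {}}}
current = {"probe00": {"00": {"meta": {"nc": 8}, "chunks": {"0": {}}}}}

legacy_group = resolve_scale_group(legacy)
current_group = resolve_scale_group(current, "probe00", 0)
```

Both calls return an object exposing `meta` and `chunks`, so downstream reading code does not need to know which layout it was given.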
