Group level analysis

Datasets provide a means to collect data for statistical analysis. A Dataset is similar to a dataframe in R or pandas, but can hold mass-univariate measurements in NDVars.

This example illustrates how to construct a Dataset by first collecting the cases, or rows, of the desired data table, and then combining them using Dataset.from_caselist().

Simulating subject data 

This section uses simulated EEG data to derive a general input structure for creating group level datasets: a list of condition labels and corresponding (simulated) EEG responses.

from eelbrain import *


data = datasets.simulate_erp(seed=1)
data.head()

#	cloze	predictability	n_chars
0	0.0054865	low	6
1	0.20557	low	5
2	0.95786	high	4
3	0.20756	low	3
4	0.042116	low	3
5	0.2161	low	3
6	0.82065	high	3
7	0.050949	low	4
8	0.025513	low	6
9	0.12511	low	5

NDVars: eeg

Average the data by condition to get two condition averages per subject:

data_by_condition = data.aggregate('predictability')
data_by_condition

#	n	cloze	predictability	n_chars
0	40	0.89466	high	4.95
1	40	0.13778	low	4.975

NDVars: eeg

Turn this into the general label/brain response structure:

subject_data = list(data_by_condition.zip('predictability', 'eeg'))
subject_data

[('high', <NDVar 'eeg': 140 time, 65 sensor>), ('low', <NDVar 'eeg': 140 time, 65 sensor>)]

Construct group level data 

Repeat the procedure described above for each of 10 simulated subjects to build a group level dataset.

We collect the labels (subject and condition labels) and brain responses in a list (cases). Each entry in this list corresponds to one row of the desired Dataset:

cases = []  # list of rows
for subject in range(10):
    data = datasets.simulate_erp(seed=subject)
    data_by_condition = data.aggregate('predictability')
    for predictability, eeg in data_by_condition.zip('predictability', 'eeg'):
        cases.append([str(subject), predictability, eeg])

cases

[['0', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['0', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['1', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['1', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['2', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['2', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['3', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['3', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['4', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['4', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['5', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['5', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['6', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['6', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['7', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['7', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['8', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['8', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['9', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['9', 'low', <NDVar 'eeg': 140 time, 65 sensor>]]

This list can now be turned into a Dataset:

data = Dataset.from_caselist(['subject', 'predictability', 'eeg'], cases, random='subject')
data.head()

#	subject	predictability
0	0	high
1	0	low
2	1	high
3	1	low
4	2	high
5	2	low
6	3	high
7	3	low
8	4	high
9	4	low

NDVars: eeg
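
To make the row-to-column conversion concrete, here is a minimal pure-Python sketch of the idea behind a case list. It does not use Eelbrain and is not how Dataset.from_caselist() is implemented. The helper `columns_from_caselist` is a hypothetical name introduced for illustration, and short strings stand in for the NDVar responses.

```python
def columns_from_caselist(names, cases):
    """Transpose a list of rows into a dict mapping column name to values."""
    for case in cases:
        if len(case) != len(names):
            raise ValueError(f"Expected {len(names)} values per case, got {len(case)}")
    return {name: [case[i] for case in cases] for i, name in enumerate(names)}


# Placeholder rows: strings stand in for the per-condition EEG responses
toy_cases = [
    ['0', 'high', 'eeg-0-high'],
    ['0', 'low', 'eeg-0-low'],
    ['1', 'high', 'eeg-1-high'],
    ['1', 'low', 'eeg-1-low'],
]
toy_columns = columns_from_caselist(['subject', 'predictability', 'eeg'], toy_cases)
print(toy_columns['subject'])
```

Each entry in `toy_cases` supplies one value per column, in the same order as the column names. This is the same contract the `cases` list follows above.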

Averaging by condition 

In a dataset that contains condition labels, these labels can be used to derive averages by condition:

data_by_condition = data.aggregate('predictability', drop_bad=True)
data_by_condition

#	n	predictability
0	10	high
1	10	low

NDVars: eeg
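
As a rough illustration of what averaging by condition means, the following sketch groups rows by their condition label and averages each group using only the standard library. The scalar values are hypothetical stand-ins for the EEG NDVars, and the code is a conceptual analogue, not Eelbrain's implementation of aggregate().

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scalar responses standing in for the EEG NDVars
toy_rows = [
    ('0', 'high', 2.0),
    ('0', 'low', 1.0),
    ('1', 'high', 4.0),
    ('1', 'low', 2.0),
]

# Collect the responses that share a condition label
groups = defaultdict(list)
for subject, predictability, value in toy_rows:
    groups[predictability].append(value)

# One output row per condition: (condition, number of cases, mean response)
toy_averages = [(label, len(values), mean(values)) for label, values in sorted(groups.items())]
print(toy_averages)
```

The count in each output row plays the same role as the `n` column in the aggregated table above.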

Individual average responses can then be retrieved by indexing with a row number and a column name:

data_by_condition[0, 'eeg']

<NDVar 'eeg': 140 time, 65 sensor>

The grand average can be derived by aggregating without a model, resulting in a single row:

data.aggregate(drop_bad=True)

#	n
0	20

NDVars: eeg

Many functions automatically average across cases …

p = plot.TopoButterfly('eeg', data=data)

… and directly accept a parameter for averaging by condition:

p = plot.TopoButterfly('eeg', 'predictability', data=data)

Models can similarly be used to define conditions in statistical tests:

result = testnd.TTestRelated('eeg', 'predictability', match='subject', data=data)
result

<TTestRelated 'eeg', 'predictability', 'high', 'low', 'subject' (n=10), samples=1023, p = .009>
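
The sample count of 1023 is consistent with enumerating every sign flip of the paired differences for 10 subjects (2**10 = 1024 assignments, minus the unflipped original). The sketch below illustrates that idea on a single scalar per subject, using only the standard library. The difference values are hypothetical. It is a simplified illustration under these assumptions, not Eelbrain's algorithm, which operates on the full time by sensor data and corrects for multiple comparisons.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def paired_t(differences):
    """t statistic for the mean of paired differences against zero."""
    n = len(differences)
    return mean(differences) / (stdev(differences) / sqrt(n))


# Hypothetical per-subject differences (high minus low), one value per subject
toy_differences = [0.9, 1.4, 0.3, 1.1, -0.2, 0.8, 1.6, 0.5, 1.2, 0.7]
observed_t = paired_t(toy_differences)

# Every sign assignment except the all-positive one (the observed data)
sign_flips = [
    signs for signs in product([1, -1], repeat=len(toy_differences))
    if any(s == -1 for s in signs)
]
n_permutations = len(sign_flips)

# Proportion of permutations whose |t| is at least as large as the observed |t|
n_extreme = sum(
    abs(paired_t([s * d for s, d in zip(signs, toy_differences)])) >= abs(observed_t)
    for signs in sign_flips
)
toy_p = n_extreme / n_permutations
print(n_permutations)
```

Flipping the sign of a subject's difference corresponds to swapping that subject's condition labels, which is why the test needs the match='subject' pairing.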
