Group level analysis

Datasets provide a means to collect data for statistical analysis. A Dataset is similar to a dataframe in R or pandas, but can hold mass-univariate measurements in NDVars.

This example illustrates how to construct a Dataset by first collecting the cases, or rows, of the desired data table, and then combining them using Dataset.from_caselist().

Simulating subject data

We use simulated EEG data to derive a general input structure for creating group level datasets: a list of condition labels and corresponding (simulated) EEG responses.

from eelbrain import *


data = datasets.simulate_erp(seed=1)
data.head()
#   cloze       predictability   n_chars
0   0.0054865   low              6
1   0.20557     low              5
2   0.95786     high             4
3   0.20756     low              3
4   0.042116    low              3
5   0.2161      low              3
6   0.82065     high             3
7   0.050949    low              4
8   0.025513    low              6
9   0.12511     low              5
NDVars: eeg


Average the data by condition (data_by_condition = data.aggregate('predictability')) to get two condition averages per subject:

#   n    cloze     predictability   n_chars
0   40   0.89466   high             4.95
1   40   0.13778   low              4.975
NDVars: eeg


Turn this into the general label/brain response structure:

subject_data = list(data_by_condition.zip('predictability', 'eeg'))
subject_data
[('high', <NDVar 'eeg': 140 time, 65 sensor>), ('low', <NDVar 'eeg': 140 time, 65 sensor>)]
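
The zip call iterates over the named columns in parallel and yields one tuple per row. As a rough plain-Python analogy (this is not Eelbrain code; the dictionary `table` and the string placeholders standing in for NDVars are made up for illustration):

```python
# Plain-Python analogy for iterating over two columns row by row.
# `table` is a hypothetical stand-in for a Dataset; the strings in the
# 'eeg' column are placeholders for NDVar objects.
table = {
    'predictability': ['high', 'low'],
    'eeg': ['<response high>', '<response low>'],
}

# Pair up the entries of both columns, one tuple per row
pairs = list(zip(table['predictability'], table['eeg']))
print(pairs)
```

Each tuple holds the condition label and the matching response, which is the label/response structure used in the rest of this example.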

Construct group level data

Use the procedure described above to simulate a group level dataset.

We collect the labels (subject and condition labels) and brain responses in a list (cases). Each entry in this list corresponds to one row of the desired Dataset:

cases = []  # list of rows
for subject in range(10):
    data = datasets.simulate_erp(seed=subject)
    data_by_condition = data.aggregate('predictability')
    for predictability, eeg in data_by_condition.zip('predictability', 'eeg'):
        cases.append([str(subject), predictability, eeg])

cases
[['0', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['0', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['1', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['1', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['2', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['2', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['3', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['3', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['4', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['4', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['5', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['5', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['6', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['6', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['7', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['7', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['8', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['8', 'low', <NDVar 'eeg': 140 time, 65 sensor>], ['9', 'high', <NDVar 'eeg': 140 time, 65 sensor>], ['9', 'low', <NDVar 'eeg': 140 time, 65 sensor>]]

This list can now be turned into a Dataset:

data = Dataset.from_caselist(['subject', 'predictability', 'eeg'], cases, random='subject')
data.head()
#   subject   predictability
0   0         high
1   0         low
2   1         high
3   1         low
4   2         high
5   2         low
6   3         high
7   3         low
8   4         high
9   4         low
NDVars: eeg
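
Conceptually, building a table from a case list amounts to transposing a list of rows into named columns. The following is a minimal plain-Python sketch of that idea, not Eelbrain's implementation; the helper `columns_from_cases` and the floats standing in for the EEG responses are hypothetical:

```python
def columns_from_cases(names, cases):
    """Transpose a list of rows into a dict of named columns."""
    for row in cases:
        if len(row) != len(names):
            raise ValueError(f"Row {row!r} does not have {len(names)} entries")
    return {name: [row[i] for row in cases] for i, name in enumerate(names)}


# Two subjects x two conditions; floats stand in for NDVar responses
cases = [
    ['0', 'high', 1.5],
    ['0', 'low', 0.5],
    ['1', 'high', 2.0],
    ['1', 'low', 1.0],
]
table = columns_from_cases(['subject', 'predictability', 'eeg'], cases)
print(table['subject'])
print(table['predictability'])
```

Collecting rows first is convenient when the data arrive one subject at a time, because each loop iteration only needs to append complete rows.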


Averaging by condition

In a dataset that contains condition labels, these labels can be used to derive averages by condition:

data_by_condition = data.aggregate('predictability', drop_bad=True)
data_by_condition
#   n    predictability
0   10   high
1   10   low
NDVars: eeg
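
What averaging by condition does to the brain responses can be sketched without Eelbrain: rows are grouped by label, and the responses in each group are averaged sample by sample. In this illustration, short lists of floats stand in for the NDVars, and the helper `average_by_label` is hypothetical:

```python
from collections import defaultdict
from statistics import fmean


def average_by_label(labels, responses):
    """Average equal-length responses element-wise within each label."""
    groups = defaultdict(list)
    for label, response in zip(labels, responses):
        groups[label].append(response)
    # zip(*group) iterates over time points across all responses in a group
    return {
        label: [fmean(samples) for samples in zip(*group)]
        for label, group in groups.items()
    }


labels = ['high', 'low', 'high', 'low']
responses = [
    [1.0, 2.0, 3.0],  # subject 0, high
    [0.0, 1.0, 1.0],  # subject 0, low
    [3.0, 4.0, 5.0],  # subject 1, high
    [2.0, 1.0, 3.0],  # subject 1, low
]
averages = average_by_label(labels, responses)
print(averages)
```

The result has one averaged response per label, mirroring the two rows of the aggregated dataset above.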


The aggregated dataset can then be used to retrieve an individual average response:

<NDVar 'eeg': 140 time, 65 sensor>

The grand average could be derived by aggregating without a model, resulting in a single row:

data.aggregate(drop_bad=True)
#   n
0   20
NDVars: eeg


Many functions automatically average across cases …

… and directly accept a parameter for averaging by condition:

p = plot.TopoButterfly('eeg', 'predictability', data=data)

Models can similarly be used to define conditions in statistical tests:

result = testnd.TTestRelated('eeg', 'predictability', match='subject', data=data)
result
<TTestRelated 'eeg', 'predictability', 'high', 'low', 'subject' (n=10), samples=1023, p = .009>
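
A related-measures t-test compares the two conditions within subjects, which is why match='subject' is needed to pair up the rows. The statistic at a single time point and sensor can be sketched with the standard library alone. This shows only the textbook paired t-value on invented numbers; it does not reproduce the mass-univariate test or the p-value computation that testnd.TTestRelated performs, and the helper name `paired_t` is hypothetical:

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(a, b):
    """Paired t-value for two equal-length lists of matched measurements."""
    if len(a) != len(b):
        raise ValueError("a and b must contain one value per subject each")
    differences = [x - y for x, y in zip(a, b)]
    n = len(differences)
    # Mean difference divided by its standard error (n - 1 degrees of freedom)
    return fmean(differences) / (stdev(differences) / sqrt(n))


# One value per subject and condition (invented numbers, same subject order)
high = [2.0, 3.0, 5.0, 4.0]
low = [1.0, 1.0, 2.0, 4.0]
t = paired_t(high, low)
print(round(t, 3))
```

Because the test operates on within-subject differences, the two lists must list the subjects in the same order, which is the role the match argument plays in the call above.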
