Dataset basics

# Author: Christian Brodbeck <christianbrodbeck@nyu.edu>
from eelbrain import *
import numpy as np

A dataset can be constructed column by column, by adding one variable after another:

# initialize an empty Dataset:
ds = Dataset()
# numeric values are added as Var objects:
ds['y'] = Var(np.random.normal(0, 1, 6))
# categorical data are represented as Factors:
ds['a'] = Factor(['a', 'b', 'c'], repeat=2)
# check the result:
print(ds)
y          a
------------
-1.6458    a
-0.24622   a
0.48274    b
0.8733     b
-0.26879   c
0.69281    c
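
To make the effect of `repeat=2` concrete, the following plain-Python sketch reproduces the order of labels seen in column `a` above. It does not use eelbrain; `repeat_labels` is a hypothetical helper written only for this illustration, not part of the library.

```python
def repeat_labels(labels, repeat):
    """Repeat each label `repeat` times in a row, keeping the original order."""
    return [label for label in labels for _ in range(repeat)]


# Hypothetical stand-in for the values of Factor(['a', 'b', 'c'], repeat=2)
a_column = repeat_labels(['a', 'b', 'c'], 2)
print(a_column)  # ['a', 'a', 'b', 'b', 'c', 'c']
```

Each label is repeated consecutively, so three labels with two repetitions each give the six cases that match the length of `y`.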

For larger datasets it can be more convenient to print only the first few cases (this dataset is small enough that all six cases are still shown)…

print(ds.head())
y          a
------------
-1.6458    a
-0.24622   a
0.48274    b
0.8733     b
-0.26879   c
0.69281    c

… or a summary of variables:

print(ds.summary())
Key   Type     Values
--------------------------------------------------------------------------
y     Var      -1.64581, -0.26879, -0.246219, 0.482743, 0.692815, 0.873297
a     Factor   a:2, b:2, c:2
--------------------------------------------------------------------------
Dataset: 6 cases

An alternative way of constructing a dataset is case by case (i.e., row by row):

rows = []
for i in range(6):
    subject = f'S{i}'
    y = np.random.normal(0, 1)
    a = 'abc'[i % 3]
    rows.append([subject, y, a])
# the first argument names the columns; random='subject' marks that column as a random factor
ds = Dataset.from_caselist(['subject', 'y', 'a'], rows, random='subject')
print(ds)
subject   y          a
----------------------
S0        1.1242     a
S1        -0.20257   b
S2        -1.1141    c
S3        0.030536   a
S4        -0.41999   b
S5        1.5798     c
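
The relationship between the two construction styles can be sketched in plain Python: a list of cases (rows) is a transposed view of the same data stored as named columns. The snippet below is only an illustration of that idea and makes no claim about how eelbrain stores data internally. The values are made up for the example rather than taken from the random output above.

```python
# Column names and a small case list (illustrative values, not real data)
names = ['subject', 'y', 'a']
rows = [
    ['S0', 0.5, 'a'],
    ['S1', -1.0, 'b'],
    ['S2', 0.25, 'c'],
]

# zip(*rows) transposes the case list, so each name is paired with one column
columns = {name: list(values) for name, values in zip(names, zip(*rows))}

print(columns['subject'])  # ['S0', 'S1', 'S2']
print(columns['y'])        # [0.5, -1.0, 0.25]
print(columns['a'])        # ['a', 'b', 'c']
```

Building row by row tends to be convenient when each case is produced inside a loop, whereas building column by column fits data that already exist as whole arrays.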
