Dataset basics

# Author: Christian Brodbeck <christianbrodbeck@nyu.edu>
from eelbrain import *
import numpy as np

A dataset can be constructed column by column, by adding one variable after another:

# initialize an empty Dataset:
ds = Dataset()
# numeric values are added as Var objects:
ds['y'] = Var(np.random.normal(0, 1, 6))
# categorical data are represented as Factors:
ds['a'] = Factor(['a', 'b', 'c'], repeat=2)
# check the result:
print(ds)

Out:

y          a
------------
0.1815     a
0.28401    a
-2.5908    b
1.0572     b
-0.03092   c
-1.475     c
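
The values of y are random draws, so the numbers printed above change from run to run. The following is a minimal sketch using only NumPy, not the Eelbrain API. It assumes a seeded generator (the seed 0 is arbitrary) to make the draws reproducible, and it uses np.repeat to mimic the cell layout that repeat=2 produces in the output above (a, a, b, b, c, c).

```python
import numpy as np

# Seeded generator: the seed value 0 is an arbitrary choice for illustration
rng = np.random.default_rng(0)

# Six draws from a standard normal distribution, one per case
y = rng.normal(0, 1, 6)

# Each label repeated twice in a row, matching the layout of column 'a' above
a = np.repeat(['a', 'b', 'c'], 2)

# Print the two columns side by side, one case per line
for y_i, a_i in zip(y, a):
    print(f"{y_i:<12.5g}{a_i}")
```

Both arrays have one entry per case, which is what allows them to sit side by side as columns of the same dataset.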

For larger datasets it can be more convenient to print only the first few cases…

print(ds.head())

Out:

y          a
------------
0.1815     a
0.28401    a
-2.5908    b
1.0572     b
-0.03092   c
-1.475     c

… or a summary of variables:

print(ds.summary())

Out:

Key   Type     Values
----------------------------------------------------------------------
y     Var      -2.59083, -1.47502, -0.03092, 0.1815, 0.284014, 1.05724
a     Factor   a:2, b:2, c:2
----------------------------------------------------------------------
Dataset: 6 cases

An alternative way of constructing a dataset is case by case (i.e., row by row):

rows = []
for i in range(6):
    subject = f'S{i}'
    y = np.random.normal(0, 1)
    a = 'abc'[i % 3]
    rows.append([subject, y, a])
ds = Dataset.from_caselist(['subject', 'y', 'a'], rows, random='subject')
print(ds)

Out:

subject   y           a
-----------------------
S0        0.59608     a
S1        -0.23879    b
S2        -0.076139   c
S3        0.23884     a
S4        -0.41739    b
S5        1.1984      c
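
Note that here the levels of a cycle (a, b, c, a, b, c) instead of repeating in pairs, because each row is built from 'abc'[i % 3]. As a rough illustration of how a case list relates to the column-wise layout, the sketch below transposes the rows into named columns with plain Python. It is not how Eelbrain implements from_caselist; the names dictionary `columns` and the seeded generator are assumptions made for this example.

```python
import numpy as np

# Seeded generator: the seed value 0 is an arbitrary choice for illustration
rng = np.random.default_rng(0)

names = ['subject', 'y', 'a']
rows = []
for i in range(6):
    # One case per iteration: subject label, random value, cycling category
    rows.append([f'S{i}', rng.normal(0, 1), 'abc'[i % 3]])

# zip(*rows) transposes the list of cases into one tuple per variable
columns = {name: list(values) for name, values in zip(names, zip(*rows))}

print(columns['subject'])
print(columns['a'])
```

Each entry of `columns` holds one variable across all cases, which corresponds to the column-by-column construction shown at the start of this example.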
