Dataset basics

# Author: Christian Brodbeck <christianbrodbeck@nyu.edu>
from eelbrain import *
import numpy as np

A dataset can be constructed column by column, by adding one variable after another:

# initialize an empty Dataset:
ds = Dataset()
# numeric values are added as Var objects:
ds['y'] = Var(np.random.normal(0, 1, 6))
# categorical data are represented as Factors:
ds['a'] = Factor(['a', 'b', 'c'], repeat=2)
# check the result:
print(ds)

Out:

y           a
-------------
1.9916      a
-1.1491     a
-0.040077   b
0.42575     b
-0.2187     c
-1.4327     c
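
In the output above, `repeat=2` places each label twice in a row (a, a, b, b, c, c) rather than cycling through the labels. The following NumPy-only sketch reproduces that ordering for illustration; it is not eelbrain's implementation, and the variable names are hypothetical:

```python
import numpy as np

labels = ['a', 'b', 'c']

# Element-wise repetition: each label appears twice in a row,
# matching the order of the Factor column printed above.
repeated = np.repeat(labels, 2).tolist()

# For contrast, tiling cycles through the whole sequence instead.
tiled = np.tile(labels, 2).tolist()

print(repeated)
print(tiled)
```

The contrast with `np.tile` is only meant to show which of the two orderings the printed column corresponds to.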

For larger datasets it can be more convenient to print only the first few cases (this dataset has only six cases, so all of them appear)…

print(ds.head())

Out:

y           a
-------------
1.9916      a
-1.1491     a
-0.040077   b
0.42575     b
-0.2187     c
-1.4327     c

… or a summary of variables:

print(ds.summary())

Out:

Key   Type     Values
-------------------------------------------------------------------------
y     Var      -1.4327, -1.14913, -0.218699, -0.040077, 0.425754, 1.99161
a     Factor   a:2, b:2, c:2
-------------------------------------------------------------------------
Dataset: 6 cases

An alternative way of constructing a dataset is case by case (i.e., row by row):

rows = []
for i in range(6):
    subject = f'S{i}'
    y = np.random.normal(0, 1)
    a = 'abc'[i % 3]
    rows.append([subject, y, a])
ds = Dataset.from_caselist(['subject', 'y', 'a'], rows, random='subject')
print(ds)

Out:

subject   y          a
----------------------
S0        -0.10058   a
S1        -1.635     b
S2        0.19041    c
S3        1.0565     a
S4        -0.26542   b
S5        -1.2072    c
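
Conceptually, a case list and a set of columns hold the same data in transposed form. The plain-Python sketch below transposes a list of rows into named columns. It is an illustration under stated assumptions, not a description of how `Dataset.from_caselist` works internally; the seeded generator and the `columns` dictionary are hypothetical additions:

```python
import numpy as np

# A seeded generator is used here only so the sketch is repeatable (an assumption;
# the example above uses the unseeded np.random.normal).
rng = np.random.default_rng(0)

names = ['subject', 'y', 'a']
rows = [[f'S{i}', rng.normal(0, 1), 'abc'[i % 3]] for i in range(6)]

# zip(*rows) transposes the case list: one tuple per column instead of one list per case.
columns = dict(zip(names, zip(*rows)))

print(columns['subject'])
print(columns['a'])
```

Each entry of `columns` then corresponds to one variable that could be added to a dataset column by column, as in the first example.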
