Dataset basics

# Author: Christian Brodbeck <christianbrodbeck@nyu.edu>
from eelbrain import *
import numpy

A dataset can be constructed column by column, by adding one variable after another:

# initialize an empty Dataset:
ds = Dataset()
# numeric values are added as Var objects:
ds['y'] = Var(numpy.random.normal(0, 1, 6))
# categorical data are represented as Factors:
ds['a'] = Factor(['a', 'b', 'c'], repeat=2)
# A variable that's equal in all cases can be assigned quickly:
ds[:, 'z'] = 0.
# check the result:
ds
y a z
-0.12677 a 0
0.95357 a 0
-0.21697 b 0
0.066677 b 0
-0.7492 c 0
0.46497 c 0
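
To make the two shortcuts above concrete, here is a minimal plain-Python sketch (not Eelbrain code) of what the printed result implies: `repeat=2` repeats each label consecutively, and a scalar assignment fills one value per case. The helper `repeat_each` and the `columns` dict are hypothetical names used only for illustration.

```python
from itertools import chain, repeat


def repeat_each(values, n):
    """Repeat every element n times in a row: [a, b] -> [a, a, b, b]."""
    return list(chain.from_iterable(repeat(v, n) for v in values))


# Hypothetical stand-in for a dataset: column name -> list of values
columns = {}
columns['a'] = repeat_each(['a', 'b', 'c'], 2)
n_cases = len(columns['a'])
# A scalar assignment fills one value per case
columns['z'] = [0.0] * n_cases

# All columns must have the same number of cases
assert all(len(col) == n_cases for col in columns.values())
print(columns['a'])
```

Note that the labels come out as a, a, b, b, c, c (each level repeated in place), not a, b, c, a, b, c.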


For larger datasets it can be more convenient to print only the first few cases…

ds.head()
y a z
-0.12677 a 0
0.95357 a 0
-0.21697 b 0
0.066677 b 0
-0.7492 c 0
0.46497 c 0


… or a summary of variables:

ds.summary()
Key Type Values
y Var -0.749198, -0.216966, -0.126774, 0.0666765, 0.464972, 0.953569
a Factor a:2, b:2, c:2
z Var 0:6
Dataset: 6 cases


An alternative way of constructing a dataset is case by case (i.e., row by row):

rows = []
for i in range(6):
    subject = f'S{i}'
    y = numpy.random.normal(0, 1)
    a = 'abc'[i % 3]
    rows.append([subject, y, a])
ds = Dataset.from_caselist(['subject', 'y', 'a'], rows, random='subject')
ds
subject y a
S0 0.25756 a
S1 1.0328 b
S2 0.41678 c
S3 0.24911 a
S4 -0.58812 b
S5 -0.66663 c
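
Conceptually, a case list is the transpose of the column layout. The following plain-Python sketch is an illustration only, not how `Dataset.from_caselist` is implemented; the values in `rows` are made up.

```python
names = ['subject', 'y', 'a']
# Made-up cases, one inner list per row
rows = [
    ['S0', 0.3, 'a'],
    ['S1', 1.0, 'b'],
    ['S2', 0.4, 'c'],
]

# zip(*rows) transposes the rows into columns, which are then paired with names
columns = {name: list(col) for name, col in zip(names, zip(*rows))}

for name, col in columns.items():
    print(name, col)
```

Every row must supply one value per name, in the same order as `names`.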


Example

Below is a simple example using data objects (for more, see the Examples):

y = numpy.empty(21)
y[:14] = numpy.random.normal(0, 1, 14)
y[14:] = numpy.random.normal(2, 1, 7)
ds = Dataset({
    'a': Factor('abc', 'A', repeat=7),
    'y': Var(y, 'Y'),
})
ds
a y
a 1.3913
a -1.3665
a 0.57004
a 1.8531
a 0.19558
a 1.3243
a 0.27022
b 0.57745
b -0.45996
b 0.8148
b 0.65245
b -1.3438
b 0.20948
b 0.47726
c 1.5961
c 2.8759
c 2.0963
c 2.0853
c 3.3374
c 1.2779
c 3.314


table.frequencies('a', data=ds)
a n
a 7
b 7
c 7
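
The frequency table simply counts cases per level of `a`. As a rough plain-Python equivalent (an illustration that does not use Eelbrain), the same counts can be obtained with `collections.Counter`:

```python
from collections import Counter

# Same layout as the Factor above: each of 'a', 'b', 'c' repeated 7 times
a = [level for level in 'abc' for _ in range(7)]

counts = Counter(a)
for level, n in sorted(counts.items()):
    print(level, n)
```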


test.ANOVA('y', 'a', data=ds)
SS df MS F p
a 19.45 2 9.72 12.09*** < .001
Residuals 14.48 18 0.80
Total 33.93 20
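
The columns of the ANOVA table are related by MS = SS / df and F = MS(a) / MS(Residuals); with 3 groups of 7 cases the degrees of freedom are 3 - 1 = 2 and 21 - 3 = 18. The sketch below works this out by hand in plain Python for a small made-up data set. It assumes a standard one-way between-groups design and is not Eelbrain's implementation; `one_way_anova` is a hypothetical helper.

```python
from statistics import mean


def one_way_anova(groups):
    """Return SS, df, MS and F for a one-way between-groups design."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Effect: squared distance of each group mean from the grand mean
    ss_effect = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residuals: squared distance of each value from its own group mean
    ss_resid = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_effect = len(groups) - 1
    df_resid = len(all_values) - len(groups)
    ms_effect = ss_effect / df_effect
    ms_resid = ss_resid / df_resid
    return {
        'SS': (ss_effect, ss_resid),
        'df': (df_effect, df_resid),
        'MS': (ms_effect, ms_resid),
        'F': ms_effect / ms_resid,
    }


# Made-up data: three groups of three values each
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)

# Recompute F from the rounded SS and df values printed in the table above
f_from_table = (19.45 / 2) / (14.48 / 18)
print(round(f_from_table, 2))
```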


test.pairwise('y', 'a', data=ds, corr='Hochberg')

Pairwise T-Tests (independent samples)

     b              c
a    t12 = 0.95     t12 = -3.46*
     p = .362       p = .005
     pc = .362      pc = .014
b                   t12 = -5.25**
                    p < .001
                    pc = .001
(* Corrected after Hochberg, 1988)
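
The subscript in t12 is the degrees of freedom: each comparison involves two groups of 7 cases, so df = 7 + 7 - 2 = 12. The sketch below computes an independent-samples t statistic by hand, assuming the classic pooled-variance (Student) form, which is consistent with those 12 degrees of freedom. It is an illustration on made-up data, not Eelbrain's code; `independent_t` is a hypothetical helper and the multiple-comparison correction is not included.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(x, y):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pool the two sample variances, weighted by their degrees of freedom
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    # Standard error of the difference between the two means
    se = sqrt(pooled_var * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, df


# Made-up data: two groups of three values each
t_value, df = independent_t([1, 2, 3], [2, 3, 4])
print(df, round(t_value, 3))
```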


t = test.pairwise('y', 'a', data=ds, corr='Hochberg')
print(t.get_tex())
\begin{center}
\begin{tabular}{lll}
\toprule
 & b & c \\
\midrule
a & $t_{12} = 0.95^{   \ \ \ }$\\
$p = .362$\\
$p_{c} = .362$ & $t_{12} = -3.46^{*  \ \ }$\\
$p = .005$\\
$p_{c} = .014$ \\
b &  & $t_{12} = -5.25^{** \ }$\\
$p < .001$\\
$p_{c} = .001$ \\
\bottomrule
\end{tabular}
\end{center}

p = plot.Boxplot('y', 'a', data=ds, title="My Boxplot", ylabel="value", corr='Hochberg')

[Figure: box plot titled "My Boxplot"]
