Dataset basics

# Author: Christian Brodbeck <christianbrodbeck@nyu.edu>
from eelbrain import *
import numpy

A dataset can be constructed column by column, by adding one variable after another:

# initialize an empty Dataset:
ds = Dataset()
# numeric values are added as Var objects:
ds['y'] = Var(numpy.random.normal(0, 1, 6))
# categorical data are represented as Factors:
ds['a'] = Factor(['a', 'b', 'c'], repeat=2)
# A variable that's equal in all cases can be assigned quickly:
ds[:, 'z'] = 0.
# check the result:
ds

y	a	z
0.97585	a	0
-0.84282	a	0
-0.39702	b	0
-0.088688	b	0
-0.26644	c	0
1.0767	c	0
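
The output above suggests that `repeat=2` repeats each label twice in a row (a, a, b, b, c, c) rather than cycling through the labels. The following is a plain-Python sketch of that behavior, not Eelbrain's implementation; the helper name `repeat_each` is hypothetical and only serves to illustrate the resulting cell order:

```python
from itertools import chain, repeat


def repeat_each(labels, n):
    """Return a list in which every label appears n times in a row.

    Hypothetical helper for illustration; not part of Eelbrain.
    """
    return list(chain.from_iterable(repeat(label, n) for label in labels))


cells = repeat_each(['a', 'b', 'c'], 2)
print(cells)
```

Under this reading, the six values in `y` pair up with the six labels in order, which is why both columns need to have the same length.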

For larger datasets it can be more convenient to print only the first few cases…

ds.head()

y	a	z
0.97585	a	0
-0.84282	a	0
-0.39702	b	0
-0.088688	b	0
-0.26644	c	0
1.0767	c	0

… or a summary of variables:

ds.summary()

Key	Type	Values
y	Var	-0.842819, -0.397017, -0.266438, -0.0886884, 0.975853, 1.07665
a	Factor	a:2, b:2, c:2
z	Var	0:6

Dataset: 6 cases

An alternative way of constructing a dataset is case by case (i.e., row by row):

rows = []
for i in range(6):
    subject = f'S{i}'
    y = numpy.random.normal(0, 1)
    a = 'abc'[i % 3]
    rows.append([subject, y, a])
ds = Dataset.from_caselist(['subject', 'y', 'a'], rows, random='subject')
ds

subject	y	a
S0	1.4456	a
S1	-3.0226	b
S2	-0.97383	c
S3	1.4724	a
S4	0.092753	b
S5	-1.0482	c
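
Row-wise and column-wise construction describe the same table from two directions. As a rough illustration in plain Python (no Eelbrain objects involved, and the values are made up), a list of cases can be transposed into named columns with `zip`:

```python
# Each inner list is one case (row): subject, y, a. Values are illustrative.
rows = [
    ['S0', 1.5, 'a'],
    ['S1', -0.3, 'b'],
    ['S2', 0.8, 'c'],
]
names = ['subject', 'y', 'a']

# zip(*rows) transposes the rows into one tuple per column
columns = {name: list(values) for name, values in zip(names, zip(*rows))}
print(columns['subject'])
print(columns['y'])
```

Each entry of `columns` then corresponds to what would be a single variable in the column-by-column approach.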

Example

Below is a simple example using data objects (for more, see the Examples):

y = numpy.empty(21)
y[:14] = numpy.random.normal(0, 1, 14)
y[14:] = numpy.random.normal(2, 1, 7)
ds = Dataset({
    'a': Factor('abc', 'A', repeat=7),
    'y': Var(y, 'Y'),
})
ds

a	y
a	0.78088
a	0.92937
a	2.2588
a	0.26756
a	0.72138
a	1.0038
a	-0.16805
b	0.10831
b	0.78845
b	-0.12398
b	0.46897
b	-1.2794
b	-0.41484
b	1.1206
c	2.3634
c	0.57474
c	1.2629
c	1.7722
c	3.1952
c	2.2073
c	1.8299

table.frequencies('a', data=ds)

a	n
a	7
b	7
c	7

test.ANOVA('y', 'a', data=ds)

	SS	df	MS	F	p
a	11.35	2	5.68	8.93^**	.002
Residuals	11.45	18	0.64
Total	22.80	20
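
To make the columns of this table easier to read, here is a minimal plain-Python sketch of a one-way, between-groups ANOVA decomposition. It assumes the textbook formulas and is not Eelbrain's implementation; the function name `one_way_anova` and the small data set are hypothetical:

```python
def one_way_anova(groups):
    """Compute the entries of a one-way ANOVA table.

    groups: dict mapping a group label to a list of observations.
    Hypothetical helper for illustration; not part of Eelbrain.
    """
    values = [v for obs in groups.values() for v in obs]
    grand_mean = sum(values) / len(values)
    means = {label: sum(obs) / len(obs) for label, obs in groups.items()}

    # between-groups and within-groups (residual) sums of squares
    ss_effect = sum(len(obs) * (means[label] - grand_mean) ** 2
                    for label, obs in groups.items())
    ss_resid = sum((v - means[label]) ** 2
                   for label, obs in groups.items() for v in obs)

    df_effect = len(groups) - 1
    df_resid = len(values) - len(groups)
    ms_effect = ss_effect / df_effect
    ms_resid = ss_resid / df_resid
    return {
        'SS': (ss_effect, ss_resid, ss_effect + ss_resid),
        'df': (df_effect, df_resid, df_effect + df_resid),
        'MS': (ms_effect, ms_resid),
        'F': ms_effect / ms_resid,
    }


# Illustrative data only (not the random data used above)
result = one_way_anova({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [5, 6, 7]})
print(result)
```

The same relations can be checked against the table above: MS is SS divided by df (11.35 / 2 ≈ 5.68 and 11.45 / 18 ≈ 0.64), F is the ratio of the two MS values (approximately 8.9 from the rounded entries), and the Total row is the sum of the two rows above it.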

test.pairwise('y', 'a', data=ds, corr='Hochberg')

Pairwise T-Tests (independent samples)

	b	c
a	t₁₂ = 1.76 p = .104 p_c = .104	t₁₂ = -2.49 p = .028 p_c = .085
b		t₁₂ = -4.09^** p = .001 p_c = .007

(* Corrected after Hochberg, 1988)

t = test.pairwise('y', 'a', data=ds, corr='Hochberg')
print(t.get_tex())

\begin{center}
\begin{tabular}{lll}
\toprule
 & b & c \\
\midrule
a & $t_{12} = 1.76^{   \ \ \ }$\\
$p = .104$\\
$p_{c} = .104$ & $t_{12} = -2.49^{   \ \ \ }$\\
$p = .028$\\
$p_{c} = .085$ \\
b &  & $t_{12} = -4.09^{** \ }$\\
$p = .001$\\
$p_{c} = .007$ \\
\bottomrule
\end{tabular}
\end{center}

p = plot.Boxplot('y', 'a', data=ds, title="My Boxplot", ylabel="value", corr='Hochberg')
