# Dataset basics

```python
# Author: Christian Brodbeck <christianbrodbeck@nyu.edu>
from eelbrain import *
import numpy as np
```

A dataset can be constructed column by column, by adding one variable after another:

```python
# initialize an empty Dataset:
ds = Dataset()
# numeric values are added as Var objects:
ds['y'] = Var(np.random.normal(0, 1, 6))
# categorical data are represented as Factor objects:
ds['a'] = Factor(['a', 'b', 'c'], repeat=2)
# check the result:
print(ds)
```
```
y          a
------------
-1.6458    a
-0.24622   a
0.48274    b
0.8733     b
-0.26879   c
0.69281    c
```
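The `repeat=2` argument repeats each label in place, which is why the Factor lines up with the six values of `y` as `a a b b c c`. The NumPy-only sketch below reproduces that label order with `np.repeat` and contrasts it with `np.tile`, which cycles the whole sequence instead. It does not use eelbrain; the variable names are illustrative.

```python
import numpy as np

# Six numeric observations, as in the example above
y = np.random.normal(0, 1, 6)

# Same order as Factor(['a', 'b', 'c'], repeat=2): a a b b c c
labels_repeat = np.repeat(['a', 'b', 'c'], 2)
# np.tile cycles the whole sequence instead: a b c a b c
labels_tile = np.tile(['a', 'b', 'c'], 2)

# Every column of a dataset needs exactly one entry per case
assert len(labels_repeat) == len(y)

for value, label in zip(y, labels_repeat):
    print(f'{value:9.4f}  {label}')
```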

For larger datasets it can be more convenient to print only the first few cases (this example has only six cases, so all of them are shown)…

```python
print(ds.head())
```
```
y          a
------------
-1.6458    a
-0.24622   a
0.48274    b
0.8733     b
-0.26879   c
0.69281    c
```

… or a summary of variables:

```python
print(ds.summary())
```
```
Key   Type     Values
--------------------------------------------------------------------------
y     Var      -1.64581, -0.26879, -0.246219, 0.482743, 0.692815, 0.873297
a     Factor   a:2, b:2, c:2
--------------------------------------------------------------------------
Dataset: 6 cases
```
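In this summary, the `Var` row lists the values of `y` in ascending order and the `Factor` row lists how many cases fall into each cell. As a rough sketch, the same information can be computed with NumPy alone; this is only an illustration of what the Values column contains, not how `summary()` is implemented.

```python
import numpy as np

y = np.random.normal(0, 1, 6)
a = np.repeat(['a', 'b', 'c'], 2)

# Numeric column: distinct values in ascending order
y_values = np.unique(y)
print(', '.join(f'{v:g}' for v in y_values))

# Categorical column: number of cases per cell, formatted like "a:2, b:2, c:2"
cells, counts = np.unique(a, return_counts=True)
a_summary = ', '.join(f'{cell}:{count}' for cell, count in zip(cells, counts))
print(a_summary)
```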

An alternative way of constructing a dataset is case by case (i.e., row by row):

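```python
rows = []
for i in range(6):
    subject = f'S{i}'
    y = np.random.normal(0, 1)
    a = 'abc'[i % 3]
    # each case is one row: [subject, y, a]
    rows.append([subject, y, a])
# random='subject' assigns the 'subject' column as a random factor
ds = Dataset.from_caselist(['subject', 'y', 'a'], rows, random='subject')
print(ds)
```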
```
subject   y          a
----------------------
S0        1.1242     a
S1        -0.20257   b
S2        -1.1141    c
S3        0.030536   a
S4        -0.41999   b
S5        1.5798     c
```
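Building a dataset case by case means the list of rows has to be turned into one column per name. The pure-Python sketch below shows that transposition with `zip(*rows)`; it illustrates the idea only and is not eelbrain's actual implementation of `from_caselist`. The `columns` dictionary is a hypothetical name used for illustration.

```python
import numpy as np

names = ['subject', 'y', 'a']
rows = []
for i in range(6):
    rows.append([f'S{i}', np.random.normal(0, 1), 'abc'[i % 3]])

# Every case must supply one value per column name
assert all(len(row) == len(names) for row in rows)

# Transpose the list of cases into one tuple per column (hypothetical helper structure)
columns = dict(zip(names, zip(*rows)))
print(columns['subject'])
print(columns['a'])
```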
