# Dataset basics

```python
# Author: Christian Brodbeck <christianbrodbeck@nyu.edu>
from eelbrain import *
import numpy as np
```

A dataset can be constructed column by column, by adding one variable after another:

```python
# initialize an empty Dataset:
ds = Dataset()
# numeric values are added as Var objects:
ds['y'] = Var(np.random.normal(0, 1, 6))
# categorical data are represented as Factors:
ds['a'] = Factor(['a', 'b', 'c'], repeat=2)
# check the result:
print(ds)
```

Out:

```
y           a
-------------
-0.035148   a
-0.084978   a
0.26078     b
-0.76769    b
-1.4068     c
-0.68582    c
```
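Because `y` is drawn at random, the values printed above differ from run to run. As a minimal sketch using only NumPy (the seed value `0` is an arbitrary choice, not something the original example sets), a seeded generator makes the draw reproducible:

```python
import numpy as np

# Two generators created with the same seed produce identical draws
rng_1 = np.random.default_rng(seed=0)
rng_2 = np.random.default_rng(seed=0)

y_1 = rng_1.normal(0, 1, 6)
y_2 = rng_2.normal(0, 1, 6)

print(np.array_equal(y_1, y_2))  # True
```

An array drawn this way could be passed to `Var` in place of `np.random.normal(0, 1, 6)`.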

For larger datasets it can be more convenient to print only the first few cases…

```python
print(ds.head())
```

Out:

```
y           a
-------------
-0.035148   a
-0.084978   a
0.26078     b
-0.76769    b
-1.4068     c
-0.68582    c
```

… or a summary of variables:

```python
print(ds.summary())
```

Out:

```
Key   Type     Values
-------------------------------------------------------------------------------
y     Var      -1.40682, -0.767689, -0.685822, -0.0849778, -0.0351483, 0.260784
a     Factor   a:2, b:2, c:2
-------------------------------------------------------------------------------
Dataset: 6 cases
```
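In the summary above, the `Var` row lists the values in ascending order and the `Factor` row lists each category with its number of cases. The following is only a rough plain-Python analogue of that output, not Eelbrain's implementation. The values are copied from the printed dataset, rounded as displayed:

```python
from collections import Counter

# Values copied from the printed dataset (rounded as displayed)
y = [-0.035148, -0.084978, 0.26078, -0.76769, -1.4068, -0.68582]
a = ['a', 'a', 'b', 'b', 'c', 'c']

# Numeric column: values in ascending order
y_sorted = sorted(y)

# Categorical column: number of cases per category
a_counts = Counter(a)

print(', '.join(str(value) for value in y_sorted))
print(', '.join(f'{cell}:{n}' for cell, n in sorted(a_counts.items())))
```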

An alternative way of constructing a dataset is case by case (i.e., row by row):

```python
rows = []
for i in range(6):
    # one case (row) per subject
    subject = f'S{i}'
    y = np.random.normal(0, 1)
    a = 'abc'[i % 3]
    rows.append([subject, y, a])
ds = Dataset.from_caselist(['subject', 'y', 'a'], rows, random='subject')
print(ds)
```

Out:

```
subject   y          a
----------------------
S0        1.3183     a
S1        -0.84995   b
S2        1.755      c
S3        -0.5812    a
S4        0.54568    b
S5        0.65884    c
```
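Conceptually, a case list holds the same data as the column-wise form, only transposed. The sketch below uses plain Python to turn a few of the rows shown above into named columns. It illustrates the relationship between the two layouts and is not how `Dataset.from_caselist` is implemented:

```python
names = ['subject', 'y', 'a']

# The first three cases from the output above
rows = [
    ['S0', 1.3183, 'a'],
    ['S1', -0.84995, 'b'],
    ['S2', 1.755, 'c'],
]

# zip(*rows) transposes the list of rows into one tuple per column
columns = {name: list(values) for name, values in zip(names, zip(*rows))}

print(columns['subject'])  # ['S0', 'S1', 'S2']
print(columns['a'])        # ['a', 'b', 'c']
```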
