eelbrain.Dataset

class eelbrain.Dataset(items=None, name=None, caption=None, info=None, n_cases=None)

Store multiple variables pertaining to a common set of measurement cases

Parameters:
  • items (dict | list) – Items in the Dataset (either specified as {key: data_object} dictionary, or as [data_object] list in which data-object names will be used as keys). The Dataset stores the input items directly, without making a copy.

  • name (str) – Name for the Dataset.

  • caption (str) – Caption for the table.

  • info (dict) – Info dictionary, can contain arbitrary entries and can be accessed as .info attribute after initialization. The Dataset makes a shallow copy.

  • n_cases (int) – Specify the number of cases in the Dataset if no items are added upon initialization (by default, the number is inferred when the first item is added).

Variables:
  • n_cases (None | int) – The number of cases in the Dataset (corresponding to the number of rows in the table representation). None if no variables have been added.

  • n_items (int) – The number of items (variables) in the Dataset (corresponding to the number of columns in the table representation).

Notes

A Dataset represents a data table as a {variable_name: value_list} dictionary. Each variable corresponds to a column, and each index in the value list corresponds to a row, or case.

The Dataset class inherits basic behavior from dict. Dictionary keys are enforced to be str objects and should correspond to the variable names. As for a dictionary, the Dataset's length (len(ds)) reflects the number of variables in the Dataset (i.e., the number of columns, not the number of rows).

Assigning data

The Dataset assumes certain properties of the items that are assigned; for example, they need to support numpy-style indexing. Items that are not eelbrain data containers are coerced in the following manner:

  • A 1-d numpy.ndarray is coerced to Var; a numpy.ndarray with more dimensions is assigned as is

  • Objects conforming to the Python collections.abc.Sequence abstract base class are coerced to Datalist

  • mne.Epochs are assigned as is

  • For advanced use, additional classes can be assigned as is by extending the Dataset._value_type_exceptions class attribute tuple

Accessing Data

Standard indexing with str is used to access the contained Var and Factor objects:

  • ds['var1'] –> var1.

  • ds['var1',] –> Dataset([var1]).

  • ds['var1', 'var2'] –> Dataset([var1, var2])

When indexing numerically, the first index defines cases (rows):

  • ds[1] –> row 1

  • ds[1:5] or ds[1,2,3,4] –> rows 1 through 4

  • ds[1, 5, 6, 9] or ds[[1, 5, 6, 9]] –> rows 1, 5, 6 and 9

The second index accesses columns, so case indexing can be combined with column indexing:

  • ds[:4, :2] –> first 4 rows of first 2 columns

Indexing a single case retrieves an individual case as a {name: value} dictionary:

  • ds[1] –> {'var': 1, 'factor': 'value', ...}

The itercases() method can be used to iterate over cases as dict.

Naming

While Var and Factor objects themselves need not be named, they must have names when added to a Dataset. This can be done either (a) by providing keys when initializing the Dataset:

>>> ds = Dataset((('v1', var1), ('v2', var2)))

or b) by adding the Var or Factor with a key:

>>> ds['v3'] = var3

If a Var/Factor that is added to a Dataset does not have a name, the new key is automatically assigned to the Var/Factor’s .name attribute.

Examples

Methods

add(item[, replace])

ds.add(item) -> ds[item.name] = item

add_empty_var(name[, dtype])

Create an empty variable in the dataset

aggregate([x, drop_empty, name, count, ...])

Return a Dataset with one case for each cell in x.

as_dataframe()

Convert to a pandas.DataFrame

as_key(name[, default])

Convert a string name to a legal dataset key

as_table([cases, fmt, sfmt, sort, header, ...])

Create an fmtxt.Table containing all Vars and Factors in the Dataset.

clear()

copy([name])

Create a shallow copy of the dataset

equalize_counts(x[, n])

Create a copy of the Dataset with equal counts in each cell of x

eval(expression)

Evaluate an expression involving items stored in the Dataset.

from_caselist(names, cases[, name, caption, ...])

Create a Dataset from a list of cases

from_r(name)

Create a Dataset from an R data frame through rpy2

fromkeys([value])

Create a new dictionary with keys from iterable and values set to value.

get(key[, default])

Return the value for key if key is in the dictionary, else default.

get_case(i)

The i'th case as a dictionary

get_subsets_by(x[, exclude, name])

Split the Dataset by the cells of x

head([n, title])

Table with the first n cases in the Dataset

index([name, start])

Add an index to the Dataset (i.e., range(n_cases))

items()

itercases([start, stop])

Iterate through cases (each case represented as a dict)

keys()

pop(k[,d])

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

rename(old, new)

Shortcut to rename a data-object in the Dataset.

repeat(repeats[, name])

Return a new Dataset with each row repeated repeats times.

save()

Shortcut to save the Dataset; displays a system file dialog

save_pickled([path])

Pickle the Dataset.

save_rtf([path, fmt])

Save the Dataset as an RTF table.

save_tex([path, fmt, header, midrule])

Save the Dataset as TeX table.

save_txt([path, fmt, delimiter, header, nan])

Save the Dataset as text file.

setdefault(key[, default])

Insert key with a value of default if key is not in the dictionary.

sort(order[, descending])

Sort the Dataset in place.

sort_index(order[, descending])

Create an index that could be used to sort the Dataset.

sorted(order[, descending])

Create a sorted copy of the Dataset.

sub([index, keys, name])

Access a subset of the data in the Dataset.

summary([width])

A summary of the Dataset's contents

tail([n, title])

Table with the last n cases in the Dataset

tile(repeats[, name])

Concatenate repeats copies of the dataset

to_r([name])

Place the Dataset into R as a data frame using rpy2

update(ds[, replace, info])

Update the Dataset with all variables in ds.

values()

zip(*variables)

Iterate through the values of multiple variables