eelbrain.Dataset

class eelbrain.Dataset(items=None, name=None, caption=None, info=None, n_cases=None)

Store multiple variables pertaining to a common set of measurement cases

Parameters:
items : dict | list

Items in the Dataset (either specified as {key: data_object} dictionary, or as [data_object] list in which data-object names will be used as keys). The Dataset stores the input items directly, without making a copy.

name : str

Name for the Dataset.

caption : str

Caption for the table.

info : dict

Info dictionary, can contain arbitrary entries and can be accessed as .info attribute after initialization. The Dataset makes a shallow copy.

n_cases : int

Specify the number of cases in the Dataset if no items are added upon initialization (by default the number is inferred when the first item is added).

Notes

A Dataset represents a data table as a {variable_name: value_list} dictionary. Each variable corresponds to a column, and each index in the value list corresponds to a row, or case.

The Dataset class inherits basic behavior from dict. Dictionary keys are enforced to be str objects and should correspond to the variable names. As with a dictionary, the Dataset’s length (len(ds)) reflects the number of variables in the Dataset (i.e., the number of columns, not the number of rows).

Assigning data

The Dataset assumes certain properties of the items that are assigned; for example, they need to support numpy indexing. When assigning items that are not eelbrain data containers, they are coerced in the following manner (see the sketch after this list):

  • 1-d numpy.ndarray objects are coerced to Var; other numpy.ndarray objects are assigned as is
  • Objects conforming to the collections.abc.Sequence abstract base class are coerced to Datalist
  • mne.Epochs are assigned as is
  • For advanced use, additional classes can be assigned as is by extending the Dataset._value_type_exceptions class attribute tuple
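
A minimal sketch of these coercion rules (assuming numpy is available; the keys 'x', 'y', and 'labels' are arbitrary, and exact behavior may depend on the installed version):

>>> import numpy as np
>>> from eelbrain import Dataset
>>> ds = Dataset(n_cases=5)
>>> ds['x'] = np.arange(5)          # 1-d array: coerced to Var
>>> ds['y'] = np.zeros((5, 3))      # 2-d array: assigned as is
>>> ds['labels'] = list('abcde')    # sequence: coerced to Datalist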

Accessing Data

Standard indexing with str is used to access the contained Var and Factor objects:

  • ds['var1'] –> var1
  • ds['var1',] –> Dataset([var1])
  • ds['var1', 'var2'] –> Dataset([var1, var2])

When indexing numerically, the first index defines cases (rows):

  • ds[1] –> row 1
  • ds[1:5] or ds[1,2,3,4] –> rows 1 through 4
  • ds[1, 5, 6, 9] or ds[[1, 5, 6, 9]] –> rows 1, 5, 6 and 9

The second index accesses columns, so case indexing can be combined with column indexing:

  • ds[:4, :2] –> first 4 rows of first 2 columns

Indexing a single case retrieves it as a {name: value} dictionary:

  • ds[1] –> {'var': 1, 'factor': 'value', ...}

The itercases() method can be used to iterate over cases as dict.
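
For concreteness, the patterns above could be exercised as follows (a sketch assuming ds already contains items named 'var1' and 'var2'):

>>> var1 = ds['var1']              # a single item
>>> pair = ds['var1', 'var2']      # a Dataset containing both items
>>> case = ds[1]                   # one case as a {name: value} dict
>>> block = ds[:4, :2]             # first 4 rows of the first 2 columns
>>> cases = list(ds.itercases())   # each case as a {name: value} dict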

Naming

While Var and Factor objects themselves need not be named, they need to be named when added to a Dataset. This can be done a) by adding a name when initializing the Dataset:

>>> ds = Dataset((('v1', var1), ('v2', var2)))

or b) by adding the Var or Factor with a key:

>>> ds['v3'] = var3

If a Var/Factor that is added to a Dataset does not have a name, the new key is automatically assigned to the Var/Factor’s .name attribute.

Examples
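
A hedged construction example (the item names 'A' and 'Y' are arbitrary; this is a sketch, not part of the official documentation):

>>> from eelbrain import Dataset, Factor, Var
>>> ds = Dataset({'A': Factor(['a1', 'a1', 'a2', 'a2']),
...               'Y': Var([1.0, 2.0, 3.0, 4.0])})
>>> ds.n_cases
4
>>> ds.n_items
2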

Attributes:
n_cases : None | int

The number of cases in the Dataset (corresponding to the number of rows in the table representation). None if no variables have been added.

n_items : int

The number of items (variables) in the Dataset (corresponding to the number of columns in the table representation).

Methods

add(self, item[, replace]) ds.add(item) -> ds[item.name] = item
add_empty_var(self, name[, dtype]) Create an empty variable in the dataset
aggregate(self[, x, drop_empty, name, …]) Return a Dataset with one case for each cell in x.
as_key(name) Convert a string name to a legal dataset key
as_table(self[, cases, fmt, sfmt, sort, …]) Create an fmtxt.Table containing all Vars and Factors in the Dataset.
clear()
copy(self[, name]) Create a shallow copy of the dataset
equalize_counts(self, x[, n]) Create a copy of the Dataset with equal counts in each cell of x
eval(self, expression) Evaluate an expression involving items stored in the Dataset.
from_caselist(names, cases[, name, caption, …]) Create a Dataset from a list of cases
from_r(name) Create a Dataset from an R data frame through rpy2
fromkeys(iterable[, value]) Returns a new dict with keys from iterable and values equal to value.
get()
get_case(self, i) The i’th case as a dictionary
get_subsets_by(self, x[, exclude, name]) Split the Dataset by the cells of x
head(self[, n]) Table with the first n cases in the Dataset
index(self[, name, start]) Add an index to the Dataset (i.e., range(n_cases))
items()
itercases(self[, start, stop]) Iterate through cases (each case represented as a dict)
keys()
pop(k[, d]) Remove specified key and return the corresponding value; if key is not found, d is returned if given, otherwise KeyError is raised
popitem() Remove and return a (key, value) pair as a 2-tuple; raise KeyError if the Dataset is empty
rename(self, old, new) Shortcut to rename a data-object in the Dataset.
repeat(self, repeats[, name]) Return a new Dataset with each row repeated n times.
save(self) Shortcut to save the Dataset, will display a system file dialog
save_pickled(self[, path]) Pickle the Dataset.
save_rtf(self[, path, fmt]) Save the Dataset as RTF table.
save_tex(self[, path, fmt, header, midrule]) Save the Dataset as TeX table.
save_txt(self[, path, fmt, delimiter, …]) Save the Dataset as text file.
setdefault()
sort(self, order[, descending]) Sort the Dataset in place.
sort_index(self, order[, descending]) Create an index that could be used to sort the Dataset.
sorted(self, order[, descending]) Create a sorted copy of the Dataset.
sub(self[, index, keys, name]) Access a subset of the data in the Dataset.
summary(self[, width]) A summary of the Dataset’s contents
tail(self[, n]) Table with the last n cases in the Dataset
tile(self, repeats[, name]) Concatenate repeats copies of the dataset
to_r(self[, name]) Place the Dataset into R as dataframe using rpy2
update(self, ds[, replace, info]) Update the Dataset with all variables in ds.
values()
zip(self, *variables) Iterate through the values of multiple variables
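
As a non-authoritative illustration, a few of these methods might be used like this (the item names 'A' and 'Y' are hypothetical and follow the construction example above):

>>> means = ds.aggregate('A')        # one case per cell of factor A
>>> subset = ds.sub("A == 'a1'")     # cases for which A equals 'a1'
>>> ds['Y2'] = ds.eval("Y * 2")      # evaluate an expression on stored items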