eelbrain.Dataset
class eelbrain.Dataset(items=None, name=None, caption=None, info=None, n_cases=None)
Store multiple variables pertaining to a common set of measurement cases.
Parameters:
- items : dict | list
Items in the Dataset (either specified as a {key: data_object} dictionary, or as a [data_object] list in which data-object names will be used as keys). The Dataset stores the input items directly, without making a copy.
- name : str
Name for the Dataset.
- caption : str
Caption for the table.
- info : dict
Info dictionary; it can contain arbitrary entries and can be accessed as the .info attribute after initialization. The Dataset makes a shallow copy.
- n_cases : int
Specify the number of cases in the Dataset if no items are added upon initialization (by default the number is inferred when the first item is added).
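The difference between storing items directly and making a shallow copy of the info dictionary can be illustrated with plain Python objects. This is a standard-library sketch only: plain dicts and lists stand in for eelbrain objects, and using dict(info) to produce the shallow copy is an assumption made for illustration, not a statement about how Dataset does it internally.

```python
# Stdlib-only illustration; plain dicts and lists stand in for eelbrain objects.
column = [1, 2, 3]
items = {"x": column}

table = {}
table.update(items)            # items are stored directly: no copy is made
column.append(4)               # a later change to the input object ...
shared = table["x"] is column  # ... is visible through the table

info = {"experiment": "demo", "tags": ["pilot"]}
stored_info = dict(info)       # shallow copy: new outer dict, shared inner objects
stored_info["experiment"] = "changed"  # rebinding a key leaves the original alone
stored_info["tags"].append("checked")  # mutating a nested object affects both
```

In practice this means that data objects passed in as items remain shared with the caller, while top-level entries of info can be changed on either side without affecting the other.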
Notes
A Dataset represents a data table as a {variable_name: value_list} dictionary. Each variable corresponds to a column, and each index in the value list corresponds to a row, or case.
The Dataset class inherits basic behavior from dict. Dictionary keys are enforced to be str objects and should correspond to the variable names. As for a dictionary, the Dataset’s length (len(ds)) reflects the number of variables in the Dataset (i.e., the number of columns).
Assigning data
The Dataset assumes certain properties of the items that are assigned; for example, they need to support numpy indexing. When assigning items that are not eelbrain data containers, they are coerced in the following manner:
- 1-d numpy.ndarray are coerced to Var; other numpy.ndarray are assigned as is
- Objects conforming to the Python collections.Sequence abstract base class are coerced to Datalist
- mne.Epochs are assigned as is
- For advanced use, additional classes can be assigned as is by extending the Dataset._value_type_exceptions class attribute tuple
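The second rule depends on what counts as a Sequence. The following standard-library check shows which common objects conform to that abstract base class; it does not call eelbrain, and the helper name is_sequence is made up for illustration. Note that in current Python the class lives in collections.abc (the collections.Sequence alias was removed in Python 3.10).

```python
from collections.abc import Sequence  # formerly also reachable as collections.Sequence


def is_sequence(obj):
    """Return True if obj conforms to the Sequence abstract base class."""
    return isinstance(obj, Sequence)


candidates = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "range": range(3),
    "set": {1, 2, 3},
    "generator": (i for i in range(3)),
}
# Map each label to whether the object passes the Sequence test
results = {label: is_sequence(obj) for label, obj in candidates.items()}
```

Lists, tuples and ranges pass the test, so they would fall under the Datalist rule as stated; sets and generators do not, and would need to be converted (for example with list(...)) first.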
Accessing Data
Standard indexing with str is used to access the contained Var and Factor objects:
- ds['var1'] -> var1
- ds['var1',] -> Dataset([var1])
- ds['var1', 'var2'] -> Dataset([var1, var2])
When indexing numerically, the first index defines cases (rows):
- ds[1] -> row 1
- ds[1:5] or ds[1,2,3,4] -> rows 1 through 4
- ds[1, 5, 6, 9] or ds[[1, 5, 6, 9]] -> rows 1, 5, 6 and 9
The second index accesses columns, so case indexing can be combined with column indexing:
- ds[:4, :2] -> first 4 rows of first 2 columns
Indexing a single case retrieves that case as a {name: value} dictionary:
- ds[1] -> {'var': 1, 'factor': 'value', ...}
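The row, column and single-case indexing rules above can be mimicked on a plain dictionary of lists. This is a toy model only, not eelbrain code: lists stand in for Var and Factor objects, and the helpers select and get_case are hypothetical names introduced for this sketch.

```python
# Toy column table, NOT eelbrain: plain lists stand in for Var/Factor objects.
table = {
    "subject": ["s1", "s2", "s3", "s4", "s5"],
    "rt": [310, 295, 402, 350, 288],
    "condition": ["a", "b", "a", "b", "a"],
}


def select(table, rows=slice(None), cols=slice(None)):
    """Return a new column dict limited to rows (slice or list of ints)
    and cols (a slice over the column order)."""
    names = list(table)[cols]
    if isinstance(rows, slice):
        return {name: table[name][rows] for name in names}
    return {name: [table[name][i] for i in rows] for name in names}


def get_case(table, i):
    """Return row i as a {name: value} dictionary."""
    return {name: values[i] for name, values in table.items()}


row_range = select(table, slice(1, 4))                  # analogous to ds[1:4]
picked = select(table, [0, 2, 4])                       # analogous to ds[[0, 2, 4]]
corner = select(table, slice(None, 4), slice(None, 2))  # analogous to ds[:4, :2]
case = get_case(table, 1)                               # analogous to ds[1]
```

The first index always selects rows and the second selects columns, so the result of a row selection keeps every column unless a column index is also given.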
The itercases() method can be used to iterate over cases as dict.
Naming
While Var and Factor objects themselves need not be named, they need to be named when added to a Dataset. This can be done by a) adding a name when initializing the Dataset:
>>> ds = Dataset((('v1', var1), ('v2', var2)))
or b) by adding the Var or Factor with a key:
>>> ds['v3'] = var3
If a Var/Factor that is added to a Dataset does not have a name, the new key is automatically assigned to the Var/Factor’s .name attribute.
Examples
- Introduction: basic functionality
- Dataset basics: how to construct datasets
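As a rough illustration of the naming rule described under Naming, the sketch below uses a hypothetical ToyVar class and assign helper (neither is part of eelbrain). The text only specifies what happens to unnamed objects; leaving an existing name untouched is an assumption of this sketch.

```python
class ToyVar:
    """Hypothetical stand-in for a Var/Factor: values plus an optional name."""

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name


def assign(table, key, item):
    """Store item under key; an unnamed item receives the key as its name."""
    if item.name is None:
        item.name = key
    # Assumption: an item that already has a name keeps it.
    table[key] = item


table = {}
unnamed = ToyVar([1, 2, 3])
named = ToyVar([4, 5, 6], name="original")
assign(table, "v3", unnamed)
assign(table, "v4", named)
```

After these assignments the unnamed object carries the name "v3", while the key in the table is the authoritative way to look up either object.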
Attributes:
- n_cases : None | int
The number of cases in the Dataset (corresponding to the number of rows in the table representation). None if no variables have been added.
- n_items : int
The number of items (variables) in the Dataset (corresponding to the number of columns in the table representation).
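The relationship between n_cases, n_items and len() can be sketched with a small dict subclass. ToyTable is a hypothetical stand-in written for this illustration, not eelbrain's implementation; the length check and the error types it raises are assumptions.

```python
class ToyTable(dict):
    """Hypothetical stand-in for the table layout: a dict of equal-length columns."""

    def __setitem__(self, key, values):
        if not isinstance(key, str):
            raise TypeError("keys must be str")
        if self.n_cases is not None and len(values) != self.n_cases:
            raise ValueError("column length does not match n_cases")
        super().__setitem__(key, values)

    @property
    def n_cases(self):
        """Number of rows, or None while the table has no columns."""
        if not self:
            return None
        return len(next(iter(self.values())))

    @property
    def n_items(self):
        """Number of columns; identical to len(self)."""
        return len(self)


table = ToyTable()
empty_n_cases = table.n_cases          # None: nothing has been added yet
table["subject"] = ["s1", "s2", "s3"]  # the first column fixes the number of rows
table["rt"] = [310, 295, 402]
```

With two columns of three values each, n_cases is 3 while n_items and len(table) are both 2, which is the row/column distinction the two attributes describe.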
Methods
add(self, item[, replace]) | ds.add(item) -> ds[item.name] = item
add_empty_var(self, name[, dtype]) | Create an empty variable in the dataset
aggregate(self[, x, drop_empty, name, …]) | Return a Dataset with one case for each cell in x.
as_key(name) | Convert a string name to a legal dataset key
as_table(self[, cases, fmt, sfmt, sort, …]) | Create an fmtxt.Table containing all Vars and Factors in the Dataset.
clear() |
copy(self[, name]) | Create a shallow copy of the dataset
equalize_counts(self, x[, n]) | Create a copy of the Dataset with equal counts in each cell of x
eval(self, expression) | Evaluate an expression involving items stored in the Dataset.
from_caselist(names, cases[, name, caption, …]) | Create a Dataset from a list of cases
from_r(name) | Create a Dataset from an R data frame through rpy2
fromkeys(iterable[, value]) | Returns a new dict with keys from iterable and values equal to value.
get() |
get_case(self, i) | The i’th case as a dictionary
get_subsets_by(self, x[, exclude, name]) | Split the Dataset by the cells of x
head(self[, n]) | Table with the first n cases in the Dataset
index(self[, name, start]) | Add an index to the Dataset (i.e., range(n_cases))
items() |
itercases(self[, start, stop]) | Iterate through cases (each case represented as a dict)
keys() |
pop() | Remove the specified key and return its value; if key is not found, d is returned if given, otherwise KeyError is raised
popitem() | Remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
rename(self, old, new) | Shortcut to rename a data-object in the Dataset.
repeat(self, repeats[, name]) | Return a new Dataset with each row repeated n times.
save(self) | Shortcut to save the Dataset, will display a system file dialog
save_pickled(self[, path]) | Pickle the Dataset.
save_rtf(self[, path, fmt]) | Save the Dataset as RTF table.
save_tex(self[, path, fmt, header, midrule]) | Save the Dataset as TeX table.
save_txt(self[, path, fmt, delimiter, …]) | Save the Dataset as text file.
setdefault() |
sort(self, order[, descending]) | Sort the Dataset in place.
sort_index(self, order[, descending]) | Create an index that could be used to sort the Dataset.
sorted(self, order[, descending]) | Create a sorted copy of the Dataset.
sub(self[, index, keys, name]) | Access a subset of the data in the Dataset.
summary(self[, width]) | A summary of the Dataset’s contents
tail(self[, n]) | Table with the last n cases in the Dataset
tile(self, repeats[, name]) | Concatenate repeats copies of the dataset
to_r(self[, name]) | Place the Dataset into R as dataframe using rpy2
update(self, ds[, replace, info]) | Update the Dataset with all variables in ds.
values() |
zip(self, *variables) | Iterate through the values of multiple variables
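The descriptions of repeat and tile differ only in the order of the resulting rows. The sketch below shows that difference on a plain dictionary of lists; repeat_rows and tile_rows are hypothetical helpers, and the assumption that the two methods follow the same ordering as numpy.repeat and numpy.tile is inferred from their one-line descriptions rather than confirmed here.

```python
def repeat_rows(table, n):
    """Repeat each row n times in place: rows a, b become a, a, b, b."""
    return {name: [v for v in values for _ in range(n)]
            for name, values in table.items()}


def tile_rows(table, n):
    """Concatenate n copies of the whole table: rows a, b become a, b, a, b."""
    return {name: list(values) * n for name, values in table.items()}


table = {"subject": ["s1", "s2"], "rt": [310, 295]}
repeated = repeat_rows(table, 2)
tiled = tile_rows(table, 2)
```

Both results have the same number of rows; only the row order differs, which matters when the result is combined with other variables case by case.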