.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/datasets/dataset-basics.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_datasets_dataset-basics.py:

.. _exa-dataset:

Dataset basics
==============

.. contents:: Contents
   :local:

Load and prepare an example dataset:

.. GENERATED FROM PYTHON SOURCE LINES 12-23

.. code-block:: Python

    # Author: Christian Brodbeck
    from eelbrain import *
    import numpy
    import pandas

    df = pandas.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/psych/Tal_Or.csv')
    data = Dataset.from_dataframe(df)
    data['cond'] = data['cond'].as_factor({0: 'low', 1: 'high'})
    data['gender'] = data['gender'].as_factor({1: 'male', 2: 'female'})

.. GENERATED FROM PYTHON SOURCE LINES 24-29

Inspecting datasets
-------------------
The whole dataset can be displayed like any variable in IPython (in a plain
text environment, use ``print(data)``). For larger datasets it can be more
convenient to print only the first few cases...

.. GENERATED FROM PYTHON SOURCE LINES 29-32

.. code-block:: Python

    data.head()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    #   rownames   cond   pmi   import_   reaction   gender   age
    0   1          high   7     6         5.25       male     51
    1   2          low    6     1         1.25       male     40
    2   3          high   5.5   6         5          male     26
    3   4          low    6.5   6         2.75       female   21
    4   5          low    6     5         2.5        male     27
    5   6          low    5.5   1         1.25       male     25
    6   7          low    3.5   1         1.5        female   23
    7   8          high   6     6         4.75       male     25
    8   9          low    4.5   6         4.25       male     22
    9   10         low    7     6         6.25       male     24


.. GENERATED FROM PYTHON SOURCE LINES 33-34

... or a summary of variables:

.. GENERATED FROM PYTHON SOURCE LINES 34-36

.. code-block:: Python

    data.summary()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Key        Type     Values
    rownames   Var      1 - 123
    cond       Factor   low:65, high:58
    pmi        Var      1 - 7
    import_    Var      1:11, 2:13, 3:16, 4:26, 5:24, 6:23, 7:10
    reaction   Var      1 - 7
    gender     Factor   male:43, female:80
    age        Var      18 - 61
    Dataset: 123 cases


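.. GENERATED FROM PYTHON SOURCE LINES 37-38

Individual rows and columns can be retrieved with standard indexing. A slice
returns a dataset with the selected cases, an integer index returns a single
case as a dictionary, and a string key returns the corresponding column:

.. GENERATED FROM PYTHON SOURCE LINES 38-41

.. code-block:: Python

    data[10:15]

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    #   rownames   cond   pmi   import_   reaction   gender   age
    0   11         high   1     3         1.25       female   22
    1   12         low    6     3         2.75       female   21
    2   13         high   5     4         3.75       female   23
    3   14         low    7     7         5          female   21
    4   15         high   7     1         4          female   22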


.. GENERATED FROM PYTHON SOURCE LINES 42-45

.. code-block:: Python

    data[2]

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'rownames': 3, 'cond': 'high', 'pmi': 5.5, 'import_': 6, 'reaction': 5.0, 'gender': 'male', 'age': 26.0}

.. GENERATED FROM PYTHON SOURCE LINES 46-49

.. code-block:: Python

    data['age']

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Var([51, 40, 26, 21, 27, 25, 23, 25, 22, 24, 22, 21, 23, 21, 22, 23, 23, 23, 22, 23, 22, 19.5, 61, 25, 23, 60, 22, 23, 22, 23, 25, 22, 23, 22, 25, 24, 24, 29, 24, 18, 23, 21, 24, 26, 24, 22, 21, 26, 24, 27, 26, 24, 24, 26, 24, 22, 23, 24, 24, 25, 23, 23, 23, 24, 18, 23, 25, 24, 23, 23, 24, 22, 24, 25, 22, 22, 23, 25, 23, 23, 24, 21, 23, 21, 23, 19, 25, 23, 22, 19, 23, 24, 32, 27, 25, 24, 23, 28, 24, 24, ... (N=123)], name='age')

.. GENERATED FROM PYTHON SOURCE LINES 50-56

Using datasets in functions
---------------------------
Datasets collect information describing the same cases (rows) on different
variables (columns). This can simplify calling functions that combine
information from multiple columns. Columns can be specified as strings, with
the dataset supplied through the ``data`` parameter:

.. GENERATED FROM PYTHON SOURCE LINES 56-59

.. code-block:: Python

    table.frequencies('cond', 'gender', data=data)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    #   gender   low   high
    0   male     19    24
    1   female   46    34
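
To make concrete what such a frequency table contains, the following
standard-library sketch cross-tabulates ``cond`` and ``gender`` for the ten
cases shown in the ``data.head()`` output. It only illustrates the idea and is
not how ``table.frequencies`` is implemented; the names ``cases`` and
``counts`` are made up for this example:

.. code-block:: Python

    from collections import Counter

    # (cond, gender) for the ten cases shown in the output of data.head()
    cases = [
        ('high', 'male'), ('low', 'male'), ('high', 'male'), ('low', 'female'),
        ('low', 'male'), ('low', 'male'), ('low', 'female'), ('high', 'male'),
        ('low', 'male'), ('low', 'male'),
    ]

    # Count how often each combination of the two factors occurs
    counts = Counter(cases)

    # Print one row per gender and one column per condition
    print(f"{'gender':<8}{'low':>5}{'high':>5}")
    for gender in ('male', 'female'):
        print(f"{gender:<8}{counts['low', gender]:>5}{counts['high', gender]:>5}")

Combinations that never occur, such as ``('high', 'female')`` in these ten
cases, are reported as 0 because ``Counter`` returns 0 for missing keys.
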


.. GENERATED FROM PYTHON SOURCE LINES 60-63

.. code-block:: Python

    p = plot.Scatter('pmi', 'age', 'gender', data=data, w=3, legend=(.65, .2), alpha=.4)

.. image-sg:: /auto_examples/datasets/images/sphx_glr_dataset-basics_001.png
   :alt: dataset basics
   :srcset: /auto_examples/datasets/images/sphx_glr_dataset-basics_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 64-66

These strings need not be column keys; they can be any Python code that can
be evaluated in the dataset. For example, because this expression can be
evaluated:

.. GENERATED FROM PYTHON SOURCE LINES 66-69

.. code-block:: Python

    data.eval('age < 40')  # equivalent to `data['age'] < 40`

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    array([False, False, True, True, True, True, True, True, True, True,
           True, True, True, True, True, True, True, True, True, True,
           True, True, False, True, True, False, True, True, True, True,
           True, True, True, True, True, True, True, True, True, True,
           True, True, True, True, True, True, True, True, True, True,
           True, True, True, True, True, True, True, True, True, True,
           True, True, True, True, True, True, True, True, True, True,
           True, True, True, True, True, True, True, True, True, True,
           True, True, True, True, True, True, True, True, True, True,
           True, True, True, True, True, True, True, True, True, True,
           True, True, True, True, True, True, True, True, True, True,
           True, True, True, True, True, True, True, True, True, True,
           True, True, True])

.. GENERATED FROM PYTHON SOURCE LINES 70-71

the same expression can be used directly for plotting, here to select a
subset of cases:

.. GENERATED FROM PYTHON SOURCE LINES 71-74

.. code-block:: Python

    p = plot.Scatter('pmi', 'age', 'gender', sub="age < 40", data=data, w=3, legend=(.65, .4), alpha=.4)

.. image-sg:: /auto_examples/datasets/images/sphx_glr_dataset-basics_002.png
   :alt: dataset basics
   :srcset: /auto_examples/datasets/images/sphx_glr_dataset-basics_002.png
   :class: sphx-glr-single-img

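
One way to picture how a string such as ``"age < 40"`` becomes a selection is
to evaluate it with the column names as the only available variables. The
sketch below does this in plain Python. The ``Column`` class and the
``evaluate`` helper are hypothetical stand-ins invented for this illustration
and are not Eelbrain's implementation:

.. code-block:: Python

    class Column(list):
        """Hypothetical stand-in for a numeric column (not an Eelbrain class)."""

        def __lt__(self, other):
            # Compare element-wise, the way array-like columns do
            return [value < other for value in self]


    def evaluate(expression, columns):
        """Evaluate ``expression`` with the column names as local variables."""
        # Builtins are disabled so that only column names can be resolved
        return eval(expression, {'__builtins__': {}}, dict(columns))


    # Ages of the first five cases in the example dataset
    columns = {'age': Column([51, 40, 26, 21, 27])}
    mask = evaluate('age < 40', columns)
    print(mask)  # [False, False, True, True, True]

    # The boolean result can then select the matching cases
    selected = [age for age, keep in zip(columns['age'], mask) if keep]
    print(selected)  # [26, 21, 27]

As with any use of ``eval``, this pattern is suitable for expressions written
by the analyst, not for untrusted input.
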

.. GENERATED FROM PYTHON SOURCE LINES 75-76

As in other contexts, ``%`` specifies the interaction between categorical
variables:

.. GENERATED FROM PYTHON SOURCE LINES 76-79

.. code-block:: Python

    p = plot.Barplot('age', 'cond % gender', data=data, w=3)

.. image-sg:: /auto_examples/datasets/images/sphx_glr_dataset-basics_003.png
   :alt: dataset basics
   :srcset: /auto_examples/datasets/images/sphx_glr_dataset-basics_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 80-81

And ``*`` expands to main effects plus interaction:

.. GENERATED FROM PYTHON SOURCE LINES 81-84

.. code-block:: Python

    test.ANOVA('age', 'cond * gender', data=data)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

                    SS        df    MS       F          p
    cond            0.97      1     0.97     0.03       .860
    gender          414.69    1     414.69   13.39***   < .001
    cond x gender   3.02      1     3.02     0.10       .755
    Residuals       3685.10   119   30.97
    Total           4105.42   122
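
The two operators can be illustrated without any statistics. In the sketch
below, ``interaction`` pairs up the levels of two categorical variables case
by case, which is what ``%`` expresses, and ``expand_star`` rewrites
``a * b`` as the two main effects plus their interaction. Both helper names
are hypothetical and written only for this example; they handle just the
two-variable case and do not reflect how Eelbrain parses model strings:

.. code-block:: Python

    def interaction(a, b):
        """Combine two categorical variables into one cell label per case."""
        return [f'{level_a} {level_b}' for level_a, level_b in zip(a, b)]


    def expand_star(model):
        """Expand ``'a * b'`` into main effects plus interaction (two terms only)."""
        a, b = (term.strip() for term in model.split('*'))
        return [a, b, f'{a} % {b}']


    # cond and gender for the first four cases of the example dataset
    cond = ['high', 'low', 'high', 'low']
    gender = ['male', 'male', 'male', 'female']

    cells = interaction(cond, gender)
    print(cells)  # ['high male', 'low male', 'high male', 'low female']
    print(sorted(set(cells)))  # ['high male', 'low female', 'low male']

    print(expand_star('cond * gender'))  # ['cond', 'gender', 'cond % gender']

The three terms returned by ``expand_star`` correspond to the three effect
rows (``cond``, ``gender`` and ``cond x gender``) in the ANOVA table.
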


.. GENERATED FROM PYTHON SOURCE LINES 85-92

Constructing datasets
---------------------
While datasets can be imported from external data sources, it is often also
convenient to store new data in a table on the fly. A dataset can be
constructed column by column, by adding one variable after another:

.. GENERATED FROM PYTHON SOURCE LINES 92-104

.. code-block:: Python

    # initialize an empty Dataset:
    ds = Dataset()
    # numeric values are added as Var objects:
    ds['y'] = Var(numpy.random.normal(0, 1, 6))
    # categorical data is represented in Factors:
    ds['a'] = Factor(['a', 'b', 'c'], repeat=2)
    # a variable that's equal in all cases can be assigned quickly:
    ds[:, 'z'] = 0.
    # check the result:
    ds

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    #   y          a   z
    0   1.278      a   0
    1   1.4323     a   0
    2   0.35583    b   0
    3   0.57962    b   0
    4   -1.5772    c   0
    5   -0.17162   c   0


.. GENERATED FROM PYTHON SOURCE LINES 105-107

An alternative way of constructing a dataset is case by case (i.e., row by
row):

.. GENERATED FROM PYTHON SOURCE LINES 107-116

.. code-block:: Python

    rows = []
    for i in range(6):
        subject = f'S{i}'
        y = numpy.random.normal(0, 1)
        a = 'abc'[i % 3]
        rows.append([subject, y, a])
    ds = Dataset.from_caselist(['subject', 'y', 'a'], rows, random='subject')
    ds

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    #   subject   y           a
    0   S0        -1.9398     a
    1   S1        -0.075872   b
    2   S2        0.84564     c
    3   S3        0.77702     a
    4   S4        0.46316     b
    5   S5        -0.18091    c
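
The two construction styles hold the same information in transposed layouts.
As a plain-Python sketch (the data values and the names ``names``, ``rows``
and ``columns`` are made up for illustration and involve no Eelbrain
objects), a list of cases can be turned into per-variable columns with
``zip``:

.. code-block:: Python

    names = ['subject', 'y', 'a']

    # Case by case: each entry holds the values of all variables for one case
    rows = [
        ['S0', 0.5, 'a'],
        ['S1', -1.2, 'b'],
        ['S2', 0.3, 'c'],
    ]

    # Column by column: zip(*rows) transposes the case list
    columns = {name: list(values) for name, values in zip(names, zip(*rows))}
    print(columns['subject'])  # ['S0', 'S1', 'S2']
    print(columns['y'])        # [0.5, -1.2, 0.3]

    # Transposing again recovers the original cases
    assert [list(case) for case in zip(*columns.values())] == rows
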

