eelbrain.boosting
- eelbrain.boosting(y, x, tstart, tstop, scale_data=True, delta=0.005, mindelta=None, error='l2', basis=0, basis_window='hamming', partitions=None, model=None, validate=1, test=0, data=None, selective_stopping=0, partition_results=False, debug=False)
Estimate a linear filter with coordinate descent
- Parameters:
y (NDVar) – Signal to predict. When y contains more than one signal (e.g., multiple EEG channels), results for each signal will be computed independently. Multiple cases along a Case dimension are treated as different trials which share a filter. For correlation fit metrics, a Space dimension is interpreted as defining a vector measure.
x (NDVar | sequence of NDVar) – Signal to use to predict y. Can be a sequence of NDVars to include multiple predictors. The time dimension must correspond to y.
tstart (scalar | sequence of scalar) – Start of the TRF in seconds. A list can be used to specify different values for each item in x.
tstop (scalar | sequence of scalar) – Stop of the TRF in seconds. Format must match tstart.
scale_data (bool | 'inplace') – Scale y and x before boosting: subtract the mean and divide by the standard deviation (when error='l2') or the mean absolute value (when error='l1'). Use 'inplace' to save memory by scaling the original objects specified as y and x instead of making a copy. The data scale is stored in the BoostingResult attributes y_mean, y_scale, x_mean, and x_scale.
delta (float) – Step for changes in the kernel.
mindelta (float) – If the error for the training data can’t be reduced, divide delta in half until delta < mindelta. The default is mindelta = delta, i.e. delta is constant.
error (Literal['l1', 'l2']) – Error function to use (default is l2).
    - error='l1': the sum of the absolute differences between y and h * x.
    - error='l2': the sum of the squared differences between y and h * x.
    For vector y, the error is defined based on the distance in space for each data point.
basis (float) – Use a basis of windows with this length for the kernel (by default, impulses are used).
basis_window (str | scalar | tuple) – Basis window (see scipy.signal.get_window() for options; default is 'hamming').
partitions (int) – Divide the data into this many partitions for cross-validation-based early stopping. In each partition, n - 1 segments are used for training, and the remaining segment is used for validation. If data is continuous, data are divided into contiguous segments of equal length (default 10). If data has cases, cases are divided with [::partitions] slices (default min(n_cases, 10); if model is specified, n_cases is the lowest number of cases in any cell of the model). See the Data partitions for boosting example.
model (Factor | Interaction | NestedEffect | str) – If data has cases, divide cases into different categories (division for cross-validation is done separately for each cell).
data (Dataset) – If provided, other parameters can be specified as strings for items in data.
validate (int) – Number of segments in validation dataset (currently has to be 1).
test (int) – By default (test=0), the boosting algorithm uses all available data to estimate the kernel. Set test=1 to perform k-fold cross-validation instead (with k = partitions): each partition is used as test dataset in turn, while the remaining k-1 partitions are used to estimate the kernel. The resulting model fit metrics reflect the re-combination of all partitions, each one predicted from the corresponding, independent training set.
selective_stopping (int) – By default, the boosting algorithm stops when the testing error stops decreasing. With a nonzero selective_stopping (e.g., selective_stopping=1), boosting continues but excludes the predictor (one time-series in x) that caused the increase in testing error, and continues until all predictors are stopped. The integer value of selective_stopping determines after how many steps with error increases each predictor is excluded.
partition_results (bool) – Keep results (TRFs and model evaluation) for each test-partition. This is disabled by default to reduce file size when saving results.
debug (bool) – Add additional attributes to the returned result.
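The scaling and error definitions given for scale_data and error can be illustrated with plain Python lists. This is a minimal sketch, not eelbrain's implementation: the function names are hypothetical, and two details are assumptions, namely that the standard deviation is the population standard deviation and that the mean absolute value is taken after subtracting the mean.

```python
from statistics import fmean, pstdev


def scale_signal(signal, error="l2"):
    """Center a signal and divide it by a scale that depends on the error norm.

    Returns (scaled_signal, mean, scale). Assumptions: population standard
    deviation for 'l2'; mean absolute value of the centered signal for 'l1'.
    """
    mean = fmean(signal)
    centered = [value - mean for value in signal]
    if error == "l2":
        scale = pstdev(signal)
    elif error == "l1":
        scale = fmean([abs(value) for value in centered])
    else:
        raise ValueError(f"error={error!r}; expected 'l1' or 'l2'")
    if scale == 0:
        raise ValueError("cannot scale a constant signal")
    return [value / scale for value in centered], mean, scale


def error_l1(y, y_pred):
    """Sum of the absolute differences between y and the prediction."""
    return sum(abs(a - b) for a, b in zip(y, y_pred))


def error_l2(y, y_pred):
    """Sum of the squared differences between y and the prediction."""
    return sum((a - b) ** 2 for a, b in zip(y, y_pred))
```

Pairing each error norm with a matching scale keeps the step size delta comparable across signals with different units.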
- Return type: BoostingResult
See also
plot.preview_partitions – preview data partitions for cross-validation
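The two partitioning schemes described for the partitions parameter (contiguous segments for continuous data, [::partitions] slices for cases) can be sketched on plain indices. The function names below are hypothetical, and the handling of lengths that do not divide evenly is an assumption; this is an illustration, not eelbrain's code.

```python
def contiguous_segments(n_samples, partitions=10):
    """Split sample indices into contiguous (start, stop) segments.

    Segments have equal length when n_samples is divisible by partitions;
    otherwise lengths differ by at most one sample (an assumption).
    """
    bounds = [i * n_samples // partitions for i in range(partitions + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(partitions)]


def case_segments(n_cases, partitions):
    """Assign cases to segments with [i::partitions] slices."""
    cases = list(range(n_cases))
    return [cases[i::partitions] for i in range(partitions)]


def train_validate_splits(segments):
    """Yield (training_segments, validation_segment) pairs.

    Each segment serves as the validation set once, while the remaining
    n - 1 segments are used for training.
    """
    for i, validation in enumerate(segments):
        training = [seg for j, seg in enumerate(segments) if j != i]
        yield training, validation
```

For example, case_segments(7, 3) interleaves the cases, so each segment samples trials from across the whole recording rather than from one contiguous block.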
Notes
The boosting algorithm is described in [1].
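As a rough illustration of the procedure, the following is a minimal single-predictor sketch of boosting as coordinate descent with validation-based early stopping. It is not eelbrain's implementation: the function names (boost, predict, l2_error), the max_iter guard, and the rule of stopping at the first step that fails to lower the validation error are assumptions made for illustration. The roles of delta and mindelta follow the parameter descriptions above.

```python
def predict(h, x):
    """Causal convolution: y_pred[t] = sum over k of h[k] * x[t - k]."""
    return [
        sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
        for t in range(len(x))
    ]


def l2_error(y, y_pred):
    """Sum of squared differences between signal and prediction."""
    return sum((a - b) ** 2 for a, b in zip(y, y_pred))


def boost(y_train, x_train, y_val, x_val, n_lags,
          delta=0.005, mindelta=None, max_iter=10000):
    """Estimate a kernel by changing one element by +/- delta per step."""
    if mindelta is None:
        mindelta = delta  # default: delta stays constant
    h = [0.0] * n_lags
    train_err = l2_error(y_train, predict(h, x_train))
    best_h, best_val = list(h), l2_error(y_val, predict(h, x_val))
    for _ in range(max_iter):
        # Find the single-element change that lowers the training error most.
        best_step = None
        for k in range(n_lags):
            for step in (delta, -delta):
                candidate = list(h)
                candidate[k] += step
                err = l2_error(y_train, predict(candidate, x_train))
                if err < train_err and (best_step is None or err < best_step[0]):
                    best_step = (err, k, step)
        if best_step is None:
            # Training error cannot be reduced: halve delta, or stop.
            delta /= 2
            if delta < mindelta:
                break
            continue
        train_err, k, step = best_step
        h[k] += step
        # Early stopping: keep the kernel with the lowest validation error.
        val_err = l2_error(y_val, predict(h, x_val))
        if val_err >= best_val:
            break
        best_h, best_val = list(h), val_err
    return best_h
```

Because every step changes only one kernel element by a fixed amount, kernels stopped early tend to be sparse: elements that never reduce the error stay at zero.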
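In order to predict data, use the convolve() function:

>>> from eelbrain import boosting, convolve, datasets, epoch_impulse_predictor, plot
>>> data = datasets.get_uts()
>>> # One impulse predictor per condition of A
>>> data['a1'] = epoch_impulse_predictor('uts', 'A=="a1"', ds=data)
>>> data['a0'] = epoch_impulse_predictor('uts', 'A=="a0"', ds=data)
>>> res = boosting('uts', ['a0', 'a1'], 0, 0.5, partitions=10, model='A', data=data)
>>> # Predict the response from the estimated kernels
>>> y_pred = convolve(res.h_scaled, ['a0', 'a1'], ds=data)
>>> y = data['uts']
>>> # Plot the mean-centered signal next to the prediction
>>> plot.UTS([y-y.mean('time'), y_pred], '.case')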
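To make the prediction step concrete, here is a minimal sketch of convolving a kernel with a predictor when the kernel starts at tstart rather than at lag 0. It works on plain lists and is not eelbrain's convolve(), which operates on NDVars. The function name convolve_kernel, the tstep argument (sample period in seconds), and the zero-padding at the edges are assumptions for illustration.

```python
def convolve_kernel(h, x, tstart=0.0, tstep=0.01):
    """Predict a signal from predictor x and kernel h.

    h[k] is the kernel value at lag tstart + k * tstep, so that
    y_pred[t] = sum over k of h[k] * x[t - (k + k0)], with k0 = tstart / tstep.
    Samples outside x are treated as zero (an assumption).
    """
    k0 = round(tstart / tstep)  # first lag, in samples
    n = len(x)
    y_pred = [0.0] * n
    for t in range(n):
        for k, weight in enumerate(h):
            source = t - (k + k0)
            if 0 <= source < n:
                y_pred[t] += weight * x[source]
    return y_pred
```

With a negative tstart, the kernel contains acausal lags, so the predicted response can begin before the corresponding sample in x.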
References