Interaction Detector
InteractionDetector.Rd
Detects important pairwise interactions using a simple interaction detector inspired by the more sophisticated methodology proposed in Lou et al. (2013).
In contrast to the original FAST algorithm, this detector is a crude implementation. It discretizes every feature on a relatively small grid and reuses the unique grid values of a feature as its candidate cut points. For each feature pair, it then searches for the pair of cut points (one per feature) that yields the largest decrease in the residual sum of squares.
Additionally, please note that targets are simply converted to numeric values in the case of a mlr3::TaskClassif, instead of working on logits or model outputs of a previously computed proxy model.
Moreover, remember that this detector is used solely to initialize the group structures of the initial population within TunerEAGGA. Therefore, some imprecision is acceptable.
This interaction detector only works with integer or numeric features. Logical features must be converted to integers.
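A minimal base R sketch of the conversion mentioned above, applied to the data before a task is created. The data frame `df` and its columns are hypothetical and serve only as an illustration.

```r
# Hypothetical example data: `flag` is logical and must be converted first.
df <- data.frame(
  x1 = c(0.5, 1.2, 3.4, 2.2),
  flag = c(TRUE, FALSE, TRUE, TRUE),
  y = c(1.0, 0.3, 2.1, 1.7)
)

# Convert every logical column to integer (TRUE -> 1L, FALSE -> 0L).
is_lgl <- vapply(df, is.logical, logical(1))
df[is_lgl] <- lapply(df[is_lgl], as.integer)

str(df)
```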
References
Lou, Yin, Caruana, Rich, Gehrke, Johannes, Hooker, Giles (2013). “Accurate Intelligible Models with Pairwise Interactions.” In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 623–631.
Public fields
xs
(data.table::data.table)
Data.table of features from the mlr3::Task. Note that feature values of each feature are discretized based on the corresponding feature grid in $grids.
n_features
(integer(1))
Number of features in the mlr3::Task.
feature_names
(character())
Names of the features in the mlr3::Task.
feature_types
(character())
Original types of the features in the mlr3::Task prior to discretization.
y
(numeric())
Numeric target vector of the mlr3::Task. In the case of a mlr3::TaskClassif, the label is simply converted to numeric.
grids
(list())
List of grids containing a grid for each feature. Grids are constructed by computing quantiles on the feature with probabilities ranging from seq(from = 0, to = 1, length.out = grid_size). For integer features, the resulting quantiles are rounded to the nearest integer, and the unique values are taken, which may result in a grid of smaller size than grid_size.
cuts
(list())
List of cut points containing cut points for each feature. Cut points for features are based on their unique values after discretization using their corresponding grid.
rss
(matrix())
Symmetric numeric matrix of dimensions n_features by n_features containing the reduction in the residual sum of squares for each pair of features. NULL after construction. To compute this, use $compute_best_rss().
Methods
Method new()
Creates a new instance of this R6 class.
Usage
InteractionDetector$new(task, grid_size = 11L)
Arguments
task
(mlr3::Task)
The task.
grid_size
(integer(1))
The grid size used to construct a grid for each feature. The default value is 11. A grid for a feature is constructed by computing quantiles on the feature with probabilities ranging from seq(from = 0, to = 1, length.out = grid_size). If there are not at least three unique grid values that can be found based on quantiles (resulting in at least two bins for the discretized feature), the range of the feature is split into two intervals of equal width, and the corresponding interval points are used as grid values as a fallback. For integer features, the resulting quantiles are rounded to the nearest integer. Grid values are always assumed to be unique. Therefore, the actual number of unique grid values may be smaller than grid_size.
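The grid construction described for grid_size can be sketched in base R as follows. This is an illustration, not the package's code: the helper name `make_grid` is hypothetical, and leaving the fallback grid unrounded for integer features is an assumption.

```r
# Hypothetical helper illustrating the described grid construction.
make_grid <- function(x, grid_size = 11L) {
  probs <- seq(from = 0, to = 1, length.out = grid_size)
  grid <- quantile(x, probs = probs, names = FALSE)
  if (is.integer(x)) {
    # Integer features: round the quantiles to the nearest integer.
    grid <- as.integer(round(grid))
  }
  grid <- unique(grid)
  if (length(grid) < 3L) {
    # Fallback: split the range into two intervals of equal width and use
    # the three interval points (assumption: kept unrounded here).
    grid <- seq(from = min(x), to = max(x), length.out = 3L)
  }
  grid
}

make_grid(c(1.5, 2.0, 2.5, 3.0, 10.0), grid_size = 5L)  # quantile-based grid
make_grid(c(0L, 0L, 0L, 1L, 1L))                        # triggers the fallback
```

Because duplicates are dropped, the returned grid can be shorter than grid_size, as the argument description notes.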
Method compute_best_rss()
Computes the reduction in the residual sum of squares for each pair of features, overwriting $rss.
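As a rough base R sketch of what such a pairwise search could look like: for each pair of cut points, the two features split the data into four quadrants, a constant is fitted per quadrant, and the best reduction relative to a baseline is kept. The function name `best_rss_reduction`, the use of the global-mean model as baseline, and the zero diagonal are assumptions made for illustration; the package's internals may differ.

```r
# Hypothetical helper: best RSS reduction over all pairs of cut points.
best_rss_reduction <- function(x1, x2, y, cuts1, cuts2) {
  # Assumed baseline: RSS of the constant model that predicts mean(y).
  rss_total <- sum((y - mean(y))^2)
  best <- 0
  for (c1 in cuts1) {
    for (c2 in cuts2) {
      # Quadrant index (1..4) induced by the two cut points.
      quadrant <- 1L + (x1 > c1) + 2L * (x2 > c2)
      # Piecewise-constant fit: mean of y within each quadrant.
      fitted <- ave(y, quadrant)
      best <- max(best, rss_total - sum((y - fitted)^2))
    }
  }
  best
}

# Toy data: y is an XOR of x1 and x2, while x3 is unrelated to y.
X <- data.frame(
  x1 = c(0, 0, 1, 1, 0, 0, 1, 1),
  x2 = c(0, 1, 0, 1, 0, 1, 0, 1),
  x3 = c(0, 0, 0, 0, 1, 1, 1, 1)
)
y <- c(0, 1, 1, 0, 0, 1, 1, 0)
cuts <- lapply(X, unique)

# Fill a symmetric matrix with one entry per feature pair.
p <- ncol(X)
rss <- matrix(0, p, p, dimnames = list(names(X), names(X)))
for (pair in combn(p, 2, simplify = FALSE)) {
  j <- pair[1]
  k <- pair[2]
  rss[j, k] <- rss[k, j] <-
    best_rss_reduction(X[[j]], X[[k]], y, cuts[[j]], cuts[[k]])
}
rss
```

In this toy example only the pair (x1, x2) achieves a positive reduction, since the four quadrants separate the XOR pattern exactly.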
Method get_eqcs_from_top_k()
Retrieves equivalence classes (groups of features) based on the top k most important pairwise interactions. The interactions are determined by the reduction in the residual sum of squares for each pair of features.
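One plausible way to derive such groups is sketched below in base R: select the k feature pairs with the largest reduction and merge features that are connected through the selected pairs. The function name `eqcs_from_top_k`, the transitive merging, and the example matrix are assumptions for illustration; the source does not spell out how the method forms its groups.

```r
# Hypothetical helper: group features linked by the top-k interactions.
eqcs_from_top_k <- function(rss, k) {
  features <- colnames(rss)
  # Each unordered feature pair appears once in the upper triangle.
  pairs <- which(upper.tri(rss), arr.ind = TRUE)
  ord <- order(rss[upper.tri(rss)], decreasing = TRUE)
  top <- pairs[head(ord, k), , drop = FALSE]
  # Start with one group per feature, then merge along each selected pair.
  group <- seq_along(features)
  for (i in seq_len(nrow(top))) {
    a <- group[top[i, 1]]
    b <- group[top[i, 2]]
    group[group == b] <- a
  }
  unname(split(features, group))
}

# Made-up symmetric matrix of RSS reductions for four features.
rss <- matrix(0, 4, 4, dimnames = list(letters[1:4], letters[1:4]))
rss["a", "b"] <- rss["b", "a"] <- 5
rss["c", "d"] <- rss["d", "c"] <- 3
rss["b", "c"] <- rss["c", "b"] <- 1

eqcs_from_top_k(rss, k = 2)  # groups {a, b} and {c, d}
eqcs_from_top_k(rss, k = 3)  # the pair (b, c) merges everything
```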