Detects important pairwise interactions using a simple interaction detector inspired by the more sophisticated methodology proposed in Lou et al. (2013).

In contrast to the original FAST algorithm, this detector is a crude implementation. It discretizes every feature on a relatively small grid and reuses the unique grid values as candidate cut points. For each feature pair, it then searches for the pair of cut points (one per feature) that yields the largest decrease in the residual sum of squares.
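The cut point search can be illustrated with a minimal base R sketch. It assumes that a pair of cut points splits the data into four quadrants, that each quadrant is predicted by its mean target value, and that the reduction is measured against the residual sum of squares around the overall mean. The helper names `rss_by_group` and `best_pair_cut` are hypothetical and are not part of the package; the actual implementation may differ.

```r
# Hypothetical helper: RSS of y around per-group means (one group per quadrant).
rss_by_group <- function(y, group) {
  sum(tapply(y, group, function(v) sum((v - mean(v))^2)))
}

# Hypothetical helper: exhaustive search for the best pair of cut points
# for a single feature pair (x1, x2).
best_pair_cut <- function(x1, x2, y, cuts1, cuts2) {
  rss_total <- sum((y - mean(y))^2)
  best <- list(reduction = -Inf, cut1 = NA_real_, cut2 = NA_real_)
  for (c1 in cuts1) {
    for (c2 in cuts2) {
      # Quadrant id in 1..4 defined by the two cut points.
      quadrant <- 1L + (x1 > c1) + 2L * (x2 > c2)
      reduction <- rss_total - rss_by_group(y, quadrant)
      if (reduction > best$reduction) {
        best <- list(reduction = reduction, cut1 = c1, cut2 = c2)
      }
    }
  }
  best
}

# Toy XOR-like data: the target depends on x1 and x2 only jointly.
x1 <- c(1, 1, 2, 2, 1, 1, 2, 2)
x2 <- c(1, 2, 1, 2, 1, 2, 1, 2)
y  <- c(0, 1, 1, 0, 0, 1, 1, 0)
best <- best_pair_cut(x1, x2, y, cuts1 = c(1, 2), cuts2 = c(1, 2))
```

On this toy data, cutting both features at 1 separates the four quadrants perfectly, so the whole residual sum of squares is removed.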

Additionally, please note that targets are simply converted to numeric values in the case of a mlr3::TaskClassif, instead of working on logits or model outputs of a previously computed proxy model.

Moreover, remember that this detector is used solely to initialize the group structures of the initial population within TunerEAGGA. Therefore, some imprecision is acceptable.

This interaction detector only works with integer or numeric features. Logical features must be converted to integers.
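One way to satisfy the type requirement is to convert logical columns before the task is created. The following is a minimal base R sketch on a made-up data.frame; the column names are purely illustrative.

```r
# Toy feature table; the column names are made up for illustration.
features <- data.frame(
  age = c(23L, 41L, 35L),
  income = c(1200.5, 3400.0, 2100.25),
  smoker = c(TRUE, FALSE, TRUE)
)

# Convert every logical column to integer (TRUE -> 1L, FALSE -> 0L).
is_lgl <- vapply(features, is.logical, logical(1))
features[is_lgl] <- lapply(features[is_lgl], as.integer)

# All columns are now integer or numeric.
stopifnot(all(vapply(features, is.numeric, logical(1))))
```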

References

  • Lou, Yin, Caruana, Rich, Gehrke, Johannes, Hooker, Giles (2013). “Accurate Intelligible Models with Pairwise Interactions.” In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 623–631.

Public fields

xs

(data.table::data.table)
A data.table of the features from the mlr3::Task. Note that the values of each feature are discretized based on the corresponding feature grid in $grids.

n_features

(integer(1))
Number of features in the mlr3::Task.

feature_names

(character())
Names of the features in the mlr3::Task.

feature_types

(character())
Original types of the features in the mlr3::Task prior to discretization.

y

(numeric())
Numeric target vector of the mlr3::Task. In the case of a mlr3::TaskClassif, the label is simply converted to numeric.

grids

(list())
List of grids, one for each feature. Grids are constructed by computing quantiles of the feature at probabilities seq(from = 0, to = 1, length.out = grid_size). For integer features, the resulting quantiles are rounded to the nearest integer, and the unique values are taken, which may result in a grid of smaller size than grid_size.

cuts

(list())
List of cut points containing cut points for each feature. Cut points for features are based on their unique values after discretization using their corresponding grid.

rss

(matrix())
Symmetric numeric matrix of dimensions n_features by n_features containing the reduction in the residual sum of squares for each pair of features. NULL after construction. Call $compute_best_rss() to populate it.
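The relationship between the $xs, $grids, and $cuts fields can be sketched in base R. How exactly values are mapped onto the grid is not stated here, so the sketch assumes that each value is snapped to the largest grid value not exceeding it. The helper name `discretize_on_grid` is hypothetical.

```r
# Hypothetical helper: snap each value to the largest grid value that does
# not exceed it (an assumed mapping, the package may use a different one).
discretize_on_grid <- function(x, grid) {
  idx <- findInterval(x, grid)
  idx[idx < 1L] <- 1L  # guard for values below the smallest grid value
  grid[idx]
}

x <- c(0.1, 0.4, 0.5, 0.9, 1.0)
grid <- c(0.1, 0.5, 1.0)

# Discretized feature values, as they might be stored per feature.
x_discrete <- discretize_on_grid(x, grid)

# Candidate cut points: the unique values remaining after discretization.
cuts <- sort(unique(x_discrete))
```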

Methods


Method new()

Creates a new instance of this R6 class.

Usage

InteractionDetector$new(task, grid_size = 11L)

Arguments

task

(mlr3::Task)
The task.

grid_size

(integer(1))
The grid size used to construct a grid for each feature. The default value is 11. A grid for a feature is constructed by computing quantiles of the feature at probabilities seq(from = 0, to = 1, length.out = grid_size). If the quantiles yield fewer than three unique grid values (three are needed to obtain at least two bins for the discretized feature), the range of the feature is instead split into two intervals of equal width, and the resulting interval boundaries are used as grid values. For integer features, the quantiles are rounded to the nearest integer. Only unique grid values are kept, so the actual number of grid values may be smaller than grid_size.
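The grid construction described for grid_size can be sketched in base R as follows. The helper name `make_grid` is hypothetical, the default quantile type of `stats::quantile()` is an assumption, and the sketch leaves the fallback grid unrounded because the description does not say whether rounding applies there.

```r
# Hypothetical helper: build the grid for one feature as described above.
make_grid <- function(x, grid_size = 11L) {
  probs <- seq(from = 0, to = 1, length.out = grid_size)
  grid <- quantile(x, probs = probs, names = FALSE)
  if (is.integer(x)) {
    # Integer features: round quantiles to the nearest integer.
    grid <- round(grid)
  }
  grid <- unique(grid)
  if (length(grid) < 3L) {
    # Fallback: two equal-width intervals over the range, i.e. three points.
    grid <- seq(from = min(x), to = max(x), length.out = 3L)
  }
  grid
}

grid_numeric <- make_grid(as.numeric(1:101))       # 11 distinct grid values
grid_integer <- make_grid(1:5)                     # fewer than grid_size values
grid_binary  <- make_grid(c(0L, 0L, 0L, 1L, 1L))   # triggers the fallback
```

The integer example shows why a grid can end up smaller than grid_size: several rounded quantiles coincide and are dropped as duplicates.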


Method compute_best_rss()

Method to compute the reduction in the residual sum of squares for each pair of features, overwriting $rss.

Usage

InteractionDetector$compute_best_rss()

Returns

Invisible (NULL)


Method get_eqcs_from_top_k()

Retrieves equivalence classes (groups of features) based on the top k most important pairwise interactions. The interactions are determined by the reduction in the residual sum of squares for each pair of features.

Usage

InteractionDetector$get_eqcs_from_top_k(k = 1L, features = NULL)

Arguments

k

(integer(1))
The number of top interactions to consider. The default value is 1.

features

(character() | NULL)
The features to consider for detecting interactions. If not provided, all features will be used (default).

Returns

A named integer vector indicating the equivalence class each feature belongs to.
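How the top k interactions might translate into equivalence classes can be sketched in base R. The sketch assumes that features linked by one of the k largest entries of a symmetric matrix like $rss end up in the same class and that all remaining features form singleton classes; the package may handle this differently. The helper name `eqcs_from_top_k` is hypothetical, and the features argument is omitted for brevity.

```r
# Hypothetical helper: group features linked by the k largest pairwise entries.
eqcs_from_top_k <- function(rss, k = 1L) {
  feature_names <- colnames(rss)
  n <- length(feature_names)
  # Visit each unordered pair once via the upper triangle.
  pairs <- which(upper.tri(rss), arr.ind = TRUE)
  ord <- order(rss[upper.tri(rss)], decreasing = TRUE)
  top <- pairs[head(ord, k), , drop = FALSE]

  # Start with every feature in its own class, then merge along each top pair.
  eqc <- seq_len(n)
  for (i in seq_len(nrow(top))) {
    a <- eqc[top[i, 1L]]
    b <- eqc[top[i, 2L]]
    eqc[eqc == max(a, b)] <- min(a, b)
  }
  # Relabel classes as consecutive integers and attach the feature names.
  eqc <- match(eqc, unique(eqc))
  names(eqc) <- feature_names
  eqc
}

# Toy symmetric matrix of RSS reductions for four features.
rss <- matrix(0, 4, 4, dimnames = list(letters[1:4], letters[1:4]))
rss[upper.tri(rss)] <- c(5, 1, 4, 0.5, 0.2, 3)  # a-b, a-c, b-c, a-d, b-d, c-d
rss <- rss + t(rss)

eqcs_k1 <- eqcs_from_top_k(rss, k = 1L)  # merges a and b
eqcs_k2 <- eqcs_from_top_k(rss, k = 2L)  # additionally merges c via b-c
```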


Method clone()

The objects of this class are cloneable with this method.

Usage

InteractionDetector$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.