Forest Quantile Regressor
-
class skgrf.ensemble.GRFForestQuantileRegressor(n_estimators=100, quantiles=None, regression_splitting=False, equalize_cluster_weights=False, sample_fraction=0.5, mtry=None, min_node_size=5, honesty=True, honesty_fraction=0.5, honesty_prune_leaves=True, alpha=0.05, imbalance_penalty=0, n_jobs=-1, seed=42, enable_tree_details=False)
GRF quantile regression implementation for scikit-learn.
Provides a sklearn quantile regressor interface to the GRF C++ library using Cython.
Warning
Because the training dataset is required for prediction, the training dataset is recorded onto the estimator instance. This means that serializing this estimator will result in a file at least as large as the serialized training dataset.
- Parameters
n_estimators (int) – The number of tree regressors to train
quantiles (list(float)) – A list of quantiles on which to predict.
regression_splitting (bool) – Use regression splits instead of splitting specially for quantiles.
equalize_cluster_weights (bool) – Weight the samples such that clusters have equal weight. If False, larger clusters will have more weight. If True, the number of samples drawn from each cluster is equal to the size of the smallest cluster. If True, sample weights should not be passed on fitting.
sample_fraction (float) – Fraction of samples used in each tree.
mtry (int) – The number of features to consider for the split at each node. The default is sqrt(p) + 20, where p is the number of features.
min_node_size (int) – The minimum number of observations in each tree leaf.
honesty (bool) – Use honest splitting (subsample splitting).
honesty_fraction (float) – The fraction of data used for subsample splitting.
honesty_prune_leaves (bool) – Prune the estimation sample tree such that no leaves are empty. If False, trees with empty leaves are skipped.
alpha (float) – The maximum imbalance of a split.
imbalance_penalty (float) – Penalty applied to imbalanced splits.
n_jobs (int) – The number of threads. Default is number of CPU cores.
seed (int) – Random seed value.
enable_tree_details (bool) – When True, perform additional calculations for detailing the underlying decision trees. Must be enabled for estimators_ and get_estimator to work. Very slow.
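The equalize_cluster_weights rule above (each cluster contributes as many samples as the smallest cluster holds) can be illustrated in plain Python. This is only a sketch of the counting step described in the parameter text; the helper name smallest_cluster_size and the example labels are hypothetical and are not part of skgrf.

```python
from collections import Counter


def smallest_cluster_size(cluster_labels):
    """Return the size of the smallest cluster among the given labels."""
    counts = Counter(cluster_labels)  # label -> number of samples
    return min(counts.values())


# Hypothetical cluster assignment for nine training samples.
labels = ["a", "a", "a", "b", "b", "c", "c", "c", "c"]

# With equalize_cluster_weights=True, the text above says this many
# samples are drawn from every cluster.
per_cluster = smallest_cluster_size(labels)
```

Here cluster "b" is the smallest, so per_cluster is 2.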
- Variables
estimators_ (list) – A list of tree objects from the forest.
n_features_in_ (int) – The number of features (columns) from the fit input X.
grf_forest_ (dict) – The returned result object from calling C++ grf.
mtry_ (int) – The mtry value determined by validation.
outcome_index_ (int) – The index of the grf train matrix holding the outcomes.
samples_per_cluster_ (list) – The number of samples to train per cluster.
clusters_ (list) – The cluster labels determined from the fit input cluster.
n_clusters_ (int) – The number of unique cluster labels from the fit input cluster.
train_ (array2d) – The X,y concatenated train matrix passed to grf.
criterion (str) – The criterion used for splitting: gini
-
fit(X, y, cluster=None)
Fit the grf quantile forest using training data.
- Parameters
X (array2d) – training input features
y (array1d) – training input targets
cluster (array1d) – optional cluster assignments for input samples
-
get_estimator(idx)
Extract a single estimator tree from the forest.
- Parameters
idx (int) – The index of the tree to extract.
-
get_feature_importances(decay_exponent=2, max_depth=4)
Get the feature importances.
- Parameters
decay_exponent (int) – Exponential decay of importance by split depth
max_depth (int) – The maximum depth of splits to consider
-
get_kernel_weights(X, oob_prediction=False)
Get training sample weights for test data.
Given a trained forest and test data, compute the kernel weights for each test point.
Creates a sparse matrix in which the value at (i, j) gives the weight of training sample j for test sample i. Use oob_prediction=True when X is the training set.
- Parameters
X (array2d) – input features
oob_prediction (bool) – whether to calculate weights out of bag
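To make the meaning of one row of that weight matrix concrete, the sketch below computes a weighted quantile of the training targets from a single row of weights. It is a plain-Python illustration, not skgrf's or GRF's implementation: the function weighted_quantile, the toy targets, and the dense list standing in for a sparse row are all assumptions made for the example.

```python
def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative normalized weight reaches q."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    cumulative = 0.0
    # Walk the values in ascending order, accumulating their share of weight.
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight / total
        if cumulative >= q:
            return value
    return max(values)  # fallback if rounding keeps the sum just below q


# Hypothetical training targets and one row (i, :) of kernel weights,
# written as a dense list for readability.
y_train = [1.0, 2.0, 3.0, 4.0]
weights_row = [0.1, 0.2, 0.3, 0.4]

median = weighted_quantile(y_train, weights_row, 0.5)
```

With these weights the cumulative share first reaches 0.5 at the third target, so median is 3.0.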
-
get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
-
get_split_frequencies(max_depth=4)
Get the split frequencies of feature indexes at various depths.
- Parameters
max_depth (int) – The maximum depth of splits to consider
-
predict(X)
Predict quantile regression target(s) for X.
- Parameters
X (array2d) – prediction input features
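A minimal usage sketch of fit and predict, assuming skgrf is installed. The helper names make_toy_data and fit_and_predict_quantiles and the synthetic data are made up for illustration. The layout of the returned predictions (for example, one column per requested quantile) is not stated in this reference and should be checked against the installed version.

```python
import random


def make_toy_data(n_samples=200, seed=0):
    """Build a small synthetic regression set as plain Python lists."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n_samples)]
    # The target depends on the first feature plus Gaussian noise.
    y = [2.0 * row[0] + rng.gauss(0.0, 0.5) for row in X]
    return X, y


def fit_and_predict_quantiles(X, y, quantiles=(0.1, 0.5, 0.9)):
    """Fit a quantile forest and predict on the same inputs."""
    # Imported here so the data helper above works without skgrf installed.
    from skgrf.ensemble import GRFForestQuantileRegressor

    forest = GRFForestQuantileRegressor(
        n_estimators=100,
        quantiles=list(quantiles),  # quantiles on which to predict
        seed=42,
    )
    forest.fit(X, y)
    return forest.predict(X)


# Example call (requires skgrf):
# X, y = make_toy_data()
# predictions = fit_and_predict_quantiles(X, y)
```

The skgrf import is placed inside the function only so that the data helper can run on its own; in ordinary code it would sit at the top of the module.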
-
score(X, y, sample_weight=None)
Return the coefficient of determination \(R^2\) of the prediction.
The coefficient \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters
X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns
score – \(R^2\) of self.predict(X) wrt. y.
- Return type
float
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
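The definition of \(R^2\) given for score can be checked by hand. The sketch below computes \(1 - u/v\) for a single-output target in plain Python; the function name r2_manual and the toy values are illustrative only, and sample weights and multiple outputs are not handled.

```python
def r2_manual(y_true, y_pred):
    """Compute 1 - u/v for equal-length sequences of floats."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares; it is zero when y_true is constant,
    # in which case this sketch raises ZeroDivisionError.
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_manual(y_true, y_true)                 # exact predictions
constant = r2_manual(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean
reversed_fit = r2_manual(y_true, [4.0, 3.0, 2.0, 1.0])
```

These three cases give 1.0, 0.0 and -3.0 respectively, matching the statements that the best score is 1.0, that a constant mean predictor scores 0.0, and that the score can be negative.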
-
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
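The <component>__<parameter> naming used for nested objects can be illustrated with a toy routine that splits such keys on the first double underscore. This is a sketch of the naming convention only, not scikit-learn's code; the helper split_nested_key and the component name "forest" are hypothetical.

```python
def split_nested_key(key):
    """Split 'component__parameter' into (component, parameter).

    Keys without a double underscore belong to the outer object,
    signalled here by a component of None.
    """
    component, separator, parameter = key.partition("__")
    if not separator:
        return None, key
    return component, parameter


# Hypothetical parameters for a nested object with a step named "forest".
params = {"forest__n_estimators": 200, "forest__min_node_size": 10, "verbose": True}

# Group the values by the component they are addressed to.
routed = {}
for key, value in params.items():
    component, parameter = split_nested_key(key)
    routed.setdefault(component, {})[parameter] = value
```

After the loop, routed maps "forest" to its two parameters and None to the top-level verbose setting.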