Train Random Trees

Description

Train a Random Trees model using the Scikit-Learn implementation of the Breiman (2001) algorithm.

Usage

Given predictor variables and ground truth labels at known locations, train a Random Trees model that can subsequently be used to assign new areas to one of the ground truth labels.
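The workflow described above can be sketched directly in scikit-learn. This is a minimal illustration, not the tool's actual implementation: the arrays below are synthetic stand-ins for the training and predictor rasters, the file name `model.joblib` and the value of 50 trees are arbitrary, and every cell is used for training (how the tool treats NoData cells is not stated here). `feature_importances_` is scikit-learn's impurity-based measure and is only assumed to be comparable to the tool's variable importance output.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in for the training raster: a 20x20 grid of integer
# class labels (left half is class 1, right half is class 2).
labels = np.where(np.arange(400).reshape(20, 20) % 20 < 10, 1, 2)

# Hypothetical stand-in for three predictor rasters on the same grid.
predictors = rng.normal(size=(3, 20, 20))
predictors[0] += labels  # make the first band informative about the class

# scikit-learn expects one row per cell and one column per predictor variable.
X = predictors.reshape(3, -1).T
y = labels.ravel()

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Persist the trained model in JOBLIB format, then reload it for prediction.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)
restored = joblib.load(model_path)

# Predict a class for every cell and restore the raster shape.
predicted = restored.predict(X).reshape(labels.shape)
importances = restored.feature_importances_
```

In this sketch the reloaded model plays the role of the trained model file that a later prediction step would consume, and `X` must be built from the same predictor variables, in the same order, at prediction time.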

Parameters

| Parameter Name | Type | Direction | Data Type | Dialog Reference |
| --- | --- | --- | --- | --- |
| Input Training Raster (*.tif) | Required | Input | Raster Layer | Input raster where cells are assigned an integer value representing a ground truth label. The known locations of each class are used to train the Random Trees model to identify that class from the predictor variables. Must be in TIF format. |
| Input Predictor Variables Raster (*.tif) | Required | Input | Multiple Value | Input raster(s) with the same extent as the training raster, where cells represent characteristics that may describe the ground truth classes. Must be in TIF format. |
| Output Trained Model (*.JOBLIB) | Required | Output | File | Name for the trained Random Trees model. Must be in JOBLIB format. If the directory does not exist, it will be created. |
| Prepared Predictor Variable Raster (*.tif) | Required | Output | Raster Dataset | Name of the output prepared predictor variable raster. This will be either a composite of all the input predictor variable rasters, or a copy of the single input predictor variable raster. If a composite raster is created, it must be the input used to execute Run Random Trees. If only one predictor variable is being used, the input to Run Random Trees can be this output or the original. |
| Output Variable Importance (*.txt) | Required | Output | File | Name for the variable importance file. For each input predictor variable, the variable importance represents the estimated decrease in model accuracy if that variable were removed from the training phase. Must be in TXT format. If the directory does not exist, it will be created. |
| Number of Trees | Optional | Input | Double | (Integer) The number of trees that are "grown" in the Random Trees algorithm. Each tree is a decision tree model built from a bootstrapped selection of cells from the input predictor variable raster(s). The final model predictions represent the majority vote among all of the trees grown. |
| Maximum Tree Depth | Optional | Input | Any Value | (Integer or None) The maximum number of levels (i.e., decisions) in each tree. Each decision in the tree aims to split the collection of predictor variables into groups that each belong to a ground truth class, with minimal impurity within each group. The default is "None", which will expand nodes until all leaves are pure. |
| Maximum Number of Features | Optional | Input | Any Value | (Integer, float, string or None) The maximum number of variables to consider when making a decision. Acceptable strings are "auto", "sqrt", "log2" and "None". Do not include quotes when entering a string. Integers and floats (<1) are also acceptable. See the Scikit-Learn documentation for further details. The default is "auto", which will set max features equal to the square root of the number of features. |
| Class Weights | Optional | Input | Value Table | Weights assigned to each target class to predict, where the weight represents the penalty for misclassifying that class. The default is "balanced" (blank), which will set the weight for each class to be inversely proportional to the occurrence of that class in the training data. |
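The optional parameters above appear to correspond to the scikit-learn arguments `n_estimators`, `max_depth`, `max_features` and `class_weight`. The sketch below illustrates two of the rules described: how a dialog entry for Maximum Number of Features might be converted to a scikit-learn value, and how "balanced" class weights are derived. Both helper functions are hypothetical and are not part of the tool. Mapping "auto" to "sqrt" is an assumption based on the description above, chosen because some newer scikit-learn releases no longer accept "auto". The weight formula is the one scikit-learn documents for `class_weight="balanced"`.

```python
import numpy as np


def parse_max_features(text):
    """Convert a dialog entry into a max_features value (hypothetical helper)."""
    text = text.strip()
    if text == "None":
        return None
    if text == "auto":
        # Described above as the square root of the number of features.
        return "sqrt"
    if text in ("sqrt", "log2"):
        return text
    try:
        return int(text)  # an absolute number of variables
    except ValueError:
        value = float(text)  # a fraction of the variables
    if not 0.0 < value < 1.0:
        raise ValueError("float entries must be between 0 and 1")
    return value


def balanced_class_weights(y):
    """Weights inversely proportional to class frequency (hypothetical helper).

    Uses n_samples / (n_classes * count_of_class).
    """
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# Three cells of class 1 and one cell of class 2: the rarer class
# receives the larger misclassification penalty.
weights = balanced_class_weights(np.array([1, 1, 1, 2]))
```

With these inputs, class 2 receives a weight of 2.0 and class 1 a weight of about 0.67, so errors on the rarer class are penalized more heavily.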