Train Test Split

Description

Create two subsets of a ground truth raster for training and testing purposes by randomly sampling user-defined percentages of each ground truth class.

Usage

Use this tool to create training and testing subsets from a ground truth raster for supervised classification. The training raster is created by randomly sampling a user-defined percentage of each discrete class in the ground truth data. The testing raster is the complement of the training raster and consists of all remaining cells within the ground truth extent. It serves as a reserved set of ground truth cells for assessing model accuracy on cells the model was not trained on.
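The per-class sampling described above can be sketched with NumPy. This is an illustration only, not the tool's actual implementation: the function name `split_ground_truth`, the `nodata` value of -1, and the rounding of the sample count are all assumptions made for the example.

```python
import numpy as np

def split_ground_truth(truth, percent_by_class, nodata=-1, seed=0):
    """Hypothetical sketch: split a 2-D integer class array into train/test arrays.

    truth: 2-D integer array of class values; `nodata` marks cells without a class.
    percent_by_class: dict mapping class value -> percentage (0-100) to sample.
    Classes not listed in percent_by_class go entirely to the testing array.
    """
    rng = np.random.default_rng(seed)
    flat_truth = truth.ravel()
    flat_train = np.full(flat_truth.shape, nodata, dtype=truth.dtype)

    for cls, pct in percent_by_class.items():
        # Flat indices of every cell belonging to this class.
        idx = np.flatnonzero(flat_truth == cls)
        # Number of cells to draw for training (rounding is an assumption).
        n_train = int(round(idx.size * pct / 100))
        chosen = rng.choice(idx, size=n_train, replace=False)
        flat_train[chosen] = cls

    train = flat_train.reshape(truth.shape)
    # Testing raster: every labeled cell that was not drawn for training.
    in_test = (truth != nodata) & (train == nodata)
    test = np.where(in_test, truth, nodata)
    return train, test
```

Because the testing array is built as the complement of the training array, no cell appears in both, and together they cover every labeled cell in the input.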

Parameters

| Parameter Name | Type | Direction | Data Type | Dialog Reference |
| --- | --- | --- | --- | --- |
| Input Ground Truth Data (*.tif) | Required | Input | Raster Layer | Input raster of integer type where each discrete value corresponds to a ground truth class. Class values must be 0-indexed and sequential. |
| Training Sampling Percentage per Class | Required | Input | Value Table | Percent to sample from each discrete class in the ground truth dataset. If less than 100 is entered, the training dataset will contain a subset of random cells from that class according to the percentage specified. The remainder will be added to the testing dataset. Users may find that undersampling or oversampling classes in an imbalanced ground truth dataset improves modeling results. |
| Output Training Raster (*.tif) | Required | Output | Raster Dataset | Name of the resulting training raster. Must be in TIFF format. The directory will be created if it does not exist. |
| Output Testing Raster (*.tif) | Required | Output | Raster Dataset | Name of the resulting testing raster. Must be in TIFF format. The directory will be created if it does not exist. |
| Optional Training Sampling Area Constraint | Optional | Input | Feature Layer | Polygon representing the area within which training cells will be sampled. The polygon must be fully encompassed by the ground truth raster. If no constraint is used, training cells will be sampled from the entire ground truth raster. |
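The input requirement that class values be integers, 0-indexed, and sequential can be checked before running the tool. The following is a minimal sketch, assuming the raster has already been read into a NumPy array; the helper name `has_valid_class_values` and the optional `nodata` argument are hypothetical and not part of the tool.

```python
import numpy as np

def has_valid_class_values(truth, nodata=None):
    """Hypothetical check: True if class values are integers 0, 1, ..., n-1."""
    truth = np.asarray(truth)
    # The ground truth raster must be of integer type.
    if not np.issubdtype(truth.dtype, np.integer):
        return False
    values = np.unique(truth)  # sorted unique class values
    if nodata is not None:
        values = values[values != nodata]
    if values.size == 0:
        return False
    # Sequential and 0-indexed means the sorted values equal 0..n-1.
    return np.array_equal(values, np.arange(values.size))
```

For example, an array containing classes 0, 1, and 2 passes the check, while one containing only 1 and 2, or 0 and 2, does not.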