Zonal Statistics#

Table of Contents#

  1. What is Processor ZonalStatistics

  2. Processor ZonalStatistics Example

    1. Setup BDT

    2. Input Raster

    3. Input Data

    4. Running Processor ZonalStatistics

    5. Error Checking

Part 1: What is Processor ZonalStatistics#

ZonalStatistics is a processor that calculates statistics on the values of the raster cells that a given polygon overlaps.

ZonalStatistics Overview
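
Conceptually, the processor boils down to collecting the overlapped cell values and aggregating them. The NumPy sketch below illustrates that idea only; it is not BDT's implementation, and the cell values are made up.

[ ]:
import numpy as np

# Hypothetical values for the raster cells a polygon overlaps.
cells = np.array([872.0, 945.0, 1010.0, 1143.0, 1290.0, 1363.0])

# The kinds of aggregates a zonal statistics operation produces.
stats = {
    "count": cells.size,
    "min": cells.min(),
    "max": cells.max(),
    "mean": cells.mean(),
    "std": cells.std(),
    "sum": cells.sum(),
    "pct50": np.percentile(cells, 50),
    "pct90": np.percentile(cells, 90),
}
print(stats)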

Part 2: Processor ZonalStatistics Example#

Setup BDT#

[ ]:
import pyspark
import bdt
bdt.auth("bdt.lic")  # authorize BDT with the license file
from bdt import functions as F
from bdt import processors as P
BDT has been successfully authorized!

            Welcome to
             ___    _                ___         __             ______             __   __     _   __
            / _ )  (_)  ___ _       / _ \ ___ _ / /_ ___ _     /_  __/ ___  ___   / /  / /__  (_) / /_
           / _  | / /  / _ `/      / // // _ `// __// _ `/      / /   / _ \/ _ \ / /  /  '_/ / / / __/
          /____/ /_/   \_, /      /____/ \_,_/ \__/ \_,_/      /_/    \___/\___//_/  /_/\_\ /_/  \__/
                      /___/

BDT python version: v3.4.0-v3.4.0
BDT jar version: v3.4.0-v3.4.0

Input Raster#

  • Elevation data from the U.S. Geological Survey and the National Geospatial-Intelligence Agency

  • Available on ArcGIS Living Atlas

  • Clipped to the Los Angeles area as a sample

[ ]:
raster_file = "World_Elevation_GMTE_LA.tif"
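
As a quick, optional sanity check before running the processor, the raster's spatial reference, cell size, and extent can be inspected with a library such as rasterio (not part of BDT; shown here only as one possible way to verify the input):

[ ]:
import rasterio

# Print the CRS, cell size, and bounding box of the sample raster.
with rasterio.open(raster_file) as src:
    print(src.crs, src.res, src.bounds)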


Input Data#

Create a sample polygon that overlaps some raster cells from World_Elevation_GMTE_LA.tif.

Assign it a unique ID, PARCEL_ID in this example.

[ ]:
# Note: a WKT ring must be closed, so the first coordinate is repeated as the last.
polygon = """
        POLYGON((
        -117.6878845 34.1968949,
        -117.6756497 34.1968949,
        -117.6754958 34.1939036,
        -117.6877306 34.1942218,
        -117.6878845 34.1968949
        ))
        """

# The raster uses spatial reference 3857, so the polygon must be projected from 4326 to 3857.
polygon_df = spark.sql(f"""
                        SELECT
                            ST_Project(ST_FromText('{polygon}'), 4326, 3857) AS SHAPE,
                            '1' AS PARCEL_ID
                        """).withMeta("POLYGON", 3857)


Running Processor ZonalStatistics#

Calculate elevation statistics for the polygon.

Parameters:#

  • Polygon DataFrame

  • Path to raster TIF file

  • Unique ID field for the polygon DataFrame (miid_field)

  • mode: Set to “area” (any overlap) or “cell” (centroid overlap)

    • “area” mode considers all raster cells that overlap the polygon. When using this mode, also use the deno parameter.

    • “cell” mode only considers raster cells whose centroid overlaps the polygon.

Note: in “area” mode, every overlapping raster cell value is weighted by its proportion of overlap with the polygon BEFORE any statistics are calculated. To apply the overlap weights only to the mean (as a weighted mean) while computing the remaining statistics on the raw raster values, see zonal_statistics_weighted_mean.
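
The snippet below illustrates the difference between the two modes with made-up numbers; it is not BDT's implementation. Each weight stands in for the fraction of a cell covered by the polygon:

[ ]:
import numpy as np

values  = np.array([900.0, 1100.0, 1300.0])   # hypothetical cell values
weights = np.array([1.0, 0.5, 0.25])          # hypothetical overlap fractions

# Area mode: a cell that is only partially covered contributes less.
weighted_mean = (values * weights).sum() / weights.sum()
# Cell mode: every selected cell contributes equally.
unweighted_mean = values.mean()

print(weighted_mean, unweighted_mean)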

[ ]:
# "cell" mode: only cells whose centroid falls within the polygon are used.
output_df = P.zonal_statistics_table(polygon_df,
                                     raster_file,
                                     miid_field="PARCEL_ID",
                                     mode="cell")

Output statistics:#

  • Count

  • Min

  • Max

  • Area of the region where the raster cells overlap the polygon

  • Mean

  • Standard deviation

  • Sum

  • 50th percentile

  • 90th percentile

[ ]:
output_df.show(truncate=False)
+---------+-----+-----+------+--------+------+------------------+-------+------+------+
|PARCEL_ID|count|min  |max   |area    |mean  |std               |sum    |pct50 |pct90 |
+---------+-----+-----+------+--------+------+------------------+-------+------+------+
|1        |12   |872.0|1363.0|750000.0|1109.0|178.45074016844723|13308.0|1143.0|1363.0|
+---------+-----+-----+------+--------+------+------------------+-------+------+------+
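
The statistics columns behave like any other Spark columns. For example, a hypothetical follow-up deriving the elevation range per polygon (backticks quote the column names, since max and min are also SQL function names):

[ ]:
output_df.selectExpr("PARCEL_ID", "`max` - `min` AS elev_range").show()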

The resulting DataFrame can be joined back to the original DataFrame on the unique ID to enrich the polygon geometries with the raster cell statistics.

[ ]:
output_df.join(polygon_df, "PARCEL_ID").show()
+---------+-----+-----+------+--------+------+------------------+-------+------+------+--------------------+
|PARCEL_ID|count|  min|   max|    area|  mean|               std|    sum| pct50| pct90|               SHAPE|
+---------+-----+-----+------+--------+------+------------------+-------+------+------+--------------------+
|        1|   12|872.0|1363.0|750000.0|1109.0|178.45074016844723|13308.0|1143.0|1363.0|{[01 06 00 00 00 ...|
+---------+-----+-----+------+--------+------+------------------+-------+------+------+--------------------+


Error Checking#

If an error occurs while processing an input polygon, the output statistics for that polygon will be null, as shown below:

+----+-----+----+----+----+----+----+----+-----+-----+
|miid|count| min| max|area|mean| std| sum|pct50|pct90|
+----+-----+----+----+----+----+----+----+-----+-----+
|   1| NULL|NULL|NULL|NULL|NULL|NULL|NULL| NULL| NULL|
+----+-----+----+----+----+----+----+----+-----+-----+

To check whether any polygons produced an error during processing, check whether one of the output statistics is null:

[ ]:
output_df.where("count is null").show()

Note that none of these statistics will be null unless an error occurred. For more details about what caused the error, see the logs of the Spark workers after running the processor. On Databricks, this is the log4j output.
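
One possible pattern for a pipeline is to split the output into successful and errored polygons before joining back (a sketch, assuming the output schema shown above):

[ ]:
# Separate rows that processed cleanly from rows that errored.
ok_df = output_df.where("count IS NOT NULL")
err_df = output_df.where("count IS NULL")
print(f"{err_df.count()} of {output_df.count()} polygons failed")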