Zonal Statistics#
Table of Contents#
Part 1: What is Processor ZonalStatistics#
ZonalStatistics is a processor that calculates statistics on the values of the raster cells a given polygon overlaps.
Part 2: Processor ZonalStatistics Example#
Setup BDT#
[1]:
import pyspark
import bdt
bdt.auth("../bdt.lic")
from bdt import functions as F
from bdt import processors as P
BDT has been successfully authorized!
Welcome to
___ _ ___ __ ______ __ __ _ __
/ _ ) (_) ___ _ / _ \ ___ _ / /_ ___ _ /_ __/ ___ ___ / / / /__ (_) / /_
/ _ | / / / _ `/ / // // _ `// __// _ `/ / / / _ \/ _ \ / / / '_/ / / / __/
/____/ /_/ \_, / /____/ \_,_/ \__/ \_,_/ /_/ \___/\___//_/ /_/\_\ /_/ \__/
/___/
BDT python version: v3.4.0-develop-12-ge79ea377
BDT jar version: v3.4.0-develop-12-ge79ea377
Input Raster#
Elevation data from the US Geologic Survey and the National Geospatial-Intelligence Agency
Available on ArcGIS Living Atlas
Cut down to the LA area as a sample
[2]:
raster_file = "World_Elevation_GMTE_LA.tif"
Input Data#
Create a sample polygon that overlaps some raster cells from the
World_Elevation_GMTE_LA.tif
.Assign it a unique ID,
PARCEL_ID
in this example.
[3]:
polygon = """
POLYGON((
-117.6878845 34.1968949,
-117.6756497 34.1968949,
-117.6754958 34.1939036,
-117.6877306 34.1942218
))
"""
# The raster uses spatial reference 3857 so the polygon must be projected to it.
polygon_df = spark.sql(f"""
SELECT
ST_Project(ST_FromText('{polygon}'), 4326, 3857) AS SHAPE,
'1' AS PARCEL_ID
""").withMeta("POLYGON", 3857)
Running Processor ZonalStatistics#
Calculate elevation statistics for the polygon.
Parameters:#
Polygon DataFrame
Path to raster TIF file
Unique ID field for the polygon DataFrame
mode: Set to “area” (any overlap) or “cell” (centroid overlap)
“area” mode considers all raster cells that overlap the polygon. When using this mode, also use the deno parameter.
“cell” mode only considers raster cells whose centroid overlaps the polygon.
[4]:
output_df = P.zonal_statistics_table(polygon_df,
raster_file,
miid_field="PARCEL_ID",
mode="cell")
Output statistics:#
Count
Min
Max
Area of the overlapping region with the polygon
Mean
Standard deviation
Sum
50th percentile
90th percentile
[6]:
output_df.show(truncate=False)
+---------+-----+-----+------+--------+------+------------------+-------+------+------+
|PARCEL_ID|count|min |max |area |mean |std |sum |pct50 |pct90 |
+---------+-----+-----+------+--------+------+------------------+-------+------+------+
|1 |12 |872.0|1363.0|750000.0|1109.0|178.45074016844723|13308.0|1143.0|1363.0|
+---------+-----+-----+------+--------+------+------------------+-------+------+------+
The resulting dataframe can be joined back to the original dataframe to enrich the polygon geometries with the raster cell statistics.
[7]:
output_df.join(polygon_df, "PARCEL_ID").show()
+---------+-----+-----+------+--------+------+------------------+-------+------+------+--------------------+
|PARCEL_ID|count| min| max| area| mean| std| sum| pct50| pct90| SHAPE|
+---------+-----+-----+------+--------+------+------------------+-------+------+------+--------------------+
| 1| 12|872.0|1363.0|750000.0|1109.0|178.45074016844723|13308.0|1143.0|1363.0|{[01 06 00 00 00 ...|
+---------+-----+-----+------+--------+------+------------------+-------+------+------+--------------------+