
What is Big Data Toolkit

The Big Data Toolkit (BDT) is an Esri Professional Services solution that allows customers to aggregate, analyze, and enrich big data within their existing big data environment.

It is delivered as a term subscription that includes a set of spatial analysis and data interoperability tools that work within an existing big data environment.

BDT can help data scientists and analysts enhance their big data analytics with spatial tools that take advantage of the massive computing capacity they already have.

With Big Data Toolkit, you can:

Performance

BDT performance is measured by testing individual processors on massive datasets with a large Spark cluster. Benchmark runtimes were produced on an Azure HDInsight cluster using 40 executors, each with 4 cores and 59 GB of memory. Data sources include delimited and Parquet files on Azure Data Lake Storage Gen2.

Often we need to enrich a point dataset with attributes from a polygon layer. Our workflow takes 600 million point locations, calculates which US Postal polygon contains each point, and tags the points with zipcode information.

points                      polygons            process                  time
600 million points (128GB)  US Zipcodes (26MB)  ProcessorPointInPolygon  4.2 minutes
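
As a rough sketch, this kind of enrichment can be expressed with the BDT Python API roughly as follows (the paths, schemas, and cell size here are illustrative assumptions, not the benchmark's actual inputs):

# Python (sketch; paths and cell size are illustrative)
# Both DataFrames are assumed to already carry a SHAPE struct column.
points = spark.read.parquet("/data/points").withMeta("Point", 4326)
zipcodes = spark.read.parquet("/data/us_zipcodes").withMeta("Polygon", 4326)
tagged = spark.bdt.pointInPolygon(points, zipcodes, cellSize = 0.1)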

Another common process is finding the nearest line for each point. Our workflow enriches a point dataset with the name of, and distance to, the nearest street segment. A required input to this process is the radius: how far away, in decimal degrees, to search for a street segment. Our input (0.001) searches within roughly a city block for the nearest street. Larger values take longer to process.

points                      polylines                  process            radius  time
600 million points (128GB)  50 million streets (50GB)  ProcessorDistance  0.001   6.6 minutes
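
Similarly, a sketch of the nearest-street enrichment with the Python API (again, the inputs are illustrative, not the benchmark's actual data):

# Python (sketch; paths, cell size, and radius are illustrative)
points = spark.read.parquet("/data/points").withMeta("Point", 4326)
streets = spark.read.parquet("/data/streets").withMeta("Polyline", 4326)
nearest = spark.bdt.distance(points, streets, 0.1, 0.001, distanceField = "distance")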

System Requirements

Setup

Azure Databricks Setup

This section covers how to install and set up Big Data Toolkit in Azure Databricks.

Create Resource Group

Create a resource group

Create Storage Account

Create a storage account

Add Databricks Service to Resource Group

Create Azure Databricks Service

Add Notebook to Databricks Workspace

Create Notebook

Install & configure Databricks CLI

Launch Workspace

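The CLI itself can be installed with pip (one common approach):

pip install databricks-cli

The workspace host and a personal access token are then supplied through environment variables: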

export DATABRICKS_HOST=https://westus2.azuredatabricks.net
export DATABRICKS_TOKEN=YOUR_TOKEN

Use Principal Key to set up DBFS

databricks secrets create-scope --scope adls --initial-manage-principal users
databricks secrets put --scope adls --key credential

Install Big Data Toolkit

Install Jar

High Concurrency

How to access Azure Blob Storage

Access Azure Data Lake Storage Gen2 using OAuth 2.0 with an Azure service principal

    source = "abfss://rawdata@cdcvh.dfs.core.windows.net",
    mountPoint = "/mnt/data",
    extraConfigs = configs)
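
The configs dictionary referenced above holds the OAuth settings for the service principal. A minimal sketch might look like the following, where the application ID and directory ID are placeholders and the client secret is read from the adls secret scope created earlier:

# Python (sketch; <application-id> and <directory-id> are placeholders)
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope = "adls", key = "credential"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}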

Local Jupyter Notebook Setup

This section covers how to use Big Data Toolkit with Jupyter Notebook locally.

Prerequisites

To set up Jupyter Notebook, the following prerequisites are needed:

# Set the following environment variables in your bash or zsh profile to use Spark:

export SPARK_HOME={spark-location}
export PATH=$SPARK_HOME/bin:$PATH
export SPARK_LOCAL_IP=localhost
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='lab --ip=0.0.0.0 --port 8888 --allow-root --no-browser --NotebookApp.token=""'

# Install Jupyter:

pip install notebook

# Install deprecation:

pip install deprecation

Launch Jupyter Notebook with BDT

Start the Notebook with the PySpark driver

pyspark \
    --conf spark.submit.pyFiles="/Users/{User}/Downloads/{Egg}"\
    --conf spark.jars="/Users/{User}/Downloads/{Jar}"

Start using BDT with Jupyter Notebook

Walkthrough

Importing Big Data Toolkit


import bdt
spark.withBDT("/mnt/data/test.lic")
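
A quick way to confirm that the functions registered correctly is to call one of the ST functions described below (a minimal check):

spark.sql("SELECT ST_AsText(ST_MakePoint(-47.36744, -37.11551)) AS WKT").show(truncate=False)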

ST Functions


spark.sql("""
SHOW USER FUNCTIONS LIKE 'ST_*'
""").show(5)

+------------+
|    function|
+------------+
|     st_area|
|st_asgeojson|
|    st_ashex|
|   st_asjson|
|     st_asqr|
+------------+

spark.sql("""
SHOW USER FUNCTIONS LIKE 'H3*'
""").show()

+----------------+
|        function|
+----------------+
|      h3distance|
|         h3kring|
|h3kringdistances|
|    h3tochildren|
|         h3togeo|
| h3togeoboundary|
|      h3toparent|
|      h3tostring|
+----------------+

ST Function Examples

Do it all in Spark SQL

df1 = spark.range(10)
df1.createOrReplaceTempView("df1")
df1_o = spark.sql("""

SELECT *, ST_Buffer(ST_MakePoint(x, y), 100.0, 3857) AS SHAPE
FROM   
    (SELECT *, LonToX(lon) as x, LatToY(lat) as y
     FROM
         (SELECT id, 360-180*rand() AS lon, 360-180*rand() AS lat
          FROM df1))
""")
df1_o.show()

+---+------------------+------------------+--------------------+--------------------+--------------------+
| id|               lon|               lat|                   x|                   y|               SHAPE|
+---+------------------+------------------+--------------------+--------------------+--------------------+
|  0| 209.7420517387036|278.69107966694395|2.3348378397488922E7|-1.64374601454058...|{          ...|
|  1| 235.6965940336686|283.60464624333105|2.6237624829536907E7|-1.35615066742416...|{          ...|
|  2|253.00393806010976| 183.6534861987022|2.8164269553544343E7|  -406980.1151084085|{          ...|
|  3| 260.1383894167863| 193.1159327466332|2.8958473045658957E7| -1472980.4194758015|{          ...|
|  4|339.14158668989427|319.97117910147915| 3.775306873714188E7|   -4870131.33823664|{          ...|
|  5|196.72906017810777|350.15890505545616| 2.189977880326623E7| -1100932.2276620644|{          ...|
|  6| 308.3393018275092| 218.5508226001505| 3.432417407099181E7|  -4657533.480692809|{          ...|
|  7|230.71769139172113| 319.8814906385373| 2.568337592272603E7|  -4883178.711132153|{          ...|
|  8| 328.6877953661514| 183.3260391179804| 3.658935801012368E7| -370461.10529312206|{          ...|
|  9|290.42449965744925|189.24602659107038|  3.23299074157585E7| -1033759.5256822291|{          ...|
+---+------------------+------------------+--------------------+--------------------+--------------------+

Hybrid of Spark SQL and Python DataFrame API

df2 = spark.range(10)\
    .selectExpr("id","360-180*rand() lon","180-90*rand() lat")\
    .selectExpr("*","LonToX(lon) x","LatToY(lat) y")\
    .selectExpr("*","ST_Buffer(ST_MakePoint(x,y), 100.0, 4326) SHAPE")
df2.show()

+---+------------------+------------------+--------------------+--------------------+--------------------+
| id|               lon|               lat|                   x|                   y|               SHAPE|
+---+------------------+------------------+--------------------+--------------------+--------------------+
|  0| 272.2434268912087| 99.48671731291171| 3.030599965334515E7|1.5876415679293292E7|{          ...|
|  1|357.53193763641457|158.27029821614542| 3.980027324001811E7|   2479103.475519385|{          ...|
|  2|265.19248738341645|134.20606970025696|2.9521092657723542E7|   5747387.678019282|{          ...|
|  3|328.91483911816016|109.22908582796377|3.6614632404985085E7|1.1324393353312327E7|{          ...|
|  4| 198.1155773828232| 99.50944059423078|2.2054125192471266E7|1.5861086452229302E7|{          ...|
|  5|183.39843143144566|  113.097849987619| 2.041581999923363E7|1.0128237129727399E7|{          ...|
|  6| 241.2867789263583|170.76182038746066|2.6859921365231376E7|   1032874.515275041|{          ...|
|  7|192.59185824421752| 98.05268650607889| 2.143922759067662E7| 1.692579219007473E7|{          ...|
|  8|244.69326236261554|175.35379731732678|2.7239129366751254E7|   517780.7017250775|{          ...|
|  9|317.54181519860066|157.47594852504096|3.5348593173480004E7|   2574561.293889059|{          ...|
+---+------------------+------------------+--------------------+--------------------+--------------------+


Processors

Call ProcessorArea on df1:

df1_m = df1_o.withMeta("POLYGON", 3857)
area = spark.bdt.area(df1_m)
area.show(5)

+---+------------------+------------------+--------------------+--------------------+--------------------+------------------+
| id|               lon|               lat|                   x|                   y|               SHAPE|        shape_area|
+---+------------------+------------------+--------------------+--------------------+--------------------+------------------+
|  0| 209.7420517387036|278.69107966694395|2.3348378397488922E7|-1.64374601454058...|{          ...| 31393.50203038831|
|  1| 235.6965940336686|283.60464624333105|2.6237624829536907E7|-1.35615066742416...|{          ...| 31393.50203038831|
|  2|253.00393806010976| 183.6534861987022|2.8164269553544343E7|  -406980.1151084085|{          ...| 31393.50203039233|
|  3| 260.1383894167863| 193.1159327466332|2.8958473045658957E7| -1472980.4194758015|{          ...| 31393.50203039616|
|  4|339.14158668989427|319.97117910147915| 3.775306873714188E7|   -4870131.33823664|{          ...|31393.502030451047|
+---+------------------+------------------+--------------------+--------------------+--------------------+------------------+

Concepts

Shape Column

Format


spark.sql("SELECT ST_MakePoint(-47.36744,-37.11551) AS SHAPE").printSchema() 
root
 |-- SHAPE: struct (nullable = false)
 |    |-- WKB: binary (nullable = true)
 |    |-- XMIN: double (nullable = true)
 |    |-- YMIN: double (nullable = true)
 |    |-- XMAX: double (nullable = true)
 |    |-- YMAX: double (nullable = true)

Methods

df = (spark
   .read
   .format("com.esri.spark.shp")
   .options(path="/data/poi.shp", format="WKB")
   .load()
   .selectExpr("ST_FromWKB(shape) as SHAPE")
   .withMeta("Point", 4326))
df.selectExpr("ST_AsText(SHAPE) WKT").write.csv("/tmp/sink/wkt")
+---------------------------+
|WKT                        |
+---------------------------+
|POINT (-47.36744 -37.11551)|
+---------------------------+

Shape structs can be created using multiple methods:
- Processors
- SQL functions

Export

Transform Shape structs into formats suitable for data interchange:

Metadata

# Python 
df1 = (spark
    .read
    .format("csv")
    .options(path="/data/taxi.csv", header="true", delimiter= ",", inferSchmea="false")
    .schema("VendorID STRING,tpep_pickup_datetime STRING,tpep_dropoff_datetime STRING,passenger_count STRING,trip_distance STRING,pickup_longitude DOUBLE,pickup_latitude DOUBLE,RatecodeID STRING,store_and_fwd_flag STRING,dropoff_longitude DOUBLE,dropoff_latitude DOUBLE,payment_type STRING,fare_amount STRING,extra STRING,mta_tax STRING,tip_amount DOUBLE,tolls_amount DOUBLE,improvement_surcharge DOUBLE,total_amount DOUBLE")
    .load()
    .selectExpr("tip_amount", "total_amount", "ST_FromXY(pickup_longitude, pickup_latitude) as SHAPE")
    .withMeta("Point", 4326, shapeField = "SHAPE"))

df1.schema[-1].metadata
{'type': 33, 'wkid': 4326}

Cell size

points_in_county = spark.bdt.pointInPolygon(
    points,
    county,
    cellSize = 0.1,
)

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Cache

# Python
points_in_county = spark.bdt.pointInPolygon(
    points,
    county,
    cellSize = 0.1,
    cache = True,
)
points_in_cy_zp = spark.bdt.pointInPolygon(
    points_in_county,
    zip_codes,
    cellSize = 0.1,
    cache = True, #Setting cache here substantially reduces total run time
) 
pssoverlaps = spark.bdt.powerSetStats(
    points_in_cy_zp,
    statsFields = ["TotalInsuredValue"],
    geographyIds = ["CY_ID","ZP_ID"],
    categoryIds = ["Quarter","LineOfBusiness"],
    idField = "Points_ID",
    cache = False
)
pssoverlaps.cache() #same as setting cache = True above

pssoverlaps.write.mode("overwrite").parquet("/tmp/sink")
pssoverlaps.write.mode("overwrite").csv("/tmp/csv")

Extent

LMDB

Processors:

SQL Functions:

BDT v2

The title version number, for example ProcessorArea v2.1, indicates the version in which the item first became available: ProcessorArea has been available since v2.1, while a title of v2 indicates availability since v2.0. If new Processor configuration options are added later, the version availability is noted next to the property, e.g. emitPippedOnly (_v2.1_).

Sources v2

A source can be thought of as the component that describes the data that the Spark application will be consuming.

SourceGeneric v2


Hocon

# csv

{
 name = taxi
 type = com.esri.source.SourceGeneric
 format = csv
 options = {
   path = ./src/test/resources/taxi.csv
   header = true
   delimiter = ","
   inferSchema = false
 }
 schema = "VendorID STRING,tpep_pickup_datetime STRING,tpep_dropoff_datetime STRING,passenger_count STRING,trip_distance STRING,pickup_longitude DOUBLE,pickup_latitude DOUBLE,RatecodeID STRING,store_and_fwd_flag STRING,dropoff_longitude DOUBLE,dropoff_latitude DOUBLE,payment_type STRING,fare_amount STRING,extra STRING,mta_tax STRING,tip_amount DOUBLE,tolls_amount DOUBLE,improvement_surcharge DOUBLE,total_amount DOUBLE,SHAPE) as SHAPE", "tip_amount", "total_amount"]
 where = "VendorID = '2'"
 geometryType = point
 shapeField = SHAPE
 wkid = 4326
 cache = false
 numPartitions = 8
}

Hocon

# parquet

{
 name = taxi
 type = com.esri.source.SourceGeneric
 format = parquet
 options = {
   path = ./src/test/resources/taxi.parquet
   mergeSchema = true
 }
 where = "VendorID = '2'"
 geometryType = point
 shapeField = SHAPE
 wkid = 4326
 cache = false
 numPartitions = 8
}

Hocon

# JDBC

{
 name = service_area
 type = com.esri.source.SourceGeneric
 format = jdbc
 options = {
   url = "jdbc:sqlserver://localhost:1433;databaseName=bdt2;integratedSecurity=false"
   user = "you"
   password = "your secret"
    // Microsoft Spatial Type function.
   query = "SELECT *, Shape.STAsBinary() WKB FROM BFS_HEXAGON_SPLIT_ONE_T_2"
 }
  // BDT Spatial Type function.
 selectExpr = ["M_ID", "ST_FromWKB(WKB) SHAPE"]
 geometryType = Polygon
 shapeField = SHAPE
 wkid = 3857
}


# Python

# Deprecated as of v2.4. Use the Spark API instead.
spark.bdt.sourceGeneric(
  "csv",
  {"path": "./src/test/resources/taxi.csv", "header": "true", "delimiter": ",", "inferSchema": "true"},
  schema = "VendorID STRING,tpep_pickup_datetime STRING,tpep_dropoff_datetime STRING,passenger_count STRING,trip_distance STRING,pickup_longitude DOUBLE,pickup_latitude DOUBLE,RatecodeID STRING,store_and_fwd_flag STRING,dropoff_longitude DOUBLE,dropoff_latitude DOUBLE,payment_type STRING,fare_amount STRING,extra STRING,mta_tax STRING,tip_amount DOUBLE,tolls_amount DOUBLE,improvement_surcharge DOUBLE,total_amount DOUBLE, SHAPE STRING",
  selectExpr = ["ST_FromXY(pickup_longitude, pickup_latitude) as SHAPE", "tip_amount", "total_amount"],
  where = "VendorID = '2'",
  geometryType = "Point",
  shapeField = "SHAPE",
  wkid = 4326,
  cache = True,
  numPartitions = 8)

# Spark API
spark\
    .read\
    .format("csv")\
    .options(path="./src/test/resources/taxi.csv", header="true", delimiter= ",", inferSchmea="true")\
    .schema("VendorID STRING,tpep_pickup_datetime STRING,tpep_dropoff_datetime STRING,passenger_count STRING,trip_distance STRING,pickup_longitude DOUBLE,pickup_latitude DOUBLE,RatecodeID STRING,store_and_fwd_flag STRING,dropoff_longitude DOUBLE,dropoff_latitude DOUBLE,payment_type STRING,fare_amount STRING,extra STRING,mta_tax STRING,tip_amount DOUBLE,tolls_amount DOUBLE,improvement_surcharge DOUBLE,total_amount DOUBLE, SHAPE STRING")\
    .load()\
    .selectExpr("ST_FromXY(pickup_longitude, pickup_latitude) as SHAPE", "tip_amount", "total_amount")\
    .where("VendorID = '2'")\
    .withMeta("Point", 4326, shapeField = "SHAPE")\
    .repartition(8)\
    .cache()

Read a dataset from external storage.

SourceGeneric is a BDT-2 module for reading generic file types that Spark already supports.

If the input data has a shape column:
- Specify a geometryType, shapeField, and wkid. This will set the metadata of the shape column.
- If the shape column is WKT, then use ST_FromText on the shape column in selectExpr (see the sketch after this list).
- If the shape column is WKB, then use ST_FromWKB on the shape column in selectExpr.
- If the shape column is already a shape struct, then no ST function is needed.

If the input data does not have a shape struct column, do not provide geometryType, shapeField, and wkid. These values will default to None.
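
For example, a minimal sketch of the WKT case using the Spark API (the file path and WKT column name are assumptions):

# Python (sketch; path and WKT column name are hypothetical)
df = spark\
    .read\
    .format("csv")\
    .options(path="/data/points_wkt.csv", header="true", delimiter=",")\
    .load()\
    .selectExpr("ST_FromText(WKT) as SHAPE", "*")\
    .withMeta("Point", 4326, shapeField = "SHAPE")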

Configuration
Required
Optional

References

SourceSQL v2


Hocon

{
 name = zip
 type = com.esri.source.SourceSQL
 sql = "select 'hello' as world"
 cache = false
}

Hocon

{
 name = zip
 type = com.esri.source.SourceSQL
 sql = "SELECT * FROM parquet.`examples/src/main/resources/users.parquet`"
 cache = false
}

Hocon

# Example for shape column with WKB values
{
 name = zip
 type = com.esri.source.SourceSQL
 sql = "SELECT ST_FromWKB(SHAPE) FROM parquet.`examples/src/main/resources/users.parquet`"
 geometryType = polygon
 shapeField = SHAPE
 wkid = 4326
 cache = false
}

# Python

# Deprecated as of v2.4. Use the Spark API instead.
spark.bdt.sourceSQL(
    "SELECT * FROM parquet.`./src/test/resources/States_st.parquet`",
    geometryType = "Polygon",
    shapeField = "SHAPE",
    wkid = 4326,
    cache = True)

# Spark API
spark\
   .sql("SELECT * FROM parquet. `./src/test/resources/States_st.parquet`")\
   .withMeta("Polygon", 4326, shapeField = "SHAPE")\
   .cache()

com.esri.source.SourceSQL

Generate a static DataFrame from sql.

You can run SQL on files directly: SELECT * FROM parquet.`examples/src/main/resources/users.parquet`

If the input data does not have a shape struct column, do not provide geometryType, shapeField, and wkid. These values will default to None.

Capable of reading directly from a Hive table.

Configuration
Required
Optional

SourceFGDB v2


Hocon

{
 name = bgs
 type = com.esri.source.SourceFGDB
 path = "./src/test/resources/Redlands.gdb"
 table = BlockGroups
 sql = "SELECT TOTPOP_CY, Shape FROM BlockGroups WHERE ID = 060650301011"
 geometryType = Polygon
 geometryField = Shape
 shapeField = SHAPE
 wkid = 4326
 cache = false
 numPartitions = 4
}

Hocon

{
 name = standalone
 type = com.esri.source.SourceFGDB
 path = "./src/test/resources/standalone_table.gdb"
 table = standalone_table
 sql = "SELECT * FROM standalone_table"
 geometryType = None
}

# Python

# Deprecated as of v2.4. Use the Spark API instead.
spark.bdt.sourceFGDB(
    "./src/test/resources/Redlands.gdb",
    "Policy_25M",
    "SELECT * FROM Policy_25M",
    geometryType = "Point",
    geometryField = "Shape",
    shapeField = "SHAPE",
    wkid = 4326,
    cache = True,
    numPartitions = 4)

# Spark API
spark\
    .read\
    .format("com.esri.gdb")\
    .options(path="./src/test/resources/Redlands.gdb", name="Policy_25M")\
    .load()\
    .selectExpr("ST_FromFGDBPoint(Shape) as NEW_SHAPE", "*")\
    .drop("Shape")\
    .withColumnRenamed("NEW_SHAPE", "SHAPE")\
    .withMeta("Point", 4326)\
    .repartition(4)\
    .cache()

com.esri.source.SourceFGDB

Read and create a DataFrame from the file geodatabase. [[SHAPE_STRUCT]] will be created when geometryType is not set to None. Unlike other sources, SourceFGDB sets the metadata on [[SHAPE_FIELD]] in the outgoing DataFrame.

Configuration

Required

Optional

SourceSHP v2.2


Hocon

{
 name = bgs
 type = com.esri.source.SourceSHP
 path = "./src/test/resources/Redlands.shp"
 sql = "SELECT TOTPOP_CY, SHAPE FROM BlockGroups WHERE ID = 060650301011"
 table = BlockGroups
 geometryType = Polygon
 shapeField = SHAPE
 wkid = 4326
 cache = false
 numPartitions = 4
}

# Python

# Deprecated as of v2.4. Use the Spark API instead.
spark.bdt.sourceSHP(
    "./src/test/resources/shp",
    "Dev_Ops",
    "select * from Dev_Ops",
    "Point",
    shapeField="SHAPE",
    wkid=4326,
    cache=True,
    numPartitions=1)

# Spark API
spark\
   .read\
   .format("com.esri.spark.shp")\
   .options(path="./src/test/resources/shp/", format="WKB")\
   .load()\
   .selectExpr("ST_FromWKB(shape) as SHAPE_NEW", "*")\
   .drop("shape")\
   .withColumnRenamed("SHAPE_NEW", "SHAPE")\
   .withMeta("Point", 4326)\
   .repartition(1)\
   .cache()

com.esri.source.SourceSHP

Read and create a DataFrame from a shapefile. SourceSHP sets the metadata on [[SHAPE_FIELD]] in the outgoing DataFrame.

Configuration
Required
Optional

SourceHive v2


Hocon

{
 name = taxi
 type = com.esri.source.SourceHive
 properties = {
   "hive.metastore.uris" = "thrift://127.0.0.1:9083"
 }
 preHQL = []
 mainHQL = "SELECT * FROM taxi"
 database = default
 cache = false
 numPartitions = 8
}

Hocon

# Example for shape column with WKT values
{
 name = taxi
 type = com.esri.source.SourceHive
 properties = {
   "hive.metastore.uris" = "thrift://127.0.0.1:9083"
 }
 preHQL = []
 mainHQL = "SELECT ST_FromText(SHAPE) FROM taxi"
 database = default
 geometryType = point
 shapeField =  SHAPE
 wkid = 4326
 cache = false
 numPartitions = 8
}

# Python

# Deprecated as of v2.4. Use the Spark API instead.
spark.bdt.sourceHive(
    {"hive.metastore.uris": "thrift://127.0.0.1:9083"},
    "SELECT ST_FromText(SHAPE) FROM taxi",
    [],
    "default",
    geometryType = "Point",
    shapeField = "SHAPE",
    wkid = 4326,
    cache = False,
    numPartitions = 8)

com.esri.source.SourceHive

Read from Hive storage.

If the input data does not have a shape struct column, do not provide geometryType, shapeField, and wkid. These values will default to None.

https://stackoverflow.com/questions/53044191/spark-warehouse-vs-hive-warehouse

https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html

Configuration
Required
Optional

Processors v2

ProcessorAddFields v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorAddFields
 expressions = ["(A/B) as C"]
 cache = false
}
# Python

# v2.2
spark.bdt.processorAddFields(
    df,
    ["(A/B) as C"],
    cache = False)

# v2.3
spark.bdt.addFields(
    df,
    ["(A/B) as C"],
    cache = False)

com.esri.processor.ProcessorAddFields

Add new fields using SQL expressions.

Configuration
Required
Optional

ProcessorAddMetadata v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorAddMetadata
 geometryType = Point
 wkid = 4326
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorAddMetadata(
    df,
    "Point",
    4326,
    shapeField = "SHAPE",
    cache = False)

# v2.3
df.withMeta(
    "Point",
    4326,
    "SHAPE")

com.esri.processor.ProcessorAddMetadata

Set the geometry and spatial reference metadata on the [[SHAPE_STRUCT]] column.

Configuration
Required
Optional

ProcessorAddShapeFromGeoJson v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorAddShapeFromGeoJson
 geoJsonField = GEOJSON
 geometryType = Point
 wkid = 4326
 keep = false
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorAddShapeFromGeoJson(
    df,
    "GEOJSON",
    "Point",
    wkid = 4326,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.addShapeFromGeoJson(
    df,
    "GEOJSON",
    "Point",
    wkid = 4326,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorAddShapeFromGeoJson

Add the SHAPE column based on the configured GeoJson field. The possible values for geometryType are: Unknown, Point, Line, Envelope, MultiPoint, Polyline, Polygon, GeometryCollection.

Consider using ST_FromGeoJSON

Configuration
Required
Optional

ProcessorAddShapeFromJson v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorAddShapeFromJson
 geometryType = Polygon
 jsonField = JSON
 keep = false
 wkid = 4326
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorAddShapeFromJson(
    df,
    "JSON",
    "Point",
    wkid = 4326,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.addShapeFromJson(
    df,
    "JSON",
    "Point",
    wkid = 4326,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorAddShapeFromJson

Add the SHAPE column based on the configured JSON field. The possible values for geometryType are: Unknown, Point, Line, Envelope, MultiPoint, Polyline, Polygon, GeometryCollection.

Consider using ST_FromJson

Configuration
Required
Optional

ProcessorAddShapeFromWKB v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorAddShapeFromWKB
 wkbField = WKB
 geometryType = Point
 wkid = 4326
 keep = false
 shapeField = SHAPE
}

# Python

# v2.2
spark.bdt.processorAddShapeFromWKB(
    df,
    "WKB",
    "Point",
    wkid = 4326,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.addShapeFromWKB(
    df,
    "WKB",
    "Point",
    wkid = 4326,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorAddShapeFromWKB

Add the SHAPE column based on the configured WKB field. The possible values for geometryType are: Unknown, Point, Line, Envelope, MultiPoint, Polyline, Polygon, GeometryCollection.

Consider using ST_FromWKB

Configuration
Required
Optional

ProcessorAddShapeFromWKT v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorAddShapeFromWKT
 wktField = WKT
 geometryType = Point
 wkid = 4326
 keep = false
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorAddShapeFromWKT(
    df,
    "WKT",
    "Point",
    wkid = 4326,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.addShapeFromWKT(
    df,
    "WKT",
    "Point",
    wkid = 4326,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorAddShapeFromWKT

Add the SHAPE column based on the configured WKT field. The possible values for geometryType are: Unknown, Point, Line, Envelope, MultiPoint, Polyline, Polygon, GeometryCollection.

Consider using ST_FromText

Configuration
Required
Optional

ProcessorAddShapeFromXY v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorAddShapeFromXY
 xField = x
 yField = y
 wkid = 4326
 keep = false
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorAddShapeFromXY(
    df,
    "x",
    "y",
    wkid = 4326,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.addShapeFromXY(
    df,
    "x",
    "y",
    wkid = 4326,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorAddShapeFromXY

Add SHAPE column based on the configured X field and Y field.

Consider using ST_FromXY

Configuration
Required
Optional

ProcessorAddWKT v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorAddWKT
 wktField = WKT
 shapeField = SHAPE
 keep = false
 cache = false
}

# Python

# v2.2
spark.bdt.processorAddWKT(
    df,
    wktField = "WKT",
    keep = False,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.addWKT(
    df,
    wktField = "WKT",
    keep = False,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorAddWKT

Add a new field containing the well-known text representation.

Consider using ST_AsText

Configuration
Optional

ProcessorArea v2.1


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorArea
 areaField = shape_area
 shapeField = SHAPE
 cache = false
}

# Python

# 2.2
spark.bdt.processorArea(
    df,
    areaField = "shape_area",
    shapeField = "SHAPE",
    cache = False)

# 2.3
spark.bdt.area(
    df,
    areaField = "shape_area",
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorArea

Calculate the area of a geometry.

Consider using ST_Area

Configuration
Optional

ProcessorAssembler v2.2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorAssembler
 targetIDFields = [ID1, ID2, ID3]
 timeField = current_time
 timeFormat = timestamp
 separator = ":"
 trackIDName = trackID
 origMillisName = origMillis
 destMillisName = destMillis
 durationMillisName = durationMillis
 numTargetsName = numTargets
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorAssembler(
    df,
    ["ID1", "ID2", "ID3"],
    "current_time",
    "timestamp",
    separator = ":",
    trackIDName = "trackID",
    origMillisName = "origMillis",
    destMillisName = "destMillis",
    durationMillisName = "durationMillis",
    numTargetsName = "numTargets",
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.assembler(
    df,
    ["ID1", "ID2", "ID3"],
    "current_time",
    "timestamp",
    separator = ":",
    trackIDName = "trackID",
    origMillisName = "origMillis",
    destMillisName = "destMillis",
    durationMillisName = "durationMillis",
    numTargetsName = "numTargets",
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorAssembler

Processor to assemble a set of targets (points) into a track (polyline). The user can specify a set of target fields that make up a track identifier and a timestamp field in the target attributes, such that all the targets with that same track identifier will be assembled chronologically as polyline vertices. The polyline consists of 3D points where x/y are the target location and z is the timestamp.

The output DataFrame will have the following columns:
- shape: The shape struct that defines the track (polyline).
- trackID: The track identifier. Typically, this is a concatenation of the string representation of the user-specified target fields separated by a colon (:).
- origMillis: The track starting epoch in milliseconds.
- destMillis: The track ending epoch in milliseconds.
- durationMillis: The track duration in milliseconds.
- numTargets: The number of targets in the track.

Configuration
Required
optional

ProcessorBuffer v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorBuffer
 distance = 2.0
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorBuffer(
    df,
    2.0,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.buffer(
    df,
    2.0,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorBuffer

Buffer geometries by a radius.

Consider using ST_Buffer

Configuration
Required
Optional

ProcessorCentroid v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorCentroid
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorCentroid(
    df,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.centroid(
    df,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorCentroid

Produces centroids of features.

Consider using ST_Centroid

Configuration
Optional

ProcessorCountWithin v2.2


Hocon

{
 inputs = [_default, _default]
 output = _default
 type = com.esri.processor.ProcessorCountWithin
 cellSize = 1
 keepGeometry = false
 emitContainsOnly = false
 doBroadcast = false
 lhsShapeField = SHAPE
 rhsShapeField = SHAPE
 cache = false
 extent = [-180.0, -90.0, 180.0, 90.0]
 depth = 16
}

# Python

# v2.2
spark.bdt.processorCountWithin(
    df1,
    df2,
    1.0,
    keepGeometry = False,
    emitContainsOnly = False,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

# v2.3
spark.bdt.countWithin(
    df1,
    df2,
    1.0,
    keepGeometry = False,
    emitContainsOnly = False,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

com.esri.processor.ProcessorCountWithin

Processor to produce a count of all lhs features that are contained in a rhs feature.

If emitContainsOnly is set to true (the default), this processor will only output rhs features that contain at least one lhs feature. To get all rhs features in the output, emitContainsOnly must be set to false. This will include rhs features that contain 0 lhs features. IMPORTANT: There will be a non-trivial time cost if emitContainsOnly is set to false.

If keepGeometry is set to false, the output format will be [rhs attributes + lhs count]. This is the default. If keepGeometry is set to true, the output format will be [rhs shape + rhs attributes + lhs count].

For example, assume that:
- A polygon with attributes [NAME: A, ID: 1] contains 2 points.
- A polygon with attributes [NAME: A, ID: 2] contains 1 point.

With keepGeometry set to false, the output rows will be:
[NAME: A, ID: 1, count: 2]
[NAME: A, ID: 2, count: 1]

If doBroadcast is true and there are null values in the shape column of either of the input dataframes, a NullPointerException will be thrown. These null values should be handled before using this processor. If doBroadcast=false, then rows with null values for shape column are implicitly handled and will be dropped.
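
One simple way to guard against this when broadcasting is to drop rows with null shapes up front, for example (a sketch using the standard DataFrame API):

# Python (sketch)
df1 = df1.na.drop(subset = ["SHAPE"])
df2 = df2.na.drop(subset = ["SHAPE"])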

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Configuration
Required
Optional

ProcessorDifference v2.2


Hocon

{
 inputs = [blockgroups, zipcodes]
 output = _default
 type = com.esri.processor.ProcessorDifference
 cellSize = 0.1
 doBroadcast = false
 lhsShapeField = SHAPE
 rhsShapeField = SHAPE
 cache = false
 extent = [-180.0, -90.0, 180.0, 90.0]
 depth = 16
}

# Python

# v2.2
spark.bdt.processorDifference(
    df1,
    df2,
    0.1,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

# v2.3
spark.bdt.difference(
    df1,
    df2,
    0.1,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

com.esri.processor.combine.ProcessorDifference

Per each left hand side (lhs) feature, find all relevant right hand side (rhs) features and take the set-theoretic difference of each rhs feature with the lhs feature. For the LHS feature type, it can be point, polyline, polygon types. For the RHS feature type, it can be point, polyline, polygon types.

When n relevant rhs features are found, the lhs feature is emitted n times, and the lhs geometry is replaced with the difference of the lhs geometry and the rhs geometry. When an lhs feature has no overlapping rhs features, nothing is emitted. The output table can therefore have fewer rows than the number of input lhs features.

The final schema is: lhs attributes + [[SHAPE_FIELD]] + rhs attributes.

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Configuration
Required
Optional

ProcessorDissolve v2


Hocon

{
 type = com.esri.processor.ProcessorDissolve
 fields = [TYPE]
 singlePart = false
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorDissolve(
    df,
    fields = ["TYPE"],
    singlePart = False,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.dissolve(
    df,
    fields = ["TYPE"],
    singlePart = False,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorDissolve

Dissolve features either by attributes and location together, or by location only. Performs the topological union operation on the geometries. The supported geometry types are multipoint, polyline, and polygon. singlePart indicates whether the returned geometry is multipart or not.

The output schema is: fields + SHAPE Struct.

Configuration
Optional

ProcessorDissolve2 v2


Hocon

{
 type = com.esri.processor.ProcessorDissolve2
 fields = [TYPE]
 singlePart = true
 sort = false
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorDissolve2(
    df,
    fields = ["TYPE"],
    singlePart = True,
    sort = False,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.dissolve2(
    df,
    fields = ["TYPE"],
    singlePart = True,
    sort = False,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorDissolve2

This implementation is the same as ProcessorDissolve in BDT 1.0. Dissolve features either by attributes and location together, or by location only. Performs the topological union operation on the geometries. The supported geometry types are multipoint, polyline, and polygon. singlePart indicates whether the returned geometry is multipart or not.

The output schema is: fields + SHAPE Struct.

Configuration
Optional

ProcessorDistance v2.1


Hocon

{
 inputs = [blockgroups, zipcodes]
 output = _default
 type = com.esri.processor.ProcessorDistance
 cellSize = 0.1
 radius = 0.003
 emitEnrichedOnly = false
 doBroadcast = false
 distanceField = distance
 lhsShapeField = SHAPE
 rhsShapeField = SHAPE
 cache = false
 extent = [-180.0, -90.0, 180.0, 90.0]
}

# Python

# v2.2
spark.bdt.processorDistance(
    df1,
    df2,
    0.1,
    0.003,
    emitEnrichedOnly = True,
    doBroadcast = False,
    distanceField = "distance",
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

# v2.3
spark.bdt.distance(
    df1,
    df2,
    0.1,
    0.003,
    emitEnrichedOnly = True,
    doBroadcast = False,
    distanceField = "distance",
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

com.esri.processor.ProcessorDistance

Find a target feature within the search distance for a source feature.

Per each left hand side (lhs) feature, find the closest right hand side (rhs) feature within the specified radius. For the LHS feature type, it can be point, polyline, polygon types. For the RHS feature type, it can be point, polyline, polygon types.

When the lhs and rhs feature types are point, the distance is in meters. When a lhs feature has no rhs features within the specified radius, that lhs feature is emitted with null values for rhs attributes.

The final schema is: lhs attributes + [[SHAPE_FIELD]] + rhs attributes + distance.

Extent must be specified when using a projected spatial reference. The default extent assumes world geographic (WGS84/4326).
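
For example, a call on Web Mercator (3857) data might pass the full Web Mercator extent explicitly (a sketch; the cell size and radius values are illustrative):

# Python (sketch; cellSize and radius values are illustrative)
out = spark.bdt.distance(
    df1,
    df2,
    1000.0,
    500.0,
    extentList = [-20037508.34, -20037508.34, 20037508.34, 20037508.34])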

If doBroadcast is true and there are null values in the shape column of either of the input dataframes, a NullPointerException will be thrown. These null values should be handled before using this processor. If doBroadcast=false, then rows with null values for shape column are implicitly handled and will be dropped.

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Consider using ST_Distance

Configuration
Required
Optional

ProcessorDistanceAllCandidates v2.1


Hocon

{
 inputs = [_default, _default]
 output = _default
 type = com.esri.processor.ProcessorDistanceAllCandidates
 cellSize = 1
 radius = 1
 emitEnrichedOnly = true
 doBroadcast = false
 distanceField = distance
 lhsShapeField = SHAPE
 rhsShapeField = SHAPE
 cache = false
 extent = [-180.0, -90.0, 180.0, 90.0]
 depth = 16
}

# Python

# v2.2
spark.bdt.processorDistanceAllCandidates(
    df1,
    df2,
    1.0,
    1.0,
    emitEnrichedOnly = True,
    doBroadcast = False,
    distanceField = "distance",
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

# v2.3
spark.bdt.distanceAllCandidates(
    df1,
    df2,
    1.0,
    1.0,
    emitEnrichedOnly = True,
    doBroadcast = False,
    distanceField = "distance",
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

com.esri.processor.ProcessorDistanceAllCandidates

Processor to find all features from the right hand side dataset within a given search distance. Enriches lhs features with the attributes of rhs features found within the specified distance. If emitEnrichedOnly is set to false, when no rhs features are found, a lhs feature is enriched with null values. When emitEnrichedOnly is set to true, when no rhs features are found, the lhs feature is not emitted in the output.

The distance between the lhs and corresponding rhs geometry is appended to the end of each row. When the input sources are points, the distance is calculated in meters.

When multiple right hand side features (A, B, C) are found within the specified distance for a left hand side feature (X), the output would be:

X,A
X,B
X,C

If doBroadcast is true and there are null values in the shape column of either of the input dataframes, a NullPointerException will be thrown. These null values should be handled before using this processor. If doBroadcast=false, then rows with null values for shape column are implicitly handled and will be dropped.

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Configuration
Required
Optional

ProcessorDropFields v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorDropFields
 fields = ["tip_amount", "total_amount"]
}

# Python

# v2.2
spark.bdt.processorDropFields(
    df,
    ["tip_amount", "total_amount"],
    cache = False)

# v2.3
spark.bdt.dropFields(
    df,
    ["tip_amount", "total_amount"],
    cache = False)

com.esri.processor.ProcessorDropFields

Drop the configured fields.

Configuration
Required
Optional

ProcessorEliminatePolygonPartArea v2.3

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorEliminatePolygonPartArea
 areaThreshold = 1.0
 shapeField = SHAPE
 cache = false
}
# Python

# v2.3
spark.bdt.eliminatePolygonPartArea(
    df,
    1.0,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorEliminatePolygonPartArea

Remove the holes of a polygon that have area below the given threshold.

Consider using ST_EliminatePolygonPartArea

Configuration
Required
Optional

ProcessorEliminatePolygonPartPercent v2.3

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorEliminatePolygonPartPercent
 percentThreshold = 10.0
 shapeField = SHAPE
 cache = false
}
# Python

# v2.3
spark.bdt.eliminatePolygonPartPercent(
    df,
    1.0,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorEliminatePolygonPartPercent

Remove the holes of a polygon whose area as a percentage of the whole polygon is less than this value.

Consider using ST_EliminatePolygonPartPercent

Configuration
Required
Optional

ProcessorExtentFilter v2.1


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorExtentFilter
 oper = CONTAINS
 extent = [-180.0, -90.0, 180.0, 90.0]
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorExtentFilter(
    df,
    "CONTAINS",
    extentList = [-180.0, -90.0, 180.0, 90.0],
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.extentFilter(
    df,
    "CONTAINS",
    extentList = [-180.0, -90.0, 180.0, 90.0],
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorExtentFilter

Processor to filter geometries that meet the operation relation with the given envelope. The operation is executed on the geometry in relation to the envelope: read as 'envelope intersects/overlaps/within/contains geometry'

Configuration
Required
Optional

ProcessorFeatureFilter (application of com.esri.processor.ProcessorSQL)


Hocon

#Suppose your DataFrame has two columns: price of type Int and location_name of type String.
#Return all rows where price is below 400000 and location_name is equal to "San Francisco"
{
 inputs = [_default]
 output = _default
 type = com.esri.processor.ProcessorSQL
 sql = "SELECT * FROM _default WHERE x < 400000 AND  y = 'San Fransisco'"
 cache = false
}

Hocon

# Return the SHAPE column for rows where price is less than or equal to 700000 and location_name starts with "San"
{
 inputs = [_default]
 output = _default
 type = com.esri.processor.ProcessorSQL
 sql = "SELECT SHAPE FROM _default WHERE x <= 700000 AND y = 'San%'"
 cache = false
}

In BDT 2, ProcessorSQL accomplishes the same task as ProcessorFeatureFilter in BDT 1; use ProcessorSQL instead. Your filter condition has to be expressed in a SQL WHERE clause that evaluates to a boolean value.
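
For reference, the same filter can also be expressed directly from Python by registering the DataFrame as a temporary view and running Spark SQL (a sketch; column names follow the example above):

# Python (sketch)
df.createOrReplaceTempView("listings")
filtered = spark.sql("SELECT * FROM listings WHERE price < 400000 AND location_name = 'San Francisco'")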

ProcessorGeneralize v2.1


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorGeneralize
 maxDeviation = 0.001
 removeDegenerateParts = true
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorGeneralize(
    df,
    maxDeviation = 0.001,
    removeDegenerateParts = False,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.generalize(
    df,
    maxDeviation = 0.001,
    removeDegenerateParts = False,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorGeneralize

Generalize geometries using the Douglas-Peucker algorithm.

Consider using ST_Generalize

Configuration
Optional

ProcessorGeodeticArea v2.1


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorGeodeticArea
 curveType = 0
 fieldName = geodetic_area
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorGeodeticArea(
    df,
    curveType=0,
    fieldName="geodetic_area",
    shapeField="SHAPE",
    cache = False)

# v2.3, v.2.4
spark.bdt.geodeticArea(
    df,
    curveType=0,
    fieldName="geodetic_area",
    shapeField="SHAPE",
    cache = False)

com.esri.processor.ProcessorGeodeticArea

Processor to calculate the geodetic areas of the input geometries. This is a more reliable measurement of area than ProcessorArea because it takes into account the Earth's curvature.

Possible curve type values:

- Geodesic = 0: The shortest distance between two points on an ellipsoid.
- Loxodrome = 1: A line of constant bearing or azimuth, also known as a rhumb line.
- GreatElliptic = 2: The line on a spheroid defined along the intersection at the surface by a plane that passes through the center of the spheroid. When the spheroid flattening is equal to zero (a sphere), a great elliptic is a great circle.
- NormalSection = 3.
- ShapePreserving = 4: The segment shapes are preserved in the spatial reference where they are defined. The behavior of the ShapePreserving type can be emulated by densifying the geometry with a small step and then calling a geodetic method using the Geodesic or GreatElliptic curve type.

Consider using ST_GeodeticArea

Configuration
Optional

ProcessorHex v2


Hocon

{
 type = com.esri.processor.ProcessorHex
 input = _default
 output = _default
 shapeField = SHAPE
 hexFields = [
  {name = "H100", size = 100}
  {name = "H200", size = 200}
 ]
 cache = false
}

# Python

# v2.2
spark.bdt.processorHex(
    df,
    hexFields = {"H100": 100.0, "H200": 200.0},
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.hex(
    df,
    hexFields = {"H100": 100.0, "H200": 200.0},
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorHex

Enrich a point DataFrame with hex column and row values for the given sizes. The input DataFrame must be the point type and its spatial reference should be WGS84 (4326). The size is in meters.

Consider using ST_AsHex

Configuration
Required
Optional

ProcessorHexToPolygon v2


Hocon

{
 type = com.esri.processor.ProcessorHexToPolygon
 input = _default
 output = _default
 hexField = H100
 size = 100
 shapeField = SHAPE
 cache = false
 keep = false
}

# Python

# v2.2
spark.bdt.processorHexToPolygon(
    df,
    hexField = "H100",
    size = 100.0,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.hexToPolygon(
    df,
    hexField = "H100",
    size = 100.0,
    keep = False,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorHexToPolygon

Replace or create the SHAPE struct field containing hexagon polygons based on the configured hex column row field. The outgoing DataFrame will have the spatial reference of 3857 (102100) WGS84 Web Mercator (Auxiliary Sphere).

https://developers.arcgis.com/web-map-specification/objects/spatialReference/

The input DataFrame does not need to have SHAPE struct field since this processor only depends on the existing hex field.

Consider using ST_FromHex

Configuration
Required
Optional

ProcessorIntersection v2


Hocon

{
 type = com.esri.processor.ProcessorIntersection
 inputs = [intersector, inGeoms]
 output = _default
 mask = -1
 cellSize = 0.1
 doBroadcast = false
 lhsShapeField = SHAPE
 rhsShapeField = SHAPE
 extent = [-180.0, -90.0, 180.0, 90.0]
 cache = false
}

# Python

# v2.2
spark.bdt.processorIntersection(
    df1,
    df2,
    cellSize = 0.1,
    mask = -1,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    extentList = [-180.0, -90.0, 180.0, 90.0],
    cache = False)

# v2.3
spark.bdt.intersection(
    df1,
    df2,
    cellSize = 0.1,
    mask = -1,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    extentList = [-180.0, -90.0, 180.0, 90.0],
    cache = False)

com.esri.processor.ProcessorIntersection

Produce the intersection geometries of two spatial datasets.

This is the equivalent of ProcessorIntersection2 in BDT 1.x. This implementation uses a dimension mask as explained in http://esri.github.io/geometry-api-java/doc/Intersection.html

Each LHS row is an intersector. If it intersects a RHS row, a lhs row's shape is replaced by the intersection geometry and LHS attributes are enriched with the RHS attributes.

If the intersection is empty, nothing will be emitted.

Depending on the dimension mask, multiple intersections could be emitted per intersector (hence LHS row).

The outgoing DataFrame will have the same SHAPE metadata as the RHS SHAPE metadata, and its SHAPE field name will be the same as the LHS SHAPE field name.

If doBroadcast is true and there are null values in the shape column of either of the input dataframes, a NullPointerException will be thrown. These null values should be handled before using this processor. If doBroadcast=false, then rows with null values for shape column are implicitly handled and will be dropped.

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Consider using ST_Intersection

Configuration
Required
Optional

ProcessorLength v2.1


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorLength
 lengthField = shape_length
 shapeField = SHAPE
 cache = false
}


# Python

# v2.2
spark.bdt.processorLength(
    df,
    lengthField = "shape_length",
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.length(
    df,
    lengthField = "shape_length",
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorLength

Calculate the length of a geometry.

Consider using ST_Length

Configuration
Optional

ProcessorMapMatch v2.4


Hocon


{
input = _default
output = _default
type = com.esri.processor.ProcessorMapMatch
idField = oid
timeField = dt
pidField = vid
xField = x
yField = y
snapDist = 50.0
pathDist = 300.0
microPathSize = 5
alpha = 10.0
beta = 1.0
cache = false
}

#Python

outDF = spark.bdt.mapMatch(
    df,
    idField="oid", 
    timeField="dt",
    pidField="vid",
    xField="x", 
    yField="y", 
    snapDist=50.0, 
    pathDist=300.0,
    microPathSize=5, 
    alpha=10.0, 
    beta=1.0,
    cache=False)


com.esri.processor.ProcessorMapMatch

Sequentially snap a sequence of noisy GPS points to their correct path vectors. Depending on the context, "path" is used to refer to both (a) a path of noisy GPS points and (b) a path of vectors onto which the GPS points are snapped.

An example use case would be smart-snapping GPS breadcrumbs from vehicle paths to their correct streets.

A network of links is required in LMDB format, configured through the Spark properties:

spark.bdt.lmdb.map.size 304857600000

spark.bdt.lmdb.path /data/LMDB_v2_Release
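
These properties are typically supplied when the Spark session is created, for example when launching PySpark (a sketch; the values follow the example above):

pyspark \
    --conf spark.bdt.lmdb.map.size=304857600000 \
    --conf spark.bdt.lmdb.path=/data/LMDB_v2_Release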

This processor uses Web Mercator (srid 3857). The x, y, and distance values are in meters.

Configuration
Optional

ProcessorNearestCoordinate v2.1


Hocon

{
 inputs = [points, lines]
 output = _default
 type = com.esri.processor.ProcessorNearestCoordinate
 cellSize = 1.0
 snapRadius =  1.0
 doBroadcast = false
 lhsShapeField = SHAPE
 rhsShapeField = SHAPE
 distanceField = distance
 xField = X
 yField = Y
 isOnRightField = isOnRight
 distanceForNotSnapped = -999.0
 cache = false
 extent = [-180.0, -90.0, 180.0, 90.0]
 depth = 16
}

# Python

# v2.2
spark.bdt.processorNearestCoordinate(
    df1,
    df2,
    cellSize = 1.0,
    snapRadius = 1.0,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    distanceField = "distance",
    xField = "X",
    yField = "Y",
    isOnRightField = "isOnRight",
    distanceForNotSnapped = -999.0,
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

# v2.3
spark.bdt.nearestCoordinate(
    df1,
    df2,
    cellSize = 1.0,
    snapRadius = 1.0,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    distanceField = "distance",
    xField = "X",
    yField = "Y",
    isOnRightField = "isOnRight",
    distanceForNotSnapped = -999.0,
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

com.esri.processor.ProcessorNearestCoordinate

Processor to find the nearest coordinate between two sources in a distributed, parallel way. The features on the left-hand-side input are augmented with the attributes of the closest feature on the right-hand-side input. In the implementation, the reduce side creates a spatial index of the right-hand-side features for fast lookup.

The LHS feature type has to be a point. The RHS feature type can be a point, polyline, or polygon type.

When RHS feature type is a point, the distance is in meters.

The input points that are snapped are kept in the output. When a point is not snapped:
- The nearest_distance is set to the input parameter distanceForNotSnapped (default value -999.0).
- The X and Y coordinate values of the snapped point geometry are the same as the input point geometry.

If an LHS feature has multiple nearest coordinates (all with equal distances), the result could differ between doBroadcast=False and doBroadcast=True. Both results are valid.

If doBroadcast is true and there are null values in the shape column of either of the input dataframes, a NullPointerException will be thrown. These null values should be handled before using this processor. If doBroadcast=false, then rows with null values for shape column are implicitly handled and will be dropped.

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Configuration
Required
Optional

ProcessorPointExclude v2.1


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorPointExclude
 xmin = -10
 ymin = -10
 xmax = 10
 ymax = 10
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorPointExclude(
    df,
    xmin = -10.0,
    ymin = -10.0,
    xmax = 10.0,
    ymax = 10.0,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.pointExclude(
    df,
    xmin = -10.0,
    ymin = -10.0,
    xmax = 10.0,
    ymax = 10.0,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorPointExclude

Filter point features that are not in the specified extent, excluding points on the extent boundary.

Configuration
Required
Optional

ProcessorPointInclude v2.1


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorPointInclude
 xmin = -10
 ymin = -10
 xmax = 10
 ymax = 10
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorPointInclude(
    df,
    xmin = -10.0,
    ymin = -10.0,
    xmax = 10.0,
    ymax = 10.0,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.pointInclude(
    df,
    xmin = -10.0,
    ymin = -10.0,
    xmax = 10.0,
    ymax = 10.0,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorPointInclude

Filter point features that are in the specified extent, including points on the extent boundary.

Configuration
Required
Optional

ProcessorPointInPolygon v2


Hocon

{
 inputs = [taxi, zip]
 output = _default
 type = com.esri.processor.ProcessorPointInPolygon
 cellSize = 1.0
 emitPippedOnly = true
 clip = true
 doBroadcast = false
 pointShapeField = SHAPE
 polygonShapeField = SHAPE
 extent = [-180.0, -90.0, 180.0, 90.0]
 depth = 16
}

# Python

# v2.2
spark.bdt.processorPointInPolygon(
    df1,
    df2,
    1.0,
    emitPippedOnly = True,
    take = 1,
    clip = True,
    doBroadcast = False,
    pointShapeField = "SHAPE",
    polygonShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

# v2.3
spark.bdt.pointInPolygon(
    df1,
    df2,
    1.0,
    emitPippedOnly = True,
    take = 1,
    clip = True,
    doBroadcast = False,
    pointShapeField = "SHAPE",
    polygonShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

com.esri.processor.ProcessorPointInPolygon

Enrich a source point feature with the attributes of the target polygon feature that contains it. A point is enriched if it is inside a polygon; a point is not enriched if it is on an edge or outside of the polygon. The first DataFrame must be of Point type and the second must be of Polygon type.

If the output DataFrame will be re-used, enabling the cache configuration can help.

The clip parameter is only relevant if doBroadcast is false.

If doBroadcast is true and there are null values in the shape column of either of the input dataframes, a NullPointerException will be thrown. These null values should be handled before using this processor. If doBroadcast=false, then rows with null values for shape column are implicitly handled and will be dropped.

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Configuration
Required
Optional

ProcessorPowerSetStats v2.2

Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.summary.ProcessorPowerSetStats
 statsFields = ["field1", "field2"]
 geographyIds = ["geoId1", "geoId2"]
 categoryIds = ["catId1", "catId2"]
 idField = "ids"
 cache = false
}

Hocon

{
 input = points_source
 output = pssoverlaps
 type = com.esri.processor.ProcessorPowerSetStats
 statsFields = [TotalInsuredValue,TotalReplacementValue,YearBuilt]
 geographyIds = [FID,ST_ID,CY_ID,ZP_ID]
 categoryIds = [Quarter,LineOfBusiness]
 idField = Points_ID
 cache = false
}

# Python

# v2.2
spark.bdt.processorPowerSetStats(
    df,
    statsFields = ["TotalInsuredValue","TotalReplacementValue","YearBuilt"],
    geographyIds = ["FID","ST_ID","CY_ID","ZP_ID"],
    categoryIds = ["Quarter","LineOfBusiness"],
    idField = "Points_ID",
    cache = False)

# v2.3
spark.bdt.powerSetStats(
    df,
    statsFields = ["TotalInsuredValue","TotalReplacementValue","YearBuilt"],
    geographyIds = ["FID","ST_ID","CY_ID","ZP_ID"],
    categoryIds = ["Quarter","LineOfBusiness"],
    idField = "Points_ID",
    cache = False)

com.esri.processor.ProcessorPowerSetStats

Processor to produce summary statistics for combinations of categorical column values. Combinations are limited to those that actually appear in the rows of the table.

The categorical columns from which combinations are created are specified by the categoryIds and geographyIds inputs. Only one value is taken from the value space of a given categorical column per combination. If no value is specified for a categorical column, then "ALL" is put in its place, and the whole value space of that column is considered for that combination. For example, with geographyIds = [ZP_ID] and categoryIds = [Quarter], statistics are produced for each (ZP_ID, Quarter) pair as well as for (ZP_ID, ALL), (ALL, Quarter), and (ALL, ALL).

If an idField is specified and repeated ids are found, then statistics will only be counted once per id for a given combination. This prevents double-counting, triple-counting, or any amount of over-counting that would skew a statistic from its true value.

Designed for AccumulationEngine

Configuration
Required
Optional

ProcessorProject v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorProject
 fromWk = 4326
 toWk = 3857
 shapeField = SHAPE
 cache = true
}

Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorProject
 fromWk = "GEOGCS[\"GCS_WGS_1984\",DATUM[\"D_WGS_1984\",SPHEROID[\"WGS_1984\",6378137.0,298.257223563]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]]"
 toWk = 3857
 shapeField = SHAPE
 cache = true
}

Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorProject
 fromWk = 4326
 toWk = "PROJCS[\"WGS_1984_Web_Mercator_Auxiliary_Sphere\",GEOGCS[\"GCS_WGS_1984\",DATUM[\"D_WGS_1984\",SPHEROID[\"WGS_1984\",6378137.0,298.257223563]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]],PROJECTION[\"Mercator_Auxiliary_Sphere\"],PARAMETER[\"False_Easting\",0.0],PARAMETER[\"False_Northing\",0.0],PARAMETER[\"Central_Meridian\",0.0],PARAMETER[\"Standard_Parallel_1\",0.0],PARAMETER[\"Auxiliary_Sphere_Type\",0.0],UNIT[\"Meter\",1.0]]"
 shapeField = SHAPE
 cache = true
}
{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorProject
 fromWk = "GEOGCS[\"GCS_WGS_1984\",DATUM[\"D_WGS_1984\",SPHEROID[\"WGS_1984\",6378137.0,298.257223563]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]]"
 toWk = "PROJCS[\"WGS_1984_Web_Mercator_Auxiliary_Sphere\",GEOGCS[\"GCS_WGS_1984\",DATUM[\"D_WGS_1984\",SPHEROID[\"WGS_1984\",6378137.0,298.257223563]],PRIMEM[\"Greenwich\",0.0],UNIT[\"Degree\",0.0174532925199433]],PROJECTION[\"Mercator_Auxiliary_Sphere\"],PARAMETER[\"False_Easting\",0.0],PARAMETER[\"False_Northing\",0.0],PARAMETER[\"Central_Meridian\",0.0],PARAMETER[\"Standard_Parallel_1\",0.0],PARAMETER[\"Auxiliary_Sphere_Type\",0.0],UNIT[\"Meter\",1.0]]"
 shapeField = SHAPE
 cache = true
}

# Python

# v2.2
spark.bdt.processorProject(
    df,
    fromWk = "4326",
    toWk = "3857",
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.project(
    df,
    fromWk = "4326",
    toWk = "3857",
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorProject v2

Re-project a geometry from one spatial reference to another.

Consider using ST_Project
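
When working in a notebook, the same re-projection can also be expressed with the ST_Project function documented under Spatial Type Functions. A minimal sketch, assuming the DataFrame has a SHAPE column in 4326; the output column name SHAPE_3857 is arbitrary:

# Re-project SHAPE from 4326 to 3857 with the ST_Project SQL function.
out = df.selectExpr("*", "ST_Project(SHAPE, 4326, 3857) AS SHAPE_3857")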

Configuration
Required
Optional

ProcessorRelation v2


Hocon

{
 inputs = [df1, df2]
 output = _default
 type = com.esri.processor.ProcessorRelation
 cellSize = 0.1
 relation = TOUCHES
 doBroadcast = false
 lhsShapeField = SHAPE
 rhsShapeField = SHAPE
 extent = [-180.0, -90.0, 180.0, 90.0]
 depth = 16
 cache = false
}

# Python

# v2.2
spark.bdt.processorRelation(
    df1,
    df2,
    cellSize = 0.1,
    relation = "TOUCHES",
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

# v2.3
spark.bdt.relation(
    df1,
    df2,
    cellSize = 0.1,
    relation = "TOUCHES",
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

com.esri.processor.ProcessorRelation

Enrich a left-hand-side row with a right-hand-side row if the specified spatial relation is met. A left-hand-side row may be emitted multiple times when there are multiple right-hand-side rows satisfying the specified spatial relationship. When a left-hand-side row finds no right-hand-side rows satisfying the specified spatial relationship, it is not emitted.

The shape struct in the emitted row retains the left-hand-side shape struct.

The possible spatial relationships are: CONTAINS, EQUALS, TOUCHES, DISJOINT, INTERSECTS, CROSSES, OVERLAPS, WITHIN.

https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/relational-functions-for-st-geometry.htm#

If doBroadcast is true and there are null values in the shape column of either of the input dataframes, a NullPointerException will be thrown. These null values should be handled before using this processor. If doBroadcast=false, then rows with null values for shape column are implicitly handled and will be dropped.

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Consider using ST_Relate
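
For ad hoc checks in a notebook, the relational functions documented under Spatial Type Functions can be used directly in SQL. A minimal sketch, assuming both DataFrames already carry a common RC cell column from a prior gridding step; the view names and columns are hypothetical:

df1.createOrReplaceTempView("lhs")
df2.createOrReplaceTempView("rhs")

# Emit each lhs row once per rhs row it touches within the same grid cell,
# mirroring the processor's enrichment behavior.
touching = spark.sql("""
    SELECT lhs.*
    FROM lhs, rhs
    WHERE lhs.RC = rhs.RC
      AND ST_Touches(lhs.SHAPE, rhs.SHAPE, 4326)
""")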

Configuration
Required
Optional

ProcessorRouteClosestTarget v2.4

Hocon

{
 inputs = [origins, destinations]
 output = routes
 type = com.esri.processor.ProcessorRouteClosestTarget
 maxMinutes = 30
 maxMeters = 60375
 oxField = OX
 oyField = OY
 dxField = DX
 dyField = DY
 origTolerance = 500.0
 destTolerance = 500.0
 allNodesAtLocation = true
 costOnly = false
 cache = false
}
#Python

df = spark.bdt.routeClosestTarget(
    origin_df,
    dest_df,
    maxMinutes = 30.0,
    maxMeters = 60375.0,
    oxField = "OX",
    oyField = "OY",
    dxField = "DX",
    dyField = "DY",
    origTolerance = 500.0,
    destTolerance = 500.0,
    allNodesAtLocation = False,
    costOnly = False,
    cache = False
    )

com.esri.processor.ProcessorRouteClosestTarget

For every origin point, find the closest destination point on the road network that is within maxMinutes or maxMeters. The cost is in meters.

The output DataFrame includes all columns from both inputs, so column names must be unique across both inputs. For example, do not include Shape columns in the inputs, and make the X and Y field names unique, e.g. OX, OY for origins and DX, DY for destinations.

A network is required in LMDB format, configured through the spark properties:

spark.bdt.lmdb.map.size 304857600000

spark.bdt.lmdb.path /data/LMDB_v2_Release
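
These properties can be supplied when the SparkSession is built (or via spark-submit --conf). A minimal sketch; the map size and path values are the placeholders shown above and should be adjusted to your environment:

from pyspark.sql import SparkSession

# Configure the LMDB network properties before any routing processor runs.
spark = (SparkSession.builder
    .appName("bdt-routing")
    .config("spark.bdt.lmdb.map.size", "304857600000")
    .config("spark.bdt.lmdb.path", "/data/LMDB_v2_Release")
    .getOrCreate())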

Configuration
Required
Optional

ProcessorSelectFields v2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorSelectFields
 exprs = ["floor(tip_amount/100)", "floor(total_amount/100)"]
}

# Python

# v2.2
spark.bdt.processorSelectFields(
    df,
    ["floor(tip_amount/100)", "floor(total_amount/100)"],
    cache = False)

# v2.3
spark.bdt.selectFields(
    df,
    ["floor(tip_amount/100)", "floor(total_amount/100)"],
    cache = False)

com.esri.processor.ProcessorSelectFields

Transform a DataFrame so that it contains only the configured expressions. This processor is mainly useful in Hocon pipelines; when using the Python bindings, consider Spark SQL or select expressions instead (see the sketch below).

See the Spark Documentation
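
A minimal sketch of the equivalent call with the plain Spark API, reusing the expressions from the example above:

# Select only the configured expressions, as ProcessorSelectFields would.
out = df.selectExpr("floor(tip_amount/100)", "floor(total_amount/100)")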

Configuration
Required
Optional

ProcessorSelfJoinStatistics v2.2


Hocon

{
  input = _default
  output = _default
  type = com.esri.processor.ProcessorSelfJoinStatistics
  cellSize = 1
  radius = 1
  statisticsFields = [field1, field2]
  emitEnrichedOnly = false
  shapeField = SHAPE
  cache = false
  extent = [-180.0, -90.0, 180.0, 90.0]
  depth = 16
}

# Python

# v2.2
spark.bdt.processorSelfJoinStatistics(
    df,
    1.0,
    1.0,
    ["field1", "field2"],
    emitEnrichedOnly = False,
    shapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

# v2.3
spark.bdt.selfJoinStatistics(
    df,
    1.0,
    1.0,
    ["field1", "field2"],
    emitEnrichedOnly = False,
    shapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

com.esri.processor.ProcessorSelfJoinStatistics

Processor to enrich a single dataset with statistics for the neighbors found within the given radius. For example, for a given feature, find all other features within the search radius, calculate COUNT, MEAN, STDEV, MAX, MIN, and SUM for each configured field, and append them. A feature does not consider itself a neighbor within its search radius. This processor works with point, polyline, and polygon datasets. If emitEnrichedOnly is set to false and no rhs features are found, the lhs feature is enriched with null values. If emitEnrichedOnly is set to true and no rhs features are found, the lhs feature is not emitted in the output.

For each statistics field, the following statistics are computed and appended.

{fieldName}_COUNT, {fieldName}_MEAN, {fieldName}_STDEV, {fieldName}_MAX, {fieldName}_MIN, {fieldName}_SUM
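
For example, with statisticsFields = ["field1"] the enriched DataFrame exposes those columns directly. A minimal sketch following the v2.3 binding shown above; the remaining parameters mirror the example values:

enriched = spark.bdt.selfJoinStatistics(
    df,
    1.0,
    1.0,
    ["field1"],
    emitEnrichedOnly = False,
    shapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

# The appended statistics columns follow the {fieldName}_* pattern.
enriched.select("field1_COUNT", "field1_MEAN", "field1_STDEV").show(5)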

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Configuration
Required
Optional

ProcessorServiceArea v2.4


Hocon

{
  input = _default
  output = _default
  type = com.esri.processor.ProcessorServiceArea
  xField = X
  yField = Y
  noOfCosts = 5
  hexSize = 125
  shapeField = SHAPE
  cache = false
}

# Python

# v2.4
spark.bdt.serviceArea(
    df,
    xField="X",
    yField="Y",
    noOfCosts=5,
    hexSize= 125,
    costField="SERVICE_AREA_COST",
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorServiceArea

The X field and Y field must be present explicitly at the first level of the schema. Nested fields such as SHAPE.XMIN are not supported; if you need to use nested fields, bring them to the first level with [[ProcessorSQL]] before entering this processor (the less data carried along, the better).

The input DataFrame can therefore be a standalone DataFrame or a spatial DataFrame. The specified shape field is simply dropped, whether or not it exists.

Generates service areas per X and Y (in 3857). The cost is based on travel time: 0 to 1 minute, 1 to 2 minute, 2 to 3 minute, and so on up to the (noOfCosts - 1) to noOfCosts minute service area.

Hence, an input row is exploded into noOfCosts rows. Each row is appended with the origination hex id, the SERVICE_AREA_COST field (for example 0-1, 1-2, ...), an array of the destination hex ids, and a polygon, which may be an empty polygon.

This processor snaps the given X, Y to the closest street and walks along the network in (hexSize / 2.0) meter intervals until it reaches the last max cost. The points along the way are turned into hexagons and unioned to create the service areas.

In the future, there will be options to specify the type of service area polygons such as concave hull, convex hull, hexagon-based (current).

The input must have X and Y in 3857. Because explicit X and Y columns are required, a [[SHAPE_FIELD]] is not required in the input DataFrame.
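
If the source data only has longitude and latitude, one way to produce X and Y in 3857 is the LonToX and LatToY functions documented under Spatial Type Functions. A minimal sketch; the lon and lat column names are assumptions:

# Derive web mercator X/Y columns, then run the service area processor.
prepared = df.selectExpr("*", "LonToX(lon) AS X", "LatToY(lat) AS Y")

sa = spark.bdt.serviceArea(
    prepared,
    xField = "X",
    yField = "Y",
    noOfCosts = 5,
    hexSize = 125,
    costField = "SERVICE_AREA_COST",
    shapeField = "SHAPE",
    cache = False)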

The output appends the following columns:

SERVICE_AREA_COST is a [[DoubleType]] so that fractional time costs, such as 1.5 minutes, can be supported when required.

The insertion order is very important.

The following application variables are required:

https://github.com/lmdbjava/lmdbjava/blob/master/src/test/java/org/lmdbjava/TutorialTest.java#L90

Configuration
Required
Optional

ProcessorSimplify v2.1


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorSimplify
 forceSimplify = false
 ogc = false
 shapeField = SHAPE
 cache = false
}

# Python

# v2.2
spark.bdt.processorSimplify(
    df,
    forceSimplify = False,
    ogc = False,
    shapeField = "SHAPE",
    cache = False)

# v2.3
spark.bdt.simplify(
    df,
    forceSimplify = False,
    ogc = False,
    shapeField = "SHAPE",
    cache = False)

com.esri.processor.ProcessorSimplify

Simplify a geometry to make it valid for a Geodatabase to store without additional processing.

WKB and WKT are OGC interchange formats and in general require the geometry to be topologically simple. When storing geometry that may be non-simple as WKB or WKT, call simplify first. WKB is the better format for interchange with OGC services; if you are interoperating with OGC services, use WKB (or WKT) with the option ogc = true.

Consider using ST_Simplify
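
The SQL function can also be applied directly when working in a notebook. A minimal sketch, assuming a SHAPE column in spatial reference 4326; the output column name is arbitrary:

# Simplify the geometry with the ST_Simplify SQL function.
simplified = df.selectExpr("*", "ST_Simplify(SHAPE, 4326, false) AS SHAPE_SIMPLE")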

Configuration
Optional

ProcessorSQL v2


Hocon

{
 inputs = [_default]
 output = _default
 type = com.esri.processor.ProcessorSQL
 sql = "SELECT SUM(total_amount) FROM _default"
 cache = false
}

# Python

# v2.2
spark.bdt.processorSQL(
    df,
    sql = "SELECT SUM(total_amount)",
    cache = False)

# v2.3
spark.bdt.SQL(
    df,
    sql = "SELECT SUM(total_amount)",
    cache = False)

com.esri.processor.ProcessorSQL

Processor that takes a SQL statement and executes it.

See the Spark Documentation

Configuration
Required
Optional

ProcessorSummarizeWithin v2.2


Hocon

{
 inputs = [_default, _default]
 output = _default
 type = com.esri.processor.ProcessorSummarizeWithin
 cellSize = 0.1
 statsFields = [field1, field2]
 keepGeometry = false
 emitContainsOnly = false
 doBroadcast = false
 lhsShapeField = SHAPE
 rhsShapeField = SHAPE
 cache = false
 extent = [-180.0, -90.0, 180.0, 90.0]
 depth = 16
}

# Python

# v2.2
spark.bdt.processorSummarizeWithin(
    df1,
    df2,
    0.1,
    ["field1", "field2"],
    keepGeometry = False,
    emitContainsOnly = False,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

# v2.3
spark.bdt.summarizeWithin(
    df1,
    df2,
    0.1,
    ["field1", "field2"],
    keepGeometry = False,
    emitContainsOnly = False,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

com.esri.processor.ProcessorSummarizeWithin

Processor to produce summary statistics of all lhs features that are contained in a rhs feature.

statsFields are fields from the lhs DataFrame that must have numeric values.

For each statistic field, the following lhs statistics are computed and appended to each unique rhs row.

{fieldName}_COUNT, {fieldName}_MEAN, {fieldName}_STDEV, {fieldName}_MAX, {fieldName}_MIN, {fieldName}_SUM

When emitContainsOnly is set to true, only rhs rows that contain lhs geometries are in the output; this is the default. Setting emitContainsOnly to false outputs all rhs geometries, even those that do not contain any lhs geometries; for those rows the lhs statistics fields are populated with a count of 0 and null values. IMPORTANT: setting emitContainsOnly to false results in a non-trivial time cost.

If doBroadcast is true and there are null values in the shape column of either of the input dataframes, a NullPointerException will be thrown. These null values should be handled before using this processor. If doBroadcast=false, then rows with null values for shape column are implicitly handled and will be dropped.

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Configuration
Required
Optional

ProcessorSymmetricDifference v2.2


Hocon

{
 inputs = [_default, _default]
 output = _default
 type = com.esri.processor.ProcessorSymmetricDifference
 cellSize = 0.1
 doBroadcast = false
 lhsShapeField = SHAPE
 rhsShapeField = SHAPE
 cache = false
 extent = [-180.0, -90.0, 180.0, 90.0]
 depth = 16
}

# Python

# v2.2
spark.bdt.processorSymmetricDifference(
    df1,
    df2,
    0.1,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

# v2.3
spark.bdt.symmetricDifference(
    df1,
    df2,
    0.1,
    doBroadcast = False,
    lhsShapeField = "SHAPE",
    rhsShapeField = "SHAPE",
    cache = False,
    extentList = [-180.0, -90.0, 180.0, 90.0],
    depth = 16)

com.esri.processor.ProcessorSymmetricDifference

For each left-hand-side (lhs) feature, find all relevant right-hand-side (rhs) features and take the set-theoretic symmetric difference of each rhs feature with the lhs feature. Both the LHS and RHS feature types can be point, polyline, or polygon.

The symmetric difference of two geometries A and B is defined as the portions of A and B that do not intersect with each other.

When n relevant rhs features are found, the lhs feature is emitted n times, and the lhs geometry is replaced with the symmetric difference of the lhs geometry and the rhs geometry. When an lhs feature has no overlapping rhs features, nothing is emitted, so the output table may contain fewer rows than the number of input lhs features.

The final schema is: lhs attributes + [[SHAPE_FIELD]] + rhs attributes.

If doBroadcast is true and there are null values in the shape column of either of the input dataframes, a NullPointerException will be thrown. These null values should be handled before using this processor. If doBroadcast=false, then rows with null values for shape column are implicitly handled and will be dropped.

Cell size must not be smaller than the following values:

For Spatial Reference WGS 84 GPS: 0.000000167638064 degrees
For Spatial Reference WGS 1984 Web Mercator: 0.0186264515 meters

Configuration
Required
Optional

ProcessorTimeFilter v2.1


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorTimeFilter
 timestampField = time
 minTimestamp = "1997-11-15 00:00:00"
 maxTimestamp = "1997-11-17 00:00:00"
 cache = false
}

# Python

# v2.2
spark.bdt.processorTimeFilter(
    df,
    timeField = "time",
    minTimestamp = "1997-11-15 00:00:00",
    maxTimestamp = "1997-11-17 00:00:00",
    cache = False)

# v2.3
spark.bdt.timeFilter(
    df,
    timeField = "time",
    minTimestamp = "1997-11-15 00:00:00",
    maxTimestamp = "1997-11-17 00:00:00",
    cache = False)

com.esri.processor.ProcessorTimeFilter

Filter rows in the DataFrame based on a time field. The time value must be strictly between minTimestamp and maxTimestamp. The time values must all be in timestamp, date, or unix millisecond format.
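
The same strict filter can be expressed with the plain Spark API if preferred. A minimal sketch, assuming a timestamp column named time:

from pyspark.sql import functions as F

# Keep rows strictly between the two bounds; the string literals are cast to timestamps.
filtered = df.where(
    (F.col("time") > F.lit("1997-11-15 00:00:00")) &
    (F.col("time") < F.lit("1997-11-17 00:00:00")))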

Configuration
Required
Optional

ProcessorUnivariateOutliers v2.2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorUnivariateOutliers
 attributeField = price
 factor = 1.0
 cache = false
}

# Python

# v2.2
spark.bdt.processorUnivariateOutliers(
    df,
    attributeField = "price",
    factor = 1.0,
    cache = False)

# v2.3
spark.bdt.univariateOutliers(
    df,
    attributeField = "price",
    factor = 1.0,
    cache = False)

com.esri.processor.ProcessorUnivariateOutliers

Processor to identify univariate outliers based on mean and standard deviation. A feature is an outlier if its field value is beyond mean +/- factor * standard deviation.
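
For illustration only, a minimal sketch of the same rule with the plain Spark API (the BDT processor's exact output schema may differ; the price column name follows the example above):

from pyspark.sql import functions as F

factor = 1.0
stats = df.agg(F.mean("price").alias("mu"), F.stddev("price").alias("sigma")).first()
lower = stats.mu - factor * stats.sigma
upper = stats.mu + factor * stats.sigma

# Rows whose value falls outside mean +/- factor * stddev are the outliers.
outliers = df.where((F.col("price") < lower) | (F.col("price") > upper))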

Configuration
Required
Optional

ProcessorWebMercator v2.2


Hocon

{
 input = _default
 output = _default
 type = com.esri.processor.ProcessorWebMercator
 latField = lat
 lonField = lon
 xField = X
 yField = Y
 cache = false
}

# Python

# v2.2
spark.bdt.processorWebMercator(
    df,
    latField = "lat",
    lonField = "lon",
    xField = "X",
    yField = "Y",
    cache = False)

# v2.3
spark.bdt.webMercator(
    df,
    latField = "lat",
    lonField = "lon",
    xField = "X",
    yField = "Y",
    cache = False)

com.esri.processor.ProcessorWebMercator

Add x and y WebMercator coordinates to the output DataFrame. The input dataframe must have Latitude and Longitude columns.

Consider using ST_WebMercator
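
A minimal sketch using the ST_WebMercator function instead of the processor; it returns an (X, Y) array, and the element ordering below is assumed from the description under Spatial Type Functions:

# Convert lat/lon to web mercator X/Y via the SQL function.
out = (df
    .selectExpr("*", "ST_WebMercator(lat, lon) AS XY")
    .selectExpr("*", "XY[0] AS X", "XY[1] AS Y")
    .drop("XY"))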

Configuration
Required
Optional

Sinks v2

SinkGeneric v2


Hocon

{
 input = _input
 type = com.esri.sink.SinkGeneric
 format = csv
 options = {
  path = /tmp/sink
 }
 mode = Overwrite
 shapeField = SHAPE
 numPartitions = 8
 debug = true
}

# Python

# Deprecated as of v2.4. Use the Spark API instead.
spark.bdt.sinkGeneric(
    df,
    format = "csv",
    options = { "path":"/tmp/sink" },
    mode = "Overwrite",
    shapeField  = "SHAPE",
    numPartitions = 8,
    debug = True)


# Spark API: CSV Format. The internal SHAPE struct has to be converted to WKT before writing to disk,
# and the struct column itself must be dropped because the CSV data source cannot store struct types.
df \
    .selectExpr("ST_AsText(SHAPE) as WKT", "*") \
    .drop("SHAPE") \
    .repartition(8) \
    .write \
    .format("csv") \
    .mode("Overwrite") \
    .option("path", "/tmp/sink") \
    .save()

# Spark API: All Other Formats
df \
    .repartition(8) \
    .write \
    .format("parquet") \
    .mode("Overwrite") \
    .option("path", "/tmp/sink") \
    .save()

com.esri.sink.SinkGeneric

Persist the DataFrame to storage.

Required

Optional

SinkConsole v2


Hocon

{
 input = $DEFAULT_INPUT
 type = com.esri.sink.SinkConsole
 numRows = 2
 truncate = false
 printSchema = true
 debug = true
}

# Python

# Deprecated as of v2.4. Use the Spark API instead.
spark.bdt.sinkConsole(
    df,
    numRows = 2,
    truncate = False,
    printSchema = True,
    debug = True)

# Spark API
printSchema = True
debug = True
if debug:
    df.doDebug()
if printSchema:
    df.printSchema()
# DataFrame.show takes the row count as its first positional argument.
df.show(2, truncate=False)

com.esri.sink.SinkConsole

Print the DataFrame to stdout.

Configuration

Optional

SinkEGDB v2


Hocon

{
 input = _default
 type = com.esri.sink.SinkEGDB
 options = {
   numPartitions = 8
   url = "$url"
   user = $username
   password = "$password"
   dbtable = DBO.test
   truncate = false
 }
 postSQL = [
   "ALTER TABLE DBO.test ADD ID INT IDENTITY(1,1) NOT NULL",
   "ALTER TABLE DBO.test ALTER COLUMN Shape geometry NOT NULL",
   "ALTER TABLE DBO.test ADD CONSTRAINT PK_test PRIMARY KEY CLUSTERED (ID)",
   "CREATE SPATIAL INDEX SIndx_test ON DBO.test(SHAPE) WITH ( BOUNDING_BOX = ( -14786743.1218, 2239452.0547, -5961629.5841, 6700928.5216 ) )"
 ]
 shapeField = $SHAPE_FIELD
 geometryField = Shape
 mode = Overwrite
 debug = false
}

# Python

spark.bdt.sinkEGDB(
    df,
    options = {
        "numPartitions": "8",
        "url": f"{url}",
        "user": f"{username}",
        "password": f"{password}",
        "dbtable": "DBO.test",
        "truncate": "false"
    },
    postSQL = [
        "ALTER TABLE DBO.test ADD ID INT IDENTITY(1,1) NOT NULL",
        "ALTER TABLE DBO.test ALTER COLUMN Shape geometry NOT NULL",
        "ALTER TABLE DBO.test ADD CONSTRAINT PK_test PRIMARY KEY CLUSTERED (ID)",
        "CREATE SPATIAL INDEX SIndx_test ON DBO.test(SHAPE) WITH ( BOUNDING_BOX = ( -14786743.1218, 2239452.0547, -5961629.5841, 6700928.5216 ) )"
        ],
    shapeField  = "SHAPE",
    geometryField = "Shape",
    mode = "Overwrite",
    debug = False)

com.esri.sink.SinkEGDB

Persist to the enterprise geodatabase (aka relational database). Use this sink if the input DataFrame has the [[SHAPE_STRUCT]]. Otherwise, use SinkGeneric to persist to the relational database.

The numPartitions parameter in the options dictionary should be chosen carefully. It controls the maximum number of concurrent JDBC database connections, and by default it is set to the number of partitions of the input DataFrame. Additional options are documented in the Spark SQL guide to JDBC.

Note: postSQL will not be run when the save mode is Overwrite and truncate is set to true. More broadly, postSQL is only run when a new table is created.


Configuration

Required

Optional

SinkTable v2


Hocon

{
 input = _input
 type = com.esri.sink.SinkTable
 format = csv
 options = {
  path = /tmp/sink
 }
 mode = Overwrite
 shapeField = SHAPE
 numPartitions = 8
 debug = true
}

# Python

# Deprecated as of v2.4. Use the Spark API instead.
spark.bdt.sinkTable(
    df,
    format = "csv",
    options = { "path": "/tmp/sink" },
    mode = "Overwrite",
    shapeField = "SHAPE",
    numPartitions = 8,
    debug = False)

# Spark API: CSV Format. The internal SHAPE struct has to be converted to WKT before writing to disk,
# and the struct column itself must be dropped because the CSV data source cannot store struct types.
df \
    .selectExpr("ST_AsText(SHAPE_2) as WKT", "*") \
    .drop("SHAPE_2") \
    .repartition(8) \
    .write \
    .format("csv") \
    .mode("Overwrite") \
    .option("path", "/tmp/sink") \
    .saveAsTable("tableName")

# Spark API: All Other Formats
df \
    .repartition(8) \
    .write \
    .format("parquet") \
    .mode("Overwrite") \
    .option("path", "/tmp/sink") \
    .saveAsTable("tableName")

com.esri.sink.SinkTable

Persist the DataFrame as a Spark table.

Configuration
Required
Optional

SinkBDS v2.2


Hocon

{
 inputs = [policy, highways, zipcodes]
 type = com.esri.sink.bds.SinkBDS
 geometryTypes = [esriGeometryPoint, esriGeometryPolyline, esriGeometryPolygon]
 dataSourceNames = [policy, highways, zipcodes]
 username = abc
 password = def
 machines = ["myMachine:9300:9220"]
 clusterId = myClusterId
 replicationFactor = 0
 numShards = -1
 responsePath = /tmp/sink
}

Hocon

{
 inputs = [taxi]
 type = com.esri.sink.bds.SinkBDS
 geometryTypes = [esriGeometryPoint]
 dataSourceNames = [taxi]
 timeInfos = {
  "taxi" : {
    "startTimeField" : "tpep_pickup_datetime",
    "endTimeField" : "tpep_dropoff_datetime",
    "hasLiveData" : false,
    "timeReference" : {
      "timeZone" : "UTC",
      "respectsDaylightSaving" : false
   },
    "timeInterval" : 1,
    "timeIntervalUnits" : "esriTimeUnitsHours"
  }
 }
 resolutions = {
  "taxi" : {
    "precision" : "50m",
    "distance_error_pct" : 0.025
  }
 }
 username = abc
 password = def
 machines = ["myServer:9300:9220"]
 clusterId = myClusterId
 replicationFactor = 0
 numShards = -1
 responsePath = /tmp/sink
}

com.esri.ext.SinkBDS

Note: Requires ArcGIS Enterprise 10.8.1

Sink to persist one or more feature classes to the Spatiotemporal Big Data Store. The spatial references of the inputs must be 4326. SinkBDS writes a summary of each newly created BDS data source to responsePath, including, per input, the data source name, spatial extent, big data store schema, timeInfo, and record count.

When a layer has no records, a data source is created in BDS but the layer is not saved to BDS because the write errors out. The responsePath output will still contain all layer information.

The client-tier application or middle-tier service is responsible for managing the portal, including creating or updating the (existing) feature/map service using the summary information provided and the ArcGIS REST Services.

Note: If a data source with the same name as the corresponding dataSourceName already exists in the data store, an error will be thrown.

Configuration

Note: The BDS username, password, and clusterId can be obtained by running the listadminusers command on the Spatiotemporal Big Data Store machine. This utility is found under <ArcGIS Data Store installation directory>/datastore/tools.

Required
Optional

Spatial Type Functions

You can use these functions in ProcessorSQL, or in SourceGeneric with selectExpr. They can also be used directly in a notebook environment, for example:
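
A minimal sketch of calling the functions through spark.sql in a notebook with BDT installed; the source path, view name, and column names are hypothetical:

# Hypothetical source with id, lon, and lat columns.
df = spark.read.parquet("/data/points")
df.createOrReplaceTempView("points")

# Build a shape struct from lon/lat and render it as WKT.
spark.sql("""
    SELECT id, ST_AsText(ST_FromXY(lon, lat)) AS WKT
    FROM points
""").show(5)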

GeoToH3

SELECT GeoToH3(lat, lon, res) FROM df
Long

org.apache.spark.sql.catalyst.expressions.GeoToH3

For Uber H3. Return the conversion of Latitude and Longitude values to an H3 index.
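
For example, points can be binned by H3 cell and aggregated. A minimal sketch, assuming lat and lon columns and resolution 9:

# Count points per H3 cell at resolution 9.
h3_counts = (df
    .selectExpr("GeoToH3(lat, lon, 9) AS h3")
    .groupBy("h3")
    .count())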

since v2.3

Arguments:

H3Distance

SELECT H3Distance(h3Origin, h3idx) FROM df
Int

org.apache.spark.sql.catalyst.expressions.H3Distance

For Uber H3. Return the distance in H3 grid cell hexagons between the two H3 indices.

since v2.3

Arguments:

H3kRing

SELECT H3kRing(h3Idx, k) FROM df
Array[Long]

org.apache.spark.sql.catalyst.expressions.H3kRing

For Uber H3. Return a collection of the H3 indices within k distance of the origin H3 index.

since v2.3

Arguments:

H3kRingDistances

SELECT H3kRingDistances(h3Idx, k) FROM df
Array[Array[Long]]

org.apache.spark.sql.catalyst.expressions.H3kRingDistances

For Uber H3. Return a collection of collections of the H3 indices within k distance of the origin H3 index.

since v2.4

Arguments:

H3ToChildren

SELECT H3ToChildren(h3Idx, resolution) FROM df
Array[Long]

org.apache.spark.sql.catalyst.expressions.H3ToChildren

For Uber H3. Return a collection of the child H3 indices of an H3 index.

since v2.3

Arguments:

H3ToGeo

SELECT H3ToGeo(h3Idx) FROM df
Array[Double]

org.apache.spark.sql.catalyst.expressions.H3ToGeo

For Uber H3. Return the conversion of an H3 index to its Latitude and Longitude centroid.

since v2.3

Arguments:

H3ToGeoBoundary

SELECT H3ToGeoBoundary(h3idx) from df
Array[Double]

org.apache.spark.sql.catalyst.expressions.H3ToGeoBoundary

For Uber H3. Return a collection of the hexagon boundary latitude and longitude values for an H3 index. The output is ordered as follows: [x1, y1, x2, y2, x3, y3, ...]

since v2.3

Arguments:

H3ToParent

SELECT H3ToParent(h3idx, res) FROM df
Long

org.apache.spark.sql.catalyst.expressions.H3ToParent

For Uber H3. Return the parent H3 Index containing the given H3 Index.

since v2.3

Arguments:

H3ToString

SELECT H3ToString(h3idx) FROM df
string

org.apache.spark.sql.catalyst.expressions.H3ToString

For Uber H3. Return the conversion of an H3 index to its string representation.

since v2.3

Arguments:

LatToY

SELECT LatToY(lat) FROM LatLonDF
Double

org.apache.spark.sql.catalyst.expressions.LatToY

Return the conversion of a Latitude value to a WebMercator Y value.

Arguments:

LonToX

SELECT LonToX(lon) FROM LonLatDF
Double

org.apache.spark.sql.catalyst.expressions.LonToX

Return the conversion of a Longitude value to a WebMercator X value.

Arguments:

ST_Area

SELECT ST_Area(A.SHAPE) FROM polygons A
wkt2 = """MULTIPOLYGON (((5148012 4080116, 4534599 3838522, 4985676 2684166, 
        7448886 2291969, 6703639 4769064, 5148012 4080116)))"""

df = spark.sql(f"""
  SELECT ST_Area(ST_FromText('{wkt2}')) AREA
  """)
df.show()
+------------------+
|              AREA|
+------------------+
|4.3430662295895E12|
+------------------+

Double

org.apache.spark.sql.catalyst.expressions.STArea

Return the area of a given shape struct. The area is calculated in the units of the spatial reference.

since v2.1

Arguments:

Returns:

ST_AsGeoJSON

SELECT ST_AsGeoJSON(t1.shape) FROM t1
String

org.apache.spark.sql.catalyst.expressions.STAsGeoJSON

Return GeoJSON representation of the given shape struct.

Arguments:

ST_AsHex

SELECT
    *,
    ST_AsHex(pickup_longitude, pickup_latitude, 200.0) as H200,
    ST_AsHex(pickup_longitude, pickup_latitude, 500.0) as H500
FROM
    src_taxi

org.apache.spark.sql.catalyst.expressions.STAsHex

For a given longitude, latitude, and size in meters, return a hex QR code string delimited by ':'.

Arguments:

ST_AsQR

SELECT ST_AsQR(SHAPE, 1.0) FROM (SELECT ST_FromText(wkt) as SHAPE FROM (SELECT 'POINT(1.0 1.0)' wkt))
Array of long values.

org.apache.spark.sql.catalyst.expressions.STAsQR

Return an array of array of two long values, representing row and column, of shape struct. This is optimized for 2-dimensions only.

Arguments:

since v2.3

ST_AsQRClip

SELECT inline(ST_AsQRClip(SHAPE, 1.0, 4326)) FROM (SELECT ST_FromText(wkt) as SHAPE FROM (SELECT 'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))' wkt))
Array of InternalRow with LongType and SHAPE Struct.

org.apache.spark.sql.catalyst.expressions.STAsQRClip

Return an array of structs with QR and a clipped SHAPE struct. If a clipped SHAPE struct is completely within the geometry, the wkb will be empty but the extent will be populated accordingly. Use the inline() function to flatten it out. If the DataFrame has the default SHAPE column before calling this function, there will be two SHAPE columns in the DataFrame after calling it. It is the user's responsibility to rename the columns appropriately to avoid column name collisions.

Currently, this function is used by ProcessorPointInPolygon(clip = true). Each clip is inflated slightly, by (cellSize / 100), to handle points sitting on the edge of a clip that is completely inside the polygon, or on the inner edge of a clip that is not completely inside the polygon.

Arguments:

since v2.3

ST_AsQRPrune

SELECT explode(ST_AsQRPrune(SHAPE, 1.0, 4326, 0.5)) FROM (SELECT ST_FromText(wkt) as SHAPE FROM (SELECT 'POINT(1.0 1.0)' wkt))
Array of long values.

org.apache.spark.sql.catalyst.expressions.STAsQRPrune

Create and return an array of QR values for the given shape struct. QR values that overlap the geometry bounding box but not the actual geometry will not be returned.

Arguments:

since v2.3

ST_AsQRSpill

SELECT ST_AsQRSpill(SHAPE, 1.0, 0.1) FROM (SELECT ST_FromText(wkt) as SHAPE FROM (SELECT 'POINT(1.0 1.0)' wkt))
Array of long values.

org.apache.spark.sql.catalyst.expressions.STAsQRSpill

Create and return an array of qr values of the shape struct. Spill over to neighboring cells by the specified distance.

Arguments:

since v2.3

ST_AsRC

SELECT ST_AsRC(SHAPE, 1.0) FROM (SELECT ST_FromText(wkt) as SHAPE FROM (SELECT 'POINT(1.0 1.0)' wkt))
Array of array of two long values.

org.apache.spark.sql.catalyst.expressions.STAsRC

Return an array of array of two long values, representing row and column, of shape struct.

Arguments:

ST_AsRCClip

SELECT ST_AsRCClip(SHAPE, 1.0, 4326) FROM (SELECT ST_FromText(wkt) as SHAPE FROM (SELECT 'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))' wkt))
Array of InternalRow with ArrayType(LongType) and SHAPE Struct.

org.apache.spark.sql.catalyst.expressions.STAsRCClip

Return an array of arrays of two long values representing row and column, and a shape struct representing the clipped geometry.

Arguments:

ST_AsRCPrune

SELECT explode(ST_AsRCPrune(SHAPE, 1.0, 4326, 0.5)) FROM (SELECT ST_FromText(wkt) as SHAPE FROM (SELECT 'POINT(1.0 1.0)' wkt))
Array of Arrays with 2 long values.

org.apache.spark.sql.catalyst.expressions.STAsRCPrune

Create and return an array of RC value arrays for the given shape struct. RC values that overlap the geometry bounding box but not the actual geometry will not be returned.

Arguments:

since v2.3

ST_AsRCSpill

SELECT ST_AsRCSpill(SHAPE, 1.0, 0.1) FROM (SELECT ST_FromText(wkt) as SHAPE FROM (SELECT 'POINT(1.0 1.0)' wkt))
Array of array of two long values.

org.apache.spark.sql.catalyst.expressions.STAsRCSpill

Return an array of array of two long values, representing row and column, of shape struct. Spill over to neighboring cells by the specified distance.

Arguments:

ST_AsText

SELECT ST_AsText(t1.shape) FROM t1
String

org.apache.spark.sql.catalyst.expressions.STAsText

Return the Well-Known Text (WKT) representation of the shape struct.

Arguments:

ST_Buffer

SELECT ST_Buffer(shape, 1.0, 4326) FROM poi
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STBuffer

Create and return a buffer polygon of the shape struct as specified by the distance and wkid.

Arguments:

ST_BufferGeodesic

SELECT ST_BufferGeodesic(shape, "ShapePreserving", 1.0, 4326, 'NaN') FROM locations
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STBufferGeodesic

Create and return a geodesic buffer polygon of the shape struct as specified by the curveType, distance, wkid, and maxDeviation.

The options for curveType are:
- Geodesic
- Loxodrome
- GreatElliptic
- NormalSection
- ShapePreserving

Arguments:

since v2.3

ST_Centroid

SELECT ST_Centroid(SHAPE) FROM locations
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STCentroid

Return centroid of shape struct.

Arguments:

ST_Contains

SELECT ST_Contains(A.SHAPE, B.SHAPE, 4326) FROM polygons A, points B where A.RC = B.RC
true or false

org.apache.spark.sql.catalyst.expressions.STContains

Return true if one shape struct contains another shape struct.

https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/relational-functions-for-st-geometry.htm#

Arguments:

ST_Coverage

SELECT ST_Coverage(A.SHAPE, B.SHAPE, 10) FROM polylines A, polylines B
Array(double, double)

org.apache.spark.sql.catalyst.expressions.STCoverage

Return the coverage fraction and coverage distance of segment1 on segment2 as an array. Array(coverageFraction, coverageDistance)

The coverage fraction represents the fraction of segment1 that is covered by segment2; its range is [0, 1]. The coverage distance measures how close segment1 is to segment2, taking into account the input distance threshold. The coverage distance range is (0, 1]: a coverage distance of 1 means the actual distance between the segments is 0, and as the segments move further apart the coverage distance approaches 0.

Arguments:

since v2.3

ST_CreateFishnet

(SELECT GRID_ID, ST_FromWKB(WKB) AS SHAPE FROM parquet.`src/test/resources/grid_centroids`) LATERAL VIEW explode(ST_CreateFishnet(ST_Project(SHAPE, 4326, 3857), 400, 800)) as NBR
Array of InternalRow with ArrayType(LongType) and SHAPE Struct.

Per web mercator point, return additional points in the north, south, west, and east directions at the given interval up to the given max distance. The input shape must be in the 3857 web mercator spatial reference. The output is an array of (Seq(r,c), Shape struct). The output is in the 3857 web mercator spatial reference.

consider using ST_Grid

Arguments:

ST_CreateNeighbors

(SELECT GRID_ID, ST_FromWKB(WKB) AS SHAPE FROM parquet.`src/test/resources/grid_centroids`) LATERAL VIEW explode(ST_CreateNeighbors(ST_Project(SHAPE, 4326, 3857), 400, 800)) as NBR
Array of InternalRow with ArrayType(LongType) and SHAPE Struct.

org.apache.spark.sql.catalyst.expressions.STCreateNeighbors

Per web mercator point, return additional points in the north, south, west, and east directions at the given interval up to the given max distance. The input shape must be in the 3857 web mercator spatial reference. The output is an array of (Seq(r,c), Shape struct). The output is in the 3857 web mercator spatial reference.

since v2.1

ST_Crosses

SELECT ST_Crosses(A.SHAPE, B.SHAPE, 4326) FROM line A, other_line B where A.RC = B.RC
true or false

org.apache.spark.sql.catalyst.expressions.STCrosses

Return true if one shape struct crosses another shape struct. Otherwise return false.

https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/relational-functions-for-st-geometry.htm#

Arguments:

ST_Disjoint

SELECT ST_Disjoint(A.SHAPE, B.SHAPE, 4326) FROM polygons A, point B where A.RC = B.RC
true or false

org.apache.spark.sql.catalyst.expressions.STDisjoint

Return true when the two shape structs disjoint. Otherwise return false.

Geometries are Disjoint, when the intersection of the geometries is empty. The relation operation Disjoint is the opposite of the Intersects.

https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/relational-functions-for-st-geometry.htm#

Arguments:

ST_Distance

SELECT ST_Distance(t1.shape, t2.shape) FROM t1, t2
Double

org.apache.spark.sql.catalyst.expressions.STDistance

Calculates the 2D planar distance between two shape structs.

Arguments:

ST_DistanceWGS84Points

SELECT ST_DistanceWGS84Points(t1.shape, t2.shape) FROM t1, t2
Double

org.apache.spark.sql.catalyst.expressions.STDistanceWGS84Points

Calculate and return the distance in meters between the two shape structs representing points in WGS84.

Arguments:

ST_DoBBoxesOverlap

SELECT *
FROM roads A, grids B
WHERE
    A.RC = B.RC
    AND
    ST_DoBBoxesOverlap(A.SHAPE, B.SHAPE)
    AND ST_Intersects(A.SHAPE, B.SHAPE, 4326)
Boolean

org.apache.spark.sql.catalyst.expressions.STDoBBoxesOverlap

Return true when the extents from the two given shape structs overlap. Return false otherwise.

This expression is used to do a simple check before executing ST_* operations such as ST_Distance, ST_Contains. If the two extents do not overlap, there is no need to execute the following ST_* operation.

Arguments:

ST_DumpXY

wkt = """POLYGON((-57.55608571669294 -11.287931275985708,-43.84514821669295 -11.287931275985708, 
-43.84514821669295 -23.020737845578587,-57.55608571669294 -23.020737845578587,
-57.55608571669294 -11.287931275985708))"""

df = spark.sql(f"""
    SELECT
        inline(array(dump))
    FROM (
         SELECT explode(ST_DumpXY(ST_FromText('{wkt}'))) dump
   )
""")
df.show()

+------------------+-------------------+
|                 x|                  y|
+------------------+-------------------+
|-57.55608571669294|-11.287931275985708|
|-43.84514821669295|-11.287931275985708|
|-43.84514821669295|-23.020737845578587|
|-57.55608571669294|-23.020737845578587|
|-57.55608571669294|  -11.2879312759857|
+------------------+-------------------+

org.apache.spark.sql.catalyst.expressions.STDumpXY

Return array of (x,y) vertex values making up the given shape struct as an array of structs. Use explode() to flatten the array. The x and y values of the struct can be accessed in SQL by calling StructColName.x and StructColName.y, where StructColName is the name of the column with values returned by explode(ST_DumpXY(...)).

since v2.4

Arguments:

Returns:

ST_DumpXYWithIndex

wkt = """POLYGON((-57.55608571669294 -11.287931275985708,-43.84514821669295 -11.287931275985708, 
-43.84514821669295 -23.020737845578587,-57.55608571669294 -23.020737845578587,
-57.55608571669294 -11.287931275985708))"""

df = spark.sql(f"""
    SELECT
        inline(array(dump)),
        ST_FromXY(x,y) SHAPE
    FROM (
         SELECT explode(ST_DumpXYWithIndex(ST_FromText('{wkt}'))) dump
   )
""").withMeta('Point', 4326)

org.apache.spark.sql.catalyst.expressions.STDumpXYWithIndex

Return an array of (x, y, i) vertex values with indices making up the given shape struct, as an array of structs. Use explode() to flatten the array. The x, y, and i values of the struct can be accessed in SQL by calling StructColName.x, StructColName.y, and StructColName.i, where StructColName is the name of the column with values returned by explode(ST_DumpXYWithIndex(...)).

since v2.4

Arguments:

Returns:

ST_EliminatePolygonPartPercent

SELECT ST_EliminatePolygonPartPercent(shape, 10.0) FROM locations
Shape Struct

org.apache.spark.sql.catalyst.expressions.STEliminatePolygonPartPercent

Remove the holes of a polygon whose area as a percentage of the whole polygon is less than this value.

since v2.3

Arguments:

ST_EliminatePolygonPartArea

SELECT ST_EliminatePolygonPartArea(shape, 1.0) FROM locations
Shape Struct

org.apache.spark.sql.catalyst.expressions.STEliminatePolygonPartArea

Remove the holes of a polygon that have area below the given threshold.

since v2.3

Arguments:

ST_Envelope

SELECT ST_Envelope(-180.0, -90.0, 180.0, 90.0);
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STEnvelope

Create an [[Envelope]] using xmin, ymin, xmax, ymax.

since v2.3

Arguments:

ST_Equals

SELECT ST_Equals(A.SHAPE, B.SHAPE, 4326) FROM polygons A, other_polygons B where A.RC = B.RC
true or false

org.apache.spark.sql.catalyst.expressions.STEquals

Returns true if two geometries are equal. Otherwise return false.

https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/relational-functions-for-st-geometry.htm#

Arguments:

ST_FromFGDBPoint

SELECT ST_FromFGDBPoint(t.SHAPE) as SHAPE FROM t
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STFromFGDBPoint

Create and return a shape struct representing a point for the given struct from the file geodatabase.

Arguments:

ST_FromFGDBPolygon

SELECT ST_FromFGDBPolygon(t.SHAPE) as SHAPE FROM t
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STFromFGDBPolygon

Return a shape struct representing a polygon for the given struct from the file geodatabase.

Arguments:

ST_FromFGDBPolyline

SELECT ST_FromFGDBPolyline(t.SHAPE) as SHAPE FROM t
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STFromFGDBPolyline

Return a shape struct representing a polyline for the given struct from the file geodatabase.

Arguments:

ST_FromGeoJSON

SELECT ST_FromGeoJSON(t.geojson) as SHAPE FROM t
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STFromGeoJSON

Return a shape struct from GeoJSON.

Arguments:

ST_FromHex

SELECT
    ST_FromHex(H200, 200) as SHAPE, H200, TIP_AVG
FROM
      (
        SELECT
            *,
            ST_AsHex(pickup_longitude, pickup_latitude, 200.0) as H200
        FROM
            taxi
      )
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STFromHex

For a given hex qr code and size, return a polygon shape struct. The returned polygon is in the spatial reference of WGS_1984_Web_Mercator_Auxiliary_Sphere 3857 (102100).

Arguments:

ST_FromJSON

SELECT ST_FromJSON(t.json) as SHAPE FROM t
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STFromJSON

Return a shape struct from the given JSON.

Arguments:

ST_FromText

SELECT ST_FromText(wkt) as SHAPE FROM (SELECT 'POINT(1.0 1.0)' wkt)
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STFromText

Create and return a shape struct from the OGC Well-Known text representation.

Arguments:

ST_FromWKB

SELECT ST_FromWKB(SHAPE.WKB) as WKB FROM (SELECT ST_FromText(wkt) as SHAPE FROM (SELECT 'POINT(1.0 1.0)' wkt))
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STFromWKB

Create and return shape struct from the well-known binary representation of a geometry.

Arguments:

ST_FromXY

SELECT *, ST_FromXY(pickup_longitude, pickup_latitude) as SHAPE FROM taxi
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STFromXY

Create and return a shape struct based on the given longitude and latitude.

consider using ST_MakePoint

Arguments:

ST_Generalize

SELECT ST_Generalize(A.SHAPE, 0.0, false) FROM polygons A
Shape struct

org.apache.spark.sql.catalyst.expressions.STGeneralize

Return the generalization of a geometry in its shape struct representation.

since v2.1

Arguments:

ST_GeodeticArea

SELECT ST_GeodeticArea(A.SHAPE, 4326, 1) FROM polygons A
double

org.apache.spark.sql.catalyst.expressions.STGeodeticArea

Return the geodetic area of geometry.

since v2.1

ST_Grid

WITH T AS (SELECT explode(ST_Grid(ST_FromText('LineString(0.0 0.0, 1.0 1.0)'), 1.0)) GRID)
SELECT inline(array(GRID)) FROM T
Rows with an array of (row, column) and Shape struct.

org.apache.spark.sql.catalyst.expressions.STGrid

Returns an array of struct containing two columns: Array of long values representing row and column, and a shape struct. Use explode() and inline() function to flatten out.

since v2.3

Arguments:

ST_GridXY

WITH T AS (SELECT explode(sequence(0,2)) ID, ID X, ID Y)

SELECT inline(array(ST_GridXY(X, Y, 1.0))) FROM T
Rows with an array of (row, column) and a shape struct representing the grid to which the given point location belongs.

org.apache.spark.sql.catalyst.expressions.STGridXY

For the given longitude, latitude, and cell size, return a struct containing an array with the row and column (RC) and a shape struct representing the grid cell to which the given point location belongs. Use inline() to flatten it out.

since v2.3

Arguments:

ST_HaversineDistance

SELECT ST_HaversineDistance(X1, Y1, X2, Y2) AS haversine FROM T1
Double

Return the haversine distance in kilometers between two WGS84 points.

Arguments:

ST_Intersection

SELECT ST_Intersection(A.SHAPE, B.SHAPE, 4326, -1) as SHAPE FROM A, B
Array of struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STIntersection

Performs the topological intersection operation on the two given shape structs and returns an array of structs with wkb, xmin, ymin, xmax, ymax. Use the explode() function to flatten it out.

Arguments:

ST_Intersects

SELECT ST_Intersects(A.SHAPE, B.SHAPE, 4326) FROM roads A, grids B
true or false

org.apache.spark.sql.catalyst.expressions.STIntersects

Return true if the two shape structs intersect. Return false otherwise.

https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/relational-functions-for-st-geometry.htm#

Arguments:

ST_IsLowerLeftInCell

SELECT * FROM table1 t1 LEFT JOIN table2 t2 ON t1.RC = t2.RC WHERE ST_IsLowerLeftInCell(t1.RC, 1.0, t1.SHAPE, t2.SHAPE)
Boolean

org.apache.spark.sql.catalyst.expressions.STIsLowerLeftInCell

Return true when the lower left corner of the intersection of the two shape envelopes is in the given cell. Otherwise return false.

Arguments:

ST_IsPointInExtent

SELECT ST_IsPointInExtent(A.SHAPE,[-10, -10, 10, 10]) FROM points A
Boolean

org.apache.spark.sql.catalyst.expressions.STIsPointInExtent

Return a boolean indicating if a point is in a given extent, including points on the extent boundary.

since v2.1

Arguments:

ST_Length

SELECT ST_Length(A.SHAPE) FROM polygons A
Double

org.apache.spark.sql.catalyst.expressions.STLength

Return the length of a given shape struct.

since v2.1

Arguments:

ST_Line

SELECT ST_Line(array(
         ST_FromText('POINT (1.0 1.0)'),
         ST_FromText('POINT (2.0 1.0)'),
         ST_FromText('POINT (3.0 2.0)')
    ))
Shape Struct

org.apache.spark.sql.catalyst.expressions.STLine

Accepts an array of point [[SHAPE_STRUCT]] and returns a polyline [[SHAPE_STRUCT]].

since v2.2

Arguments:

ST_MakeLine

SELECT ST_MakeLine(coords) FROM df
Shape struct

org.apache.spark.sql.catalyst.expressions.STMakeLine

Return a line shape struct from an array of coordinates like [x1, y1, x2, y2, x3, y3...]

since v2.3

Arguments:

ST_MakePoint

SELECT ST_MakePoint(x, y) FROM pointsDF
Shape struct

org.apache.spark.sql.catalyst.expressions.STMakePoint

Return a point shape struct from x and y values.

since v2.3

Arguments:

ST_MakePolygon

SELECT ST_MakePolygon(coords) FROM df
Shape struct

org.apache.spark.sql.catalyst.expressions.STMakePolygon

Return a polygon shape struct from an array of coordinates like [x1, y1, x2, y2, x3, y3...]

Can be used to convert UberH3 GeoBoundaries to polygons.
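
For example, combining it with H3ToGeoBoundary (documented above) should yield hexagon polygons, under the assumption that both use the same [x1, y1, x2, y2, ...] coordinate ordering; the h3 column name is hypothetical:

# Build a hexagon polygon shape struct from an H3 index's boundary coordinates.
hexagons = df.selectExpr("h3", "ST_MakePolygon(H3ToGeoBoundary(h3)) AS SHAPE")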

since v2.3

Arguments:

ST_Overlaps

SELECT ST_Overlaps(A.SHAPE, B.SHAPE, 4326) FROM polygons A, other_polygons B where A.RC = B.RC
true or false

org.apache.spark.sql.catalyst.expressions.STOverlaps

Return true if one geometry overlaps another geometry. Otherwise return false.

https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/relational-functions-for-st-geometry.htm#

Arguments:

ST_Project

SELECT ST_Project(SHAPE, 4326, 3857) FROM (SELECT ST_FromText('POINT (1.0 1.0)') as SHAPE)
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STProject

Re-project and return the newly re-projected shape struct based on the from wkid and to wkid.

Arguments:

ST_Project2

SELECT ST_Project2(SHAPE, GEOGCS[..], 3857) FROM (SELECT ST_FromText('POINT (1.0 1.0)') as SHAPE)
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STProject2

Re-project and return the newly re-projected shape struct based on the from wktext and to wkid.

Arguments:

ST_Project3

SELECT ST_Project3(SHAPE, 4326, PROJCS[..]) FROM (SELECT ST_FromText('POINT (1.0 1.0)') as SHAPE)
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STProject3

Return the newly re-projected shape struct based on the from wkid and to wktext.

Arguments:

ST_Project4

SELECT ST_Project4(SHAPE, GEOGCS[..], PROJCS[..]) FROM (SELECT ST_FromText('POINT (1.0 1.0)') as SHAPE)
Struct with wkb, xmin, ymin, xmax, ymax

org.apache.spark.sql.catalyst.expressions.STProject4

Return the newly re-projected shape struct based on the from wktext and to wktext.

Arguments:

ST_Relate

SELECT ST_Relate(A.SHAPE, B.SHAPE, 4326, "1*1***1**") FROM linestring A, other_linestring B where A.RC = B.RC
true or false

org.apache.spark.sql.catalyst.expressions.STRelate

Return true if the given relation (DE-9IM value) holds for the two geometries. Otherwise return false.

https://postgis.net/docs/ST_Relate.html https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/relational-functions-for-st-geometry.htm#

Arguments:

ST_ServiceArea

SELECT inline(ST_ServiceArea(SHAPE, 35, 125.0))

SELECT inline(ST_ServiceArea(ST_FromXY(-13161875, 4035019.53758), 35, 125.0))
Array of InternalRow with SERVICE_AREA_COST (in minutes) and SHAPE Struct.

org.apache.spark.sql.catalyst.expressions.STServiceArea

Accepts a point [[SHAPE_STRUCT]] in 3857 and returns service area polygons with the cost in minutes. The output is an array of (service_area_cost, polygon). The service area polygons are hexagon based.

The algorithm used is Dijkstra's Search.

Use inline function to unpack the returned array.

since v2.4

Arguments:

ST_ServiceAreaXY

SELECT inline(ST_ServiceAreaXY(-13161875, 4035019.53758, 35, 125.0))
Array of InternalRow with SERVICE_AREA_COST (in minutes) and SHAPE Struct.

org.apache.spark.sql.catalyst.expressions.STServiceAreaXY

Accepts explicit X and Y in web mercator (3857) and returns service area polygons with the cost in minutes. The output is an array of (service_area_cost, polygon). The service area polygons are hexagon based.

The algorithm used is Breadth First Search.

Use inline function to unpack the returned array.

since v2.4

Arguments:

ST_Simplify

SELECT ST_Simplify(A.SHAPE, 4326, false) FROM polygons A
Shape struct

org.apache.spark.sql.catalyst.expressions.STSimplify

Simplify a geometry to make it valid for a Geodatabase to store without additional processing.

since v2.1

Arguments:

ST_SimplifyOGC

SELECT ST_SimplifyOGC(A.SHAPE, 4326, false) FROM polygons A
Shape struct

org.apache.spark.sql.catalyst.expressions.STSimplifyOGC

Simplify a geometry to make it valid for a Geodatabase to store without additional processing. Follows the OGC specification for the Simple Feature Access v. 1.2.1 (06-103r4).

since v2.2

Arguments:

ST_Touches

SELECT ST_Touches(A.SHAPE, B.SHAPE, 4326) FROM polygons A, point B where A.RC = B.RC
true or false

org.apache.spark.sql.catalyst.expressions.STTouches

Return true if one geometry touches another geometry. Otherwise return false.

https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/relational-functions-for-st-geometry.htm#

Arguments:

ST_WebMercator

SELECT ST_WebMercator(lat, lon) FROM LatLonDF
Array(Double, Double)

org.apache.spark.sql.catalyst.expressions.STWebMercator

Return the conversion of Latitude and Longitude values as a WebMercator (X, Y) Coordinate Pair.

since v2.2

Arguments:

ST_Within

SELECT ST_Within(A.SHAPE, B.SHAPE, 4326) FROM point A, polygons B where A.RC = B.RC
true or false

org.apache.spark.sql.catalyst.expressions.STWithin

Return true if one geometry is within another geometry. Otherwise return false.

https://desktop.arcgis.com/en/arcmap/latest/manage-data/using-sql-with-gdbs/relational-functions-for-st-geometry.htm#

Arguments:

XToLon

SELECT XToLon(x) FROM LatLonDF
Double

org.apache.spark.sql.catalyst.expressions.XToLon

Return the conversion of a Web Mercator X value to a Longitude value.

Arguments:

YToLat

SELECT YToLat(y) FROM LatLonDF
Double

org.apache.spark.sql.catalyst.expressions.YToLat

Return the conversion of a Web Mercator Y value to a Latitude value.

Arguments:

Open Source

ESRI - Big Data Toolkit open source acknowledgements can be found below.

BDT v2

PDF

Services Offering

ESRI Professional Services