Setup in ArcGIS Pro#
This section covers how to set up and use BDT3 in ArcGIS Pro using ArcGIS Notebooks. Running BDT3 in ArcGIS Pro allows for easy integration with arcpy and other Esri software. For more information about ArcGIS Notebooks in ArcGIS Pro, see the ArcGIS Pro documentation. To run BDT3 on Windows outside ArcGIS Pro, please see the “Setup in Windows” section.
Requirements#
ArcGIS Pro 2.5 or later
The BDT3 jar and zip files
A license for BDT3
The Spark Esri wheel file
Install and Setup Spark + Hadoop#
IMPORTANT: Skip this step if BDT3 is version 3.x and ArcGIS Pro is version 3.x. ArcGIS Pro 3.x includes Apache Spark, which is compatible with BDT3.
Download and extract Spark and Hadoop from here. Be sure to verify the downloaded release per the documentation.
Create a new system variable called SPARK_HOME and point it to the location of the Spark and Hadoop download. Append %SPARK_HOME%\bin to PATH. Please see the “Setup in Windows” section for more information on how to set system variables in Windows. DO NOT place the Spark and Hadoop download in the Windows Program Files directory. This will cause issues with Spark and Hadoop.
Create a new empty folder with the path C:\Hadoop\bin. Visit this website and download winutils.exe for the matching version of Hadoop. Put winutils.exe in C:\Hadoop\bin.
Create a new system variable called HADOOP_HOME and point it to C:\Hadoop. Append %HADOOP_HOME%\bin to PATH.
IMPORTANT: The SPARK_HOME environment variable will override the Spark installation included with ArcGIS Pro.
Please ensure the Spark version set in SPARK_HOME is compatible with the version of BDT. If SPARK_HOME is not set, ensure that the version of Spark included in ArcGIS Pro is compatible with the BDT version.
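If this step was not skipped, the variables described above can be sanity-checked from Python before starting Spark. The following is a minimal sketch and not part of BDT3 or spark_esri: the function name check_spark_env and its parameters are hypothetical, and it assumes the layout described in this section (winutils.exe under %HADOOP_HOME%\bin, both bin folders on PATH, nothing under Program Files).

```python
import ntpath
import os


def check_spark_env(env=None, exists=os.path.exists):
    """Return a list of human-readable problems with the Spark/Hadoop variables.

    `env` is a mapping of environment variables (defaults to os.environ) and
    `exists` is a predicate used to test for files. Both are parameters only
    so the check can be tried without touching the real system.
    """
    env = os.environ if env is None else env
    problems = []
    # PATH entries are separated by ";" on Windows; compare case-insensitively.
    path_entries = [
        entry.rstrip("\\").lower()
        for entry in env.get("PATH", "").split(";")
        if entry
    ]

    for name in ("SPARK_HOME", "HADOOP_HOME"):
        home = env.get(name)
        if not home:
            problems.append(f"{name} is not set")
            continue
        bin_dir = ntpath.join(home, "bin")
        if bin_dir.lower() not in path_entries:
            problems.append(f"{bin_dir} is not on PATH")
        if "program files" in home.lower():
            problems.append(f"{name} points into Program Files: {home}")

    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        winutils = ntpath.join(hadoop_home, "bin", "winutils.exe")
        if not exists(winutils):
            problems.append(f"missing {winutils}")
    return problems


# Example: print any problems found in the current process environment.
for problem in check_spark_env():
    print("WARNING:", problem)
```

An empty list means the variables match the layout described here. It does not confirm that the Spark version itself is compatible with BDT.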
Create an ArcGIS Pro Conda Environment for BDT3#
This step creates a new conda environment in ArcGIS Pro’s Anaconda Python distribution, dedicated to working with BDT and Apache Spark.
Start the Python Command Prompt from the ArcGIS folder of your Windows start menu. You will know that you are using the correct Python command prompt if the prompt’s window title points to the proenv.exe executable located in the ArcGIS Pro installation directory.
Switch the conda environment to the default arcgispro-py3 environment:
proswap arcgispro-py3
Remove any previously existing bdt3-py environment:
conda remove --yes --all --name bdt3-py
Create a new conda environment called bdt3-py by cloning arcgispro-py3 with the following command:
conda create --yes --name bdt3-py --clone arcgispro-py3
Switch to the bdt3-py environment:
proswap bdt3-py
(Do this step only for ArcGIS Pro 2.9.) Define the PYSPARK_PYTHON system environment variable and point it to the Python executable of your bdt3-py conda environment, e.g., C:\Users\%USERNAME%\AppData\Local\ESRI\conda\envs\bdt3-py\python.exe.
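To confirm that a session is really running in the cloned environment, the environment name can be read from the interpreter's install prefix. This is a minimal sketch with two hypothetical helpers (env_name_from_prefix and env_name_from_python are not part of ArcGIS Pro or BDT3). It assumes the usual conda layout, in which sys.prefix is the environment folder and python.exe sits directly inside it.

```python
import os
import sys
from pathlib import PureWindowsPath


def env_name_from_prefix(prefix):
    """Name of the conda environment rooted at `prefix` (its last folder)."""
    return PureWindowsPath(prefix).name


def env_name_from_python(python_exe):
    """Name of the conda environment that owns `python_exe`, assuming the
    executable sits directly inside the environment folder."""
    return PureWindowsPath(python_exe).parent.name


# Expected to print bdt3-py when run inside the cloned environment.
print("active environment:", env_name_from_prefix(sys.prefix))

# PYSPARK_PYTHON is only expected to be set for ArcGIS Pro 2.9.
pyspark_python = os.environ.get("PYSPARK_PYTHON")
if pyspark_python:
    print("PYSPARK_PYTHON environment:", env_name_from_python(pyspark_python))
```

sys.prefix is used rather than sys.executable because an embedded interpreter may report the host application as its executable.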
Install the Spark Esri Module#
This Python module is required to run BDT3 in Pro. It provides functions to start and stop your Spark instance and to install BDT3.
The BDT team will deliver a separate wheel file for installation. This file is only needed if running BDT3 in ArcGIS Pro. After moving the wheel file to the current directory, run the command below in the bdt3-py conda environment.
pip install spark_esri-0.7-py3-none-any.whl --no-deps
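Whether the installation succeeded can be checked without starting Spark. The sketch below uses only the standard library; missing_modules is a hypothetical helper name, and the check only confirms that the import system can locate the module, not that it works.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module `names` that the import system
    cannot find in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# spark_esri is the module installed from the wheel above.
missing = missing_modules(["spark_esri"])
if missing:
    print("not installed in this environment:", ", ".join(missing))
else:
    print("spark_esri is available")
```

If the module is reported missing, a common cause is that pip was run in a different conda environment than the one the notebook uses.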
Use BDT3 in ArcGIS Pro#
Start ArcGIS Pro.
Create a new ArcGIS Notebook by clicking the “Insert” tab on the ribbon, and then clicking the “New Notebook” button.
Insert a cell into your notebook and paste the following code. Modify spark.jars and spark.submit.pyFiles to point to the BDT jar and zip respectively.
from spark_esri import spark_start, spark_stop

# spark_stop() may throw an exception if there is no existing Spark session.
# In that case, comment out the line below.
spark_stop()

config = {
    "spark.driver.memory": "4G",
    "spark.kryoserializer.buffer.max": "2024",
    "spark.jars": "C:\\Users\\%USERNAME%\\bdt3\\bdt-3.3.0-3.5.1-2.12.jar",
    "spark.submit.pyFiles": "C:\\Users\\%USERNAME%\\bdt-3.3.0.zip"
}
spark = spark_start(config=config)
NOTE: The file paths above must use double backslashes. Alternatively, prefix the string with “r” to make it a raw string, e.g., r"C:\Users\%USERNAME%\bdt3\bdt-3.3.0-3.5.1-2.12.jar".
Run the above cell, which will start Spark inside ArcGIS Pro. This might take a minute.
The last bootstrapping step is to authorize BDT3 with a valid license file. Insert another cell and paste the code shown below. Modify the file path to point to the location of your license.
import os
import bdt
from bdt import functions as F
from bdt import processors as P
bdt.auth(os.path.join("C:", os.sep, "<path>", "<to>", "bdt.lic"))
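A wrong path for the jar, zip, or license is easy to miss, so it can help to check the three files before starting Spark. The sketch below is an illustration only: missing_files is a hypothetical helper, the user name "me" is made up, and the paths are placeholders that need to be replaced with real locations. It also shows that a raw string and a double-backslash string spell the same path.

```python
import os

# Both spellings produce the same string. A plain "C:\Users\..." literal would
# be a syntax error because "\U" starts a Unicode escape.
assert r"C:\Users\me\bdt3" == "C:\\Users\\me\\bdt3"


def missing_files(paths, exists=os.path.isfile):
    """Return the entries of `paths` (a name -> path mapping) that do not exist.

    `exists` is injectable so the function can be tried without real files.
    """
    return {name: path for name, path in paths.items() if not exists(path)}


# Placeholder locations; replace them with the actual jar, zip and license.
paths = {
    "spark.jars": r"C:\Users\me\bdt3\bdt-3.3.0-3.5.1-2.12.jar",
    "spark.submit.pyFiles": r"C:\Users\me\bdt-3.3.0.zip",
    "license": r"C:\<path>\<to>\bdt.lic",
}
for name, path in missing_files(paths).items():
    print(f"{name}: file not found: {path}")
```

Running this in a notebook cell before the configuration cell prints one line per file that cannot be found and nothing when all three exist.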
Next Steps#
Addendum#
The spark-esri Python package is an open source project. Please see https://github.com/mraad/spark-esri for more information.