Setup in ArcGIS Pro

This section covers how to set up and use BDT3 in ArcGIS Pro using ArcGIS Notebooks. Running BDT3 in ArcGIS Pro allows for easy integration with arcpy and other Esri software. For more information about ArcGIS Notebooks in ArcGIS Pro, go here. To run BDT3 on Windows outside ArcGIS Pro, please see the “Setup in Windows” section.

Requirements

  • ArcGIS Pro >= 2.5.

  • The BDT3 jar and zip files

  • A license for BDT3

  • Spark Esri wheel file

Install and Set Up Spark + Hadoop

IMPORTANT: SKIP THIS SECTION if BDT3 is version 3.x and ArcGIS Pro is version 3.x. ArcGIS Pro 3.x includes Apache Spark, which is compatible with BDT3.

  1. Download and extract Spark and Hadoop from here. Be sure to verify the downloaded release per the documentation.

  2. Create a new system variable called SPARK_HOME and point it to the folder where you extracted the Spark and Hadoop download. Append %SPARK_HOME%\bin to PATH. See the “Setup in Windows” section for more information on how to set system variables in Windows. DO NOT place the Spark and Hadoop download in the Windows Program Files directory, because doing so causes issues with Spark and Hadoop.

  3. Create a new, empty folder at C:\Hadoop\bin. Visit this website, download the winutils.exe that matches your Hadoop version, and put winutils.exe in C:\Hadoop\bin.

  4. Create a new system variable called HADOOP_HOME and point it to C:\Hadoop. Append %HADOOP_HOME%\bin to PATH.
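After setting the variables, a quick check can confirm that new processes can see them. The following is a minimal sketch and is not part of BDT3 or spark-esri. The names check_spark_hadoop_env and _on_path are hypothetical helpers, and the Program Files test is a simple substring match. Run the check from a newly opened Python prompt, because changes to system variables are only visible to processes started after the change.

```python
import ntpath
import os


def _on_path(directory, var_name, path_entries):
    """True if `directory` (or the literal %VAR%\\bin form) is listed in PATH."""
    wanted = {directory.rstrip("\\").lower(), ("%" + var_name + "%\\bin").lower()}
    return any(entry in wanted for entry in path_entries)


def check_spark_hadoop_env(env=None, exists=os.path.exists):
    """Return a list of problems found in the Spark/Hadoop variables.

    `env` defaults to os.environ and `exists` to os.path.exists. Both can be
    swapped out so the check can be tried without a real installation.
    """
    env = os.environ if env is None else env
    # PATH on Windows is ';'-separated; normalize case and trailing backslashes.
    path_entries = [
        p.strip().rstrip("\\").lower()
        for p in env.get("PATH", "").split(";")
        if p.strip()
    ]
    problems = []

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    else:
        if "program files" in spark_home.lower():
            problems.append("SPARK_HOME is inside Program Files")
        if not _on_path(ntpath.join(spark_home, "bin"), "SPARK_HOME", path_entries):
            problems.append("%SPARK_HOME%\\bin is not on PATH")

    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    else:
        winutils = ntpath.join(hadoop_home, "bin", "winutils.exe")
        if not exists(winutils):
            problems.append("winutils.exe not found at " + winutils)
        if not _on_path(ntpath.join(hadoop_home, "bin"), "HADOOP_HOME", path_entries):
            problems.append("%HADOOP_HOME%\\bin is not on PATH")

    return problems
```

For example, print(check_spark_hadoop_env() or "No problems found") lists anything that still needs attention.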

IMPORTANT: The SPARK_HOME environment variable will override the Spark installation included with ArcGIS Pro.

Ensure that the Spark version in SPARK_HOME is compatible with your version of BDT. If SPARK_HOME is not set, ensure that the version of Spark included with ArcGIS Pro is compatible with your BDT version.
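The example BDT jar name used in this guide, bdt-3.3.0-3.5.1-2.12.jar, appears to encode the BDT version, the Spark version, and the Scala version. If that naming pattern holds for your jar, the sketch below compares the Spark major.minor version in the jar name with a Spark version string. Both the pattern and the helper names are assumptions. A major.minor match is only a heuristic, not an official compatibility rule, so confirm with the BDT team when in doubt.

```python
import re

# Assumed naming pattern, inferred from the example jar name in this guide:
# bdt-<BDT version>-<Spark version>-<Scala version>.jar
_JAR_PATTERN = re.compile(r"^bdt-(\d+\.\d+\.\d+)-(\d+\.\d+\.\d+)-(\d+\.\d+)\.jar$")


def parse_bdt_jar_name(file_name):
    """Split a BDT jar file name into (bdt_version, spark_version, scala_version)."""
    match = _JAR_PATTERN.match(file_name)
    if match is None:
        raise ValueError("Unrecognized BDT jar name: " + file_name)
    return match.groups()


def spark_minor_matches(file_name, installed_spark_version):
    """Heuristic: compare major.minor of the jar's Spark build with the installed Spark."""
    _, jar_spark, _ = parse_bdt_jar_name(file_name)
    return jar_spark.split(".")[:2] == installed_spark_version.split(".")[:2]
```

Once a Spark session is running, spark.version reports the Spark version actually in use. You can pass that value as the second argument, for example spark_minor_matches("bdt-3.3.0-3.5.1-2.12.jar", spark.version).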

Create an ArcGIS Pro Conda Environment for BDT3

This section creates a new conda environment in ArcGIS Pro’s Python distribution, dedicated to working with BDT and Apache Spark.

  1. Start the Python Command Prompt from the ArcGIS folder of your Windows Start menu. You are using the correct Python Command Prompt if its window title points to the proenv.exe executable located in the ArcGIS Pro installation directory.

  2. Switch to the default ArcGIS Pro conda environment, arcgispro-py3: proswap arcgispro-py3

  3. Remove any previously existing bdt3-py environment: conda remove --yes --all --name bdt3-py

  4. Create a new conda environment called bdt3-py by cloning arcgispro-py3: conda create --yes --name bdt3-py --clone arcgispro-py3

  5. Switch to the bdt3-py environment: proswap bdt3-py

  6. (DO THIS STEP ONLY FOR ArcGIS Pro 2.9) Define the PYSPARK_PYTHON system environment variable and point it to the python.exe in your bdt3-py conda environment, e.g., C:\Users\%USERNAME%\AppData\Local\ESRI\conda\envs\bdt3-py\python.exe.
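To confirm the result of these steps, the sketch below checks two things. First, it checks that the current interpreter belongs to the bdt3-py environment. Second, for ArcGIS Pro 2.9, it checks that PYSPARK_PYTHON points to that environment’s python.exe. The sketch assumes the cloned environment lives under %LOCALAPPDATA%\ESRI\conda\envs, which is the location used in the example path above (LOCALAPPDATA normally resolves to C:\Users\<name>\AppData\Local). The helper names are hypothetical.

```python
import ntpath
import os
import sys


def expected_env_python(local_app_data, env_name="bdt3-py"):
    """Path of python.exe in a cloned environment, assuming the layout
    %LOCALAPPDATA%\\ESRI\\conda\\envs\\<env_name>\\python.exe."""
    return ntpath.join(local_app_data, "ESRI", "conda", "envs", env_name, "python.exe")


def pyspark_python_ok(env=None, env_name="bdt3-py"):
    """True if PYSPARK_PYTHON points at the python.exe of `env_name`."""
    env = os.environ if env is None else env
    local_app_data = env.get("LOCALAPPDATA")
    configured = env.get("PYSPARK_PYTHON")
    if not local_app_data or not configured:
        return False
    expected = expected_env_python(local_app_data, env_name)
    # Windows paths are case-insensitive; normcase also unifies slashes.
    return ntpath.normcase(configured) == ntpath.normcase(expected)


def running_in_env(env_name="bdt3-py", prefix=None):
    """True if the current interpreter lives in an environment named `env_name`."""
    prefix = sys.prefix if prefix is None else prefix
    return ntpath.basename(ntpath.normpath(prefix)).lower() == env_name.lower()
```

After proswap bdt3-py, running_in_env() should return True in a Python session started from that prompt.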

Install the Spark Esri Module

This Python module is required to run BDT3 in ArcGIS Pro. It provides functions to start and stop your Spark session and to install BDT3.

  1. The BDT team will deliver a separate wheel file for installation. This file is only needed if you run BDT3 in ArcGIS Pro. Move the wheel file to the current directory, then run the command below in the bdt3-py conda environment:

pip install spark_esri-0.7-py3-none-any.whl --no-deps
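To confirm that pip installed spark_esri into the bdt3-py environment and not into another one, you can check whether the module is importable and which interpreter ran the check. This minimal sketch uses only the standard library. The names module_available and report_install are hypothetical helpers.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def report_install(name="spark_esri"):
    """Describe whether `name` is importable, and by which Python executable."""
    status = "found" if module_available(name) else "NOT found"
    return f"{name} {status} for interpreter {sys.executable}"
```

Running print(report_install()) in the bdt3-py prompt, and again later in an ArcGIS Notebook, should report the module as found and show a python.exe inside the bdt3-py environment.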

Use BDT3 in ArcGIS Pro

  1. Start ArcGIS Pro.

  2. Create a new ArcGIS Notebook by clicking the “Insert” tab on the ribbon, and then clicking the “New Notebook” button.

  3. Insert a cell into your notebook and paste the following code. Modify the spark.jars and spark.submit.pyFiles values to point to the BDT jar and zip files, respectively.

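import os
from spark_esri import spark_start, spark_stop

# Stop any Spark session left over from a previous run. spark_stop() may
# throw an exception if no Spark session exists, so that case is ignored.
try:
    spark_stop()
except Exception:
    pass

# Python does not expand %USERNAME% inside a string, so os.path.expandvars
# replaces it with your Windows user name before Spark sees the path.
config = {
    "spark.driver.memory": "4G",
    "spark.kryoserializer.buffer.max": "2024",
    "spark.jars": os.path.expandvars("C:\\Users\\%USERNAME%\\bdt3\\bdt-3.3.0-3.5.1-2.12.jar"),
    "spark.submit.pyFiles": os.path.expandvars("C:\\Users\\%USERNAME%\\bdt-3.3.0.zip"),
}

spark = spark_start(config=config)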

NOTE: The file paths above must use double backslashes, or be written as raw strings by inserting an “r” before the string, e.g., r"C:\Users\%USERNAME%\bdt3\bdt-3.3.0-3.5.1-2.12.jar".
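The backslash rule exists because, in a normal Python string, the "\U" in C:\Users starts a Unicode escape. Writing "C:\Users\..." without escaping is therefore a SyntaxError. Another common failure is a jar or zip path that does not exist. The sketch below is not part of spark-esri. It checks the paths in a config dictionary like the one above before spark_start is called. It assumes plain local file paths (not URIs) and expands %VAR% references, which os.path.expandvars does on Windows only. The name missing_config_files is a hypothetical helper.

```python
import os


def missing_config_files(config, keys=("spark.jars", "spark.submit.pyFiles")):
    """Return (key, path) pairs whose files do not exist.

    spark.jars and spark.submit.pyFiles accept comma-separated lists, so each
    entry is checked on its own. On Windows, %VAR% references are expanded
    first; other platforms only expand $VAR forms.
    """
    missing = []
    for key in keys:
        for entry in config.get(key, "").split(","):
            path = os.path.expandvars(entry.strip())
            if path and not os.path.isfile(path):
                missing.append((key, path))
    return missing


# Escaped and raw spellings produce the same string.
assert "C:\\Users\\bdt3" == r"C:\Users\bdt3"
```

For example, call missing_config_files(config) in the same cell before spark = spark_start(config=config). An empty list means every file was found.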

  4. Run the above cell, which starts Spark inside ArcGIS Pro. This might take a minute.

  5. The last bootstrapping step is to activate BDT3 with a valid license file. Insert another cell and paste the code shown below. Modify the file path to point to the location of your license.

import os
import bdt
from bdt import functions as F
from bdt import processors as P

# Replace <path> and <to> with the folders that contain your license file.
bdt.auth(os.path.join("C:", os.sep, "<path>", "<to>", "bdt.lic"))
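The os.sep argument in the license path matters. On Windows, joining "C:" directly with a folder name produces a drive-relative path such as C:Users\bdt.lic, which resolves against the current directory on drive C. The sketch below makes the construction explicit with ntpath, the module behind os.path on Windows, so it behaves the same on any OS. It also fails early with FileNotFoundError if the license file is missing. The names license_path and require_license are hypothetical helpers, not BDT APIs.

```python
import ntpath
import os


def license_path(*parts, drive="C:"):
    """Build an absolute Windows path such as C:\\<folder>\\bdt.lic.

    The explicit "\\" after the drive makes the path absolute; without it,
    ntpath.join("C:", "x") yields the drive-relative path "C:x".
    """
    return ntpath.join(drive, "\\", *parts)


def require_license(path, exists=os.path.isfile):
    """Return `path` unchanged, or raise FileNotFoundError if it is missing."""
    if not exists(path):
        raise FileNotFoundError(path)
    return path
```

For example, bdt.auth(require_license(license_path("licenses", "bdt.lic"))) would authorize with C:\licenses\bdt.lic. The licenses folder is only an illustration.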

Next Steps

Start Using BDT with Jupyter Notebooks

Visualize Data in ArcGIS Pro with BDT

Addendum

The spark-esri Python package is an open source project. Please see https://github.com/mraad/spark-esri for more information.