Setup for Network Analysis in AWS

The BDT Network Analysis Functions require an additional, separate license for the LMDB. If interested, please contact the BDT team at bdt_support@esri.com. This guide describes how to make the LMDB data available on a cluster.

Prerequisites

  • This guide assumes BDT is already installed and set up in Databricks on AWS. If it is not, see the setup guide for more information.

  • That setup includes configuring the cluster with an instance profile that allows access to Amazon S3.

  • For best performance, use a cluster with at least 64 GB of memory.

  • Obtain the file containing the LMDB from the BDT team.

    • Store this file in Amazon S3.

  • Download the init script and edit its name and contents to match the version, year, and quarter of the LMDB provided.

1. Set Up the Init Script

  1. Open the init script in a text editor and locate the line shown in the following snippet. Replace the placeholders in angle brackets with the path to the LMDB in S3 and with the version, year, and quarter associated with the LMDB provided:

aws s3 cp s3://<path>/<to>/<LMDB> /data/LMDB_<version>_<year>_<quarter> --recursive
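As a rough illustration of the substitution, the following Python sketch assembles the copy command from its parts. The helper name `build_lmdb_copy_command` and all sample values (bucket path, version, year, quarter) are hypothetical placeholders, not values supplied by BDT; substitute the ones that match the LMDB you received.

```python
def build_lmdb_copy_command(s3_path: str, version: str, year: str, quarter: str) -> str:
    """Return the `aws s3 cp` line for the init script (hypothetical helper)."""
    # The destination follows the /data/LMDB_<version>_<year>_<quarter> pattern
    # used by the init script line shown above.
    destination = f"/data/LMDB_{version}_{year}_{quarter}"
    # Drop any trailing slash so the source path is formatted consistently.
    source = s3_path.rstrip("/")
    return f"aws s3 cp {source} {destination} --recursive"


if __name__ == "__main__":
    # Every argument below is a made-up example value.
    print(build_lmdb_copy_command("s3://example-bucket/lmdb/LMDB_example", "X", "2024", "Q1"))
```

Whatever destination directory ends up in the init script is the same path that the `spark.bdt.lmdb.path` property must point to later in this guide.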

2. Upload the Script to the Databricks Workspace

  1. In Databricks, click Workspace in the left-hand menu.

  2. Click on the three dots in the upper right corner, then click Import.

  3. Drag-and-drop the init script that was just downloaded into the popup window, then click Import.

3. Add Script to Cluster

  1. While creating or editing the cluster, open Advanced Options and click the Init Scripts tab.

  2. Make sure the source is set to Workspace (this is the default). Then click on the folder icon.

  3. Browse to the init script that was just uploaded, select the script, then click Add.

4. Add Spark Properties

  1. Click the Spark tab under Advanced Options.

  2. Add the following properties to the Spark Config section, after updating the path with the version, year, and quarter of the LMDB provided:

spark.bdt.lmdb.path /data/LMDB_<version>_<year>_<quarter>
spark.bdt.lmdb.map.size 304857600000

  3. Click Confirm to save the changes.
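Before saving, it can be worth double-checking that both properties are present and that the path matches the destination used in the init script. The sketch below parses text in the space-separated `key value` form used in the Spark Config box. The helpers `parse_spark_config` and `missing_lmdb_keys` are hypothetical and not part of BDT or Databricks, and the sample path is a placeholder.

```python
REQUIRED_KEYS = ("spark.bdt.lmdb.path", "spark.bdt.lmdb.map.size")


def parse_spark_config(text: str) -> dict:
    """Parse 'key value' lines, one property per line, into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first space only; the remainder is the value.
        key, _, value = line.partition(" ")
        props[key] = value.strip()
    return props


def missing_lmdb_keys(props: dict) -> list:
    """Return the required LMDB properties that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


if __name__ == "__main__":
    # Placeholder config text; the path segment values are made up.
    sample = """
    spark.bdt.lmdb.path /data/LMDB_X_2024_Q1
    spark.bdt.lmdb.map.size 304857600000
    """
    props = parse_spark_config(sample)
    print(props)
    print("missing:", missing_lmdb_keys(props))
```

Once the cluster is running, calling `spark.conf.get("spark.bdt.lmdb.path")` in a notebook is one way to read the configured value back.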

5. Start the Cluster

  1. Click Start to restart the cluster.

  2. Once the cluster has been started, verify that the LMDB has been added by running the following in a notebook:

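from pyspark.sql.functions import col

from bdt.functions import st_fromText, st_drive_time

# Build a one-row DataFrame holding a test point as a BDT shape.
df = (spark
      .createDataFrame([("POINT (-13161875 4035019.53758)",)], ["WKT"])
      .select(st_fromText(col("WKT")).alias("SHAPE")))

# Run a drive-time query on the point and expand the returned array into rows.
out_df = (df
          .select(st_drive_time("SHAPE", 30, 125.0).alias("DT"))
          .selectExpr("inline(DT)"))

# If the LMDB is set up correctly, this prints the drive-time rows.
out_df.show()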
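If the drive-time query fails, it can help to confirm that the init script actually copied the data onto the node. The following is a minimal sketch, assuming the LMDB was copied to the directory named in `spark.bdt.lmdb.path`. The helper `describe_lmdb_dir` is hypothetical and the path shown is a placeholder. When run in a Python notebook cell, it inspects the driver node only.

```python
import os


def describe_lmdb_dir(path: str) -> dict:
    """Report whether a directory exists, which entries it holds, and their total file size."""
    exists = os.path.isdir(path)
    entries = sorted(os.listdir(path)) if exists else []
    # Sum only regular files; subdirectories are listed but not measured.
    total_bytes = sum(
        os.path.getsize(os.path.join(path, name))
        for name in entries
        if os.path.isfile(os.path.join(path, name))
    )
    return {"exists": exists, "files": entries, "total_bytes": total_bytes}


if __name__ == "__main__":
    # Placeholder path; use the value configured for spark.bdt.lmdb.path.
    print(describe_lmdb_dir("/data/LMDB_X_2024_Q1"))
```

A result with `exists` set to `False`, or with an empty file list, suggests that the copy step in the init script did not complete, in which case the S3 path and the instance profile permissions are the first things to check.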