Geospatial data can be stored and analyzed in the Exasol database using the GEOMETRY datatype. In this solution, we will show you some examples of how to work with geospatial data inside a Jupyter Notebook with the help of SQL inline magic and visualize geospatial data on map using Python libraries.
To run this demo a working Jupyter notebook is required with python version 2.7 or greater. After Python and Jupyter notebook installation, we need; ipython-sql library to run SQL from Jupyter notebook, SQL Alchemy dialect to connect to EXASOL and some additional python libraries for data visualization. GeoJSON files containing spatial data for New York City needs to be downloaded in the geojsonfiles folder.
Download the following GeoJSON files in geojsonfiles folder:
- New York City Streets data:
- New York City Borough boundaries data:
- New York City Neighborhood boundaries data:
IPython-sql libraray enables the use of Jupyter magic functions. With Jupyter magic functions, Jupyter notebooks can be used for data analysis with SQL on a database. Magic functions are pre-defined functions in Jupyter kernel that executes supplied commands. They are prefaced with %
character. Usage and installation instructions can be found here. After installation run the following command:
%load_ext sql
To connect to EXASOL install the SQLAlchemy dialect for EXASOL database. Installation instructions and project details can be found here
After installation, connect to EXASOL using the following command:
%sql exa+pyodbc://USER:PASSWORD@DSN
DSN should point to your ODBC installation. For EXASOL6.2 ODBC download and installation details visit EXASOL ODBC installation
Additional python libraries are used to process and visualize geospatial data. We make use of the following python libraries for this demo:
Extensions allow to enhance features of Jupyter Notebook. They are easy to install and configure using the Nbextensions configuration
page. We have used two extensions in our demo. Remember that the purpose of these extensions is to help visualize the results and are not required to run the visualizing_spatial_queries.ipynb demo.
Installation and configuration details for these extensions can be found here
This extension allows hiding of an individual cell. All the code segments that are not necessary for this particular demo are hidden for better visualization and usability.
Limits the output of a cell. This comes in handy as large result outputs can break the notebook. Limiting the output makes it easy to render results.
For the purpose of this demo we use NYC_UBER
and NYC_TAXI
schemas from demodb.exasol.com
.
Use the following command to open a schema
%sql open schema SCHEMA_NAME
Uber pickups data is stored in UBER_TAXI_DATA
table in NYC_UBER
schema. Use DESCRIBE
to get an overview of this table
%sql describe NYC_UBER.UBER_TAXI_DATA
New York City Taxi pickups data is stored in TRIPS
table in NYC_TAXI schema
. Use DESCRIBE
to get an overview of this table
%sql describe NYC_TAXI.TRIPS
Let's go briefly through the use cases implemented in visualizing_spatial_queries.ipynb
In the first use case, we use New York City data in NYC_UBER
schema to show Uber pickups per borough in New York City. We use Uber pickups data and NYC borough data to query the total number of Uber pickups per borough using inline SQL magic. To visualize New York City borough boundaries we use New York City Borough boundaries dataset.
In the second use case, we use New York City data in NYC_UBER
schema to show Uber pickups per neighborhood in New York City. We use Uber pickups data and NYC neighborhood data to query the total number of Uber pickups per neighborhood using inline SQL magic. To visualize New York City neighborhood boundaries we use New York City Neighborhood boundaries dataset.
In the third use case, we use NYC street data and NYC Uber pickup data to visualize top streets according to number of pickups. This data is stored in our demo database in NYC_UBER
schema. To visualize New York City neighborhood boundaries we use New York City Streets dataset. Example in this query can be parameterized to view different results on the map by providing a value for variable NumberOfStreets
In the fourth use case, we make a comparison between the number of Uber and Yellow Taxi pickups. For this example we have selected Museum of the New York City in Manhattan as a pickup point. We have used geocoding to find the latitude and longitude values of a given location. We have Uber data from April-Sept 2014. By changing the value for month
within this range we can visualize different sets of geospatial data on map. Radius defines the radius
value of given lat/long point. For speed purposes its recommended to keep radius value small.
GeoJSON files used for this demo were obtained from the following sources:
- NYC borough boundary polygons: http://data.beta.nyc/dataset/nyc-borough-boundaries
- NYC neighborhood boundary polygons: http://data.beta.nyc/dataset/nyc-neighborhood-boundaries
- NYC streets multi-line data: https://data.cityofnewyork.us/City-Government/NYC-Street-Centerline-CSCL-/exjm-f27b
Currently the following external resources are unavailable therefore you need to download required GeoJSON files.