Static maps#

Static maps in geopandas#

We have already plotted basic static maps using geopandas during the previous weeks of this course. Remember that when using the plot() method in geopandas, we are actually using the tools available from matplotlib pyplot.

Let’s refresh our memory about the basics of plotting maps by creating a static accessibility map with roads and metro line on it (three layers on top of each other in the same figure). Before plotting the layers, we need to check that they are in the same coordinate reference system.

As usual, we start by importing the useful modules and reading in the input files:

import geopandas as gpd
from pyproj import CRS
import matplotlib.pyplot as plt
from pathlib import Path
import contextily as ctx

# Filepaths
data_dir = Path("../data")
grid_fp = data_dir / "TravelTimes_to_5975375_RailwayStation.shp"
roads_fp = data_dir / "roads.shp"
metro_fp = data_dir / "metro.shp"

# Read files
grid = gpd.read_file(grid_fp)
roads = gpd.read_file(roads_fp)
metro = gpd.read_file(metro_fp)

Let’s check the coordinate reference systems (crs) of the input files.

# Check the crs of each layer
print(roads.crs)
print(metro.crs)
print(grid.crs)
epsg:2392
epsg:2392
epsg:3067

Roads and the metro are in an old Finnish crs (EPSG:2392), while the grid is in ETRS89 / TM35FIN (EPSG:3067):

# Check CRS names
print("Roads crs:", CRS(roads.crs).name)
print("Metro crs:", CRS(metro.crs).name)
print("Grid crs: ", CRS(grid.crs).name)
Roads crs: KKJ / Finland zone 2
Metro crs: KKJ / Finland zone 2
Grid crs:  ETRS89 / TM35FIN(E,N)

Let’s re-project geometries to ETRS89 / TM35FIN based on the grid crs:

# Reproject geometries to ETRS89 / TM35FIN based on the grid crs:
roads = roads.to_crs(crs=grid.crs)
metro = metro.to_crs(crs=grid.crs)

Now the layers should be in the same crs

roads.crs == metro.crs == grid.crs
True

Once the data are in the same projection, we can plot them on a map.

Check your understanding

  • Make a visualization using the plot() -function in Geopandas

  • plot first the grid using “quantiles” classification scheme

  • then add roads and metro in the same plot

Plotting options for the polygon:

  • Define the classification scheme using the scheme parameter

  • Change the colormap using the cmap parameter. See colormap options from matplotlib documentation.

  • You can add a little bit of transparency for the grid using the alpha parameter (ranges from 0 to 1 where 0 is fully transparent)

Plotting options fo the lines:

For better control of the figure and axes, use the plt.subplots function before plotting the layers. See more info in matplotlib documentation.

# Create one subplot. Control figure size in here.
fig, ax = plt.subplots(figsize=(12, 8))

# Visualize the travel times into 9 classes using "Quantiles" classification scheme
grid.plot(
    ax=ax,
    column="car_r_t",
    linewidth=0.03,
    cmap="Spectral",
    scheme="quantiles",
    k=9,
    alpha=0.9,
)

# Add roads on top of the grid
# (use ax parameter to define the map on top of which the second items are plotted)
roads.plot(ax=ax, color="grey", linewidth=1.5)

# Add metro on top of the previous map
metro.plot(ax=ax, color="red", linewidth=2.5)

# Remove the empty white-space around the axes
plt.tight_layout()

# Save the figure as png file with resolution of 300 dpi
outfp = "static_map.png"
plt.savefig(outfp, dpi=300)
../../../_images/ccb111d7dab6d742662e4056fbcadad7d97da9b179760a4a423c98010fc2b6b8.png

Adding a legend#

It is possible to enable legend for a geopandas plot by setting legend=True in the plotting parameters. If plotting the figure without a classification scheme you get a color bar as the legend item and it is easy to add a label for the legend using legend_kwds. You can read more about creating a legend via geopandas in here.

# Create one subplot. Control figure size in here.
fig, ax = plt.subplots(figsize=(6, 4))

# Visualize the travel times into 9 classes using "Quantiles" classification scheme
grid.plot(
    ax=ax,
    column="car_r_t",
    linewidth=0.03,
    cmap="Spectral",
    alpha=0.9,
    legend=True,
    legend_kwds={"label": "Travel time (min)"},
)

# ax.get_legend().set_bbox_to_anchor(8)
# ax.get_legend().set_title("Legend title")

# Remove the empty white-space around the axes
plt.tight_layout()
../../../_images/991139eb6a556f428d04d5e6383ee9aa6f480dcfbe0e4fd0eb147365d0a27729.png

If plotting a map using a classification scheme, we get a different kind of ledend that shows the class values. In this case, we can control the position and title of the legend using matplotlib tools. We first need to access the Legend object and then change it’s properties.

# Create one subplot. Control figure size in here.
fig, ax = plt.subplots(figsize=(10, 5))

# Visualize the travel times into 9 classes using "Quantiles" classification scheme
grid.plot(
    ax=ax,
    column="car_r_t",
    linewidth=0.03,
    cmap="Spectral",
    scheme="quantiles",
    k=9,
    legend=True,
)

# Re-position the legend and set a title
ax.get_legend().set_bbox_to_anchor((1.3, 1))
ax.get_legend().set_title("Travel time (min)")

# Remove the empty white-space around the axes
plt.tight_layout()
../../../_images/3fe19623e1f685619c03506d532ff14fcf8847295f44fb4268b0adad71c6a957.png

You can read more info about adjusting legends in the matplotlig legend guide.

Adding basemap from external source#

It is often useful to add a basemap to your visualization that shows e.g. streets, placenames and other contextual information. This can be done easily by using ready-made background map tiles from different providers such as OpenStreetMap or Stamen Design. A Python library called contextily is a handy package that can be used to fetch geospatial raster files and add them to your maps. Map tiles are typically distributed in Web Mercator projection (EPSG:3857), hence it is often necessary to reproject all the spatial data into Web Mercator before visualizing the data.

In this tutorial, we will see how to add a basemap underneath our previous visualization. Check the input crs:

print(grid.crs)
epsg:3067

Reproject the layer to ESPG 3857 projection (Web Mercator):

# Reproject to EPSG 3857
data = grid.to_crs(epsg=3857)
print(data.crs)
epsg:3857

Now the crs is epsg:3857. Also the coordinate values in the geometry column have changed:

data.head(2)
car_m_d car_m_t car_r_d car_r_t from_id pt_m_d pt_m_t pt_m_tt pt_r_d pt_r_t pt_r_tt to_id walk_d walk_t geometry
0 32297 43 32260 48 5785640 32616 116 147 32616 108 139 5975375 32164 459 POLYGON ((2767221.646 8489079.101, 2766716.966...
1 32508 43 32471 49 5785641 32822 119 145 32822 111 133 5975375 29547 422 POLYGON ((2767726.329 8489095.521, 2767221.646...

Next, we can plot our data using geopandas and add a basemap for our plot by using a function called add_basemap() from contextily:

# Control figure size in here
fig, ax = plt.subplots(figsize=(12, 8))

# Plot the data
data.plot(
    ax=ax,
    column="pt_r_t",
    cmap="RdYlBu",
    linewidth=0,
    scheme="quantiles",
    k=9,
    alpha=0.6,
)

# Add basemap
ctx.add_basemap(ax)
../../../_images/0b8c33d8994d26d65549593bddbab1a6e96b31c2ecaabc5627aa2638f006daee.png

As we can see, now the map has a background map that is by default using the Stamen Terrain background from Stamen Design.

There are also various other possible data sources and styles for background maps.

Contextily’s tile_providers contain a list of providers and styles that can be used to control the appearence of your background map:

dir(ctx.providers)
['AzureMaps',
 'BasemapAT',
 'CartoDB',
 'CyclOSM',
 'Esri',
 'FreeMapSK',
 'Gaode',
 'GeoportailFrance',
 'HERE',
 'HEREv3',
 'HikeBike',
 'Hydda',
 'Jawg',
 'JusticeMap',
 'MapBox',
 'MapTiler',
 'MtbMap',
 'NASAGIBS',
 'NLS',
 'OPNVKarte',
 'OneMapSG',
 'OpenAIP',
 'OpenFireMap',
 'OpenRailwayMap',
 'OpenSeaMap',
 'OpenSnowMap',
 'OpenStreetMap',
 'OpenTopoMap',
 'OpenWeatherMap',
 'SafeCast',
 'Stadia',
 'Stamen',
 'Strava',
 'Thunderforest',
 'TomTom',
 'USGS',
 'WaymarkedTrails',
 'nlmaps']

There are multiple style options for most of these providers, for example:

ctx.providers.OpenStreetMap.keys()
dict_keys(['Mapnik', 'DE', 'CH', 'France', 'HOT', 'BZH', 'BlackAndWhite'])

It is possible to change the tile provider using the source -parameter in add_basemap() function. Let’s see how we can change the bacground map as the basic OpenStreetMap background:

# Control figure size in here
fig, ax = plt.subplots(figsize=(12, 8))

# Plot the data
data.plot(
    ax=ax,
    column="pt_r_t",
    cmap="RdYlBu",
    linewidth=0,
    scheme="quantiles",
    k=9,
    alpha=0.6,
)

# Add basemap with basic OpenStreetMap visualization
ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)
../../../_images/b88d3800600d97fff361a8a2870b5a60c6db2e5ee34162d9808c564fd66e78c4.png

Let’s take a subset of our data to see a bit better the background map characteristics:

# Control figure size in here
fig, ax = plt.subplots(figsize=(12, 8))

# Subset the data to seel only grid squares near the destination
subset = data.loc[(data["pt_r_t"] >= 0) & (data["pt_r_t"] <= 15)]

# Plot the data from subset
subset.plot(
    ax=ax,
    column="pt_r_t",
    cmap="RdYlBu",
    linewidth=0,
    scheme="quantiles",
    k=5,
    alpha=0.6,
)

# Add basemap with `OSM_A` style
ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)
../../../_images/806869c28222fca6a6f6bcb3c9e26c6f323b9fe1397539956c58fa0f5a46a0d8.png

As we can see now our map has much more details in it as the zoom level of the background map is larger. By default contextily sets the zoom level automatically but it is possible to also control that manually using parameter zoom. The zoom level is by default specified as auto but you can control that by passing in zoom level as numbers ranging typically from 1 to 19 (the larger the number, the more details your basemap will have).

  • Let’s reduce the level of detail from our map by passing zoom=11:

# Control figure size in here
fig, ax = plt.subplots(figsize=(12, 8))

# Plot the data from subset
subset.plot(
    ax=ax,
    column="pt_r_t",
    cmap="RdYlBu",
    linewidth=0,
    scheme="quantiles",
    k=5,
    alpha=0.6,
)

# Add basemap with `OSM_A` style using zoom level of 11
ctx.add_basemap(ax, zoom=11, source=ctx.providers.OpenStreetMap.Mapnik)
../../../_images/752ddd47524db5c7b1e7c2c1958cd920a8842ee30d10497f1def81c8cae97db4.png

As we can see, the map has now less detail (a bit too blurry for such a small area).

We can also use ax.set_xlim() and ax.set_ylim() -parameters to crop our map without altering the data. The parameters takes as input the coordinates for minimum and maximum on both axis (x and y). We can also change / remove the contribution text by using parameter attribution

Let’s add details about the data source, plot the original data, and crop the map:

credits = "Travel time data by Tenkanen & Toivonen (2020), Map Data © OpenStreetMap contributors"
# Control figure size in here
fig, ax = plt.subplots(figsize=(12, 8))

# Plot the data
data.plot(
    ax=ax,
    column="pt_r_t",
    cmap="RdYlBu",
    linewidth=0,
    scheme="quantiles",
    k=9,
    alpha=0.6,
)

# Add basemap with `OSM_A` style using zoom level of 11
# Modify the attribution
ctx.add_basemap(
    ax, zoom=11, attribution=credits, source=ctx.providers.OpenStreetMap.Mapnik
)

# Crop the figure
ax.set_xlim(2760000, 2800000)
ax.set_ylim(8430000, 8470000)
(8430000.0, 8470000.0)
../../../_images/3f316a7fc75f5775169c06b7026eac15e4b8bf2e7f676c18ac437b07b1dc4a25.png

It is also possible to use many other map tiles from different Tile Map Services as the background map. A good list of different available sources can be found from here. When using map tiles from different sources, it is necessary to parse a url address to the tile provider following a format defined by the provider.

Next, we will see how to use map tiles provided by CartoDB. To do that we need to parse the url address following their definition 'https://{s}.basemaps.cartocdn.com/{style}/{z}/{x}/{y}{scale}.png' where:

  • {s}: one of the available subdomains, either [a,b,c,d]

  • {z} : Zoom level. We support from 0 to 20 zoom levels in OSM tiling system.

  • {x},{y}: Tile coordinates in OSM tiling system

  • {scale}: OPTIONAL “@2x” for double resolution tiles

  • {style}: Map style, possible value is one of:

    • light_all,

    • dark_all,

    • light_nolabels,

    • light_only_labels,

    • dark_nolabels,

    • dark_only_labels,

    • rastertiles/voyager,

    • rastertiles/voyager_nolabels,

    • rastertiles/voyager_only_labels,

    • rastertiles/voyager_labels_under

  • We will use this information to parse the parameters in a way that contextily wants them:

# Control figure size in here
fig, ax = plt.subplots(figsize=(12, 8))

# The formatting should follow: 'https://{s}.basemaps.cartocdn.com/{style}/{z}/{x}/{y}{scale}.png'
# Specify the style to use
style = "rastertiles/voyager"
cartodb_url = "https://a.basemaps.cartocdn.com/%s/{z}/{x}/{y}.png" % style

# Plot the data from subset
subset.plot(
    ax=ax,
    column="pt_r_t",
    cmap="RdYlBu",
    linewidth=0,
    scheme="quantiles",
    k=5,
    alpha=0.6,
)

# Add basemap with `OSM_A` style using zoom level of 14
ctx.add_basemap(ax, zoom=14, attribution="", source=cartodb_url)

# Crop the figure
ax.set_xlim(2770000, 2785000)
ax.set_ylim(8435000, 8442500)
(8435000.0, 8442500.0)
../../../_images/68ea8fece11667a1ea04e4f48d1653e70c84a140f3f045662372e72c853f7b25.png

As we can see now we have yet again different kind of background map, now coming from CartoDB. Let’s make a minor modification and change the style from "rastertiles/voyager" to "dark_all":

# Control figure size in here
fig, ax = plt.subplots(figsize=(12, 8))

# The formatting should follow: 'https://{s}.basemaps.cartocdn.com/{style}/{z}/{x}/{y}{r}.png'
# Specify the style to use
style = "dark_all"
cartodb_url = "https://a.basemaps.cartocdn.com/%s/{z}/{x}/{y}.png" % style

# Plot the data from subset
subset.plot(
    ax=ax,
    column="pt_r_t",
    cmap="RdYlBu",
    linewidth=0,
    scheme="quantiles",
    k=5,
    alpha=0.6,
)

# Add basemap with `OSM_A` style using zoom level of 14
ctx.add_basemap(ax, zoom=13, attribution="", source=cartodb_url)

# Crop the figure
ax.set_xlim(2770000, 2785000)
ax.set_ylim(8435000, 8442500)
(8435000.0, 8442500.0)
../../../_images/f66cbd5aefd739edd515126c38f35aec334a3f74d1596060b18a37cd4aaf627e.png

Great! Now we have dark background map fetched from CartoDB. In a similar manner, you can use any map tiles from various other tile providers such as the ones listed in leaflet-providers.

Data reclassification#

Reclassifying data based on specific criteria is a common task when doing GIS analysis. The purpose of this lesson is to see how we can reclassify values based on some criteria. We could, for example, classify information based on travel times and housing prices using these criteria:

1. if travel time to my work is less than 30 minutes

    AND

    2. the rent of the apartment is less than 1000 € per month

    ------------------------------------------------------

    IF TRUE: ==> I go to view it and try to rent the apartment
    IF NOT TRUE: ==> I continue looking for something else

In this tutorial, we will:

  1. Use classification schemes from the PySAL mapclassify library to classify travel times into multiple classes.

  2. Create a custom classifier to classify travel times and distances in order to find out good locations to buy an apartment with these conditions:

    • good public transport accessibility to city center

    • bit further away from city center where the prices are presumably lower

Common classifiers#

Classification schemes for thematic maps#

PySAL -module is an extensive Python library for spatial analysis. It also includes all of the most common data classifiers that are used commonly e.g. when visualizing data. Available map classifiers in pysal’s mapclassify -module:

  • Box_Plot

  • Equal_Interval

  • Fisher_Jenks

  • Fisher_Jenks_Sampled

  • HeadTail_Breaks

  • Jenks_Caspall

  • Jenks_Caspall_Forced

  • Jenks_Caspall_Sampled

  • Max_P_Classifier

  • Maximum_Breaks

  • Natural_Breaks

  • Quantiles

  • Percentiles

  • Std_Mean

  • User_Defined

  • First, we need to read our Travel Time data from Helsinki:

import geopandas as gpd


# Read the GeoJSON file similarly as Shapefile
acc = data.copy()

# Let's see what we have
acc.head()
car_m_d car_m_t car_r_d car_r_t from_id pt_m_d pt_m_t pt_m_tt pt_r_d pt_r_t pt_r_tt to_id walk_d walk_t geometry
0 32297 43 32260 48 5785640 32616 116 147 32616 108 139 5975375 32164 459 POLYGON ((2767221.646 8489079.101, 2766716.966...
1 32508 43 32471 49 5785641 32822 119 145 32822 111 133 5975375 29547 422 POLYGON ((2767726.329 8489095.521, 2767221.646...
2 30133 50 31872 56 5785642 32940 121 146 32940 113 133 5975375 29626 423 POLYGON ((2768231.014 8489111.905, 2767726.329...
3 32690 54 34429 60 5785643 33233 125 150 33233 117 144 5975375 29919 427 POLYGON ((2768735.702 8489128.255, 2768231.014...
4 31872 42 31834 48 5787544 32127 109 126 32127 101 121 5975375 31674 452 POLYGON ((2765724.126 8488524.148, 2765219.488...

As we can see, there are plenty of different variables (see from here the description for all attributes) but what we are interested in are columns called pt_r_tt which is telling the time in minutes that it takes to reach city center from different parts of the city, and walk_d that tells the network distance by roads to reach city center from different parts of the city (almost equal to Euclidian distance).

The NoData values are presented with value -1.

  • Thus we need to remove the No Data values first.

# Include only data that is above or equal to 0
acc = acc.loc[acc["pt_r_tt"] >= 0]
  • Let’s plot the data and see how it looks like

  • cmap parameter defines the color map. Read more about choosing colormaps in matplotlib

  • scheme option scales the colors according to a classification scheme (requires mapclassify module to be installed):

import matplotlib.pyplot as plt

# Plot using 9 classes and classify the values using "Natural Breaks" classification
acc.plot(
    column="pt_r_tt",
    scheme="Natural_Breaks",
    k=9,
    cmap="RdYlBu",
    linewidth=0,
    legend=True,
)

# Use tight layout
plt.tight_layout()
../../../_images/3a1f57a6df87e6820cedecdd874fd65a8d7dd253d63888097490fc36cfb4514d.png

As we can see from this map, the travel times are lower in the south where the city center is located but there are some areas of “good” accessibility also in some other areas (where the color is red).

  • Let’s also make a plot about walking distances:

# Plot walking distance
acc.plot(
    column="walk_d",
    scheme="Natural_Breaks",
    k=9,
    cmap="RdYlBu",
    linewidth=0,
    legend=True,
)

# Use tight layour
plt.tight_layout()
../../../_images/835edb0a23e2f1f3af1be9fc7bd0b27d85a9520528656fb5866d37862d3c6a2d.png

Okay, from here we can see that the walking distances (along road network) reminds more or less Euclidian distances.

Applying classifiers to data#

As mentioned, the scheme option defines the classification scheme using pysal/mapclassify. Let’s have a closer look at how these classifiers work.

import mapclassify
  • Natural Breaks

mapclassify.NaturalBreaks(y=acc["pt_r_tt"], k=9)
NaturalBreaks           

    Interval       Count
------------------------
[  0.00,  30.00] |   837
( 30.00,  43.00] |  2361
( 43.00,  54.00] |  2453
( 54.00,  65.00] |  2088
( 65.00,  78.00] |  1761
( 78.00,  92.00] |  1694
( 92.00, 108.00] |  1087
(108.00, 129.00] |   557
(129.00, 199.00] |   182
  • Quantiles (default is 5 classes):

mapclassify.Quantiles(y=acc["pt_r_tt"])
Quantiles               

    Interval       Count
------------------------
[  0.00,  41.00] |  2674
( 41.00,  52.00] |  2576
( 52.00,  66.00] |  2623
( 66.00,  85.00] |  2616
( 85.00, 199.00] |  2531
  • It’s possible to extract the threshold values into an array:

classifier = mapclassify.NaturalBreaks(y=acc["pt_r_tt"], k=9)
classifier.bins
array([ 28.,  41.,  53.,  65.,  77.,  90., 105., 127., 199.])
  • Let’s apply one of the Pysal classifiers into our data and classify the travel times by public transport into 9 classes

  • The classifier needs to be initialized first with make() function that takes the number of desired classes as input parameter

# Create a Natural Breaks classifier
classifier = mapclassify.NaturalBreaks.make(k=9)
  • Now we can apply that classifier into our data by using apply -function

# Classify the data
classifications = acc[["pt_r_tt"]].apply(classifier)

# Let's see what we have
classifications.head()
pt_r_tt
0 8
1 8
2 8
3 8
4 7
type(classifications)
pandas.core.frame.DataFrame

Okay, so now we have a DataFrame where our input column was classified into 9 different classes (numbers 1-9) based on Natural Breaks classification.

  • We can also add the classification values directly into a new column in our dataframe:

# Rename the column so that we know that it was classified with natural breaks
acc["nb_pt_r_tt"] = acc[["pt_r_tt"]].apply(classifier)

# Check the original values and classification
acc[["pt_r_tt", "nb_pt_r_tt"]].head()
pt_r_tt nb_pt_r_tt
0 139 8
1 133 8
2 133 8
3 144 8
4 121 7

Great, now we have those values in our accessibility GeoDataFrame. Let’s visualize the results and see how they look.

# Plot
acc.plot(column="nb_pt_r_tt", linewidth=0, legend=True)

# Use tight layout
plt.tight_layout()
../../../_images/74c3822dee6ec77bcf680274f57a496fa1df12a041cedc6b6135815629d2eb1f.png

And here we go, now we have a map where we have used one of the common classifiers to classify our data into 9 classes.

Plotting a histogram#

A histogram is a graphic representation of the distribution of the data. When classifying the data, it’s always good to consider how the data is distributed, and how the classification shceme divides values into different ranges.

  • plot the histogram using pandas.DataFrame.plot.hist

  • Number of histogram bins (groups of data) can be controlled using the parameter bins:

# Histogram for public transport rush hour travel time
acc["pt_r_tt"].plot.hist(bins=50)
<AxesSubplot:ylabel='Frequency'>
../../../_images/bbdc122f37ec008e3ec17147e1777664e89737572d584f041cd30be8cb403625.png

Let’s also add threshold values on thop of the histogram as vertical lines.

  • Natural Breaks:

# Define classifier
classifier = mapclassify.NaturalBreaks(y=acc["pt_r_tt"], k=9)

# Plot histogram for public transport rush hour travel time
acc["pt_r_tt"].plot.hist(bins=50)

# Add vertical lines for class breaks
for value in classifier.bins:
    plt.axvline(value, color="k", linestyle="dashed", linewidth=1)
../../../_images/0bb8ab1df26cb611d59fb9fe0a901322cc9bbb43a831f5d9bf5e80cb9890eb24.png
  • Quantiles:

# Define classifier
classifier = mapclassify.Quantiles(y=acc["pt_r_tt"])

# Plot histogram for public transport rush hour travel time
acc["pt_r_tt"].plot.hist(bins=50)

for value in classifier.bins:
    plt.axvline(value, color="k", linestyle="dashed", linewidth=1)
../../../_images/106037962cca30d333f228f9b1ae83d7b9c5f1abbe356de9399eea007871bf90.png

Check your understanding

Select another column from the data (for example, travel times by car: car_r_t). Do the following visualizations using one of the classification schemes available from pysal/mapclassify:

  • histogram with vertical lines showing the classification bins

  • thematic map using the classification scheme

Creating a custom classifier#

Multicriteria data classification

Let’s create a function where we classify the geometries into two classes based on a given threshold -parameter. If the area of a polygon is lower than the threshold value (average size of the lake), the output column will get a value 0, if it is larger, it will get a value 1. This kind of classification is often called a binary classification.

First we need to create a function for our classification task. This function takes a single row of the GeoDataFrame as input, plus few other parameters that we can use.

It also possible to do classifiers with multiple criteria easily in Pandas/Geopandas by extending the example that we started earlier. Now we will modify our binaryClassifier function a bit so that it classifies the data based on two columns.

  • Let’s call it custom_classifier that does the binary classification based on two treshold values:

def custom_classifier(row, src_col1, src_col2, threshold1, threshold2, output_col):
    """Custom classirifer that can be applied on each row of a pandas dataframe (axis=1).

    This function classifies data based on values in two source columns and stores the output value in the output column.
    Output values is 1 if the value in src_col1 is LOWER than the threshold1 value AND the value in src_col2 is HIGHER than the threshold2 value.
    In all other cases, output value is 0.

    Args:
        row: one row of data
        src_col1: source column name associated with threshold1
        src_col2: source column name associated with threshold2
        threshold1: upper threshold value for src_col1
        threshold2: lower threshold value for src_col2
        output_col: output column name

    Returns:
        updated row of data.
    """

    # If condition is true, assign 1 into output column
    if row[src_col1] < threshold1 and row[src_col2] > threshold2:
        row[output_col] = 1

    # Else, assign 1 into output column
    else:
        row[output_col] = 0

    # Return the updated row
    return row

Now we have defined the function, and we can start using it.

  • Let’s do our classification based on two criteria and find out grid cells where the travel time is lower or equal to 20 minutes but they are further away than 4 km (4000 meters) from city center.

  • Let’s create an empty column for our classification results called "suitable_area".

# Create column for the classification results
acc["suitable_area"] = None

# Use the function
acc = acc.apply(
    custom_classifier,
    src_col1="pt_r_tt",
    src_col2="walk_d",
    threshold1=20,
    threshold2=4000,
    output_col="suitable_area",
    axis=1,
)

# See the first rows
acc.head()
car_m_d car_m_t car_r_d car_r_t from_id pt_m_d pt_m_t pt_m_tt pt_r_d pt_r_t pt_r_tt to_id walk_d walk_t geometry nb_pt_r_tt suitable_area
0 32297 43 32260 48 5785640 32616 116 147 32616 108 139 5975375 32164 459 POLYGON ((2767221.646 8489079.101, 2766716.966... 8 0
1 32508 43 32471 49 5785641 32822 119 145 32822 111 133 5975375 29547 422 POLYGON ((2767726.329 8489095.521, 2767221.646... 8 0
2 30133 50 31872 56 5785642 32940 121 146 32940 113 133 5975375 29626 423 POLYGON ((2768231.014 8489111.905, 2767726.329... 8 0
3 32690 54 34429 60 5785643 33233 125 150 33233 117 144 5975375 29919 427 POLYGON ((2768735.702 8489128.255, 2768231.014... 8 0
4 31872 42 31834 48 5787544 32127 109 126 32127 101 121 5975375 31674 452 POLYGON ((2765724.126 8488524.148, 2765219.488... 7 0

Okey we have new values in suitable_area -column.

  • How many Polygons are suitable for us? Let’s find out by using a Pandas function called value_counts() that return the count of different values in our column.

# Get value counts
acc["suitable_area"].value_counts()
0    13011
1        9
Name: suitable_area, dtype: int64

Okay, so there seems to be nine suitable locations for us where we can try to find an appartment to buy.

  • Let’s see where they are located:

# Plot
acc.plot(column="suitable_area", linewidth=0)

# Use tight layour
plt.tight_layout()
../../../_images/ff88bd7ac18e5ebf09b88ec19e83b5bff6a56b7438bdf4a077b32a1235512963.png

A-haa, okay so we can see that suitable places for us with our criteria seem to be located in the eastern part from the city center. Actually, those locations are along the metro line which makes them good locations in terms of travel time to city center since metro is really fast travel mode.

Other examples

Older course materials contain an example of applying a custom binary classifier on the Corine land cover data.