Nearest neighbour analysis#
The idea of neighbourhood is one of the fundamental concepts in geographic data analysis and modelling. Being able to understand how close geographic objects are to each other, or which features are neighboring a specific location is fundamental to various spatial analysis techniques, such as spatial interpolation (which we cover in Chapter 10) or understanding whether there is spatial autocorrelation (i.e. clustering) in the data (see Chapters 6 and 7 in Rey et al., 2023). Many of these techniques rely on the idea that proximity in geographic space typically indicates also similarity in attribute space. For example, it is quite typical that a neighborhood with high population density is next to another neighborhood that also has high concentration of residents (i.e. population density tends to cluster). One of the most famous notions related to this is the First law of geography which states that “everything is related to everything, but near things are more related than distant things” (Tobler, 1970). Thus, being able to understand how close neighboring geographic features are, or which objects are the closest ones to specific location is an important task in GIS.
Figure 6.45 illustrates two common ways to find nearest neighbors to specific locations. In these examples, we have two Point datasets visualized with blue circles and red rectangles that are used for doing the nearest neighbor analysis. In the first example (top row), the idea is to find the closest geometry (rectangles) for all the points in the area. Here, the nearest neighbor is determined based on distance between the points and rectangles, and the nearest neighbors are visualized with a line from every point to the closest rectangle (on the right). The bottom row shows an example in which we aim to find the closest point for each rectangle, but in this case we also apply a maximum search distance that limits the search area. Only those points that are within the search area are considered when finding the nearest neighbor, while the points outside of this area are simply ignored. As a result, the point closest to a given rectangle is visualized with a connected line (on the right). In these examples, the geographic objects are simple point like features, but similar approach can be used with any geographic features, for example by finding closest LineString or Polygon geometry to a given Point, or by finding the closest Polygon to another Polygon. In these cases, the calculations are a bit more complicated, but the basic idea is the same.
Figure 6.45. The basic idea of finding a nearest neighbour based on geographic distance.
Quite often with very large datasets, we might want to limit the search area up to a specific maximum distance. This can be due to practical reasons as it can significantly speed up the computation time, or because we have specific reasoning that makes it sensible to limit the search area. For example, if we would aim to understand how easily accessible public transportation is to citizens living in a city, it would make sense to limit the search area e.g. up to 2 km from the homes of people, because people are not willing to walk for very long distances to reach a bus stop. It’s important to notice that the distances in the calculations are commonly based on the Euclidean distance, i.e. we calculate the distances based on coordinates on a Cartesian plain, meaning that the distances do not consider changes in height (i.e. third dimension is omitted). It is of course possible also to consider 3D distances, but the most typical Python tools ignore the height information.
Nearest neighbour analysis in Python#
In Python, there are various libraries that can be used to find nearest neighbors for given set of geometries, including geopandas
, shapely
, scipy
, scikit-learn
, and pysal
among others. Here, we first introduce how geopandas
can be used to find the nearest neighbors for all Point geometries in a given GeoDataFrame based on Points in another GeoDataFrame. Then we show how to find nearest neighbor between two Polygon datasets, and finally we show how to use scipy
library to find K-Nearest Neighbors (KNN) with Point data.
In the following, we go through a very practical example that relates to our daily commute: Where is the closest public transport stop from my place of residence? Hence, our aim is to search for each building point in the Helsinki Region the closest public transport stop. In geopandas, we can find nearest neighbors for all geometries in a given GeoDataFrame using the .sjoin_nearest()
method. To test it out, let’s start by reading two datasets representing buildings and stops into GeoDataFrames, and visualize them to understand a bit better what we have:
import geopandas as gpd
import matplotlib.pyplot as plt
stops = gpd.read_file("data/Helsinki/pt_stops_helsinki.gpkg")
building_points = gpd.read_file("data/Helsinki/building_points_helsinki.zip")
print("Number of stops:", len(stops))
stops.head(2)
Number of stops: 8377
stop_name | stop_lat | stop_lon | stop_id | geometry | |
---|---|---|---|---|---|
0 | Ritarihuone | 60.16946 | 24.95667 | 1010102 | POINT (24.95667 60.16946) |
1 | Kirkkokatu | 60.17127 | 24.95657 | 1010103 | POINT (24.95657 60.17127) |
print("Number of buildings:", len(building_points))
building_points.head(2)
Number of buildings: 158731
name | geometry | |
---|---|---|
0 | None | POINT (24.85584 60.20727) |
1 | Uimastadion | POINT (24.93045 60.18882) |
As we can see, both GeoDataFrames contain Point geometries. There seems to be approximately 8400 stops and almost 159 thousand buildings in our data. Hence, we have already a fair amount of data and calculations to do, to find the nearest neighbor for each building. Let’s still visualize the GeoDataFrames next to each other so that we can see them on a map:
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(15, 10))
# Plot buildings
building_points.plot(ax=ax1, markersize=0.2, alpha=0.5)
ax1.set_title("Buildings")
# Plot stops
stops.plot(ax=ax2, markersize=0.2, alpha=0.5, color="red")
ax2.set_title("Stops");
Figure 6.46. Maps representing the buildings and public transport stops which we use to find the closest stop for each building.
As mentioned earlier, finding the nearest geometries between two GeoDataFrames (here building and stop points) can be done easily using the .sjoin_nearest()
method in geopandas. As the name implies, this method is actually designed to merge data between GeoDataFrames in a similar manner as with regular .sjoin()
method. However, in this case the method is actually searching for the closest geometries instead of relying on spatial predicates, such as within. The sjoin_nearest()
can be used for different geometry types, so the input geometries do not necessarily need to be Point objects as in our example. Under the hood, the method uses a spatial index called STRTree
(Leutenegger et al., 1997) which is an efficient implementation of the R-tree dynamic index structure for spatial searching (Guttman, 1984). The STRTree is implemented in the shapely
library (used by geopandas
) and the technique makes the nearest neighbor queries very efficient. You can read more about spatial indices in Appendices section of the book. For the method to work properly, it is recommended to ensure that the both GeoDataFrames are having the same coordinate reference system (CRS), and preferably having a projected (metric) CRS because that ensures that the reported distances are meaningful (in meters) and correct. Hence, let’s start by reprojecting our latitude and longitude values into a metric system using the national EUREF-FIN coordinate reference system (EPSG code 3067) for Finland:
stops = stops.to_crs(epsg=3067)
building_points = building_points.to_crs(epsg=3067)
stops.head(2)
stop_name | stop_lat | stop_lon | stop_id | geometry | |
---|---|---|---|---|---|
0 | Ritarihuone | 60.16946 | 24.95667 | 1010102 | POINT (386623.301 6672037.884) |
1 | Kirkkokatu | 60.17127 | 24.95657 | 1010103 | POINT (386623.991 6672239.572) |
Now the GeoDataFrames are surely in the same coordinate system and we can see that the coordinates in the geometry
column have changed representing meters. Next, we will use the buildings.sjoin_nearest()
to find the closest stop for each building. Because we are interested to find the closest stop geometry for each building, the buildings
GeoDataFrame is the left hand side of the command. As inputs, we pass the stops
GeoDataFrame as well as give a name for a column which is used to store information about the distance between a given building and the closest stop (this is optional):
%time
closest = building_points.sjoin_nearest(stops, distance_col="distance")
closest
CPU times: user 3 μs, sys: 5 μs, total: 8 μs
Wall time: 19.1 μs
name | geometry | index_right | stop_name | stop_lat | stop_lon | stop_id | distance | |
---|---|---|---|---|---|---|---|---|
0 | None | POINT (381166.6 6676424.438) | 1131 | Muusantori | 60.20749 | 24.857450 | 1304138 | 92.679893 |
1 | Uimastadion | POINT (385236.565 6674238.472) | 467 | Auroran sairaala | 60.19145 | 24.925540 | 1171122 | 400.243370 |
2 | None | POINT (386317.478 6672100.648) | 61 | Senaatintori | 60.16901 | 24.950460 | 1020450 | 109.819633 |
3 | Hartwall Arena | POINT (385225.109 6676120.56) | 532 | Veturitie | 60.20661 | 24.929680 | 1174112 | 104.632434 |
4 | Talli | POINT (385079.733 6676989.745) | 496 | Posti 1 | 60.21345 | 24.917550 | 1172143 | 472.248282 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
158726 | None | POINT (373896.051 6677472.204) | 4621 | Samis | 60.21369 | 24.720970 | 3170209 | 195.675552 |
158727 | None | POINT (372425.65 6676945.528) | 4654 | Yrjö Liipolan tie | 60.20922 | 24.695470 | 3170244 | 137.137640 |
158728 | None | POINT (374696.625 6677972.738) | 4655 | Kandidaatintie | 60.21818 | 24.736987 | 3170245 | 135.341745 |
158729 | Granhultsskolan | POINT (373287.582 6677731.639) | 4624 | Uimahalli | 60.21638 | 24.711260 | 3170212 | 99.408108 |
158730 | Kauniaisten kirkko | POINT (374112.695 6677330.017) | 4665 | Postitori | 60.21267 | 24.728250 | 3170257 | 67.790422 |
159818 rows × 8 columns
As a result, we now have found the closest stop for each building including the attributes of the closest stops that were merged into the results. The last column in the table shows the distance in meters between a given building and the closest stop. The distance is only returned upon request as we did by specifying distance_col="distance"
. The column index_right
provides information about the index number of the closest stop in the stops
GeoDataFrame. If you look carefully, you can see that the number of rows in our result has actually increased slightly from the original (158731 vs 159818). This happens because for some geometries in the buildings
GeoDataFrame, the distance between the building and two (or more) stops have been exactly the same (i.e. they are equidistant). In such cases, the sjoin_nearest()
will store both records into the results by duplicating the building information and attaching information from the stops into separate rows accordingly. In some cases, this can cause trouble for further analysis, so it is good to be careful and investigate whether any duplicate buildings have appeared into the results. If this is the case, and if the duplicates cause issues in your analysis, you might need to pick one of them for further analysis based on some criteria. A simple way is to pick the first (or last) duplicate if you do not have any specific justification for making the selection.
The %time
command at the beginning of the cell provides us some details about the time it took to find the nearest neighbors and merge the data between the two GeoDataFrames. As we can see, the computations are very efficient taking only a matter of some microseconds for almost 159 thousand observations. We can make this even faster by specifying a max_distance
parameter that specifies the maximum search distance. Here, we specify the maximum distance as 100 meters from each building:
%time
closest_limited = building_points.sjoin_nearest(
stops, max_distance=100, distance_col="distance"
)
closest_limited
CPU times: user 5 μs, sys: 2 μs, total: 7 μs
Wall time: 11 μs
name | geometry | index_right | stop_name | stop_lat | stop_lon | stop_id | distance | |
---|---|---|---|---|---|---|---|---|
0 | None | POINT (381166.6 6676424.438) | 1131 | Muusantori | 60.207490 | 24.857450 | 1304138 | 92.679893 |
10 | None | POINT (384645.078 6669763.917) | 592 | Hernesaaren laituri | 60.148287 | 24.923281 | 1204101 | 57.786201 |
12 | None | POINT (384782.782 6669707.017) | 595 | Hernesaaren laituri | 60.148680 | 24.924240 | 1204108 | 79.844881 |
13 | None | POINT (384714.47 6669710.887) | 592 | Hernesaaren laituri | 60.148287 | 24.923281 | 1204101 | 32.640335 |
16 | None | POINT (385040.806 6670639.517) | 596 | Hernesaarenkatu | 60.156110 | 24.930370 | 1204109 | 87.888087 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
158718 | None | POINT (374219.973 6677006.1) | 4651 | Kauniaisten asema | 60.210830 | 24.729330 | 3170240 | 69.803673 |
158719 | None | POINT (374231.494 6676967.402) | 4642 | Kauniaistentie | 60.209810 | 24.731510 | 3170230 | 63.384115 |
158720 | None | POINT (374602.815 6677396.18) | 4673 | Raamattuopisto | 60.213524 | 24.736685 | 3170265 | 56.594370 |
158729 | Granhultsskolan | POINT (373287.582 6677731.639) | 4624 | Uimahalli | 60.216380 | 24.711260 | 3170212 | 99.408108 |
158730 | Kauniaisten kirkko | POINT (374112.695 6677330.017) | 4665 | Postitori | 60.212670 | 24.728250 | 3170257 | 67.790422 |
40128 rows × 8 columns
As we can see, there was a slight improvement in the execution time compared to the previous call without specifying the max_distance
parameter. The difference can be more significant if you have larger datasets or more complicated geometries (e.g. Polygons). One important aspect to notice from these results is that the number of rows has decreased significantly: from 160 to 40 thousand buildings. This happens because our search distance was very low (100 meters), and as a consequence, there were many buildings that did not have any stops within 100 meter radius from them. Because the default join type in sjoin_nearest
is inner
join, all the records that did not have a stop within 100 meters were dropped. If you would like to keep all the records in the results, to e.g. investigate which buildings do not have any stops within the search radius, you can add parameter how="left"
, which will retain all buildings from the original GeoDataFrame.
In some cases, you might actually want to connect the nearest neighbors to each other with a straight line. For doing this, we need to merge also the Point geometries from the other layer into our results, which can then be used to create a LineString connecting the points to each other. This can be useful for many purposes, but in our case, we want to do this to be able to validate whether our results are correct. For merging the closest stop geometries into our results, we can take advantage of the index_right
column in our table and conduct a normal table join using the .merge()
method. Below, we create a table join between the tables using the .merge()
and use the "index_right"
column in the closest
GeoDataFrame as a key on the left table while the index of the stops
is used as the key on the right table. Notice that we only keep the geometry
columns from the stops
GeoDataFrame because all the other attributes already exist in our results:
closest = closest.merge(
stops[[stops.active_geometry_name]], left_on="index_right", right_index=True
)
closest.head()
name | geometry_x | index_right | stop_name | stop_lat | stop_lon | stop_id | distance | geometry_y | |
---|---|---|---|---|---|---|---|---|---|
0 | None | POINT (381166.6 6676424.438) | 1131 | Muusantori | 60.20749 | 24.85745 | 1304138 | 92.679893 | POINT (381256.66 6676446.317) |
1 | Uimastadion | POINT (385236.565 6674238.472) | 467 | Auroran sairaala | 60.19145 | 24.92554 | 1171122 | 400.243370 | POINT (384973.331 6674539.973) |
2 | None | POINT (386317.478 6672100.648) | 61 | Senaatintori | 60.16901 | 24.95046 | 1020450 | 109.819633 | POINT (386277.25 6671998.462) |
3 | Hartwall Arena | POINT (385225.109 6676120.56) | 532 | Veturitie | 60.20661 | 24.92968 | 1174112 | 104.632434 | POINT (385255.784 6676220.595) |
4 | Talli | POINT (385079.733 6676989.745) | 496 | Posti 1 | 60.21345 | 24.91755 | 1172143 | 472.248282 | POINT (384607.679 6677003.267) |
As a result, we now brought a new column into our results, namely the geometry_y
. Because there was a column called geometry
in both GeoDataFrames, geopandas
automatically renamed the columns into geometry_x
and geometry_y
respectively.
Now we have all the data that we need to create a connecting LineString
between the buildings and the closest stops. We can do this by using the linestrings()
function of the shapely
library which is a fast (vectorized) way to create a number of LineString
objects based on point coordinates (the function only accepts numbers as input, i.e. not Point
objects). To extract the point coordinates from the Point
objects stored in the geometry_x
and geometry_y
columns, we use the .get_coordinates()
method of geopandas
that returns the x
and y
coordinates as Series
objects/columns. Then we convert these into numpy
arrays using the to_numpy()
method which we pass to the linestrings()
function. Finally, we store the resulting LineStrings
into a column geometry
which we set as the active geometry of the GeoDataFrame
:
from shapely import linestrings
closest["geometry"] = linestrings(
closest.geometry_x.get_coordinates().to_numpy(),
closest.geometry_y.get_coordinates().to_numpy(),
)
closest = closest.set_geometry("geometry")
closest.head()
name | geometry_x | index_right | stop_name | stop_lat | stop_lon | stop_id | distance | geometry_y | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | None | POINT (381166.6 6676424.438) | 1131 | Muusantori | 60.20749 | 24.85745 | 1304138 | 92.679893 | POINT (381256.66 6676446.317) | LINESTRING (381166.6 381256.66, 6676424.438 66... |
1 | Uimastadion | POINT (385236.565 6674238.472) | 467 | Auroran sairaala | 60.19145 | 24.92554 | 1171122 | 400.243370 | POINT (384973.331 6674539.973) | LINESTRING (385236.565 384973.331, 6674238.472... |
2 | None | POINT (386317.478 6672100.648) | 61 | Senaatintori | 60.16901 | 24.95046 | 1020450 | 109.819633 | POINT (386277.25 6671998.462) | LINESTRING (386317.478 386277.25, 6672100.648 ... |
3 | Hartwall Arena | POINT (385225.109 6676120.56) | 532 | Veturitie | 60.20661 | 24.92968 | 1174112 | 104.632434 | POINT (385255.784 6676220.595) | LINESTRING (385225.109 385255.784, 6676120.56 ... |
4 | Talli | POINT (385079.733 6676989.745) | 496 | Posti 1 | 60.21345 | 24.91755 | 1172143 | 472.248282 | POINT (384607.679 6677003.267) | LINESTRING (385079.733 384607.679, 6676989.745... |
Great! Now we have created a new geometry column that contains the lines between buildings and the closest stops. To better understand the results, let’s create a nice map that visualizes the buildings, stops and the connecting lines between the buildings and the closest stops in a single figure:
ax = closest.plot(lw=0.5, figsize=(10, 10))
ax = building_points.plot(ax=ax, color="red", markersize=2)
ax = stops.plot(ax=ax, color="black", markersize=8.5, marker="s")
# Zoom to specific area
ax.set_xlim(382000, 384100)
ax.set_ylim(6676000, 6678000);
Figure 6.47. A map showing the buildings (red points), the stops (black rectangles) and the lines between the buildings and the closest stops.
As we can see from the Figure 6.47, the nearest neighbor search have worked well as planned, and each building marked with red color has been correctly connected with a line to the closest stop. The map reveals that there are multiple isolated stops that do not have any buildings connected to them. As a practical example, this information could be used e.g. for transport planning by investigating whether these isolated stops are less used by citizens to get on board of the public transport vehicles. This information could again be used by transport planners to decide whether there is a need to maintain these isolated stops. Thus, with these rather simple computations, one can already provide useful information that has relevance in real life. Finally, because we have calculated the distance between buildings and the stops, it is easy to do some descriptive analysis based on this data providing information about levels of access to public transport in the region:
closest["distance"].describe()
count 159818.000000
mean 229.029606
std 292.348698
min 0.743490
25% 99.771301
50% 163.805853
75% 260.461391
max 7698.270635
Name: distance, dtype: float64
As we can see, the average distance to public transport in the region is around 230 meters. More than 75 % of the buildings seem to be within within 3.5 minute walking time (~260 meters with walking speed of 4.5 kmph) which indicates very good situation in terms of accessibility levels in the region overall. There seem to be some really remote buildings in the data as well, as the longest distance to closest public transport stop is more than 7 kilometers.
Nearest neighbors with Polygon and LineString data#
In some cases, you might need to find the closest neighbors for a given set of Polygons or LineStrings. Luckily, the sjoin_nearest()
method works in a similar manner with all geometry types, i.e. you can find the nearest neighbors using Point, LineString, Polygon, MultiPoint, MultiLineString and MultiPoint geometries as input. Also finding nearest neighbors between different geometry types is supported, meaning that you can for example search nearest LineStrings to Polygons, and so on. When using more complex geometries as input (e.g. LineStrings or Polygons), the nearest neighbor search uses spatial index, i.e. it creates bounding boxes around the input geometries and inserts them into an R-Tree which is used to make the search queries more efficient. However, the distance between the nearest neighbours is measured based on the true shapes of the geometric features. In the following, we demonstrate how to conduct nearest neighbor analysis with more complex geometries, such as Polygons and LineStrings.
As a real-life case, we first aim to find the closest urban park to building polygons in a neighborhood called Kamppi, which is located in Helsinki, Finland. Then, we aim to find the closest drivable road (LineString) to each building. Let’s start by reading the data and visualize it on a map:
import geopandas as gpd
buildings = gpd.read_file("data/Helsinki/Kamppi_buildings.gpkg")
parks = gpd.read_file("data/Helsinki/Kamppi_parks.gpkg")
roads = gpd.read_file("data/Helsinki/Kamppi_roads.gpkg")
buildings
osmid | building | name | geometry | |
---|---|---|---|---|
0 | 11711721042 | yes | Nice Bike Pyörähuolto | POINT (384966.661 6671503.786) |
1 | 8035238 | public | Lasipalatsi | POLYGON ((385459.65 6672184.469, 385456.356 66... |
2 | 8042297 | yes | Radisson Blu Royal | POLYGON ((385104.154 6671916.693, 385101.584 6... |
3 | 14797170 | school | None | POLYGON ((384815.326 6671762.71, 384815.792 66... |
4 | 14797171 | yes | None | POLYGON ((384797.759 6671853.253, 384798.253 6... |
... | ... | ... | ... | ... |
450 | 8092998 | yes | None | POLYGON ((384747.465 6671811.996, 384744.27 66... |
451 | 8280536 | apartments | None | POLYGON ((384839.007 6671934.815, 384839.485 6... |
452 | 8525159 | civic | None | POLYGON ((385495.275 6672164.009, 385494.928 6... |
453 | 8525161 | civic | None | POLYGON ((385486.225 6672173.653, 385486.717 6... |
454 | 8535506 | civic | None | POLYGON ((385481.13 6672167.861, 385482.372 66... |
455 rows × 4 columns
# Plot buildings, parks and roads
ax = buildings.plot(color="gray", figsize=(10, 10))
ax = parks.plot(ax=ax, color="green")
ax = roads.plot(ax=ax, color="red")
Figure 6.48. A map showing the buildings with gray color and the parks (green) in the neighborhood of Kamppi, Helsinki.
Similarly as finding the nearest neighbor using Points as input data, we can use the .sjoin_nearest()
to find nearest neighbor between two Polygon datasets. Here, we find the nearest park for each building Polygon and store the distance into the column distance
:
nearest_parks = buildings.sjoin_nearest(parks, distance_col="distance")
nearest_parks
osmid_left | building | name_left | geometry | index_right | osmid_right | leisure | name_right | distance | |
---|---|---|---|---|---|---|---|---|---|
0 | 11711721042 | yes | Nice Bike Pyörähuolto | POINT (384966.661 6671503.786) | 12 | 1227991181 | park | Kaartin lasaretin puisto | 100.208527 |
1 | 8035238 | public | Lasipalatsi | POLYGON ((385459.65 6672184.469, 385456.356 66... | 1 | 8042613 | park | Simonpuistikko | 16.284929 |
2 | 8042297 | yes | Radisson Blu Royal | POLYGON ((385104.154 6671916.693, 385101.584 6... | 8 | 37390082 | park | None | 40.039501 |
3 | 14797170 | school | None | POLYGON ((384815.326 6671762.71, 384815.792 66... | 5 | 26999855 | park | None | 0.000000 |
4 | 14797171 | yes | None | POLYGON ((384797.759 6671853.253, 384798.253 6... | 5 | 26999855 | park | None | 14.873403 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
450 | 8092998 | yes | None | POLYGON ((384747.465 6671811.996, 384744.27 66... | 5 | 26999855 | park | None | 70.819624 |
451 | 8280536 | apartments | None | POLYGON ((384839.007 6671934.815, 384839.485 6... | 8 | 37390082 | park | None | 38.574646 |
452 | 8525159 | civic | None | POLYGON ((385495.275 6672164.009, 385494.928 6... | 1 | 8042613 | park | Simonpuistikko | 32.792083 |
453 | 8525161 | civic | None | POLYGON ((385486.225 6672173.653, 385486.717 6... | 1 | 8042613 | park | Simonpuistikko | 90.919207 |
454 | 8535506 | civic | None | POLYGON ((385481.13 6672167.861, 385482.372 66... | 1 | 8042613 | park | Simonpuistikko | 87.821936 |
455 rows × 9 columns
print("Maximum distance:", nearest_parks["distance"].max().round(0))
print("Average distance:", nearest_parks["distance"].mean().round(0))
Maximum distance: 229.0
Average distance: 61.0
Now we have found the nearest park for each building, and as we can see on average the closest park seem to be 61 meters away from the buildings while the longest distance from one of the buildings to the closest park seems to be 229 meters. In a similar manner, we can also find the nearest road from each building as follows:
nearest_roads = buildings.sjoin_nearest(roads, distance_col="distance")
nearest_roads
osmid_left | building | name_left | geometry | index_right | osmid_right | name_right | highway | distance | |
---|---|---|---|---|---|---|---|---|---|
0 | 11711721042 | yes | Nice Bike Pyörähuolto | POINT (384966.661 6671503.786) | 182 | [126894680, 126894676, 126894678, 126894679] | Eerikinkatu | residential | 11.181066 |
0 | 11711721042 | yes | Nice Bike Pyörähuolto | POINT (384966.661 6671503.786) | 24 | [126894680, 126894676, 126894678, 126894679] | Eerikinkatu | residential | 11.181066 |
1 | 8035238 | public | Lasipalatsi | POLYGON ((385459.65 6672184.469, 385456.356 66... | 15 | [42574048, 42574049, 28920739, 77891210, 26999... | Arkadiankatu | secondary | 52.015824 |
1 | 8035238 | public | Lasipalatsi | POLYGON ((385459.65 6672184.469, 385456.356 66... | 33 | [42574048, 42574049, 28920739, 77891210, 26999... | Arkadiankatu | secondary | 52.015824 |
2 | 8042297 | yes | Radisson Blu Royal | POLYGON ((385104.154 6671916.693, 385101.584 6... | 83 | [37135576, 8035726, 37135575] | Salomonkatu | residential | 6.659959 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
452 | 8525159 | civic | None | POLYGON ((385495.275 6672164.009, 385494.928 6... | 107 | 51707742 | Yrjönkatu | residential | 88.553223 |
453 | 8525161 | civic | None | POLYGON ((385486.225 6672173.653, 385486.717 6... | 15 | [42574048, 42574049, 28920739, 77891210, 26999... | Arkadiankatu | secondary | 90.569914 |
453 | 8525161 | civic | None | POLYGON ((385486.225 6672173.653, 385486.717 6... | 33 | [42574048, 42574049, 28920739, 77891210, 26999... | Arkadiankatu | secondary | 90.569914 |
454 | 8535506 | civic | None | POLYGON ((385481.13 6672167.861, 385482.372 66... | 15 | [42574048, 42574049, 28920739, 77891210, 26999... | Arkadiankatu | secondary | 96.128437 |
454 | 8535506 | civic | None | POLYGON ((385481.13 6672167.861, 385482.372 66... | 33 | [42574048, 42574049, 28920739, 77891210, 26999... | Arkadiankatu | secondary | 96.128437 |
703 rows × 9 columns
As a result, we now have found the nearest road for each building. We have now 703 rows of data which means that for some buildings there have been more than one road that are exactly the same distance apart. To better understand how the spatial join between the buildings and roads have been conducted, we can again visualize the nearest neighbors with a straight line. To do this, we first bring the geometries from the roads
GeoDataFrame into the same table with the buildings:
nearest_roads = nearest_roads.merge(
roads[["geometry"]], left_on="index_right", right_index=True
)
nearest_roads.head(3)
osmid_left | building | name_left | geometry_x | index_right | osmid_right | name_right | highway | distance | geometry_y | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 11711721042 | yes | Nice Bike Pyörähuolto | POINT (384966.661 6671503.786) | 182 | [126894680, 126894676, 126894678, 126894679] | Eerikinkatu | residential | 11.181066 | LINESTRING (385040.141 6671566.384, 385034.832... |
0 | 11711721042 | yes | Nice Bike Pyörähuolto | POINT (384966.661 6671503.786) | 24 | [126894680, 126894676, 126894678, 126894679] | Eerikinkatu | residential | 11.181066 | LINESTRING (384942.149 6671500.856, 384950.743... |
1 | 8035238 | public | Lasipalatsi | POLYGON ((385459.65 6672184.469, 385456.356 66... | 15 | [42574048, 42574049, 28920739, 77891210, 26999... | Arkadiankatu | secondary | 52.015824 | LINESTRING (385285.226 6672266.801, 385296.799... |
Now we have the geometry_x
column representing the building geometries and the geometry_y
column representing the road geometries (LineStrings). To visualize the connecting lines between buildings and roads, we first need to create geometries that connect the building and closest road geometry from the locations where the distance is shortest. To do this, we can take advantage of a handy function called shortest_line()
from the shapely
library that returns a LineString object between the input geometries showing the shortest distance between them. Based on these, we can create a connecting line between a given building and the closest road. Finally, we create a new GeoDataFrame
called connectors
out of these lines and also store the length of the LineString
s as a separate column:
from shapely import shortest_line
# Generate LineString between nearest points of two geometries
connectors = nearest_roads.apply(
lambda row: shortest_line(row["geometry_x"], row["geometry_y"]), axis=1
)
# Create a new GeoDataFrame out of these geometries
connectors = gpd.GeoDataFrame({"geometry": connectors}, crs=roads.crs)
connectors["distance"] = connectors.length
connectors.head()
geometry | distance | |
---|---|---|
0 | LINESTRING (384966.661 6671503.786, 384960.444... | 11.181066 |
0 | LINESTRING (384966.661 6671503.786, 384960.444... | 11.181066 |
1 | LINESTRING (385487.966 6672217.975, 385460.972... | 52.015824 |
1 | LINESTRING (385487.966 6672217.975, 385460.972... | 52.015824 |
2 | LINESTRING (385050.507 6671936.92, 385046.795 ... | 6.659959 |
Great, now we have a new GeoDataFrame that represents the connectors between the buildings and the drivable roads. Finally, we can visualize the buildings, roads and these connectors to better understand the exact points where the distance between a given building and the closest road is shortest:
m = buildings.explore(color="gray", tiles="CartoDB Positron")
m = roads.explore(m=m, color="red")
m = connectors.explore(m=m, color="green")
m