Geocoding is the process of transforming place names or addresses into coordinates. In this section we will learn how to geocode addresses using geopandas and geopy [1].
Geopy and other geocoding libraries (such as geocoder) make it easy to locate the coordinates of addresses, cities, countries, and landmarks across the globe using web services (“geocoders”). In practice, geocoders are often Application Programming Interfaces (APIs) to which you can send requests and from which you receive responses in the form of place names, addresses and coordinates. Geopy offers access to several geocoding services, such as Photon [2] and Nominatim [3], which rely on data from OpenStreetMap, among various other services. Check the geopy documentation [1] for a list of supported geocoding services and usage details.
In this lesson we will use the Nominatim geocoder for locating a relatively small number of addresses. The Nominatim API is not meant for heavy use. Nominatim doesn’t require an API key, but use of the service is limited to a maximum of 1 request per second (3600 / hour). Users also need to provide an identifier for their application and give appropriate attribution for using OpenStreetMap data. You can read more about the Nominatim usage policy here [4]. When using Nominatim via geopy, we can specify a custom user_agent parameter to identify our application, and we can add a timeout to allow enough time to get the response from the service.
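As a rough illustration of these settings, the sketch below builds a Nominatim geocoder directly with geopy and wraps it in geopy's RateLimiter so that consecutive calls are at least one second apart. The helper names and the application name "my_geocoding_app" are hypothetical and chosen only for this example; the calls that would contact the live service are left commented out.

```python
def build_nominatim_geocoder(user_agent, timeout=10, min_delay_seconds=1.0):
    """Return a rate-limited Nominatim geocoding function (requires geopy)."""
    # Imported lazily so the helper below also works without geopy installed.
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    # user_agent identifies the application; timeout is given in seconds.
    geolocator = Nominatim(user_agent=user_agent, timeout=timeout)
    # RateLimiter waits at least `min_delay_seconds` between consecutive calls.
    return RateLimiter(geolocator.geocode, min_delay_seconds=min_delay_seconds)


def min_total_wait(n_requests, min_delay_seconds=1.0):
    """Lower bound (seconds) on the waiting time for n sequential requests."""
    return max(n_requests - 1, 0) * min_delay_seconds


# Example usage (commented out because it sends requests to the live service):
# geocode_one = build_nominatim_geocoder(user_agent="my_geocoding_app")
# location = geocode_one("Kaivokatu 8, 00101 Helsinki, Finland")
# print(location.latitude, location.longitude)
```

With a one-second delay, the waiting time grows linearly with the number of addresses, which is one reason why the service suits only small batches.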
We will geocode addresses stored in a text file called addresses.txt. These addresses are located in the Helsinki Region in Southern Finland. The first rows of the data look like this:
id;addr
1000;Itämerenkatu 14, 00101 Helsinki, Finland
1001;Kampinkuja 1, 00100 Helsinki, Finland
1002;Kaivokatu 8, 00101 Helsinki, Finland
1003;Hermannin rantatie 1, 00580 Helsinki, Finland
We have an id for each row and an address in the column addr. Let’s first read the data into a pandas DataFrame using the read_csv() function:
# Import necessary modules
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Filepath
fp = "data/Helsinki/addresses.txt"

# Read the data
data = pd.read_csv(fp, sep=";")
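The sep=";" argument matters because each address itself contains commas. The following sketch uses Python's standard csv module on the first two rows of the file to illustrate the same splitting behaviour. It is an illustration only, not a replacement for the read_csv() call.

```python
import csv
import io

# First rows of addresses.txt, copied from the sample shown earlier.
sample = """id;addr
1000;Itämerenkatu 14, 00101 Helsinki, Finland
1001;Kampinkuja 1, 00100 Helsinki, Finland
"""

# Splitting on semicolons keeps each address in one piece.
rows = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
print(rows[0]["id"], "|", rows[0]["addr"])

# Splitting the first data row on commas (the csv default) breaks it apart.
first_data_line = sample.splitlines()[1]
comma_fields = next(csv.reader(io.StringIO(first_data_line)))
print(len(comma_fields), "fields when splitting on commas")
```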
Let’s check that we imported the file correctly.
|   | id | addr |
|---|----|------|
| 0 | 1000 | Itämerenkatu 14, 00101 Helsinki, Finland |
| 1 | 1001 | Kampinkuja 1, 00100 Helsinki, Finland |
| 2 | 1002 | Kaivokatu 8, 00101 Helsinki, Finland |
| 3 | 1003 | Hermannin rantatie 1, 00580 Helsinki, Finland |
| 4 | 1005 | Tyynenmerenkatu 9, 00220 Helsinki, Finland |
Now we have our data in a DataFrame and we can geocode our addresses using the geocoding function in geopandas, which uses the geopy package in the background. The function geocodes a list of addresses (strings) and returns a GeoDataFrame with the geocoded result.
Here we import the geocoding function and geocode the addresses using Nominatim. The addresses are in the column addr. Note that we need to provide a custom string (the name of your application) in the user_agent parameter. We can also add the timeout parameter to specify how many seconds to wait for a response from the service.
# Import the geocoding tool
from geopandas.tools import geocode

# Geocode addresses using Nominatim.
# You can provide your own application name in user_agent.
geo = geocode(
    data["addr"], provider="nominatim", user_agent="pythongis_book", timeout=10
)
|   | geometry | address |
|---|----------|---------|
| 0 | POINT (24.91556 60.16320) | Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns... |
| 1 | POINT (24.93166 60.16905) | Kamppi, 1, Kampinkuja, Kamppi, Eteläinen suurp... |
| 2 | POINT (24.94168 60.16996) | Bangkok9, 8, Kaivokatu, Keskusta, Kluuvi, Etel... |
| 3 | POINT (24.97783 60.18892) | Hermannin rantatie, Verkkosaari, Kalasatama, S... |
| 4 | POINT (24.92151 60.15662) | 9, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E... |
And voilà! As a result we have a GeoDataFrame that contains an address column with the geocoded addresses and a geometry column containing Point objects representing the geographic locations of the addresses. Notice that these addresses are not the original addresses, but those identified by Nominatim. We can join the data from the original text file to the geocoded result to bring along the address ids and the original addresses.
In this case, we can join the information using the .join() method because the original DataFrame and the geocoded output have an identical index and an identical number of rows.
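That assumption can be checked before joining. The sketch below uses two small stand-in tables (geo_demo holds placeholder values, not real geocoding output) and a hypothetical helper that refuses to join when the indexes differ. It is one possible safeguard, not part of the geopandas API.

```python
import pandas as pd

# Stand-in for the original table: two rows copied from addresses.txt.
data_demo = pd.DataFrame(
    {
        "id": [1000, 1001],
        "addr": [
            "Itämerenkatu 14, 00101 Helsinki, Finland",
            "Kampinkuja 1, 00100 Helsinki, Finland",
        ],
    }
)

# Stand-in for the geocoded output (placeholder strings, same index).
geo_demo = pd.DataFrame(
    {"address": ["placeholder result A", "placeholder result B"]},
    index=data_demo.index,
)


def join_on_identical_index(left, right):
    """Join two tables only if their indexes match exactly (hypothetical helper)."""
    if not left.index.equals(right.index):
        raise ValueError("Indexes differ; join on a shared key column instead.")
    return left.join(right)


joined_demo = join_on_identical_index(geo_demo, data_demo)
print(joined_demo)
```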
join = geo.join(data)
|   | geometry | address | id | addr |
|---|----------|---------|----|------|
| 0 | POINT (24.91556 60.16320) | Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns... | 1000 | Itämerenkatu 14, 00101 Helsinki, Finland |
| 1 | POINT (24.93166 60.16905) | Kamppi, 1, Kampinkuja, Kamppi, Eteläinen suurp... | 1001 | Kampinkuja 1, 00100 Helsinki, Finland |
| 2 | POINT (24.94168 60.16996) | Bangkok9, 8, Kaivokatu, Keskusta, Kluuvi, Etel... | 1002 | Kaivokatu 8, 00101 Helsinki, Finland |
| 3 | POINT (24.97783 60.18892) | Hermannin rantatie, Verkkosaari, Kalasatama, S... | 1003 | Hermannin rantatie 1, 00580 Helsinki, Finland |
| 4 | POINT (24.92151 60.15662) | 9, Tyynenmerenkatu, Jätkäsaari, Länsisatama, E... | 1005 | Tyynenmerenkatu 9, 00220 Helsinki, Finland |
Here we can see the geocoded address (column address) and the original address (column addr) side by side and verify that the results look correct for the first five rows. Note that in some cases, Nominatim has identified a specific point of interest, such as a restaurant, as the exact location. Finally, we can save the geocoded addresses to a file.
# Output file path
outfp = "data/Helsinki/addresses.shp"

# Save to Shapefile
join.to_file(outfp)
That’s it. Now we have successfully geocoded those addresses into Points and made a Shapefile out of them. Easy, isn’t it?
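For more than a handful of rows, comparing the addr and address columns by eye becomes impractical. One rough heuristic, sketched below with hypothetical helper names, is to check whether the street name of the original address appears in the geocoded address. It assumes addresses of the form "Street 12, 00100 City, Country" and will miss legitimate matches that are spelled differently.

```python
import re


def street_name(address):
    """Return the street name from an address like 'Street 12, 00100 City, Country'."""
    first_part = address.split(",")[0]  # e.g. 'Itämerenkatu 14'
    # Drop a trailing house number such as ' 14' or ' 8b'.
    return re.sub(r"\s+\d+\w*$", "", first_part).strip()


def looks_consistent(original, geocoded):
    """True if the original street name occurs in the geocoded address."""
    return street_name(original).casefold() in geocoded.casefold()


# The geocoded strings below are shortened versions of the results shown earlier.
print(looks_consistent(
    "Kaivokatu 8, 00101 Helsinki, Finland",
    "Bangkok9, 8, Kaivokatu, Keskusta, Kluuvi",
))
print(looks_consistent(
    "Kaivokatu 8, 00101 Helsinki, Finland",
    "9, Tyynenmerenkatu, Jätkäsaari, Länsisatama",
))
```

Applied to the joined table, such a check could be run row by row over the addr and address columns to flag results worth inspecting manually.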
Nominatim works relatively well if you have well-defined and well-known addresses such as the ones that we used in this tutorial. In practice, the address needs to exist in the OpenStreetMap database. Sometimes, however, you might want to geocode a “point of interest”, such as a museum, based only on its name. If the museum name is not in OpenStreetMap, Nominatim won’t provide any results for it, but you might be able to geocode the place using some other geocoder.
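Trying several geocoders in turn could be sketched as follows. The helper names and the application name are hypothetical. Photon appears only because it is mentioned earlier in this section; since it also relies on OpenStreetMap data, a place missing from OpenStreetMap would need a geocoder backed by a different database. The sketch relies on geopy geocoders returning None when nothing is found.

```python
def geocode_with_fallback(query, geocoders):
    """Try each (name, geocode_function) pair in order and return the first hit."""
    for name, geocode_fn in geocoders:
        result = geocode_fn(query)
        if result is not None:
            return name, result
    # No geocoder returned a result.
    return None, None


def build_geocoders(user_agent, timeout=10):
    """Return (name, function) pairs for two geopy geocoders (requires geopy)."""
    # Imported lazily so the fallback logic above works without geopy installed.
    from geopy.geocoders import Nominatim, Photon

    return [
        ("nominatim", Nominatim(user_agent=user_agent, timeout=timeout).geocode),
        ("photon", Photon(user_agent=user_agent, timeout=timeout).geocode),
    ]


# Example usage (commented out because it sends requests to live services):
# geocoders = build_geocoders(user_agent="my_geocoding_app")
# provider, location = geocode_with_fallback("Kaivokatu 8, Helsinki", geocoders)
```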