Introduction to spatial data analysis with geopandas#

In this chapter, we will first learn how geometric objects are represented in Python using a library called shapely. After this, we will use geopandas 1 as our main tool for spatial data analysis. In the first part of this book, we covered the basics of data analysis using the pandas library. Geopandas extends the capacities of pandas with geospatial operations. The main data structures in geopandas are GeoSeries and GeoDataFrame which extend the capabilities of Series and DataFrames from pandas. This means that we can use all our pandas skills also when working with geopandas. The main difference between geopandas GeoDataFrames and pandas DataFrames is that a GeoDataFrame should contain one column for geometries. By default, the name of this column is 'geometry'. The geometry column is a GeoSeries which contains the geometries (points, lines, polygons, multipolygons etc.) as shapely objects.

Representing vector geometries with shapely#

A core Python library for representing vector data in geospatial domain is called shapely 2. Although shapely library can be a bit hidden from most Python GIS user nowadays, it is one of the fundamental dependencies of geopandas library which is the go-to library when working with (vector) spatial data in Python. Hence, basic knowledge of shapely is fundamental to understand how geometries are stored and handled in geopandas. In the following, we give a quick overview, how to create geometries using shapely.

Creating Point geometries#

When creating geometries in Python, we first need to import the geometric object class (such as Point) that we want to create from shapely.geometry which contains all possible geometry types. After importing the Point class, creating a point is easy: we just pass x and y coordinates into the Point() -class (with a possible z -coordinate) which will create the point for us:

from shapely.geometry import Point

point = Point(2.2, 4.2)
point3D = Point(9.26, -2.456, 0.57)

point
../../../_images/00-introduction-to-geopandas_4_0.svg

As we see here, Jupyter notebook is able to display the shape of the point directly on the screen when we call it. The point object here is represented as it has been defined in the Simple Features Access Specification. Under the hood shapely actually uses a C++ library called GEOS 3 to construct the geometries, which is one of the standard libraries behind various Geographic Information Systems, such as QGIS 4. We can use the print statement to get the coordinate information of these objects in WKT format:

print(point)
print(point3D)
POINT (2.2 4.2)
POINT Z (9.26 -2.456 0.57)

3D-point can be recognized from the capital Z -letter in front of the coordinates. Extracting the coordinates of a Point can be done in a couple of different ways. We can use the coords attribute contains the coordinate information as a CoordinateSequence which is a specific data type of Shapely. In addition, we can also directly use the attributes x and y to get the coordinates directly as plain decimal numbers.

list(point.coords)
[(2.2, 4.2)]
print(point.x, point.y)
2.2 4.2

Points and other shapely objects have many useful built-in attributes and methods 5, such as calculating the Euclidian distance between points or creating a buffer from the point that converts the point into a circle Polygon with specific radius. However, all of these functionalities are integrated into geopandas and we will go through them later in the book.

Creating LineString geometries#

Creating LineString -objects is fairly similar to creating Shapely Points. Now, instead using a single coordinate-tuple we can construct the line using either a list of shapely Point -objects or pass the points as coordinate-tuples:

from shapely.geometry import Point, LineString

point1 = Point(2.2, 4.2)
point2 = Point(7.2, -25.1)
point3 = Point(9.26, -2.456)

line = LineString([point1, point2, point3])
line_from_tuples = LineString([(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)])
line
../../../_images/00-introduction-to-geopandas_12_0.svg
print(line)
LINESTRING (2.2 4.2, 7.2 -25.1, 9.26 -2.456)

As we can see from above, the WKT representation of the line -variable constitutes of multiple coordinate-pairs. LineString -object has many useful built-in attributes and methods similarly as Point -objects. It is for instance possible to extract the coordinates, calculate the length of the LineString, find out the centroid of the line, create points along the line at specific distance, calculate the closest distance from a line to specified Point, or simplify the geometry. A full list of functionalities can be read from shapely documentation 2. Most of these functionalities are directly implemented in geopandas (see next chapter), hence you very seldom need to parse these information directly from the shapely geometries yourself. However, here we go through a few of them for reference. We can extract the coordinates of a LineString similarly as with Point:

list(line.coords)
[(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)]

As a result, we have a list of coordinate tuples (x,y) inside a list. If you need to access all x -coordinates or all y -coordinates of the line, you can do it directly using the xy attribute:

xcoords = list(line.xy[0])
ycoords = list(line.xy[1])

print(xcoords)
print(ycoords)
[2.2, 7.2, 9.26]
[4.2, -25.1, -2.456]

It is possible to retrieve specific attributes such as length of the line and center of the line (centroid) straight from the LineString object itself:

length = line.length
centroid = line.centroid
print(f"Length of our line: {length:.2f} units")
print(f"Centroid: {centroid}")
Length of our line: 52.46 units
Centroid: POINT (6.229961354035622 -11.89241115757239)

As you can see, the centroid of the line is again a Shapely Point object. In practice, you would rarely access these attributes directly from individual shapely geometries, but we can do the same things for a set of geometries at once using geopandas.

Creating Polygon geometries#

Creating a Polygon -object continues the same logic how Point and LineString were created. A Polygon can be created by passing a list of Point objects or a list of coordinate-tuples as input for the Polygon class. Polygon needs at least three coordinate-tuples to form a surface. In the following, we use the same points from the earlier LineString example to create a Polygon:

poly = Polygon([point1, point2, point3])
poly
../../../_images/00-introduction-to-geopandas_22_0.svg
print(poly)
POLYGON ((2.2 4.2, 7.2 -25.1, 9.26 -2.456, 2.2 4.2))

Notice that the Polygon WKT representation has double parentheses around the coordinates (i.e. POLYGON ((<values in here>)) ). This is because Polygon can also have holes inside of it. As the help of the Polygon 6 object tells us (output here is slightly simplified from the original), a Polygon is constructed from exterior and interior (optional) which are the attributes of the Polygon object. The interior can be used to create holes inside the Polygon:

class Polygon(shapely.geometry.base.BaseGeometry)
 |  Polygon(shell=None, holes=None)
 |  
 |  A two-dimensional figure bounded by a linear ring
 |  
 |  A polygon has a non-zero area. It may have one or more negative-space
 |  "holes" which are also bounded by linear rings. If any rings cross each
 |  other, the feature is invalid and operations on it may fail.
 |  
 |  Attributes
 |  ----------
 |  exterior : LinearRing
 |      The ring which bounds the positive space of the polygon.
 |  interiors : sequence
 |      A sequence of rings which bound all existing holes.
 |  
 |  Parameters
 |  ----------
 |  shell : sequence
 |     A sequence of (x, y [,z]) numeric coordinate pairs or triples.
 |     Also can be a sequence of Point objects.
 |  holes : sequence
 |      A sequence of objects which satisfy the same requirements as the
 |      shell parameters above

If we want to create a Polygon with a hole, we can do this by using parameters shell for the exterior and holes for the interiors. Let’s see how we can create a Polygon with a single hole by passing the coordinates in decimal degrees (lon, lat). Notice, that because a Polygon can have multiple holes, the hole_coords variable below contains nested square brackets ([[ ]]), which is due to the possibility of having multiple holes in a single Polygon:

border_coords = [(-180, 90), (-180, -90), (180, -90), (180, 90)]
hole_coords = [[(-170, 80), (-170, -80), (170, -80), (170, 80)]]
poly2 = Polygon(shell=border_coords, holes=hole_coords)
poly2
../../../_images/00-introduction-to-geopandas_26_0.svg
print(poly2)
POLYGON ((-180 90, -180 -90, 180 -90, 180 90, -180 90), (-170 80, -170 -80, 170 -80, 170 80, -170 80))

As we can see the Polygon has now two different tuples of coordinates. The first one represents the outerior and the second one represents the interior, i.e. a hole inside of the Polygon. We can access different attributes directly from the Polygon object itself that can be really useful for many analyses, such as area, centroid, bounding box, exterior, and exterior-length (and you guessed it, these are available directly in geopandas as well). See a full list of methods in the shapely documentation 2. Here, we can see a few of the available attributes and how to access them:

print("Polygon centroid: ", poly2.centroid)
print("Polygon Area: ", poly2.area)
print("Polygon Bounding Box: ", poly2.bounds)
print("Polygon Exterior: ", poly2.exterior)
print("Polygon Exterior Length: ", poly2.exterior.length)
Polygon centroid:  POINT (-0 -0)
Polygon Area:  10400.0
Polygon Bounding Box:  (-180.0, -90.0, 180.0, 90.0)
Polygon Exterior:  LINEARRING (-180 90, -180 -90, 180 -90, 180 90, -180 90)
Polygon Exterior Length:  1080.0

As we can see above, it is again fairly straightforward to access different attributes from the Polygon -object. Notice, that the length and area information are presented here in decimal degrees because our input coordinates were passed as longitudes and latitudes. We can get this information in more sensible format (in meters and m2) when we start working with data in a projected coordinate system later in the book.

Now you have learned how all the basic geometries can be created in Python using shapely. As a one last thing related to vector geometries, we will show you one useful tool that comes with shapely called box 7. The box -function can be used for creating a rectangular bounding box 8 based on minimum and maximum x and y coordinates, i.e. giving coordinate information of the bottom-left and top-right corners of the rectangle. Next, we will use box instead normal Polygon constructorto create the same polygon exterior as above:

from shapely.geometry import box

min_x, min_y = -180, -90
max_x, max_y = 180, 90
poly3 = box(minx=min_x, miny=min_y, maxx=max_x, maxy=max_y)
poly3
../../../_images/00-introduction-to-geopandas_31_0.svg
print(poly3)
POLYGON ((180 -90, 180 90, -180 90, -180 -90, 180 -90))

As we can see, creating a rectangular bounding box Polygon can be done very easily by merely passing four coordinates to the box -function. Quite handy! In practice, the box function is quite useful for example when you want to select geometries from specific area of interest. In these cases, you only need to find out the coordinates of two points on the map to be able create the polygon, which you can use to select the data.

Creating MultiPoint, MultiLineString and MultiPolygon geometries#

Creating a collection of Point, LineString or Polygon objects is very straightforward now as you have seen how to create the basic geometric objects. In the Multi -versions of these geometries, you just pass a list of points, lines or polygons to the MultiPoint, MultiLineString or MultiPolygon constructors as shown below:

from shapely.geometry import MultiPoint, MultiLineString, MultiPolygon

multipoint = MultiPoint([Point(2, 2), Point(3, 3)])
multipoint
../../../_images/00-introduction-to-geopandas_36_0.svg
multiline = MultiLineString(
    [LineString([(2, 2), (3, 3)]), LineString([(4, 3), (6, 4)])]
)
multiline
../../../_images/00-introduction-to-geopandas_37_0.svg
multipoly = MultiPolygon(
    [Polygon([(0, 0), (0, 4), (4, 4)]), Polygon([(6, 6), (6, 12), (12, 12)])]
)
multipoly
../../../_images/00-introduction-to-geopandas_38_0.svg

Question 5.1#

Create these shapes using Shapely!

  • Triangle

  • Square

  • Circle

# Use this cell to enter your solution.
# Solution


# Triangle
Polygon([(0, 0), (2, 4), (4, 0)])

# Square
Polygon([(0, 0), (0, 4), (4, 4), (4, 0)])

# Circle (using a buffer around a point)
point = Point((0, 0))
point.buffer(1)
../../../_images/00-introduction-to-geopandas_41_0.svg

Getting started with geopandas#

Figure 6.1. Geometry column in a GeoDataFrame.

Figure 6.1. Geometry column in a GeoDataFrame.

Similar to importing import pandas as pd, we will import geopandas as gpd:

import geopandas as gpd

Reading a Shapefile#

Esri Shapefile is the default file format when reading in data usign geopandas, so we only need to pass the file path in order to read in our data:

from pathlib import Path

input_folder = Path("../data/NLS")
fp = input_folder / "m_L4132R_p.shp"
# Read file using gpd.read_file()
data = gpd.read_file(fp)

Let’s check the data type:

type(data)
geopandas.geodataframe.GeoDataFrame

Here we see that our data -variable is a GeoDataFrame. GeoDataFrame extends the functionalities of pandas.DataFrame in a way that it is possible to handle spatial data using similar approaches and datastructures as in pandas (hence the name geopandas).

Let’s check the first rows of data:

data.head()
TEKSTI RYHMA LUOKKA TASTAR KORTAR KORARV KULKUTAPA KOHDEOSO AINLAHDE SYNTYHETKI ... KARTOGLK ALUEJAKOON VERSUH SUUNTA SIIRT_DX SIIRT_DY KORKEUS ATTR2 ATTR3 geometry
0 None 64 32421 5000 0 0.0 0 1812247077 1 20180125 ... 0 0 0 0 0 0 0.0 0 0 POLYGON ((379394.248 6689991.936, 379389.790 6...
1 None 64 32421 5000 0 0.0 0 1718796908 1 20180120 ... 0 0 0 0 0 0 0.0 0 0 POLYGON ((378980.811 6689359.377, 378983.401 6...
2 None 64 32421 20000 0 0.0 0 411167695 1 20180120 ... 0 0 0 0 0 0 0.0 0 0 POLYGON ((378804.766 6689256.471, 378817.107 6...
3 None 64 32421 20000 0 0.0 0 411173768 1 20180120 ... 0 0 0 0 0 0 0.0 0 0 POLYGON ((379229.695 6685025.111, 379233.366 6...
4 None 64 32421 20000 0 0.0 0 411173698 1 20180120 ... 0 0 0 0 0 0 0.0 0 0 POLYGON ((379825.199 6685096.247, 379829.651 6...

5 rows × 21 columns

  • Check all column names:

data.columns.values
array(['TEKSTI', 'RYHMA', 'LUOKKA', 'TASTAR', 'KORTAR', 'KORARV',
       'KULKUTAPA', 'KOHDEOSO', 'AINLAHDE', 'SYNTYHETKI', 'KUOLHETKI',
       'KARTOGLK', 'ALUEJAKOON', 'VERSUH', 'SUUNTA', 'SIIRT_DX',
       'SIIRT_DY', 'KORKEUS', 'ATTR2', 'ATTR3', 'geometry'], dtype=object)

As you might guess, the column names are in Finnish. Let’s select only the useful columns and rename them into English:

data = data[["RYHMA", "LUOKKA", "geometry"]]

Define new column names in a dictionary:

colnames = {"RYHMA": "GROUP", "LUOKKA": "CLASS"}

Rename:

data.rename(columns=colnames, inplace=True)

Check the output:

data.head()
GROUP CLASS geometry
0 64 32421 POLYGON ((379394.248 6689991.936, 379389.790 6...
1 64 32421 POLYGON ((378980.811 6689359.377, 378983.401 6...
2 64 32421 POLYGON ((378804.766 6689256.471, 378817.107 6...
3 64 32421 POLYGON ((379229.695 6685025.111, 379233.366 6...
4 64 32421 POLYGON ((379825.199 6685096.247, 379829.651 6...

Check your understanding#

Figure out the following information from our input data using your pandas skills:

  • Number of rows?

  • Number of classes?

  • Number of groups?

print("Number of rows", len(data["CLASS"]))
print("Number of classes", data["CLASS"].nunique())
print("Number of groups", data["GROUP"].nunique())
Number of rows 4311
Number of classes 20
Number of groups 1

It is always a good idea to explore your data also on a map. Creating a simple map from a GeoDataFrame is really easy: you can use .plot() -function from geopandas that creates a map based on the geometries of the data. Geopandas actually uses matplotlib for plotting which we introduced in Lesson 7 of the Geo-Python course.

Let’s try it out, and plot our GeoDataFrame:

data.plot()
<AxesSubplot:>
../../../_images/00-introduction-to-geopandas_66_1.png

Voilá! As we can see, it is really easy to produce a map out of your Shapefile with geopandas. Geopandas automatically positions your map in a way that it covers the whole extent of your data.

If you are living in the Helsinki region, you might recognize the shapes plotted on the map!

Geometries in geopandas#

Geopandas takes advantage of Shapely’s geometric objects. Geometries are stored in a column called geometry that is a default column name for storing geometric information in geopandas.

Let’s print the first 5 rows of the column ‘geometry’:

data["geometry"].head()
0    POLYGON ((379394.248 6689991.936, 379389.790 6...
1    POLYGON ((378980.811 6689359.377, 378983.401 6...
2    POLYGON ((378804.766 6689256.471, 378817.107 6...
3    POLYGON ((379229.695 6685025.111, 379233.366 6...
4    POLYGON ((379825.199 6685096.247, 379829.651 6...
Name: geometry, dtype: geometry

As we can see the geometry column contains familiar looking values, namely Shapely Polygon -objects. Since the spatial data is stored as Shapely objects, it is possible to use Shapely methods when dealing with geometries in geopandas.

Let’s have a closer look at the polygons and try to apply some of the Shapely methods we are already familiar with.

Let’s start by checking the area of the first polygon in the data:

# Access the geometry on the first row of data
data.at[0, "geometry"]
../../../_images/00-introduction-to-geopandas_72_0.svg
# Print information about the area
print("Area:", round(data.at[0, "geometry"].area, 0), "square meters")
Area: 76.0 square meters

Let’s do the same for the first five rows in the data;

  • Iterate over the GeoDataFrame rows using the iterrows() -function that we learned during the Lesson 6 of the Geo-Python course.

  • For each row, print the area of the polygon (here, we’ll limit the for-loop to a selection of the first five rows):

# Iterate over rows and print the area of a Polygon
for index, row in data[0:5].iterrows():

    # Get the area from the shapely-object stored in the geometry-column
    poly_area = row["geometry"].area

    # Print info
    print(
        "Polygon area at index {index} is: {area:.0f} square meters".format(
            index=index, area=poly_area
        )
    )
Polygon area at index 0 is: 76 square meters
Polygon area at index 1 is: 2652 square meters
Polygon area at index 2 is: 3186 square meters
Polygon area at index 3 is: 13075 square meters
Polygon area at index 4 is: 3981 square meters

As you see from here, all pandas methods, such as the iterrows() function, are directly available in Geopandas without the need to call pandas separately because Geopandas is an extension for pandas.

In practice, it is not necessary to use the iterrows()-approach to calculate the area for all features. Geodataframes and geoseries have an attribute area which we can use for accessing the area for each feature at once:

data.area
0          76.027392
1        2652.054186
2        3185.649995
3       13075.165279
4        3980.682621
            ...     
4306     2651.800270
4307      376.503380
4308      413.942555
4309     3487.927677
4310     1278.963199
Length: 4311, dtype: float64

Let’s next create a new column into our GeoDataFrame where we calculate and store the areas of individual polygons:

# Create a new column called 'area'
data["area"] = data.area

Check the output:

data["area"]
0          76.027392
1        2652.054186
2        3185.649995
3       13075.165279
4        3980.682621
            ...     
4306     2651.800270
4307      376.503380
4308      413.942555
4309     3487.927677
4310     1278.963199
Name: area, Length: 4311, dtype: float64

These values correspond to the ones we saw in previous step when iterating rows.

Let’s check what is the min, max and mean of those areas using familiar functions from our previous Pandas lessions.

# Maximum area
round(data["area"].max(), 2)
4084558.15
# Minimum area
round(data["area"].min(), 2)
0.67
# Average area
round(data["area"].mean(), 2)
11522.29

Writing data into a file#

It is possible to export GeoDataFrames into various data formats using the to_file() method. In our case, we want to export subsets of the data into Shapefiles (one file for each feature class).

Let’s first select one class (class number 36200, “Lake water”) from the data as a new GeoDataFrame:

# Select a class
selection = data.loc[data["CLASS"] == 36200]

Check the selection:

selection.plot()
<AxesSubplot:>
../../../_images/00-introduction-to-geopandas_89_1.png
  • write this layer into a new Shapefile using the gpd.to_file() -function:

# Create a output path for the data
output_folder = Path("results")

if not output_folder.exists():
    output_folder.mkdir()

output_fp = output_folder / "Class_36200.shp"
# Write those rows into a new file (the default output file format is Shapefile)
selection.to_file(output_fp)

Question 6.2#

Read the output Shapefile in a new geodataframe, and check that the data looks ok.

# Use this cell to enter your solution.
# Solution

temp = gpd.read_file(output_fp)

# Check first rows
temp.head()
GROUP CLASS area geometry
0 64 36200 1318.878221 POLYGON ((379089.473 6687069.722, 379093.838 6...
1 64 36200 22918.867073 POLYGON ((376732.156 6687178.141, 376731.301 6...
2 64 36200 5759.318345 POLYGON ((377939.741 6684539.678, 377929.192 6...
3 64 36200 265899.648379 POLYGON ((372948.857 6688594.047, 372935.951 6...
4 64 36200 128221.314258 POLYGON ((370900.963 6689201.649, 370890.077 6...
# Solution

# You can also plot the data for a visual check
temp.plot()
<AxesSubplot:>
../../../_images/00-introduction-to-geopandas_96_1.png

Grouping the GeoDataFrame#

Next we will automate the file export task. we will group the data based on column CLASS and export a shapefile for each class.

This can be achieved using the groupby() familiar from the pandas library and also available in GeoDataFrames. This function groups data based on values on selected column(s).

Before continuing, let’s check the first rows of our input data:

data.head()
GROUP CLASS geometry area
0 64 32421 POLYGON ((379394.248 6689991.936, 379389.790 6... 76.027392
1 64 32421 POLYGON ((378980.811 6689359.377, 378983.401 6... 2652.054186
2 64 32421 POLYGON ((378804.766 6689256.471, 378817.107 6... 3185.649995
3 64 32421 POLYGON ((379229.695 6685025.111, 379233.366 6... 13075.165279
4 64 32421 POLYGON ((379825.199 6685096.247, 379829.651 6... 3980.682621

The CLASS column in the data contains information about different land use types. With .unique() -function we can quickly see all different values in that column:

# Print all unique values in the column
data["CLASS"].unique()
array([32421, 32200, 34300, 34100, 34700, 32500, 32112, 32111, 32611,
       32612, 32800, 32900, 35300, 35412, 35411, 35421, 33000, 33100,
       36200, 36313])

Now we can use that information to group our data and save all land use types into different layers:

# Group the data by class
grouped = data.groupby("CLASS")

# Let's see what we have
grouped
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f2b1b55a0d0>

As we can see, groupby -function gives us an object called DataFrameGroupBy which is similar to list of keys and values (in a dictionary) that we can iterate over.

Check group keys:

grouped.groups.keys()
dict_keys([32111, 32112, 32200, 32421, 32500, 32611, 32612, 32800, 32900, 33000, 33100, 34100, 34300, 34700, 35300, 35411, 35412, 35421, 36200, 36313])

The group keys are unique values from the column by which we grouped the dataframe.

Check how many rows of data each group has:

# Iterate over the grouped object
for key, group in grouped:

    # Let's check how many rows each group has:
    print("Terrain class:", key)
    print("Number of rows:", len(group), "\n")
Terrain class: 32111
Number of rows: 1 

Terrain class: 32112
Number of rows: 1 

Terrain class: 32200
Number of rows: 2 

Terrain class: 32421
Number of rows: 110 

Terrain class: 32500
Number of rows: 2 

Terrain class: 32611
Number of rows: 257 

Terrain class: 32612
Number of rows: 11 

Terrain class: 32800
Number of rows: 80 

Terrain class: 32900
Number of rows: 28 

Terrain class: 33000
Number of rows: 5 

Terrain class: 33100
Number of rows: 118 

Terrain class: 34100
Number of rows: 3005 

Terrain class: 34300
Number of rows: 1 

Terrain class: 34700
Number of rows: 3 

Terrain class: 35300
Number of rows: 134 

Terrain class: 35411
Number of rows: 35 

Terrain class: 35412
Number of rows: 449 

Terrain class: 35421
Number of rows: 5 

Terrain class: 36200
Number of rows: 56 

Terrain class: 36313
Number of rows: 8 

There are, for example, 56 lake polygons in the input data.

We can also check how the last group looks like (we have the variables in memory from the last iteration of the for-loop):

group.head()
GROUP CLASS geometry area
4303 64 36313 POLYGON ((377127.305 6688073.257, 377116.045 6... 9619.307973
4304 64 36313 POLYGON ((371141.897 6677999.999, 371139.757 6... 25266.167705
4305 64 36313 POLYGON ((371498.720 6680399.799, 371497.585 6... 364.087680
4306 64 36313 POLYGON ((375668.607 6682942.062, 375671.489 6... 2651.800270
4307 64 36313 POLYGON ((368411.063 6679328.990, 368411.424 6... 376.503380

Notice that the index numbers refer to the row numbers in the original data -GeoDataFrame.

Check also the data type of the group:

type(group)
geopandas.geodataframe.GeoDataFrame

As we can see, each set of data are now grouped into separate GeoDataFrames, and we can save them into separate files.

Saving multiple output files#

Let’s export each class into a separate Shapefile. While doing this, we also want to create unique filenames for each class.

When looping over the grouped object, information about the class is stored in the variable key, and we can use this information for creating new variable names inside the for-loop. For example, we want to name the shapefile containing lake polygons as “terrain_36200.shp”.

String formatting

There are different approaches for formatting strings in Python. Here are a couple of different ways for putting together file-path names using two variables:

basename = "terrain"
key = 36200

# OPTION 1. Concatenating using the `+` operator:
out_fp = basename + "_" + str(key) + ".shp"

# OPTION 2. Positional formatting using `%` operator
out_fp = "%s_%s.shp" %(basename, key)
    
# OPTION 3. Positional formatting using `.format()`
out_fp = "{}_{}.shp".format(basename, key)

Read more from here: https://pyformat.info/

Let’s now export terrain classes into separate Shapefiles.

  • First, create a new folder for the outputs:

# Determine output directory
output_folder = Path("results")

# Create a new folder called 'Results'
result_folder = output_folder / "Results"

# Check if the folder exists already
if not result_folder.exists():

    print("Creating a folder for the results..")
    # If it does not exist, create one
    result_folder.mkdir()
Creating a folder for the results..

At this point, you can go to the file browser and check that the new folder was created successfully.

  • Iterate over groups, create a file name, and save group to file:

# Iterate over the groups
for key, group in grouped:
    # Format the filename
    output_name = Path("terrain_{}.shp".format(key))

    # Print information about the process
    print("Saving file", output_name.name)

    # Create an output path
    outpath = result_folder / output_name

    # Export the data
    group.to_file(outpath)
Saving file terrain_32111.shp
Saving file terrain_32112.shp
Saving file terrain_32200.shp
Saving file terrain_32421.shp
Saving file terrain_32500.shp
Saving file terrain_32611.shp
Saving file terrain_32612.shp
Saving file terrain_32800.shp
Saving file terrain_32900.shp
Saving file terrain_33000.shp
Saving file terrain_33100.shp
Saving file terrain_34100.shp
Saving file terrain_34300.shp
Saving file terrain_34700.shp
Saving file terrain_35300.shp
Saving file terrain_35411.shp
Saving file terrain_35412.shp
Saving file terrain_35421.shp
Saving file terrain_36200.shp
Saving file terrain_36313.shp

Excellent! Now we have saved those individual classes into separate Shapefiles and named the file according to the class name. These kind of grouping operations can be really handy when dealing with layers of spatial data. Doing similar process manually would be really laborious and error-prone.

Save attributes to a text file#

We can also extract basic statistics from our geodataframe, and save this information as a text file.

Let’s summarize the total area of each group:

area_info = grouped.area.sum().round()
area_info
CLASS
32111        1834.0
32112        2148.0
32200      105737.0
32421      702073.0
32500      109747.0
32611    13135597.0
32612      107343.0
32800     1465278.0
32900      617209.0
33000      659465.0
33100     3777595.0
34100    12381611.0
34300        1627.0
34700        2786.0
35300     1382940.0
35411      411198.0
35412     4710133.0
35421       67864.0
36200     9986966.0
36313       43459.0
Name: area, dtype: float64

Save area info to csv using pandas:

# Create an output path
area_info.to_csv(result_folder / "terrain_class_areas.csv", header=True)

Footnotes#


1

https://geopandas.org/

2(1,2,3)

https://shapely.readthedocs.io/en/stable/manual.html

3

https://trac.osgeo.org/geos

4

http://www.qgis.org/en/site/

5

https://shapely.readthedocs.io/en/stable/manual.html#general-attributes-and-methods

6

https://shapely.readthedocs.io/en/stable/manual.html#polygons

7

https://shapely.readthedocs.io/en/stable/manual.html#shapely.geometry.box

8

https://en.wikipedia.org/wiki/Minimum_bounding_box