Geometric data manipulations#

Here we demonstrate some of the most common geometry manipulation functions available in geopandas. We will continue exploring the census tract data from Austin, Texas. It is often useful to do geometric manipulations on administrative borders for further analysis and visualization purposes. We will learn how to generate centroids, different outlines and buffer zones for the polygons.

import geopandas as gpd
import matplotlib.pyplot as plt
from pathlib import Path

# Define path do the data
data_folder = Path("data/Austin")
fp = data_folder / "austin_pop_density_2019.gpkg"

# Read in the data and check the contents
data = gpd.read_file(fp)
data.head()

	pop2019	tract	area_km2	pop_density_km2	geometry
0	6070.0	002422	4.029772	1506.288769	POLYGON ((615643.487 3338728.496, 615645.477 3...
1	2203.0	001751	1.532030	1437.961408	POLYGON ((618576.586 3359381.053, 618614.330 3...
2	7419.0	002411	3.960344	1873.322183	POLYGON ((619200.163 3341784.654, 619270.849 3...
3	4229.0	000401	2.181762	1938.341868	POLYGON ((621623.757 3350508.165, 621656.294 3...
4	4589.0	002313	2.431208	1887.538655	POLYGON ((621630.247 3345130.744, 621717.926 3...

For the purposes of geometric manipulations, we are mainly interested in the geometry column which contains the polygon geometries. Remember, that the data type of the geometry-column is GeoSeries. Individual geometries are eventually shapely objects and we can use all of shapely’s tools for geometry manipulation directly via geopandas.

# Check contents of the geometry column
data["geometry"].head()

  POLYGON ((615643.487 3338728.496, 615645.477 3...
  POLYGON ((618576.586 3359381.053, 618614.330 3...
  POLYGON ((619200.163 3341784.654, 619270.849 3...
  POLYGON ((621623.757 3350508.165, 621656.294 3...
  POLYGON ((621630.247 3345130.744, 621717.926 3...
Name: geometry, dtype: geometry

# Check data type of the geometry column
type(data["geometry"])

geopandas.geoseries.GeoSeries

# Check data type of a value in the geometry column
type(data["geometry"].values[0])

shapely.geometry.polygon.Polygon

Let’s first plot the original geometries. We can use the in-built plotting function in geopandas to plot the geometries, and matplotlib.pyplot to turn off axis lines and labels.

data.plot(facecolor="none", linewidth=0.2)

plt.axis("off")
plt.show()

../../../_images/d5d853c63cecdc5e834dbf6781e01f40ae340d4507091e36b72e4a6a4cea070b.png

Figure 6.13. Basic plot of the census tracts.

Centroid#

Extracting the centroid of geometric features is useful in many cases. Geometric centroid can, for example, be used for locating text labels in visualizations. We can extract the center point of each polygon via the centroid-attribute of the geometry-column. The data should be in a projected coordinate reference system when calculating the centroids. If trying to calculate centroids based on latitude and longitude information, geopandas will warn us that the results are likely incorrect. Our sample data are in WGS 84 / UTM zone 14N (EPSG:32614), which is a projected , and we can proceed to calculating the centroids.

data.crs.name

'WGS 84 / UTM zone 14N'

data["geometry"].centroid.head()

  POINT (616990.190 3339736.002)
  POINT (619378.303 3359650.002)
  POINT (620418.753 3342194.171)
  POINT (622613.506 3351414.386)
  POINT (622605.359 3343869.554)
dtype: geometry

We can also apply the method directly to the GeoDataFrame to achieve the same result using the syntax data.centroid. At the same time, we can also plot the centroids for a visual check.

data.centroid.plot(markersize=1)

plt.axis("off")
plt.show()

../../../_images/dedf7abd12114e5e003cf7f542e349546b128a007b4aaf41489c7108aceb9c58.png

Figure 6.14. Basic plot of census tract centroids.

Unary union#

We can generate a joint outline for the administrative areas through creating a geometric union among all geometries. This could be useful, for example, for visualizing the outlines of a study area. The unary_union returns a single geometry object, which is automatically visualized when running the code in a Jupyter Notebook.

data.unary_union

../../../_images/b40acc829f8daf55b34e27b2805455da1b21e0fd6dbc71ac264264b733d414ed.svg

Figure 6.15. Union of all of census tract polygon geometries.

Simplifying geometries#

Geometry simplification is a useful process especially when visualizing data that has very detailed geometry. With our sample data, we can generate simplified version of the outline extent. The tolerance parameter controls the level of simplification.

data.unary_union.simplify(tolerance=1000)

../../../_images/0837e75803c6b3e30a33ba820542ca99f34b81cb355c28ba56aeba28438ed3e4.svg

Figure 6.16. Simplified union of the census tract polygons.

Bounding polygon#

Bounding polygons are useful in many cases for describing the approximate extent of geographic data. A minimum bounding rectangle, also called a bounding box or an envelope is the smallest rectangular polygon surrounding a geometric object. In a GeoDataFrame, the envelope attribute returns the bounding rectangle for each geometry.

data.envelope.head()

  POLYGON ((615643.487 3337909.895, 618358.033 3...
  POLYGON ((618529.497 3358797.000, 620192.632 3...
  POLYGON ((619198.456 3340875.421, 621733.880 3...
  POLYGON ((621599.087 3350329.320, 623714.366 3...
  POLYGON ((621630.247 3343015.679, 624133.189 3...
dtype: geometry

In order to get the bounding rectangle for the whole layer, we first create an union of all geometries using unary_union, and then create the bounding rectangle for that polygon.

data.unary_union.envelope

../../../_images/b45ac1f1341ebf5ebd8cb1b69568543615877a68131677a7cd95836f46c07fd9.svg

Figure 6.17. Minimum bounding box for the census tracts.

Corner coordinates of the bounding box for a GeoDataFrame can be fetched via the total_bounds attribute.

data.total_bounds

array([ 608125.39434001, 3337909.89495403,  629828.38846075,
       3370513.6825782 ])

The bounds attribute returns the bounding coordinates of each feature.

data.bounds.head()

	minx	miny	maxx	maxy
0	615643.487492	3.337910e+06	618358.032738	3.341257e+06
1	618529.497100	3.358797e+06	620192.631882	3.360614e+06
2	619198.455980	3.340875e+06	621733.879615	3.343443e+06
3	621599.086586	3.350329e+06	623714.365506	3.352436e+06
4	621630.247042	3.343016e+06	624133.188692	3.345131e+06

Convex hull#

A bit more detailed delineation of the data extent can be extracted using a convex hull which represents the smalles possible polygon that contains all points in an object. If we apply the convex hull method on the whole GeoDataFrame, we will get a GeoSeries containing a convex hull for each polygon separately.

data.convex_hull.head()

  POLYGON ((616870.883 3337909.895, 616852.964 3...
  POLYGON ((619496.705 3358797.000, 618962.703 3...
  POLYGON ((619848.500 3340875.421, 619811.394 3...
  POLYGON ((622145.426 3350329.320, 622132.429 3...
  POLYGON ((623931.770 3343015.679, 622426.307 3...
dtype: geometry

In order to create a covex hull for the whole extent, we need to first create an union of all polygons.

data.unary_union.convex_hull

../../../_images/44098a9aa149ec3de0a915dbfc3369ae21aac21fba34b8b1752489f90b84b4be.svg

Figure 6.18. Smallest convex polygon for the census tracts.

Buffer#

Buffering is a common spatial operation that has a multitude of use cases in spatial analyses. For example, in transport network analyses, it is good to fetch the transport network also from outside the study area in order to capture routes that go beyond the study area border. The distance parameter in the buffer function defines the radius or the buffer (according to the coordinate reference system of the data). Applying the buffer function on the entire data frame will produce separate buffers for each census tract.

# 1000 m buffer for each polygon
data.buffer(1000).plot(edgecolor="white")

plt.axis("off")
plt.show()

../../../_images/954b7a7c4450970b657af9222f0a73493b7f097e14c23b4789f09e8df76431b8.png

Figure 6.19. 1km buffer for each census tract.

If we want one buffer for the whole area, we first need to combine the geometries into one object before the buffer analysis.

# 1000 m buffer for each polygon
data.unary_union.buffer(1000)

../../../_images/0f0fbecf2d561602ecdf8b34cec1919cb57a9298e7146734f8af7288b3e5d6ec.svg

Figure 6.20. 1km buffer for each census tract.

Dissolving and merging geometries#

Spatial data aggregation refers to combining geometries into coarser spatial units based on some attributes. The process may also include the calculation of summary statistics.

In pandas, we learned how to group and aggregate data using the groupbymethod. In geopandas, there is a function called dissolve() that groups the data based on an anttribute column and unions the geometries for each group in that attribute. At the same time, we can also get summary statistics of the attributes. Read more about the details of the dissolve-function and related aggregation options in the geopandas online documentation [1].

To exceplify how dissolve works with our sample data, let’s create create a new column to indicate census tracts with above average population density. We can do this by adding a new empty column dense and adding values that indicate above and below average population densities per census tract.

# Create a new column and add a constant value
data["dense"] = 0

# Filter rows with above average pop density and update the column dense
data.loc[data["pop_density_km2"] > data["pop_density_km2"].mean(), "dense"] = 1

# Check number of rows per category
data.dense.value_counts()

0    86
1    44
Name: dense, dtype: int64

Now we have a new column with value 1 indicating above average population density which we can use for dissolving the data into two groups using the .dissolve() funcition. At the same time, we can sum up the population and area columns valuens using the aggfunc parameter. The aggregation requires that we do a selection of the numerical columns we want to include in the output.

# Conduct the aggregation
dissolved = data[["pop2019", "area_km2", "dense", "geometry"]].dissolve(
    by="dense", aggfunc="sum"
)

# Check the result
dissolved

	geometry	pop2019	area_km2
dense
0	MULTIPOLYGON (((614108.230 3339640.551, 614288...	368992.0	231.131494
1	MULTIPOLYGON (((612263.531 3338931.800, 612265...	242943.0	71.234570

The dissolved data should have as many rows of data as there were unique values in the column - one row for each unique value. Our data have been compressed into two geometric objects and the column used for dissolving the data can now be found in the index. Attribute columns represent the sum of the values per group. We can reset the index and insert the categorical information into a new column after which we can do a quick visualization of the result.

dissolved = dissolved.reset_index()

dissolved.plot(column="dense")

plt.axis("off")
plt.show()

../../../_images/f1e2e1c1e573b2c3fdba47708068fb91e344ff6a2a873c0444662302f4ad633b.png

Figure 6.21. Dissolved census tract geometries.

Geometric data manipulations

Contents