Introduction to geographic data objects in Python#

In this chapter, we will learn how geometric objects (vector) are represented in Python using a library called shapely [1]. In the following parts of this chapter, we will use a library called geopandas extensively which uses these shapely geometries to represent the geographic features in the data. Understanding how these geometric objects work and can be created in Python is extremely useful, because these objects are the fundamental buildings blocks that enable us doing geographic data analysis.

Representing vector geometries with shapely#

Shapely is a fundamental Python package for representing vector data geometries on a computer. Basic knowledge of shapely is important for using higher-level tools that depend on it, such as geopandas.

Under the hood shapely actually uses a C++ library called GEOS [2] to construct the geometries, which is one of the standard libraries behind various Geographic Information Systems (GIS) software, such as PostGIS [3] or QGIS [4]. Objects and methods available in shapely adhere mainly to the Open Geospatial Consortium’s Simple Features Access Specification [5] making them compatible with various GIS tools.

In this section, we give a quick overview of creating geometries using shapely. For a full list of shapely objects and methods, see the shapely user manual online [1].

Creating point geometries#

When creating geometries with shapely, we first need to import the geometric object class (such as Point) that we want to create from shapely.geometry which contains all possible geometry types. After importing the Point class, creating a point is easy: we just pass x and y coordinates into the Point() -class (with a possible z -coordinate) which will create the point for us:

from shapely.geometry import Point

point = Point(2.2, 4.2)
point3D = Point(9.26, -2.456, 0.57)

point
../../../_images/e32cd8f99dad3181abe54558bec3343c07326535ae3840b0547bebefe82994d5.svg

Figure 6.1. ADD PROPER FIGURE CAPTION!.

Jupyter Notebook is automatically able to visualize the point shape on the screen. We can use the print statement to get the text representation of the point geometry as Well Known Text (WKT) [6]. The letter Z indicates 3D coordinates.

print(point)
print(point3D)
POINT (2.2 4.2)
POINT Z (9.26 -2.456 0.57)

There are different approaches for extracting the coordinates of a Point as numerical values. The property the coords gives us to access the coordinates of the point geometry as a CoordinateSequence which data structure for storing a list of coordinates. For our purposes, we can convert the coords into a list to access its contents.

list(point.coords)
[(2.2, 4.2)]

We can also access the coordinates directly using the x and y properties of the Point object.

print(point.x)
print(point.y)
2.2
4.2

Points and other shapely objects have many useful built-in attributes and methods. See shapely documentation [1] for a full list. For example, it is possible to calculate the Euclidian distance between points, or to create a buffer polygon for the point object. However, all of these functionalities are integrated into geopandas and we will go through them later in the book.

Creating LineString geometries#

Creating LineString -objects is fairly similar to creating Point-objects. We need at least two points for creating a line. We can construct the line using either a list of Point-objects or pass the point coordiantes as coordinate-tuples to the LineString constructor.

from shapely.geometry import Point, LineString

point1 = Point(2.2, 4.2)
point2 = Point(7.2, -25.1)
point3 = Point(9.26, -2.456)

line = LineString([point1, point2, point3])
line_from_tuples = LineString([(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)])
line
../../../_images/0dd31ba3b4a057207acbb77864aa8a612f4e0618c9d38c1859f6940fa0b4ba45.svg

Figure 6.2. ADD PROPER FIGURE CAPTION!.

print(line)
LINESTRING (2.2 4.2, 7.2 -25.1, 9.26 -2.456)

As we can see from above, the WKT representation of the line -variable constitutes of multiple coordinate-pairs. LineString -objects have many useful built-in attributes and methods similarly as Point -objects. It is for instance possible to extract the coordinates, calculate the length of the LineString, find out the centroid of the line, create points along the line at specific distance, calculate the closest distance from a line to specified Point, or simplify the geometry. A full list of functionalities can be read from shapely documentation [1]. Most of these functionalities are directly implemented in geopandas (see next chapter), hence you very seldom need to parse these information directly from the shapely geometries yourself. However, here we go through a few of them for reference. We can extract the coordinates of a LineString similarly as with Point:

list(line.coords)
[(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)]

As a result, we have a list of coordinate tuples (x,y) inside a list. If you need to access all x -coordinates or all y -coordinates of the line, you can do it directly using the xy attribute:

xcoords = list(line.xy[0])
ycoords = list(line.xy[1])

print(xcoords)
print(ycoords)
[2.2, 7.2, 9.26]
[4.2, -25.1, -2.456]

It is possible to retrieve specific attributes such as length of the line and center of the line (centroid) straight from the LineString object itself:

length = line.length
centroid = line.centroid
print(f"Length of our line: {length:.2f} units")
print(f"Centroid: {centroid}")
Length of our line: 52.46 units
Centroid: POINT (6.229961354035622 -11.892411157572392)

As you can see, the centroid of the line is again a Shapely Point object. In practice, you would rarely access these attributes directly from individual shapely geometries, but we can do the same things for a set of geometries at once using geopandas.

Creating Polygon geometries#

Creating a Polygon -object continues the same logic of creating Point and LineString objects. A Polygon can be created by passing a list of Point objects or a list of coordinate-tuples as input for the Polygon class. Polygon needs at least three coordinate-tuples to form a surface. In the following, we use the same points from the earlier LineString example to create a Polygon.

from shapely.geometry import Polygon

poly = Polygon([point1, point2, point3])
poly
../../../_images/da6cc4b41d5635608d8abef45299f288edcded636e47bdb44f966e42a049139d.svg

Figure 6.3. ADD PROPER FIGURE CAPTION!.

print(poly)
POLYGON ((2.2 4.2, 7.2 -25.1, 9.26 -2.456, 2.2 4.2))

Notice that the Polygon WKT representation has double parentheses around the coordinates (i.e. POLYGON ((<values in here>)) ). This is because Polygon can also have holes inside of it. A Polygon is constructed from exterior ring and and optiona interior ring, that can be used to represent a hole in the polygon. You can get more information about the Polygon object by running help(poly) of from the shapely online documentation [7]. Here is a simplified extract from the output of help(Polygon):

class Polygon(shapely.geometry.base.BaseGeometry)
 |  Polygon(shell=None, holes=None)
 |  
 |  A two-dimensional figure bounded by a linear ring
 |  
 |  A polygon has a non-zero area. It may have one or more negative-space
 |  "holes" which are also bounded by linear rings. If any rings cross each
 |  other, the feature is invalid and operations on it may fail.
 |  
 |  Attributes
 |  ----------
 |  exterior : LinearRing
 |      The ring which bounds the positive space of the polygon.
 |  interiors : sequence
 |      A sequence of rings which bound all existing holes.
 |  
 |  Parameters
 |  ----------
 |  shell : sequence
 |     A sequence of (x, y [,z]) numeric coordinate pairs or triples.
 |     Also can be a sequence of Point objects.
 |  holes : sequence
 |      A sequence of objects which satisfy the same requirements as the
 |      shell parameters above

If we want to create a Polygon with a hole, we can do this by using parameters shell for the exterior and holes for the interiors.

Let’s see how we can create a Polygon with a hole in it. Notice, that because a Polygon can have multiple holes, the hole_coords variable below contains nested square brackets ([[ ]]), which is due to the possibility of having multiple holes in a single Polygon. First, let’s define the coordinates for the exterior and interior rings.

# Define the exterior
exterior = [(-180, 90), (-180, -90), (180, -90), (180, 90)]

# Define the hole
hole = [[(-170, 80), (-170, -80), (170, -80), (170, 80)]]

Create the polygon with and without the hole:

poly_without_hole = Polygon(shell=exterior)
poly_without_hole
../../../_images/0c8eee59459dc7e63cbad9080b1bbd42471bf15baf616f167bc1dc1e657169ef.svg

Figure 6.4. ADD PROPER FIGURE CAPTION!.

poly_with_hole = Polygon(shell=exterior, holes=hole)
poly_with_hole
../../../_images/6f6100c83c6e9e25a08a33f0c4e527c1191df18d9e6d73b7081a96ba6cd3a8d5.svg

Figure 6.5. ADD PROPER FIGURE CAPTION!.

Let’s also check how the WKT representation of the polygon looks like.

print(poly_with_hole)
POLYGON ((-180 90, -180 -90, 180 -90, 180 90, -180 90), (-170 80, -170 -80, 170 -80, 170 80, -170 80))

As we can see the Polygon has now two different tuples of coordinates. The first one represents the outer ring and the second one represents the inner ring, i.e. the hole.

There are many useful attributes and methods related to shapely Polygon, such as area, centroid, bounding box, exterior, and exterior-length. For a full list, the shapely documentation [1]. Same attributes and methods are also available when working with polygon data in geopandas. Here are a couple of polygon attributes that are often useful when doing geographic data analysis.

print("Polygon centroid: ", poly.centroid)
print("Polygon Area: ", poly.area)
print("Polygon Bounding Box: ", poly.bounds)
print("Polygon Exterior: ", poly.exterior)
print("Polygon Exterior Length: ", poly.exterior.length)
Polygon centroid:  POINT (6.22 -7.785333333333334)
Polygon Area:  86.789
Polygon Bounding Box:  (2.2, -25.1, 9.26, 4.2)
Polygon Exterior:  LINEARRING (2.2 4.2, 7.2 -25.1, 9.26 -2.456, 2.2 4.2)
Polygon Exterior Length:  62.16395199996553

Notice, that the length and area information are presented here in decimal degrees because our input coordinates were passed as longitudes and latitudes. We can get this information in more sensible format (in meters and m2) when we start working with data in a projected coordinate system later in the book.

Box polygons that represent the minimum bounding box of given coordinates are useful in many applications. shapely.box can be used for creating rectangular box polygons based on on minimum and maximum x and y coordinates that represent the coordinate information of the bottom-left and top-right corners of the rectangle. Here we will use shapely.box to re-create the same polygon exterior.

from shapely.geometry import box

min_x, min_y = -180, -90
max_x, max_y = 180, 90
box_poly = box(minx=min_x, miny=min_y, maxx=max_x, maxy=max_y)
box_poly
../../../_images/bbbaabd9c9764a5c6312b184edfec6ee149aaf885d0c82c66c70117446765f2f.svg

Figure 6.6. ADD PROPER FIGURE CAPTION!.

print(box_poly)
POLYGON ((180 -90, 180 90, -180 90, -180 -90, 180 -90))

In practice, the box function is quite useful for example when you want to select geometries from a specific area of interest. In these cases, you only need to find out the coordinates of two points on the map to be able create the bounding box polygon.

Creating MultiPoint, MultiLineString and MultiPolygon geometries#

Creating a collection of Point, LineString or Polygon objects is very straightforward now as you have seen how to create the basic geometric objects. In the Multi -versions of these geometries, you just pass a list of points, lines or polygons to the MultiPoint, MultiLineString or MultiPolygon constructors as shown below:

from shapely.geometry import MultiPoint, MultiLineString, MultiPolygon

multipoint = MultiPoint([Point(2, 2), Point(3, 3)])
multipoint
../../../_images/5a04d2580c39a24d4e6daa0ac3fc9dbac6ac9233d21d09f7a4bd6569b2262c2e.svg

Figure 6.7. ADD PROPER FIGURE CAPTION!.

multiline = MultiLineString(
    [LineString([(2, 2), (3, 3)]), LineString([(4, 3), (6, 4)])]
)
multiline
../../../_images/41d98b21fe358b313b3278412bd8f396115daa33c290a8772068f937da6f8942.svg

Figure 6.8. ADD PROPER FIGURE CAPTION!.

multipoly = MultiPolygon(
    [Polygon([(0, 0), (0, 4), (4, 4)]), Polygon([(6, 6), (6, 12), (12, 12)])]
)
multipoly
../../../_images/f82cecad590ba7cabd2265d56f7ebc29926dbb25f84401d0cc8aee60fffa3240.svg

Figure 6.9. ADD PROPER FIGURE CAPTION!.

Question 6.1#

Create these shapes using Shapely!

  • Triangle

  • Square

  • Circle

# Solution

# Triangle
Polygon([(0, 0), (2, 4), (4, 0)])

# Square
Polygon([(0, 0), (0, 4), (4, 4), (4, 0)])

# Circle (using a buffer around a point)
point = Point((0, 0))
point.buffer(1)
../../../_images/71bc7c1a65c43f200556b28f610aa20ca637dcf97ec620ac4e5fad136d5a5b97.svg

Footnotes#