Representing geographic data in vector format#

In this section, you will learn how geometric objects, such as Point, LineString and Polygon, are represented on a computer in vector format. Creating and representing vector-based geometric objects is most commonly done using the shapely [1] library which is one of the fundamental libraries in Python GIS ecosystem when working with geographic data. Thus, basic knowledge of shapely is highly useful when using higher-level tools that depend on it, such as geopandas, which we will use extensively in the following sections of this book for geographic data analysis.

Under the hood shapely uses a C++ library called GEOS [2] to construct the geometries. GEOS is one of the standard libraries behind various GIS software, such as PostGIS [3] and QGIS [4]. Objects and methods available in shapely adhere mainly to the Open Geospatial Consortium’s Simple Features Access Specification [5] (see Chapter 5.2), making them compatible with various GIS tools. Next, we will give a quick overview of how to create different kinds of geometries using shapely that can then be used in geographical data analysis and computation.

Creating Point geometries#

When creating geometries with shapely, we first need to import the geometric object class which we want to create, such as Point, LineString or Polygon. Let’s start by creating a simple Point object. First, we need to import the Point class which we can then use to create the point geometry. When creating the geometry, we need to pass the x and y coordinates (with a possible z -coordinate) into the Point() -class constructor, which will create the point geometry for us, as follows:

from shapely import Point

point = Point(2.2, 4.2)
point3D = Point(9.26, -2.456, 0.57)
point
../../../_images/e32cd8f99dad3181abe54558bec3343c07326535ae3840b0547bebefe82994d5.svg

Figure 6.1. A visual representation of a Point geometry.

As we can see in the online version, Jupyter Notebook is able to visualize the point shape on the screen. This point demonstrates a very simple geographic object that we can start using in geographic data analysis. Notice that without information about a coordinate reference system (CRS) attached to the geometry, these coordinates are ultimately just arbitrary numbers that do not represent any specific location on Earth. We will learn later in the book, how it is possible to specify a CRS for a set of geometries.

We can use the print() command to get the text representation of the point geometry as Well Known Text (WKT) [6]. In the output, the letter Z after the POINT indicates that the geometry contains coordinates in three dimensions (x, y, z):

print(point)
print(point3D)
POINT (2.2 4.2)
POINT Z (9.26 -2.456 0.57)

It is also possible to access the WKT character string representation of the geometry using the .wkt attribute:

point.wkt
'POINT (2.2 4.2)'

Points and other shapely objects have many useful built-in attributes and methods for extracting information from the geometric objects, such as the coordinates of a point. There are different approaches for extracting coordinates as numerical values from the shapely objects. One of them is a property called .coords. It returns the coordinates of the point geometry as a CoordinateSequence which is a dedicated data structure for storing a list of coordinates. For our purposes, we can convert the .coords into a list that makes the values visible and make it easy to access the contents:

type(point.coords)
shapely.coords.CoordinateSequence
list(point.coords)
[(2.2, 4.2)]

It is also possible to access the coordinates directly using the x and y properties of the Point object:

print(point.x)
print(point.y)
2.2
4.2

For a full list of general attributes and methods for shapely objects, see shapely documentation [1]. For example, it is possible to calculate the Euclidean distance between points, or to create a buffer polygon for the point object. All of these attributes and methods can be accessed via the geopandas library, and we will go through them later in the book.

Creating LineString geometries#

Creating a LineString -object is very similar to creating a Point-object. To create a LineString, we need at least two points that are connected to each other, which thus constitute a line. We can construct the line using either a list of Point-objects or pass the point coordinates as coordinate-tuples to the LineString constructor:

from shapely import Point, LineString

point1 = Point(2.2, 4.2)
point2 = Point(7.2, -25.1)
point3 = Point(9.26, -2.456)

line = LineString([point1, point2, point3])
line_from_tuples = LineString([(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)])
line
../../../_images/0dd31ba3b4a057207acbb77864aa8a612f4e0618c9d38c1859f6940fa0b4ba45.svg

Figure 6.2. A visual representation of a LineString geometry.

line.wkt
'LINESTRING (2.2 4.2, 7.2 -25.1, 9.26 -2.456)'

As we can see, the WKT representation of the line variable consists of multiple coordinate-pairs. LineString -objects have many useful built-in attributes and methods similarly as Point -objects. It is for instance possible to extract the coordinates, calculate the length of the LineString, find out the centroid of the line, create points along the line at specific distance, calculate the closest distance from a line to specified point, or simplify the geometry. See the shapely documentation [1] for full details. Most of these functionalities are directly implemented in geopandas that will be introduced in the next chapter. Hence, you seldom need to parse this information directly from the shapely geometries yourself. However, here we go through a few of them for reference. We can extract the coordinates of a LineString similarly as with Point:

list(line.coords)
[(2.2, 4.2), (7.2, -25.1), (9.26, -2.456)]

As a result, we have a list of coordinate tuples (x,y) inside a list. If you need to access all x -coordinates or all y -coordinates of the line, you can do it directly using the xy attribute:

xcoords = list(line.xy[0])
ycoords = list(line.xy[1])

print(xcoords)
print(ycoords)
[2.2, 7.2, 9.26]
[4.2, -25.1, -2.456]

It is possible to retrieve specific attributes such as length of the line and the center of the line (centroid) straight from the LineString object itself:

length = line.length
centroid = line.centroid
print(f"Length of our line: {length:.2f} units")
print(f"Centroid: {centroid}")
Length of our line: 52.46 units
Centroid: POINT (6.229961354035622 -11.892411157572392)

As you can see, the centroid of the line is again a shapely.geometry.Point object. This is useful, because it means that you can continue working with the line centroid having access to all of the methods that come with the shapely Point object.

Creating Polygon geometries#

Creating a Polygon -object continues the same logic as when creating Point and LineString objects. A Polygon can be created by passing a list of Point objects or a list of coordinate-tuples as input for the Polygon class. Polygon needs at least three coordinate-tuples to form a surface. In the following, we use the same points from the earlier LineString example to create a Polygon.

from shapely import Polygon

poly = Polygon([point1, point2, point3])
poly
../../../_images/da6cc4b41d5635608d8abef45299f288edcded636e47bdb44f966e42a049139d.svg

Figure 6.3. A visual representation of a Polygon geometry.

poly.wkt
'POLYGON ((2.2 4.2, 7.2 -25.1, 9.26 -2.456, 2.2 4.2))'

Notice that the Polygon WKT representation has double parentheses around the coordinates (i.e. POLYGON ((<values in here>)) ). The current set of coordinates represents the outlines of the shape, i.e. the exterior of the polygon. However, a Polygon can also contain an optional interior rings, that can be used to represent holes in the polygon. You can get more information about the Polygon object by running help(poly) of from the shapely online documentation [7]. Here is a simplified extract from the output of help(Polygon):

class Polygon(shapely.geometry.base.BaseGeometry)
 |  Polygon(shell=None, holes=None)
 |  
 |  A two-dimensional figure bounded by a linear ring
 |  
 |  A polygon has a non-zero area. It may have one or more negative-space
 |  "holes" which are also bounded by linear rings. If any rings cross each
 |  other, the feature is invalid and operations on it may fail.
 |  
 |  Attributes
 |  ----------
 |  exterior : LinearRing
 |      The ring which bounds the positive space of the polygon.
 |  interiors : sequence
 |      A sequence of rings which bound all existing holes.
 |  
 |  Parameters
 |  ----------
 |  shell : sequence
 |     A sequence of (x, y [,z]) numeric coordinate pairs or triples.
 |     Also can be a sequence of Point objects.
 |  holes : sequence
 |      A sequence of objects which satisfy the same requirements as the
 |      shell parameters above

If we want to create a polygon with a hole, we can do this by using parameters shell for the exterior and holes for the interiors as follows. Notice that because a Polygon can have multiple holes, the holes_coordinates variable below contains nested square brackets ([[ ]]), which is due to the possibility of having multiple holes in a single Polygon. First, let’s define the coordinates for the exterior and interior rings:

# Define the exterior coordinates
exterior = [(-180, 90), (-180, -90), (180, -90), (180, 90)]

# Define the hole coordinates (a single hole in this case)
holes_coordinates = [[(-170, 80), (-170, -80), (170, -80), (170, 80)]]

The attribute exterior contains the x and y coordinates of all the corners of the polygon as a list of tuples. For instance, the first tuple (-180, 90) contains coordinates for the top-left corner of the polygon. Similarly, the holes_coordinates variable contains the corner coordinates of a single polygon (inside the nested list) which will represent a single hole within our Polygon.

With the four coordinate tuples of the exterior, we can first create a polygon without a hole:

poly_without_hole = Polygon(shell=exterior)
poly_without_hole
../../../_images/0c8eee59459dc7e63cbad9080b1bbd42471bf15baf616f167bc1dc1e657169ef.svg

Figure 6.4. A Polygon geometry (exterior).

In a similar manner, we can make a Polygon with holes by passing the holes_coordinates variable into the parameter holes:

poly_with_hole = Polygon(shell=exterior, holes=holes_coordinates)
poly_with_hole
../../../_images/6f6100c83c6e9e25a08a33f0c4e527c1191df18d9e6d73b7081a96ba6cd3a8d5.svg

Figure 6.5. A Polygon geometry with a hole inside (exterior and interior).

As we can see, now the Polygon contains a large hole, and the actual geometry is located at the borders, resembling a picture frame. Let’s also take a look how the WKT representation of the polygon looks like (from running poly_with_hole.wkt):

POLYGON ((-180 90, -180 -90, 180 -90, 180 90, -180 90),
         (-170 80, -170 -80, 170 -80, 170 80, -170 80))

As we can see the Polygon has now two different tuples of coordinates. The first one represents the outer ring and the second one represents the inner ring, i.e. the hole.

There are many useful attributes and methods related to shapely Polygon, such as area, centroid, bounding box, exterior, and exterior-length. For full details, see the shapely documentation [1]. These attributes and methods are also available when working with polygon data in geopandas. Here are a couple of useful polygon attributes to remember:

print("Polygon centroid: ", poly.centroid)
print("Polygon Area: ", poly.area)
print("Polygon Bounding Box: ", poly.bounds)
print("Polygon Exterior: ", poly.exterior)
print("Polygon Exterior Length: ", poly.exterior.length)
Polygon centroid:  POINT (6.22 -7.785333333333334)
Polygon Area:  86.789
Polygon Bounding Box:  (2.2, -25.1, 9.26, 4.2)
Polygon Exterior:  LINEARRING (2.2 4.2, 7.2 -25.1, 9.26 -2.456, 2.2 4.2)
Polygon Exterior Length:  62.16395199996553

Notice, that the length and area information are presented here based on the units of the input coordinates. In our case, the coordinates actually represent longitude and latitude values. Thus, the length and area are represented as decimal degrees in this case. We can turn this information into a more sensible format (such as meters or square meters) when we start working with data in a projected coordinate system.

Box polygons that represent the minimum bounding box of given coordinates are useful in many applications. shapely.box can be used for creating rectangular box polygons based on on minimum and maximum x and y coordinates that represent the coordinate information of the bottom-left and top-right corners of the rectangle. Here we will use shapely.box to re-create the same polygon exterior.

from shapely.geometry import box

min_x, min_y = -180, -90
max_x, max_y = 180, 90
box_poly = box(minx=min_x, miny=min_y, maxx=max_x, maxy=max_y)
box_poly
../../../_images/bbbaabd9c9764a5c6312b184edfec6ee149aaf885d0c82c66c70117446765f2f.svg

Figure 6.6. A Polygon geometry created with the box helper class.

box_poly.wkt
'POLYGON ((180 -90, 180 90, -180 90, -180 -90, 180 -90))'

In practice, the box function is quite useful, for example, when you want to select geometries from a specific area of interest. In such cases, you only need to find out the coordinates of two points on the map (bottom-left and top-righ corners) to be able create the bounding box polygon.

Creating MultiPoint, MultiLineString and MultiPolygon geometries#

Creating a collection of Point, LineString or Polygon objects is very straightforward now as you have seen how to create the basic geometric objects. In the Multi -versions of these geometries, you just pass a list of points, lines or polygons to the MultiPoint, MultiLineString or MultiPolygon constructors as shown below:

from shapely import MultiPoint, MultiLineString, MultiPolygon

multipoint = MultiPoint([Point(2, 2), Point(3, 3)])
multipoint
../../../_images/5a04d2580c39a24d4e6daa0ac3fc9dbac6ac9233d21d09f7a4bd6569b2262c2e.svg

Figure 6.7. A MultiPoint geometry consisting of two Point objects.

multiline = MultiLineString(
    [LineString([(2, 2), (3, 3)]), LineString([(4, 3), (6, 4)])]
)
multiline
../../../_images/41d98b21fe358b313b3278412bd8f396115daa33c290a8772068f937da6f8942.svg

Figure 6.8. A MultiLineString geometry consisting of two LineString objects.

multipoly = MultiPolygon(
    [Polygon([(0, 0), (0, 4), (4, 4)]), Polygon([(6, 6), (6, 12), (12, 12)])]
)
multipoly
../../../_images/f82cecad590ba7cabd2265d56f7ebc29926dbb25f84401d0cc8aee60fffa3240.svg

Figure 6.9. A MultiPolygon geometry consisting of two Polygon objects.

Question 6.1#

Create examples of these shapes using your shapely skills:

  • Triangle

  • Square

  • Circle

Hide code cell content
# Solution

# Triangle
Polygon([(0, 0), (2, 4), (4, 0)])

# Square
Polygon([(0, 0), (0, 4), (4, 4), (4, 0)])

# Circle (using a buffer around a point)
point = Point((0, 0))
point.buffer(1)
../../../_images/3db38bc027258752da41a2374aa88038256abfcf04fdc0c875817836807e1336.svg

Footnotes#