Creating subplots#

At this point you should know the basics of making plots with matplotlib. Now we will expand on our basic plotting skills to learn how to create more advanced plots. In this section, we will show how to visualize data using pandas and matplotlib and create multi-panel plots such as the one below.

Figure 4.10. An example of seasonal temperatures for 2012-2013 using pandas and Matplotlib.

Figure 4.10. An example of seasonal temperatures for 2012-2013 using pandas and Matplotlib.

Preparing the data for plotting#

We will start again by reading in the data file.

import pandas as pd
import matplotlib.pyplot as plt

fp = "data/029740.txt"

data = pd.read_csv(
    fp,
    sep=r"\s+",
    na_values=["*", "**", "***", "****", "*****", "******"],
    usecols=["YR--MODAHRMN", "TEMP", "MAX", "MIN"],
    parse_dates=["YR--MODAHRMN"],
    index_col="YR--MODAHRMN",
)

After reading the file, we can new rename the TEMP column as TEMP_F, since we will later convert our temperatures from Fahrenheit to Celsius.

new_names = {"TEMP": "TEMP_F"}
data = data.rename(columns=new_names)

At this point we can quickly check the first rows of data to see whether the expected changes have occurred.

data.head()
TEMP_F MAX MIN
YR--MODAHRMN
1952-01-01 00:00:00 36.0 NaN NaN
1952-01-01 06:00:00 37.0 NaN 34.0
1952-01-01 12:00:00 39.0 NaN NaN
1952-01-01 18:00:00 36.0 39.0 NaN
1952-01-02 00:00:00 36.0 NaN NaN

Next, we have to deal with no-data values. Let’s start by checking how many no-data values we have.

print("Number of no-data values per column: ")
print(data.isna().sum())
Number of no-data values per column: 
TEMP_F      3579
MAX       900880
MIN       900896
dtype: int64

So, there are 3579 missing values in the TEMP_F column and we should remove those. We need not worry about the no-data values in MAX and MIN columns since we will not use them for the plots produced below. We can remove rows from our DataFrame where TEMP_F is missing values using the .dropna() method.

data.dropna(subset=["TEMP_F"], inplace=True)
print("Number of rows after removing no data values:", len(data))
Number of rows after removing no data values: 928188

Question 4.2#

How many rows of data would remain if we removed all rows with any no-data values from our data (including no-data values in the MAX and MIN columns)? If you test this, be sure to save the modified DataFrame with another variable name or do not use the inplace parameter.

Hide code cell content
# Solution

len(data.dropna())
20433

Now that we have loaded the data, we can convert the temperature values from Fahrenheit to Celsius, like we have in earlier chapters.

data["TEMP_C"] = (data["TEMP_F"] - 32.0) / 1.8

We can once again now check the contents of our DataFrame.

data.head()
TEMP_F MAX MIN TEMP_C
YR--MODAHRMN
1952-01-01 00:00:00 36.0 NaN NaN 2.222222
1952-01-01 06:00:00 37.0 NaN 34.0 2.777778
1952-01-01 12:00:00 39.0 NaN NaN 3.888889
1952-01-01 18:00:00 36.0 39.0 NaN 2.222222
1952-01-02 00:00:00 36.0 NaN NaN 2.222222

Subplots#

Having processed and cleaned the data we can now continue working with it and learn how to create figures that contain subplots. Subplots are used to display multiple plots in different panels of the same figure, as shown at the start of this section (Figure 4.10).

We can start with creating the subplots by dividing the data in the data file into different groups. In this case we can divide the temperature data into temperatures for the four different seasons of the year. We can make the following selections:

  • Winter (December 2012 - February 2013)

  • Spring (March 2013 - May 2013)

  • Summer (June 2013 - August 2013)

  • Autumn (September 2013 - November 2013)

winter = data.loc[(data.index >= "201212010000") & (data.index < "201303010000")]
winter_temps = winter["TEMP_C"]

spring = data.loc[(data.index >= "201303010000") & (data.index < "201306010000")]
spring_temps = spring["TEMP_C"]

summer = data.loc[(data.index >= "201306010000") & (data.index < "201309010000")]
summer_temps = summer["TEMP_C"]

autumn = data.loc[(data.index >= "201309010000") & (data.index < "201312010000")]
autumn_temps = autumn["TEMP_C"]

Let’s have a look at the data from two different seasons to see whether the preceding step appears to have worked as expected.

ax1 = winter_temps.plot()
../../../_images/abfa59e24a22cfb583d3b5ac43ca527cfc8d65734a19c29dd02f5a8be3aeb948.png

Figure 4.11. Winter temperatures for 2012-2013.

ax2 = summer_temps.plot()
../../../_images/6b4b26e2c6ea52d7acbf4e2a8f3d8228b611387b7ded0bb8dfaff02bdb929718.png

Figure 4.12. Summer temperatures for 2012-2013.

Based on the plots above it looks that the correct seasons have been plotted and the temperatures between winter and summer are quite different, as we would expect. One thing we might need to consider with this is that the y-axis range currently varies between the two plots and we may want to define axis ranges that ensure the data are plotted with the same y-axis ranges in all subplots. This will help make it easier to visually compare the temperatures between seasons.

Finding the data bounds#

In order to define y-axis limits that will include the data from all of the seasons and be consistent between subplots we first need to find the minimum and maximum temperatures from all of the seasons. In addition, we should consider that it would be beneficial to have some extra space (padding) between the y-axis limits and those values, such that, for example, the maximum y-axis limit is five degrees higher than the maximum temperature and the minimum y-axis limit is five degrees lower than the minimum temperature. We can do that below by calculating the minimum of each seasons minimum temperature and subtracting five degrees.

# Find lower limit for y-axis
min_temp = min(
    winter_temps.min(), spring_temps.min(), summer_temps.min(), autumn_temps.min()
)
min_temp = min_temp - 5.0

# Find upper limit for y-axis
max_temp = max(
    winter_temps.max(), spring_temps.max(), summer_temps.max(), autumn_temps.max()
)
max_temp = max_temp + 5.0

# Print y-axis min, max
print(f"Minimum temperature: {min_temp}")
print(f"Maximum temperature: {max_temp}")
Minimum temperature: -35.0
Maximum temperature: 35.0

We can now use this temperature range to standardize the y-axis ranges of our plots.

Displaying multiple subplots in a single figure#

With the data split into seasons and y-axis range defined we can now continue to plot data from all four seasons the same figure. We will start by creating a figure containing four subplots in a two by two panel using the .subplots() function from matplotlib. In the .subplots() function, the user can specify how many rows and columns of plots they want to have in their figure. We can also specify the size of our figure with the figsize parameter that takes the width and height values (in inches) as input.

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
axs
array([[<Axes: >, <Axes: >],
       [<Axes: >, <Axes: >]], dtype=object)
../../../_images/9d5994cf4358947767f0cf35e7c05b44619214eddbe6809021c6d0ba72f1a9fa.png

Figure 4.13. Empty figure template with a 2x2 subplot panel.

We can see from the output of the code cell that we now have a list containing two nested lists, where the first nested list contains the axes for column 1 and 2 of row 1 and the second contains the axes for columns 1 and 2 of row 2.

To make it easier to keep track of things, we can parse these axes into their own variables as follows.

ax11 = axs[0][0]
ax12 = axs[0][1]
ax21 = axs[1][0]
ax22 = axs[1][1]

Now we have four different axis variables for the different panels in our figure. Next we can use these axes to plot the seasonal temperature data. We can start by plotting the data for the different seasons with different colors for each of the lines, and we can specify the y-axis limits to be the same for all of the subplots.

  • We can use the c parameter to change the color of the line. You can define colors using RBG color codes, but it is often easier to use one of the matplotlib named colors [1].

  • We can also change the line width or weight using the lw parameter.

  • The ylim parameter can be used to define the y-axis limits.

Putting all of this together in a single code cell we have the following:

# Create the figure and subplot axes
fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))

# Define variables to more easily refer to individual axes
ax11 = axs[0][0]
ax12 = axs[0][1]
ax21 = axs[1][0]
ax22 = axs[1][1]

# Set plot line width
line_width = 1.5

# Plot data
winter_temps.plot(ax=ax11, c="blue", lw=line_width, ylim=[min_temp, max_temp])
spring_temps.plot(ax=ax12, c="orange", lw=line_width, ylim=[min_temp, max_temp])
summer_temps.plot(ax=ax21, c="green", lw=line_width, ylim=[min_temp, max_temp])
autumn_temps.plot(ax=ax22, c="brown", lw=line_width, ylim=[min_temp, max_temp])

# Display the plot
# Note: This is not required, but suppresses text from being printed
# in the output cell
plt.show()
../../../_images/cf094d4c9a3ae5b39fd738305ca9915d86cf656014a33a8be4c043f0e55ab4dd.png

Figure 4.14. Seasonal temperatures for 2012-2013 plotted in a 2x2 panel.

Great, now we have all the plots in same figure! However, we can see that there are some problems with our x-axis labels and a few other missing plot items we should add.

Let’s recreate the plot and make some improvements. In this version of the plot we will:

  • Modify the x- and y-axis labels using the xlabel and ylabel parameters in the .plot() function.

  • Enable grid lines on the plot using the grid=True parameter for the .plot() function.

  • Add a figure title using the fig.suptitle() function.

  • Rotate the x-axis labels using the plt.setp() function.

  • Add a text label for each plot panel using the .text() function.

# Create the figure and subplot axes
fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))

# Define variables to more easily refer to individual axes
ax11 = axs[0][0]
ax12 = axs[0][1]
ax21 = axs[1][0]
ax22 = axs[1][1]

# Set plot line width
line_width = 1.5

# Plot data
winter_temps.plot(
    ax=ax11,
    c="blue",
    lw=line_width,
    ylim=[min_temp, max_temp],
    ylabel="Temperature [°C]",
    grid=True,
)
spring_temps.plot(
    ax=ax12, c="orange", lw=line_width, ylim=[min_temp, max_temp], grid=True
)
summer_temps.plot(
    ax=ax21,
    c="green",
    lw=line_width,
    ylim=[min_temp, max_temp],
    xlabel="Date",
    ylabel="Temperature [°C]",
    grid=True,
)
autumn_temps.plot(
    ax=ax22,
    c="brown",
    lw=line_width,
    ylim=[min_temp, max_temp],
    xlabel="Date",
    grid=True,
)

# Set figure title
fig.suptitle("2012-2013 Seasonal temperature observations" "- Helsinki-Vantaa airport")

# Rotate the x-axis labels so they don't overlap
plt.setp(ax11.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax12.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax21.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax22.xaxis.get_majorticklabels(), rotation=20)

# Season label text
ax11.text(pd.to_datetime("20130215"), -25, "Winter")
ax12.text(pd.to_datetime("20130515"), -25, "Spring")
ax21.text(pd.to_datetime("20130815"), -25, "Summer")
ax22.text(pd.to_datetime("20131115"), -25, "Autumn")

# Display the figure
plt.show()
../../../_images/55a7ec211a1949989dcc0cfb304d24d125e109efe1a4cb21301a802f1ee1cca6.png

Figure 4.15. Seasonal temperatures for 2012-2013 plotted with season names and gridlines visible.

The new version of the figure essentially conveys the same information as the first version, but the additional plot items help to make it easier to see the plot values and immediately understand the data being presented. Not bad.

Question 4.3#

Visualize only the winter and summer temperatures in a one by two panel figure. Save the resulting figure as a .png file.

Hide code cell content
# Solution

# Create the new figure and subplots
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))

# Rename the axes for ease of use
ax1 = axs[0]
ax2 = axs[1]

# Set plot line width
line_width = 1.5

# Plot data
winter_temps.plot(
    ax=ax1,
    c="blue",
    lw=line_width,
    ylim=[min_temp, max_temp],
    xlabel="Date",
    ylabel="Temperature [°C]",
    grid=True,
)
summer_temps.plot(
    ax=ax2,
    c="green",
    lw=line_width,
    ylim=[min_temp, max_temp],
    xlabel="Date",
    grid=True,
)

# Set figure title
fig.suptitle(
    "2012-2013 Winter and summer temperature observations" "- Helsinki-Vantaa airport"
)

# Rotate the x-axis labels so they don't overlap
plt.setp(ax1.xaxis.get_majorticklabels(), rotation=20)
plt.setp(ax2.xaxis.get_majorticklabels(), rotation=20)

# Season label text
ax1.text(pd.to_datetime("20130215"), -25, "Winter")
ax2.text(pd.to_datetime("20130815"), -25, "Summer")

plt.show()
../../../_images/483ce00afa9381361bfbd483d35a5533f41678acf06de8b4ffadee442fd90cc2.png

Footnotes#