Calculating on a pivot table
Pivot tables are filled with summary statistics, but they are only a first step to finding something insightful. Often you'll need to perform further calculations on them. A common thing to do is to find the rows or columns where the highest or lowest value occurs.
Recall from Chapter 1 that you can easily subset a Series or DataFrame to find rows of interest using a logical condition inside of square brackets. For example: series[series > value].
pandas is loaded as pd and the DataFrame temp_by_country_city_vs_year is available.
The .head() for this DataFrame is shown below, with only a few of the year columns displayed:
| country | city | 2000 | 2001 | 2002 | … | 2013 |
|---|---|---|---|---|---|---|
| Afghanistan | Kabul | 15.823 | 15.848 | 15.715 | … | 16.206 |
| Angola | Luanda | 24.410 | 24.427 | 24.791 | … | 24.554 |
| Australia | Melbourne | 14.320 | 14.180 | 14.076 | … | 14.742 |
| Sydney | 17.567 | 17.854 | 17.734 | … | 18.090 | |
| Bangladesh | span translate="no">Dhaka | 25.905 | 25.931 | 26.095 | … | 26.587 |
This exercise is part of the course
Data Manipulation with pandas
Exercise instructions
- Calculate the mean temperature for each year, assigning to
mean_temp_by_year. - Filter
mean_temp_by_yearfor the year that had the highest mean temperature. - Calculate the mean temperature for each city (across columns), assigning to
mean_temp_by_city. - Filter
mean_temp_by_cityfor the city that had the lowest mean temperature.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Get the worldwide mean temp by year
mean_temp_by_year = temp_by_country_city_vs_year.____
# Filter for the year that had the highest mean temp
print(mean_temp_by_year[____])
# Get the mean temp by city
mean_temp_by_city = temp_by_country_city_vs_year.____
# Filter for the city that had the lowest mean temp
print(mean_temp_by_city[____])