
Pandas
Tomas Beuzen, September 2020
These exercises complement Chapter 7.
Exercises
1.
In this set of practice exercises we’ll be investigating the carbon footprint of different foods. We’ll be leveraging a dataset compiled by Kasia Kulma and contributed to R’s Tidy Tuesday project.
Start by importing pandas with the alias pd.
# Your answer here.
2.
The dataset we’ll be working with has the following columns:
| column | description |
|---|---|
| country | Country name |
| food_category | Food category |
| consumption | Consumption (kg/person/year) |
| co2_emmission | CO2 emission (kg CO2/person/year) |
Import the dataset as a dataframe named df from this url: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv
# Your answer here.
6.
How many different kinds of foods are there in the dataset? How many countries are in the dataset?
# Your answer here.
7.
What is the maximum co2_emmission in the dataset and which food type and country does it belong to?
# Your answer here.
8.
How many countries produce more than 1000 kg CO2/person/year for at least one food type?
# Your answer here.
11.
What are the total emissions of all the meat products (Pork, Poultry, Fish, Lamb & Goat, Beef) in the dataset combined?
# Your answer here.
12.
What are the total emissions of all other (non-meat) products in the dataset combined?
# Your answer here.
Solutions
1.
In this set of practice exercises we’ll be investigating the carbon footprint of different foods. We’ll be leveraging a dataset compiled by Kasia Kulma and contributed to R’s Tidy Tuesday project.
Start by importing pandas with the alias pd.
import pandas as pd
2.
The dataset we’ll be working with has the following columns:
| column | description |
|---|---|
| country | Country name |
| food_category | Food category |
| consumption | Consumption (kg/person/year) |
| co2_emmission | CO2 emission (kg CO2/person/year) |
Import the dataset as a dataframe named df from this url: https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv
url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv"
df = pd.read_csv(url)
df
| | country | food_category | consumption | co2_emmission |
|---|---|---|---|---|
| 0 | Argentina | Pork | 10.51 | 37.20 |
| 1 | Argentina | Poultry | 38.66 | 41.53 |
| 2 | Argentina | Beef | 55.48 | 1712.00 |
| 3 | Argentina | Lamb & Goat | 1.56 | 54.63 |
| 4 | Argentina | Fish | 4.36 | 6.96 |
| ... | ... | ... | ... | ... |
| 1425 | Bangladesh | Milk - inc. cheese | 21.91 | 31.21 |
| 1426 | Bangladesh | Wheat and Wheat Products | 17.47 | 3.33 |
| 1427 | Bangladesh | Rice | 171.73 | 219.76 |
| 1428 | Bangladesh | Soybeans | 0.61 | 0.27 |
| 1429 | Bangladesh | Nuts inc. Peanut Butter | 0.72 | 1.27 |
1430 rows × 4 columns
4.
What is the type of data in each column of df?
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1430 entries, 0 to 1429
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 country 1430 non-null object
1 food_category 1430 non-null object
2 consumption 1430 non-null float64
3 co2_emmission 1430 non-null float64
dtypes: float64(2), object(2)
memory usage: 44.8+ KB
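If only the column types are of interest, the `dtypes` attribute is a lighter alternative to `df.info()`. The following is a minimal sketch on a hypothetical toy frame (the values are made up and only the column names match `df`), so it runs without downloading the dataset.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as df (values are made up).
toy = pd.DataFrame({
    "country": ["A", "B"],
    "food_category": ["Beef", "Rice"],
    "consumption": [1.5, 2.5],
    "co2_emmission": [10.0, 3.0],
})

# dtypes is a Series mapping each column name to its dtype.
col_types = toy.dtypes
print(col_types)

# select_dtypes() keeps only the columns matching a kind of dtype.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

The same two calls should work unchanged on `df`.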
6.
How many different kinds of foods are there in the dataset? How many countries are in the dataset?
print(f"There are {df['food_category'].nunique()} foods.")
print(f"There are {df['country'].nunique()} countries.")
There are 11 foods.
There are 130 countries.
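To see what `nunique()` is counting, it can help to try it on a frame small enough to check by eye. The sketch below uses a hypothetical toy frame (made-up values, not the real dataset) and also shows `value_counts()`, which reports how many rows each distinct value has.

```python
import pandas as pd

# Hypothetical toy data mirroring two columns of df (values are made up).
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "food_category": ["Beef", "Rice", "Beef", "Rice", "Beef"],
})

# nunique() counts the distinct non-null values in a column.
n_foods = toy["food_category"].nunique()
n_countries = toy["country"].nunique()
print(n_foods, n_countries)

# value_counts() shows how many rows each distinct value appears in.
rows_per_food = toy["food_category"].value_counts()
print(rows_per_food)
```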
7.
What is the maximum co2_emmission in the dataset and which food type and country does it belong to?
df.loc[df['co2_emmission'].idxmax()]
country Argentina
food_category Beef
consumption 55.48
co2_emmission 1712
Name: 2, dtype: object
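Note that `idxmax()` returns an index label, not an integer position. With the default `RangeIndex` of `df` the two coincide, but they can differ once a frame has been filtered or re-indexed. The sketch below uses a hypothetical toy frame with a deliberately non-default index to show the difference between label-based `.loc` and position-based `.iloc`.

```python
import pandas as pd

# Hypothetical toy frame whose index labels are deliberately not 0, 1, 2.
toy = pd.DataFrame(
    {"country": ["A", "B", "C"], "co2_emmission": [10.0, 30.0, 20.0]},
    index=[100, 200, 300],
)

# idxmax() returns the index *label* of the largest value.
label = toy["co2_emmission"].idxmax()

# .loc looks rows up by label, so it pairs naturally with idxmax().
row = toy.loc[label]
print(row)

# .iloc looks rows up by position; position 200 does not exist here.
try:
    toy.iloc[label]
    iloc_failed = False
except IndexError:
    iloc_failed = True
print("iloc raised IndexError:", iloc_failed)
```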
8.
How many countries produce more than 1000 kg CO2/person/year for at least one food type?
df.query("co2_emmission > 1000")
| | country | food_category | consumption | co2_emmission |
|---|---|---|---|---|
| 2 | Argentina | Beef | 55.48 | 1712.00 |
| 13 | Australia | Beef | 33.86 | 1044.85 |
| 57 | USA | Beef | 36.24 | 1118.29 |
| 90 | Brazil | Beef | 39.25 | 1211.17 |
| 123 | Bermuda | Beef | 33.15 | 1022.94 |
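The filtered table has five rows, each for a different country, so the answer can be read off by inspection. If a country exceeded the threshold for more than one food, counting rows would overcount it. One way to get the count directly is to chain `nunique()` onto the query, sketched here on a hypothetical toy frame with made-up values.

```python
import pandas as pd

# Hypothetical toy data (values are made up); country "A" exceeds the
# threshold for two different foods.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "C"],
    "food_category": ["Beef", "Lamb & Goat", "Beef", "Beef"],
    "co2_emmission": [1500.0, 1200.0, 1100.0, 400.0],
})

# Rows above the threshold: three rows, but only two distinct countries.
above = toy.query("co2_emmission > 1000")
n_rows = len(above)

# nunique() counts each country once, however many rows it contributes.
n_countries = above["country"].nunique()
print(n_rows, n_countries)
```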
9.
Which country consumes the least amount of beef per person per year?
(df.query("food_category == 'Beef'")
   .sort_values(by="consumption")
   .head(1))
| | country | food_category | consumption | co2_emmission |
|---|---|---|---|---|
| 1410 | Liberia | Beef | 0.78 | 24.07 |
10.
Which country consumes the most soybeans per person per year?
(df.query("food_category == 'Soybeans'")
.sort_values(by="consumption", ascending=False)
.head(1))
| | country | food_category | consumption | co2_emmission |
|---|---|---|---|---|
| 1010 | Taiwan. ROC | Soybeans | 16.95 | 7.63 |
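Sorting and then taking `head(1)` is one way to find an extreme row. pandas also provides `nsmallest()` and `nlargest()`, which express the same intent in a single call. A minimal sketch on a hypothetical toy frame (made-up values):

```python
import pandas as pd

# Hypothetical toy data (values are made up).
toy = pd.DataFrame({
    "country": ["A", "B", "C", "A", "B", "C"],
    "food_category": ["Beef", "Beef", "Beef",
                      "Soybeans", "Soybeans", "Soybeans"],
    "consumption": [12.0, 0.5, 30.0, 1.0, 0.1, 9.0],
})

# Row with the smallest beef consumption.
least_beef = toy[toy["food_category"] == "Beef"].nsmallest(1, "consumption")
print(least_beef)

# Row with the largest soybean consumption.
most_soy = toy[toy["food_category"] == "Soybeans"].nlargest(1, "consumption")
print(most_soy)
```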
11.
What are the total emissions of all the meat products (Pork, Poultry, Fish, Lamb & Goat, Beef) in the dataset combined?
meat = ['Poultry', 'Pork', 'Fish', 'Lamb & Goat', 'Beef']
df.loc[df['food_category'].isin(meat), 'co2_emmission'].sum()
74441.13
12.
What are the total emissions of all other (non-meat) products in the dataset combined?
meat = ['Poultry', 'Pork', 'Fish', 'Lamb & Goat', 'Beef']
df.loc[~df['food_category'].isin(meat), 'co2_emmission'].sum()
31927.98
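The meat and non-meat totals can also be computed in one pass by grouping on the `isin()` mask. This is only a sketch of an alternative, run on a hypothetical toy frame with made-up values; the group names "meat" and "other" are labels chosen here for illustration.

```python
import pandas as pd

meat = ["Poultry", "Pork", "Fish", "Lamb & Goat", "Beef"]

# Hypothetical toy data (values are made up).
toy = pd.DataFrame({
    "food_category": ["Beef", "Rice", "Pork", "Eggs"],
    "co2_emmission": [100.0, 20.0, 30.0, 5.0],
})

# Label every row as "meat" or "other" based on its food category.
group = toy["food_category"].isin(meat).map({True: "meat", False: "other"})

# Sum emissions within each of the two groups.
totals = toy.groupby(group)["co2_emmission"].sum()
print(totals)
```

Because every row falls into exactly one group, the two totals add up to the sum of the whole column.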