Diagram of penguin head with indication of bill length and bill depth.
The code for this Python example has been adapted from an article by Ekta Sharma on the Palmer Penguins dataset.
import pandas as pd
import os
# bring the R data frame into Python through reticulate's `r` object
penguins_df = r.penguin_df
Let's take a quick look at the data using describe(), which gives us an indication of the variables and their respective values:
penguins_df[["species", "sex", "body_mass_g", "flipper_length_mm", "bill_length_mm"]].dropna().describe(include='all')
##         species   sex  body_mass_g  flipper_length_mm  bill_length_mm
## count       333   333   333.000000         333.000000      333.000000
## unique        3     2          NaN                NaN             NaN
## top      Adelie  male          NaN                NaN             NaN
## freq        146   168          NaN                NaN             NaN
## mean        NaN   NaN  4207.057057         200.966967       43.992793
## std         NaN   NaN   805.215802          14.015765        5.468668
## min         NaN   NaN  2700.000000         172.000000       32.100000
## 25%         NaN   NaN  3550.000000         190.000000       39.500000
## 50%         NaN   NaN  4050.000000         197.000000       44.500000
## 75%         NaN   NaN  4775.000000         213.000000       48.600000
## max         NaN   NaN  6300.000000         231.000000       59.600000
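To see why the summary mixes NaN with real values, here is a minimal sketch on a small made-up DataFrame (the `toy` frame and its values are hypothetical, not taken from the penguins data). With `include="all"`, pandas reports `unique`, `top` and `freq` only for the text column and `mean`, `std` and the quantiles only for the numeric column, and fills the remaining cells with NaN.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate describe(include="all")
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", None],
    "body_mass_g": [3700.0, 3800.0, 5000.0, 4200.0],
})

# Drop incomplete rows first, as in the penguins example,
# then summarise text and numeric columns together
summary = toy.dropna().describe(include="all")
print(summary)
```

The row with the missing species is dropped before summarising, so both columns report a count of 3.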
Let's take a closer look at the data.
Grouping the penguins by species and sex shows a relationship between weight and flipper length: female Adelie penguins are, on average, the lightest and have the shortest flippers.
(penguins_df
    .dropna()
    .groupby(["species", "sex"])
    .agg({"body_mass_g": "mean", "flipper_length_mm": "mean", "sex": "count"})
    .sort_index()
)
##                   body_mass_g  flipper_length_mm  sex
## species   sex
## Adelie    female  3368.835616         187.794521   73
##           male    4043.493151         192.410959   73
## Chinstrap female  3527.205882         191.735294   34
##           male    3938.970588         199.911765   34
## Gentoo    female  4679.741379         212.706897   58
##           male    5484.836066         221.540984   61
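The dictionary passed to `agg` above counts the `sex` column, which is also one of the grouping keys, so the last column of the result is really the number of penguins in each group. A minimal sketch of the same kind of summary using pandas named aggregation follows, on a small made-up DataFrame. The `toy` frame, its values, and the output column names `mean_mass_g`, `mean_flipper_mm` and `n` are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the penguins table
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo"],
    "sex": ["female", "female", "male", "female", "male"],
    "body_mass_g": [3400.0, 3300.0, 4000.0, 4700.0, 5500.0],
    "flipper_length_mm": [186.0, 188.0, 192.0, 212.0, 222.0],
})

by_group = (
    toy
    .dropna()
    .groupby(["species", "sex"])
    .agg(
        # output_name=(input_column, aggregation)
        mean_mass_g=("body_mass_g", "mean"),
        mean_flipper_mm=("flipper_length_mm", "mean"),
        # count of non-missing body masses; after dropna() this is the group size
        n=("body_mass_g", "count"),
    )
    .sort_index()
)
print(by_group)
```

Naming the outputs this way keeps the group size from being labelled `sex`, which can be confusing next to the `sex` index level.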
Judging by mean body mass, the Gentoo appears to be the largest of the three penguin species, for both females and males. We can take a closer look at the Gentoo records before plotting their distribution:
larger = penguins_df[penguins_df.species == "Gentoo"].dropna()
larger
##     species  island  bill_length_mm  ...  body_mass_g     sex  year
## 152  Gentoo  Biscoe            46.1  ...         4500  female  2007
## 153  Gentoo  Biscoe            50.0  ...         5700    male  2007
## 154  Gentoo  Biscoe            48.7  ...         4450  female  2007
## 155  Gentoo  Biscoe            50.0  ...         5700    male  2007
## 156  Gentoo  Biscoe            47.6  ...         5400    male  2007
## ..      ...     ...             ...  ...          ...     ...   ...
## 270  Gentoo  Biscoe            47.2  ...         4925  female  2009
## 272  Gentoo  Biscoe            46.8  ...         4850  female  2009
## 273  Gentoo  Biscoe            50.4  ...         5750    male  2009
## 274  Gentoo  Biscoe            45.2  ...         5200  female  2009
## 275  Gentoo  Biscoe            49.9  ...         5400    male  2009
##
## [119 rows x 8 columns]
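The split by sex can also be checked numerically before drawing anything. The sketch below is a minimal illustration using six rows copied from the printed output above (indices 152 to 156 and 270). The `toy_larger` name is hypothetical, and the figures it prints describe only these six rows, not the full 119-row subset.

```python
import pandas as pd

# Six Gentoo rows copied from the printed output; a stand-in for `larger`
toy_larger = pd.DataFrame({
    "species": ["Gentoo"] * 6,
    "sex": ["female", "male", "female", "male", "male", "female"],
    "body_mass_g": [4500, 5700, 4450, 5700, 5400, 4925],
})

# Per-sex summary statistics of body mass (count, mean, std, quartiles)
mass_by_sex = toy_larger.groupby("sex")["body_mass_g"].describe()
print(mass_by_sex)
```

Running the same `groupby(...).describe()` call on the real `larger` DataFrame would give the per-sex spread that the density plot visualises.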
Let’s move on to some plots, this time switching back to R and using ggplot to visualise the overall distribution of body mass for the Gentoo species:
penguin_plot <- py$larger %>%
  filter(!is.na(sex)) %>%
  ggplot(aes(body_mass_g, fill = sex)) +
  geom_density(color = "white", alpha = 0.5) +
  scale_fill_manual(values = c("darkorange", "purple")) +
  labs(x = "Body Mass (g)") +
  theme_minimal()
penguin_plot