Jesse Onland
2022-05-12

Introduction
I live near Toronto. It’s springtime, and currently about 30 °C. In my opinion, Toronto is too hot in the summer and too cold in the winter. I’d like to know which cities have the least deviation from a tolerable average temperature.

Data
First, I created a CSV file comprising all the information in the Wikipedia article “List of cities by average temperature”.

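library(tidyverse)

# city_temps is assumed to already hold the CSV contents at this point.
city_temps <- city_temps |>
  rowwise() |>
  transmute(
    city,
    avg=year,
    range=max(c_across(jan:dec)) - min(c_across(jan:dec))
  )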

city          avg   range
Algiers       17.4  14.0
Tamanrasset   21.7  16.1
Reggane       28.3  23.8
Luanda        25.8   6.5
Cotonou       27.2   3.3
Parakou       26.8   4.9
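
To make the range column concrete, here is a minimal base R sketch of the same row-wise calculation. The two cities and their monthly values are invented for illustration, and only three month columns are used for brevity, whereas the real table runs from jan to dec.

```r
# Hypothetical toy data: two invented cities, three months only.
toy <- data.frame(
  city = c("A", "B"),
  jan  = c(-5, 24),
  jul  = c(22, 27),
  dec  = c(-2, 25)
)

month_cols <- c("jan", "jul", "dec")

# Row-wise range: hottest month minus coldest month for each city.
toy$range <- unname(apply(toy[month_cols], 1, function(x) max(x) - min(x)))

print(toy[c("city", "range")])
```

City A swings widely between its coldest and hottest months, while city B barely moves, which is the contrast the range column is meant to capture.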

Summary Statistics
Now we can investigate the distribution of each of our two variables.

Here’s the default summary:

city_temps |>
select(avg, range) |>
summary()
##       avg              range
##  Min.   :-14.40   Min.   : 0.70
##  1st Qu.: 12.45   1st Qu.: 5.65
##  Median : 18.60   Median :12.10
##  Mean   : 18.00   Mean   :13.75
##  3rd Qu.: 25.65   3rd Qu.:21.00
##  Max.   : 30.50   Max.   :58.10

Which cities correspond to the extremes for each variable?

city_temps |>
  filter(
    # Vectorized OR: keep rows at the min or max of either variable.
    avg %in% (city_temps |> pull(avg) |> range()) |
      range %in% (city_temps |> pull(range) |> range())
  ) |>
  arrange(avg)

city         avg    range
Gjoa Haven   -14.4  42.0
Yakutsk       -8.8  58.1
Honiara       26.5   0.7
Assab         30.5   8.7
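
Finding the rows at either extreme needs an element-wise OR. The following is a minimal base R sketch on invented vectors (the numbers are hypothetical, not taken from the data). The vectorized `|` operator gives one result per row, whereas the scalar `||` is meant for single values only.

```r
# Hypothetical values standing in for the avg and range columns.
avg <- c(-10, 5, 15, 9, 30)
rng <- c(40, 55, 1, 20, 8)

# range() returns c(min, max), so %in% flags rows at either extreme.
at_avg_extreme <- avg %in% range(avg)
at_rng_extreme <- rng %in% range(rng)

# `|` combines the two tests element by element, one result per row.
keep <- at_avg_extreme | at_rng_extreme
print(keep)
```

Only the fourth row is dropped here, since it is not an extreme of either variable.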

Let’s see the values for Toronto as a baseline, and save them for later:

city_temps |>
  filter(city=="Toronto")

toronto_avg <- city_temps |>
  filter(city=="Toronto") |>
  pull(avg)

toronto_range <- city_temps |>
  filter(city=="Toronto") |>
  pull(range)

By global standards, Toronto is cool on average, but in keeping with my subjective perception, the deviation from that average over the year is quite large.

Plots
Let’s look at a scatter plot with marginal histograms:

library(ggthemes)  # provides theme_tufte()
library(ggExtra)   # provides ggMarginal()

plot <- city_temps |>
  ggplot(aes(x=avg, y=range)) +
  geom_point(alpha=0.33) +
  geom_vline(
    xintercept=toronto_avg,
    linetype="dashed",
    alpha=0.33
  ) +
  geom_hline(
    yintercept=toronto_range,
    linetype="dashed",
    alpha=0.33
  ) +
  labs(
    title="Average Temperature vs Range by City",
    x="Average Temperature (°C)",
    y="Difference Between Hottest and Coldest Months (°C)"
  ) +
  theme_tufte()

plot <- plot |>
  ggMarginal(type="histogram", fill="transparent", size=10)

plot

Here Toronto is indicated by the dashed lines.

We can see there’s a negative association between a city’s average temperature and the range of temperatures experienced there. In particular, there’s a big cluster of very hot cities which have little difference between their hottest and coldest months.

Ten tropical cities fall into both the hottest decile and the least varying decile:

city_temps |>
  filter(
    range < quantile(city_temps$range, 0.1),
    avg > quantile(city_temps$avg, 0.9)
  ) |>
  select(city)

Lodwar

Palembang

Pontianak

Kuala Lumpur

Malé

Lanka Colombo

Panama City

Barranquilla

While these cities see very little temperature variation throughout the year, they are much too hot.
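
The decile filter used for this list can be sketched in base R on a toy table. The ten cities and their values below are invented, and the cutoffs use the default behaviour of `quantile()`.

```r
# Hypothetical toy table of ten invented cities.
toy <- data.frame(
  city  = paste0("city_", 1:10),
  avg   = c(5, 8, 12, 15, 18, 20, 22, 25, 27, 30),
  range = c(30, 25, 28, 20, 15, 12, 10, 6, 4, 1)
)

hot_cutoff    <- quantile(toy$avg, 0.9)    # 90th percentile of averages
stable_cutoff <- quantile(toy$range, 0.1)  # 10th percentile of ranges

# Keep cities that are both in the hottest and the least varying decile.
both <- toy$city[toy$avg > hot_cutoff & toy$range < stable_cutoff]
print(both)
```

In this toy table only the hottest city also has the smallest range, so it is the single city that passes both tests.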

Zooming In
The area of this plot I’m most interested in is the vertical slice around Toronto. Let’s see the same plot, including only the cities within one degree of Toronto’s average temperature. We’ll exclude the marginal histograms but add labels to the cities.

city_temps |>
  filter(abs(avg - toronto_avg) < 1) |>
  ggplot(aes(x=avg, y=range, label=city)) +
  geom_point(alpha=0.33) +
  geom_text(size=4, nudge_x=0.01, hjust="left") +
  geom_vline(
    xintercept=toronto_avg,
    linetype="dashed",
    alpha=0.33
  ) +
  geom_hline(
    yintercept=toronto_range,
    linetype="dashed",
    alpha=0.33
  ) +
  labs(
    title="Average Temperature vs Range by City (Detail 1)",
    x="Average Temperature (°C)",
    y="Difference Between Hottest and Coldest Months (°C)"
  ) +
  theme_tufte()

So it seems that La Paz, Edinburgh, or Dublin might be good options.

But which cities are the best? These would be the ones with the smallest range for a given maximum average. Let’s find them.

Finding the Cities with the Nicest Weather
We want to know, for each maximum average temperature, the city that has the minimum range of temperatures. These are the cities that form the “bottom-left edge” of our first plot.

Nine cities fit this criterion:

city_temps |>
arrange(avg) |>
cbind(city_temps |> arrange(avg) |> pull(range) |> cummin()) |>
rename(running_min=4) |>
filter(range==running_min) |>
select(city)

Gjoa Haven

Dikson

Nuuk

Reykjavík

Stanley

La Paz

Cusco

Bogotá

Honiara

Of these, the first two have temperatures which are more variable than Toronto, so we can remove them from consideration.
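
The running-minimum idea behind the “bottom-left edge” can be sketched in base R. The five cities below are invented for illustration. A city is on the edge when no colder city has a smaller range.

```r
# Hypothetical toy table of invented cities.
toy <- data.frame(
  city  = c("P", "Q", "R", "S", "T"),
  avg   = c(12, 3, 20, 8, 25),
  range = c(15, 30, 18, 22, 5)
)

# Sort from coldest to hottest average.
toy <- toy[order(toy$avg), ]

# Smallest range seen among all cities at or below each average.
toy$running_min <- cummin(toy$range)

# A city is on the edge when it sets (or ties) that running minimum.
frontier <- toy$city[toy$range == toy$running_min]
print(frontier)
```

City R is left out because the colder city P already has a smaller range, so R is both hotter and more variable than an available alternative.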

Let’s plot the final seven candidates:

city_temps |>
  arrange(avg) |>
  cbind(city_temps |> arrange(avg) |> pull(range) |> cummin()) |>
  rename(running_min=4) |>
  filter(range==running_min) |>
  select(-running_min) |>
  filter(range < toronto_range) |>
  ggplot(aes(x=avg, y=range, label=city)) +
  geom_point(alpha=0.33) +
  geom_text(size=4, nudge_x=0.5, hjust="left") +
  geom_vline(
    xintercept=toronto_avg,
    linetype="dashed",
    alpha=0.33
  ) +
  scale_x_continuous(expand=expansion(mult=0.15)) +
  labs(
    title="Average Temperature vs Range by City (Detail 2)",
    x="Average Temperature (°C)",
    y="Difference Between Hottest and Coldest Months (°C)"
  ) +
  theme_tufte()

Again we see that La Paz has a similar overall average temperature to Toronto, but much less annual variability. Cusco and Bogotá are warmer but even less variable.

Reykjavík and Stanley are colder than Toronto, and although they offer a smaller reduction in variability than La Paz, Cusco, and Bogotá, they have the benefit (for me) of being 98%+ English-speaking.

Nuuk and Honiara are right out.

Next Steps
It would be interesting to use detailed time series for each city and a utility function on temperatures (perhaps including wind chill and humidex) to determine which cities are truly mean-variance optimal.
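
As a rough illustration of what such a utility function might look like, the sketch below uses an assumed quadratic loss around a comfortable target temperature. The daily temperatures, the two city names, and the 18 °C target are all hypothetical, and wind chill and humidex are ignored. The mean squared deviation from the target splits exactly into a squared distance of the mean from the target plus the variance, which is one way to make “mean-variance optimal” precise.

```r
# Hypothetical daily temperatures (°C) for two invented cities.
temps <- list(
  steady   = c(14, 15, 16, 15, 14, 16),
  variable = c(-5, 5, 20, 30, 25, 15)
)

target <- 18  # assumed comfortable temperature, purely illustrative

# Assumed loss: mean squared deviation from the target temperature.
loss <- sapply(temps, function(x) mean((x - target)^2))

# Equivalent split into a "mean" part and a "variance" part.
bias_sq  <- sapply(temps, function(x) (mean(x) - target)^2)
variance <- sapply(temps, function(x) mean((x - mean(x))^2))

print(rbind(loss, bias_sq, variance))
```

Under this assumed loss, the steady city scores better even though its mean sits a few degrees below the target, because the variable city pays heavily for its swings.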

Of course, one should probably not choose a place to live based solely on the weather.