[update: I’ve made some big improvements to this viz since, and posted instructions for reusing it. See “Rain Redux” for details]
Every now and then, I find myself wanting to figure out how likely rain is on a day I want to plan an event on. There’s an enormous amount of historical weather data online, but I haven’t been able to find anything in a format that’s convenient for just looking at and getting a sense of. Weather Underground has some nice graphics for temperature, but doesn’t show precipitation. Weatherbase shows precipitation per month, but I can’t find a way to break it down per day, which is kind of important around the beginning and end of summer.
After a particularly frustrating time staring at lists of numbers while planning an October event, I decided to make a better tool myself:
This averages the past 10 years of NOAA’s global weather data for Sea-Tac Airport, which is the nearest station with a really long history. The size of each day’s square represents the proportion of years in which there’s been any rain, snow, hail or thunderstorm on that calendar date. The darkness represents the median amount of rain, and if you click through for the interactive version you can see the actual numbers by pointing your mouse at any marker.
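For anyone who would rather reproduce the two numbers behind each square outside Tableau, here is a minimal Python sketch. It is not the calculation as built in Tableau, and it makes assumptions: the function name and the record layout are invented for illustration, and the median is taken over every year for that date, dry years included.

```python
from collections import defaultdict
from statistics import median


def summarise_by_calendar_day(records):
    """Summarise daily observations by calendar date.

    records: iterable of (year, month, day, precip_inches, had_precip) tuples
             (a hypothetical layout, not NOAA's file format).
    Returns {(month, day): (share_of_years_with_precip, median_precip_inches)}.
    """
    by_day = defaultdict(list)
    for _year, month, day, precip, had_precip in records:
        by_day[(month, day)].append((precip, had_precip))

    summary = {}
    for key, observations in by_day.items():
        # Size of the square: fraction of years with any precipitation.
        share = sum(1 for _, wet in observations if wet) / len(observations)
        # Darkness of the square: median amount, assumed to include dry years.
        med = median(p for p, _ in observations)
        summary[key] = (share, med)
    return summary
```

With only 10 observations per date, both numbers move a lot when a single year changes, which is the noise problem described next.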
It’s important to remember that so far this is only based on the past 10 years of data, simply because each year has to be downloaded separately from a rather slow system. If other people find it useful, I’ll expand it to a few decades, because 10 years still has rather a lot of noise, and is nowhere near enough to be unbiased by El Niño and the Pacific Decadal Oscillation. If I do expand the range, I’ll have to be careful about how far to go back, on account of climate change.
- For my use at least, the most important thing is that I can quickly look across this and compare individual days or broader times of year. I had originally intended to make this a single line graph, with a whole year across the x axis and the chance of rain each day on the y, but I found this calendar view in Tableau much easier to read.
- The occasional huge storm skews rainfall averages a lot, especially in November, so I used the median precipitation amount rather than the mean.
- I get very frustrated with the weather forecasts and statistics that flag every day on which a drop of rain fell from the sky as a “rainy day” – that’s not a useful guide to whether I’d be able to hold an event outside or not. So I used the combination of size and colour to make the days on which it often rains but only a tiny amount visually almost disappear, while the days on which it regularly rains a significant amount are big clear marks.
- At the same time, I don’t really care about the difference between just enough rain to cancel a barbecue and an enormous downpour, so I have allowed the colour scale to saturate: anything above 0.12″ of rain in a day shows up as the darkest colour, even though the highest median is 0.38″ (for December 28).
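The saturating colour scale in the last point can be sketched in a few lines of Python. The 0.12″ cap comes from the text above; the linear ramp below the cap and the function name are assumptions, since the exact mapping Tableau applies isn't stated here.

```python
SATURATION_INCHES = 0.12  # from the post: anything above this is drawn darkest


def colour_intensity(median_precip_inches, cap=SATURATION_INCHES):
    """Map a median rainfall to 0.0 (lightest) .. 1.0 (darkest).

    Values at or above the cap all map to 1.0; the linear ramp below
    the cap is an assumption made for illustration.
    """
    clamped = min(max(median_precip_inches, 0.0), cap)
    return clamped / cap
```

Under this sketch, the 0.38″ median for December 28 and a 0.12″ day get exactly the same colour, which is the intended effect.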
I think this matches my intuitive feel for Seattle’s climate better than simply reporting the number of days on which it rained or the median amount of rain separately. What do you think?
That NOAA site allows me to download each year of data from each weather station as a separate file, in a rather odd format. I used Excel to string 10 years of data together into one file, and then loaded it into Tableau, which makes manipulating the data a lot easier. Three fields still required ugly, hacky unpacking, at least one of which I’m pretty sure could be done in a better way:
- The dates are simply a YYYYMMDD string, so I had to write a function to take those apart into a year, month and day and then put those back together as a date in Tableau’s format. The catch is that I haven’t yet figured out how to make it display the month names instead of numbers – when I try, the first 12 days of each month disappear. I think the software is confusing them with months, and it might be better if I wrote a script outside Tableau to take care of this.
- The rainfall amounts have a one-letter flag giving quite important information about their completeness and reliability. This is very useful, but in their infinite wisdom NOAA concatenated that onto the numbers. Fortunately Tableau makes it quite easy to separate off the last character, and then I wrote a long nested-IF function to keep, adjust or discard each number based on that flag.
- There is a single column that concatenates true/false data (as zeroes and ones) for whether each day had fog, rain, snow, hail, thunderstorms or tornadoes. I got stuck into taking that apart character by character before realising that it’s effectively a binary encoding. I bet this could be decomposed in a tidier way than I’ve done it.
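If the unpacking were moved into a script outside Tableau, the three fields could be handled roughly as in this Python sketch. None of it is the Tableau logic described above. The function names are made up, the indicator order is assumed to match the order listed in the last point, and the flag policy is left as a lookup table for the caller to fill in from NOAA's documentation, because the actual keep, adjust and discard rules aren't given here. Treating "adjust" as a multiplier is also an assumption.

```python
from datetime import date

# Assumed to follow the order given in the text above.
INDICATOR_NAMES = ("fog", "rain", "snow", "hail", "thunder", "tornado")


def parse_date(yyyymmdd):
    """Turn a 'YYYYMMDD' string into a real date object."""
    return date(int(yyyymmdd[:4]), int(yyyymmdd[4:6]), int(yyyymmdd[6:8]))


def split_precip(raw):
    """Split a value such as '0.05G' into (amount, flag).

    Returns an empty flag when no trailing letter is present.
    """
    raw = raw.strip()
    if raw and raw[-1].isalpha():
        return float(raw[:-1]), raw[-1]
    return float(raw), ""


def apply_flag(amount, flag, policy):
    """Keep, adjust or discard a reading using a lookup table.

    policy is hypothetical: it maps a flag letter to a multiplier,
    or to None to discard the reading. Unknown flags are kept as-is.
    """
    factor = policy.get(flag, 1.0)
    return None if factor is None else amount * factor


def decode_indicators(flags):
    """Turn a string of zeroes and ones such as '010000' into named booleans."""
    return {name: ch == "1" for name, ch in zip(INDICATOR_NAMES, flags.strip())}
```

A lookup table stands in for the long nested IF, and `zip` pairs each character with its name so the indicator column never has to be taken apart position by position. Because the column is just zeroes and ones, `int(flags, 2)` would also give a bitmask if that turned out to be more convenient.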
The good news is that once I had all of this set up, I could easily add more years (or replace the raw data entirely to make this for another location), and the conversions were automatic. If only I could find a source that lets me download multiple years in a single file, updates would be really easy.