The Fatality Analysis Reporting System (FARS) is a dataset published yearly by the US National Highway Traffic Safety Administration (NHTSA). It lists all reported traffic accident fatalities in the country along with associated data, such as weather conditions, the types of vehicles involved and whether drunk drivers were involved.
1. Data cleanup:
The cleanup was quite easy and most of the code looks something like this:
accident.short <- subset(accident.short, !(CITY == 0000 | CITY == 9997 | CITY == 9898 | CITY == 9999))
In this case, all rows whose CITY value was 0000, 9997, 9898 or 9999 were removed, because according to the FARS User Manual those codes represent incomplete or unknown data.
2. Asking the right questions:
Of the more than 35,000 reported incidents, about half remained after the data cleanup. A look at the User Manual shows that the dataset is actually quite large, and beyond the basic statistical questions (which are actually answered here), there are a lot of narratives that can be explored, for example:
- What's the drunk-sober incident ratio?
- Is there a correlation between weather conditions and incidents? What types of weather are more prone to cause incidents?
- Is one time of the year "deadlier" than others?
- Can we look at a daily trend? What about a monthly trend?
- What types of vehicles are more frequently involved in incidents?
- What types of roads are more dangerous?
- Are some states more deadly? Could this have to do with specific state laws?
The list can go on and on, and it's really a matter of deciding what kind of stories we want to tell with this data. So far I've been trying to visually answer some of these questions.
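As an illustration of how one of these questions could be turned into a number, here is a minimal sketch of the drunk-sober incident ratio. It assumes each incident has already been loaded as a plain object; the field name `drunkDrivers` and the sample records are hypothetical stand-ins, not actual FARS columns or values.

```javascript
// Hypothetical record shape: one object per fatal incident.
// `drunkDrivers` stands in for a count of drunk drivers involved in the incident.
const sampleCrashes = [
  { id: 1, drunkDrivers: 0 },
  { id: 2, drunkDrivers: 1 },
  { id: 3, drunkDrivers: 0 },
  { id: 4, drunkDrivers: 2 },
  { id: 5, drunkDrivers: 0 },
];

// Split incidents into those with at least one drunk driver and those with none,
// then return both counts plus the drunk-to-sober ratio.
function drunkSoberRatio(crashes) {
  let drunk = 0;
  let sober = 0;
  for (const crash of crashes) {
    if (crash.drunkDrivers > 0) drunk += 1;
    else sober += 1;
  }
  // Avoid dividing by zero when no incident is free of drunk drivers.
  const ratio = sober === 0 ? null : drunk / sober;
  return { drunk, sober, ratio };
}

console.log(drunkSoberRatio(sampleCrashes));
```

Returning the two counts alongside the ratio keeps the raw numbers available for a chart label or tooltip.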
This is an early-stage screenshot of a daily view of fatalities for all of 2015:
It's difficult to see a trend here, but I like the aesthetics of the line because it looks like a heartbeat signature, and that can be tied to a self-explanatory metaphor.
Thinking about how to further explore the data, it seemed that using scrolling to dispatch events and load new data could be a good approach. I had never used scroll libraries, but it was suggested that I use ScrollMagic, which I must say was a bit difficult to understand and implement at first alongside d3. I still have issues dispatching events when scrolling back up, because the SVG elements don't transition smoothly.
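One way to think about scroll-driven events, independent of any library, is to map the scroll position to a step index and fire a single handler whenever that index changes, in either direction. The sketch below is plain JavaScript and does not use ScrollMagic or d3; `stepOffsets`, `activeStep` and `createStepTracker` are hypothetical names, and the pixel offsets are made up. It may or may not help with the transition problem described above.

```javascript
// Hypothetical scroll offsets (in pixels) at which each step of the story begins.
// Assumed to be sorted in ascending order.
const stepOffsets = [0, 600, 1200, 1800];

// Return the index of the last step whose offset is at or before the
// current scroll position, i.e. the step currently in view.
function activeStep(scrollY, offsets) {
  let index = 0;
  for (let i = 0; i < offsets.length; i += 1) {
    if (scrollY >= offsets[i]) index = i;
  }
  return index;
}

// Build a tracker that calls `onChange(step, direction)` only when the active
// step changes, so the same handler runs when scrolling down and back up.
function createStepTracker(offsets, onChange) {
  let current = -1; // no step has been entered yet
  return function update(scrollY) {
    const next = activeStep(scrollY, offsets);
    if (next === current) return;
    const direction = next > current ? "down" : "up";
    current = next;
    onChange(next, direction);
  };
}

// Usage with simulated scroll positions. In a browser, `update` would be
// called with window.scrollY from inside a scroll listener.
const events = [];
const update = createStepTracker(stepOffsets, (step, direction) => {
  events.push(`${step}:${direction}`);
});
[0, 100, 650, 1900, 700, 50].forEach(update);
console.log(events); // [ '0:down', '1:down', '3:down', '1:up', '0:up' ]
```

Because the handler receives the target step instead of separate "enter" and "leave" events, the drawing code can always transition towards the state for that step, whichever way the reader is scrolling.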
The picture right above shows a cleaner version of the visualization, with nicer-looking axes and a better narrative. It's also easier to see the trend in fatalities: according to the data, August is the month in which the most people died.
This other view shows a disaggregation of incidents by type of weather (only 4 types shown here). Interestingly enough, clear weather conditions are by far the most prevalent conditions for incidents, although as expected, there's a small peak during the winter and transition months for rain and snow.
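A disaggregation like this boils down to grouping incidents by weather type and then by month. Here is a minimal sketch, assuming the weather codes have already been decoded into labels; the record shape (`month`, `weather`, `fatalities`) and the sample values are hypothetical, not taken from the dataset.

```javascript
// Hypothetical records: `month` is 1-12 and `weather` is an already-decoded label.
const sampleIncidents = [
  { month: 1, weather: "Clear", fatalities: 1 },
  { month: 1, weather: "Snow", fatalities: 2 },
  { month: 1, weather: "Clear", fatalities: 1 },
  { month: 8, weather: "Clear", fatalities: 3 },
  { month: 8, weather: "Rain", fatalities: 1 },
];

// Build one 12-element array of monthly fatality totals per weather type.
function fatalitiesByWeather(incidents) {
  const series = new Map();
  for (const { month, weather, fatalities } of incidents) {
    if (!series.has(weather)) series.set(weather, new Array(12).fill(0));
    series.get(weather)[month - 1] += fatalities;
  }
  return series;
}

const byWeather = fatalitiesByWeather(sampleIncidents);
console.log(byWeather.get("Clear")); // index 0 is January, index 7 is August
```

Each weather type ends up with its own 12-point series, which is a convenient shape for drawing one line per type on a shared monthly axis.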
3. Some technical aspects
I've also been enjoying this project because it's been a technical challenge. It has made me more comfortable using d3.js, learning the new syntax for v4, and adopting some important programming practices. For instance, I created separate modules with functions that handle the data transformation. Other small details, like rescaling SVG paths from daily to monthly data points, posed some unexpected difficulties. To look deeper into my code you can check the repo here, and a live demo here.
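To give an idea of what a small data-transformation function of this kind could look like, here is a sketch that collapses a daily series into monthly totals. It is not the code from the repo; the function name `toMonthly`, the `{ date, count }` record shape and the sample values are all hypothetical.

```javascript
// Hypothetical daily series: one { date: "YYYY-MM-DD", count } entry per day.
const dailyCounts = [
  { date: "2015-01-01", count: 3 },
  { date: "2015-01-02", count: 5 },
  { date: "2015-02-01", count: 2 },
  { date: "2015-02-15", count: 4 },
  { date: "2015-03-10", count: 1 },
];

// Sum daily counts into one { month: "YYYY-MM", count } entry per month,
// sorted chronologically.
function toMonthly(daily) {
  const totals = new Map();
  for (const { date, count } of daily) {
    const month = date.slice(0, 7); // keep only "YYYY-MM"
    totals.set(month, (totals.get(month) || 0) + count);
  }
  return Array.from(totals, ([month, count]) => ({ month, count }))
    .sort((a, b) => a.month.localeCompare(b.month));
}

console.log(toMonthly(dailyCounts));
```

Since the monthly series has far fewer points and much larger values than the daily one, the x and y scale domains would need to be recomputed from the new array before the path is redrawn.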
4. Final thoughts