The National Electronic Injury Surveillance System (NEISS) collects information from a sample of hospitals across the United States about injuries associated with consumer products. Working with a subset of that data, I designed a method for visualizing paired categorical data that reveals key data points and encourages exploration.
Finding some interesting variables
I started by taking a look at the most recent data summary released by the NEISS, the 2015 Data Highlights. According to their estimates, Sports and Recreational Equipment was the product category associated with the most injuries in 2015.
Taking a closer look at that category in particular, I focused on the top 10 types of sports and recreation equipment (by number of injuries).
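Selecting the top categories comes down to totaling injuries per product and keeping the largest totals. Below is a minimal Python sketch that assumes each injury record has already been reduced to a product label. The labels are hypothetical placeholders, not NEISS data or codes. NEISS records also carry a statistical sample weight that is used for national estimates. The optional `weights` argument is a hypothetical hook for summing those weights instead of counting cases, so check the NEISS documentation for the actual field before relying on it.

```python
from collections import Counter


def top_categories(labels, weights=None, n=10):
    """Return the n categories with the largest totals, largest first.

    Without weights each record counts once; with weights (hypothetical
    per-record sample weights) each record adds its weight instead.
    """
    totals = Counter()
    for i, label in enumerate(labels):
        totals[label] += 1 if weights is None else weights[i]
    return totals.most_common(n)


# Hypothetical toy records: one product label per injury record.
records = ["basketball", "football", "basketball", "bicycles", "football", "basketball"]
print(top_categories(records, n=2))  # [('basketball', 3), ('football', 2)]
```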
The data highlights didn’t provide anything more than summaries of estimated injuries by age and product category, so I dug into the raw data to explore other variables, including patient gender, diagnosis, body part affected, consumer product involved, incident locale, patient race and ethnicity, and treatment date.
I chose to focus on affected body part, as it seemed particularly relevant to the sports and recreation category, giving me a paired categorical dataset that looked something like this:
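Building a paired dataset like this is a cross-tabulation: you count how many records fall into each (body part, sport) pair. Here is a minimal standard-library sketch. The pairs are hypothetical placeholders, and if your extract stores body part and product as numeric codes, you would map them to readable labels first.

```python
from collections import defaultdict


def cross_tab(pairs):
    """Count (body_part, sport) pairs into a nested dict: table[body_part][sport]."""
    table = defaultdict(lambda: defaultdict(int))
    for body_part, sport in pairs:
        table[body_part][sport] += 1
    return table


# Hypothetical toy pairs (body part, sport); not real NEISS records.
pairs = [
    ("ankle", "basketball"),
    ("ankle", "basketball"),
    ("head", "football"),
    ("ankle", "football"),
    ("finger", "basketball"),
]
table = cross_tab(pairs)
print(dict(table["ankle"]))  # {'basketball': 2, 'football': 1}
```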
Looking at injured body part and sport/recreation equipment categories, there were some obvious questions to tackle with the visualization:
- Which sports are associated with the most injuries?
- Which body parts are injured most frequently?
- How are injuries distributed across body parts and sports?
- For each sport, which body parts are injured most often?
- For each body part, which sport accounts for the most injuries?
Creating a single visual treatment that addresses each question
To explore these questions, I created a hybrid chart by combining a traditional two-dimensional heat map matrix with a proportional bubble chart.
Each position in the matrix contains a circle, the area of which represents the number of injuries associated with a given body part and sport:
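For a circle's area to represent a count, the radius has to scale with the square root of the value, because area is πr². A minimal sketch of that scaling follows. The `max_radius` parameter is an assumed layout choice, for example half the width of a matrix cell.

```python
import math


def bubble_radius(value, max_value, max_radius):
    """Return a radius whose circle AREA is proportional to value.

    Area = pi * r**2, so r scales with sqrt(value); scaling r linearly
    would make a value twice as large look four times as big.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return max_radius * math.sqrt(value / max_value)


print(bubble_radius(100, 100, 20))  # 20.0
print(bubble_radius(25, 100, 20))   # 10.0
```

Under this scaling, a cell with a quarter of the largest count gets half the radius, and therefore a quarter of the area.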
I used a grayscale gradient to reinforce the magnitude of each value, which is primarily encoded by circle area, so that larger values stand out and smaller values recede.
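One simple way to build such a gradient is to interpolate linearly between a light gray and a dark gray according to each value's share of the maximum. The sketch below assumes a white background. Its `lightest` and `darkest` endpoints are hypothetical choices that keep even the smallest circles faintly visible, not the palette used in the original chart.

```python
def gray_level(value, max_value, lightest=220, darkest=30):
    """Map a value to a gray hex color; larger values come out darker.

    lightest/darkest are hypothetical 0-255 endpoints chosen for an
    assumed white background.
    """
    t = value / max_value if max_value > 0 else 0.0
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    level = round(lightest + (darkest - lightest) * t)
    return f"#{level:02x}{level:02x}{level:02x}"


print(gray_level(0, 100))    # #dcdcdc
print(gray_level(100, 100))  # #1e1e1e
```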
I added circles to represent row (body part) and column (sport) totals, and ordered the categories to facilitate visual comparison among them. Since people are not particularly good at judging quantities from circle area, I explicitly annotated the top data points with their numerical values:
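The totals, the ordering, and the choice of which cells to annotate can all be computed from the count matrix. In this sketch, sorting categories by descending total and annotating the k largest cells are assumptions about how "ordered" and "top data points" could be implemented. The toy counts are hypothetical.

```python
# Hypothetical toy counts: rows are body parts, columns are sports.
body_parts = ["finger", "ankle", "head"]
sports = ["soccer", "basketball", "football"]
counts = [
    [3, 25, 8],    # finger
    [20, 40, 15],  # ankle
    [5, 10, 30],   # head
]


def order_by_total(labels, totals):
    """Sort category labels so the largest total comes first."""
    return sorted(labels, key=lambda label: totals[label], reverse=True)


def top_cells(counts, row_labels, col_labels, k=3):
    """Return the k largest (value, row label, column label) cells."""
    cells = [
        (counts[i][j], row_labels[i], col_labels[j])
        for i in range(len(row_labels))
        for j in range(len(col_labels))
    ]
    return sorted(cells, key=lambda cell: cell[0], reverse=True)[:k]


row_totals = {bp: sum(row) for bp, row in zip(body_parts, counts)}
col_totals = {s: sum(row[j] for row in counts) for j, s in enumerate(sports)}

print(order_by_total(body_parts, row_totals))  # ['ankle', 'head', 'finger']
print(order_by_total(sports, col_totals))      # ['basketball', 'football', 'soccer']
print(top_cells(counts, body_parts, sports))
# [(40, 'ankle', 'basketball'), (30, 'head', 'football'), (25, 'finger', 'basketball')]
```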
I then gave each sport a color and colored the top data point for each sport to facilitate comparison within that sport. I also circled the top data point for each body part to facilitate comparison within that body part.
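Finding the cells to emphasize is a column-wise and a row-wise argmax over the count matrix. The sketch below returns index pairs only; the actual coloring and circling are left to whatever drawing library you use. The toy counts are hypothetical.

```python
def highlight_cells(counts):
    """Find the cells to emphasize in a body-part x sport count matrix.

    Returns two sets of (row, col) index pairs:
      colored: the largest cell in each column (top body part per sport)
      circled: the largest cell in each row (top sport per body part)
    Ties go to the first index encountered, since max() keeps the first maximum.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    colored = {
        (max(range(n_rows), key=lambda i: counts[i][j]), j) for j in range(n_cols)
    }
    circled = {
        (i, max(range(n_cols), key=lambda j: counts[i][j])) for i in range(n_rows)
    }
    return colored, circled


# Hypothetical toy counts: rows are body parts, columns are sports.
counts = [
    [40, 15, 20],
    [10, 30, 5],
    [25, 8, 3],
]
colored, circled = highlight_cells(counts)
print(sorted(colored))  # [(0, 0), (0, 2), (1, 1)]
print(sorted(circled))  # [(0, 0), (1, 1), (2, 0)]
```

A cell can appear in both sets. Here (0, 0) is both the top body part for its sport and the top sport for its body part.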
The result is a sort of hybrid proportional bubble/matrix chart that reveals the general distribution of the data, with some extra layers of encoding to highlight key data points.
Of course, this approach is limited to fairly small datasets, and there are other established methods for visualizing paired categorical data. A stacked bar chart is a simple alternative, regular heat map matrices are a decent option, and Robert Kosara and Caroline Ziemkiewicz developed parallel sets specifically for visualizing large and complex categorical datasets.
This approach does have its upsides, though, and besides…bubble charts are eye-catching.
Make your own proportional bubble chart
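You can generate a bare-bones static version of the chart as SVG using only the Python standard library. This is a sketch under stated assumptions, not the tool behind the figures above. The cell size, margins, and gray range are hypothetical choices, and the chart assumes a white background. It combines area-proportional radii with a gray gradient, but it omits totals, annotations, and per-sport colors to stay short.

```python
import math
from xml.sax.saxutils import escape


def bubble_matrix_svg(counts, row_labels, col_labels, cell=60, margin=100):
    """Render a matrix of circles whose AREAS are proportional to counts.

    Larger values also get a darker gray (white background assumed).
    """
    max_value = max(max(row) for row in counts)
    max_radius = cell / 2 - 2  # small gap between neighbouring circles
    width = margin + cell * len(col_labels)
    height = margin + cell * len(row_labels)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    # Column labels (sports) across the top.
    for j, label in enumerate(col_labels):
        x = margin + cell * j + cell / 2
        parts.append(f'<text x="{x}" y="{margin - 10}" text-anchor="middle" '
                     f'font-size="11">{escape(label)}</text>')
    for i, label in enumerate(row_labels):
        y = margin + cell * i + cell / 2
        # Row label (body part) on the left.
        parts.append(f'<text x="{margin - 10}" y="{y}" text-anchor="end" '
                     f'font-size="11">{escape(label)}</text>')
        for j, value in enumerate(counts[i]):
            x = margin + cell * j + cell / 2
            share = value / max_value if max_value > 0 else 0.0
            r = max_radius * math.sqrt(share)  # area-proportional radius
            level = round(220 - 190 * share)   # darker gray for larger values
            parts.append(f'<circle cx="{x}" cy="{y}" r="{r:.2f}" '
                         f'fill="rgb({level},{level},{level})"/>')
    parts.append("</svg>")
    return "\n".join(parts)


svg = bubble_matrix_svg([[4, 1], [0, 2]], ["ankle", "head"], ["basketball", "football"])
```

To view the result, write the string to a file (for example with `open("bubbles.svg", "w").write(svg)`) and open it in a browser.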
Notes
- When I say “sport”, I really mean “sport and recreation equipment”.
- The data used in this visualization should not be taken as representative of the US population, as it’s from only a sample of emergency rooms.