Jan 14, 2021

Interpreting Spatial Data in the Age of COVID-19

As 2020 has come to an end, many are eager to leave the mess of COVID-19 behind with the new year and gain a fresh start. Unfortunately, new cases are still soaring across the United States and, even with vaccines, pre-COVID life is likely to remain elusive through much of 2021. Luckily, there are many tools available to track the spread of COVID-19 while we wait for normalcy. Health organizations, major news outlets, and even search engines have COVID-19 dashboards that are updated daily. Despite displaying the same data, each dashboard uses its own blend of data aggregation and visualization techniques, some of which are easier to digest than others. This post will review spatial visualization methods and give recommendations on how to interpret spatial data to get the most out of these dashboards.

1) Bubble Charts vs. Choropleths
A quick search of “covid cases” on any major search engine will return multiple dashboards that rely heavily on two map visualizations: bubble charts and choropleths. Bubble charts are just what they sound like: maps that display a circle or “bubble” centered over a geographic region and sized according to that region’s statistic. Choropleths are fundamentally different. They color each region according to a range of numeric values shown in a legend on the side. Choropleths are usually criticized for their geographic bias: it’s much easier to read the number of COVID cases in larger states like California than in smaller ones like Rhode Island. When interpreting choropleths, it’s important to remember that the eye is naturally drawn to larger regions and that smaller regions deserve equal scrutiny. Bubble charts get around this issue by placing the focus on the size of the bubble, not the underlying region with which the bubble is associated. However, a quick search of “covid cases” on Google will reveal bubble charts’ major flaw: without altering the scale of the data to fit the visualization (a process called normalization), the bubbles can easily overlap and make the visualization largely unreadable. Most COVID-19 dashboards provide raw data without normalization, which often leads to a mess of overlapping bubbles like the picture below. Because of this issue, avoid dashboards that rely heavily on bubble charts in favor of more readable choropleths.

Bubble chart from Google’s COVID-19 dashboard.
Choropleth map from searching “covid cases” on Yahoo.com
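
To make the idea of normalization a bit more concrete, here is a minimal Python sketch of one common way to size bubbles. It is an illustration only, not how Google or any other dashboard actually does it: the function name bubble_radii, the max_radius parameter, and the case counts are all made up for this example. The sketch scales each value so that the bubble's area (not its radius) is proportional to the statistic, and caps the largest bubble at a fixed size so big values cannot swallow the map.

```python
import math

def bubble_radii(values, max_radius=40.0):
    """Map each region's value to a bubble radius.

    The bubble AREA is proportional to the value, and the region with the
    largest value gets a bubble of radius max_radius (hypothetical pixel units).
    """
    largest = max(values.values())
    if largest <= 0:
        # Nothing to draw: every region gets an empty bubble.
        return {region: 0.0 for region in values}
    # Area grows with radius squared, so take the square root of the share.
    return {
        region: max_radius * math.sqrt(value / largest)
        for region, value in values.items()
    }

# Made-up cumulative case counts for three hypothetical regions.
cases = {"Region A": 1_000_000, "Region B": 250_000, "Region C": 10_000}
radii = bubble_radii(cases)

for region, radius in radii.items():
    print(f"{region}: radius {radius:.1f}")
```

With a cap like this, a region with a quarter of the cases gets a bubble with half the radius, which keeps the largest bubbles from burying their neighbors. Whether that is enough to prevent overlap still depends on how close the regions are on the map.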

2) Geographic boundaries are often arbitrary
As hinted above, the underlying geography represented in a spatial visualization can affect how the data should be interpreted. Spatial data is often tied to arbitrary geographic boundaries. When considering cumulative COVID-19 cases per region as seen in the maps above, it would also be useful to account for the population size and density of each region. However, states vary in both geographic size and population. It would make more sense to create new boundaries for the map so each region encompasses an equal population, but since US COVID-19 data is broken up by counties and states this is nearly impossible. Instead, the data can be displayed as a rate of COVID cases per 100,000 population. The Johns Hopkins dashboard of the United States has several maps that display cases per population at the county level. Just remember that not every county is the same size geographically, which can lead to a bias toward larger counties during interpretation.
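
The per-100,000 rate is simple arithmetic, and a short Python sketch may help show why it changes the picture. The county names and numbers below are invented for illustration, and cases_per_100k is a hypothetical helper, not something taken from any dashboard.

```python
def cases_per_100k(cases, population):
    """Return the number of cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population

# Invented numbers for two hypothetical counties.
counties = {
    "Large County": {"cases": 50_000, "population": 2_000_000},
    "Small County": {"cases": 1_500, "population": 30_000},
}

rates = {
    name: cases_per_100k(data["cases"], data["population"])
    for name, data in counties.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.0f} cases per 100,000")
```

In this made-up example the large county has far more total cases, but the small county has twice the rate (5,000 versus 2,500 per 100,000). A map of raw counts and a map of rates would tell opposite stories about these two places.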

3) Temporal aggregations are important
When looking through dashboard views, it’s important to keep in mind how statistics are aggregated over time. Most dashboards have a default view of the cumulative COVID-19 cases by region, but also provide statistics aggregated over the last day, week, or two weeks to show recent trends. However, these dashboards typically don’t use the same aggregations, and that can make a huge difference in what is displayed. For example, the Johns Hopkins dashboard has a map of new cases by population for the last day, the CDC provides a map of new cases by population over the last seven days, and Google’s dashboard provides an aggregation for the last two weeks. Depending on the trends in a specific location, these can produce drastically different maps. Counts of COVID-19 cases from the last day can be biased by collection and reporting procedures, since not all counties report data as quickly as others. This can distort the map and cause drastic day-to-day changes. Aggregating counts over the past 7 or 14 days accounts for this issue, but each window has its own nuances for interpretation. A surge in cases during one week will show up more slowly in the two-week display, but a one-week surge does not by itself indicate the emergence of a long-term trend. None of these aggregations is wrong, and each has its own merits, so keep in mind how the data is aggregated when making interpretations.

New cases by population over the past day by Johns Hopkins
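
To see how much the choice of time window matters, here is a small Python sketch that computes 1-day, 7-day, and 14-day totals from the same series. The daily counts are invented (two steady weeks followed by a one-week surge), and trailing_sum is a hypothetical helper written for this post, not the method any particular dashboard uses.

```python
def trailing_sum(daily_counts, window):
    """For each day, sum that day and the preceding days, up to `window` days total.

    The first few entries cover fewer than `window` days because there is
    not enough history yet.
    """
    totals = []
    running = 0
    for i, count in enumerate(daily_counts):
        running += count
        if i >= window:
            # Drop the day that just fell out of the window.
            running -= daily_counts[i - window]
        totals.append(running)
    return totals

# Invented daily new cases: 14 steady days, then a 7-day surge.
daily = [100] * 14 + [300] * 7

last_day = daily[-1]
last_7 = trailing_sum(daily, 7)[-1]
last_14 = trailing_sum(daily, 14)[-1]

print(f"Last day:     {last_day} cases")
print(f"Last 7 days:  {last_7} cases ({last_7 / 7:.0f} per day)")
print(f"Last 14 days: {last_14} cases ({last_14 / 14:.0f} per day)")
```

In this example the 7-day window averages 300 cases per day while the 14-day window averages 200, because half of the longer window still reflects the quieter period before the surge. The same location would look noticeably different on a one-week map and a two-week map.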

4) Keep your original question(s) in mind
It is easy to get lost in the multitude of visualizations that make up most COVID-19 dashboards and forget why you went to the page in the first place. Are you interested in the risk of getting COVID-19 from going into the office, or visiting friends or family? Or perhaps you want to view nationwide trends in cases over the past few months? Keeping the original question in mind can help narrow down the search for the visualization that will help you the most. To understand your local risk, a cumulative case map aggregated to the state level will not be nearly as helpful as one aggregated to the county level over the past two weeks. For nationwide trends, many dashboards include state-level histograms of cases over time, which are more useful than a map of cumulative cases. Just remember these visualizations are still impacted by geographic bias. Whatever your question, keep it in mind so you can filter out the visualizations that are not useful to you.

5) Ask what a visualization does not provide
A useful tip for finding the right visualization is to ask what information it does not provide. A map of cumulative cases does not indicate the date of a region’s first recorded case, leaving out information on the intensity of the pandemic by location. For example, COVID-19 will have a much different impact on two states with equal current case counts if one recorded a steady number of cases over a long period while the other had a sharp increase over the past few months. Asking what a visualization doesn’t provide can put its limitations in context and help make sure you are getting the information you are most interested in.

Hopefully you will find these recommendations useful as we wait for this pandemic to end. Stay safe and healthy!
