COVIDeracy: Data Literacy in Uncertain Times
At CTData, we are data people and believe data is for everyone.
We know access to reliable and trustworthy resources is essential to increasing data literacy and informing decision making, especially during a global health crisis. We also know that, unless you were trained as an epidemiologist, it can be difficult to read and interpret the information being disseminated.
Over the past couple of months, we have been scraping local and national COVID-19 data, staying up to date on resources, and following policy developments to try to understand what is happening. Even for us, it can sometimes feel overwhelming, and we are paid and trained to do this! We hope that by breaking down key questions to consider and nuances in the data, we can help you feel more informed and less confused, too.
The questions below provide guidance on what to ask yourself as you explore COVID-19 reports and visualizations. These questions are helpful to ask during any data project, not just during a global pandemic or with health-related data.
What is the purpose of the report or visualization?
CTData is a data intermediary with the mission to make public data available and accessible. We typically take existing public data and visualize it to make it easier for users to understand, regardless of their background or familiarity with the subject matter. Throughout this process, our role is to remain neutral rather than to promote certain programs or policies. More generally, though, a report or visualization can serve several purposes, including:
Sharing information so people can make their own decisions.
Persuading readers to see a key metric or insight.
Evaluating a program or policy.
Connecticut Department of Public Health (CT DPH) and other related public data about COVID-19 can be found on the state’s open data portal. You can find CTData visualizations about COVID-19 trends over time, demographics, and economic impact online here.
Who funded and who created the deliverable, and how might that influence the results?
We all view the world through a different lens shaped by our own experiences. While we like to think that all data work is unbiased, the reality is that humans make decisions and our unique lens influences each decision. To help tease out bias, we must ask ourselves:
What is the mission of the funder or the organization that developed the work?
Does the funder or partner have an advocacy agenda they are working to advance?
What related content knowledge, expertise, and/or previous experience does the funder or partner have?
What information has been included and what is missing?
The rapidly changing, high-stakes environment of COVID-19 has highlighted the importance of understanding what variables a dataset includes and what the constraints are for each data source. Questions we have been asking ourselves about the COVID-19 data include:
What variables are included in the data set?
The first CT DPH report, on March 21, 2020, included the number of tests completed, confirmed cases, hospitalizations, and deaths. Today, thanks to feedback from partners and the great work of CT DPH, confirmed cases and deaths are also available by age group, gender, race, and ethnicity. Data is also available for confirmed cases and deaths by nursing home across the state.
At what level of geography is each variable available?
Not all data points are currently publicly available at the state, county, and town levels, which affects what claims can be made and how local leaders can use the information. For example, some users have asked for data to be made available for metro areas, but this request has not been approved at this time. Given the small number of cases in many Connecticut towns, the decision to withhold disaggregated data at the town level is likely intended to protect the identities of those impacted by COVID-19. As of May 6, 2020, data is publicly available by:
What data is missing?
Data currently reported by CT DPH represents only confirmed cases, hospitalizations, and deaths. Given a national shortage of available tests, the current data likely undercounts the number of cases in our communities. Only people who have been tested and found to be positive are included in the numbers. People who are asymptomatic or not showing enough symptoms to be tested are not included in the data.
Reports from CT DPH include cases “pending address validation” and also cases and deaths with race and ethnicity reported as “unknown.” It is likely that the percentages of missing data are lower for deaths than confirmed cases since address, race, and ethnicity are required fields on CT death certificates. As of the May 3, 2020 CT DPH update:
1.2% (n=363) of confirmed cases were pending address validation
0.1% (n=3) of deaths were pending address validation
37.1% (n=11,123) of race and ethnicity for confirmed cases were unknown
2.7% (n=68) of race and ethnicity for deaths were unknown
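One way to sanity-check figures like these is to ask what denominator each one implies. The sketch below is a minimal illustration, not CT DPH's method: it assumes each percentage was rounded to one decimal place and back-calculates the range of totals consistent with the reported count. The function name implied_total_range is hypothetical.

```python
def implied_total_range(n, pct, decimals=1):
    """Return the (low, high) range of totals consistent with a count n
    reported as pct percent, assuming pct was rounded to `decimals` places."""
    half_step = 0.5 * 10 ** (-decimals)  # e.g. 0.05 for one decimal place
    low_total = n / ((pct + half_step) / 100)
    high_total = n / ((pct - half_step) / 100)
    return low_total, high_total


# Counts and percentages quoted from the May 3, 2020 CT DPH update above.
reported = {
    "cases pending address validation": (363, 1.2),
    "cases with unknown race/ethnicity": (11123, 37.1),
    "deaths pending address validation": (3, 0.1),
    "deaths with unknown race/ethnicity": (68, 2.7),
}

for label, (n, pct) in reported.items():
    low, high = implied_total_range(n, pct)
    print(f"{label}: implied total between {low:,.0f} and {high:,.0f}")
```

The two case figures should imply overlapping ranges, since both describe the same set of confirmed cases, and likewise for the two death figures. If the ranges did not overlap, that would be a cue to check how each percentage was defined.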
How and when was the information collected and reported?
As mentioned previously, both the data that is publicly available and the way each metric is defined and reported have changed over time. For example:
The data is constantly changing, which can be confusing for users. According to the CT DPH daily reports, “Day-to-day changes reflect newly reported cases, deaths, and tests that occurred over the last several days to week. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected.”
Testing data used to represent the number of tests completed with each test representing one person. Now that some people have been tested repeatedly, the number of tests is no longer equivalent to the number of people tested. For example, as of May 3, 2020, 105,330 tests were completed; the number of people tested is less than that number.
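The difference between tests and people tested can be seen in a small sketch. The records below are invented for illustration (the person IDs and dates are hypothetical, not CT DPH data); the point is only that counting rows and counting distinct people give different answers once anyone is tested more than once.

```python
# Hypothetical test records: (person_id, test_date). Not real data.
test_records = [
    ("person_a", "2020-04-01"),
    ("person_b", "2020-04-01"),
    ("person_a", "2020-04-15"),  # person_a is tested a second time
    ("person_c", "2020-04-20"),
    ("person_b", "2020-05-02"),  # person_b is tested a second time
]

# Every record is one test, regardless of who took it.
tests_completed = len(test_records)

# A set keeps each person_id once, so its size is the number of people.
people_tested = len({person_id for person_id, _ in test_records})

print(f"Tests completed: {tests_completed}")
print(f"People tested:   {people_tested}")
```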
According to the CT DPH daily reports, “for public health surveillance, COVID-19-associated deaths include persons who tested positive for COVID-19 around the time of death (confirmed) and persons whose death certificate lists COVID-19 disease as a cause of death or a significant condition contributing to death (probable).” It's possible that individuals who were not tested but had COVID-19 listed as their cause of death were not included in the original case counts.
Furthermore, reporting lags between states and the Centers for Disease Control and Prevention (CDC) could impact the numbers, depending on what data source you are using. According to the New York Times, “The speed of that data reporting varies considerably by state. In Connecticut, for example, where reported coronavirus deaths are high, the CDC statistics include zero reported deaths from any cause since Feb. 1, because of reporting lags.”
Another thing to consider is whether the data is presented as raw counts or has been converted to another metric, such as a rate, or displayed on a logarithmic scale.
Counts, like the number of cases or deaths, allow readers to easily understand change over time but are not great for comparisons. Percentages and rates, on the other hand, provide a more standardized way to compare how COVID-19 is impacting different geographies or subpopulations of our communities.
Let’s look at a simple example in the chart below to understand how this works. When looking at the number of hospitalizations by county, it appears that Fairfield, New Haven, and Hartford counties were the most impacted. However, when you convert that number to the percentage of confirmed cases that were hospitalized, you see a different picture, with Middlesex, New London, and Tolland having a greater percentage of cases hospitalized.
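The same effect can be reproduced with a few lines of arithmetic. The sketch below uses made-up numbers for three hypothetical counties (they are not the Connecticut figures) to show how ranking by count and ranking by percentage can disagree.

```python
# Hypothetical counts, chosen only to illustrate the calculation.
counties = {
    "County A": {"confirmed_cases": 10000, "hospitalizations": 600},
    "County B": {"confirmed_cases": 2000, "hospitalizations": 300},
    "County C": {"confirmed_cases": 500, "hospitalizations": 100},
}


def pct_hospitalized(row):
    """Hospitalizations as a percentage of confirmed cases."""
    return 100 * row["hospitalizations"] / row["confirmed_cases"]


# Rank the same counties two ways, highest first.
by_count = sorted(
    counties, key=lambda c: counties[c]["hospitalizations"], reverse=True
)
by_pct = sorted(
    counties, key=lambda c: pct_hospitalized(counties[c]), reverse=True
)

print("Ranked by count:     ", by_count)
print("Ranked by percentage:", by_pct)
```

In this invented example the county with the most hospitalizations has the lowest percentage of cases hospitalized, so the two rankings are exact opposites.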
Looking at the numbers as percentages and rates is especially important when disaggregating the data by demographics to understand how COVID-19 intersects with health equity. For example, the data suggests that Black individuals make up a greater share of confirmed COVID-19 cases and deaths than of the state’s population.
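One common way to make this kind of comparison concrete is to divide a group's share of cases by its share of the population. The sketch below uses invented numbers for a hypothetical group (not CT DPH or Census figures), and the function name representation_ratio is our own; a ratio above 1 means the group is overrepresented among cases relative to its size.

```python
def representation_ratio(group_cases, total_cases, group_pop, total_pop):
    """Share of cases divided by share of population for one group."""
    share_of_cases = group_cases / total_cases
    share_of_population = group_pop / total_pop
    return share_of_cases / share_of_population


# Hypothetical inputs for illustration only.
ratio = representation_ratio(
    group_cases=300, total_cases=1500,   # group is 20% of cases
    group_pop=10000, total_pop=100000,   # group is 10% of population
)
print(f"Representation ratio: {ratio:.1f}")
```

Because race and ethnicity were reported as unknown for a large share of confirmed cases, any ratio like this depends on how the unknown records are handled, so it is best read as an indication rather than a precise measure.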
What outside factors might influence the findings?
While we cannot attribute causality from aggregate numbers alone, there are a variety of outside factors that are likely influencing the findings. These include:
Increases in testing: as more tests are completed, there is the potential to detect more cases, which will increase the number of confirmed cases. These people would still have been positive without the test but would not have been included in the official data reports.
Executive Orders for social distancing: as COVID-19 spread rapidly through our communities, nation, and world, leaders enacted stay-at-home orders, asking residents to stay home and closing businesses, schools, and many public spaces. The intention of these orders was to slow the spread and, hopefully, flatten the curve.
Health equity: as mentioned previously, certain groups in our communities have been impacted more severely than others. These disparities arise from systemic and institutionalized inequities impacting people of color, incarcerated individuals, and those living in nursing homes. For example:
Black individuals make up a greater share of confirmed COVID-19 cases and deaths than of the state’s population. This reveals a health equity concern rooted in systemic and institutionalized racism and the unequal distribution of services and resources that must be addressed in our communities.
Individuals living in jails and prisons in Connecticut are staying in confined spaces unable to socially distance themselves. Because of this, as of April 21, 2020, “if CT prisons and jails were a town, they’d have the highest COVID-19 infection rate in the state.”
Nursing home residents account for over half of COVID-19 deaths in Connecticut, with homes hit the hardest having had “more staffing and infection control problems before [the] pandemic.”
Headlines, news briefs, articles, graphs—it can all feel overwhelming and difficult to navigate. But when we ask ourselves the right questions and utilize trusted data sources, we can begin to deepen our understanding of the current situation. More importantly, we can use this information to inspire action and create necessary changes to improve the wellbeing of our communities. You can always count on CTData to provide accurate, updated information, so make sure to check out our website, blog, and newsletter. You can also follow us on Twitter, LinkedIn, and Facebook for real-time updates and feel free to share any tips with us about how you critically consume data!