Geospatial and gender-based causal inference epidemiological analysis for asthma hospitalization and mortality
Causal Inference, Binary Decision-Making, Multiple Hypothesis Testing
Data Overview
Our data are a subset of the CDC’s Annual State-Level National Chronic Disease Indicators (CDI) public dataset. The underlying counts come from a census of United States hospitalizations and deaths, compiled from state hospital discharge data from AHRQ, death certificate data from vital statistics agencies, and population estimates from the U.S. Census Bureau or suitable alternatives, as reported by the CDC.
There are no known groups that were systematically excluded from the study. Selection bias may still be a concern if some states are better or worse than others at reporting the relevant data. It is unknown whether the agencies involved in data collection specifically notified patients; however, the data have been anonymized by aggregating them to the state level. The data are provided as annual, state-level asthma hospitalization and death rates, reported alongside each state's general hospitalization and death rates. The lack of more detailed patient-level information may affect the interpretation of our findings, because it prevents us from analyzing the impact of all possible confounders. The CDC website notes the following important limitations of the data:
“The use of a population based-measure can be misleading as it is affected by changes in prevalence over space or time. As one person can have multiple hospitalizations for asthma in a single calendar year, this indicator describes rate of events, not rate of persons hospitalized.” “The reliability of death certificate data for asthma has been questioned, particularly for older age groups… leading to an overestimation of asthma deaths among people age 55 and older.” (CDC)
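The CDC's distinction between a rate of events and a rate of persons can be made concrete with a small sketch. The CDI dataset itself is aggregated to the state level, so the patient-level records, the `Hospitalization` class, and the `rates_per` helper below are all hypothetical, and the scale of "per 10,000 residents" is an assumed default rather than a value taken from the dataset. The sketch shows how a single readmitted patient raises the event rate without changing the person rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hospitalization:
    """One asthma hospitalization record (hypothetical patient-level data)."""
    patient_id: str
    year: int


def rates_per(records, population, year, per=10_000):
    """Return (event_rate, person_rate) for one calendar year.

    Both rates are scaled to `per` residents (assumed default: 10,000).
    """
    in_year = [r for r in records if r.year == year]
    events = len(in_year)                           # every admission counts
    persons = len({r.patient_id for r in in_year})  # each patient counts once
    # Multiply before dividing so integer inputs give correctly rounded floats.
    return events * per / population, persons * per / population


records = [
    Hospitalization("A", 2020),
    Hospitalization("A", 2020),  # same patient readmitted in the same year
    Hospitalization("B", 2020),
    Hospitalization("C", 2019),  # different year, excluded below
]
event_rate, person_rate = rates_per(records, population=50_000, year=2020)
print(event_rate, person_rate)  # 0.6 0.4
```

Here three admissions involve only two distinct patients, so the event-based indicator reports a higher rate than a person-based one would. This is the gap the CDC warns about when interpreting hospitalization rates.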
Because the data come from a national census and we are not generalizing to asthma prevalence beyond recorded hospitalizations and deaths, sampling-based selection bias and convenience sampling are not concerns in our study; differences in reporting quality between states, noted above, remain the main potential source of bias.
Given the CDC limitations above, more specific mortality and hospitalization rates for different age groups would allow a more accurate analysis of both of our research questions. More granular location data (e.g. at the city level) may also be useful, since air quality likely varies between cities. Finally, data stratified jointly by gender and ethnicity for each observation would have allowed us to analyze these factors in tandem.
Research Questions