It is fascinating, inspiring (and disappointing) to see effective responses to the Covid pandemic from other countries. Great partnerships and effective connection of governmental leadership, industrial production, and healthcare information can combine to combat the pandemic.
Taiwan has had only 446 cases and 7 deaths, for 24 million residents, since the start of the pandemic, despite its proximity to China and the frequent travel and many flights between the two.
One of my good colleagues, Dr. Patrick Guffey, turned me on to this website, which takes publicly available Public Health data and turns it into a graph projecting the current R(0) or infectivity rate, per state. I have found it to be compelling, and it reflects what we are hearing in reports from various states.
Consider adding this to your usual litany of sites monitoring the pandemic. I come and refresh my website view each day.
The Covid-19 pandemic is still quite uncontrolled in the US.
In this post, we’re going to walk through an analysis that was conducted by the UCHealth data science team looking at “leading indicators” that could help us to plan for a coming spike in COVID-19 inpatient hospitalizations before we actually see an influx of bed demand.
Perhaps, if we start to see more patients reporting a cough, fever, chills, and other flu symptoms, we would expect that this may indicate a growing spread of COVID-19. However, can we actually use the prevalence of these symptoms to predict how many ICU beds will be needed for COVID-19? What about less common symptoms of COVID-19, such as loss of smell or taste, that have been shown to be more predictive of COVID-19 infection?
While this may sound like a relatively straightforward question, there are a number of confounding effects that make it difficult. The above graphic shows the number of patients making an outpatient or virtual office visit due to a fever. As expected, there is a general downward trend as the seasonal influenza season subsides. However, there also appears to be a “spike” in reports of fever in early March in our Northern Colorado geography (orange line). Could this spike be quantified for future predictions?
Defining a “symptom” in our Epic electronic health system is complex. For example, symptoms can be documented as the “reason for visit”, but a medical assistant may or may not choose to report all symptoms as the visit reason. Besides “reason for visit”, our Epic team has developed a COVID-19 symptoms checklist that screens patients at check-in (completed by front desk staff). This list was expanded substantially in the midst of the epidemic based on new evidence (for example, loss of smell). The consequence is that we saw an increase in reporting of these symptoms in April, due to the new data fields, while our actual number of COVID-19 inpatient cases was declining. In short, there is a significant amount of noise to parse through before arriving at a prediction we can trust.
How did we go about identifying the signal from the noise? Knowing that there was no “right” answer, we tested different approaches. I’m going to focus here on the most recent modeling attempt that we have found to be most insightful. We started with the premise that the correlations between our independent variables (reported reason for visit, reported COVID-19 symptoms, and documentation of ICD-10 billing codes indicative of confirmed or potential COVID-19 infection) and our dependent variable (number of COVID-19 inpatient hospitalizations) would change over time due to trends in seasonal influenza and introduction of new codes/data elements in our EMR system. We therefore constructed separate linear regression models for the months of March (when the epidemic hit and we did not yet have IT system capabilities for tracking many symptoms), April (when COVID-19 cases hit their peak and then declined, accompanying a ramp-up in new IT system capabilities), and May (something of a “steady state” when seasonal influenza had passed and no major IT updates were made regarding COVID-19 symptoms or billing codes).
We wanted to test a large number of independent variables, and therefore chose to use a linear regression method known as LASSO regression instead of the traditional OLS modeling technique. LASSO regression introduces a regularization parameter that penalizes large coefficients in the model. Instead of optimizing only to minimize prediction error, the model minimizes the below cost function, which adds a penalty on the size of the coefficients:
Y: Dependent variable
X: Independent variable
β: Regression coefficient
λ: Regularization parameter
n: Number of observations
p: Number of independent variables in the model
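The cost function itself appears to have been an image that did not survive in this copy. Using the symbols listed above, the textbook form of the LASSO objective can be written as follows. This is a reconstruction of the standard formula, not necessarily the exact figure from the original post; some software scales the first term by 1/(2n), which changes the numeric meaning of λ but not the idea.

$$
\min_{\beta} \; \sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{p} X_{ij}\,\beta_j \Big)^2 \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
$$

The first term is the usual squared prediction error; the second term grows with the absolute size of every coefficient, so a larger λ pushes more coefficients to exactly zero.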
In plain English: we reduced the complexity of the model and thus reduced the chance of spurious correlation or the influence of random “noise” in the data.
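For readers who want to see the shrinkage happen, here is a minimal sketch of LASSO fitted by coordinate descent in plain Python. It is not the UCHealth team's code: the data are synthetic, there is no intercept, and the names `soft_threshold` and `lasso_fit` are made up for this illustration. In practice one would reach for an established statistics library.

```python
import random

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything within gamma of zero becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for sum((y - X*beta)^2) + lam * sum(|beta_j|), no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # feature j against the residual that ignores feature j
            norm = 0.0  # sum of squares of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam / 2.0) / norm if norm else 0.0
    return beta

# Synthetic data (NOT UCHealth data): only the first feature drives the outcome.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
y = [3.0 * row[0] + rng.gauss(0, 0.5) for row in X]

# Larger penalties pull the coefficients toward zero.
for lam in (0.0, 10.0, 50.0, 500.0):
    print(lam, [round(b, 3) for b in lasso_fit(X, y, lam)])
```

With no penalty the fit behaves like ordinary least squares; as the penalty grows, the coefficient on the pure-noise feature is the first to be driven to zero, which is the "reduced complexity" described above.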
Our independent variables were reported outpatient symptoms and diagnoses in the seven days prior to the index date, and our dependent variable was the number of COVID-19 hospitalizations in the seven days after the index date. For example, on May 1 we fit the numbers of reported symptoms and documented ICD-10 codes from the prior 7 days (4/24-4/30) to the number of hospitalizations in the next 7 days (5/1 – 5/7). An astute reader will note that our modeling approach violates one of the tenets of linear regression modeling in that the observations are not mutually independent, but rather a time series. To mitigate this issue, as well as the small number of observations in a given month, we used a procedure drawing bootstrapped samples from each month 100 times, and for each sample, using a 5-fold cross validation process to determine the optimal regularization parameter, fit a LASSO regression model. A bootstrap sample is a random sample of the same size as your original data drawn at random with replacement from the original data, so in some samples data points for 5/1, 5/2, and 5/3 will all be included, some may only include 5/1, and some may include none of those data points.
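The seven-day windowing around each index date can be sketched like this. The function name `build_windows` and the daily counts are invented for illustration; the only thing taken from the post is the window logic (for May 1, the prior window is 4/24 through 4/30 and the outcome window is 5/1 through 5/7).

```python
from datetime import date, timedelta

def build_windows(daily_symptoms, daily_admissions, window=7):
    """Return (index_date, symptoms in prior `window` days, admissions in next `window` days)."""
    rows = []
    for index_date in sorted(daily_symptoms):
        prior = [index_date - timedelta(days=d) for d in range(1, window + 1)]
        ahead = [index_date + timedelta(days=d) for d in range(window)]
        # Skip index dates whose windows run off either end of the data
        if all(d in daily_symptoms for d in prior) and all(d in daily_admissions for d in ahead):
            rows.append((index_date,
                         sum(daily_symptoms[d] for d in prior),
                         sum(daily_admissions[d] for d in ahead)))
    return rows

# Made-up daily counts for April 20 through May 10, 2020 (illustrative only)
days = [date(2020, 4, 20) + timedelta(days=i) for i in range(21)]
symptoms = {d: 40 - i for i, d in enumerate(days)}
admissions = {d: 20 - i // 2 for i, d in enumerate(days)}

for row in build_windows(symptoms, admissions):
    print(row)
```

Note that neighboring rows share six of their seven days, which is exactly why the observations are not independent.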
Once again giving a simple English translation for those less interested in the modeling approach: we introduced some randomness to our data to give ourselves better confidence in our estimates of the linear correlation between each variable and our outcome of number of future COVID-19 hospitalizations.
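For the curious, the bootstrap-plus-cross-validation loop might look roughly like the sketch below. To keep it short it makes several simplifying assumptions that are not from the post: a single predictor, no intercept (so LASSO has a closed-form answer), a small hand-picked grid of λ values, and synthetic data. All function names are hypothetical.

```python
import random

def soft_threshold(z, gamma):
    return z - gamma if z > gamma else z + gamma if z < -gamma else 0.0

def fit_lasso_1d(xs, ys, lam):
    """One-predictor LASSO, no intercept: minimizes sum((y - b*x)^2) + lam*|b|."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return soft_threshold(sxy, lam / 2.0) / sxx if sxx else 0.0

def cv_error(xs, ys, lam, k=5):
    """Mean squared error on held-out points across k folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        b = fit_lasso_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - b * xs[i]) ** 2 for i in held_out)
    return total / n

def bootstrap_lasso(xs, ys, lambdas, n_boot=100, seed=0):
    """Fit one LASSO model per bootstrap sample, choosing lambda by 5-fold CV each time."""
    rng = random.Random(seed)
    n = len(xs)
    coefs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # same size, drawn with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        best_lam = min(lambdas, key=lambda lam: cv_error(bx, by, lam))
        coefs.append(fit_lasso_1d(bx, by, best_lam))
    return coefs

# Synthetic month of 31 observations (NOT UCHealth data)
gen = random.Random(1)
xs = [gen.uniform(50, 150) for _ in range(31)]
ys = [0.5 * x + gen.gauss(0, 5) for x in xs]

coefs = bootstrap_lasso(xs, ys, lambdas=[0.0, 1.0, 10.0, 100.0])
print(len(coefs), round(sum(coefs) / len(coefs), 3))
```

One caveat of this simple version: because a bootstrap sample contains repeated dates, the same date can land in both the training and held-out folds, so the cross-validation error is somewhat optimistic.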
The below table summarizes, by month, the average correlation coefficient from all of the LASSO regression models fit to bootstrapped samples of data from that month, sorted in decreasing order by the value in May. Please interpret the nomenclature as follows:
reason_visit: Indicates the variable is the reported reason for visit in an outpatient or virtual encounter
symptom: Indicates the variable is one of the COVID-19 symptoms selected from a checklist by clinicians at the beginning of outpatient/virtual encounters
icd: Indicates the variable is documentation of an ICD-10 code referencing confirmed or suspected cases of COVID-19
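[Table not reproduced here: average coefficient by month (March, April, May) for each variable. Row labels that survive: reason_visit_SHORTNESS OF BREATH, symptom_Shortness of breath, symptom_Loss of smell, symptom_Bruising or bleeding, symptom_Loss of taste]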
The strongest positive correlation with future COVID-19 hospitalizations in the month of May was “cough” as the reason for visit. At first, the trend in this correlation over time seems counterintuitive. Why would we see such a strong negative correlation in the month of March but a strong positive correlation in the month of May? Well, a reasonable hypothesis has to do with the ramp-up in COVID-19 testing coinciding with the end of the 2019-2020 seasonal flu. In March, we saw an overall decline in patients seeking outpatient care for a cough, likely due to both the end of seasonal flu and social distancing keeping patients from seeking treatment at medical facilities, while we simultaneously initiated widespread COVID-19 testing at our inpatient facilities and saw a rapid rise in confirmed cases. In May, by comparison, there was no noise from seasonal influenza and no significant backlog in testing to ramp up.
We can also look at the distribution of the regression coefficient for the cough variable in our bootstrapped samples to better establish our confidence in the value. The below histogram shows the distribution of the coefficient across all 100 bootstrapped samples for the months of March (blue), April (orange), and May (green). Notice that for a large number of samples from March and April, the coefficient is near 0, while for the month of May it ranges consistently between 5-10. What does this mean? It means that a few data points in March and April are likely having a disproportional impact on the estimate of the linear correlation, while the correlation in May is more consistent regardless of which dates are sampled.
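A histogram like this can also be boiled down to a few numbers: what share of the bootstrapped coefficients sit at zero, and what range covers most of them. The sketch below does that for two made-up coefficient lists shaped like the description above; the values are invented and are not the UCHealth results, and `summarize_coefficients` is a hypothetical helper.

```python
def summarize_coefficients(coefs, zero_tol=1e-6):
    """Share of coefficients at (or within zero_tol of) zero, plus a rough 95% range."""
    ordered = sorted(coefs)
    n = len(ordered)
    share_zero = sum(1 for c in ordered if abs(c) <= zero_tol) / n
    low = ordered[round(0.025 * (n - 1))]    # nearest-rank 2.5th percentile
    high = ordered[round(0.975 * (n - 1))]   # nearest-rank 97.5th percentile
    return {"share_zero": share_zero, "low": low, "high": high,
            "mean": sum(ordered) / n}

# Made-up examples: a month where most samples give zero and a few give a large
# negative value, versus a month where every sample lands between 5 and 10.
march_like = [0.0] * 60 + [-8.0] * 40
may_like = [5 + 5 * i / 99 for i in range(100)]

print(summarize_coefficients(march_like))
print(summarize_coefficients(may_like))
```

A high share of zeros alongside a wide range is the numeric signature of a coefficient driven by a handful of dates; a narrow range with no zeros is the signature of a stable one.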
Examining the scatterplot for the month of May, we see that this linear correlation does appear quite consistent across the time period.
After all of this analysis, what are our big takeaways? Can we take our regression model for the month of May and start using it to predict bed demand? Unfortunately, this would be unwise. One month of data is too limited a timeframe for us to be confident in our model. While we see a significant correlation between patients seeking treatment for a cough and inpatient COVID-19 hospitalizations in the month of May, both variables declined over the majority of the timeframe. We would feel significantly more confident in our model if we observed a spike in inpatient hospitalizations preceded by a large number of patients reporting in outpatient settings with a cough, as opposed to the continuous decline. Hopefully, this never happens, but we believe a second wave of COVID-19 infections is very probable by at least next Fall or Winter. Our plan is to continue to update our model with new data, potentially including new data sources such as patient engagement with our Patient Line call center resources or Livi chatbot feature, through the next wave of infections and observe performance before deploying to assist in the management of hospital resources.
Here is a new term for you: Doomscrolling. I am guilty of this, until I become aware of it and have to wrench myself away. It is like a car crash in slow motion, and you want to know how this horror story ends.
Time for more data surfing! UCHealth’s overall visit volume (including in-person and video visits and scheduled phone visits) has recovered to about 80-90% of pre-pandemic levels.
Today, we’re looking at visit volumes among different age groups of patients. Keep in mind, UCHealth is primarily an adult hospital. Our partner, Childrens Hospital of Colorado, sees most of the pediatric population regionally. We do have some pediatric practices, and of course our extensive family medicine primary care practices also see pediatric patients. This will explain the low volume of pediatric visits below. On the other end, only 3.9% of UCHealth patients are over age 85.
So, what happened to visit volume with each of these age groups?
Turns out, the curve for EVERY age group is similar! Green is age 40-65 and about 1/3 (our largest fraction) of our patient population. Fuchsia is 65-85 and our second largest, purple is 18-40, orange is under 18, red is over 85. The curves start at different points, but follow the same trajectories. That divot on the right side is Memorial Day week: clinics were closed for the holiday, so that week shows about 4/5 of the usual weekly volume.
Here is the Home telehealth Video Visit volume! Some interesting findings here. You notice that fuchsia and purple switched places, meaning that a much higher proportion of 18-40 year old patients chose Video Visits compared to 65-85 year old patients. All the other curves stayed in their relative positions. Furthermore, EVERY age group had a proportional bump up in video visits, even those over 85! Finally, the video visit curve is falling back, about to 50% of the peak (so far). It will be interesting to track this in the coming month or 2 and see where we end up, after in-person visits are fully ramped up again.
CMIO’s take? Who knows? Another example to show that we are going to bed with a cliff-hanger every night. I wonder what happens next. The good news: I’m feeling good about having a better handle, even after a few short months, of what Covid-19 can throw at us. Ain’t data cool?
We are well into our fourth month of this pandemic. Looking at our graph, purple shows influenza B peaking in December and influenza A peaking in February. Leaving aside an artifactual spike in mid March, when we started co-testing for major respiratory viruses at the same time we started testing for Covid-19 in earnest, all other viruses have dissipated. Then you see this impressive bump in Covid-19 illness, peaking in mid April, in our organization. Keep in mind, this is just POSITIVE tests for Covid-19 RNA in patients seen at UCHealth. Because we care for 1.9 million patients in Colorado, though, it is a reasonably large population sample. Furthermore, Covid-19 tests were SCARCE prior to mid March, and numerous patients were likely developing Covid symptoms in February (see below).
So, how has this affected our visits and our telehealth efforts? Purple shows you the dramatic dip with in-person outpatient visits, and the gradual climb back toward baseline. Then there is the green line of home telehealth video visits, going from nearly nothing to about 20,000 weekly in early to mid March, with gradual falling off in the past 8 weeks and it seems we might stabilize near 10,000 visits weekly. This is still about 100x the volume of video visits prior to the pandemic.
Then there are the other trend lines that are interesting: Red is the ongoing volume of Patient messages before and during the pandemic. Leaving aside the bump in mid May (not sure why: perhaps related to a system broadcast), our baseline of 22,000 messages per week increased to 30,000, an increase of roughly a third in volume, starting to rise on Feb 22. This pre-dated by THREE WEEKS the steep decline of in-person visits and the upswing of telehealth visits on Mar 14, and the Colorado Stay at Home order of Mar 26.
Even more interesting: telephone volume in blue, saw a tiny bump on Mar 14, but then was unchanged during the entire period. By contrast, in fuchsia Scheduled telephone visits (billable as of mid March per CMS rules), appeared in early April.
In one graph, you can see: online patient messaging demand scaling up, phone calls being static, scheduled phone calls appearing when billable, on top of the change for in-person and video visits.
Some hidden factors at work here: UCHealth set up a Covid-19 nurse advice line; those calls are not visible on any line in this graph, and those hard-working nurses took tens of thousands of calls from Coloradoans (not just UCHealth patients).
So, this data dilettante has to ask, could an increase in online patient messaging (regardless of content of message) be another possible leading indicator for future pandemic surges? We can’t be sure if these messages were about general anxiety, Covid symptoms, or perhaps completely unrelated, but it is suspicious that there is a sustained increase in volume of messages by 30%+ since mid-March. On the other hand, why isn’t online message volume falling, like home telehealth visits are falling, now that clinics are opening up in-person appointments? Stay tuned!
The open question now is: what will CMS (Centers for Medicare/Medicaid Services) do with paying for Video visits and scheduled Telephone visits? Will those payments stop or scale back? This will certainly affect all health systems still heavily relying on Fee for Service, until the rise of Value Based Care (insurance plans paying for Quality instead of Volume) takes over.
CMIO’s take? These are unprecedented times, and patient behavior and health system behavior is fascinating. A tiny RNA virus has changed the way (phone, online, in-person) patients and healthcare providers interact. What comes next?
Which is an entirely unreasonably long list; there are some great selections there. I’ll leave you to browse.
During the pandemic, I’ve been learning clawhammer style, from this guy:
Makes my uke sound more like a banjo. Weird, and cool.
Meantime: Our clinics are getting back to business; our patients are returning to in-person care, our visit volumes are back up, past the 80% mark. I hope you are all staying safe; we’re not out of this yet, but it is starting to feel less like a sprint and more like a marathon. Take care of yourself, get some exercise, bring back a hobby or two.
Thanks to those of you who caught my non-displaying graph images, I’m reposting now converting my original PNG to JPG. Please let me know if you can see these and follow the reasoning below! (edited 6/15, CTL)
Thanks to Brendan Drew, one of our data scientists, who is diving into the analysis of Leading Indicators, for the graphs and reasoning below. If I can twist his arm for more graphs, will pass them along.
If you recall, I discussed this recently: the idea that our future is uncertain. Even though we have survived the first wave of the Covid-19 pandemic, we are concerned about possible future waves. How might we prepare?
If you don’t know this about me already, I find “making the sausage” in informatics and data science fascinating. Here are some intermediate steps we are taking beyond my “data dilettante” days as we search for signal in the noise.
These are all new COVID-19 codes. Firstly, note that the ORANGE line, R68.89, shows up WAY before March. Turns out, this is not only “suspected Covid-19”; it is also “Other Symptoms and Signs”, previously in the ICD10 dictionary. So, that is a terrible signal. Then, RED line Z20.828 “Close exposure to COVID-19” is also “Exposure to influenza”. Hmm. Then, BLUE line B34.2 “Coronavirus Infection” is also “Coronavirus, unspecified.” Also Hmm. Only GREEN line U07.1 “Coronavirus identified” is highly specific for COVID-19 in the graph.
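One way to act on that observation is to count only the codes we trust. Here is a minimal sketch, using the code descriptions from this post and a handful of invented encounter records; the record layout and the function name are hypothetical, not how our Epic data is actually structured.

```python
from collections import Counter

# Notes paraphrased from this post; only U07.1 is described as highly specific.
CODE_NOTES = {
    "R68.89": "suspected COVID-19, shared with 'Other Symptoms and Signs'",
    "Z20.828": "close exposure to COVID-19, shared with 'Exposure to influenza'",
    "B34.2": "coronavirus infection, shared with 'Coronavirus, unspecified'",
    "U07.1": "coronavirus identified",
}
SPECIFIC_CODES = {"U07.1"}

def daily_specific_counts(encounters):
    """Count encounters per day, keeping only codes treated as COVID-specific."""
    counts = Counter()
    for day, code in encounters:
        if code in SPECIFIC_CODES:
            counts[day] += 1
    return dict(counts)

# Hypothetical (day, ICD-10 code) pairs
encounters = [
    ("2020-01-15", "R68.89"),
    ("2020-04-01", "U07.1"),
    ("2020-04-01", "B34.2"),
    ("2020-04-01", "U07.1"),
    ("2020-04-02", "Z20.828"),
]
print(daily_specific_counts(encounters))
```

The January R68.89 record drops out, which is the point: a code that predates the pandemic cannot be a clean COVID-19 signal.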
So, how do we make sense of this?
First, we take ONLY hospital patient codes for CONFIRMED (BLUE) versus SUSPECTED Covid (ORANGE), and we see that the BLUE CONFIRMED line shows two peaks, whereas the ORANGE SUSPECTED line shows no real signal at all. GREEN is adjusted for Market Share based on 2019 data for that zip code (we are trying to localize prediction to the Zip code level).
Now, we compare zip codes. Blue line is 80011, Aurora near University of Colorado Hospital, a relative hot spot in Denver Metro region, and orange is 80634, the hot spot near Greeley hospital. We see a temporal difference: the onset and peak in Greeley come earlier than in Aurora. Interesting.
Here is where it gets tantalizing, and we have to hold back our excitement: Pair up the outpatient symptom data with the inpatient hospitalization rate for Confirmed Covid. Here it is for Aurora, x-axis lined up by date:
Those of us who cannot contain our excitement will see a visual rise in RED (outpatient symptoms suspicious of COVID, like fever, cough, shortness of breath), in the 80011 zip code increasing about 2 weeks BEFORE the corresponding rise in COVID-19 cases at University of Colorado Hospital in Aurora (also 80011). We WIN! Right?
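Rather than eyeballing the lead time, one could slide the symptom curve against the hospitalization curve and ask which shift lines them up best. The sketch below does that with a plain Pearson correlation at each lag. The two series are synthetic bumps built to be exactly 14 days apart, so this only demonstrates the mechanics; it says nothing about what the real 80011 data would show, and `best_lead` is a made-up name.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def best_lead(symptoms, admissions, max_lag=21):
    """Correlate symptoms on day t with admissions on day t + lag, for each lag."""
    scores = {}
    for lag in range(max_lag + 1):
        a = symptoms[:len(symptoms) - lag]
        b = admissions[lag:]
        scores[lag] = pearson(a, b)
    return max(scores, key=scores.get), scores

# Synthetic curves: the admissions bump is the symptom bump shifted 14 days later.
symptoms = [math.exp(-((t - 30) / 6.0) ** 2) for t in range(90)]
admissions = [math.exp(-((t - 14 - 30) / 6.0) ** 2) for t in range(90)]

lead, scores = best_lead(symptoms, admissions)
print("best lead in days:", lead)
```

On real data the peak in correlation is usually much flatter and noisier than this, which is part of why a visual "2 weeks BEFORE" should be treated with caution.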
Also, here’s the corresponding graph for Greeley:
This is a bit messier: what is that symptom peak in February? There is no corresponding COVID hospitalization peak in Feb/Mar. BUT, the symptom peak in mid March DOES correspond to a rise and peak in late March, and all of April.
My theory: mid February was probably Influenza A, and we did NOT track hospitalizations on our graph for that, AND the COVID confirmed codes did not get implemented until mid March, and maybe NOT attached in retrospect to patients who MIGHT have had COVID, but were admitted BEFORE those codes went into effect. This is harder than it looks!
Are you looking for a final answer? SORRY! We are still cranking away at this. Even though we humans have frontal lobes that CANNOT WAIT to see patterns (even where there is no pattern!), we have to resist that urge. AND, how do you teach an algorithm (even if there IS a pattern here) to tell us: YES you should pay attention to THIS rise in the data, but THAT ONE is just random noise?
For example, imagine the 80011 graph prints out one day at a time, moving to the right. At what point would you tell the algorithm to alert us: YES it is TIME TO BRING IN MORE DOCS AND STAFF FOR THE NEXT SURGE?
Would it be: March 15, when there is an uptick? But there are lots of upticks just like that. March 22, a week later, when the line has DOUBLED from its average, from 0.0007 to 0.0014?
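To make the dilemma concrete, here is one possible alert rule, written out so you can see how many arbitrary choices it hides. It compares each day to the median of an earlier two-week window and only fires after several consecutive days at double that baseline. Every threshold here (14-day baseline, 7-day gap, 2x ratio, 3 days of persistence) is an assumption chosen for illustration, not something we have validated, and the rates are made up.

```python
from statistics import median

def first_alert(rates, baseline_days=14, gap=7, ratio=2.0, persist=3):
    """Index of the first day that completes `persist` straight days at or above
    `ratio` times the median of an earlier baseline window; None if never."""
    run = 0
    for t in range(baseline_days + gap, len(rates)):
        # Baseline ends `gap` days before today so a fresh surge cannot inflate it
        baseline = median(rates[t - gap - baseline_days:t - gap])
        if baseline > 0 and rates[t] >= ratio * baseline:
            run += 1
            if run >= persist:
                return t
        else:
            run = 0  # an isolated uptick resets the count
    return None

# Made-up daily rates: one isolated uptick on day 25, then a sustained rise from day 35
rates = [0.0007] * 25 + [0.0015] + [0.0007] * 9 + [0.0015] * 10
print(first_alert(rates))
```

The median baseline and the persistence requirement are what let the rule ignore a one-day uptick, but they also delay the alarm by a few days, and that trade-off is exactly the judgment call described above.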
AND, worse yet, UCHealth is only one of 5 health systems in Metro Denver and across the state of Colorado. Will cases come to US or to other health systems? What will the peak be? Will it be a tiny peak? (Hey, CT, why did you call all of us in here for these dozen patients?) Will it be a HUGE peak (Hey, CT, you didn’t raise enough of an alarm, there still aren’t enough of us).
Finally, signal to noise MIGHT be easier for the summer months when Influenza is done, but what about the fall when Influenza B and many other viruses are back in action? What about seasonal allergies during spring and summer that might kick off cough and shortness of breath?
CMIO’s take? Figuring out Leading Indicators is HARD. If YOU have this figured out, let us know. We’re still working on it. But the math and the figuring-it-out is pretty fascinating in the meantime.