Words of Caution:
First let say the following things
clearly and for the record:
- I am not an epidemiologist although I have had some familiarity with chemical kinetics equations, which are very similar to the epidemiology equations for disease spread,
- I do not warranty any of what follows,
- I strongly urge everyone to continue to follow the guidance of the US CDC and all guidelines and orders of Local, State, and Federal authorities, and
- I to specifically urge that you NOT use what appears here to make personal decisions.
Background:
During this time while I am stuck at home (i.e., In-the Box), I decided to occupy some of my time by trying my hand at modeling the current Coronavirus (CoViD-19) outbreak here in the US. After finding a free FORTRAN compiler, I started programming the equations that describe the standard epidemiology equations for disease spread. For simplicity I chose to use the discrete version rather than the differential forms. I have described the Model (both the equations and how I chose various parameters) at the end of this post, for those who care about the details.
Data:
The data used here are those available on two different websites for the number of CoViD-19 Cases and Deaths for the United States. The data labeled Worldometer Data is from the website here and the data labeled CDC Data is from the website here. The Worldometer website has interactive plots of both the cases and deaths, and the data I have here was extracted from them by hovering over the various points and recording the data. The CDC website has the case data available in a table that can be scrolled, but I have seen no historical data for the deaths, and as I have been recording those data only since 5 days ago I have not shown them on the plot, nor adjusted any of the parameters to try to match them.
Results:
Below are the current (April 6, 2020) plots for the available data and the model fits. I am satisfied that these are in sufficient agreement with the past that I am willing to say a few more things.
Confirmed Cases vs. Days since the 100th Confirmed Case for the US.
Deaths vs. Days since the 100th Confirmed Case for the US.
This model suggests that:
- The stay-at-home orders and the social distancing being practiced across the country are having a clear and downward pressure on the cases and will also show a reduction in the deaths as well.
- There is some indication that there is some small positive influence on disease outcome (people not dying!) from whatever treatments are being provided.
- The number of infectious cases is still very high and these practices must be maintained for the foreseeable future if we are to clear the virus from our population.
- The maximum number of new cases (i.e., the number of people who get infected on a specific day, and hence are now infectious) is likely behind us (model says it occurred on or about March 28).
- The maximum number of infectious people on any given day is also likely to be behind us (model says this happened on or about April 2 or 3).
- The maximum number of newly Confirmed Cases on any given day is likely to have happened, be happening, or about to happen (model shows it somewhere in the span April 5, 6, or 7).
- The maximum number of new deaths is likely to still be in the future (somewhere around April 8 - 11).
- The last new case is likely to be infected on or about May 22nd (!).
- The last death may occur on or about the same day and will represent something like the 24,000th death.
- The last infectious person will clear quarantine some days after that (10-24 days are typical recovery times I've seen quoted, so that works out to June 1-15).
The bottom line I take from the model
is that this epidemic is no where near done, although we may start
seeing some positive news soon, possibly this week. Also, when
things start to look really good we must all take a deep breath and
continue the stay-at-home and social distancing for at least 2 weeks
more than we will want - or seem to need. This will be critical in assuring that the
last few cases find no new victims.
An additional result is that the model
predicts about 82 million people will have had the disease and
recovered. This will be interesting to compare to any retrospective
antibody testing that are done on the population of the US.
Finally, I suspect that the model will
have to be adjusted to reduce the drawdown factors for both cases and
deaths, which will likely move all the dates which are in the future
farther out (and will result in higher figures for both the number of
people that recover and, unfortunately, the number that die. On the
other hand, if these turn out (miraculously) to be correct, remember
that you heard it here first.
The Model:
The variables:
J = day number
U(J) = number of uninfected people on
day J
N(J) = number of newly infected people
on day J
I(J) = number of infectious people on
day J
C(J) = the number of people who are
confirmed to have the disease on day J
TC(J) = the total number of confirmed
cases through day J
D(J) = the number of people that die on
day J
TD(J) = the total number of deaths
through day J
AR(J) = the number of asymptomatic
cases that recover on day J
CR(J) = the number of confirmed
(=symptomatic) cases that recover on day J
R(J) = the total number of people who recover on
day J
TR(J) = the total number of people who
have recovered through day J
Assumptions:
- There are cases of CoViD-19 which can be either symptomatic or asymptomatic. For the purposes of simplicity the symptomatic people will all be assumed to be confirmed, and the asymptomatic cases will be assumed to not be confirmed.
- People who are asymptomatic are assumed to be infectious for 10 days(they recover on the 11th day).
- People who are symptomatic (and hence, confirmed) are assumed to be infectious for only 7 days as they are assumed to become confirmed on day 7. All confirmed cases are assumed to either be in the hospital or in quarantine and, therefore, in both cases it is assumed that the conditions do not allow them to infect anyone.
- Confirmed cases all end up either recovering on the 24th day from their initial infection or dying on the 11th day after infection (originally I assumed this would be around day 15, but to get a reasonable fit I had to change it). [Note: These lag times have no impact on the model beyond which day they show up in the totals, as they were removed from the infectious pool when they were confirmed.]
The Equations:
[Note: the
differential equations are available by searching for the SIR model]
The Base Model
J = J + 1
N(i) = k1 * U(i-1) * I(i-1)
U(i) = U(i-1) – N(i)
C(i+7) = k2 * N(i)
TC(i) = TC(i-1) + C(i)
D(i+10) = k3 * C(i)
TD(i) = TD(i-1) + D(i)
AR(i) = N(i-10) – C(i-3)
I(i) = I(i-1) + N(i) – AR(i) –
C(i)
CR(i) = C(i-24) – D(i)
R(i) = AR(i) + CR(i)
TR(i) = TR(i-1) + R(i)
Note 1: all these equations use
integers as inputs and outputs. The math is done as real numbers and
any non-integer part is included or excluded, as an extra person,
based on a random number.
Note 2: In fact the calculation of N(i)
is the sum of ten terms of this form, which actually use N(i-10)
through N( i-1) and C(i-3) through C(i-1), vice I(i-1), as will
become clear in the next section.
This base model basically results in
essentially everyone in the model getting infected. So for the US
total population of 330 million the death total comes out at slightly less than 200,000 deaths. This however does NOT fit the data, after about the
first couple weeks. It just goes on up exponentially until U goes to
essentially zero (actually 69).
Input Parameters/Information:
The initial condition is that U(0) =
330,000,000 and all others are 0. Then N(1) is set to 1, and off the
model goes.
k1 is not just one number, but rather
it is actually an array of ten multipliers based on which day it is
from the the day a person gets infected. The array varies such that
the actual infectious-ness of an individual is such that: for each of
the 3 people they infect, they will infect (on average) 1/2 person
total over days 1-3, 1/2 person over days 8-10, and the remaining 2
people are infected over days 4-7 (the numbers being the number days
after they were infected). This is a crude approximation of the
data given under the heading “'Characteristic' Infection
progression in a single patient” here.
k2 = 90. / 3300. This based on the only
study I have heard of that may properly define this ratio. This was
data reported for the city of Vo, Italy. It was reported in the Wall
Street Journal, WSJ. I saw it where that article was being discussed
here). Hat tip: KT
k3 = 8% From the same WSJ article via
the same blog post. Again Hat Tip: KT
Adjustments to the Base Model
These adjustment have been implemented
to be able to match the actual number of cases and actual number of
deaths over the current data set.
Adjustment 1: As can be seen in the
plot for the number of cases, the initial exponential growth (the
straight line in this log plot) begins to roll over on or about the
17th day (this is not clear in the graphs shown but if you
look closely, on a greatly expanded scale, you can make it out).
This date works out to March 20. This would seem to match fairly
well with the onset of stay at home orders (for example NY Governor
Cuomo on March 20 and CA Governor Newsom on March 19). The actual
adjustment that gives an curvature rate that matches the data is to
reduce the daily infection rate by a factor of 0.885^(i-16).
Adjustment 2: The actual death rate
did not match using the crude figure of 8%. It required a reduction
to 7.6%. This can easily be accounted for by any of several factors including: better more complete testing in the US vice what Italy had
accomplished, a healthier population (Italy has a significantly
higher rate of smoking), or some other minor factor (like simple
rounding in the value of 8%).
Adjustment 3: Much as the number of
cases was adjusted it became clear that matching the number of deaths
requires an additional similar factor. The best match (by eyeball)
uses it starting on day 23 (real world = March 26) and a factor of
0.96^(i-22). It is not clear what might case this, but I suspect that it
is a reflection of some actual increase in the ability to treat
patients.
Addendum:
This post, and the data and model results it is based on are through April 06 (actual data end April 05). Today, April 07, the data from both Worldometer and CDC show a significant spike upwards in both cases and deaths on April 06. The cause for this is unclear. I will not make any additional model changes for several more days.
1 comment:
Brilliant.
Post a Comment