Tuesday, April 7, 2020

CoViD-19 – Some “In-the-Box” Thinking


Words of Caution:

First let say the following things clearly and for the record:
  1. I am not an epidemiologist although I have had some familiarity with chemical kinetics equations, which are very similar to the epidemiology equations for disease spread,
  2. I do not warranty any of what follows,
  3. I strongly urge everyone to continue to follow the guidance of the US CDC and all guidelines and orders of Local, State, and Federal authorities, and
  4. I to specifically urge that you NOT use what appears here to make personal decisions.

Background:


During this time while I am stuck at home (i.e., In-the Box), I decided to occupy some of my time by trying my hand at modeling the current Coronavirus (CoViD-19) outbreak here in the US. After finding a free FORTRAN compiler, I started programming the equations that describe the standard epidemiology equations for disease spread. For simplicity I chose to use the discrete version rather than the differential forms. I have described the Model (both the equations and how I chose various parameters) at the end of this post, for those who care about the details.

Data:


The data used here are those available on two different websites for the number of CoViD-19 Cases and Deaths for the United States. The data labeled Worldometer Data is from the website here and the data labeled CDC Data is from the website here. The Worldometer website has interactive plots of both the cases and deaths, and the data I have here was extracted from them by hovering over the various points and recording the data. The CDC website has the case data available in a table that can be scrolled, but I have seen no historical data for the deaths, and as I have been recording those data only since 5 days ago I have not shown them on the plot, nor adjusted any of the parameters to try to match them.

Results:


Below are the current (April 6, 2020) plots for the available data and the model fits. I am satisfied that these are in sufficient agreement with the past that I am willing to say a few more things.

Confirmed Cases vs. Days since the 100th Confirmed Case for the US.

Deaths vs. Days since the 100th Confirmed Case for the US.

This model suggests that: 
  1. The stay-at-home orders and the social distancing being practiced across the country are having a clear and downward pressure on the cases and will also show a reduction in the deaths as well.
  2. There is some indication that there is some small positive influence on disease outcome (people not dying!) from whatever treatments are being provided.
  3. The number of infectious cases is still very high and these practices must be maintained for the foreseeable future if we are to clear the virus from our population.
  4. The maximum number of new cases (i.e., the number of people who get infected on a specific day, and hence are now infectious) is likely behind us (model says it occurred on or about March 28).
  5. The maximum number of infectious people on any given day is also likely to be behind us (model says this happened on or about April 2 or 3).
  6. The maximum number of newly Confirmed Cases on any given day is likely to have happened, be happening, or about to happen (model shows it somewhere in the span April 5, 6, or 7).
  7. The maximum number of new deaths is likely to still be in the future (somewhere around April 8 - 11).
  8. The last new case is likely to be infected on or about May 22nd (!).
  9. The last death may occur on or about the same day and will represent something like the 24,000th death.
  10. The last infectious person will clear quarantine some days after that (10-24 days are typical recovery times I've seen quoted, so that works out to June 1-15).
The bottom line I take from the model is that this epidemic is no where near done, although we may start seeing some positive news soon, possibly this week. Also, when things start to look really good we must all take a deep breath and continue the stay-at-home and social distancing for at least 2 weeks more than we will want - or seem to need. This will be critical in assuring that the last few cases find no new victims.

An additional result is that the model predicts about 82 million people will have had the disease and recovered. This will be interesting to compare to any retrospective antibody testing that are done on the population of the US.

Finally, I suspect that the model will have to be adjusted to reduce the drawdown factors for both cases and deaths, which will likely move all the dates which are in the future farther out (and will result in higher figures for both the number of people that recover and, unfortunately, the number that die. On the other hand, if these turn out (miraculously) to be correct, remember that you heard it here first.

The Model:

The variables:


J = day number
U(J) = number of uninfected people on day J
N(J) = number of newly infected people on day J
I(J) = number of infectious people on day J
C(J) = the number of people who are confirmed to have the disease on day J
TC(J) = the total number of confirmed cases through day J
D(J) = the number of people that die on day J
TD(J) = the total number of deaths through day J
AR(J) = the number of asymptomatic cases that recover on day J
CR(J) = the number of confirmed (=symptomatic) cases that recover on day J
R(J) = the total number of people who recover on day J
TR(J) = the total number of people who have recovered through day J

Assumptions:

  1. There are cases of CoViD-19 which can be either symptomatic or asymptomatic. For the purposes of simplicity the symptomatic people will all be assumed to be confirmed, and the asymptomatic cases will be assumed to not be confirmed.
  2. People who are asymptomatic are assumed to be infectious for 10 days(they recover on the 11th day).
  3. People who are symptomatic (and hence, confirmed) are assumed to be infectious for only 7 days as they are assumed to become confirmed on day 7.  All confirmed cases are assumed to either be in the hospital or in quarantine and, therefore, in both cases it is assumed that the conditions do not allow them to infect anyone.
  4. Confirmed cases all end up either recovering on the 24th day from their initial infection or dying on the 11th day after infection  (originally I assumed this would be around day 15, but to get a reasonable fit I had to change it). [Note: These lag times have no impact on the model  beyond which day they show up in the totals, as they were removed from the infectious pool when they were confirmed.]

The Equations: 

[Note: the differential equations are available by searching for the SIR model]

The Base Model



J = J + 1
N(i) = k1 * U(i-1) * I(i-1)
U(i) = U(i-1) – N(i)
C(i+7) = k2 * N(i)
TC(i) = TC(i-1) + C(i)
D(i+10) = k3 * C(i)
TD(i) = TD(i-1) + D(i)
AR(i) = N(i-10) – C(i-3)
I(i) = I(i-1) + N(i) – AR(i) – C(i)
CR(i) = C(i-24) – D(i)
R(i) = AR(i) + CR(i)
TR(i) = TR(i-1) + R(i)

Note 1: all these equations use integers as inputs and outputs. The math is done as real numbers and any non-integer part is included or excluded, as an extra person, based on a random number.
Note 2: In fact the calculation of N(i) is the sum of ten terms of this form, which actually use N(i-10) through N( i-1) and C(i-3) through C(i-1), vice I(i-1), as will become clear in the next section.

This base model basically results in essentially everyone in the model getting infected. So for the US total population of 330 million the death total comes out at slightly less than 200,000 deaths. This however does NOT fit the data, after about the first couple weeks. It just goes on up exponentially until U goes to essentially zero (actually 69).

Input Parameters/Information:


The initial condition is that U(0) = 330,000,000 and all others are 0. Then N(1) is set to 1, and off the model goes.

k1 is not just one number, but rather it is actually an array of ten multipliers based on which day it is from the the day a person gets infected. The array varies such that the actual infectious-ness of an individual is such that: for each of the 3 people they infect, they will infect (on average) 1/2 person total over days 1-3, 1/2 person over days 8-10, and the remaining 2 people are infected over days 4-7 (the numbers being the number days after they were infected). This is a crude approximation of the data given under the heading “'Characteristic' Infection progression in a single patient” here.

k2 = 90. / 3300. This based on the only study I have heard of that may properly define this ratio. This was data reported for the city of Vo, Italy. It was reported in the Wall Street Journal, WSJ. I saw it where that article was being discussed here). Hat tip: KT

k3 = 8% From the same WSJ article via the same blog post. Again Hat Tip: KT

Adjustments to the Base Model

These adjustment have been implemented to be able to match the actual number of cases and actual number of deaths over the current data set.

Adjustment 1: As can be seen in the plot for the number of cases, the initial exponential growth (the straight line in this log plot) begins to roll over on or about the 17th day (this is not clear in the graphs shown but if you look closely, on a greatly expanded scale, you can make it out). This date works out to March 20. This would seem to match fairly well with the onset of stay at home orders (for example NY Governor Cuomo on March 20 and CA Governor Newsom on March 19). The actual adjustment that gives an curvature rate that matches the data is to reduce the daily infection rate by a factor of 0.885^(i-16).

Adjustment 2: The actual death rate did not match using the crude figure of 8%. It required a reduction to 7.6%. This can easily be accounted for by any of several factors including: better more complete testing in the US vice what Italy had accomplished, a healthier population (Italy has a significantly higher rate of smoking), or some other minor factor (like simple rounding in the value of 8%).

Adjustment 3: Much as the number of cases was adjusted it became clear that matching the number of deaths requires an additional similar factor. The best match (by eyeball) uses it starting on day 23 (real world = March 26) and a factor of 0.96^(i-22). It is not clear what might case this, but I suspect that it is a reflection of some actual increase in the ability to treat patients.

Addendum:


This post, and the data and model results it is based on are through April 06 (actual data end April 05). Today, April 07, the data from both Worldometer and CDC show a significant spike upwards in both cases and deaths on April 06. The cause for this is unclear. I will not make any additional model changes for several more days.