Sunday, April 26, 2020

Will They Ever Let Us Reopen the Country?

Let me start with the answer: I have grave doubts that any of the blue governors will reopen their states in the next two months.  I truly hope that this is not due to them wanting the economy to completely collapse (and thereby hurting the president's chances for re-election), but I do not believe they are above such heartless calculations (and who cares how many people that might kill, they would still blame the president).  But now, with the political machinations covered, let's talk about how they can (will?) rationalize such policies.

First,  there are two conditions 'needed to reopen', a 14 day period of a downward trend in the confirmed cases and sufficient testing to be able to track and test.  I believe these two conditions are diametrically opposed.  The reason I believe this is that with increasing testing comes increasing positive tests (you see what you measure).  Can we demonstrate this?   Well let's see...

Consider the data for confirmed cases (= positive tests) vs day diagnosed for San Diego County.  The plot (available here) shows that after two weeks of cases below 100/day (April 5-April 19), we have suddenly seen a return to numbers over 100 (and the two biggest single day counts of the entire period).  Is this due to an increase in the spread of the virus or is there another reason behind the counts increasing?  Well, the county has helpfully also included a plot of the number of tests reported by date (available here).  Notice that the number of tests reported is also higher of late.  Someone who wants to see if the case increase is simply due to the test increase merely needs to either 1) plot the number of positive cases/day vs the number of tests reported/day and see if the numbers are correlated or 2) plot the rate of positive cases by day and see if it appears uncorrelated with the date.  The two plots below are what you get for these.  (Note that the data prior to March 21 show a markedly higher rate of positive cases, much higher than the remainder of the data, and a particularly small number of tests.  So as to not obscure the relationship, I have shown only the data from March 21 onward.)

  

The plot on the left is the number of positive tests vs the total tests, while the plot on the right is the fraction of positive tests by date.

To me the plot on the left shows a fairly strong positive correlation and the plot on the right looks like noise but with some non-zero average value.  The slope on the left is 0.623 (note this is not the standard least squares line, but rather is the line you get by a least squares fit where the intercept is forced to the origin - zero tests must yield zero positive results).  The average value for the fraction of positive cases is 0.069 +/- 0.001.  These are in sufficient agreement that I claim that the number of positive cases reported is strongly influenced by the total number of tests (and will tend to about 1/15th of the total tests).  This means that as the testing increases we will see an increasing number of positive tests.  This is unavoidable!  Hence, we will never meet both criteria.
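For anyone who wants to repeat the two calculations with their own county's numbers, here is a minimal Python sketch of a least squares fit forced through the origin, plus the average positive fraction.  The daily counts below are made-up placeholders (not the San Diego County data), and the function names are my own.

```python
def slope_through_origin(tests, positives):
    """Least-squares slope for positives = slope * tests (intercept fixed at zero)."""
    sum_xy = sum(x * y for x, y in zip(tests, positives))
    sum_xx = sum(x * x for x in tests)
    return sum_xy / sum_xx


def mean_positive_fraction(tests, positives):
    """Unweighted average of the daily positive fraction."""
    fractions = [y / x for x, y in zip(tests, positives) if x > 0]
    return sum(fractions) / len(fractions)


# Hypothetical daily counts, for illustration only.
tests_per_day = [800, 1200, 1500, 2100, 2600]
positives_per_day = [60, 75, 110, 140, 185]

slope = slope_through_origin(tests_per_day, positives_per_day)
fraction = mean_positive_fraction(tests_per_day, positives_per_day)
print(f"slope through origin: {slope:.4f}")
print(f"mean positive fraction: {fraction:.4f}")
```

If the positive counts simply track the test counts, the two printed numbers should come out close to each other.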

If it will, in fact, be nearly impossible to see decreasing cases with increasing testing, we should ask what will happen if the country won't reopen until we see a decrease (presumably from actually having the virus go through the population).  If the data above is an accurate reflection of reality (vice the far worse possibility that the tests have a false positive rate of 6%), then at any one time we have steadily had about 1/15th of the population infected with active virus.  If this is the stable rate in the locked down economy, and if the infections tend to last 10 days, then it will take something like 90 days to run through enough of the population (~50%) to really see the cases start to wane, and something like twice that to be starved out of possible hosts.  We simply cannot maintain this lock down that long.

I see no reasonable path forward, that is based on actual current data, other than to begin a prompt reopening of the majority of the economy as per my post of April 19 and this much better written description of the same thing by someone who actually has credentials.

Aside #1: There was also a statement by that waste of oxygen, also known as the WHO, that it may not be the case that getting the virus and then recovering would provide immunity from future infection.  Well if that's the case, the lock down is a waste of time and we are all truly f****d.  If it is just another case of them trying to make everything look worse, then all the more reason to confine the WHO to the trash heap of history, where it belongs.

Aside #2: Speaking of just "trying to make things worse", I also wonder if the way the CDC keeps redefining how to count cases (and deaths) will also make the cases continue to rise.  The reason I could see the CDC doing this is that they want to make things look as bad as possible (just like the WHO), in order to make their early models less inaccurate.  They need something like that to happen or no one will take them seriously on future predictions.  I hope that isn't the case, but I have to look at all possibilities.

Sunday, April 19, 2020

Reopening the Country - Some Musings on How I Think It Should Be Done and Why.


It is becoming increasingly clear that all over the country people are getting antsy about getting back to something at least approximating their normal life and freedoms.  At the same time, the government has caused us to pay a huge price in those freedoms and in the economy at all levels.  These sacrifices have allowed us to get to where we are now vis a vis the CoViD-19 pandemic (past the peak and seeing general reductions across the board, having never overtaxed the hospitals - with the probable exception of a short time in New York City, NYC).  Every time I run my model, the cases and the deaths jump like crazy when the restrictions are removed.  It seems obvious to me that to simply walk away from them would be irresponsible and foolhardy.

However, every plan I've heard so far is based on being able to be reactive to the changes that come about as we reopen.  This is unreasonable for several reasons: 1) we do not have the testing capabilities to determine who really has the disease, or who has had the disease, 2) we know that the previous outbreak was circulating for an extensive period of time prior to it becoming obvious, thus making any reactive plan doomed to be too little, too late, and 3) doing lots of little steps would just spread the pain out over a longer time.

No, reactive plans are not the right approach.  We need a path where we can confidently calculate the risks and the outcome.  In other words, we need to be able to predict the outcome, not sit back and try to respond to it.  Happily, there are several recent reports that finally give us the data we need to be able to predict with a reasonable safety margin.

First, there have been several recent (and I use the next word loosely) studies that give us a reasonable handle on the true number of cases that have already happened.  There are (to my limited knowledge) at least five results that allow us to determine a number for the ratio of the total cases to the confirmed cases, R_cases.  The first is mentioned in my blog of April 7.  In this event, the entire town of Vo, Italy was tested for active virus.  Some simple math led to the conclusion that R_cases(Vo) ~ 130.  Next came a report from Chicago (see my blog from the 14th).  Here a drive-through testing site tested for the presence of antibodies.  They reported that 30-50% of the people tested showed such antibodies.  In this case the math says that R_cases(Chicago) is in the range 110-190.  The third was from the hotbed of American CoViD-19 cases, none other than NYC.  The R_cases(NYC) comes out to 33 (for details see my blog from the 16th).  Finally, over the last couple of days there have been reports from Santa Clara County, CA and Chelsea, MA (the reports are here and here, respectively).  For these you can easily extract the ratios as R_cases(SCC) ~ 50-80 and R_cases(CMA) ~ 16.  Now there isn't any safe/practical/justifiable way to conflate these into one number, but I think a safe estimate is around 40.  All this implies very strongly that we are much farther on the way to developing herd immunity than anyone would have guessed just a few weeks ago, but how far?

It is quite easy to find data on the web that allows one to compute the relative infection rates for various age groups (in this case see this CDC page).  My calculation says that the infection rate (based on the number of confirmed cases) for those aged 0-18 is less than 0.02%, about 0.15% for those 18-45, 0.22% for those 45-65, 0.18% for 65-75 and 0.26% for those 75 and older (this is for a case load of about 497,000 confirmed cases - which was about April 10).   If we multiply these by 40, the actual infection rates can be estimated at 0.8%, 6%, 8.8%, 7.2%, and 10.4%, and these could easily be anything from a factor of 2 lower, to a factor of 4 higher.  Also the confirmed cases are now more like 700,000, so that's another factor of 1.4.  Nonetheless, we do not yet approach anything like herd immunity (except at the highest of the possible factors, and even then we'd be at something like 50%).
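The scaling in the paragraph above is simple enough to check in a few lines of Python.  This is only a sketch of the arithmetic: the rates and the factor of 40 are the ones quoted above, while the variable names and the age-group labels are my own.

```python
# Confirmed-case infection rates (percent) by age group, as quoted above.
confirmed_rate_pct = {
    "0-18": 0.02,   # quoted as "less than 0.02%", so this is an upper bound
    "18-45": 0.15,
    "45-65": 0.22,
    "65-75": 0.18,
    "75+": 0.26,
}

R_CASES = 40  # assumed ratio of total cases to confirmed cases

# Estimated actual infection rate = confirmed rate times R_cases.
estimated_rate_pct = {age: rate * R_CASES for age, rate in confirmed_rate_pct.items()}

# Growth in confirmed cases since the rates were computed (about 497,000 to 700,000).
growth_factor = 700_000 / 497_000

for age, rate in estimated_rate_pct.items():
    print(f"{age:>6}: {rate:5.1f}%  (about {rate * growth_factor:5.1f}% at 700,000 cases)")
```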

Second, the data now clearly tell us who can, with a relative safety risk, allow themselves to be exposed, and who should not.  All the data point very strongly to the disease being hardest on the elderly [I hate that that word applies to me... I really thought that getting old would take longer...] and those with significant underlying health issues, and being extremely dangerous for those who fall into both categories.  The relative death rates per 100,000 people in the same age groups as above are less than 1, 11.3, 96.7, 311.7, and 778.8, respectively (from the NYC Health Dept CoViD-19 data webpage).  Based on some CDC numbers (which produce similar rate numbers) I can estimate that the rates for a breakdown of the 45-65 group into 45-55 and 55-65 would be something like 58 and 135.

So how can we start to reopen as soon as possible?  Reopen slowly, with care, but only for age groups below 55 and only for those without substantial risk factors (see this CDC page, and links thereon).  These groups must do what they can to keep the spread rate below about half of what it was (still practicing social distancing and extra hygiene).  This allows us to get a large fraction of the population back to work with minimal risk.  Even in a worst case (if there is less than 10% of the population currently exposed/immune and the spread rate is very high) the total number of additional deaths would almost certainly remain at or below what we have currently seen and the number of people stressing the health systems would likewise stay below the capacity of the system (the intrinsic death rates for these groups are more than 10 times less than for the most at risk age group, which completely dominates the current statistics, and if those with underlying conditions are removed from the 45-55 age group the death rate is likely to be well below the 58 quoted above).  The big risk in this plan is that folks who have been 'reopened' may not keep a hard safe distance between themselves and those who are not reopened (and are much more at risk).  This is a real risk.

But, that's my plan. We would really want to emphasize over and over and over how important it is to not go see those who are still in isolation. We'd want to test as much as possible, and contact trace as well, but it really looks to be a plan that would result in less risk than a general reopening, and would have much less risk of really dire consequences. We'd definitely want to have grocery stores, etc, have special hours for those folks not in the reopened groups (and those folks would probably want to have as much as possible delivered or to arrange to pick it up).  People still won't get to go hug grandma and grandpa until a vaccine has been widely distributed, but such is life.

Thursday, April 16, 2020

CoViD-19: Not Much More Real Data, But Enough to Figure With...

So I'm sure you've seen the report about women checking in to a maternity ward in NYC (story here). While it is always dangerous to give any one bit of data too much credence, this one seems worth following to its logical conclusion...

First a quick summary: a hospital tested every woman entering the hospital to give birth for CoViD-19. They collected the data over the two weeks (14 days) March 22 - April 4. There were 215 admissions and 33 of them tested positive for the virus. Of them only 4 "had symptoms".

The first question that came to mind was "Had four non-pregnant women called their doctors and described their symptoms, how many would have gotten tested?"
I can't know for sure, but I know from my son (who lives in New York City, NYC) that when he called in with the basic symptoms and a low grade fever, they did not give him a test. So let's guess that maybe 1 would have been tested. In that case, out of the sample of 215, there would have been 1 confirmed case.

The second question, aiming at the same number, would be "Well if we drew 215 random people from NYC, how many would be from among the Confirmed cases?"
We know that NYC reported a total of 62,230 new cases during that time period (see data on the excellent NYC CoViD-19 website). NYC's population is given (Google search) as various numbers from 8.4 million to 8.7 million. Let's use 8.6 million. Further, data suggests that for all but the most severe cases the recovery period is about ten days. So we'd expect to be drawing 215 random people from 8.6 million, of which each day there are about 44,450 active confirmed cases (44,450 = 62,230 X 10 / 14). A slightly incorrect but close enough answer is: 44,450 * 215 / 8,600,000 = 1.1. Hence, we'd expect about 1 Confirmed Case.
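Here is that estimate as a short Python sketch, using only the figures quoted above (the variable names are my own).  Like the hand calculation, it is the "slightly incorrect but close enough" version that treats the draws as independent.

```python
new_confirmed_cases = 62_230   # NYC confirmed cases, March 22 - April 4
period_days = 14
recovery_days = 10             # assumed typical recovery period
population = 8_600_000         # assumed NYC population
sample_size = 215              # maternity admissions tested

# Average number of confirmed cases that are active on any given day.
active_confirmed = new_confirmed_cases * recovery_days / period_days

# Expected number of active confirmed cases in a random sample of 215 people.
expected_in_sample = active_confirmed * sample_size / population

print(f"active confirmed cases per day: {active_confirmed:,.0f}")
print(f"expected confirmed cases in sample: {expected_in_sample:.2f}")
```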

Those two estimates are not in bad agreement. So let's assume that there really would have been 1.

That leads to another question, "If we see 33 actual cases in the 215, when we'd expect 1 confirmed case, then how many actual cases would we expect there to be in the city?" Well, since the ratio of Confirmed to Actual cases in the 215 is 1:33, there would be about 62,230 X 33 ~ 2,000,000 total new cases during that 14 day period. Now extend that to the total time the cases have been running through the city. That means we take the total number of confirmed cases, which is 117,565 (and still climbing!) and do the same multiplication. That works out to something like 4 million. That's nearly half the population!
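The scale-up is the same kind of arithmetic.  A minimal Python sketch, again using only the numbers quoted above (names are my own):

```python
actual_to_confirmed = 33          # 33 positives found vs about 1 expected confirmed case
confirmed_in_period = 62_230      # confirmed cases, March 22 - April 4
confirmed_to_date = 117_565       # total confirmed cases so far
population = 8_600_000            # assumed NYC population

actual_in_period = confirmed_in_period * actual_to_confirmed
actual_to_date = confirmed_to_date * actual_to_confirmed
share_of_city = actual_to_date / population

print(f"estimated actual cases in the 14 days: {actual_in_period:,}")
print(f"estimated actual cases to date: {actual_to_date:,}")
print(f"share of the city: {share_of_city:.0%}")
```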

Further, there is data from Vo, Italy and some hints from Chicago that the ratio of total cases to confirmed cases is in the range 100:1 to 200:1 (see the previous two posts). If you believe those (and the data from the pregnant women don't really exclude these given the small numbers of them tested), then we are talking numbers so big that essentially everyone in NYC has had CoViD-19. If true, we can expect the number of new cases there to absolutely crash over the next week to 10 days. It will be interesting to see. On the other hand, if the numbers don't crash, then I'll have to toss my existing models out and start again with a reduced ratio like 33.






Tuesday, April 14, 2020

CoViD-19: A Follow Up

Words of Caution:


See the post below.  They all still stand!

Background:


Last week I presented some results from a quick and dirty model, utilizing the best parameters I could find, and the simplest assumptions that allowed me to fit the data.  Well it's been an interesting week, to say the least.  I have had to make a few changes to fit the new data.  So without any further ado...

Data:


I am still using the same data sources for cases and deaths as I did last week.  I just can't find anything that even remotely seems better.  I now have enough CDC death numbers that I'm showing those as well.

Results:


There are four plots below that graphically show the fits.  I am satisfied that while they are somewhat ad hoc, they agree to a more than adequate extent.

Figures 1A and 1B. Two plots of the Worldometer and CDC Data and the Model results (the one on the right, B, being a blow up of the y-axis to show the quality of the agreement at large numbers).


Figure 2. Plots of the Day-to-Day case numbers.  Note that the data are extremely noisy over the last 10 or so days.


Figure 3. Data for the number of deaths.

It is clear that the model is doing a good job of capturing the dynamics of the CoViD-19 outbreak since last week.

The conclusions I would draw have changed only somewhat, taking into account that last week the model was predicting far fewer cases because I was completely taken in by the wiggle you can see in the day-to-day case data from around 3/27-3/31.  Beyond that, the comparisons to last week's "suggestions" are:
  1. The stay-at-home orders and the social distancing being practiced across the country are having a clear and downward pressure on the cases and will also show a reduction in the deaths as well.  This still holds.
  2. There is some indication that there is some small positive influence on disease outcome (people not dying!) from whatever treatments are being provided. This still holds.
  3. The number of infectious cases is still very high and these practices must be maintained for the foreseeable future if we are to clear the virus from our population. This still holds.
  4. The maximum number of new cases (i.e., the number of people who get infected on a specific day, and hence are now infectious) is likely behind us (model says it occurred on or about March 28). The conclusion holds, but the date at which it occurred was April 2.
  5. The maximum number of infectious people on any given day is also likely to be behind us (model says this happened on or about April 2 or 3). The conclusion holds, but the date at which it occurred was April 7.
  6. The maximum number of newly Confirmed Cases on any given day is likely to have happened, be happening, or about to happen (model shows it somewhere in the span April 5, 6, or 7).  The conclusion holds, but the date at which it occurred was April 10.
  7. The maximum number of new deaths is likely to still be in the future (somewhere around April 8 - 11).  The conclusion holds, but the date at which it will occur is on or about April 16.
The other major change is that the length of time to fully clear the infection is MUCH longer.  I won't even say how far out the model predicts them, other than to say, some of them extend into 2021.  I should also mention that the predicted number of deaths has jumped to about 63,000.  Also the number of people actually infected (and almost entirely recovered) ends up being just under 150 million of the total population of 330 million.

I should also mention that two of the primary factors that influence the model are the ratio of total cases to confirmed cases and the death rate among confirmed cases.  These results are based on the same numbers I used last week.  Recently KT has pointed me to two rather interesting reports (Thanks!).  The first is a report that claims the mortality for CoViD-19 is around 0.06% (see the second bullet under the New Studies under the April 12, 2020 heading here).  That is in quite good agreement with the parameter I have been using (7.6%/137. ~ 0.056%).  The second is a rather interesting report (even though somewhat anecdotal) from Chicago that 30 - 50% of the people tested show signs of having had CoViD-19.  If the higher figure holds, that suggests that the number of hidden cases might be significantly higher than the assumed parameter of 137 (perhaps as high as 190, but also as low as 110).  A few runs using the high end parameter resulted in a nearly identical looking fit, with essentially the same dates for the peak, but with slightly shorter time to fully clear the cases (but still out to December), around (but below) 60,000 deaths, and something like 190 million total cases.  The lower value will push the numbers the other way.

Tuesday, April 7, 2020

CoViD-19 – Some “In-the-Box” Thinking


Words of Caution:

First let me say the following things clearly and for the record:
  1. I am not an epidemiologist although I have had some familiarity with chemical kinetics equations, which are very similar to the epidemiology equations for disease spread,
  2. I do not warranty any of what follows,
  3. I strongly urge everyone to continue to follow the guidance of the US CDC and all guidelines and orders of Local, State, and Federal authorities, and
  4. I specifically urge that you NOT use what appears here to make personal decisions.

Background:


During this time while I am stuck at home (i.e., In-the-Box), I decided to occupy some of my time by trying my hand at modeling the current Coronavirus (CoViD-19) outbreak here in the US. After finding a free FORTRAN compiler, I started programming the standard epidemiology equations for disease spread. For simplicity I chose to use the discrete version rather than the differential forms. I have described the Model (both the equations and how I chose various parameters) at the end of this post, for those who care about the details.

Data:


The data used here are those available on two different websites for the number of CoViD-19 Cases and Deaths for the United States. The data labeled Worldometer Data is from the website here and the data labeled CDC Data is from the website here. The Worldometer website has interactive plots of both the cases and deaths, and the data I have here was extracted from them by hovering over the various points and recording the data. The CDC website has the case data available in a table that can be scrolled, but I have seen no historical data for the deaths, and as I have been recording those data only since 5 days ago I have not shown them on the plot, nor adjusted any of the parameters to try to match them.

Results:


Below are the current (April 6, 2020) plots for the available data and the model fits. I am satisfied that these are in sufficient agreement with the past that I am willing to say a few more things.

Confirmed Cases vs. Days since the 100th Confirmed Case for the US.

Deaths vs. Days since the 100th Confirmed Case for the US.

This model suggests that: 
  1. The stay-at-home orders and the social distancing being practiced across the country are having a clear and downward pressure on the cases and will also show a reduction in the deaths as well.
  2. There is some indication that there is some small positive influence on disease outcome (people not dying!) from whatever treatments are being provided.
  3. The number of infectious cases is still very high and these practices must be maintained for the foreseeable future if we are to clear the virus from our population.
  4. The maximum number of new cases (i.e., the number of people who get infected on a specific day, and hence are now infectious) is likely behind us (model says it occurred on or about March 28).
  5. The maximum number of infectious people on any given day is also likely to be behind us (model says this happened on or about April 2 or 3).
  6. The maximum number of newly Confirmed Cases on any given day is likely to have happened, be happening, or about to happen (model shows it somewhere in the span April 5, 6, or 7).
  7. The maximum number of new deaths is likely to still be in the future (somewhere around April 8 - 11).
  8. The last new case is likely to be infected on or about May 22nd (!).
  9. The last death may occur on or about the same day and will represent something like the 24,000th death.
  10. The last infectious person will clear quarantine some days after that (10-24 days are typical recovery times I've seen quoted, so that works out to June 1-15).
The bottom line I take from the model is that this epidemic is nowhere near done, although we may start seeing some positive news soon, possibly this week. Also, when things start to look really good we must all take a deep breath and continue the stay-at-home and social distancing for at least 2 weeks more than we will want - or seem to need. This will be critical in assuring that the last few cases find no new victims.

An additional result is that the model predicts about 82 million people will have had the disease and recovered. This will be interesting to compare to any retrospective antibody testing that is done on the population of the US.

Finally, I suspect that the model will have to be adjusted to reduce the drawdown factors for both cases and deaths, which will likely move all the dates which are in the future farther out (and will result in higher figures for both the number of people that recover and, unfortunately, the number that die). On the other hand, if these turn out (miraculously) to be correct, remember that you heard it here first.

The Model:

The variables:


J = day number
U(J) = number of uninfected people on day J
N(J) = number of newly infected people on day J
I(J) = number of infectious people on day J
C(J) = the number of people who are confirmed to have the disease on day J
TC(J) = the total number of confirmed cases through day J
D(J) = the number of people that die on day J
TD(J) = the total number of deaths through day J
AR(J) = the number of asymptomatic cases that recover on day J
CR(J) = the number of confirmed (=symptomatic) cases that recover on day J
R(J) = the total number of people who recover on day J
TR(J) = the total number of people who have recovered through day J

Assumptions:

  1. There are cases of CoViD-19 which can be either symptomatic or asymptomatic. For the purposes of simplicity the symptomatic people will all be assumed to be confirmed, and the asymptomatic cases will be assumed to not be confirmed.
  2. People who are asymptomatic are assumed to be infectious for 10 days (they recover on the 11th day).
  3. People who are symptomatic (and hence, confirmed) are assumed to be infectious for only 7 days as they are assumed to become confirmed on day 7.  All confirmed cases are assumed to either be in the hospital or in quarantine and, therefore, in both cases it is assumed that the conditions do not allow them to infect anyone.
  4. Confirmed cases all end up either recovering on the 24th day from their initial infection or dying on the 11th day after infection  (originally I assumed this would be around day 15, but to get a reasonable fit I had to change it). [Note: These lag times have no impact on the model  beyond which day they show up in the totals, as they were removed from the infectious pool when they were confirmed.]

The Equations: 

[Note: the differential equations are available by searching for the SIR model]

The Base Model



i = i + 1   (i is the day number, called J in the variable list above)
N(i) = k1 * U(i-1) * I(i-1)
U(i) = U(i-1) - N(i)
C(i+7) = k2 * N(i)
TC(i) = TC(i-1) + C(i)
D(i+10) = k3 * C(i)
TD(i) = TD(i-1) + D(i)
AR(i) = N(i-10) - C(i-3)
I(i) = I(i-1) + N(i) - AR(i) - C(i)
CR(i) = C(i-24) - D(i)
R(i) = AR(i) + CR(i)
TR(i) = TR(i-1) + R(i)

Note 1: all these equations use integers as inputs and outputs. The math is done as real numbers and any non-integer part is included or excluded, as an extra person, based on a random number.
Note 2: In fact the calculation of N(i) is the sum of ten terms of this form, which actually use N(i-10) through N( i-1) and C(i-3) through C(i-1), vice I(i-1), as will become clear in the next section.
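The random rounding described in Note 1 can be sketched in Python as follows.  This is an illustration of the idea, not a translation of the FORTRAN: the function name and the choice of a seeded generator are my own.

```python
import math
import random


def stochastic_round(value, rng):
    """Round to an integer, adding one extra person with probability equal to the fraction."""
    whole = math.floor(value)
    fraction = value - whole
    return whole + (1 if rng.random() < fraction else 0)


rng = random.Random(2020)  # seeded only so that a run can be repeated
draws = [stochastic_round(2.3, rng) for _ in range(20_000)]
print(f"average of rounded values: {sum(draws) / len(draws):.3f}")
```

Each rounded value is a whole number of people, while the average over many draws stays close to the unrounded value.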

This base model basically results in essentially everyone in the model getting infected. So for the US total population of 330 million the death total comes out at slightly less than 200,000 deaths. This however does NOT fit the data, after about the first couple weeks. It just goes on up exponentially until U goes to essentially zero (actually 69).

Input Parameters/Information:


The initial condition is that U(0) = 330,000,000 and all others are 0. Then N(1) is set to 1, and off the model goes.

k1 is not just one number, but rather it is actually an array of ten multipliers based on which day it is from the day a person gets infected. The array varies such that the actual infectious-ness of an individual is such that: for each of the 3 people they infect, they will infect (on average) 1/2 person total over days 1-3, 1/2 person over days 8-10, and the remaining 2 people are infected over days 4-7 (the numbers being the number of days after they were infected). This is a crude approximation of the data given under the heading “'Characteristic' Infection progression in a single patient” here.
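One way to write that ten-day profile down is sketched below in Python.  The totals for each window (1/2, 2, and 1/2 person) come from the description above; spreading each total evenly over the days in its window is an assumption made here for illustration, since the actual array values are not listed.

```python
def infectiousness_profile():
    """Expected number of people infected on each of days 1-10 after infection."""
    early = [0.5 / 3] * 3   # days 1-3: half a person in total
    peak = [2.0 / 4] * 4    # days 4-7: two people in total
    late = [0.5 / 3] * 3    # days 8-10: half a person in total
    return early + peak + late


profile = infectiousness_profile()
for day, weight in enumerate(profile, start=1):
    print(f"day {day:2d}: {weight:.3f}")
print(f"total infected per case: {sum(profile):.1f}")
```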

k2 = 90. / 3300. This is based on the only study I have heard of that may properly define this ratio. This was data reported for the city of Vo, Italy. It was reported in the Wall Street Journal (WSJ); I saw it where that article was being discussed here. Hat tip: KT

k3 = 8% From the same WSJ article via the same blog post. Again Hat Tip: KT
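For readers who would rather see the base model as running code, here is a rough Python sketch of the case and death equations, using the ten-term form of N(i) described in Note 2.  It is not the FORTRAN program and should not be expected to reproduce its totals.  Several things here are assumptions: the even split of the k1 array within each window, the scaling of new infections by U(i-1)/U(0), and the use of real numbers throughout instead of the random rounding of Note 1.  The function name is my own.

```python
def run_base_model(days, population=330_000_000.0, k2=90.0 / 3300.0, k3=0.08):
    """Iterate the base-model case and death equations for the given number of days."""
    # Assumed k1 array: 1/2 person over days 1-3, 2 over days 4-7, 1/2 over days 8-10.
    weights = [0.5 / 3] * 3 + [2.0 / 4] * 4 + [0.5 / 3] * 3
    size = days + 11  # room for C(i+7) and D(i+10)
    U = [0.0] * size
    N = [0.0] * size
    C = [0.0] * size
    D = [0.0] * size
    TC = [0.0] * size
    TD = [0.0] * size
    U[0] = population
    for i in range(1, days + 1):
        if i == 1:
            N[i] = 1.0  # the single seed case
        else:
            pressure = 0.0
            for d in range(1, 11):
                j = i - d  # day on which this cohort was infected
                if j < 1:
                    break
                cohort = N[j]
                if d >= 8:
                    cohort -= C[j + 7]  # confirmed on day 7, so no longer spreading
                pressure += weights[d - 1] * cohort
            # Assumption: infections scale with the fraction still uninfected.
            N[i] = pressure * U[i - 1] / population
        U[i] = U[i - 1] - N[i]
        C[i + 7] = k2 * N[i]
        TC[i] = TC[i - 1] + C[i]
        D[i + 10] = k3 * C[i]
        TD[i] = TD[i - 1] + D[i]
    return {"U": U, "N": N, "TC": TC, "TD": TD}


result = run_base_model(300)
print(f"uninfected after 300 days: {result['U'][300]:,.0f}")
print(f"confirmed cases: {result['TC'][300]:,.0f}")
print(f"deaths: {result['TD'][300]:,.0f}")
```

The k2 and k3 defaults are the values listed above; changing them rescales the confirmed cases and deaths without altering the infection curve in this sketch.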

Adjustments to the Base Model

These adjustments have been implemented to be able to match the actual number of cases and actual number of deaths over the current data set.

Adjustment 1: As can be seen in the plot for the number of cases, the initial exponential growth (the straight line in this log plot) begins to roll over on or about the 17th day (this is not clear in the graphs shown but if you look closely, on a greatly expanded scale, you can make it out). This date works out to March 20. This would seem to match fairly well with the onset of stay at home orders (for example NY Governor Cuomo on March 20 and CA Governor Newsom on March 19). The actual adjustment that gives a curvature rate that matches the data is to reduce the daily infection rate by a factor of 0.885^(i-16).

Adjustment 2: The actual death rate did not match using the crude figure of 8%. It required a reduction to 7.6%. This can easily be accounted for by any of several factors including: better, more complete testing in the US vice what Italy had accomplished, a healthier population (Italy has a significantly higher rate of smoking), or some other minor factor (like simple rounding in the value of 8%).

Adjustment 3: Much as the number of cases was adjusted, it became clear that matching the number of deaths requires an additional similar factor. The best match (by eyeball) uses it starting on day 23 (real world = March 26) and a factor of 0.96^(i-22). It is not clear what might cause this, but I suspect that it is a reflection of some actual increase in the ability to treat patients.
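The two time-dependent factors in Adjustments 1 and 3 have the same form, so one small Python function can sketch both.  I am assuming the factor is exactly 1 before the start day, which the text implies but does not state, and the function and parameter names are my own.

```python
def damping_factor(day, start_day, base):
    """Multiplier base^(day - (start_day - 1)) from start_day onward, 1.0 before it."""
    if day < start_day:
        return 1.0
    return base ** (day - (start_day - 1))


for day in (16, 17, 23, 30):
    infection = damping_factor(day, start_day=17, base=0.885)  # Adjustment 1: 0.885^(i-16)
    death = damping_factor(day, start_day=23, base=0.96)       # Adjustment 3: 0.96^(i-22)
    print(f"day {day}: infection factor {infection:.3f}, death factor {death:.3f}")
```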

Addendum:


This post, and the data and model results it is based on are through April 06 (actual data end April 05). Today, April 07, the data from both Worldometer and CDC show a significant spike upwards in both cases and deaths on April 06. The cause for this is unclear. I will not make any additional model changes for several more days.