Ohioan@Heart: California's Goober Doubles Down... I Do, Too.

Background and Bottom Line

Our pitiful excuse for a Goober doubled down on his recent 'rollbacks' due to increases in CoViD-19 cases (actually, on his Twitter feed he wrote "spread at alarming rates"). I made my case last week (here) for why I thought it was a mistake when he rolled back San Diego, and now he's doubling down. Well, two can play at that game.

I have a lot to say, but for those of you who don't want to read the whole thing, let me give the most important data right up front. I predicted the death totals for three consecutive weeks. Let's call them weeks 0, 1, and 2, since those were the numbers of weeks into the future the prediction was for. OK, predicting "0" weeks into the future isn't really a prediction, but it does help to validate the basis for the other predictions. So how have I done?

1) For week 0 (week ending 7/4/2020): I predicted 33 deaths, and there were 27 observed deaths. Not bad. The prediction was about 20% on the high side. This could be due to using data over the entire historical time to predict what is happening "Now". If treatments are improving, or if there are simply far more tests finding a greater fraction of the cases (better/more complete contact tracing), then you have to expect an overestimation.

2) For week 1 (week ending 7/11/2020): I predicted 66 deaths. If we believe the 20% overestimation, maybe it should be 53. In any case the actual number of observed deaths for that week was 35. So it begins to look like I may be significantly OVERestimating the deaths. So when I claimed the Goober was imposing reopening rollbacks based on faulty metrics, I appear (at least so far) to be correct. Now he's increasing the rollbacks! Yet the raw prediction for the total for week 2 is still 61 (perhaps 49, maybe less), and we will see how that comes out.
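For anyone who wants to check the arithmetic behind these comparisons, here is a minimal Python sketch. The helper names are made up for this illustration, and treating the rounded 20% from week 0 as a flat scale factor is an assumption, not a fitted correction.

```python
# Sketch of the prediction check; helper names are hypothetical.

def overestimate_fraction(predicted, observed):
    """Fraction by which a prediction exceeds the observation (relative to the observation)."""
    return (predicted - observed) / observed

def scaled_prediction(raw, overestimate=0.20):
    """Shrink a raw prediction by an assumed flat overestimate (20% by default)."""
    return raw * (1.0 - overestimate)

# Week 0: 33 predicted vs 27 observed, about 0.22, which the post rounds to 20%.
week0 = overestimate_fraction(33, 27)

# Weeks 1 and 2: raw predictions of 66 and 61, scaled down by the assumed 20%.
week1_scaled = round(scaled_prediction(66))   # 53
week2_scaled = round(scaled_prediction(61))   # 49

# Week 1 observed deaths relative to the raw prediction, about 0.53.
week1_ratio = 35 / 66
```

The week 1 ratio shows the observed count came in well below even the scaled prediction.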

Without bothering to 'show my work', I will state that I can now make a prediction for week 3 (week ending 7/25). It is 62.13, which I will round to 62. So there is still no prediction for a major rise in deaths and the Goober is reacting to a completely misleading metric (at least for San Diego County).

So far the score is: Goober: 0, Me: 1.

A More Thorough Analysis of the Prediction Methodology and Assumptions

If you go through my long post from last week you will see that I based my predictions on rates determined by dividing the deaths by the number of cases two weeks previously. That was of course a completely ad hoc assumption based on the reports that many deaths take 'weeks'. During this past week, I began to wonder, is that the right delay? Can I actually find an answer in the data?
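The two-week-lag method described above can be sketched roughly as follows. The weekly counts in the example are invented placeholders (not San Diego County data), and the function names are hypothetical; pooling all weeks into a single rate is one reasonable reading of the method, not necessarily the exact calculation from last week's post.

```python
# Sketch only: the weekly counts below are invented placeholders, not county data.

def lagged_death_rate(weekly_cases, weekly_deaths, lag_weeks=2):
    """Total deaths divided by total cases reported lag_weeks earlier."""
    paired_cases = weekly_cases[:len(weekly_cases) - lag_weeks]
    paired_deaths = weekly_deaths[lag_weeks:]
    return sum(paired_deaths) / sum(paired_cases)

def predict_deaths(weekly_cases, rate, weeks_ahead, lag_weeks=2):
    """Predicted deaths for the week that is weeks_ahead past the last case week."""
    if weeks_ahead > lag_weeks:
        raise ValueError("the cases that drive that week have not been observed yet")
    # Deaths in week (last + weeks_ahead) pair with cases in week (last + weeks_ahead - lag_weeks).
    index = len(weekly_cases) - 1 + weeks_ahead - lag_weeks
    return rate * weekly_cases[index]

example_cases = [1000, 1200, 1500, 2000, 3000]   # placeholder weekly case counts
example_deaths = [20, 25, 30, 36, 45]            # placeholder weekly death counts

rate = lagged_death_rate(example_cases, example_deaths)
predictions = [predict_deaths(example_cases, rate, k) for k in (0, 1, 2)]
```

With a two-week lag, the cases already in hand only support predictions for weeks 0, 1, and 2, which is why those were the three weeks predicted.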

KT was kind enough in the comments to the post last week to provide a pointer to Gummi Bear's Twitter thread where they talked about some observations they had made of CoViD data. In this there were lots of numbers and more than a few plots. The plot that struck me was a side by side comparison of deaths over time for New York City and Spain. Even though I had seen that plot for NYC before, the fact that the plots looked nearly identical triggered something.

Some time ago I had noticed (on the very excellent NYC CoViD site) that the curves for cases, hospitalizations, and deaths all had the same shape, but with the deaths clearly lagging the other two by about a week. (Don't just believe me: go to the site and scroll down to the "Daily Counts". It comes up on the Cases data. Click on the Hospitalizations tab and see that it changes only modestly, then click on the Deaths tab: except for shifting to the right, it also doesn't really change.) Anyways, that flash of memory suggested to me that the proper lag might be only 1 week, despite the stories of folks lingering.

Just trying to plot the raw daily data, with all the inherent noise, seemed a fool's errand, but I decided that if I averaged the cases over 7 days, and the deaths over the same 7 days, I ought to be able to plot the deaths vs cases, and if I delayed the cases by an optimal number of days I should get a reasonable straight line. So off I went. To 'cut to the chase', the optimal number was, to my great surprise, four (4) days (see the plot below).

Plot of 7-day averages for deaths vs cases (delayed by 4 days) for New York City. The quality of the correlation is obvious.
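The lag search can be sketched as below. This is only an illustration: "optimal" is taken here to mean the delay that maximizes the Pearson correlation between the two 7-day averages, which is an assumed stand-in for "makes the best straight line", and all function names are hypothetical.

```python
# Sketch of a lag search; using Pearson correlation as the criterion is an assumption.

def rolling_mean(values, window=7):
    """Trailing moving average; the output is window - 1 entries shorter than the input."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def best_lag(daily_cases, daily_deaths, max_lag=21, window=7):
    """Return (lag, scores): the case delay in days with the highest correlation."""
    cases_avg = rolling_mean(daily_cases, window)
    deaths_avg = rolling_mean(daily_deaths, window)
    scores = {}
    for lag in range(max_lag + 1):
        # Pair deaths on day t with cases on day t - lag.
        xs = cases_avg[:len(cases_avg) - lag]
        ys = deaths_avg[lag:]
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get), scores
```

Both series need to be the same length and cover the same dates. A high correlation at the chosen lag says the averaged curves line up well, not that the lag is the true time from test to death.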

Now, I concede that the data, particularly back in time, may well be less than directly comparable to today (that is, of course, exactly what I claim our Goober is doing, so I must be careful not to fall into the same trap). How might it easily be different, and in a way that would affect the apparent lag time? One very obvious way (well, obvious to some - Thanks, KT!) is that back then in NYC, the place was suffering from massive numbers of cases and insufficient testing availability. If that meant that people were not getting tested as soon as they would today (likely!), then it would reduce the apparent lag versus what we'd observe today. So really I need to use the latest and best data I can get my hands on. For me that's the San Diego County data. While I've been saving lots of case data for the county for almost 4 months, I did not start to collect the daily demographics for deaths until this month. So I won't have sufficient data to say anything more definite for another week or two. I promise to do a follow-up once I do.

Fool Me Once, Shame On You. Fool Me Twice, Shame On Me

In the previous post, and so far in this one, I have focused on only one metric. But let's take a quick look at all the metrics that San Diego County monitors, where we have gone into "alert" mode. The first is the same idiotic metric that our Goober is apoplectic about, namely that the number of cases is more than 100/100,000. The other two county metrics are "Case Investigations" and "Community Outbreaks".

The Case Investigations metric is indicating that the county isn't getting the contact tracing started fast enough. That's purely a function of the number of people doing the tracing and the number of cases. Clearly this is a reasonable thing to watch, and it is important to trace as fast as possible (to try to get out in front of the spread), but it isn't really contributing to our reclosing. The other metric, Community Outbreaks, is much like the case numbers in that the county is not comparing apples to apples. Whether the metric as formed (more than 7 outbreaks in 7 days) is reasonable, depends. For example they are defining an outbreak as 3 or more people, who are from different households, that test positive, and are associated to a specific location/event. A week ago the county reported 21 such outbreaks, 15 of which were associated to "restaurants/bars". Yesterday they reported 17 outbreaks, with 7 associated with restaurants/bars. So far, so good. A reasonable person should ask two questions: 1) Are they truly investigating ALL the possible sources of interactions? and 2) What would we expect from random chance of people who were already infected being in the same place at 'the same time'? I really want to tackle the second in some detail, but that distracts from the far more significant first question. Consequently, I have moved the entire discussion of the second question to the bottom of this post under the heading "Real Effect or Random Correlation?".

But we still have the first question I posed above to deal with: Are they truly investigating ALL the possible sources of interactions? Here I can give an absolute answer. NO. Actually, Hell No! Here the county (and state) have completely lost their moral compass and intentionally failed to perform their duty. They should all be charged with malfeasance and fired.

But what could cause me to be so sure and so angry? Simple. They refused to even ask if any of the people testing positive attended any of the protests (a.k.a. the riots). As we know these were also attended by predominately the same age groups as were also frequenting restaurants and bars. However, due to the intentional malfeasance of the county health officials, we can never know for certain if these played a role in the spread. We can however demonstrate a smoking gun...

I can say definitively that the jump in cases is observed as having occurred during the 10 day period June 14 - 24. You can convince yourself of the same thing by merely looking at the plot below of positive test cases vs date reported.

Plot of the 3-day running average of the number of new cases versus date for San Diego County. The two red points represent the data for June 15 and June 25, in between which the entire rise of case numbers from about 120 to about 470 occurs.

The protests occurred mostly during the period June 1 - 13 (I did a web search for "protests San Diego" and just noted the dates of the reports). If we assume these were significant vectors for spread, we would expect to see a rise that was most pronounced in precisely the date range we see it (in the plot above, the two red points are for 6/15/2020 and 6/25/2020). If the rise were due to restaurants/bars, it would have started at about the same time, but it would not have leveled off so abruptly, while an abrupt level off is completely reasonable for the end of the protests. In fact, had the cause truly been the restaurants and bars, the earliest we would have expected it to level off, assuming the Goober's reclosing of the bars and restaurants was the primary factor, would be about a week after it took effect, which would work out to... today, the 14th of July.

I see no way to argue anything other than: 1) the 'protests' were the primary driver of the case increase in San Diego County, 2) our Goober and County officials are punishing businesses that had, at most, a minor effect on the case rise, 3) to increase the closures, in San Diego County, due to disease "spread at alarming rates", is unjustifiable, alarmist, and borderline criminal, and 4) every government official who encouraged, or even merely acquiesced to, not asking about attendance at the 'protests' should be fired immediately for the damage to the people and businesses of San Diego County they caused by not properly doing their jobs.

Real Effect or Random Correlation?

I have tried to gather the best possible data to answer this question. So here goes, and please bear with me. After doing a bunch of web searches I have concluded that there were well over 180 restaurants before the first lockdown. That is the number of members in the San Diego County Restaurant Association (listed as "180+"). I do not know if a restaurant group (all owned by one person/entity) counts as 1 or many. Clearly the number might be bigger, possibly way bigger. I'm also ignoring the number of non-restaurant bars (and that's a bunch too, but I can't really get a good handle on the number). I also don't know how close to capacity the restaurants have been during the reopened period (the news stories I've seen suggest that they've been near their capacity). I also can't say what the true capacity would be. I also don't know how close in time the people needed to be to count as 'at the same time' in the contact tracing. But let's take some educated guesses.

I know, from personal observation, that most of the restaurants I used to frequent would have signs listing their capacity. Those listed capacities were always over 100, usually more like 120. During the reopen period there were requirements to maintain social distancing, so the density was clearly reduced. I also know from the local media that many restaurants had been using outside spaces to add some of the capacity back. So let's take a swag at something like 50 being a reasonable estimate of the average capacity (ASIDE: This is a really important number as the more people there are, the higher the likelihood of random clusters, i.e., groups of 3).

We know that restaurants generally don't seat every table, then later clear out all tables and then reseat every seat and repeat. It is much more continuous than that. Let's guess that during the lunchtime, something like 100 people can be there close enough in time to be counted as 'at the same time'. I'd guess that dinnertime you'd see something larger, maybe two groups of 100. We also know, from the plot above, that the county has had a variable, but consistent case count over the last two weeks of between 400 and 500 (the daily average over the last two weeks is actually 471). We could just use this, however, when we start talking about the number that might be going out to restaurants it is likely to be too big. That '471' is all age demographics summed up. But we know that it is almost entirely the under 40 crowd, and truthfully almost entirely the 'over 20 and under 40' crowd that are going out to the restaurants. So let's just count the cases for those aged 20 - 40. The numbers are that over the last two weeks there were 3,296 cases in this age bracket, and the county population is (for the same age group) about 1,046,000. If we use these data to determine the number of active cases on any given day (by multiplying by an average duration of 10 days and then dividing by 2 to remove the identified cases - who presumably stop going out) we get 1,177 cases (infected but not identified) in a population of 1,046,000, or an average infection rate of essentially 0.11%.
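The estimate of undetected active cases can be reproduced in a few lines of Python. The inputs are the figures quoted above; the variable names are made up for this sketch, and the 10-day duration and the "half are identified" split are the same guesses used in the text, not measured values.

```python
# Reproduces the arithmetic above; variable names are illustrative only.

cases_last_14_days = 3296     # cases aged 20 - 40 over the last two weeks
population_20_to_40 = 1_046_000
infectious_days = 10          # assumed average duration of an active case
undetected_share = 0.5        # assumed fraction still out and about (not yet identified)

daily_cases = cases_last_14_days / 14
active_cases = daily_cases * infectious_days
undetected_active = active_cases * undetected_share
infection_rate = undetected_active / population_20_to_40

print(round(undetected_active))          # 1177
print(f"{infection_rate:.2%}")           # 0.11%
```

Changing either guess moves the result in direct proportion, so a 14-day duration instead of 10 would raise the rate by 40%.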

Now we can calculate the probability of finding at least 3 infected people, purely by chance, in a 'seating' of 100 as 0.0013. If we then take 3 'seatings' per day per restaurant and 180 restaurants we expect to see about 0.61 'outbreaks'/day (or 4 per week). Note these are 'outbreaks' where the association is not that they spread the disease while at the restaurant/bar, but rather that they just happened to randomly be in the 'same place at the same time'. I strongly believe that this estimate is a severe underestimation. I believe this because I expect that the cases will be concentrated in the fraction of the population that is going out (for example, I have 3 sons in this age group and NONE of them, their wives, and certainly not their children, are going out). I would think that I could easily be a factor of 2 or more low on the expected number of random 'clusters'. If so, then ALL the observed restaurant/bar 'outbreaks' of last week are actually just non-causal clusters. If the factor of 2 is correct, then there may be essentially no true Community Outbreaks, and even if 3 are real 'outbreaks', the actual number of outbreaks gets closer and closer to being below the metric. Anyways, yet again misuse of a metric is so easy to see, if you bother to look...
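For readers who want to experiment with the inputs, here is a rough sketch of a random-cluster calculation. It assumes a plain binomial model in which every person in a seating is independently infected at the population rate; that model and the function names are assumptions of this sketch, and the text does not say exactly how the figures above were computed.

```python
from math import comb

# Sketch under an assumed binomial model; not necessarily the calculation used in the post.

def prob_at_least(k, n, p):
    """Probability of k or more infected people among n independent people at rate p."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

def expected_clusters_per_day(p_cluster, seatings_per_day=3, restaurants=180):
    """Expected number of chance clusters per day across all restaurants."""
    return p_cluster * seatings_per_day * restaurants

infection_rate = 1177 / 1_046_000                  # undetected active cases / population
p_cluster = prob_at_least(3, 100, infection_rate)  # one seating of 100 people
per_day = expected_clusters_per_day(p_cluster)
per_week = 7 * per_day

# Sensitivity: what happens if cases are concentrated so the effective rate doubles?
p_cluster_doubled = prob_at_least(3, 100, 2 * infection_rate)
```

With these exact inputs the plain binomial model gives a per-seating probability of roughly 0.0002, smaller than the 0.0013 quoted above, so the quoted figure presumably rests on inputs not spelled out here. The result is very sensitive to the infection rate among the people actually going out: doubling the rate raises the cluster probability roughly sevenfold, because the leading term scales as the cube of the rate.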

2 comments:

K T Cat said...: Wow. Brilliant work as always. I need some time to digest this.

My first, visceral reaction: Do they know? The more I see, the more I am coming to the conclusion that this is Satan from Milton's "Paradise Lost" come to real life. It's OK to wreck the state so long as you maintain power. July 15, 2020 at 8:07 AM

Ohioan@Heart said...: My first reaction to your question is "Never attribute to malice that which is adequately explained by stupidity". Now I admit that we may have reached a point where it can’t be explained by garden variety stupidity, but rather requires stupidity compounded by massive denial. July 15, 2020 at 9:00 AM