Why Dietitians Are Today's Witch Doctors
Health is an incredibly complicated thing. Heart attacks are one of the largest causes of death in the USA (14%), but what causes a heart attack? Well, no one is sure exactly, but there are thought to be a variety of causes. Weight. Genetics. Stress. Smoking. Eating habits. Sleeping habits. Blood pressure. How much time you spend sedentary. I could go on. But how do we determine those causes? It is a very hard thing to do when you think about it. Let's say you take a large sample of people (N=500) to study. How do you know what they eat? How do you know how much they sleep? How do you know how stressed they are, or how much they sit during the day? You could follow them around with clipboards... but for how long? If I smoked 10 packs a day for 40 years and then quit right before the study, the observer would not know how much I had smoked unless he asked me. So the expensive part (paying someone to follow a person around) would not give you the most important information.
To get the most important information, you would need to survey your subjects. You need to ask them. How much soda do you drink? How many cigarettes do you smoke? Is heart disease common in your family? But this leads to a challenge. Most people do not remember their exact diet. They do not remember exactly how much they smoked. Many do not know the health issues of estranged uncles or aunts. Further, there is a reporting bias: most people know what healthy living is supposed to look like and might exaggerate one way or the other to tell a story ("Oh, I smoked a lot... two packs a day"). How much can you trust that?
And this is why diet and health advice is notorious for constantly changing. When I was a kid, eggs were lethal heart attack poison. Now, eggs are the very picture of healthy food. Low fat was the way to stay healthy back in the 90s. Now almost no one talks about low fat. Vegetarianism used to be considered healthy. Then it was considered unhealthy. Now veganism is making a push to be considered healthy.
There is a lot of scientific talk mixed into all this. A confident scientist has analyzed an egg in a lab and determined that it has way too much cholesterol. He turns to you and explains that when there is too much cholesterol in your blood, it builds up in the walls of your arteries, causing a process called atherosclerosis, a form of heart disease. Huh. Wow. That seems like a scientific fact to me. I will quit eggs. But, as we now know, that scientist was wrong. Apparently, for most people the cholesterol in eggs has only a modest effect on the cholesterol in their blood, so it does not simply end up on their artery walls. It turns out both the food and the human body were more complicated than that scientist thought.
But... there is another problem. That calculation involves a single variable: modify your egg input and get a result. But what happens if you put salt on your eggs? Or eat them with potatoes? Or put ketchup on them (like a monster)? What if I eat eggs while living an active life? What if I sit around all day? How does this incredibly complicated mix of activities and side dishes affect the impact eggs have on my system?
So the scientist imagined that the equation for "Are eggs healthy?" looked like this:
y = f(x)
Where y = health and x = egg intake.
But in reality, the equation contains many variables that interact in non-linear ways. It looks more like this:
y = f(x, z, a, b, c, ...)
Where y = health, x = egg intake, z = ketchup, a = hours a week spent exercising, b = potatoes as a side, c = genetic ability to deal with cholesterol, and so on.
Here is a photo of the scientist writing the actual equation for determining whether eggs are healthy:
So the scientist is not playing with one variable but with countless variables interacting in countless ways. Actually finding the equation is impossible, so instead the scientist turns to "big data." He takes a huge pile of data (collected from surveys) on people's activities, eating habits, and genetics, tries to control for anything else he thinks is important, and then looks for the effect of eggs. For example, he might take people aged 55 to 70 who are fit, have no other health issues, and eat eggs daily. But does everyone in that group add salt to their eggs? Do they add ketchup?
"Doesn't matter," the scientist says confidently. "I plotted egg intake in that group and showed that egg intake positively correlated with heart attacks, and the R-squared value was fairly high."
Hmm. I guess that settles it. Right?
No, it does not. The reason it does not is twofold:
1) Correlation is not causation.
2) Correlation is rarely actually correlation.
Most people are familiar with the first phrase. You can show that ice cream intake correlates with shark bites, but the cause of shark bites is not ice cream; it is a third factor (warm weather causes both more swimming and more ice cream consumption). Even if eggs and heart attacks were perfectly correlated, there might be some third cause behind both.
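To make the ice cream example concrete, here is a small Python simulation. It is a sketch under made-up assumptions: the temperature range, the coefficients, and the noise levels are all invented for illustration, and temperature is the only thing driving both ice cream sales and shark bites. The raw correlation comes out strong (around 0.8), yet once the straight-line effect of temperature is removed from each variable (one simple way to "control for" a third factor), the leftover correlation is close to zero.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 10_000

# Hypothetical model: temperature drives BOTH variables; neither causes the other.
temperature = [rng.uniform(10, 35) for _ in range(n)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
shark_bites = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw_r = pearson(ice_cream, shark_bites)
# "Control for" temperature: correlate only what temperature does not explain.
controlled_r = pearson(residuals(ice_cream, temperature),
                       residuals(shark_bites, temperature))

print(f"raw correlation:         {raw_r:.2f}")
print(f"controlling for weather: {controlled_r:.2f}")
```

Note that this trick only works because temperature was measured. A third factor nobody thought to record cannot be controlled for this way.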
The second phrase, however, most people do not know. Most scientists are bad at stats. Most assume that if you take two variables and find some relationship between them (say, a correlation of 0.5), you can then draw conclusions from it. In many cases this is not so, because that step quietly assumes things that may not be true: that the relationship is strictly linear, and that the data are not just a subsample of a larger picture. The best explanation of the problem that I have found is Anscombe's Quartet.
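Anscombe's point is easy to check numerically. The Python sketch below uses the four data sets that the statistician Francis Anscombe published in 1973 and computes the summary numbers a study might report, including the R-squared value. All four come out at roughly the same mean and variance of y, a correlation of about 0.816, an R-squared of about 0.67, and the same fitted line (y ≈ 3.00 + 0.50x), even though plotted they look nothing alike.

```python
import statistics

# Anscombe's quartet: four x/y data sets (sets I-III share the same x values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Mean, sample variance, Pearson r, R-squared, and least-squares line."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": statistics.variance(xs),
        "mean_y": my,
        "var_y": statistics.variance(ys),
        "r": r,
        "r_squared": r ** 2,  # for a one-variable straight-line fit, R^2 = r^2
        "slope": slope,
        "intercept": my - slope * mx,
    }


stats = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, s in stats.items():
    print(f"{name:>3}: mean_y={s['mean_y']:.2f} var_y={s['var_y']:.2f} "
          f"r={s['r']:.3f} R^2={s['r_squared']:.2f} "
          f"y = {s['intercept']:.2f} + {s['slope']:.2f}x")
```

The summary numbers alone cannot tell the four data sets apart; only plotting them does.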
Consider these four graphs:
Four very different sets of data with nearly identical means, standard deviations, correlations, and regression lines. And yet one quick look at the data shows that clearly different things are going on. If this were egg intake versus heart attacks, quadrant I would indeed appear to suggest that eating more eggs more or less led to more heart attacks. But what about quadrant II? That would appear to suggest that there is a bad range of egg intake somewhere in the middle and that people should eat either no eggs or a lot of eggs. Consider quadrant IV: this would suggest that as long as you don't eat 3 eggs a day or 9, you are good. But even these conclusions are faulty, because the data may be a subsample of a much larger picture that looks quite different. Consider these data:
Based on that... it looks like every egg is damn near one step closer to death.
But I pulled that data from these:
Which obviously tells a different story: egg intake only matters in that narrow range and does not get worse with increased consumption (it actually gets better).
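The same trap can be reproduced with a toy calculation in Python. The numbers below are entirely hypothetical (a made-up risk score that climbs up to 5 eggs a week and then slowly falls), chosen only to mimic the shape described above, not taken from any study. Looking only at the 0 to 5 range gives a perfect correlation of 1.0; looking at the whole 0 to 20 range, the correlation is essentially zero.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def hypothetical_risk(eggs_per_week):
    """Made-up risk score: climbs up to 5 eggs a week, then slowly falls."""
    if eggs_per_week <= 5:
        return float(eggs_per_week)
    return 5.0 - 0.2 * (eggs_per_week - 5)


eggs = list(range(21))  # 0 to 20 eggs per week
risk = [hypothetical_risk(e) for e in eggs]

# The narrow slice a study might happen to sample.
narrow = [(e, score) for e, score in zip(eggs, risk) if e <= 5]
r_narrow = pearson([e for e, _ in narrow], [score for _, score in narrow])
r_full = pearson(eggs, risk)

print(f"correlation, 0-5 eggs only: {r_narrow:.2f}")
print(f"correlation, 0-20 eggs:     {r_full:.2f}")
```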
And none of this even takes into account all the other variables.
With these things in mind, we can see why diet science is not the same thing as calculating the motion of the planets or solving other relatively simple physics equations. Science that depends on mining data, especially unreliable data drawn from surveys, to reach conclusions is almost always quackery. The Oracle of Delphi. Witch doctors.