Understanding Scientific Studies: A Comprehensive Analysis

This article provides a thorough overview of the well-known series "Studying Studies" by Dr. Peter Attia.

All quotations, unless otherwise mentioned, are from Dr. Peter Attia’s proprietary series titled "Studying Studies." This summary was created to clarify my own understanding while navigating scientific research and to assist others in critically evaluating scientific claims.

Part I — Relative Risk Versus Absolute Risk

Relative risk is frequently used to report study results, likely because it sounds more impressive. For instance, a study might report a relative risk reduction of 85% ([Control - Treatment] / Control) while the absolute risk reduction is just 0.63% (Control - Treatment). The gap between the two figures shows how misleading a headline number can be until you examine the underlying event rates!

> “Always keep absolute risk in mind when you hear about the risk of an event increasing or decreasing.”

Example:

1. Relative risk: A new medication decreases cancer occurrence by 50%.
2. Absolute risk: The same medication lowers cancer occurrence from 2 per 1000 to 1 per 1000.

While both statements convey the same outcome, the first can be deceptive.
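To make the arithmetic concrete, here is a minimal Python sketch using the hypothetical event rates from the example above:

```python
# Event rates from the example: 2 per 1000 (control) vs. 1 per 1000 (treatment).
control_rate = 2 / 1000
treatment_rate = 1 / 1000

# Relative risk reduction: (Control - Treatment) / Control
rrr = (control_rate - treatment_rate) / control_rate
print(f"Relative risk reduction: {rrr:.0%}")   # 50%

# Absolute risk reduction: Control - Treatment
arr = control_rate - treatment_rate
print(f"Absolute risk reduction: {arr:.2%}")   # 0.10%, i.e., 1 fewer case per 1000
```

The 50% figure is what the headline leads with; the 0.10% figure is what actually happens in the population.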

As we delve into this analysis, you will not only become familiar with the prevalent misconceptions found in scientific literature, but also enhance your ability to differentiate between credible and misleading statistics.

Part II — Observational Epidemiology

> “The risk of harm from an intervention is meaningless in a vacuum without understanding the risk of not intervening.”

When examining epidemiological studies, an initial assessment might refer to Bradford-Hill's nine criteria to determine causation. However, this list is not exhaustive.

Observational studies typically cannot establish causation and are mostly expressed in terms of relative risk (as previously noted). Additionally, observational epidemiology is generally suited to hypothesis-generating rather than hypothesis-testing.

Richard Feinman articulated this well:

> “If you observe that children who consume a lot of candy appear overweight, or if you find that candy causes you to gain weight, that observation leads to a hypothesis that sugar may cause obesity. Testing this hypothesis involves examining the correlation between sugar intake and obesity rates. Various comparisons can be made, but with an infinite number of independent variables, your focus might be solely on candy. As Einstein noted, ‘your theory determines the measurement you make.’”

Most observational epidemiology cannot definitively establish cause-and-effect relationships, complicating the acceptance or rejection of any hypothesis.

What types of studies warrant closer attention? Randomized-controlled trials.

Part III — The Rationale Behind Observational Studies

Randomized-controlled trials are often hailed as the gold standard for assessing risk within studies, allowing for a clearer distinction between cause and effect.

However, their usage is limited due to several factors:

1. High costs (large trials can average around $208 million)
2. Extended duration (an average of approximately 5.5 years)
3. Difficulty of execution (numerous logistical challenges and confounding variables)
4. Ethical limits on testing interventions expected to cause harm

> “[T]hey can’t assign a group to an intervention that they expect will do harm to the people. In this sense, RCTs run counter to epidemiology: RCTs aim to establish cause-and-effect relationships that benefit individuals, while epidemiologists seek to identify associations that may harm populations.”

Consequently, observational studies remain popular because they can investigate the risks and potential benefits of substances such as drugs or foods. However, critics have pointed out that claims made in observational studies often fail to hold up when tested in randomized trials: in one analysis, zero of fifty-two such claims replicated.

To address this gap, epidemiologists often examine retrospective and prospective cohort studies (though these methods have their own biases).

Retrospective and Prospective Cohort Study Biases

While both approaches provide valuable insights, they are not without biases:

  • Healthy-User Bias: Individuals who are healthy in one aspect tend to be healthy in others, complicating the isolation of healthy lifestyle effects on health.
  • Confounding Bias: An overlooked variable may falsely imply a relationship between other variables.
  • Information Bias: Distortions in results due to inadequate or inaccurate information.
  • Reverse-Causality Bias: Misinterpretation of cause-and-effect order; for example, did diet soda consumption lead to obesity, or vice versa?
  • Selection Bias: Arises when the people who agree to participate in a study (or who remain in it, as opposed to those lost to follow-up) differ systematically from those who do not.

Retrospective Cohort Studies

In retrospective cohort studies, researchers analyze past data in light of current outcomes. However, these studies are susceptible to confounding and bias.

Prospective Cohort Studies

These studies differ in that they are planned in advance: researchers recruit participants and collect baseline data before any outcomes occur, then follow the cohort forward in time.

They define both:

1. Inclusion criteria: conditions a participant must meet to be eligible for the study.
2. Exclusion criteria: conditions that disqualify a participant from the study.

Prospective studies help minimize biases like selection bias, but they still lack random assignment.

Part IV — Randomization and Confounding

Randomization is crucial for distinguishing cause and effect, as it assigns participants to treatment groups at random. This method helps eliminate confounding variables, leading to more reliable results.
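As a minimal illustration of the mechanic (a Python sketch, not any particular trial's protocol), random assignment can be as simple as shuffling a participant list and splitting it into arms:

```python
import random

participants = [f"subject_{i}" for i in range(1, 101)]  # hypothetical IDs

random.seed(42)               # fixed seed so the sketch is reproducible
random.shuffle(participants)  # random order breaks any link to participant traits

# Split the shuffled list evenly into treatment and control arms.
half = len(participants) // 2
treatment_arm = participants[:half]
control_arm = participants[half:]

# With random assignment, known *and* unknown confounders are expected
# to balance across arms as the sample grows.
print(len(treatment_arm), len(control_arm))  # 50 50
```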

Successful randomization can preemptively control for biases and confounding factors through careful design. Reducing the impact of confounding variables is essential for researchers. These variables are not always foreseeable, making it vital to manage them effectively during study planning.

Researchers aim to account for potential confounders in three ways:

1. Including measures of these factors in regression models; logistic regression is well suited to complex relationships between variables (see the sketch after this list).
2. Stratifying the data on known demographic variables such as age and gender, creating subgroups within which the confounder is held constant.
3. Using multivariate models: statistical analyses that handle multiple variables simultaneously, providing insight into the relationships among factors.
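As a rough sketch of the first approach (regression adjustment), here is a Python example; the data, variable names, and effect sizes are fabricated for illustration, and the model uses the statsmodels formula API:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

# Fabricated data in which age confounds the exposure-outcome link:
# older people are both more likely to be "exposed" and more likely to
# develop the outcome, even though exposure itself does nothing.
age = rng.uniform(30, 80, n)
exposed = rng.binomial(1, 1 / (1 + np.exp(-(age - 55) / 10)))
outcome = rng.binomial(1, 1 / (1 + np.exp(-(age - 60) / 8)))

df = pd.DataFrame({"exposed": exposed, "outcome": outcome, "age": age})

# Crude model: exposure looks associated with the outcome...
crude = smf.logit("outcome ~ exposed", data=df).fit(disp=0)

# ...but adding the confounder to the model shrinks the estimate.
adjusted = smf.logit("outcome ~ exposed + age", data=df).fit(disp=0)

print("crude coefficient:   ", round(crude.params["exposed"], 3))
print("adjusted coefficient:", round(adjusted.params["exposed"], 3))
```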

> “You can’t always eliminate every confounding factor because you don’t know what you’re not looking for.”

However, many studies fail to adequately address confounding bias. In fact, nearly 30% of observational studies neglect to mention it entirely.

Dr. Attia elaborates on this issue:

> “In 2013, authors examined the NEJM, Lancet, JAMA, and the Annals of Internal Medicine, searching for observational studies published in 2010. They found that in 56% of the 298 studies from these prestigious journals, the authors recommended a medical practice based on their findings. Yet only 14% of these studies suggested that a randomized trial was necessary to validate their recommendations.”

Consider this: only 14% of these papers advised subsequent confirmation through randomized trials, which are arguably the most reliable method for validating results.

Within observational epidemiology, two major threats are: (a) Illusory superiority, or (b) The illusion of control.

In both scenarios, researchers overestimate the quality of their own research and underestimate the limitations of their methods.

Importantly, we should remember that we are often more incorrect than correct in science.

> “An association alone can almost never necessarily be proven right and it’s nearly as difficult to prove an association wrong [...] The vast majority of associations, even those deemed ‘significant,’ are not causal.”

Part V — Power and Significance

It is crucial to recognize that approximately 96% of biomedical literature reports “statistically significant results.” If everything is significant, then nothing is.

Results typically stem from experiments that formally test specific hypotheses. The default assumption is that no relationship exists between the variables under study, and the experiment aims to refute this.

This is referred to as the null hypothesis. Researchers strive to reject the null hypothesis to a degree that signifies statistical relevance.

This is where the p-value (probability value) comes into play. It quantifies the strength of the evidence against the null hypothesis: the probability of observing results at least as extreme as those seen, assuming the null hypothesis is true.

In simpler terms:

> “[W]hat is the probability that the observed effect (i.e., the difference between groups) is not simply due to chance?”

A low p-value (e.g., p < 0.05; for instance, p = 0.001) suggests that the results are less likely to be due to random chance, thereby providing less support for the null hypothesis.

Informally, a low p-value makes it more plausible that a real correlation exists.

A high p-value (e.g., p > 0.05; for example, p = 0.12) suggests that the results are more likely attributable to chance, thus offering greater support for the null hypothesis.

Informally, a high p-value makes it less plausible that a real correlation exists.
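To build intuition, here is a small Python sketch using scipy; the group means, spread, and sample sizes are made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two hypothetical groups: control vs. treatment, with a modest true difference.
control = rng.normal(loc=100.0, scale=15.0, size=50)
treatment = rng.normal(loc=108.0, scale=15.0, size=50)

# Null hypothesis: the two group means are equal.
t_stat, p_value = stats.ttest_ind(treatment, control)

print(f"p = {p_value:.4f}")
# A small p (e.g., < 0.05) says data this extreme would be unlikely if the
# null hypothesis were true; it is NOT the probability that the null
# hypothesis itself is true.
```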

Significance Level

The significance level (also known as alpha, α) is typically set at 5%, or p < 0.05. If the p-value falls below 0.05, the results are deemed statistically significant; otherwise, we fail to reject the null hypothesis.

For instance, if the observed p-value is 0.01, or 1%, this would be considered statistically significant.

Confidence Intervals

Simply put, the confidence level behind a confidence interval (CI) is calculated as 1 - alpha (α). Given a significance level of 0.05 (5%), that is 1 - 0.05 = 0.95 (95%).

Dr. Attia provides the following example: A study investigating the link between a daily intake of 100g of red meat and colorectal cancer yields a result of “1.17, 95% CI (1.05–1.31),” indicating statistical significance.

How do we determine this? The CI (95%) does not include the threshold of 1.00 (100%); it falls within the range of 1.05–1.31. Additionally, a positive association is suggested since the results exceed one.

The value of 1.17 indicates a 1.17-fold risk for someone consuming 100g of red meat daily, implying a 17% increased risk (1.17 - 1 = 0.17, or 17%) of developing colorectal cancer.

Conversely, if the observed relative risk were less than 1.00, it would imply a negative association; for example, a result of “0.89, 95% CI (0.77–1.07)” would suggest an 11% decrease in risk (1 - 0.89 = 0.11, or 11%).

However, this result is not statistically significant, as the CI value crosses 1.00 (100%); it lies within the range of 0.77–1.07, with 0.77 on the decreasing side of 1.00, and 1.07 on the increasing side.
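This reading can be mechanized. Below is a small helper function, hypothetical and written for this summary rather than taken from the series, that flags whether a relative-risk CI crosses 1.00:

```python
def interpret_rr(rr: float, ci_low: float, ci_high: float) -> str:
    """Interpret a relative risk and its 95% confidence interval."""
    # A CI that contains 1.00 means the data are compatible with "no effect".
    significant = not (ci_low <= 1.0 <= ci_high)
    direction = "increase" if rr > 1.0 else "decrease"
    pct = abs(rr - 1.0) * 100          # e.g., 1.17 -> 17% increase
    status = "statistically significant" if significant else "not statistically significant"
    return f"{pct:.0f}% {direction} in risk ({status})"

print(interpret_rr(1.17, 1.05, 1.31))  # 17% increase in risk (statistically significant)
print(interpret_rr(0.89, 0.77, 1.07))  # 11% decrease in risk (not statistically significant)
```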

Statistical Significance Versus Practical Significance

“Significant” implies meaning, but a significant statistical result does not always equate to practical relevance.

This distinction is often misunderstood, even among those reporting findings. Statistical significance does not guarantee practical significance, nor the reverse.

Focusing on studies that utilize the Bonferroni correction can be beneficial. Under this adjustment, the usual 0.05 significance threshold is divided by the number of hypotheses tested.

This makes the per-test threshold stricter. The reason: when many tests are run on noisy data at the 0.05 level, at least one of them is likely to come up statistically significant by chance alone.
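A concrete sketch, assuming a hypothetical study that runs ten tests:

```python
alpha = 0.05
num_tests = 10

# Bonferroni: each individual test must clear a stricter threshold so the
# family-wise error rate stays at roughly the original 0.05.
per_test_alpha = alpha / num_tests
print(per_test_alpha)  # 0.005

# Why it matters: with 10 independent tests each run at alpha = 0.05, the
# chance of at least one false positive is about 40%.
p_any_false_positive = 1 - (1 - alpha) ** num_tests
print(round(p_any_false_positive, 2))  # 0.4
```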

However, consider a study that lacks enough subjects to detect a “statistical” effect, yet shows a considerable difference between groups, i.e., a large effect size (a measure of the strength of the relationship between two variables in a population). Such a study may fail to reach statistical significance while still being practically relevant.

Even if results are deemed statistically significant, it’s crucial to assess the magnitude (effect size), the sample size, and the predefined significance level (usually set at 0.05, or 5%).

For instance, if the sample size is excessively large (with thousands or millions of subjects), it’s possible to achieve statistical significance without practical relevance; i.e., significant, but does it really matter?
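A quick simulation makes this vivid; the sketch below pairs an intentionally trivial (made-up) effect with a very large sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# A trivially small true difference (0.5 units on a scale with SD 15)...
group_a = rng.normal(100.0, 15.0, size=200_000)
group_b = rng.normal(100.5, 15.0, size=200_000)

# ...is still detected as "significant" because n is enormous.
t_stat, p_value = stats.ttest_ind(group_b, group_a)
print(f"p = {p_value:.2e}")                                   # tiny p-value
print(f"difference = {group_b.mean() - group_a.mean():.2f}")  # ~0.5: negligible in practice
```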

When a study is reported as “statistically significant,” consider:

1. What is the effect size (the magnitude of the difference)?
2. What was the sample size (n)?

Conversely, if a study is reported as “not statistically significant,” ask:

1. What effect size was the study designed to detect?
2. Is there a chance that the effect was present but simply below the detection threshold?

Statistical Power

Statistical power refers to the likelihood that the study will successfully identify a true positive (an actual effect).

Power analysis can be conducted before the study to choose appropriate parameters, using the relation Power = 1 - β, where β (beta) is the probability of making a false-negative (type II) error.

Statistical power is influenced by:

1. The probability of a false positive (type I error), alpha (α)
2. Sample size (n)
3. Effect size (the magnitude of the difference between groups)
4. The probability of a false negative (type II error), beta (β)

The predetermined significance level is alpha (α), while beta (β) is the probability of a type II error.

The ultimate aim of power analysis is to balance the incidence of type I and type II errors. To minimize the risk of type II errors, studies are structured to achieve an 80 percent probability of detecting genuine effects. An adequately powered study typically has a probability of 0.8, implying that in an ideal scenario, there is only a 1-in-5 chance of a false negative result.

However, sample size (n) can significantly impact the power of a study. Small effect differences in larger studies can yield statistically significant results, while smaller samples require greater effect differences to achieve statistical significance.
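To see this trade-off numerically, here is a sketch using statsmodels' TTestIndPower; the effect sizes are Cohen's d values chosen purely for illustration:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Per-group sample size needed for 80% power at alpha = 0.05,
# for a large (0.8), medium (0.5), and small (0.2) effect size.
for d in (0.8, 0.5, 0.2):
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"effect size {d}: ~{n:.0f} subjects per group")

# Small effects demand far larger samples to reach the same power.
```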


Ultimately, a study may be statistically significant, possess substantial data, and have high confidence in its findings, yet still be practically irrelevant, weak, or uncertain. Conversely, statistically weak studies may reveal powerful and meaningful insights regarding their subjects.

Thank you for reading! If you found this article helpful, please consider showing your support with a clap and a follow!

For more content like this, check out my detailed review of Tim Ferriss’ “Geek to Freak” experiment.
