Statistical Thinking

This blog is devoted to statistical thinking and its impact on science and everyday life. Emphasis is given to maximizing the use of information, avoiding statistical pitfalls, describing problems caused by the frequentist approach to statistical inference, describing advantages of Bayesian and likelihood methods, and discussing intended and unintended differences between statistics and data science. I'll also cover regression modeling strategies, clinical trials, and drug evaluation.

Monday, January 16, 2017

Ideas for Future Articles

Suggestions for future articles are welcomed as comments to this entry. Some topics I intend to write about are listed below.

The litany of problems with p-values - catalog of all the problems I can think of
Matching vs. covariate adjustment (see below from Arne Warnke)
Statistical strategy for propensity score modeling and usage
Analysis of change: why so many things go wrong
What exactly is a type I error and should we care? (analogy: worrying about the chance of a false positive diagnostic test vs. computing current probability of disease given whatever the test result was). Alternate title: Why Clinicians' Misunderstanding of Probabilities Makes Them Like Backwards Probabilities Such As Sensitivity, Specificity, and Type I Error.
Forward vs. backwards probabilities and why forward probabilities serve as their own error probabilities (we have been fed backwards probabilities such as p-values, sensitivity, and specificity for so long it's hard to look forward)
What is the full meaning of a posterior probability?
Posterior probabilities can be computed as often as desired
Statistical critiques of published articles in the biomedical literature
New dynamic graphics capabilities using R plotly in the R Hmisc package: Showing more by initially showing less
Moving from pdf to html for statistical reporting
Is machine learning statistics or computer science?
Sample size calculation: Is it voodoo?
Difference between Bayesian modeling and frequentist inference
Proper accuracy scoring rules and why improper scores such as proportion "classified" "correctly" give misleading results.

A few weeks ago we had a small discussion at CrossValidated about the pros and cons of matching.

http://stats.stackexchange.com/questions/248676/analysis-strategy-for-rare-outcome-with-matching

I am sorry that I did not had enough time to elaborate further on the support of matching procedures (in my field researchers do not focus much on a bias-variance tradeoff but they prioritize on minimizing biases. For that reason, they like matching procedures).

Now, I have seen that you started a blog recently (congratulations!). I would like to encourage to take up the topic of matching because it is probably interesting for many applied researchers.
I think in your ‘philosophy’, this would belong to the point “Preserve all the information in the data”.

Here, perhaps some input for a blog post. Back then, you wrote:

Matching on continuous variables results in an incomplete adjustment because the variables have to be binned.

What about propensity score matching?

Matching throws away good data from observations that would be good matches.

I agree

Extrapolation bias is only a significant problem if there is a covariate by group interaction, and users of matching methods ignore interactions anyway.

Here, you go too far (in my view). You can add interactions, again for example with propensity score matching. Imbens and Rubin (2015) suggest a procedure using quadratic and interaction terms of the covariates.

Comment: Nice to know this exists but I've never seen a paper that used matching attempt to explore interactions.

If you don't want to make regression assumptions that are unverifiable, remove observations outside the overlap region just as with matching.

Which assumptions do you refer to? I think that treating everyone the same (statistically) is also an unverifiable assumption (do you disagree?). What is your opinion about weighted least squares?

Comment: This is the no-interaction assumption. If you assume additivity then it's more OK to have a no-overlap region, otherwise throw-away non-overlap regions and do a conditional analysis. Not clear on the need for weighting here. In general I like conditioning over weighting.

Arne Jonas Warnke

Labour Markets, Human Resources and Social Policy
Internet: www.zew.de www.zew.eu

37 comments:

KevinJanuary 16, 2017 at 7:52 PM
I have read enough to know the pitfalls of using null hypoth testing. But as a teacher of stats in HS, the texts are focused on this process for inference. So is the AP exam the students take.

My question is....what would you do as a teacher in my position?

Thanks.
ReplyDelete
Replies
AnonymousJanuary 16, 2017 at 7:59 PM
In your initial post, you identified a major problem in our scientific culture

"Statistics has been and continues to be taught in a traditional way, leading to statisticians believing that our historical approach to estimation, prediction, and inference was good enough."

It's worse than that. Traditional statistics is what we're teaching our undergraduates in business and the sciences, so we're perpetuating ideas that were already threadbare decades ago (NOW we're worried about p-values? After a hundred years?)

Much of the appeal of the Fisher frequentist methods is that they can be applied by anyone competent in basic algebra, and can be taught as rote formulas, requiring only a simple calculator and a few tables of critical values. And colleges have seized upon this as a way to promote "quantitative literacy," feeding cargo-cult statistics to the math-averse undergraduate masses.

Teachers of statistics need ways to starting changing the direction of statistics education towards modern techniques. But it can't start with requiring everyone to complete a year of calculus first (even though I'd make it a requirement to graduate, or even to vote).

Mike Anderson
University of Texas at San Antonio
ReplyDelete
Replies
UnknownJanuary 16, 2017 at 8:44 PM
Frank:

I am glad to read this new blog. Concerning future topics: I would be interested in your views on clinical trial design. Perhaps not so much on large Phase 3 type trials but more on the role of statistics in smaller, more exploratory Phase 2 trials. How can we get away from hypothesis testing of the null in these situations? How should we design and analyze trials for rare diseases?

Roy Tamura
University of South Florida
ReplyDelete
Replies
Daniel LakensJanuary 17, 2017 at 12:31 AM
I've long wondered why many statisticians embrace Bayesian statistics (and many have done so for decades), but the FDA does not (yet, as I understand it) fully Bayesian inferences. The FDA seems to value error control. Maybe this is related to point 5? Showing concrete examples of how Bayesian statistics would improve cumulative science would also be a strong argument towards adopting those practices (preferably illustrated with real lines of research). I think most of your readers are not novices at statistics - criticism on NHST is available in dozens of articles. I would personally feel that writing about NHST is a bit of a waste of time (I'm sure you can point out the most important issues as sidenotes in other blogs). Best, Daniel
ReplyDelete
Replies
Frank HarrellJanuary 17, 2017 at 6:50 AM
I can't disagree with any of that. I'm spending a lot of time discussing p-values and NHST in an attempt to show the emperor has no cloths and we need to change not just for the sake of change but to have better solutions with clearer interpretations. You're right about the perception of 'error control' driving many choices of statistical approaches, in industry, academia (e.g., NIH-funded research), and regulatory. Few people think about whether the false positive referred to in type I error is really an error. I'll be writing more about advantages of direct forward probabilities because in my opinion what really needs to be known is the probability that the conclusion you are about to make is true, not the probability of getting data more extreme than the current data if the effect happens to be exactly zero. The probability of being wrong about efficacy is quite simply one minus the posterior probability of efficacy given the data. In the future I'd like to expand on what type I error means.
ReplyDelete
Replies
ramirobJanuary 17, 2017 at 2:39 PM
Hello, I would be interested in posts about how to "do things right". For example, an analysis of a dataset done properly, with some of the subtleties and nuances explained. In addition, I haven't heard much about design of experiments geared towards a bayesian approach.
ReplyDelete
Replies
Bill RJanuary 18, 2017 at 10:34 AM
Forward and Backward probabilities, I like that. My personal view is similar: "Probability is the future tense of a proportion." P-values are proportions and calling them probabilities just confuses everything.
ReplyDelete
Replies
UnknownJanuary 18, 2017 at 2:34 PM
Dr. Harrell, I'm delighted that you've started this blog. I would be very interested in your view of David Glass's critique of the hypothesis as a framework for experimentation (as opposed to "the question"). Dr. Glass teaches experimental design at Harvard & his views surprised me & got me thinking. I've included two sources (a Cell paper & a Clinical Chemistry paper) in case you haven't run across his perspective. Thanks in advance for considering this topic for your blog. Go 'Dores. Ihahttp://www.sciencedirect.com/science/article/pii/S0092867408009537 http://clinchem.aaccjnls.org/content/clinchem/56/7/1080.full.pdf
ReplyDelete
Replies
Frank HarrellJanuary 18, 2017 at 7:01 PM
WOW I've been looking for just this type of paper for years. From the title I think I'm going to really like it. I really don't like straw man hypotheses and frequently tell investigators to state a question or sometimes better, state the quantity you want to estimate (often an effect such as a treatment difference). I'll read that paper as soon as I can. Thanks!
ReplyDelete
Replies
UnknownJanuary 18, 2017 at 9:03 PM
Terrific! If you happen to find those papers interesting, I can highly recommend Dr. Glass's book, Experimental Design for Biologists, Cold Spring Harbor Press. It's expansive in scope, spanning philosophy of science to how a Western blot should be designed to considerations relevant to clinical trials. It doesn't dwell on any one topic terribly long, but it's packed with interesting ideas, as you can see from the Table of Contents: http://www.cshlpress.org/default.tpl?cart=148479481242280267&fromlink=T&linkaction=full&linksortby=oop_title&--eqSKUdatarq=1020 One other point I would make: his checklist of experimental design is extremely good & is worth the price of the book, even if it's only a page or so.
ReplyDelete
Replies
Moritz KörberJanuary 19, 2017 at 9:57 AM
1. Error control in Bayesian statistics

2. Robustness of Bayesian statistics: What to do if the assumption of the models are not met?

3. non-central distributions
ReplyDelete
Replies
AnonymousJanuary 19, 2017 at 1:08 PM
1. Is machine learning statistics or computer science?
2. Methods for sample size calculation (art or science)?
3. Statistical Inference
4. Is logical statistical reasoning reductio ad aburdum? Does it thus always and unavoidably have an agenda?
5. The difference between Bayesian and Frequentist statistics (to be honest, from what I understand Frquentist seems plausible while Bayesian is just vodoo ...)
6. Analysis of contigency tables
7. The p value (and its criticisms)
8. The NHST framework
9. Scientific and statistical misconduct and fraud of the pharmaceutical/nutritional/fitness industry
ReplyDelete
Replies
Frank HarrellJanuary 19, 2017 at 4:32 PM
I'm interested in 1, 2, 5, 7, 8. I'm almost finished covering what I wanted to cover for 7 and 8 after two upcoming posts.
ReplyDelete
Replies
LeonJanuary 22, 2017 at 1:09 AM
This comment has been removed by the author.
ReplyDelete
Replies
LeonJanuary 22, 2017 at 1:09 AM
I'd be interested in how you think about the bootstrap, given your distaste for "backwards" probabilities. The bootstrap seems to me entirely backwards, since it's all about variation in data-space rather than unknown-space.
ReplyDelete
Replies
tgsJanuary 23, 2017 at 4:25 PM
Suggested topic: If I give up p-values, do I also have to give up confidence intervals? Is an "estimation and uncertainty" approach also inherently flawed, even if there are no p-values?

(We've had a brief discussion about this before, but I'm still fuzzy on the issue.)
ReplyDelete
Replies
Frank HarrellJanuary 23, 2017 at 4:48 PM
I'm of two minds about this. (1) CIs are almost impossible for non-statisticians to perfectly understand because of their indirectness. They a lot of the same problems as p-values. (2) they are better than p-values and can be interpreted not matter how large the p-value; so they are a bridge to better methods with likelihood and Bayes.
ReplyDelete
Replies
tibuJanuary 28, 2017 at 11:40 PM
can you explain the concept of "degree of freedom"?
ReplyDelete
Replies