
>> Some of the problems don’t matter as much if your goal for the model is just prediction, not interpretation of the model and its coefficients. But most of the time that I see the method used (including recent examples being distributed by so-called experts as part of their online teaching), the end model is indeed used for interpretation, and I have no doubt this is also the case with much published science. Further, even when the goal is only prediction, there are better methods, like the lasso, for dealing with a high number of variables.

I use this method often for prediction applications. First, it’s a sort of hyperparameter selection, so you should obviously use a holdout and test set to help you make a good choice.

Second, I often see the method dogmatically shut down like this, in favor of the lasso. Yet every time I have compared the two they give similar selections — so how can one be “evil” and the other so glorified? I prefer the stepwise method, though, as you can visualize the benefit of adding each additional feature. That can help guide further feature development — a point that I’ve seen significantly lift the bottom line of enterprise-scale companies.
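The kind of forward stepwise loop being described can be sketched in a few lines of numpy. Everything below is a toy illustration on simulated data (the data, split, and helper names are all made up here, not the commenter's actual setup): greedily add the feature that most improves holdout R², and print the gain at each step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))
# Toy data: only the first two features actually matter.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

# Split into train and holdout rows, as the comment suggests.
X_tr, X_ho, y_tr, y_ho = X[:150], X[150:], y[:150], y[150:]

def holdout_r2(cols):
    """Fit OLS on the training rows, score R^2 on the holdout rows."""
    A = np.column_stack([np.ones(len(X_tr))] + [X_tr[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    A_ho = np.column_stack([np.ones(len(X_ho))] + [X_ho[:, c] for c in cols])
    resid = y_ho - A_ho @ beta
    tot = (y_ho - y_ho.mean()) @ (y_ho - y_ho.mean())
    return 1.0 - (resid @ resid) / tot

# Forward stepwise: at each step, add the feature with the best holdout R^2.
selected, scores = [], []
for _ in range(p):
    best = max((c for c in range(p) if c not in selected),
               key=lambda c: holdout_r2(selected + [c]))
    selected.append(best)
    scores.append(holdout_r2(selected))

for k, r2 in enumerate(scores, start=1):
    print(f"{k} features: holdout R^2 = {r2:.3f}")
```

Plotting `scores` against the step number gives the "benefit of each additional feature" curve the comment refers to; the curve typically flattens (or dips) once the informative features are in.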



> Yet every time I have compared the two they give similar selections — so how can one be “evil” and the other so glorified?

Frequentist and Bayesian approaches often yield similar results but are philosophically different. In general I favor and recommend the lasso because I see it perform as well as or better than stepwise at variable selection, without all the baggage.

The lasso avoids the multiple-comparison problem by applying a regularization penalty instead of sequentially fitting multiple models and performing hypothesis tests. This also helps prevent overfitting. If you want to see which variables would be included or excluded, you can turn the regularization up or down (it is pretty easy to spit out an automated report).
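"Turning the regularization up or down" to watch variables enter and leave can be sketched with a plain coordinate-descent lasso in numpy (hand-rolled here purely to stay self-contained; in practice you would use a packaged solver, and the data below is simulated for illustration):

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Assumes centered y and standardized columns of X."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: leave feature j out of the current fit.
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_ss[j]
    return beta

rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
# Toy data: feature 0 has a strong effect, feature 1 a weak one.
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)
y = y - y.mean()

# Sweep the penalty from strong to weak and report which features survive.
for lam in [2.0, 0.5, 0.1, 0.01]:
    beta = lasso_cd(X, y, lam)
    kept = [j for j in range(p) if abs(beta[j]) > 1e-8]
    print(f"lambda={lam}: kept features {kept}")
```

At a strong penalty only the dominant feature survives; relaxing the penalty brings in the weaker true feature, and eventually noise features — which is exactly the kind of automated inclusion/exclusion report described above.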

Stepwise selection comes in different flavors: forward, backward, or bidirectional; R-squared, adjusted R-squared, AIC, BIC, etc. These often lead to different models, so the choices must be justified, and I rarely see any defense of them.

Of course, if the point is prediction over coefficient estimation and interpretability then neither of these are great choices.


> I use this method often for prediction applications. First, it’s a sort of hyper parameter selection, so you should obviously use a holdout and test set to help you make a good choice.

What the article is talking about is inference, not prediction. That's a different problem domain: it's not about telling a company whether design A or B leads to more engagement, it's about finding the (true!) causal drivers of that difference. The distinction may seem subtle, but it's important. The key problems outlined all concern common (frequentist) statistical tests and how they get messed up by variable selection. Holdout sets don't address this, because if the holdout set comes from the same distribution as the training set (as it should), the biases would be the same there. Bayesian inference isn't a panacea either; the core problem is structuring the model based on the data and then drawing conclusions about the relationships in that same data. (Bayesian analysis gives you tools to help avoid this, but it comes with its own traps, such as the difficulty of finding truly non-informative priors.)


Yeah, the title is a bit hyperbolic. I have not used selection methods that much, but it is not too surprising that they would give results similar to the lasso as a selection or predictive method for people who think of it in terms of "feature development".

The distaste for stepwise selection comes from its typical use. If one reads Harrell's complaints quoted in the blog post carefully, many of them are less about the selection method than about what the analyst does with it, namely interpretation of inferential statistics. When you see stepwise in the wild, the practitioner has often run stepwise or another selection method and then reported the usual test statistics and p-values for the final fitted model ... which are derived under assumptions that do not account for the selection steps. That is quite unfortunate in fields where people put a lot of faith in coefficient estimates, p-values, and Wald confidence intervals when writing the conclusions of their papers.
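The "p-values don't account for the selection steps" problem is easy to see in a simulation (a toy sketch, not from the post: the sample sizes and cutoff are arbitrary choices here). Generate pure noise, "select" the strongest of 20 candidate predictors, and test it naively at the usual 5% level:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sims = 100, 20, 2000
hits = 0
for _ in range(sims):
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)  # pure noise: no predictor is truly related to y
    # z-statistic for each univariate slope (approx. N(0,1) under the null):
    # sqrt(n) times the sample correlation of each column with y.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    z = np.sqrt(n) * (Xc.T @ yc) / (n * X.std(axis=0) * y.std())
    # "Select" the strongest predictor, then test it at the nominal 5% level.
    if np.abs(z).max() > 1.96:
        hits += 1
print(f"naive false-positive rate after selection: {hits / sims:.2f}")
# With 20 candidate noise predictors, this lands near 1 - 0.95**20 ≈ 0.64,
# far above the nominal 5% — the selection step invalidates the naive test.
```

This is only one selection step; a full stepwise procedure compounds the effect across every add/drop decision, which is exactly why the reported test statistics for the final model are anti-conservative.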

With LASSO and its cousins, the standard packages and literature strongly encourage the user to focus on predictions and run cross-validation right from the beginning.



