
>> Some of the problems don’t matter as much if your goal for the model is just prediction, not interpretation of the model and its coefficients. But most of the time that I see the method used (including recent examples being distributed by so-called experts as part of their online teaching), the end model is indeed used for interpretation, and I have no doubt this is also the case with much published science. Further, even when the goal is only prediction, there are better methods, like the lasso, for dealing with a high number of variables.

I use this method often for prediction applications. First, it’s a sort of hyperparameter selection, so you should obviously use a holdout and test set to help you make a good choice.

Second, I often see the method dogmatically shut down like this, in favor of the lasso. Yet every time I have compared the two they give similar selections — so how can one be “evil” and the other so glorified? I prefer the stepwise method, though, as you can visualize the benefit of adding each additional feature. That can help guide further feature development — a point that I’ve seen significantly lift the bottom line of enterprise-scale companies.
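The kind of forward stepwise loop being described can be sketched in a few lines of numpy. Everything below is a toy illustration on simulated data (the data, split, and helper names are all made up here, not the commenter's actual setup): greedily add the feature that most improves holdout R², and print the gain at each step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))
# Toy data: only the first two features actually matter.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

# Split into train and holdout rows, as the comment suggests.
X_tr, X_ho, y_tr, y_ho = X[:150], X[150:], y[:150], y[150:]

def holdout_r2(cols):
    """Fit OLS on the training rows, score R^2 on the holdout rows."""
    A = np.column_stack([np.ones(len(X_tr))] + [X_tr[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    A_ho = np.column_stack([np.ones(len(X_ho))] + [X_ho[:, c] for c in cols])
    resid = y_ho - A_ho @ beta
    tot = (y_ho - y_ho.mean()) @ (y_ho - y_ho.mean())
    return 1.0 - (resid @ resid) / tot

# Forward stepwise: at each step, add the feature with the best holdout R^2.
selected, scores = [], []
for _ in range(p):
    best = max((c for c in range(p) if c not in selected),
               key=lambda c: holdout_r2(selected + [c]))
    selected.append(best)
    scores.append(holdout_r2(selected))

for k, r2 in enumerate(scores, start=1):
    print(f"{k} features: holdout R^2 = {r2:.3f}")
```

Plotting `scores` against the step number gives the "benefit of each additional feature" curve the comment refers to; the curve typically flattens (or dips) once the informative features are in.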



> Yet every time I have compared the two they give similar selections — so how can one be “evil” and the other so glorified?

Frequentist and Bayesian approaches often yield similar results but are philosophically different. In general I favor and recommend the lasso because I see it perform as well as or better than stepwise at variable selection, without all the baggage.

The lasso avoids the multiple-comparison problem by applying a regularization penalty instead of sequentially fitting multiple models and performing hypothesis tests. This also helps prevent overfitting. If you want to see which variables would be included or excluded, you can turn the regularization up or down (it is pretty easy to spit out an automated report).
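"Turning the regularization up or down" to watch variables enter and leave can be sketched with a plain coordinate-descent lasso in numpy (hand-rolled here purely to stay self-contained; in practice you would use a packaged solver, and the data below is simulated for illustration):

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    Assumes centered y and standardized columns of X."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: leave feature j out of the current fit.
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_ss[j]
    return beta

rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
# Toy data: feature 0 has a strong effect, feature 1 a weak one.
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)
y = y - y.mean()

# Sweep the penalty from strong to weak and report which features survive.
for lam in [2.0, 0.5, 0.1, 0.01]:
    beta = lasso_cd(X, y, lam)
    kept = [j for j in range(p) if abs(beta[j]) > 1e-8]
    print(f"lambda={lam}: kept features {kept}")
```

At a strong penalty only the dominant feature survives; relaxing the penalty brings in the weaker true feature, and eventually noise features — which is exactly the kind of automated inclusion/exclusion report described above.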

Stepwise selection comes in different flavors: forward, backward, or bidirectional; R-squared, adjusted R-squared, AIC, BIC, etc. These often lead to different models, so the choices must be justified, and I rarely see any defense of them.

Of course, if the point is prediction over coefficient estimation and interpretability then neither of these are great choices.


> I use this method often for prediction applications. First, it’s a sort of hyper parameter selection, so you should obviously use a holdout and test set to help you make a good choice.

What the article is talking about is inference, not prediction. That's a different problem domain: it's not about telling a company whether design A or B leads to more engagement, it's about finding the (true!) causal drivers of that difference. The distinction may seem subtle, but it's important. The key problems outlined all concern common (frequentist) statistical tests and how they get messed up by variable selection. Holdout sets don't address this, because if the holdout set comes from the same distribution as the training set (as it should), the biases would be the same there. Bayesian inference isn't a panacea either; the core problem is structuring the model based on the data and then drawing conclusions about the relationships in that same data. (Bayesian analysis gives you tools to help avoid this, but it comes with its own traps, such as the difficulty of finding truly non-informative priors.)


Yeah, the title is a bit hyperbolic. I have not used selection methods that much, but it is not too surprising that they would give results similar to the lasso as a selection or predictive method for people who think of it in terms of "feature development".

The distaste for stepwise selection comes from its typical use. If one reads Harrell's complaints quoted in the blog post carefully, many of them are less about the selection method than about what the analyst does with it, namely interpretation of inferential statistics. When you see stepwise in the wild, the practitioner has often run stepwise or another selection method and then reported the usual test statistics and p-values for the final fitted model ... which are derived under assumptions that do not account for the selection steps. That is quite unfortunate in fields where people put a lot of faith in coefficient estimates, p-values, and Wald confidence intervals when writing the conclusions of their papers.
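The "p-values don't account for the selection steps" problem is easy to see in a simulation (a toy sketch, not from the post: the sample sizes and cutoff are arbitrary choices here). Generate pure noise, "select" the strongest of 20 candidate predictors, and test it naively at the usual 5% level:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sims = 100, 20, 2000
hits = 0
for _ in range(sims):
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)  # pure noise: no predictor is truly related to y
    # z-statistic for each univariate slope (approx. N(0,1) under the null):
    # sqrt(n) times the sample correlation of each column with y.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    z = np.sqrt(n) * (Xc.T @ yc) / (n * X.std(axis=0) * y.std())
    # "Select" the strongest predictor, then test it at the nominal 5% level.
    if np.abs(z).max() > 1.96:
        hits += 1
print(f"naive false-positive rate after selection: {hits / sims:.2f}")
# With 20 candidate noise predictors, this lands near 1 - 0.95**20 ≈ 0.64,
# far above the nominal 5% — the selection step invalidates the naive test.
```

This is only one selection step; a full stepwise procedure compounds the effect across every add/drop decision, which is exactly why the reported test statistics for the final model are anti-conservative.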

With LASSO and its cousins, the standard packages and literature strongly encourage the user to focus on predictions and run cross-validation right from the beginning.



