The first few graphs use the interactive [Stack Overflow Trends](http://insights.stackoverflow.com/trends) tool, which is all public data. This is useful since it lets readers modify the graphs (for example, to add a few more tags to the comparison).
The later graphs use data that's already not public (what tags users visit together, and what countries tags are visited from), so there's no reason not to use visits instead.
The graphs for visits over time do look very similar (in general, question traffic by tag roughly matches questions asked, though as a slightly lagging indicator).
OP author here: the rivalries there are directional, "if you liked X, you probably disliked Y", but not vice versa. (One thing I find interesting is that most rivalries really are directional: you see git users disliking svn, but not svn users disliking git).
Perhaps there's a way I could make that clearer in the post!
> OP author here: the rivalries there are directional, "if you liked X, you probably disliked Y", but not vice versa. (One thing I find interesting is that most rivalries really are directional: you see git users disliking svn, but not svn users disliking git).
Awesome, thank you! I agree that the direction of the rivalry is an interesting part of it.
I just noticed that "backend : frontend" and "frontend : backend" both appear on the chart, which implicitly answered my question.
Perhaps another sentence or two that highlights this would help readers understand the directional nature of the chart. For example: These rivalries are directional, which demonstrates how rivalries are often asymmetrical. For example, as seen in the results chart, those who like backend have more distaste for frontend than vice versa.
Yeah. I first observed this and was confused because the phi coefficient should be symmetric; I understood what was going on only after rereading the description (again).
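To make the symmetry point concrete, here is a minimal sketch in Python. It assumes each rivalry is a phi coefficient between "likes X" and "dislikes Y", as the thread describes; the user counts and the helper name `phi` are made up for illustration and are not from the post. Phi is symmetric in its two arguments, but "likes git vs. dislikes svn" and "likes svn vs. dislikes git" are different pairs of variables, so the two directions can come out differently.

```python
import math

def phi(a, b):
    """Phi coefficient between two equal-length lists of booleans."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    # Assumes no row or column of the 2x2 table is empty (denominator > 0).
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / den

# Hypothetical users: (likes_git, dislikes_git, likes_svn, dislikes_svn)
users = (
    [(True, False, False, True)] * 40     # like git, dislike svn
    + [(True, False, False, False)] * 40  # like git, no opinion on svn
    + [(False, True, True, False)] * 2    # like svn, dislike git
    + [(False, False, True, False)] * 18  # like svn, no opinion on git
    + [(False, False, False, False)] * 100
)
likes_git = [u[0] for u in users]
dislikes_git = [u[1] for u in users]
likes_svn = [u[2] for u in users]
dislikes_svn = [u[3] for u in users]

git_vs_svn = phi(likes_git, dislikes_svn)  # "liked git, disliked svn"
svn_vs_git = phi(likes_svn, dislikes_git)  # "liked svn, disliked git"
print(round(git_vs_svn, 3), round(svn_vs_git, 3))
```

Swapping the arguments of a single call, `phi(dislikes_svn, likes_git)`, gives the same value as `phi(likes_git, dislikes_svn)`; only swapping the roles of the two tags changes the result.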
OP author here; in another blog post I explored the growth in questions about some cross-platform solutions. Ionic (now Ionic2) has been growing quickly, and React Native is among the fastest growing tags on the site. So you may get your wish!
That was an early hypothesis I considered, but it doesn't fit the data. That would show up as a larger tabs/spaces salary gap for big companies, when in fact the gap is similar across all company sizes:
Perhaps it could play a role as a more indirect effect: people leaving Google/Microsoft/Apple/etc, starting new companies or joining senior positions at others, and spreading this particular practice.
This is not the reason for the correlation. The p-value of the tabs/spaces connection is about 10^-13 (one in 10 trillion), and it can be found separately within each (large) country.
This does not mean the effect isn't confounded with some other factor, but it does mean it's not a multiple hypothesis testing issue.
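As a rough illustration of why a p-value that small is not a multiple-testing artifact, the sketch below applies a Bonferroni correction. Bonferroni is chosen here only because it is simple and conservative; the comment does not say which correction, if any, was applied, and the numbers of tests are hypothetical.

```python
def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True if p_value stays below alpha after a Bonferroni correction."""
    return p_value < alpha / n_tests

p = 1e-13  # the p-value quoted in the comment above
for n_tests in (1, 1_000, 1_000_000):  # hypothetical numbers of hypotheses
    print(n_tests, bonferroni_significant(p, n_tests))
```

Even with a million hypotheses the corrected threshold is 5e-8, far above 1e-13. This says nothing about confounding, which is a separate question.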
The latter is indeed what that graph is showing: what percentage of each language's users get stuck in Vim (or more precisely what % of their Vim visits are to that question). It's not being confused with the most used programming languages.
If you keep adding "magic" and doing careful research on what magic works and what doesn't, you end up roughly with the modern field of machine learning.
Random forests are a method that's often effective in taking into account many interactions among high dimensional data.
2. Oh, excellent! We hadn't found that or we'd have used it, and we'll start working with it.
3. Tomorrow I'm going to blog about how we approached the machine learning. Short version: we manually came up with regular expressions to classify a training set based on titles. The idea is that when we experimented with manual annotations on titles, the vast majority of the time we were looking for only a few key words. There's no question that this adds biases and will not be entirely accurate, but manual inspection convinced us it was a good enough approach for our hackathon, and most of the articles we identified with the resulting algorithm would not have been found by the title regex alone.
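A minimal sketch of that kind of title-based labeling might look like the following. The key words, the pattern name `KEYWORD_PATTERN`, the function `label_title`, and the sample titles are all hypothetical; the actual regular expressions and topic are not given in this comment.

```python
import re

# Hypothetical key words; the real patterns were chosen by hand by the team.
KEYWORD_PATTERN = re.compile(
    r"\b(machine learning|deep learning|neural net(work)?s?)\b",
    re.IGNORECASE,
)

def label_title(title):
    """Label a story as on-topic if its title matches any key word."""
    return bool(KEYWORD_PATTERN.search(title))

# Made-up titles, for illustration only.
titles = [
    "Deep Learning for beginners",
    "Show HN: my static site generator",
    "Why neural networks generalize",
]
training_set = [(title, label_title(title)) for title in titles]
for title, label in training_set:
    print(label, title)
```

Labels produced this way could then be used to train a classifier on the article text, which is how such a model can find articles whose titles never match the regex.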
Oh, that was silly of us not to use BigQuery! I was just able to use that to download a full million stories (though we still would have had the rate-limiting step of downloading the articles).
During a hackathon it can be hard to tell when to keep searching for an easy solution like that, as opposed to going with something slow you know will work; sometimes the search turns out to be a dead end.