
author here, let me know if you have any questions!


The first few graphs use the interactive [Stack Overflow Trends](http://insights.stackoverflow.com/trends) tool, which is all public data. This is useful since it lets readers modify the graphs (for example, to add a few more tags to the comparison).

The later graphs use data that's already not public (what tags users visit together, and what countries tags are visited from), so there's no reason not to use visits instead.

The graphs for visits over time do look very similar (in general, question traffic by tag roughly matches questions asked, but as a slightly lagging indicator).


OP author here: the rivalries there are directional, "if you liked X, you probably disliked Y", but not vice versa. (One thing I find interesting is that most rivalries really are directional: you see git users disliking svn, but not svn users disliking git).

Perhaps there's a way I could make that clearer in the post!


> OP author here: the rivalries there are directional, "if you liked X, you probably disliked Y", but not vice versa. (One thing I find interesting is that most rivalries really are directional: you see git users disliking svn, but not svn users disliking git).

Awesome, thank you! I agree that the direction of the rivalry is an interesting part of it.

I just noticed that "backend : frontend" and "frontend : backend" both appear on the chart, which implicitly answered my question.

Perhaps another sentence or two that highlights this would help readers understand the directional nature of the chart. For example: These rivalries are directional, which demonstrates how rivalries are often asymmetrical. For example, as seen in the results chart, those who like backend have more distaste for frontend than vice-versa.


The ":" implies bidirectionality (or at least, non-directionality); you could clarify it in the graph by replacing ":" with "dislikes" or "->".


As a coder, : is pretty directional in my head.

A greater than sign would be explanatory tho.


Yeah. I first observed this and was confused because the phi coefficient should be symmetric; I understood what's going on only after re-reading the description (again).
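To make the symmetry point concrete, here is a minimal sketch of the phi coefficient for two binary columns. The data is made up for illustration and is not from the post; the column names (`likes_x`, `dislikes_y`, and so on) are assumptions about how a directional pair might be encoded, not the author's actual method.

```python
import math

def phi_coefficient(x, y):
    """Phi coefficient for two equal-length binary (0/1) sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical per-user indicators (1 = yes, 0 = no), invented for this sketch.
likes_x    = [1, 1, 1, 0, 0, 1]
dislikes_y = [1, 0, 1, 0, 0, 0]
likes_y    = [0, 0, 0, 1, 1, 0]
dislikes_x = [0, 0, 0, 1, 0, 1]

# Swapping the arguments never changes phi: the statistic itself is symmetric.
forward = phi_coefficient(likes_x, dislikes_y)
swapped = phi_coefficient(dislikes_y, likes_x)

# The "reverse rivalry" is a different pair of columns, so it can differ.
reverse = phi_coefficient(likes_y, dislikes_x)
```

Under this reading, "X : Y" and "Y : X" are not the same two columns with the arguments swapped. They are different pairs of columns, which is why the two values can differ even though phi is symmetric.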


OP author here; in another blog post I explored the growth in questions about some cross-platform solutions. Ionic (now Ionic2) has been growing quickly, and React Native is among the fastest growing tags on the site. So you may get your wish!

https://stackoverflow.blog/2017/05/16/exploring-state-mobile...


That was an early hypothesis I considered, but it doesn't fit the data. That would show up as a larger tabs/spaces salary gap for big companies, when in fact the gap is similar across all company sizes:

https://twitter.com/drob/status/875493967865008129

Perhaps it could play a role as a more indirect effect: people leaving Google/Microsoft/Apple/etc, starting new companies or joining senior positions at others, and spreading this particular practice.


Only informative comment thread I've seen so far. I was wondering about this.


This is not the reason for the correlation. The p-value of the tabs/spaces connection is about 10^-13 (one in 10 trillion), and it can be found separately within each (large) country.

This does not mean the effect isn't confounded with some other factor, but it does mean it's not a multiple hypothesis testing issue.

You can explore the code and regression yourself! Take a look: https://github.com/dgrtwo/tabs-spaces-post


I've got a followup coming about what words lead to upvotes, and rust features quite prominently there!


The latter is indeed what that graph is showing: what percentage of each language's users get stuck in Vim (or more precisely what % of their Vim visits are to that question). It's not being confused with the most used programming languages.


Ah, excellent!


You mean Naive Bayes? Because it can't account for interactions between the effects of multiple words.
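A toy sketch of that limitation, assuming two hypothetical binary word features whose effect on the label is a pure interaction (XOR). The data and function names are invented for illustration. Each word on its own says nothing about the label, so a from-scratch Naive Bayes ends up at 50/50 for every input.

```python
from collections import Counter, defaultdict

def train(rows):
    """rows: list of ((word_a_present, word_b_present), label)."""
    class_counts = Counter(label for _, label in rows)
    feat_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for feats, label in rows:
        for i, value in enumerate(feats):
            feat_counts[(i, label)][value] += 1
    return class_counts, feat_counts

def posterior(model, feats):
    """P(label | feats) under the Naive Bayes independence assumption."""
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior
        for i, value in enumerate(feats):
            # Per-feature likelihood, multiplied in as if features were independent.
            score *= feat_counts[(i, label)][value] / count
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Made-up data: the label is 1 only when exactly one of the two words appears.
xor_rows = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
model = train(xor_rows)
predictions = {feats: posterior(model, feats) for feats, _ in xor_rows}
```

Every entry in `predictions` comes out as an even split between the two labels, because the model only ever sees each word's marginal effect.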


Just add some "magic", e.g. per response analysis and inter-response rule-based system.


If you keep adding "magic" and doing careful research on what magic works and what doesn't, you end up roughly with the modern field of machine learning.

Random forests are a method that's often effective in taking into account many interactions among high dimensional data.


Expert Systems "magic" predates neural networks by decades, being predictable and giving results that can be validated (unlike most ML models).


One of the devs here.

1. That's the way we were thinking about it :)

2. Oh, excellent! We hadn't found that or we'd have used it, and we'll start working with it.

3. Tomorrow I'm going to blog about how we approached the machine learning. Short version: we manually came up with regular expressions to classify a training set based on titles. The idea is that when we experimented with manual annotations on titles, the vast majority of the time we were looking for only a few key words. There's no question that this adds biases and will not be entirely accurate, but manual inspection convinced us it was a good enough approach for our hackathon, and most of the articles we identified with the resulting algorithm would not have been found by the title regex alone.

You can see the table of regular expressions [here](https://github.com/dodger487/analyze_hn/blob/master/topics.c...) and a bunch of (pretty unstructured) analysis code [here](https://github.com/dodger487/analyze_hn/blob/master/hn-analy...).
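For readers who want a feel for the title-regex labeling step, here is a rough sketch. The topics and patterns below are hypothetical and invented for illustration; the real table is in the linked repository and is not reproduced here.

```python
import re

# Hypothetical topic patterns (assumptions for this sketch, not the project's table).
TOPIC_PATTERNS = {
    "python": re.compile(r"\bpython\b", re.IGNORECASE),
    "security": re.compile(r"\b(security|vulnerabilit(y|ies)|exploit)\b", re.IGNORECASE),
}

def label_title(title):
    """Return the sorted list of topics whose pattern matches the title."""
    return sorted(
        topic for topic, pattern in TOPIC_PATTERNS.items() if pattern.search(title)
    )

# Made-up example titles; each becomes a (title, labels) training row.
titles = [
    "Python 3.6 released",
    "A new exploit in a Python library",
    "The Awful Reign of the Red Delicious",
]
training_set = [(title, label_title(title)) for title in titles]
```

Rows labeled this way could then feed a classifier trained on the article text, which is how a model might find articles that the title patterns alone would miss.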


This is awesome! Congrats.

https://github.com/HackerNews/API

The firebase API is excellent. I have been using that to keep http://searchhn.com up to date in real time.

Also, BigQuery is updated every day with all comments and posts. https://bigquery.cloud.google.com/dataset/bigquery-public-da...

This is what I started with to update the Searchera (https://searchera.io) index, which powers Searchhn.


Oh, that was silly of us not to use BigQuery! I was just able to use that to download a full million stories (though we still would have had the rate-limiting step of downloading the articles).

During a hackathon it can be hard to tell when to keep searching for an easy solution like that, as opposed to going with something slow you know will work; sometimes it turns out to be a dead end.

Thanks for the recommendations!


I've now blogged in more detail about building Tagger News; check it out here! https://news.ycombinator.com/item?id=14343854


Hey mate, you should follow this guide step by step when you deploy a django app: https://docs.djangoproject.com/en/1.11/howto/deployment/chec...

BTW, congrats on the project, well done!


The Awful Reign of the Red Delicious (2014) (theatlantic.com) is tagged 'Microsoft' 'Apple'

Might wanna tweak that...

