GitHub Recommended Repos Contest (contest.github.com)
71 points by pjhyett on July 29, 2009 | 20 comments


"GitHub must be allowed to use the code commercially without restriction, regardless of the license chosen."

I'm not too crazy about this. Contests can be good and bad at the same time. "Here guys, solve this very hard problem for us for free and we'll take it off your hands and we'll completely ignore your licensing wishes and use it without any restrictions whatsoever."

I envision a lot of "Well if you don't want to participate, DON'T!", but the point of the matter is, I see GitHub as taking advantage of a warm and supportive open-source community. The license of the submitted code should apply to everyone, including GitHub!


Though much of GitHub is open-sourced, the core app itself is not (yet). We just want to be sure we can use the winning entry to improve the site for the 150,000 open source projects we provide free hosting for, even if the winning code is GPL'd or something.

We are trying to encourage the development of open source solutions to this problem so everyone can benefit, though - all entries have to be open source at the end of the contest, that is a rule. We want everyone to learn and be able to use the results of this contest. I would encourage the use of more permissive licenses, but I didn't want to enforce it, thus the disclaimer exempting us. Otherwise the 100,000 users we provide free Git hosting for don't get to benefit from the best solution, either.


Free? If you win, you get a lifetime GitHub account and a tasty bottle of bourbon.

We're trading you the prizes for a commercial license.


That's what I was going to say too. If the payout isn't high enough, then don't sell :).


What country/state/province law are you using for the age cutoff for the bourbon? :-p


I'd recommend you try Rittenhouse's 23-year-old and Black Maple Hill's 21-year-old.


I'm a big Black Maple Hill fan. Will have to try Rittenhouse - thanks!


Well it's really a brilliant plan if you ask me... give the winner a bottle of nice bourbon and the concerns about licensing turn into concerns about getting laid and performing random stunts of awesomeness that result in hospital visits. I'm in!


The license of the submitted code should apply to everyone, including GitHub!

Indeed. I'm tempted to try this but keep the code GPL'd. Sure, I don't get any liquor or free disk space, but the amusement almost makes it worthwhile.

(I hate proprietary software.)


You could still get the bourbon and disk space. Since you would still theoretically be the sole copyright holder, you could submit your entry under one license (BSD, for example) and then publicly release it under another (GPL).

Note: I am not a lawyer


If a license prohibits commercial usage, then it violates the open source ethos of the community.


There are two open source ethics. The first (MIT/BSD) says "do whatever you want with it." The second (GNU) says "keep software free."

The latter places heavy restrictions on certain ways to commercialize software, e.g. selling it without making the source freely available.


Part of the challenge of the Netflix contest was the dataset. At 101 million rows of data you couldn't toss your simple algorithm at the problem. But with GitHub's data at only 440,237 rows... I am tempted to toss it into my Netflix code just to see how fast the recommendations are generated!


Followup: I dumped it into my SVD code. Running over the Netflix dataset took ~6 hours. Running over the GitHub data takes ~2 minutes :)
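
For what it's worth, those two timings line up with the row counts quoted above. A rough back-of-the-envelope check, assuming (and this is only an assumption, the SVD code itself isn't shown) that run time grows roughly linearly with the number of rows and everything else stays the same:

```python
# Row counts and the Netflix timing are the figures quoted in this thread.
netflix_rows = 101_000_000
github_rows = 440_237
netflix_minutes = 6 * 60  # "~6 hours"

# Assumption: run time is roughly proportional to the number of rows.
ratio = netflix_rows / github_rows
estimated_minutes = netflix_minutes / ratio

print(f"GitHub data is about {ratio:.0f}x smaller")
print(f"Estimated run time: about {estimated_minutes:.1f} minutes")
```

That estimate lands a bit under two minutes, so the "~2 minutes" figure is about what linear scaling would predict.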


Some other info on the data

  user_id 1-56554 (i.e. 16 bits) <- yah
  repo_id 1-123344 (i.e. 17 bits)
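
A quick way to double-check those bit widths, using the maximum IDs quoted above. The helper name `bits_needed` is just made up for this sketch:

```python
def bits_needed(max_id: int) -> int:
    """Smallest number of bits that can hold every integer from 0 to max_id."""
    return max(1, max_id.bit_length())

user_bits = bits_needed(56554)    # 56554 < 2**16 = 65536
repo_bits = bits_needed(123344)   # 2**16 < 123344 < 2**17 = 131072

print(user_bits, repo_bits)
```

So a user ID fits in an unsigned 16-bit integer, while a repo ID needs one bit more.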


It'd be nice if there was a leaderboard (as Netflix had) so we could see others' progress (to see if our implementations stand a chance, or if we're wasting time).

Edit: Scratch that. It was hiding up top (http://contest.github.com/leaderboard)


I see that you didn't read the whole page carefully: http://contest.github.com/leaderboard


I wonder if one can brute-force the results.txt file. To the GitHub folks: do as Netflix did and publish the comparison with one data set (the validation set) and, after the contest is over, publish the comparisons with a test set that is only used once. That way nobody can brute-force anything.
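
A minimal sketch of what I mean, not how GitHub or Netflix actually score anything. The function names, the 50/50 split, and the sample data are all made up for illustration:

```python
import random

def split_holdout(holdout, validation_fraction=0.5, seed=0):
    """Split held-out (user_id, repo_id) pairs into two non-overlapping sets:
    a validation set scored during the contest and a test set scored once."""
    pairs = sorted(holdout)
    rng = random.Random(seed)
    rng.shuffle(pairs)
    cut = int(len(pairs) * validation_fraction)
    return pairs[:cut], pairs[cut:]

def accuracy(guesses, answers):
    """Fraction of held-out pairs whose repo appears in that user's guesses."""
    if not answers:
        return 0.0
    hits = sum(1 for user, repo in answers if repo in guesses.get(user, ()))
    return hits / len(answers)

# Hypothetical held-out answers and one entrant's guesses.
holdout = [(1, 10), (2, 20), (3, 30), (4, 40)]
guesses = {1: [10, 11], 2: [21], 3: [30], 4: [40]}

validation, test = split_holdout(holdout)
leaderboard_score = accuracy(guesses, validation)  # shown while the contest runs
final_score = accuracy(guesses, test)              # computed once, at the end
```

Repeated submissions can only probe the validation half, so tuning guesses against the leaderboard says nothing directly about the pairs that decide the final ranking.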


Figuring out the answers is actually not that hard: we're using data that is publicly available through our API. In the rules we state that your entry will be disqualified if you do this sort of thing, though.


GitHub folks: the repos numbered

59337 95472 80221 73599 24616

have no descriptions in the repos.txt file but have had their languages computed. Is it because they are private? Anyway, you should probably remove them.



