
Kickass! When you paste the Disclaimer text into the box, it suggests "the darjeeling limited" and "potty training". The first three paragraphs of Yegge gave me "faithfulness" and "registry cleaner". Just make it work, and it will be awesome.


Yes, it doesn't know when to shut up. It tends to work better with longer articles; we usually test it with content from Google News.


Have you looked at work done on keyword extraction from academic literature?

http://www.nzdl.org/Kea/

It's simple and accurate.


Yes, we have. What we do is not keyword extraction: our tool suggests tags based on probabilistic algorithms. For example, if your document contains the terms "Bush" and "Obama", it should be tagged as "politics" even if that word does not appear in it. Compare this with the Yahoo Extraction Tool, for example. Our approach will not add new keywords that would help in a search; it is only useful for getting an idea of what the document is about.

The main problem is not the algorithm but the input data. Our system learns from millions of tagged blog posts among other sources. The quality of the tags varies a lot, and most of the work we do is about deciding what data to use for training.
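To make the "Bush and Obama implies politics" idea concrete, here is a minimal sketch of one way tag suggestion from tagged posts could work. It is an assumption for illustration, not the tool's actual algorithm: it uses a simple naive Bayes score over term presence, and the training posts, the `train` and `suggest_tags` names, and the smoothing constants are all made up.

```python
import math
from collections import Counter, defaultdict

# Toy training data, invented for this example: (post text, tags) pairs.
TRAINING = [
    ("bush and obama debate the budget", ["politics"]),
    ("obama signs new bill in congress", ["politics"]),
    ("bush speech on foreign policy", ["politics"]),
    ("new python release adds features", ["programming"]),
    ("debugging a python web server", ["programming"]),
]


def train(posts):
    """Count how many posts carry each tag, and in how many of those each term appears."""
    tag_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, tags in posts:
        terms = set(text.lower().split())
        vocab |= terms
        for tag in tags:
            tag_counts[tag] += 1
            for term in terms:
                term_counts[tag][term] += 1
    return tag_counts, term_counts, vocab


def suggest_tags(text, model, top_n=1):
    """Rank tags by log P(tag) + sum of log P(term present | tag) over known terms."""
    tag_counts, term_counts, vocab = model
    total = sum(tag_counts.values())
    # Terms never seen in training carry no evidence, so they are ignored.
    terms = [t for t in set(text.lower().split()) if t in vocab]
    scores = {}
    for tag, n in tag_counts.items():
        score = math.log(n / total)
        for term in terms:
            # Add-one smoothing so an unseen (term, tag) pair does not zero out the score.
            score += math.log((term_counts[tag][term] + 1) / (n + 2))
        scores[tag] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


model = train(TRAINING)
print(suggest_tags("Bush and Obama met yesterday", model))
```

The suggested tag never has to appear in the input text, which is the difference from keyword extraction. It also shows why the training data matters so much: the suggestions can only be as good as the tags people put on the posts the model learned from.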


When is the API coming? I can see us using this quite a bit.


The API is already available although we haven't announced it. There is a WordPress plugin that uses it, called TagMahal.

Please contact us if you'd like to use the API. If you need up to 5k queries per day or so, it shouldn't have much of an impact on our server.


I would like to see a Blogspot plug-in too! I just tried it on my latest post and it worked much better than I did. :)



