
I worked on a proof-of-concept program to do this for my own content library. It scans a directory for files and compares them against locally stored torrent metadata. For v2 torrents this is trivial via a "pieces root" lookup. For v1 torrents it basically means checking that each piece matches, and since pieces may not align with file boundaries, it's not possible to guarantee that it's the same file without also having the other files in the torrent that share its boundary pieces.
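To make the difference concrete, here is a rough Python sketch of both checks. It is not taken from the actual program: the function names, the in-memory `bytes` inputs, and the `file_offset` parameter (the file's byte offset inside the v1 torrent's concatenated payload) are all assumptions for illustration, and the Merkle construction follows my reading of BEP 52.

```python
import hashlib

BLOCK = 16 * 1024  # v2 leaf block size (BEP 52)


def pieces_root(data: bytes) -> bytes:
    """v2-style Merkle root over SHA-256 hashes of 16 KiB blocks."""
    leaves = [hashlib.sha256(data[i:i + BLOCK]).digest()
              for i in range(0, len(data), BLOCK)]
    if not leaves:
        raise ValueError("empty files have no pieces root")
    # Pad the leaf layer with all-zero hashes up to a power of two.
    size = 1
    while size < len(leaves):
        size *= 2
    leaves += [bytes(32)] * (size - len(leaves))
    # Hash pairs together until a single root remains.
    while len(leaves) > 1:
        leaves = [hashlib.sha256(leaves[i] + leaves[i + 1]).digest()
                  for i in range(0, len(leaves), 2)]
    return leaves[0]


def v1_check(data: bytes, file_offset: int, piece_length: int, pieces: bytes):
    """Check only the v1 pieces that lie entirely inside one file.

    `pieces` is the concatenated 20-byte SHA-1 string from the info dict.
    Returns (ok, unchecked_bytes). Bytes at the file's edges that share a
    piece with a neighbouring file cannot be verified from this file alone.
    A short final piece of the torrent is ignored here for brevity, and
    `ok` is vacuously True when no piece fits inside the file.
    """
    end = file_offset + len(data)
    first = -(-file_offset // piece_length)  # first piece starting inside the file
    last = end // piece_length               # pieces [first, last) fit entirely
    ok = True
    checked = 0
    for index in range(first, last):
        start = index * piece_length - file_offset
        digest = hashlib.sha1(data[start:start + piece_length]).digest()
        ok = ok and digest == pieces[index * 20:(index + 1) * 20]
        checked += piece_length
    return ok, len(data) - checked
```

With v2 the file is identified by one hash that depends only on its own bytes. With v1, any nonzero `unchecked_bytes` means part of the file went unverified.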

I built it with libtorrent, and after loading in all of the torrents (multiple TBs of data) it would promptly and routinely crash. I couldn't find the cause of the error; it doesn't seem to have been designed to run with thousands of torrents.

One problem that I've yet to build a solution for is finding the metadata to use for the lookup phase. I haven't been able to find a publicly available database of torrent metadata. If you have an info hash then itorrents.org will give you the metadata, if it exists. I started scraping metadata via DHT announcements, but it's not exactly fast, and each client would have to do this unless they can share the database of metadata between them (I have an idea on how to accomplish this via BEP 46).



I have a solution to this: it's the successor to Magnetico.


Could you please share a link to your solution? I would be interested in taking a look.


>One problem that I've yet to build a solution for is finding the metadata to use for the lookup phase.

I think BEP 51 followed by BEP 9 is all you need.
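For anyone unfamiliar with the two steps, here is a minimal Python sketch of the messages involved, assuming my reading of the BEPs is right. The helper names and the tiny bencoder are made up for illustration; it only builds and parses payloads and leaves out the UDP/TCP transport, the peer lookup, and the extension handshake.

```python
def bencode(value) -> bytes:
    """Tiny bencoder covering the types used in these messages."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys must be byte strings, emitted in sorted order.
        items = sorted(
            ((k.encode() if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def sample_infohashes_query(node_id: bytes, target: bytes, txn: bytes = b"aa") -> bytes:
    """BEP 51: ask a DHT node for a sample of the info hashes it stores."""
    return bencode({
        "t": txn,
        "y": "q",
        "q": "sample_infohashes",
        "a": {"id": node_id, "target": target},
    })


def split_samples(samples: bytes) -> list:
    """BEP 51 responses carry `samples` as concatenated 20-byte info hashes."""
    if len(samples) % 20:
        raise ValueError("samples length must be a multiple of 20")
    return [samples[i:i + 20] for i in range(0, len(samples), 20)]


def ut_metadata_request(piece: int) -> bytes:
    """BEP 9: payload requesting one 16 KiB piece of the info dict from a peer."""
    return bencode({"msg_type": 0, "piece": piece})
```

Each sampled info hash still needs a `get_peers` lookup and a peer connection before the BEP 9 request can be sent, so the cost per torrent adds up.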


This is how I was originally achieving this. As I said, it's very slow. I don't think it would be a good solution on its own because it would require that every client be constantly sampling all DHT nodes, and downloading all metadata and indexing it for a potential future lookup. It's a huge amount of additional load on the DHT.

I think a better solution would be some way for clients to query the DHT for a specific "pieces root", but I don't know if all clients publishing "pieces root" for the torrents they know about would also be a good idea. Some kind of distributed metadata database where clients can query would be ideal.


Would you mind sharing the source? Sounds like something others could build on.


The source is available here: https://github.com/chhs1/content-seeder



