
Maybe I'm not understanding, but wouldn't using the file hash to keep track of things make it really tricky if the file is modified in any way? E.g., when annotating a PDF, it would lose all previously associated data because the hash would change.


I download ongoing serialized fiction from the internet and read it locally, periodically replacing the files with versions that have more chapters. It does not sound like this workflow would work for me.

(This also means hashing the contents wouldn't work.)


But if the annotations get stored directly in the ebook file (as opposed to separately in a JSON file), I don't see how this would work for you, either? You would still have to transfer them to the new file somehow.


If notes are stored as JSON, maybe it would be easy to write a program to transfer notes from the old hash to the new one.
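
Something like this minimal sketch, assuming the JSON is a flat object keyed by file hash (that layout is a guess, not the app's documented schema):

    import hashlib, json, sys

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        return h.hexdigest()

    # Usage: rekey.py notes.json old_book.epub new_book.epub
    notes_path, old_book, new_book = sys.argv[1:4]
    with open(notes_path) as f:
        notes = json.load(f)  # assumed: {file_hash: {...notes...}}
    old_hash, new_hash = sha256_of(old_book), sha256_of(new_book)
    if old_hash in notes:
        # Move the record from the old file's hash to the new one.
        notes[new_hash] = notes.pop(old_hash)
        with open(notes_path, "w") as f:
            json.dump(notes, f, indent=2)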


If a chain of hashes is associated with a single work ... you might get somewhere.

I've thought through the problem of fingerprinting records (as in, any recorded data: text, images, audio, video, software, etc.) in a way that coherently identifies it despite changes over time. Git and related revision control systems probably offer one useful model. Another is to generate signatures via n-grams of the text in such a way that's resilient (i.e., non-brittle) despite various changes: different fonts, character sets, slight variances in spelling (e.g., British vs. American English, transliterations between languages), omissions or additions, or other changes. Different versions of the same underlying work, e.g., PDF, HTML, or ASCII text, translations, different editions, etc., all have much in common, though in ways that a naive file hash wouldn't immediately recognise or reveal.
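
As a rough sketch of the n-gram idea (the shingle size, MinHash-style truncation, and normalization below are arbitrary choices, not a worked-out scheme):

    import hashlib, re

    def fingerprint(text, n=5, k=200):
        # Normalize hard so fonts, case, and punctuation noise vanish.
        words = re.findall(r"[a-z]+", text.lower())
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        hashes = sorted(int(hashlib.md5(g.encode()).hexdigest(), 16)
                        for g in grams)
        # Keep the k smallest hashes (a crude MinHash) so omissions or
        # added chapters nudge the signature rather than break it.
        return set(hashes[:k])

    def similarity(a, b):
        # Jaccard overlap: near 1.0 for versions of the same work,
        # near 0.0 for unrelated texts.
        return len(a & b) / max(1, len(a | b))

Two files would then count as "the same work" when their similarity clears some threshold, rather than requiring byte-identical hashes.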

We often refer to works through tuples such as author-title-pubdate, or editor-language-title. This is a minute fraction of the actual content of most works, but is remarkably effective in creating namespaces. Controlled vocabularies and specific indexing systems (Dewey Decimal, OCLC, Library of Congress Catalog Number, ISBN, DOI, etc.) all refine this further, but require specific authority and expertise. I'd like to see an approach which both leverages and extends such classifications.
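
A toy example of the tuple idea (the normalization rules here are invented for illustration):

    import re, unicodedata

    def work_key(author, title, pubdate):
        # Collapse each field to lowercase ASCII with punctuation
        # stripped, so trivial variants map to the same key.
        def norm(s):
            s = unicodedata.normalize("NFKD", s)
            s = s.encode("ascii", "ignore").decode()
            return re.sub(r"[^a-z0-9]+", "-", s.lower()).strip("-")
        return f"{norm(author)}/{norm(title)}/{pubdate}"

    work_key("Ursula K. Le Guin", "The Dispossessed", 1974)
    # -> 'ursula-k-le-guin/the-dispossessed/1974'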


> Reading progress, bookmarks, and annotations are stored in plain JSON files

The file is not modified.


I think they mean that the hash in the JSON record does not match the PDF file anymore, after the PDF has been changed by some other program.


I agree; it seems the original PDF is not changed by the app itself. But you are right that if it is edited directly and renamed, it will fail to load the JSON.


I guess they'd use a hash of the book contents, not the whole file?
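
Something like hashing the extracted text rather than the container bytes, e.g. (a sketch using pypdf, assuming it's available; extraction quality varies a lot between PDFs):

    import hashlib
    from pypdf import PdfReader

    def content_hash(pdf_path):
        h = hashlib.sha256()
        for page in PdfReader(pdf_path).pages:
            # Hash only the extracted text, so annotation layers,
            # metadata edits, and re-saves don't change the hash.
            h.update((page.extract_text() or "").encode("utf-8"))
        return h.hexdigest()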


You end up with the ebook file and an annotations and bookmarks file. I'm assuming that the program would just look in the same directory as the ebook file or some configurable location.
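
Roughly this lookup order, I'd guess (purely an assumption about the app, not documented behaviour):

    from pathlib import Path

    def sidecar_for(book_path, fallback_dir=None):
        # Check next to the ebook first, then a configured location.
        book = Path(book_path)
        candidates = [book.with_suffix(".json")]
        if fallback_dir:
            candidates.append(Path(fallback_dir) / (book.stem + ".json"))
        return next((p for p in candidates if p.exists()), None)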



