Twibright Optar – OPTical ARchiver – a codec for encoding data on paper (twibright.com)
86 points by pmoriarty on March 27, 2022 | 38 comments


> Reducing the space necessary to keep accounting records that are mandatory to be kept on paper

I can't wait to see the ensuing hilarity when this is challenged in court.


I realize that courts look at the spirit of the law and don't look kindly on loopholes, but this seems like something a reasonable person would consider a reasonable option, assuming you go through the diligence of ensuring the software remains available.

Generally, the paper retention laws I've worked with have instead required that documents be producible on paper on demand (with associated fines for failure).


I am reminded of NanoRosetta[0], which will engrave your data onto a physical, roughly coin-sized object.

[0] https://nanorosetta.com/


How does this compare to printed base 64, then OCR? It seems more robust to print and recognize text than binary patterns. You can always type it out if you are desperate.


3750 characters/page at 6 bits/character gets you 22.5 kbits/page or ~2.8kB/page. Optar apparently does 200kB/page.
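
A quick sanity check of those figures in Python (the 3750 characters/page is the parent's assumption for a dense monospace printout):

    # Back-of-the-envelope density for printed base64 vs. Optar.
    chars_per_page = 3750          # assumed dense monospace printout
    bits_per_char = 6              # each base64 character carries 6 bits
    total_bits = chars_per_page * bits_per_char
    print(total_bits / 8 / 1000)   # ~2.8 kB/page
    print(200 / 2.8)               # Optar's ~200kB/page is ~70x denser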


At minimum, I would want to use one of the encodings that avoids oO0il1 ambiguity. Maybe settle for hex in a pinch.


> Sheet music is sound storage which is very space inefficient and allows only MIDI capability. Twibright Optar stores digital music with full digital sound capability. Recommended format is 32kbps AoTuV Ogg Vorbis which allows storage of around 45 seconds of music per page with acceptable quality. Musical skill is not necessary for playback anymore. And music album can now literally be an album.

Lol, I wonder if, with Opus, you could have an audiobook that's thinner than the book.


Golay code?

I feel like at 200KB sizes, Reed-Solomon would be better?

Definitely a fun idea. 200KB per page is reasonable and much more than I thought was possible (I never did the math though).

Maybe for actual pragmatic use, a 50% Reed-Solomon code for ~100KB per page is still good? 100KB of data and 100KB of Reed-Solomon parity could correct up to 50KB of errors.

Error detection (CRC32?) to reliably throw away corrupted blocks (Parchive style) could also help.
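
As a sketch of that combination, here's roughly what it could look like in Python with the third-party reedsolo library (the page layout and helper names here are made up for illustration; reedsolo works on 8-bit symbols and internally chunks long messages into 255-byte blocks):

    import zlib
    from reedsolo import RSCodec   # pip install reedsolo

    # ~50% overhead: 128 parity bytes per 255-byte block, so up to
    # 64 unknown byte errors (or 128 known erasures) per block.
    rsc = RSCodec(128)

    def encode_page(data: bytes) -> bytes:
        # The CRC32 is what lets the reader reliably *discard* blocks
        # that RS mis-corrects or cannot correct.
        crc = zlib.crc32(data).to_bytes(4, "big")
        return bytes(rsc.encode(data + crc))

    def decode_page(blob: bytes) -> bytes:
        # Recent reedsolo versions return (msg, msg+ecc, errata_pos).
        decoded = bytes(rsc.decode(blob)[0])
        data, crc = decoded[:-4], decoded[-4:]
        if zlib.crc32(data).to_bytes(4, "big") != crc:
            raise ValueError("CRC mismatch: discard this block")
        return data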

--------

Or maybe just packaging the data in Pararchive before writing to paper would work, lol


If you're using the standard RS(223,32) code you actually can get much more data than just half! Only 32 bytes in a 255 byte block are used for parity. We used it extensively when designing a comms protocol for LEO usage!
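
For reference, the arithmetic on that overhead (255-byte block, 223 bytes of it data):

    n, k = 255, 223        # standard CCSDS-style Reed-Solomon block
    print(k / n)           # ~0.875: about 87.5% of each block is data
    print((n - k) // 2)    # corrects up to 16 unknown byte errors per block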


Well, very large codes like RS(150000, 50000) might be a bit expensive computationally, but computers today are really really fast lol.

I think codes of that size are some kind of low-density parity check or something these days. Not something I understand, but RS codes are a big matrix multiplication, O(N^3), so maybe not practical, lol.

Don't mind me, I'm basically talking out of my ass on this one.

-------------

EDIT: I'm remembering college a bit more. Code interleaving and two-dimensional codes are used for CD-ROMs. Reed-Solomon works best if errors are spread out over many frames, so there are interleaving patterns that would make this sort of 2D printing more resilient to two-dimensional burst errors (e.g. spilled coffee, page tears, or streaky printers).
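
A minimal sketch of the interleaving idea (a row/column block interleaver; the geometry is illustrative, not what CDs or Optar actually use):

    # Write each RS codeword as a row, then emit the matrix column by
    # column. A burst wiping out D consecutive emitted symbols costs
    # each codeword at most ceil(D / depth) symbols -- spread thinly
    # enough that RS can correct all of them.

    def interleave(codewords):
        # codewords: equal-length sequences (e.g. bytes) treated as rows
        return [row[i] for i in range(len(codewords[0]))
                       for row in codewords]

    def deinterleave(symbols, depth):
        # depth = number of rows; undoes the column-major readout
        return [symbols[i::depth] for i in range(depth)]

    rows = [b"AAAA", b"BBBB", b"CCCC"]
    flat = interleave(rows)               # A B C A B C A B C A B C
    back = deinterleave(flat, depth=3)    # rows recovered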


One of the three exponents is due to data length, though. So "only" quadratic cost in the spreading distance of the error correction code.


I think CDs use two layers of codes interleaved in some non-obvious pattern. Errors in the smaller blocks that take out a whole block are actually spread amongst multiple large blocks, and you know what bits are bad so you're correcting erasures instead of errors, which is easier.
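
That's the erasure-vs-error distinction: t parity symbols correct t known erasures but only t/2 unknown errors. A quick sketch of feeding known-bad positions to the decoder, again using the reedsolo library (erase_pos is its API as of recent versions):

    from reedsolo import RSCodec

    rsc = RSCodec(32)                 # 32 parity bytes per block
    codeword = rsc.encode(b"the inner code flags these positions as bad")
    damaged = bytearray(codeword)
    bad = [0, 1, 2, 3]                # positions flagged by the inner code
    for i in bad:
        damaged[i] = 0                # the value is irrelevant once flagged
    # Knowing *where* the damage is doubles the decoder's power:
    # 32 parity bytes fix 32 erasures, but only 16 unknown errors.
    msg = rsc.decode(bytes(damaged), erase_pos=bad)[0]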


So basically the issue is that large codes (ex: RS(32768, 32768), a 32KB message + 32KB parity) are very expensive. I don't quite remember the math off the top of my head, but I believe that to form error correction over a code like this, you need to build a 32768x65536 matrix (2GB). Which... quite possibly can work on modern computers, but you can imagine that in the 80s and 90s this was infeasible.
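
(The 2GB figure checks out, assuming one byte per matrix entry:)

    rows, cols = 32768, 65536
    print(rows * cols / 2**30)   # 2.0 GiB at one byte per entry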

The largest codes in practice are 8-bit (RS(223,32), as the satellite guy mentioned). CD-ROMs use much, much smaller codes: an RS(32,28) inner code + an RS(28,24) outer code.

Instead of making such a large code, you can make a smaller interleaved code that handles a burst-error like the large code, but no other kinds of errors.

------

CD-ROMs, designed in the 80s, would go for the interleaved code. (1GB of RAM on every CD-ROM reader would have been prohibitively expensive in the 80s!)

In practice, it's burst errors that happen most frequently anyway. Random errors are exceedingly rare; most errors are "localized" to a single region. In this paper example, a tear or rip in the paper would erase many bits that are "next" to each other.

By focusing on "burst" errors instead of "all possible random errors", you end up with much faster computations that use much smaller matrices.

--------

I hear that the new hotness for the year-2000+ era is low-density parity-check (LDPC) codes. But I don't actually know how they work or their math at all. I know they're an "imperfect" code from a theoretical basis (they'll correct fewer errors than a simple Reed-Solomon code), but they are "less imperfect" than interleaved codes and other codes designed for burst errors... while having even better performance characteristics.

Reed-Solomon is great... if you've got the compute power for it.


Another important piece is that the total size of the code (223+32) is determined by the _word size_; in the common case, 8 bits. For an N-bit word architecture, your code may contain at most 2^N - 1 symbols, or 255. So to practically implement an RS(32k, 32k), you'd need an efficient way to operate on 16-bit words, which we just didn't have in the 70s when these codes were devised.
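
That cap in a couple of lines (the one-over-the-limit detail is why a true RS(32k,32k) would need slight shortening or a larger field):

    # Max RS block length: a code over GF(2^m) has at most 2^m - 1 symbols.
    for m in (8, 16):
        print(f"GF(2^{m}): up to {2**m - 1} symbols per codeword")
    # GF(2^8):  255 symbols   -> the classic byte-oriented codes
    # GF(2^16): 65535 symbols -> one short of the 65536 for RS(32k,32k)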


There are systems now to archive masses of data on film. Unlike ordinary printed paper, film can easily last several decades and even centuries.


Paper can also easily last centuries. I have some century old paper in my house, just sitting on the bookshelf. There are archives with examples that are upwards of a thousand years old. If we include parchments, we have surviving documents predating the pyramids.


I disagree, microfilms from only a few decades ago have degraded to the point of being near-illegible. Even low-quality paper has been far more reliable than that.


I'm talking about 35mm film infrequently accessed, not microfilm, which is mostly degraded by repeated reading.

Typical laser-printed paper doesn't last more than a few decades AFAIK. Color 35mm film is stable for at least 60 years in ordinary conditions and B&W film 100 years and more.


> Typical laser-printed paper doesn't last

Many options there. I'm more concerned about printer-ink longevity; tough to chase that down (recent tech). This page about preserving photos looks relevant; it recommends coated halide or ink-jet prints for 100+ years: [https://www.shutterbug.com/content/how-long-will-your-digita...]

Paper's been around forever, and so have long-lasting inks... but printer ink is too new. Typewriters have been around over a century... but I haven't seen studies!


Laser printer toner is mostly pretty stable, stabler than paper. Most inkjet ink isn't.


Polymers degrade all on their own without physical action.

Microfiche or other photographic film is merely denser, with only "pretty good" longevity.

But the goal here is maximum robustness, including all aspects of the system and its life cycle. Ubiquity, dependencies, and accessibility are important aspects, more important than information density.

A film that lasts 200 years instead of thousands is not better.

A film that requires a whole specialized infrastructure to produce the materials, is not better.

A technology that requires a special media or special writer or special reader, is not better.

Paper is not only very long lasting, it's also easy to get, easy to print, and easy to read with only basic equipment and process requirements; all that matters is that the paper and ink are as durable as you want them to be.

I.e., you need paper, a printer, a camera, and a computer, and if you also want longevity then you also need to select durable ink and paper, but it doesn't matter what exact kind of paper, printer, camera, or computer. If these were printed 40 years ago with dot-matrix printers, it doesn't matter that in 200 years paper may not be made out of wood cellulose any more, or what kind of tech printers, cameras, and computers are based on at that time. All that matters is that your choice of ink and paper aren't the obviously ephemeral types, like thermal receipt paper or most inkjet.

The 40-year-old dot-matrix version of this would just have lower data density than what a laser printer can attain, but it would be perfectly scannable today, and it wouldn't matter that no one makes tractor-feed paper any more, or that you're scanning it with a phone instead of some photodiode contraption.

Aside from the simplicity of the paper and ink itself, the ubiquity is a huge functional aspect.

It's not better if it requires a photo lab to produce and a special viewer to read. A microfiche viewer is not exactly high tech, but I don't have one, nor does my coffee shop, nor any hotel or airport I've ever been in. But printers and cameras are everywhere, which means anyone can use them anywhere, any time.

And the critical point is that it's not just today's printers, paper, and inks that manufacturers happen to be mass producing. The image doesn't care what tech was used to print it or scan it; it remains functional even when all the tech changes.

The tech agnosticism and the ubiquity/accessibility are the critically important features. They're not nice-to-haves or optional; they're central. They're the explicitly defined purposes, and they outweigh all other considerations.

There is no film anywhere that even comes close to doing as good a job as a printer and paper for the stated goals of this project.


Reading 35mm film isn't high tech. In fact, you only need a light source and a lens. You can perfectly scan 35mm film with a basic flatbed scanner, or a phone camera.

If you're using only text and images, paper wins. But as soon as you plan to encode data digitally, you'll need something like a scanner and a computer (not necessarily complex ones). In that case, film is better (denser, less fragile).


Creating film is high tech: the film itself, the camera to expose it, and the lab to develop it.

And you need more than a light source; you need some sort of optics. Maybe you can shine a flashlight through it onto a wall and take a picture of that, if the film hasn't gone to dust the way all the Hollywood film negatives from barely a few decades ago already have.

Film is garbage for this.

It's better than a CD, but doesn't hold a candle to paper.


> Creating film is high tech, both the film itself, the camera to expose it, and the lab to develop it.

Absolutely not. See https://hmfi.handmadefilm.org/

Seriously, film is a 19th-century technology. Its entire process is really, really simple compared to even the most basic computer.

> It's better than a CD, but doesn't hold a candle to paper.

It's much denser, a film roll can hold many gigabytes.


The high tech of computers and digital cameras and printers is irrelevant.

All that matters is that all of those things are as common as potatoes, and from now on always will be. It doesn't matter how they change over time, only that any random person has access to them in some form, which they do and always will now.

And in particular, printers and paper are by far the most ubiquitous form of reproduction, and they don't require any kind of lab. Printers are everywhere, and everyone can use any of them, directly and immediately.

That is not remotely as true for film, even today, let alone later. Not even in the same universe. Not even back in the days of 1-hour photomats in shopping-plaza parking lots, and those days are long gone.

Even today, when you can still actually buy film, who has a darkroom? They exist, but where, and how many? Not in every other kitchen and den, but printers are. And so are digital cameras and computers, by which I mean phones, not just laptops.

I have the means to produce a printed paper right in my dining room, and so does my mother, and my mother in law, and my sister in law, and brother in law, and my brother, and the coffee shop, and the pizza shop... and none of them has a darkroom or even a film camera.

Even if I wanted to put a qr code onto film, how would I even do it?

The first step would probably be that I'd have to print it just so I could take a picture of it, with a film camera I don't have, and film I don't have, and send it somewhere to get developed. It's patently absurd, since I already had the paper. Even if I didn't have a printer and had to take a picture of a screen, there are 9000x more convenient and immediate ways to print the image than by getting film developed.

All film is good for is that it's smaller than paper, and there's a lower bar to reading the finished, developed film than there is for a CD or a thumb drive. But size is not the most important thing, and the means to create the film matters to the usefulness of the whole system, not just reading it.


On that note, "Double Fold" by Nicholson Baker


Paper normally lasts millennia if not burned or wet. Pulp paper is the exception, lasting only a century. Much film, by contrast, is only acetate, which suffers from a catastrophic autocatalytic degradation called "vinegar syndrome", similar to and synergistic with acid paper embrittlement but much, much worse.

PET film (polyester) is stabler than any paper and immune to water, but most paper is much better than most film. Gelatin-printed film is also vulnerable to water.


And fungus will eat the gelatine base.


Where? I can't find them.


There's this one: https://digifilm-corp.com/home and another one I can't find right now; it's a Norwegian company.


Thanks! Google just could not find anything in this field for me.


Are there any error correction systems that would make a 3cm x 3cm hole anywhere on the page still allow the full page to be completely recoverable?

Or "holographic" error correction, where even if 90% of the page is gone, you could still get the "gist" of it somehow? (I'm not sure what I'm asking for mathematically, but probing for ideas that are out there.)


Erasure coding (particularly fountain codes) is what you'd want to use.
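
A toy sketch of the fountain-code idea in Python (an LT-style encoder and peeling decoder over XOR; the uniform degree distribution and the shipped index lists are simplifications, real LT codes use a robust soliton distribution and derive the subsets from a shared seed):

    import random

    def lt_encode(blocks, n_symbols, seed=0):
        # Each output symbol is the XOR of a random subset of blocks.
        rng = random.Random(seed)
        k = len(blocks)
        out = []
        for _ in range(n_symbols):
            idxs = rng.sample(range(k), rng.randint(1, k))
            sym = bytearray(blocks[idxs[0]])
            for i in idxs[1:]:
                for b in range(len(sym)):
                    sym[b] ^= blocks[i][b]
            out.append((idxs, bytes(sym)))
        return out

    def lt_decode(symbols, k):
        # Peeling: a degree-1 symbol reveals a block; XOR it out of the
        # other symbols and repeat until done (or stuck, needing more
        # symbols). Any large-enough surviving subset usually suffices,
        # which is exactly the "hole anywhere on the page" property.
        known = {}
        pending = [(set(idxs), bytearray(sym)) for idxs, sym in symbols]
        progress = True
        while progress and len(known) < k:
            progress = False
            for idxs, data in pending:
                for j in [j for j in idxs if j in known]:
                    for b in range(len(data)):
                        data[b] ^= known[j][b]
                    idxs.discard(j)
                if len(idxs) == 1:
                    known[idxs.pop()] = bytes(data)
                    progress = True
        return [known.get(i) for i in range(k)]

    # 8 source blocks, 40 printed symbols; lose whichever 15 you like.
    blocks = [bytes([65 + i]) * 16 for i in range(8)]
    symbols = lt_encode(blocks, n_symbols=40)
    random.shuffle(symbols)                  # simulate arbitrary damage
    print(lt_decode(symbols[:25], k=8) == blocks)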


You’d have to be clever about the physical distance between overlapping codes. You’d hate to see a 3cm artifact (cut, stain, spill, etc) take out a whole unit/block. (Assuming you’d print multiple blocks on a page.)


Previous discussion of PaperBack (which, in turn, references Optar): https://news.ycombinator.com/item?id=10245836


This reminds me of Xerox DataGlyphs:

https://microglyphs.com/english/html/dataglyphs.shtml


This is pretty cool, and I remember seeing something similar some years back. It would be nice to add a long-range erasure code on top of the FEC, in case there is an ink splotch or the like.


This reminds me of the Danmere Backer, which was for backing up data to VHS tapes.



