Reading about separate row and column addresses reminded me that the Apple II had its address lines arranged in such a way that the video circuitry would naturally read every row of memory, thus obviating the need for a separate refresh circuit.
(To RAM, the order of address lines doesn't matter)
I remember those chips as they were the ones in the ZX81 16K RAM Pack. 8 of the little beauties.
The RAM Pack itself was a nightmare. If it wobbled the computer would crash. I held mine on with blu-tack - lots of it. Every now and again I'd have to take it off and clean the contacts by rubbing the oxide layer off with an India rubber. It used to run quite hot too.
Despite the problems, the RAM pack was the best upgrade I had!
Great! Question: When 64K DRAM first appeared, I remember a cover article in EDN(?) claiming that nothing denser could ever be built, because cosmic radiation would zap individual electrons, and thus flip bits too frequently, and there was nothing you could do to protect from it. So, how was that thinking wrong?
They discovered that most of the DRAM soft errors were due to alpha particles from the ceramic packaging. Changing the packaging solved most of the problem.
There are still bit flips due to cosmic rays but the rate is low enough that most people don't care. ECC memory can be used if errors are a problem.
Some of Sun's servers famously suffered big reliability problems due to using IBM SRAM as caches, which is said to have had elevated error rates due to more alpha-emitting contaminants in their particular package filler.
I was at Sun at the time (although not working on that), and these memory errors were a catastrophic problem for Sun. Customers were paying a lot of money for Sun's reliability and then systems started mysteriously having problems and Sun couldn't figure out why.
I was there too. That would be the ecache problem, right?
I was in the service division at the time, and yes, it was a serious issue. However, if I recall correctly it was limited to the E10K. It was the flagship machine, so of course it got all the attention, but most customers didn't suffer from it.
Just wanted to say that is an amazing write up. My first computer was an Interact with 16K of memory built with these chips. I never knew how complex they really are until today. I knew the row/column multiplexing was a PITA for system designers, but was it really worth it? How much more did a package with 7 or 8 more pins cost back then? I guess that's x8 for a system though.
Anyway I hope someplace archives these documentary articles!
I don't know how much more the larger packages cost, but it was enough that Intel really really didn't want to move off 16-pin packages, which caused problems for the 4004 and 8008 processors. In addition, the larger packages took up more space on the circuit board which was a serious disadvantage, especially when you had a board full of memory chips.
I'm so glad to see you tackle a DRAM since it's probably the one main component of a computer that is least understood since it is somewhat analog. Is dummy cell an official term? It's perfectly fine but I thought it might be better described as a reference cell. I enjoy seeing self correcting designs like this. My question would be - if you were to contrast to modern DRAM is it quite similar or what are the differences? (I saw your chart at the end but I'm guessing that there is more similar than different in a modern DRAM?)
Dummy cell is the term used in the paper "Storage Array and Sense/Refresh Circuit for Single-Transistor Memory Cells" that introduced the idea. It's also the term used on the MK4116 datasheet and elsewhere.
The capacitors in modern DRAMs are deep trenches, rather than a simple polysilicon plate. The trenches are 3.6 micrometers deep while the feature size is 45 nanometers, so they are remarkably deep. See the photos here: https://chipworksrealchips.blogspot.com/2014/02/intels-e-dra...
From your new found knowledge of the 4116, do you have any insight into why the 4116 have a reputation for failure? They are often wholesale replaced with 4164 in old equipment.
I don't know anything specific about 4116 reliability problems. But it makes sense that the 4164 would be a few years more modern and thus more reliable.
The Computer History Museum has on exhibit a rubylith mask for the 4K DRAM chip. According to the designer's oral history, they started using a Calcomp plotter to cut the rubyliths for the 4K DRAM while earlier ones were hand-cut.
It wouldn't have been possible to make the masks for the 4K chip by hand because hand-cut masks wouldn't be accurate enough. The problem was that the sense amps need to distinguish tiny voltage differences. If you cut the sense lines by hand, they would be slightly different widths, enough to mess up the signals.
The conference paper on this memory chip was authored by Schroeder and Proebsting of Mostek, so there were two main designers. There must have been more people on the team, but I couldn't find the size.
Interesting. I did some (IT) work for a company making photo plotters for the PCB industry. The MDA Fire 9000, later Cymbolic Sciences and other companies. This was about 1985.
They had a minicomputer with a hardware rasterizer driving a laser writing to photosensitive film. They had amazing throughput for the day.
One of the issues was temperature and humidity stability of the film. It would change dimensions with as little as 10% change in humidity.
The change was more than the resolution of the laser.
Cut my teeth on this device. I remember the excitement at getting the first 4164 samples: Single 5V rail: luxury. Had to invent bank switching to use more than 8 of them:) Good times.
At some level, all signals are analog. Lovely clever tricks every step of the way are necessary to make the nice abstract digital levels we try to think about. How might such tricks discriminate more states? Can we have 3-, 8-, or more-state "bits"?
It just comes down to noise. If you can detect X levels with an error rate below 1 in 10^Y, then you can ship it. Just set X and Y to whatever your spec requires.
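To make the noise trade-off concrete, here's an illustrative sketch (all numbers are mine, picked for illustration): with Gaussian noise of standard deviation sigma and X equally spaced levels spanning a fixed voltage range, the gap between decision thresholds shrinks as X grows, so the misread probability climbs quickly.

```python
import math

def misread_prob(levels, vrange=1.0, sigma=0.02):
    """Upper-bound the chance of reading one of `levels` equally spaced
    voltage levels incorrectly, given Gaussian noise of std `sigma`."""
    # Distance from a level to the nearest decision threshold:
    half_gap = vrange / (levels - 1) / 2
    # Two-sided Gaussian tail probability of crossing a threshold:
    return math.erfc(half_gap / (sigma * math.sqrt(2)))

# More levels in the same voltage range -> much worse error rate.
p2, p4, p8 = misread_prob(2), misread_prob(4), misread_prob(8)
```

With these assumed numbers, going from 2 to 8 levels costs many orders of magnitude in error rate, which is exactly the X-versus-Y trade the parent comment describes.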
Love the articles. I definitely want to spend a bit more time with it. The section on the sense amp was great.
But one thing I was left wondering was how was the refresh managed?
Was it just that the act of reading/writing had the side effect of refreshing? Therefore the memory controller had to keep track of when & what was accessed?
It was much simpler than that.
A typical DRAM of the era had 128 rows (at least as externally visible; what's inside is up to the DRAM implementer).
So in a higher end system the memory controller, about every 15 microseconds, would increment a 7-bit binary counter. It then would command all the DRAMs to simultaneously do a refresh cycle with this specific row address. This refresh took priority over normal read and write.
No need to optimize refresh by skipping it for rows which had recently been accessed. Just blindly refresh all 128 rows in sequence.
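The scheme above can be sketched as a small simulation (structure and names are mine, not from any real controller): a 7-bit counter is bumped roughly every 15 microseconds, one row is refreshed per tick, and the counter wraps, so all 128 rows are covered well inside the 2 ms window.

```python
ROWS = 128
REFRESH_INTERVAL_US = 15          # one row refresh every ~15 microseconds

def refresh_schedule(ticks):
    """Yield (time_us, row) pairs for the first `ticks` refresh cycles."""
    row_counter = 0               # the controller's 7-bit counter
    for tick in range(ticks):
        yield tick * REFRESH_INTERVAL_US, row_counter
        row_counter = (row_counter + 1) % ROWS   # 7-bit wraparound

# One full sweep: all 128 rows visited within 128 * 15 us = 1920 us < 2 ms.
events = list(refresh_schedule(ROWS))
```

No bookkeeping about which rows were recently accessed, just a blind sequential sweep, exactly as described.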
Internally a 16k bit DRAM would have 128 rows and 128 columns. Each time a row is read, all 128 associated columns are read in parallel. Then the contents of the selected single column are output. That's what makes refresh work without consuming too much of the chip's bandwidth. A refresh of a row results in all 128 columns of that row being refreshed.
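A toy model of that 128 x 128 array (names and structure are mine, not from the MK4116 datasheet) shows why one access refreshes an entire row: the 14-bit address splits into a 7-bit row and 7-bit column, and reading any single bit senses and rewrites all 128 cells in its row.

```python
ROWS, COLS = 128, 128

class ToyDram:
    def __init__(self):
        self.cells = [[0] * COLS for _ in range(ROWS)]
        self.row_refreshed = [False] * ROWS   # track which rows got rewritten

    def read(self, addr):
        row, col = divmod(addr & 0x3FFF, COLS)  # 14-bit address -> row, column
        self.row_refreshed[row] = True          # whole row sensed & rewritten
        return self.cells[row][col]

dram = ToyDram()
dram.read(0x1234)   # touches row 36; all 128 of its columns are refreshed
```

A dedicated refresh cycle is just this row operation with the column output ignored.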
It takes about 2000 microseconds to do 128 refreshes at that rate, which matches the 2 ms refresh interval the chips were typically specified for. In reality the chips could often retain contents for a minute at room temperature. It was when operating at the limit of 70C that refreshing every 2 milliseconds became close to necessary.
Some microprocessors of the era, such as the Z80, had an internal 7 bit counter. The Z80 could be easily set up to send this 7-bit counter out to the DRAM after every instruction fetch.
So a Z80 system did something like this:
fetch an 8-bit opcode
refresh the next 1 of 128 rows of DRAM
fetch additional instruction bytes
complete the instruction
Since a Z80 operated at about 2 MHz minimum and instructions completed in about 6 cycles, DRAM memory was being refreshed, once per instruction, at a much faster rate than the chips needed. The Z80 had no long-running instructions like a slow divide; the worst case IIRC was about 12 or so cycles to complete an instruction.
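A quick back-of-envelope check of that claim, using the ballpark figures above (2 MHz clock, ~12 cycles worst case, 128 rows in 2 ms):

```python
CLOCK_HZ = 2_000_000
WORST_CYCLES_PER_INSTR = 12        # rough worst-case instruction length
ROWS = 128
RETENTION_MS = 2.0                 # spec: refresh every row within 2 ms

# Longest possible gap between two instruction-fetch refreshes:
worst_refresh_gap_us = WORST_CYCLES_PER_INSTR / CLOCK_HZ * 1e6   # 6 us
# Gap the DRAM actually requires per row:
required_gap_us = RETENTION_MS * 1000 / ROWS                     # 15.625 us
```

Even in the worst case the Z80 refreshes a row every 6 microseconds, well inside the 15.625 microseconds the spec demands, so there's comfortable margin.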
The only downside to this sort of refresh is in systems that have more than one word width's worth of chips. A normal read or write accessed a single word. Simple refresh accessed all DRAM chips in parallel. Much higher power consumption.
So, e.g. 16 K bytes of memory needed 8 DRAM chips. Whereas 64 K bytes of memory needed 32 DRAM chips. In a Z80 system, which read 8 bits at a time, only 8 chips would be active during normal read/write but all 32 chips would be active in refresh.
A DRAM chip that was idle consumed a few mW of power whereas a DRAM chip that was being accessed or refreshed consumed a few hundred mW of power. So in a large system (hundreds of chips) you couldn't take as simplistic an approach to refresh as the Z80 did.
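Putting rough numbers on that (the "few mW" and "few hundred mW" figures above; exact values vary by part), the power gap between normal access and all-chips refresh in a 32-chip Z80 system looks like this:

```python
IDLE_MW, ACTIVE_MW = 5, 300    # assumed ballpark per-chip figures
CHIPS = 32                     # e.g. 64 KB built from 16K x 1 parts
WORD_CHIPS = 8                 # an 8-bit bus activates 8 chips per access

# Normal read/write: one 8-bit word's chips active, the rest idle.
read_write_mw = WORD_CHIPS * ACTIVE_MW + (CHIPS - WORD_CHIPS) * IDLE_MW
# Simple refresh: every chip does a row cycle simultaneously.
refresh_mw = CHIPS * ACTIVE_MW
```

With these assumed numbers refresh draws several times the power of a normal access, and the gap only widens with hundreds of chips, hence the need for staggered refresh in large systems.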