Memory Error Detection and Correction

An Introduction to Parity and ECC

Someone asked:

> Thanks for the info.  I guess my confusion stems from a misunderstanding of
> what ECC memory is.  Is it _just_ parity, or is there other built-in error
> detection/correction circuitry that does a more thorough job of detecting
> memory errors?

The memory is just memory. In general, memory doesn't do parity or ECC; that is done by memory control logic on the motherboard, in the chip set, or (as in the 21066) in the CPU.

The memory only has to store the parity or ECC bits, just as it stores the data bits.

Parity is implemented on most PCs with one parity bit per byte. For a 32-bit word size there are four parity bits, for a total of 36 bits that have to be stored in the memory. On most Pentium and Pentium Pro systems, and a few 486 systems, there is a 64-bit wide memory data path, so there are eight parity bits, for a total of 72 bits.

When a word is written into memory, each parity bit is generated from the data bits of the byte it is associated with. This is done by a tree of exclusive-or gates. When the word is read back from the memory, the same parity computation is done on the data bits read from the memory, and the result is compared to the parity bits that were read. Any computed parity bit that doesn't match the stored parity bit indicates that there was at least one error in that byte (or in the parity bit itself). However, parity can only detect an odd number of errors. If an even number of errors occurs in the same byte, the computed parity will match the read parity, so the error will go undetected. Since memory errors are rare if the system is operating correctly, the vast majority of errors will be single-bit errors, and will be detected.
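The write and read-back steps described above can be sketched in a few lines of Python. This is only an illustration, not how any particular chip set does it: the function names are made up for the example, and even parity is assumed (real hardware may use either polarity).

```python
def parity_bit(byte: int) -> int:
    """XOR of the eight data bits, i.e. what the exclusive-or tree computes."""
    p = 0
    for i in range(8):
        p ^= (byte >> i) & 1
    return p

def write_word(word: int, nbytes: int = 4):
    """Return (data, parity_bits) as stored: one parity bit per byte."""
    parity = [parity_bit((word >> (8 * i)) & 0xFF) for i in range(nbytes)]
    return word, parity

def check_word(data: int, stored_parity, nbytes: int = 4):
    """Return the indices of bytes whose recomputed parity disagrees."""
    return [i for i in range(nbytes)
            if parity_bit((data >> (8 * i)) & 0xFF) != stored_parity[i]]

data, par = write_word(0xDEADBEEF)
print(check_word(data, par))                        # [] : no error
print(check_word(data ^ (1 << 5), par))             # [0]: one flipped bit in byte 0
print(check_word(data ^ (1 << 5) ^ (1 << 6), par))  # [] : two flips in byte 0 go unnoticed
```

Note that the check reports only which byte is bad, never which bit.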

Unfortunately, while parity allows for the detection of single-bit errors, it does not provide a means of determining which bit is in error, which would be necessary to correct the error. This is why parity is only an Error Detection Code (EDC).

ECC is an extension of the parity concept. ECC is usually performed only on complete words, rather than individual bytes. In a typical ECC system with a 64-bit data word, there would be 8 ECC bits, for a total of 72 bits, the same width as byte parity on a 64-bit data path. Each ECC bit is calculated as the parity of a different subset of the data bits. The key to the power of ECC is that each data bit contributes to more than one ECC bit. By making careful choices as to which data bits contribute to which ECC bits, it becomes possible to not just detect a single-bit error, but actually identify which bit is in error (even if it is one of the ECC bits). In fact, the code is usually designed so that single-bit errors can be corrected, and double-bit errors can be detected (but not corrected), hence the term Single Error Correction with Double Error Detection (SECDED).
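To make the "careful choices" a little more concrete, here is a rough Python sketch based on the textbook extended Hamming construction. That construction is an assumption for illustration; real memory controllers often use other SECDED codes. In it, 7 check bits are enough to locate a single error in a 64-bit word, and one more overall parity bit provides double-error detection, for 8 in all. The function names are invented for this example.

```python
def check_bits_needed(data_bits: int) -> int:
    """Check bits for SECDED: smallest r with 2**r >= data_bits + r + 1,
    plus one overall parity bit for double-error detection."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1

def data_positions(data_bits: int) -> list:
    """Codeword positions (numbered from 1) used for data bits.
    Positions that are powers of two are reserved for check bits."""
    out, pos = [], 1
    while len(out) < data_bits:
        if pos & (pos - 1):          # non-zero means not a power of two
            out.append(pos)
        pos += 1
    return out

def covering_checks(position: int) -> list:
    """Check bit k covers a position when bit k of the position number is set."""
    return [k for k in range(position.bit_length()) if (position >> k) & 1]

print(check_bits_needed(64))         # 8
for pos in data_positions(64)[:4]:
    print(pos, covering_checks(pos))
```

Every data position is covered by at least two check bits, and no two positions are covered by the same set, which is what lets a pattern of mismatching check bits point at exactly one bit.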

When a word is written into ECC-protected memory, the ECC bits are computed by a set of exclusive-or trees. When the word is read back, the exclusive-or trees use the data read from the memory to recompute the ECC. The recomputed ECC is compared to the ECC bits read from the memory. Any discrepancy indicates an error. By looking at which ECC bits don't match, it is possible to identify which data or ECC bit is in error, or whether a double-bit error occurred. In practice this comparison is done by an exclusive-or of the read and recomputed ECC bits. The result of this exclusive-or is called the syndrome. If the syndrome is zero, no error occurred. If the syndrome is non-zero, it can be used to index a table to determine which bits are in error, or that the error is uncorrectable. This table lookup stage is implemented in hardware in some systems, and via an interrupt, trap, or exception in others. In the latter case, the system software is responsible for correcting the error if possible. On the Alpha this is one of the functions of PALcode.
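The syndrome and table lookup can be sketched as follows. This is a toy model built on an extended Hamming code over a 64-bit word with 8 ECC bits. The bit assignment, the names, and the table layout are all assumptions made for the example; they are not the code used by the Alpha or by any particular chip set.

```python
DATA_BITS = 64
R = 7   # Hamming check bits; an eighth, overall-parity bit is stored at bit R

# Codeword positions 1..71; powers of two are reserved for the check bits.
DATA_POS = [p for p in range(1, DATA_BITS + R + 1) if p & (p - 1)]

def compute_ecc(data: int) -> int:
    """Return the 8 ECC bits for a 64-bit data word."""
    hamming = 0
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            hamming ^= pos   # adds this data bit to every check bit covering it
    overall = (bin(data).count("1") + bin(hamming).count("1")) & 1
    return hamming | (overall << R)

# Syndrome produced by each possible single-bit error, data or ECC.
SYNDROME_TABLE = {compute_ecc(1 << i): ("data", i) for i in range(DATA_BITS)}
SYNDROME_TABLE.update({1 << j: ("ecc", j) for j in range(R + 1)})

def read_word(data: int, stored_ecc: int):
    """Return (status, data), correcting a single-bit error if there is one."""
    syndrome = compute_ecc(data) ^ stored_ecc   # recomputed XOR read ECC
    if syndrome == 0:
        return "ok", data
    hit = SYNDROME_TABLE.get(syndrome)
    if hit is None:
        return "uncorrectable", data            # e.g. a double-bit error
    kind, bit = hit
    if kind == "data":
        data ^= 1 << bit                        # flip the bad data bit back
    return "corrected", data                    # ECC-bit errors leave data intact

word = 0x0123456789ABCDEF
ecc = compute_ecc(word)
print(read_word(word, ecc)[0])                            # ok
print(read_word(word ^ (1 << 17), ecc)[0])                # corrected
print(read_word(word, ecc ^ (1 << 3))[0])                 # corrected
print(read_word(word ^ (1 << 17) ^ (1 << 40), ecc)[0])    # uncorrectable
```

In this sketch the table has 72 entries, one per stored bit; any other non-zero syndrome is reported as uncorrectable.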

> I assumed initially that the ECC memory is the same as the common 72-pin 70nS
> parity memory.  The reason I am confused is that my local memory dealer is
> saying that ECC memory is a different animal than "parity" memory.

Your memory dealer is the one that is confused. Common 72-pin 36-bit SIMMs will work fine. 32-bit SIMMs will not work, because the Alpha always uses ECC, whereas Pentium machines and Macintoshes often do not implement either parity or ECC. Also, even on Pentium machines that are capable of parity or ECC, it is usually optional. Since it is obviously cheaper to make 32-bit memory than 36-bit, most people are happy to use 32-bit.

I recently purchased a new Intel "Marl" motherboard, which is based on the Intel 430HX (Triton II) chipset. The board supports ECC when 36-bit memory is installed, and it supports EDO memory. But just try to find 36-bit EDO memory. There is no technical impediment to manufacturing 36-bit EDO SIMMs, and they appear in DRAM manufacturers' data books, but retailers don't stock them. As far as they are concerned, all they need is 36-bit FPM SIMMs for people with older Pentium systems which need the parity, and 32-bit EDO SIMMs for people with new motherboards that support EDO but for which parity is optional.

There are a few variations that you might encounter that blur the definitions. Recently there has been a trend to make "logic-parity" SIMMs for use in older PC motherboards that require parity. Instead of using a RAM chip to store the parity bits, they use a parity generator (i.e., an exclusive-or tree) to regenerate the parity bits on read cycles, to fool the memory controller into thinking that the correct parity was stored. It should be readily apparent that these SIMMs will not work in a system that uses the "extra" bits to store ECC. Also, the parity generator takes at least a few nanoseconds beyond the access time of the DRAMs to regenerate the parity, so a logic-parity SIMM is slower than a normal SIMM using the same DRAMs. However, most of the companies making logic-parity SIMMs don't add this time to the rating they use for the SIMM. Thus a so-called 70 nS logic-parity SIMM may really be a 75 nS SIMM. The only "advantage" of logic-parity is that a parity generator is cheaper than a RAM chip. However, if anyone sells you a logic-parity SIMM as a 36-bit SIMM without telling you that it uses logic-parity, IMNSHO they have fraudulently misrepresented it since it actually only has 32 bits of memory.
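A small Python model shows why logic-parity defeats the purpose of the extra bit. The two classes below are hypothetical stand-ins for a true-parity SIMM and a logic-parity SIMM, one byte wide, with even parity assumed; nothing here models timing.

```python
def parity_bit(byte: int) -> int:
    """Even parity over one byte."""
    return bin(byte & 0xFF).count("1") & 1

class TrueParitySimm:
    """Stores the ninth bit in a real RAM cell."""
    def __init__(self):
        self.cells = {}
    def write(self, addr, byte, extra_bit):
        self.cells[addr] = [byte, extra_bit]
    def read(self, addr):
        byte, extra_bit = self.cells[addr]
        return byte, extra_bit

class LogicParitySimm(TrueParitySimm):
    """Discards the ninth bit on writes and regenerates parity on reads."""
    def write(self, addr, byte, extra_bit):
        self.cells[addr] = [byte, None]
    def read(self, addr):
        byte = self.cells[addr][0]
        return byte, parity_bit(byte)

def parity_error(simm, addr) -> bool:
    """What the memory controller checks on a read."""
    byte, extra_bit = simm.read(addr)
    return parity_bit(byte) != extra_bit

for simm in (TrueParitySimm(), LogicParitySimm()):
    simm.write(0, 0x5A, parity_bit(0x5A))
    simm.cells[0][0] ^= 0x10      # simulate a single-bit DRAM error
    print(type(simm).__name__, "detects the error:", parity_error(simm, 0))

# An ECC controller stores a check bit that is not simple byte parity.
ecc_simm = LogicParitySimm()
ecc_simm.write(1, 0x5A, 1)        # the controller wrote a 1 ...
print(ecc_simm.read(1))           # ... but reads back the regenerated parity
```

The true-parity model reports the corrupted byte; the logic-parity model never can, because the parity it returns is always computed from whatever data it currently holds.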

It used to be the case that there actually were DRAM chips which internally generated and checked ECC codes, so they were essentially self-correcting memory. Micron offered these in the early '80s. I am unaware of any currently available DRAMs that do this, although it is apparently common practice for serial-interface EEPROM chips to use ECC to increase the effective endurance rating.


Last updated August 23, 1996

Copyright 1996 Eric Smith