|
Casting a light on DRAM
Text: Matthias Alles
English translation: Peter West
Memory under the microscope
Today's computers
could not manage without DRAM (Dynamic Random Access
Memory). Thanks to its consistently good price/performance
ratio, this kind of memory has undoubtedly contributed
to the massive spread of computers. But the disadvantages
cannot be disputed: The low price is a result of a simple
construction, which manifests itself in a reduction
of operating speed. However, as the basic principle
of DRAMs has not changed in decades, it is rather surprising
that it still finds use in today's computers. In order
to understand certain properties of DRAMs, it is necessary
to take a look at the innards of memory chips.
|
Figure 1: This small section
of a memory matrix makes clear the very
simple construction of DRAMs. |
To store one bit of information,
modern DRAMs require only a single transistor and a
capacitor. The information is stored in a tiny capacitor
of a few fF (1 femtofarad = 10⁻¹⁵ farads), which is
charged to 0 V for a logical 0 or, for example, to
5 V for a logical 1. The resulting single-transistor
cell permits a very high degree of integration on the
silicon die, so the costs of such a memory chip can
be held very low. Static RAM (SRAM) on the other hand,
which is used for cache memory among other things, has
faster access times but needs six transistors to
store one bit, which makes it more expensive due to
the extra silicon area required.
Now there is not very
much you can do with one bit, which is why a multiplicity
of single-transistor cells are arranged in a matrix
on the chip. One complete row of this matrix is called
a page. To address one bit you need both a row address
as well as a column address. These are passed to the
memory chip by multiplexed address lines via /RAS and
/CAS signals (more about this later).
To read out a bit, the
following happens: First of all a so-called pre-charge
has to occur. During this all data lines are brought
to half the voltage for a logical 1, so for instance
2.5 V if the chip works with 5 V. The time required
to do this is called the pre-charge-time. While this
is happening, the row decoder in the memory chip evaluates
the row address and after the pre-charge selects the
suitable word-line (WL) with a voltage pulse. All the
transistors of this page now conduct. The cell capacitor
now causes the voltage on the data line to shift slightly,
depending on the stored charge - in other words it
becomes slightly lower or slightly higher than the previous
2.5 V.
At the end of the data
line the sense amplifier comes into play. This uses
the pre-charge voltage as a reference, and converts
voltages lower than this to 0 V and higher than this
to 5 V. Depending on the column address, the contents
of the selected memory cell will be output externally.
However, the problem now arises that all capacitors
of one page have been discharged and so their contents
have been lost. As a consequence, all these memory cells
have to have their previous information written back
to them again. So after the read-out the sense amplifiers
ensure that the evaluated information is fed again as
either 0 V or 5 V to all data lines. Finally the capacitors
are charged with this voltage and so store the correct
information once more.
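The read sequence described above can be sketched as a toy numerical model. This is purely illustrative: the 10% coupling factor between cell and bit line is an arbitrary assumption, not a figure from the article.

```python
VDD = 5.0
PRECHARGE = VDD / 2  # bit line rests at half VDD after the pre-charge

def sense(bitline_voltage):
    """Sense amplifier: compare the disturbed bit line to the pre-charge reference."""
    return 1 if bitline_voltage > PRECHARGE else 0

def read_bit(cell_voltage, coupling=0.1):
    """Destructive read of one cell, followed by write-back.
    coupling is a made-up factor for how strongly the tiny cell
    capacitor shifts the much larger bit-line capacitance."""
    disturbed = PRECHARGE + coupling * (cell_voltage - PRECHARGE)
    bit = sense(disturbed)
    restored = VDD if bit else 0.0  # sense amplifier recharges the cell fully
    return bit, restored
```

A stored 5 V reads back as 1 and the cell is recharged to 5 V; a stored 0 V reads back as 0 and the cell is driven back to 0 V, mirroring the write-back step in the text.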
To write a bit, one has
to proceed in exactly the same way as for reading, except
that after reading the page no data are passed to the
outside but read from outside and passed to the selected
data line. If one did not read out the whole page before
writing a bit, it's clear that apart from this one written
bit all other information in the page would be lost.
So during a read or write access, all memory cells of
a page are read out and then written back again, even
if they are not required.
Memory chips that can
output four bits simultaneously, for instance, simply
contain four matrices working in parallel. The way that
the matrix is organized has a direct effect on the current
consumption of the chip: The more columns that are present,
the more capacitors have to be recharged with each read
or write access. The result is higher current consumption
(and so greater heat dissipation). Today a multitude
of memory chips have become established that often have
different matrix organizations. A 16 Mbit chip, for
instance, can be organized as a 4,096 x 4,096 or a 16,384
x 1,024 matrix. In the first case we speak of a
12/12 mapping (2¹² x 2¹² = 2²⁴ = 16,777,216), whereas
the second example would correspond to a 14/10 mapping
(2¹⁴ x 2¹⁰ = 2²⁴ = 16,777,216). Finally,
four such memory areas would
make the chip a 64 Mbit chip, organized as 16 Mbit x
4.
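The mapping arithmetic is easy to check. A small sketch of the figures from the text:

```python
def mapping_capacity(row_bits, col_bits):
    """Cells addressable by a row/column mapping: 2^rows x 2^cols."""
    return (1 << row_bits) * (1 << col_bits)

# Both organizations of a 16 Mbit chip address the same number of cells:
print(mapping_capacity(12, 12))  # 16777216 (4,096 x 4,096)
print(mapping_capacity(14, 10))  # 16777216 (16,384 x 1,024)
```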
From the mapping you
can determine how many bits are necessary for the row
address and how many for the column address. An asymmetric
14/10 mapping requires 14 address bits for the rows
(2¹⁴ = 16,384) and 10 address bits
(2¹⁰ = 1,024) for the columns. To
save on connections, the memory chip is given only as
many pins for addressing as are absolutely necessary,
thus in this case 14. So to address a memory cell this
address connection is multiplexed, meaning the data
are transferred successively: First the page address
is transmitted with an activated /RAS signal (Row Address
Strobe) and subsequently the column address with an
activated /CAS signal (Column Address Strobe).
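The multiplexing amounts to splitting a flat cell address into the two parts sent in succession. A sketch for the 14/10 mapping from the text:

```python
def split_address(addr, col_bits=10):
    """Split a flat 24-bit cell address of a 14/10-mapped chip into the
    row part (sent first, with /RAS) and the column part (sent with /CAS)."""
    row = addr >> col_bits               # upper 14 bits select the page
    col = addr & ((1 << col_bits) - 1)   # lower 10 bits select the column
    return row, col
```

The highest address splits into row 16,383 and column 1,023, exactly the matrix dimensions given above.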
These are the basic principles
of dynamic memory. The information throughput therefore
appears fairly low at first, as row and column addresses
have to be sent to the memory chip for each read or
write access. So quite early on thoughts turned to how
memory throughput could be increased so the CPU (Central
Processing Unit, which performs the actual calculations)
wouldn't be starved. If one takes a look at CPU code
in RAM, it quickly becomes apparent that this is usually
stored sequentially. As a consequence, the data requested
by the CPU often lie in one page, which makes it unnecessary
to transmit the page address with the /RAS signal each
time. Instead this is transmitted only once and stored
on the RAM chip. During this time the memory logic holds
the /RAS signal active. The memory chip then knows that
fast accesses to the RAM will follow. Synchronously
with the /CAS signal for column addressing the memory
chip now only receives the column address; thanks to
this a lot of time can be saved with these so-called
burst accesses. DRAMs that work in this manner are called
Page Mode DRAMs, as they can provide data faster following
a page hit.
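The saving from page mode can be made concrete with a rough cycle count. The cycle figures below are illustrative assumptions, not values from any datasheet:

```python
def access_cycles(n_accesses, ras_cycles=3, cas_cycles=2, page_mode=True):
    """Rough clock-cycle count for n accesses within one page.
    In page mode the row address is transferred once; without it,
    every access pays the full /RAS + /CAS sequence."""
    if page_mode:
        return ras_cycles + n_accesses * cas_cycles
    return n_accesses * (ras_cycles + cas_cycles)
```

For a burst of four accesses the sketch gives 11 cycles in page mode versus 20 without; a single isolated access gains nothing, which is why burst accesses are where page mode pays off.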
The next logical consequence
is that the read and write amplifiers do not have to
write back the just-read data to the memory matrix after
each access, but only after a page change. Thanks to
this the RAS pre-charge time is done away with in burst
mode, which reduces the CAS cycle time. The Page Mode
DRAMs optimized in this manner are known as Fast Page
Mode DRAMs (FPM-DRAM), which found their way into the
TT and Falcon, for instance.
But FPM-DRAMs still have
the disadvantage that the /CAS signal determines how
long a read cycle takes. Due to this there is no possibility
of passing the column address of the following access
in the meantime. This can only happen after the data
on the data bus are written back. And it is just this
point that is tackled by EDO-RAMs (Extended Data Out).
Here the /OE signal has the task of indicating the end
of a read cycle. Therefore the next address can be transmitted
with the /CAS signal already while the previously read
datum is still present on the data pins. Thanks to
this slight overlapping of accesses, the CAS cycle
time could be reduced from 60 ns for ordinary
DRAMs to 40 ns for FPM-RAM and 25 ns for EDO-RAM. As
a result, EDO-RAM can deliver data on every clock
cycle during burst accesses at a bus clock of
40 MHz, while FPM-RAM can only manage this at 25 MHz.
However, it should be noted that the speed gain of EDO
versus FPM only applies during reading. During writing
an overlap is not possible.
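The clock figures follow directly from the cycle times: one datum per clock is only possible if the clock period is at least as long as the CAS cycle time.

```python
def max_bus_clock_mhz(cas_cycle_ns):
    """Highest bus clock (MHz) at which one datum per clock is possible,
    i.e. the clock whose period equals the CAS cycle time."""
    return 1000.0 / cas_cycle_ns

print(max_bus_clock_mhz(40))  # 25.0 -> FPM-RAM keeps up to 25 MHz
print(max_bus_clock_mhz(25))  # 40.0 -> EDO-RAM keeps up to 40 MHz
```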
Further development of
EDO-RAMs is described as BEDO (Burst Extended Data Out).
Since the requested data usually lie not only within
one page but also at directly consecutive addresses,
BEDO-DRAMs are left to their own devices once the row
and column addresses have been passed. The memory chip
generates the new column
addresses internally with an address generator and outputs
the required data synchronously with the /CAS signal,
which makes an additional reduction of the CAS cycle
time possible. Thanks to this, BEDOs can operate with
a 66 MHz bus clock with a 5-1-1-1 burst and so are distinctly
superior to normal EDOs. But despite the good concept
they never found wide acceptance - particularly due
to fast SDRAM - and so were only available for a short
time.
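The 5-1-1-1 notation gives the clock cycles per datum of a four-word burst, so the total burst time follows from the clock. A small sketch:

```python
def burst_time_ns(pattern, clock_mhz):
    """Duration of a burst whose per-datum cycle counts are given as a
    pattern such as (5, 1, 1, 1): total cycles times the clock period."""
    return sum(pattern) * 1000.0 / clock_mhz

print(burst_time_ns((5, 1, 1, 1), 66))  # BEDO: 8 cycles at 66 MHz, ~121 ns
```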
It is just these SDRAMs
(Synchronous DRAM) that dominate today's memory market.
In basic principle this memory species works exactly
like the memory types described above. Nevertheless
SDRAM is appreciably more flexible and intelligent than
its predecessors. All control signals depend on one
clock, which - depending on the age of the SDRAM -
may be 66, 100, 133 or at present 166 MHz. The control
signals themselves (/CS, /RAS, /CAS, /WE) should be
understood as a kind of command interface: various
bit patterns tell the memory exactly what it has to do.
The SDRAM stores the present command during a rising
clock edge. It is so advanced in its construction that
after receiving a command it can perform the task assigned
to it by itself, without requiring further external
control signals.
|
Figure 2: Memory types compared:
With refined processes higher memory throughput
can be achieved. |
To adapt the memory to
the requirements of the operating system, there is even
the possibility of setting certain properties of the
memory chip oneself with the mode register. Among other
things one can determine here how many memory accesses
a burst access should consist of. So it is possible
that it consists of 1, 2, 4 or 8 accesses, or that the
whole page is read out at once. The lead-off cycle (the
reading of the first datum during a burst access) however
lasts 5 clock cycles with SDRAMs as well, just like
for previously described memory types. After this however
the data move to or from the external world at system
clock speed. If desired a burst may also be terminated,
or only frozen if it is to be continued after a short
while. The CAS-latency, which can also be set in the
mode register, specifies the number of clock cycles
following the receipt of the column address after which
the memory is to output the first valid data. You can
buy SDRAMs with CL2 as well as ones with CL3. As the
price difference already suggests, SDRAMs with CL2 are
always faster than those with CL3.
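The mode register settings mentioned here can be sketched as a bit-field encoding. The layout below follows the common JEDEC-style assignment (burst length in bits 0-2, burst type in bit 3, CAS latency in bits 4-6); a concrete chip's datasheet should be consulted before relying on it:

```python
def mode_register(burst_length, cas_latency, interleaved=False):
    """Encode a JEDEC-style SDRAM mode register word (layout assumed
    from the common JEDEC assignment, not from a specific datasheet).
    burst_length: 1, 2, 4, 8 or 'page'; cas_latency: 2 or 3."""
    bl_codes = {1: 0b000, 2: 0b001, 4: 0b010, 8: 0b011, 'page': 0b111}
    word = bl_codes[burst_length]
    if interleaved:
        word |= 1 << 3           # burst type: sequential vs interleaved
    word |= cas_latency << 4     # CL2 or CL3, encoded as the value itself
    return word
```

For example, a burst length of 8 with CL3 encodes to 0x33 under this layout.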
A further innovation,
besides the clock signal, is that SDRAMs are made up
internally of at least two banks, which may be addressed
independently. This makes it possible to run certain
actions in parallel, so they no longer slow down the
memory. For instance, one can read from one bank in
burst mode while already executing a command for a pre-charge
for the next access on the other bank. In this way it
is also possible to conceal the pre-charge time or the
5 clock cycles of the lead-off cycle by addressing one
bank while the other is still delivering data.
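The gain from two banks can be illustrated with an idealized cycle count, assuming the lead-off of the next burst hides completely behind the data phase of the current one (a simplification; real overlap depends on the exact timing):

```python
def burst_stream_cycles(bursts, lead_off=5, data_words=4, two_banks=True):
    """Idealized clock count for a stream of 4-word bursts.
    With two banks, every lead-off after the first is assumed to be
    fully hidden behind the other bank's data phase."""
    if two_banks:
        return lead_off + bursts * data_words
    return bursts * (lead_off + data_words)  # single bank: lead-off every time
```

Three back-to-back bursts cost 17 cycles in this model instead of 27 with a single bank; a lone burst gains nothing, since its lead-off has nowhere to hide.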
But besides the actual
memory chips, the memory modules (DIMMs) also contain
a small EEPROM, the SPD-EEPROM (Serial Presence Detect).
This can be read out via the I²C bus and contains information
about the memory chips on the module, for instance their
organization or their access times.
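Decoding the SPD contents amounts to looking up fixed byte offsets. The offsets below are assumptions based on the JEDEC SPD layout for SDRAM DIMMs (byte 2 = memory type, byte 3 = row address bits, byte 4 = column address bits) and should be verified against the actual specification:

```python
def parse_spd(spd):
    """Pick a few fields out of a raw SPD EEPROM dump.
    Byte offsets are assumed from the JEDEC SPD layout, not verified
    against a specific revision of the standard."""
    return {
        'memory_type': spd[2],  # e.g. 0x04 = SDRAM, 0x07 = DDR SDRAM
        'row_bits':    spd[3],
        'col_bits':    spd[4],
    }

# Synthetic example dump: an SDRAM module with a 12/10 mapping.
example = bytes([0, 0, 0x04, 12, 10]) + bytes(123)
print(parse_spd(example))
```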
After taking a close look
at SDRAMs, we reach the peak of current memory developments:
Here we find the very expensive RAMBUS memory, which
is apparently being used in fewer and fewer computers
and will not be discussed further here. Another memory
type represents a further development of the SDRAM,
even though the changes are not that great. The DDR-SDRAM
(Double Data Rate) which is being talked about is attractive
because it is said to deliver twice the amount of data
of previous DRAMs. However, DDR-SDRAM does not - as
one might suppose - just double the clock rate. Rather
it performs two actions during one clock cycle. While
traditional SDRAMs always synchronize to the rising
edges of the clock signal, this memory makes use of
both the rising as well as the falling edges for data
and command transfer.
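The effect of using both clock edges shows up directly in the peak transfer rate. A sketch, assuming the usual 64-bit module data bus:

```python
def peak_bandwidth_mb_s(clock_mhz, bus_bits=64, ddr=True):
    """Peak transfer rate in MB/s: DDR performs one transfer on each
    clock edge, so it moves twice as many words per clock."""
    transfers_per_us = clock_mhz * (2 if ddr else 1)
    return transfers_per_us * bus_bits // 8

print(peak_bandwidth_mb_s(133))             # DDR at 133 MHz: 2128 MB/s
print(peak_bandwidth_mb_s(133, ddr=False))  # plain SDRAM: 1064 MB/s
```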
As the timing of this memory is rather delicate, it
has been given an additional
bi-directional signal (DQS) for control purposes. If
the memory is outputting data then it will indicate
the validity of this data, while if the computer's chip-set
outputs data then it controls the DQS signal.
A further property of
DRAMs, which we have ignored completely till now, is
the refresh. As we saw at the start of the article,
the information of a memory cell is stored in an associated
capacitor. Due to leakage currents however the capacitor
will be discharged again quite quickly, and this cannot
be stopped. This property is simply the price that one
has to pay for the high packing density of the memory,
and under some circumstances it can seriously affect speed.
During a refresh the
contents of a page are read out and written back straight
afterwards. Naturally this refresh must occur before
the voltage on the capacitor no longer suffices for
establishing the stored information by the sense amplifiers.
The maximum time that may pass between two refreshes
of the same capacitor is called the refresh period.
Since refreshes are always carried out for a complete
row, the number of rows determines the refresh cycle,
i.e. how many refreshes have to be executed within one
refresh period. Chips
with 2¹¹ = 2,048 rows normally have a
2 K refresh, while chips with 4,096 rows have a 4 K
refresh. The row refresh cycle (the average time required
for a row refresh), is fixed at 15.6 µs. With
a 2 K refresh this results in a refresh period of 32
ms (2,048 x 15.6 µs = 32 ms), while a 4 K refresh
takes 64 ms (4,096 x 15.6 µs = 64 ms).
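The refresh arithmetic from the text can be checked in a few lines:

```python
def refresh_period_ms(rows, row_refresh_us=15.6):
    """Refresh period = number of rows x per-row refresh interval,
    using the 15.6 µs row refresh cycle given in the text."""
    return rows * row_refresh_us / 1000.0

print(refresh_period_ms(2048))  # ~32 ms for a 2 K refresh
print(refresh_period_ms(4096))  # ~64 ms for a 4 K refresh
```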
|
Figure 3: Due to leakage currents
the capacitors lose charge. Therefore a
refresh of the memory contents is necessary.
|
|
Figure 4: To perform a refresh
one uses signal combinations that are not
possible with read and write accesses. |
Here too there are several methods that reduce the
impact of such refresh delays to a greater or lesser
extent. Roughly, one can distinguish
three types of refresh:
- RAS-only refresh:
When addressing the DRAM one proceeds first as for
a read access, but after the row address one does
not transmit a column address, thus leaving the
/CAS signal inactive. After this the DRAM executes
a refresh in the specified row.
- CAS before RAS refresh:
During a normal read/write access the row address
is always passed first via /RAS. However, if the
/CAS signal is activated first, /RAS specifies the
duration of the refresh cycle. Here the row address
no longer has to be transmitted, as the DRAM contains
an internal self-incrementing address counter for
this refresh.
- Hidden refresh:
This type of refresh is actually no longer in use
these days, as high bus frequencies hardly make
its use possible. If /CAS remains active after a
read access, then the hidden refresh is triggered
if a further /RAS impulse follows. Here too the
DRAM contains its own row address counter. However,
hidden refresh only works if the next RAM access
follows the actual refresh - in other words if the
bus frequency is low enough to hide the refresh
between two memory accesses.
Things are simpler with
SDRAMs or DDR-SDRAMs, as one only has to transmit the
command for a refresh and one does not have to worry
about the timing of the individual control signals.
And finally: for 37 years
now, Moore's Law, which states that a doubling of processor
capabilities occurs every 18 to 24 months, has remained
valid, and at present one cannot foresee when it will
cease to apply. What holds for CPUs is equally
true of the capacity of memory chips: performance as
well as miniaturization also follow Moore's Law and
are at present rising steeply. Improved production processes
that permit higher packing densities and higher operating
frequencies will ensure in the future too that the eye
of the needle between the CPU and memory does not become
too small. But the principle behind DRAM is unlikely
to change in the next few years.
|
A microphotograph of a 4 Mbit
chip. The large areas represent the memory
matrices. |
This article
was originally published in German by st-computer magazine,
February 2002, and is reproduced in English with kind
permission.
|