Wednesday, July 29, 2009

Memory Packaging

First of all, it should be noted that each motherboard supports memory based on the speed of the frontside bus (FSB) and the memory’s form factor.

So, for example, if the motherboard’s FSB is rated at a maximum speed of 533MHz and you install memory that is rated at 300MHz, the memory will operate at only 300MHz, making the computer run slower than it otherwise could.
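The speed-matching rule above can be sketched as a simple minimum. This is an illustrative helper (the function name is ours, not from the text):

```python
# The installed memory runs at whichever is lower: its own rated speed or the
# maximum the FSB allows. (Illustrative sketch; names are assumptions.)
def effective_memory_mhz(fsb_mhz, module_mhz):
    return min(fsb_mhz, module_mhz)

# The example from the text: a 533MHz FSB with 300MHz-rated memory.
assert effective_memory_mhz(533, 300) == 300
```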

In their specifications, most motherboards list which type(s) of memory they support, as well as the maximum speed for each.

The memory slots on a motherboard are designed for particular module form factors or styles. In case you run across the older terms, DIP, SIMM, and SIPP are obsolete memory packages.

Terms like double-sided/single-sided memory and dual-bank/single-bank memory are often confused. When speaking of sides, it is correct to refer to the two physical sides of the
module and whether they contain chips.

However, that says nothing of the number of banks the module satisfies. Satisfying two banks (or, more often in the case of the DDR family, two channels) can be accomplished with single-sided memory. The most popular form factors for primary memory modules today are these:

  1. DIMM
  2. RIMM
  3. SoDIMM
  4. MicroDIMM

DIMM

One type of memory package is known as a DIMM. As mentioned earlier in this chapter, DIMM stands for Dual Inline Memory Module.

DIMMs are 64-bit memory modules that are used as a package for the SDRAM family: SDRAM, DDR, and DDR2.

The term dual refers to the fact that, unlike their SIMM predecessors, DIMMs differentiate the functionality of the pins on one side of the module from the corresponding pins on the other side.

With 84 pins per side, this makes 168 independent pins on each standard SDRAM module.

The DIMM used for DDR memory has a total of 184 pins and a single keying notch. The DIMM used for DDR2 has a total of 240 pins, one keying notch, and an aluminum cover on both sides called a heat spreader, which is designed like a heat sink to dissipate heat away from the memory chips and prevent overheating.

RIMM

Not an acronym, RIMM is a trademark of Rambus Inc., perhaps a clever play on the acronym DIMM, a competing form factor.

A RIMM is a custom memory module that varies in physical specification based on whether it is a 16-bit or 32-bit module.

The 16-bit modules have 184 pins and two keying notches, while 32-bit modules have 232 pins and only one keying notch, reminiscent of the trend in SDRAM-to-DDR evolution.

The dual-channel architecture can be implemented utilizing two separate 16-bit RIMMs or the newer 32-bit single-module design.

Motherboards with the 16-bit single- or dual-channel implementation provide four RIMM slots that must be filled in pairs, while the 32-bit versions provide two RIMM slots that can be filled one at a time.

A 32-bit RIMM has two 16-bit modules built in and requires only a single motherboard slot, albeit a physically different slot. So you must be sure of the module your motherboard accepts before upgrading.

MicroDIMM

The newest and smallest RAM form factor is the MicroDIMM.

In fact, it is over 50 percent smaller than a SoDIMM: only 45.5 millimeters (about 1.75 inches) long and 30 millimeters (about 1.2 inches, a bit bigger than a quarter) wide.

It was designed for the ultralight and portable subnotebook style of computer (like those based on the Transmeta Crusoe processor).

These modules have 144 or 172 pins and, like a DIMM, use a 64-bit data bus. They are often employed in subnotebook and laptop computers.

EXAMPLES OF RAM

DIMM

DRAM

DRAM is dynamic random access memory. (This is what most people are talking about when they mention RAM.)

When you expand the memory in a computer, you are adding DRAM chips.

You use DRAM to expand the memory in the computer because it’s cheaper than any other type of memory.

Dynamic RAM chips are cheaper to manufacture than other types because they are less
complex.

Dynamic refers to the memory chips’ need for a constant update signal (also called a
refresh signal) in order to keep the information that is written there.

If this signal is not received every so often, the information will cease to exist. Currently, there are four popular implementations of DRAM: SDRAM, DDR, DDR2, and Rambus DRAM (RDRAM).

SDRAM

The original form of DRAM had an asynchronous interface, meaning that it derived its clocking
from the actual inbound signal, paying attention to the electrical aspects of the waveform, such
as pulse width, to set its own clock to synchronize on the fly with the transmitter.

Synchronous DRAM (SDRAM) shares a common clock signal with the transmitter of the data.

The computer’s system bus clock provides the common signal that all SDRAM components use for each step to be performed.

This characteristic ties SDRAM to the speed of the FSB and the processor, eliminating the
need to configure the CPU to wait for the memory to catch up.

Every time the system clock ticks, one bit of data can be transmitted per data pin, limiting the bit rate per pin of SDRAM to the corresponding numerical value of the clock’s frequency.

With today’s processors interfacing with memory using a parallel data-bus width of 8 bytes (hence the term 64-bit processor), a 100MHz clock signal produces 800MBps.

That’s megabytes per second, not megabits. Such memory is referred to as PC100, because throughput is easily computed as eight times the rating.
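The PC100 arithmetic above can be sketched in a few lines of Python (the helper name is our own, used only for illustration):

```python
# Peak throughput of single data rate (SDR) SDRAM: one transfer per clock tick,
# 8 bytes (the 64-bit bus width) moved per transfer.
def sdr_throughput_mbps(clock_mhz, bus_bytes=8):
    return clock_mhz * bus_bytes

# A 100MHz clock yields 800MBps; the module is named PC100 after the clock,
# and its throughput is eight times that rating.
assert sdr_throughput_mbps(100) == 800
```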

DDR

Double Data Rate (DDR) SDRAM earns its name by doubling the transfer rate of ordinary SDRAM by double-pumping the data, which means transferring it on both the rising and
falling edges of the clock signal.

This yields twice the transfer rate at the same FSB clock frequency. Because rising clock frequencies are what generate heating issues in newer components, keeping the clock rate the same is an advantage.

The same 100MHz clock gives a DDR SDRAM system the impression of a 200MHz clock in comparison to a single data rate (SDR) SDRAM system.

You can use this new frequency in your computations or simply remember to double your results for SDR calculations, producing DDR results.

For example, with a 100MHz clock, two operations per cycle, and 8 bytes transferred per operation, the data rate is 1600MBps.

Now that throughput is becoming a bit trickier to compute, the industry uses this final figure to name the memory modules instead of the frequency, which was used with SDR.

This makes the result seem many times better, while it’s really only twice as good. In this example, the module is referred to as PC1600.

The chips that go into making PC1600 modules are named after the perceived double-clock frequency: DDR-200.

DDR2

Think of the 2 in DDR2 as yet another multiplier of 2 in the SDRAM technology, using a lower peak voltage to keep power consumption down (1.8V vs. the 2.5V of DDR and others).

Still double-pumping, DDR2, like DDR, uses both sweeps of the clock signal for data transfer.

Internally, DDR2 further splits each clock pulse in two, doubling the number of operations it
can perform per FSB clock cycle.

Through enhancements in the electrical interface and buffers, as well as through adding off-chip drivers, DDR2 nominally produces four times what SDR is capable of producing.

However, DDR2 suffers from enough additional latency over DDR that identical throughput ratings find DDR2 at a disadvantage.

Once frequencies develop for DDR2 that do not exist for DDR, however, DDR2 could become the clear SDRAM leader, although DDR3 is nearing release.

Continuing the preceding example and initially ignoring the latency issue, DDR2 using a 100MHz clock transfers data in four operations per cycle and still 8 bytes per operation, for a total of 3200MBps.

Just like DDR, DDR2 names its chips based on the perceived frequency. In this case, you would be using DDR2-400 chips.

DDR2 carries on the final-result method for naming modules but cannot simply call them PC3200 modules because those already exist in the DDR world. DDR2 calls these modules PC2-3200.
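The naming pattern across the SDRAM family reduces to one formula: clock frequency times operations per cycle times bytes per operation. A hedged sketch of that calculation (helper name is ours):

```python
# Peak throughput for the SDRAM family on a 64-bit (8-byte) bus.
def throughput_mbps(clock_mhz, ops_per_cycle, bus_bytes=8):
    return clock_mhz * ops_per_cycle * bus_bytes

# SDR does 1 operation per cycle, DDR does 2 (double-pumped), and DDR2 does 4
# (double-pumped plus the internal clock split). Module names encode the result.
assert throughput_mbps(100, 1) == 800    # SDR  -> PC100 (named for the clock)
assert throughput_mbps(100, 2) == 1600   # DDR  -> PC1600, from DDR-200 chips
assert throughput_mbps(100, 4) == 3200   # DDR2 -> PC2-3200, from DDR2-400 chips
```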

The latency consideration, however, means that DDR’s PC3200 offering is preferable to DDR2’s PC2-3200.

After reading the “RDRAM” section, consult Table 1.2, which summarizes how each technology in the “DRAM” section would achieve a transfer rate of 3200MBps, even if only theoretically.

For example, SDR PC400 doesn’t exist.

Table 1.2: Achieving a transfer rate of 3200MBps

Technology    Module name    Clock     Operations/cycle    Bytes/operation    Transfer rate
SDR SDRAM     PC400          400MHz    1                   8                  3200MBps
DDR SDRAM     PC3200         200MHz    2                   8                  3200MBps
DDR2 SDRAM    PC2-3200       100MHz    4                   8                  3200MBps
RDRAM         Dual PC800     400MHz    2                   2 per channel      3200MBps

RDRAM

Rambus DRAM, or Rambus Direct RAM (RDRAM), named for the company that designed
it, is a proprietary synchronous DRAM technology.

RDRAM can be found in fewer new systems today than just a few years ago.

This is because Intel once had a contractual agreement with Rambus to create chipsets, for its own motherboards and those of other manufacturers, that would primarily use RDRAM, in exchange for special licensing considerations and royalties from Rambus.

The contract ran from 1996 until 2002.

In 1999, Intel launched the first motherboards with RDRAM support.

Until then, Rambus could be found mainly in gaming consoles and home theater components.

RDRAM did not impact the market as Intel had hoped, and so motherboard manufacturers got around Intel’s obligation by using chipsets from VIA Technologies, leading to the rise of that company.

Although other specifications preceded it, the first motherboard RDRAM model was known as PC800.

As with non-RDRAM specifications that use this naming convention, PC800 specifies that, using a faster 400MHz clock signal and double-pumping like DDR/DDR2, an effective frequency of 800MHz and a transfer rate of 800Mbps per data pin are created.

PC800 uses only a 16-bit (2-byte) bus called a channel, exchanging a 2-byte packet during each read/write cycle, still bringing the overall transfer rate to 1600MBps per channel because of the much higher clock rate. Modern chipsets allow two 16-bit channels to communicate simultaneously for the same read/write request, creating a 32-bit dual-channel.

Two PC800 modules in a dual-channel configuration produce transfer rates of 3200MBps.

Today, RDRAM modules are also manufactured for 533MHz and 600MHz bus clock frequencies and 32-bit dual-channel architectures.

Termed PC1066 and PC1200, these models produce transfer rates of 2133 and 2400MBps per channel, respectively, making 4266 and 4800MBps per dual-channel.
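The per-channel figures follow the same pattern as before, just with a 2-byte channel instead of an 8-byte bus. A hedged sketch (helper name is ours):

```python
# RDRAM per-channel throughput: a 16-bit (2-byte) channel, double-pumped, so
# each clock cycle moves 2 operations of 2 bytes each.
def rdram_channel_mbps(clock_mhz):
    return clock_mhz * 2 * 2   # ops/cycle * bytes/op

# PC800 runs a 400MHz clock: 1600MBps per channel, 3200MBps dual-channel.
assert rdram_channel_mbps(400) == 1600
assert 2 * rdram_channel_mbps(400) == 3200
# PC1066's 533MHz clock gives 2132 here; the text's 2133MBps figure reflects
# the true 533.33MHz clock rate.
assert rdram_channel_mbps(600) == 2400   # PC1200
```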

Rambus has road maps to 1333 and 1600MHz models.

The section “RIMM” in this chapter covers the physical details of the modules. Despite RDRAM’s performance advantages, it has some drawbacks that keep it from taking over the market.

Increased latency, heat output, complexity in the manufacturing process, and cost are the primary shortcomings.

PC800 RDRAM had a 45ns latency, compared to only 7.5ns for PC133 SDR SDRAM.

The additional heat that individual RDRAM chips put out led to the requirement for heat sinks on all modules.

High manufacturing costs and high licensing fees led to triple the cost to consumers over SDR, although today there is more parity between the prices.

In 2003, free from its contractual obligations to Rambus, Intel released the i875P chipset. This new chipset provides support for a dual-channel platform using standard PC3200 DDR
modules.

Now, with 16 bytes (128 bits) transferred per read/write request, making a total transfer rate of 6400MBps, RDRAM no longer holds the performance advantage it once did.

SRAM

The S in SRAM stands for static.

Static random access memory doesn’t require a refresh signal
like DRAM does.

The chips are more complex and are thus more expensive.

However, they are faster. DRAM access times come in at 60 nanoseconds (ns) or more; SRAM has access times as fast as 10ns.

SRAM is often used for cache memory.

ROM

ROM stands for read-only memory.

It is called read-only because the original form of this memory could not be written to.

Once information had been written to the ROM, it couldn’t be changed. ROM is normally used to store the computer’s BIOS, because this information normally does not change very often.

The system ROM in the original IBM PC contained the power-on self-test (POST), Basic Input/Output System (BIOS), and cassette BASIC.

Later IBM computers and compatibles included everything but the cassette BASIC.

The system ROM enables the computer to “pull itself up by its bootstraps,” or boot (start the operating system).

Through the years, different forms of ROM were developed that could be altered.

The first generation was the programmable ROM (PROM), which could be written to for the first time in the field, but then no more.

Following the PROM came erasable PROM (EPROM), which was able to be erased using ultraviolet light and subsequently reprogrammed.

These days, our flash memory is a form of electrically erasable PROM (EEPROM), which does not require UV light, but rather a slightly higher than normal electrical pulse, to erase its contents.

CMOS

CMOS is a special kind of memory that holds the BIOS configuration settings.

CMOS memory is powered by a small battery, so the settings are retained when the computer is shut off.

The BIOS starts with its own default information and then reads information from the CMOS, such as which hard drive types are configured for this computer to use, which drive(s) it should search for boot sectors, and so on.

Any conflicting information read from the CMOS overrides the default information from the BIOS.

CMOS memory is usually not upgradable in terms of its capacity and is very often integrated into the modern BIOS chip.

Identifying Purposes and Characteristics of Memory

“More memory, more memory, I don’t have enough memory!” Today, memory is one of the most popular, easy, and inexpensive ways to upgrade a computer.

As the computer’s CPU works, it stores information in the computer’s memory.

The rule of thumb is that the more memory a computer has, the faster it will operate.

To identify memory within a computer, look for several thin rows of small circuit boards sitting vertically, packed tightly together near the processor.

Location of memory within a system

Parity

Parity checking is a rudimentary error-checking scheme that lines up the chips in a column and divides them into an equal number of bits, numbered starting at 0.

All the number n bits, one from each chip, form a numerical set. If even parity is used, for example, the number of bits in the set is counted up, and if the total comes out even, then the parity bit is set to 0, because the count is already even. If it comes out odd, then the parity bit is set to 1 to even up the count.

You can see that this is effective only for determining if there was a blatant error in the set of bits, but there is no indication as to where the error is and how to fix it. This is error checking, not error correction.
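The even-parity scheme described above can be sketched in a few lines (the function name is our own, for illustration):

```python
# Even parity: the parity bit is chosen so that the total count of 1s across
# the data bits plus the parity bit comes out even.
def even_parity_bit(bits):
    return sum(bits) % 2   # 0 if the count is already even, 1 to even it up

data = [1, 0, 1, 1, 0, 1, 0, 1]     # five 1s: an odd count
p = even_parity_bit(data)
assert p == 1                        # parity bit set to 1 to even up the count

# On a later read, recompute over data plus parity; a nonzero result flags an
# error but gives no clue which bit flipped: checking, not correction.
assert (sum(data) + p) % 2 == 0
```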

Finding an error can lock up the entire system and display a memory parity error. Enough of these errors and you need to replace the memory.

If that doesn’t fix the problem, good luck. In the early days of personal computing, almost all memory was parity-based. Compaq was one of the first manufacturers to employ non-parity RAM in their mainstream systems.

As quality has increased over the years, parity checking in the RAM subsystem has become rarer.

If parity checking is not supported, there will generally be fewer chips per module, usually one less per column of RAM.

The next step in the evolution of memory error detection is known as Error Checking and
Correcting (ECC).

If memory supports ECC, check bits are generated and stored with the data. An algorithm is performed on the data and its check bits whenever the memory is accessed. If the result of the algorithm is all zeros, then the data is deemed valid and processing
continues.

ECC can detect single- and double-bit errors and actually correct single-bit errors.
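Real ECC memory implementations vary, but the classic Hamming(7,4) code illustrates the single-bit-correction idea: check bits are stored with the data, and recomputing them on access yields a "syndrome" that is all zeros for valid data and otherwise points at the flipped bit. (This sketch is ours; commercial ECC adds an extra check bit to also detect double-bit errors.)

```python
# Hamming(7,4): 4 data bits gain 3 check bits at positions 1, 2, and 4.
def hamming74_encode(d):                 # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]              # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]              # covers positions 2, 3, 6, 7
    p3 = d[1] ^ d[2] ^ d[3]              # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_correct(code):
    c = code[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3      # 0 = valid; else 1-based error position
    if syndrome:
        c[syndrome - 1] ^= 1             # flip the offending bit back
    return c, syndrome

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                             # simulate a single-bit memory error
fixed, pos = hamming74_correct(word)
assert pos == 5 and fixed == hamming74_encode([1, 0, 1, 1])
```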

In the preceding sections, we outlined the four major types of computer memory—DRAM, SRAM, ROM, and CMOS—as well as memory packaging.


Identifying Purposes and Characteristics of Processors


The role of the CPU, or central processing unit, is to control and direct all the activities of the computer using both external and internal buses.

It is a processor chip consisting of an array of millions of transistors.

Older CPUs are generally square, with contacts arranged in a Pin Grid Array (PGA).

Prior to 1981, chips were found in a rectangle with two rows of 20 pins known as a Dual Inline Package (DIP).

There are still integrated circuits that use the DIP form factor.

However, the DIP form factor is no longer used for PC CPUs.

Most CPUs use either the PGA or the Single Edge Contact Cartridge (SECC) form factor.

SECC is essentially a PGA-type socket on a special expansion card.

As processor technology grows and motherboard real estate stays the same, more must
be done with the same amount of space.

To this end, the Staggered PGA (SPGA) layout was developed.

An SPGA package arranges the pins in what appears to be a checkerboard pattern, but if you angle the chip diagonally, you’ll notice straight rows, closer together than the right-angle rows and columns of a PGA.

This feature allows a higher pin count per area.

You can easily identify which component inside the computer is the CPU because it is a large square lying flat on the motherboard with a very large heat sink and fan.

Or if the CPU is installed in a Slot 1 motherboard, it is a large 1⁄2-inch-thick expansion card with a large heat sink and fan integrated into the package.

It is located away from the expansion cards.

Notice how prominent the CPU is.

The location of a CPU inside a typical computer

Hyperthreading

This term refers to Intel’s Hyper-Threading Technology (HTT).

HTT is a form of simultaneous multithreading (SMT).

SMT takes advantage of a modern CPU’s superscalar architecture.

Superscalar processors are able to have multiple instructions operating on separate data in parallel.

HTT-capable processors appear to the operating system to be two processors.

As a result, the operating system can schedule two processes at the same time, as in the case of symmetric multiprocessing (SMP), where two or more processors use the same system resources.

In fact, the operating system must support SMP in order to take advantage of HTT.

If the current process stalls because of missing data caused by, say, cache or branch prediction issues, the execution resources of the processor can be reallocated for a different process that is ready to go, reducing processor downtime.

Multicore

A processor that exhibits a multicore architecture has two completely separate processors
in the same package.

Whether there are multiple dies in the same package or a single die contains the equivalent circuitry of multiple processors, the operating system can treat the single processor as if it were two separate processors.

As with HTT, the operating system must support SMP.

In addition, SMP is not an enhancement if the applications run on the SMP system are not written for parallel processing.

Dual-core processors are the most common specific case of multicore technology.

Throttling

CPU throttling, or clamping, is the process of controlling how much CPU time is spent on an application. By controlling how individual applications use the CPU, all applications are treated more fairly.

The concept of application fairness becomes a particular issue in server environments, where each application could represent the efforts of a different user.

Thus, fairness to applications becomes fairness to users, the real customers.

Clients of today’s terminal servers benefit from CPU throttling.


Microcode

Microcode is the set of instructions (known as an instruction set) that make up the various microprograms that the processor executes while carrying out its various duties.

The Multimedia Extensions (MMX) microcode is a specialized example of a separate microprogram that carries out a particular set of functions.

Microcode is at a much lower level than the code that makes up application programs.

Each instruction in an application will end up being represented by many microinstructions, on average.

The MMX instruction set is incorporated into most modern CPUs from Intel and others. MMX came about as a way to take much of the multimedia processing off the CPU’s hands, leaving the processor to other tasks.

Think of it as sort of a coprocessor for multimedia, much like the floating-point unit (FPU) is
a math coprocessor.

Overclocking

Overclocking your CPU offers increased performance, on par with a processor designed to operate at the overclocked speed.

However, unlike with a processor designed to run that fast, you must make special arrangements to ensure that an overclocked CPU does not destroy itself from the increased heat levels.

An advanced cooling mechanism, such as liquid cooling, might be necessary to avoid losing the processor and other components.

Cache

Cache is a very fast chip memory that is used to hold data and instructions that are most likely to be requested next by the CPU.

The cache located on the CPU is known as L1 cache and is generally smaller in comparison to L2 cache, which is located on the motherboard.

When the CPU requires outside information, it believes it is requesting that information from RAM.

The cache controller, however, intercepts the request and consults its tag RAM to discover if the requested information is already cached, either at L1 or L2.

If not, a cache miss is recorded and the information is brought back from the much slower RAM, but this new information sticks to the L1 and L2 cache on its way to the CPU from RAM.
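The controller's hit/miss behavior can be sketched with a toy lookup table standing in for the tag RAM (class and names are ours, purely illustrative):

```python
# A toy cache-controller lookup: intercept a memory request, consult the tag
# store, and either report a hit or record a miss, pulling the line in from
# (slow) RAM and caching it on the way back to the CPU.
class ToyCache:
    def __init__(self):
        self.lines = {}                  # address -> data (stands in for tag RAM)
        self.hits = self.misses = 0

    def read(self, address, ram):
        if address in self.lines:        # tag store says it is already cached
            self.hits += 1
        else:                            # cache miss: fetch from slower RAM
            self.misses += 1
            self.lines[address] = ram[address]   # new data sticks to the cache
        return self.lines[address]

ram = {0x10: "data"}
cache = ToyCache()
cache.read(0x10, ram)                    # first access misses and fills the line
cache.read(0x10, ram)                    # repeat access hits
assert (cache.misses, cache.hits) == (1, 1)
```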

Voltage Regulator Module

The voltage regulator module (VRM) is the circuitry that sends a standard voltage level to the portion of the processor that is able to send a signal back to the VRM concerning the voltage level the CPU needs.

After receiving the signal, the VRM truly regulates the voltage to steadily provide the requested voltage.

Speed

The speed of the processor is generally described in clock frequency (MHz or GHz).

There can be a discrepancy between the advertised frequency and the frequency the CPU uses
to latch data and instructions through the pipeline.

This disagreement between the numbers comes from the fact that the CPU is capable of splitting the clock signal it receives from the oscillator into multiple regular signals for its own use.
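The relationship between the advertised frequency and the FSB clock is a simple multiplier. A sketch with illustrative figures (these specific numbers are our assumption, not from the text):

```python
# The advertised CPU frequency is the FSB clock times an internal multiplier
# derived from the oscillator signal. (Illustrative helper; names are ours.)
def core_clock_mhz(fsb_mhz, multiplier):
    return fsb_mhz * multiplier

# e.g., a 200MHz FSB with a x12 multiplier advertises as a 2.4GHz CPU.
assert core_clock_mhz(200, 12) == 2400
```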

32- and 64-Bit System Bus

The set of data lines between the CPU and the primary memory of the system can be 32 or 64 bits wide, among other widths.

The wider the bus, the more data that can be processed per unit of time, and hence the more work that can be performed.

Internal registers in the CPU might be only 32 bits wide, but with a 64-bit system bus, two separate pipelines can receive information simultaneously.

Firmware

Firmware is the name given to any software that is encoded into a read-only memory (ROM) chip and can be run without extra instructions from the operating system.

Most computers use firmware in some limited sense.

The best example of firmware is a computer’s CMOS setup program, which is used to set the options for the computer’s BIOS (time/date and boot options, for example).

Also, some expansion cards, such as Small Computer System Interface (SCSI) cards, use their own firmware utilities for setting up peripherals.

Jumpers and DIP Switches

The last components of the motherboard we will discuss in this section are jumpers and DIP
switches.

These two devices are used to configure various hardware options on the motherboard.
For example, some processors use different voltages (1.5, 3.3, or 5 volts).

You must set the motherboard to provide the correct voltage for the processor it is using.

You do so by changing a setting on the motherboard with either a jumper or a DIP switch. Motherboards often have either several jumpers or one bank of DIP switches.

Individual jumpers are often labeled with the moniker JPx (where x is the number of the jumper).