Monday, January 31, 2011

Samsung’s 3x DDR3 SDRAM – 4F2 or 6F2? You Be the Judge..

We recently acquired Samsung’s latest DDR3 SDRAM, allegedly a 3x-nm part. When we did a little research, we found that the package markings K4B2G0846D-HCH9 lined up with a press release from Samsung last year about their 2 Gb 3x-nm generation DRAMs. My colleague at Chipworks, Randy Torrance, popped the lid to take a look, and drafted the following discussion (which, amongst other things, raises the perennial question for us reverse engineers - how do you define a process node in real terms?). Now read on..

The first thing we did was measure the die size. This chip is 35 sq mm, compared to the previous generation 48-nm Samsung 1Gb DDR3 SDRAM, which is 28.6 sq mm. Clearly this 2 Gb die is much smaller than 2X the 48-nm 1 Gb die, so our assumption that we have a 3x nm part looks good so far.
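As a rough sanity check on that reasoning, the die sizes can be turned into an area per gigabit and an implied linear shrink. This is only a sketch: it assumes the whole die (array plus periphery) scales with the square of the feature size, which is an idealisation and not something measured on the part. The function names are made up for illustration.

```python
import math

def area_per_gbit(die_area_mm2, density_gbit):
    """Die area divided by memory density, in sq mm per Gb."""
    return die_area_mm2 / density_gbit

def implied_node_nm(old_node_nm, area_ratio):
    """ASSUMPTION: area scales with the square of the feature size."""
    return old_node_nm * math.sqrt(area_ratio)

# Figures quoted in the text above
old = area_per_gbit(28.6, 1)   # 48-nm 1 Gb die
new = area_per_gbit(35.0, 2)   # new 2 Gb die

ratio = new / old
node = implied_node_nm(48.0, ratio)

print(f"area per Gb: {old:.1f} -> {new:.1f} sq mm (ratio {ratio:.2f})")
print(f"implied node under ideal scaling: {node:.1f} nm")
```

Under that assumption the area per gigabit drops to about 0.61 of the previous generation, which corresponds to a feature size of roughly 37.5 nm, comfortably in 3x territory.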

Die Photo of Samsung 3x DDR3 SDRAM

Next we did a bevel-section of the part to take a look at the cell array. We were surprised with what we found. The capacitors are laid out in a square array instead of the more usual hexagonal pattern (see below), and the wordline (WL) and bitline (BL) pitches are both about 96 nm. The usual method of determining DRAM node is to take half the minimum WL or BL pitch. That places this DRAM at the 48-nm process node, the same as the previous Samsung generation of 48 nm. So why does the die size look like it should be a smaller technology? For this we need to look at cell size.

Plan-View TEM image of Capacitors in Samsung 3x-nm SDRAM

But before we get into that we should discuss the DRAM convention of describing the memory cell size as a multiple of the square of the minimum feature size, F (so "8F2" means a cell area of eight times F squared). Historically, DRAM cells have used an 8F2 architecture for many years. This allows for the use of a folded bitline architecture, which helps reduce noise. In order to decrease cell area, companies came out with the first 6F2 cells in 2007; this 6F2 architecture is now used by all major players in the DRAM market. The guys at ICInsights published the plot below in the latest McLean report, which nicely illustrates the progress:

DRAM Cell Size Reduction Through the Years

The 48 nm SDRAM has a cell size of ~0.014 sq µm. This new SDRAM has a cell size of 0.0092 sq µm. Clearly this cell is much smaller than the 48 nm generation. If we take the half-WL pitch as the minimum feature size (F), we get an F of 48 nm for this process. The cell area of 0.0092 sq µm is then almost exactly four times F squared (4 x 48 nm x 48 nm = 0.0092 sq µm), or 4F2. Is this the world’s first 4F2 cell? From this point of view it certainly appears so. But there are other ways of looking at this.

A 4F2 architecture is defined as having a memory cell at each and every possible location, that being each and every crossing of WL and BL, with the cell being 2F x 2F. This is in fact what we see on this Samsung DRAM, so maybe we are looking at the first 4F2 architecture. But let’s look just a bit closer to be sure.
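The "how many F squared" arithmetic is easy to reproduce. The sketch below divides each quoted cell area by F squared, taking F as 48 nm (half the 96 nm wordline pitch) for both generations; `cell_factor` is just an illustrative helper name, and the 48-nm cell area is the approximate figure given above.

```python
def cell_factor(cell_area_um2, f_nm):
    """Return n such that cell area = n * F^2, with F given in nm."""
    f_um = f_nm / 1000.0
    return cell_area_um2 / (f_um ** 2)

F_NM = 48.0  # half of the 96 nm WL pitch

old_factor = cell_factor(0.014, F_NM)    # previous 48-nm generation (approximate area)
new_factor = cell_factor(0.0092, F_NM)   # new part

print(f"48-nm part: {old_factor:.2f} F^2")
print(f"new part:   {new_factor:.2f} F^2")
```

With F fixed at 48 nm, the older cell comes out at about 6.1 F squared and the new one at about 4.0 F squared, which is why the 4F2 reading is tempting at first glance.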

We compared the poly and active layout under the array between the 48 nm SDRAM and this new one. The images are shown below. As can be seen, both have very similar layouts. The angle of the active silicon (diffusion) direction is about the same. The active areas are ovals. Each diffusion has two wordlines crossing it. There is a gap between all the active areas, such that a third WL does not cross active on this diagonal active direction.

Samsung K4B1G0846F 48nm 1 Gb DDR3 SDRAM,
Poly and Active Area Image under Cell Array

Samsung K4B2G0846D 2Gb DDR3 SDRAM,
Poly Remnants and Active Areas under Cell Array

This new DRAM clearly has a very similar cell layout to the previous one. In both cases the wordlines do not have a transistor under them at every possible location that a transistor would fit. Rather, one of every three possible transistor locations is filled with a break in the diffusion stripe. This is really a better definition of a 6F2 cell, since in a 6F2 architecture 2/3 of the WL/BL intersections are filled with storage cells. As we noted above, a 4F2 cell really should have transistors at every possible transistor location.

When we look at the pitch of the diffusions in this new DRAM, we see it is much tighter. In fact, along the WL direction the diffusion pitch is now 64 nm, whereas in the 48 nm SDRAM this pitch was 96 nm. So if you take half the minimum pitch in the chip as the node, this is a 32-nm part (ITRS 2009 still defines F as half the contacted M1 pitch, which would be 48 nm).
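To make the dependence on "which pitch?" explicit, here is a small sketch that applies the half-pitch rule to each of the pitches quoted in this post. The dictionary keys are just labels chosen for illustration.

```python
# Pitches measured on the new part, in nm, as quoted in the text
pitches_nm = {
    "wordline": 96,
    "bitline": 96,
    "diffusion (along WL)": 64,
}

# Half-pitch rule: node = pitch / 2
half_pitch_nm = {name: pitch / 2 for name, pitch in pitches_nm.items()}

# The tightest structure gives the smallest candidate node
tightest = min(half_pitch_nm, key=half_pitch_nm.get)

for name, node in half_pitch_nm.items():
    print(f"{name}: {node:.0f} nm")
print(f"smallest half-pitch: {half_pitch_nm[tightest]:.0f} nm ({tightest})")
```

The wordline and bitline pitches both give 48 nm, while the diffusion pitch gives 32 nm, so the answer depends entirely on which layer is taken as defining the node.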

So, do we have a 32 nm node, and a 6F2 architecture? Maybe. The only issue is that if we use 32 nm as F, then when we plug that into the 6F2 equation we get 0.0061 sq µm as the cell size. However, the cell size is actually 0.0092 sq µm. If we use that number and solve the same equation for F, we find that F = 39 nm. So… do we call this a 32 nm or a 39 nm node? It depends how you calculate it - either way it's a 3x!
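Both numbers in the paragraph above come from the same relation, cell area = n x F squared, used in each direction. A minimal sketch follows; the helper names are invented for this example.

```python
import math

def cell_area_um2(n, f_nm):
    """Cell area in sq um for an nF^2 cell with feature size F in nm."""
    return n * (f_nm / 1000.0) ** 2

def feature_size_nm(area_um2, n):
    """Feature size F in nm implied by a measured cell area and an nF^2 layout."""
    return math.sqrt(area_um2 / n) * 1000.0

MEASURED_AREA_UM2 = 0.0092  # cell size quoted in the text

predicted_area = cell_area_um2(6, 32.0)              # 6F2 with F = 32 nm
f_if_6f2 = feature_size_nm(MEASURED_AREA_UM2, 6)     # F if the cell is 6F2
f_if_4f2 = feature_size_nm(MEASURED_AREA_UM2, 4)     # F if the cell is 4F2

print(f"6F2 at F = 32 nm: {predicted_area:.4f} sq um")
print(f"F for a 6F2 cell of 0.0092 sq um: {f_if_6f2:.0f} nm")
print(f"F for a 4F2 cell of 0.0092 sq um: {f_if_4f2:.0f} nm")
```

The last line simply shows the other side of the argument: treating the same measured area as a 4F2 cell gives back an F of about 48 nm.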

So, although it’s a little disappointing that I don’t think we can announce the world’s first 4F2 DRAM, we can announce the world’s smallest node, 32 or 39 nm, production 6F2 DRAM.

Samsung have had to put in a few process tweaks to squeeze the cells into the much smaller area, mostly at the transistor and STI level. We’re still looking at it, so we may not have the whole story yet, but some of what we’ve seen so far is:

• Ti-? (likely TiN)-gate buried wordline transistors

• STI filled with nitride in the array

• Bitlines at the same level as peripheral transistors

Our up-coming reports will give many more details on this fascinating part.


Thursday, January 20, 2011

Common Platform Goes Gate-Last – at Last!

At the IBM/GLOBALFOUNDRIES/Samsung Common Platform Technology Forum on Tuesday, Gary Patton of IBM announced that the Platform would be moving to a gate-last high-k, metal-gate (HKMG) technology at the 20-nm node.

At the 45- and 32-nm nodes there has been a dichotomy between gate-last as embodied by Intel, TSMC, and UMC, and gate-first, promoted by the Common Platform and others such as Panasonic. (Though, to be realistic, Intel’s is the only HKMG we’ve seen so far, and the only 32-nm product.)

The split puzzled me a bit, at least for high-performance processes, since Intel have clearly shown that for PMOS, compressive stress using embedded SiGe source/drains is a really big crank that is enhanced by removal of the dummy polysilicon gate in the gate-last sequence. In fact, in their 32-nm paper at IEDM 2009 [1], the PMOS linear drive current exceeds NMOS, and the saturated drive current (Idsat) is 85% of NMOS. This trend is shown below:



Intel Drive Currents at the Different Nodes [1]

We can clearly see the gap between NMOS and PMOS drive currents narrowing from the 45-nm node onwards, that is, from the start of replacement gate (gate-last) technology.

So it seems obvious that to have high-performance PMOS, gate-last is the way to go; admittedly IBM and their allies have been using compressive nitride for PMOS, which Intel never have (at least to my knowledge), but there are limitations to that – now that contacted gate pitch has shrunk to less than 200 nm, there is not much room to get the nitride close to the channel - a problem that will increase with further shrinks.

So in a way it’s not surprising that the Platform has made the change; nitride stress is running out of steam, and gate replacement offers improved compressive stress for PMOS, and other stress techniques for NMOS (Intel builds some stress in with the gate metal).

Gary Patton said that IBM have been evaluating gate-last in parallel with gate-first since 2001, and it’s logical that they and their partners should. Both GLOBALFOUNDRIES and Samsung have published on gate-last, so there has been some evidence of checking out the parallel paths.

GLOBALFOUNDRIES PMOS and NMOS (right) Gate-Last Transistors [2]
 
Samsung Gate-Last Transistor [3]

Patton said that they selected gate-first in 2004; judging by their papers, Intel took their decision in 2003. The rationale that he put forward for the change to gate-last involved four points:
  • Density – gate-first has higher density, since gate-last requires restricted design rules (RDRs). That prevents orthogonal layout, requiring local interconnect; but at 20-nm RDRs are needed for lithography, so that advantage disappears.
  • Scaling – it’s easier to scale without having to cope with RDRs; at 20-nm there’s no choice.
  • Process simplicity – it’s obviously easier to shrink if you can keep the same process architecture, whether it be to 32- or 20-nm.
  • Power/performance – the gate-last structure allows strain closer to the channel, increasing performance; but fully contacted source/drains increase parasitic capacitance, slowing things down. According to Patton these net each other out for a high-performance process, making the gate first/last decision neutral. For low-power processes, strain is not used at the 45/32-nm nodes, so gate-first gives better power/performance metrics. At 20-nm strain has to be used for low-power, and with the need for RDRs and local interconnect, the balance shifts in favour of gate-last.

So it appears that for the Platform the equation between pure transistor performance, process convenience, and power/performance made gate-first the choice at 45/32/28-nm, but at 20-nm the balance changes to make gate-last the way to go. That was likely influenced by the adoption of immersion lithography between 65- and 45-nm, which reduced the need for RDRs.

Intel presumably did similar sums during their 45-nm development, and figured that using RDRs would save them the cost of going to wet lithography at that node, and at the same time adopting gate-last technology would give them a manufacturing advantage. (My speculation is that they had also concluded that their version of gate-last may be more complicated to start up, but would prove to be more manufacturable than struggling with the instabilities that seem to go with the gate-first work-function materials. I guess they’ve proved that!)

Interestingly, now that Intel is using immersion lithography at 32-nm, they have loosened up on the RDRs; there’s more flexibility in the layout than there appeared to be at 45-nm.

I have to congratulate the Common Platform marketing guys on putting up a live webstream of the Technology Forum – I couldn’t get to the event itself, so wouldn’t have been able to comment without it. The stream will be available until April 29, so if you want to see Gary Patton for yourself, you can.


Screen Shot of Gary Patton of IBM at the Common Platform Technology Forum

Unfortunately, talking to my journalist colleagues, no slide sets were available, even at the press conference, so watching the stream occasionally leaves you puzzled as to what’s being talked about; and as you can see from the screen shot above, the room screens were carefully blanked out for the camera. Also, the breakout sessions in the afternoon were not streamed, or if they were, not recorded for later viewing. Still, kudos to the Platform for the live stream we did have, and the pre-recorded panel sessions!

From Gary’s and other comments at the Forum, it’s clear that the first HKMG products will be launched at 32-nm, and 28-nm will be following along fairly soon after. We can’t wait to see some!

For those waiting for more details of last year’s IEDM, I will finish my review. There were 36 sessions with 212 papers, so it is not a small task to do conscientiously; the Christmas break interrupted things, and there have been distractions since (like the Forum!), but I will get there!

References:

  1. P. Packan et al., High Performance 32nm Logic Technology Featuring 2nd Generation High-k + Metal Gate Transistors, IEDM 2009, paper 28.4, pp. 659 – 662
  2. M. Horstmann et al., Advanced SOI CMOS Transistor Technologies for High-Performance Microprocessor Applications, CICC 2009, paper 8.3, pp. 149 – 152
  3. K-Y. Lim et al., Novel Stress-Memorization-Technology (SMT) for High Electron Mobility Enhancement of Gate Last High-k/Metal Gate Devices, IEDM 2010, paper 10.1, pp. 229 – 232