Posting the latest microprocessor and memory papers from JSSC, all from the last 3 years

corespirit36 · posted 2008-7-19 00:01:53


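Download them if you're interested. These JSSC papers reflect the state of the art in microprocessor design worldwide. Study them carefully and, above all, build a solid foundation, so that our country's IC capability can eventually catch up with and overtake the US and beat Japan! For each paper I give the title and abstract; read those first so you don't download papers you aren't interested in.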

1 Design and Implementation of a Configurable Heterogeneous Multicore SoC With Nine CPUs and Two Matrix Processors

Abstract-A multicore system-on-chip (SoC) has been developed for various applications (recognition, inference, measurement, control, and security) that require high-performance processing and low power consumption. This SoC integrates three types of synthesizable processors: eight CPUs (M32R), two multi-bank matrix processors (MBMX), and a controller (M32C). These processors operate at 1 GHz, 500 MHz, and 500 MHz, respectively. The three types of processors are interconnected on the chip by a high-bandwidth multi-layer system bus. The eight CPUs are connected to a common pipelined bus using a cache coherence mechanism. Additionally, a 512-kB L2 cache memory is shared by the eight CPUs to reduce internal bus traffic. A multi-bank matrix processor with 2-read/1-write calculation and background I/O operation has been adopted. The 1-GHz CPU is realized using a delay management network consisting of delay monitors that can be applied to any kind of application or process technology. Our configurable heterogeneous architecture with nine CPUs and two matrix processors reduces power consumption by 45%.

abbr_67dfc2120e0b410b91477e2aaa335cc1.pdf (3.31 MB, downloads: 127)

corespirit36 · posted 2008-7-19 00:05:35

2 Design and Implementation of the POWER6 Microprocessor

Abstract-The IBM POWER6 processor is a dual-core, 341 mm², 790-million-transistor chip fabricated in IBM’s 65 nm partially-depleted SOI process. Capable of running at frequencies up to 5 GHz in high-performance applications, it can also operate under 100 W for power-sensitive applications. Traditional power-intensive and deep-pipelining techniques used in high-frequency design were abandoned in favor of more power-efficient circuit design methodologies. The complexity and size of POWER6, together with its high operating frequency, presented a number of significant challenges for its multi-site team in completing the design on an aggressive schedule. This paper describes some of the circuit methodology and implementation innovations used in the development of POWER6, with particular emphasis on custom, synthesized, register file, and SRAM design, as well as the electrical characterizations performed in the lab.

Design and Implementation of the POWER6 Microprocessor.pdf (2.28 MB, downloads: 93)

corespirit36 · posted 2008-7-19 00:09:32

3 A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing

Abstract-A 34-million-transistor stream processor system-on-chip (SoC) for signal, image, and video processing contains 80 parallel integer ALUs organized into 16 data-parallel lanes with a 5-ALU VLIW per lane, two CPU cores, and I/Os. Implemented in a 0.13 µm CMOS technology, the sixteen 800 MHz data-parallel lanes combine to deliver 512 8-bit GOPS, 256 16-bit GOPS, or 128 billion 16-bit multiply-accumulates per second (GMACs), with a power efficiency of 82 pJ/MAC.
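A quick back-of-envelope check of these numbers (my own sketch, not from the paper): it divides each quoted peak by the total ALU-cycles per second and multiplies energy per MAC by the MAC rate. It assumes the 82 pJ/MAC figure holds at the 128 GMAC/s peak; the abstract does not give total chip power.

```python
# Back-of-envelope check of the stream-processor figures (sketch).
# Assumption: the 82 pJ/MAC efficiency holds at the 128 GMAC/s peak.

LANES = 16
ALUS_PER_LANE = 5          # 5-ALU VLIW per lane
CLOCK_HZ = 800e6           # lane clock

PEAK_8BIT_OPS = 512e9      # 8-bit GOPS
PEAK_16BIT_OPS = 256e9     # 16-bit GOPS
PEAK_MACS = 128e9          # 16-bit MACs per second
ENERGY_PER_MAC_J = 82e-12  # 82 pJ/MAC

total_alus = LANES * ALUS_PER_LANE
alu_cycles_per_s = total_alus * CLOCK_HZ

# Operations each ALU must deliver per cycle to reach the quoted peaks
ops8_per_alu_cycle = PEAK_8BIT_OPS / alu_cycles_per_s
ops16_per_alu_cycle = PEAK_16BIT_OPS / alu_cycles_per_s
macs_per_alu_cycle = PEAK_MACS / alu_cycles_per_s

# Power = energy per operation x operation rate
implied_mac_power_w = ENERGY_PER_MAC_J * PEAK_MACS

print(total_alus, ops8_per_alu_cycle, ops16_per_alu_cycle, macs_per_alu_cycle)  # 80 8.0 4.0 2.0
print(f"{implied_mac_power_w:.1f} W")
```

The 8 / 4 / 2 per-ALU-per-cycle pattern fits ALUs working on packed sub-words (four 8-bit or two 16-bit values), with a MAC counted as two operations. That reading is an inference; the abstract does not state it. The implied MAC power comes out to roughly 10.5 W.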

A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing.pdf (2.98 MB, downloads: 87)

corespirit36 · posted 2008-7-19 00:13:22

6 Exploring Variability and Performance in a Sub-200-mV Processor

Abstract-In this study, we explore the design of a subthreshold processor for use in ultra-low-energy sensor systems. We describe an 8-bit subthreshold processor that has been designed with energy efficiency as the primary constraint. The processor, which is functional below Vdd = 200 mV, consumes only 3.5 pJ/inst at Vdd = 350 mV and, under a reverse body bias, draws only 11 nW at Vdd = 160 mV. Process and temperature variations in subthreshold circuits can cause dramatic fluctuations in performance and energy consumption and can lead to robustness problems. We investigate the use of body biasing to adapt to process and temperature variations. Test-chip measurements show that body biasing is particularly effective in subthreshold circuits and can eliminate performance variations with minimal energy penalties. Reduced performance is also problematic at low voltages, so we investigate global and local techniques for improving performance while maintaining energy efficiency.

Exploring Variability and Performance in a Sub-200-mV Processor.pdf (1.26 MB, downloads: 46)

corespirit36 · posted 2008-7-19 00:17:27

4 An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS

Abstract-This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8 × 10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMACs), which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and a 1.07 V supply.
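The TFLOPS figure follows from the tile count with simple arithmetic. A sketch, assuming each FPMAC completes one single-precision multiply-add per cycle, counted as 2 FLOPs. The abstract does not state its FLOP-counting convention, so treat that assumption with caution.

```python
# Peak-throughput arithmetic for the 80-tile array (sketch).
# Assumption: each FPMAC retires one multiply-add per cycle, counted as 2 FLOPs.

TILES = 80
FPMACS_PER_TILE = 2
FLOPS_PER_FPMAC_PER_CYCLE = 2

def peak_flops(clock_hz: float) -> float:
    """Theoretical peak single-precision FLOP/s of the whole array."""
    return TILES * FPMACS_PER_TILE * FLOPS_PER_FPMAC_PER_CYCLE * clock_hz

peak_at_4ghz = peak_flops(4.0e9)      # design target clock
peak_at_4_27ghz = peak_flops(4.27e9)  # measured operating point

# Lower bounds, using the abstract's "over 1.0 TFLOPS" while dissipating 97 W
gflops_per_w = 1.0e12 / 97.0 / 1e9
fraction_of_peak = 1.0e12 / peak_at_4_27ghz

print(f"peak @ 4 GHz:    {peak_at_4ghz / 1e12:.2f} TFLOPS")
print(f"peak @ 4.27 GHz: {peak_at_4_27ghz / 1e12:.2f} TFLOPS")
print(f">= {gflops_per_w:.1f} GFLOPS/W, >= {fraction_of_peak:.0%} of peak")
```

Under this assumption the peak at 4.27 GHz is about 1.37 TFLOPS. The measured "over 1.0 TFLOPS" would then be at least roughly 73% of peak, at better than about 10 GFLOPS/W.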

An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS.pdf (4.54 MB, downloads: 55)

corespirit36 · posted 2008-7-19 00:20:19

5 An 80 nm 4 Gb/s/pin 32 bit 512 Mb GDDR4 Graphics DRAM With Low Power and Low Noise Data Bus Inversion

Abstract-A 4 Gb/s/pin 32 bit 512 Mb GDDR4 (Graphics Double Data Rate 4) SDRAM was implemented in an 80 nm CMOS process. It employs data bus inversion (DBI) coding to overcome the bottlenecks of parallel single-ended signaling: I/O power consumption, power-supply noise, and crosstalk. The DBI AC and DC modes are combined into a single circuit by eliminating the feedback path of a conventional DBI AC circuit while still achieving high-speed operation. The proposed DBI circuit uses an analog majority voter that is insensitive to mismatch, keeping area and delay small. Ron tuning further improves the voltage and timing margins by adding a user-supplied offset to the auto-calibrated Ron. In addition, a dual duty cycle corrector (DCC) reduces duty error and jitter by averaging the outputs of two DCCs. Measured results show that DBI DC coding reduces the peak-to-peak jitter from 65.5 ps to 44.5 ps and the voltage fluctuation from 183 mV to 115 mV at a data rate of 4 Gb/s at 2 V.
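The logical behavior of DBI coding can be modeled in a few lines. In the paper, the decision is made by an analog majority voter, and both modes share one circuit. The sketch below only models the per-byte decision with a digital popcount. It assumes the common conventions: 8 data bits per DBI pin, invert when more than half the bits would be 0 (DC mode) or would toggle relative to the previous word (AC mode), and flag `True` meaning "inverted". All function names are illustrative.

```python
from typing import List, Tuple

WIDTH = 8  # one byte lane; the DBI flag travels on a separate pin

def popcount(x: int) -> int:
    return bin(x).count("1")

def dbi_dc(byte: int) -> Tuple[int, bool]:
    """DC mode: cap the number of 0s driven (assumed here: 0s draw DC
    current on VDDQ-terminated I/O)."""
    zeros = WIDTH - popcount(byte)
    if zeros > WIDTH // 2:              # majority decision
        return (~byte) & 0xFF, True
    return byte, False

def dbi_ac(byte: int, prev_sent: int) -> Tuple[int, bool]:
    """AC mode: cap how many lanes toggle relative to the last word sent."""
    toggles = popcount(byte ^ prev_sent)
    if toggles > WIDTH // 2:
        return (~byte) & 0xFF, True
    return byte, False

def dbi_decode(sent: int, inverted: bool) -> int:
    """Receiver side: undo the inversion when the DBI flag is set."""
    return (~sent) & 0xFF if inverted else sent

def encode_burst_ac(data: List[int], initial: int = 0xFF) -> List[Tuple[int, bool]]:
    """Encode a burst in AC mode; `initial` is an assumed idle bus state."""
    prev = initial
    out = []
    for b in data:
        sent, flag = dbi_ac(b, prev)
        out.append((sent, flag))
        prev = sent                     # compare against what was actually driven
    return out

print(encode_burst_ac([0x00, 0xFF, 0x0F, 0xF0]))
# [(255, True), (255, False), (15, False), (15, True)]
```

After encoding, at most 4 of the 8 data lanes are 0 (DC mode) or toggle (AC mode). This bound is what limits I/O current, supply noise, and simultaneous-switching crosstalk.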

abbr_eff70d192f574648dc96162ea6dbd60e.pdf (2.98 MB, downloads: 113)

corespirit36 · posted 2008-7-19 00:24:56

7 Implementation of an 8-Core, 64-Thread, Power-Efficient SPARC Server on a Chip

Abstract-The second in the Niagara series of processors (Niagara2) from Sun Microsystems is based on the power-efficient chip multi-threading (CMT) architecture optimized for Space, Watts (Power), and Performance (SWaP) [SWaP rating = Performance / (Space × Power)]. It doubles the throughput performance and performance/watt, and provides a 10X improvement in floating-point throughput performance compared with UltraSPARC T1 (Niagara1). There are two 10 Gb Ethernet ports on chip. Niagara2 has eight SPARC cores, each supporting concurrent execution of eight threads, for 64 threads total. Each SPARC core has a Floating Point and Graphics unit and an advanced Cryptographic unit that provides enough bandwidth to run the two 10 Gb Ethernet ports encrypted at wire speed. There is a 4 MB Level 2 cache on chip. Each of the four on-chip memory controllers controls two FBDIMM channels. Niagara2 has 503 million transistors on a 342 mm² die packaged in a flip-chip glass-ceramic package with 1831 pins. The chip is built in Texas Instruments’ 65 nm 11LM triple-Vt CMOS process. It operates at 1.4 GHz at 1.1 V and consumes 84 W.
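SWaP is a plain ratio (Sun's definition: Performance / (Space × Power)). This minimal sketch uses made-up server figures. The units (rack units for space, watts for power) and the benchmark score are assumptions for illustration only.

```python
# Sketch of the SWaP metric; the server figures below are hypothetical.

def swap_rating(performance: float, space: float, power_w: float) -> float:
    """SWaP = Performance / (Space x Power). Higher is better."""
    if space <= 0 or power_w <= 0:
        raise ValueError("space and power must be positive")
    return performance / (space * power_w)

# Thread count quoted in the abstract: 8 cores x 8 threads per core.
CORES = 8
THREADS_PER_CORE = 8
total_threads = CORES * THREADS_PER_CORE

# Two hypothetical servers with the same benchmark score:
# halving the rack space at equal power doubles the SWaP rating.
server_a = swap_rating(performance=1000.0, space=2.0, power_w=400.0)
server_b = swap_rating(performance=1000.0, space=1.0, power_w=400.0)

print(total_threads, server_a, server_b)  # 64 1.25 2.5
```

Because space and power both sit in the denominator, a chip that integrates the NICs, crypto, and memory controllers can raise SWaP even at equal raw performance, by shrinking the box and its power budget.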

Highly recommended!

Implementation of an 8-Core, 64-Thread, Power-Efficient SPARC Server on a Chip.pdf (3.97 MB, downloads: 94)

corespirit36 · posted 2008-7-19 00:28:25

8 A 70 nm 16 Gb 16-Level-Cell NAND flash Memory

Abstract-A 16 Gb 16-level-cell (16LC) NAND flash memory using a 70 nm design rule has been developed [1]. This 16LC NAND flash memory stores 4 bits per cell, which doubles the bit density compared with 4-level-cell (4LC) NAND flash and quadruples it compared with single-bit (SLC) NAND flash memory at the same design rule. A new programming method suppresses the floating-gate coupling effect and enables the narrow Vth distributions required for 16LC. The cache-program function is achieved without any additional latches. Optimization of the programming sequence achieves 0.62 MB/s programming throughput. This 16-level NAND flash memory technology reduces the cost per bit and improves the memory density even further.
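The density claims follow from bits per cell = log2(levels). The sketch also shows why narrow Vth distributions matter. It rests on a simplifying assumption (not from the paper): all levels share the same fixed threshold-voltage window with even spacing. Under that assumption, 16 levels leave 15 gaps versus 3 for 4LC, so each level's budget shrinks about 5×.

```python
def bits_per_cell(levels: int) -> int:
    """An n-level cell stores log2(n) bits; n must be a power of two >= 2."""
    if levels < 2 or levels & (levels - 1):
        raise ValueError("levels must be a power of two >= 2")
    return levels.bit_length() - 1

def level_spacing(window_v: float, levels: int) -> float:
    """Gap between adjacent Vth levels when `levels` states share a fixed
    threshold-voltage window evenly (simplifying assumption)."""
    return window_v / (levels - 1)

LEVELS_SLC, LEVELS_4LC, LEVELS_16LC = 2, 4, 16

print(bits_per_cell(LEVELS_16LC) / bits_per_cell(LEVELS_4LC))  # 2.0 (density vs 4LC)
print(bits_per_cell(LEVELS_16LC) / bits_per_cell(LEVELS_SLC))  # 4.0 (density vs SLC)
# How much tighter each level must be vs 4LC, for the same total window
print(level_spacing(1.0, LEVELS_4LC) / level_spacing(1.0, LEVELS_16LC))
```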

A 70 nm 16 Gb 16-Level-Cell NAND flash Memory.pdf (2.5 MB, downloads: 92)

corespirit36 · posted 2008-7-19 00:36:43

9 A Power-Efficient High-Throughput 32-Thread SPARC Processor

Abstract-This first generation of “Niagara” SPARC processors implements a power-efficient Chip Multi-Threading (CMT) architecture that maximizes overall throughput performance for commercial workloads. The target performance is achieved by exploiting high bandwidth rather than high frequency, thereby reducing hardware complexity and power. The UltraSPARC T1 processor combines eight four-threaded 64-b cores, a floating-point unit, a high-bandwidth interconnect crossbar, a shared 3-MB L2 cache, four DDR2 DRAM interfaces, and a system interface unit. Power and thermal monitoring techniques further enhance CMT performance benefits, increasing overall chip reliability. The 378-mm² die is fabricated in Texas Instruments’ 90-nm CMOS technology with nine layers of copper interconnect. The chip contains 279 million transistors and consumes a maximum of 63 W at 1.2 GHz and 1.2 V. Key functional units employ special circuit techniques to provide the high bandwidth required by a CMT architecture while optimizing power and silicon area. These include a highly integrated integer register file, a high-bandwidth interconnect crossbar, the shared L2 cache, and the IO subsystem. Key aspects of the physical design methodology are also discussed.

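Like the previous paper, this one covers a Sun server processor; the processor described here is the generation before the one in that paper. Comparing the two shows how much their processor technology advanced in just one year! The paper itself is also very well written: the prose is fluent and reads smoothly from start to finish, so I think it will help your English too. Also highly recommended.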

Go, China's Loongson (Godson)! With so much IC talent coming up in China, you can surely surpass the American chips!

A Power-Efficient High-Throughput 32-Thread SPARC Processor.pdf (2.42 MB, downloads: 92)

corespirit36 · posted 2008-7-19 00:49:05

10 A 256 kb 65 nm 8T Subthreshold SRAM Employing Sense-Amplifier Redundancy

Abstract-Aggressively scaling the supply voltage of SRAMs greatly reduces their active and leakage power, a dominant portion of the total power in modern ICs. Hence, energy-constrained applications, where performance requirements are secondary, benefit significantly from an SRAM that offers read and write functionality at the lowest possible voltage. However, bit-cells and architectures achieving very high density conventionally fail to operate at low voltages. This paper describes a high-density SRAM in 65 nm CMOS that uses an 8T bit-cell to achieve a minimum operating voltage of 350 mV. Buffered read is used to ensure read stability, and peripheral control of both the bit-cell supply voltage and the read-buffer’s foot voltage enables sub-Vt write and read without degrading the bit-cell’s density. The plaguing area-offset tradeoff in modern sense amplifiers is alleviated using redundancy, which reduces read errors by a factor of five compared with device up-sizing. At its lowest operating voltage, the entire 256 kb SRAM consumes 2.2 µW in leakage power.
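To see why redundancy can beat up-sizing at equal area, here is a toy analytic model. It is my own simplification, not the paper's analysis. It assumes a zero-mean Gaussian sense-amp offset with std `sigma` and a read that fails when |offset| exceeds the available bitline signal `margin`. In the redundant case, n minimum-size SAs per column are built and the best one is selected after test, so the column fails only if all n fail. In the up-sized case, one SA with n× the device area is built, with Pelgrom scaling assumed, so sigma shrinks by √n.

```python
import math

def p_exceed(margin: float, sigma: float) -> float:
    """P(|offset| > margin) for a zero-mean Gaussian offset with std sigma."""
    return math.erfc(margin / (sigma * math.sqrt(2.0)))

def fail_redundant(margin: float, sigma: float, n: int) -> float:
    """n minimum-size sense amps; column fails only if every one of them
    exceeds the margin (ideal post-test selection assumed)."""
    return p_exceed(margin, sigma) ** n

def fail_upsized(margin: float, sigma: float, n: int) -> float:
    """One sense amp with n x the area; Pelgrom scaling: sigma / sqrt(n)."""
    return p_exceed(margin, sigma / math.sqrt(n))

# Same total device area (n = 2) at several margins, in units of sigma
for k in (1.0, 2.0, 3.0):
    r = fail_redundant(k, 1.0, 2)
    u = fail_upsized(k, 1.0, 2)
    print(f"margin = {k:.0f} sigma: redundant {r:.2e}, upsized {u:.2e}, ratio {u / r:.1f}")
```

In this model the advantage of redundancy grows as the margin widens relative to sigma. The paper's measured factor of five depends on its actual circuits and statistics, which this sketch does not reproduce.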

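Memory design is just as challenging, because a memory is not only an array of storage cells. It also needs a special kind of amplifier (I'd guess many IC designers are interested in amplifiers) called a sense amplifier. The sense amplifier amplifies and does first-stage processing of the data signal read out of the array, which is usually buried in a lot of background noise. Modern memories transfer data at very high rates, so the sense amplifier obviously has to be fast enough, and it must also keep its power under control. The reason is simple: power consumption in modern ULSI has become a very hard problem, so power dissipation must be minimized while still meeting the functional requirements.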

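This paper covers the latest sense-amplifier techniques, and I strongly recommend it to anyone with the background to study it! For sense-amplifier basics, search IEEE Xplore. To get into the IEEE database you can use a proxy; for details, see: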

http://www.eetop.cn/bbs/thread-117859-1-10.html

A 256 kb 65 nm 8T Subthreshold SRAM Employing Sense-Amplifier Redundancy.pdf (1.45 MB, downloads: 152)
