Santimoy Mandal, Shyam Sundar Prasad / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com

Vol. 2, Issue 5, September-October 2012, pp. 1285-1289

Double Pass-Transistor Logic for High Performance, Low Latency Wave Pipeline Circuit

Santimoy Mandal
Dept. of Electronics and Communication Engineering, RVS College of Engineering and Technology, Jamshedpur, India

Shyam Sundar Prasad
Dept. of Electronics and Communication, National Institute of Technology, Jamshedpur, India

Abstract: High throughput and low latency designs are required in modern high performance systems, especially for signal processing applications. Existing logic families cannot provide both simultaneously. We propose Double Pass Transistor Logic (DPL), which can be used as a universal logic to provide finest grain pipelining without affecting overall latency or increasing the area. It does not require any special process steps and hence can be realized in a normal process technology, unlike the CPL proposed by Yano et al. [2], which uses threshold voltage adjustment of selected devices. The design procedure is described for (a) low latency, (b) high throughput and (c) low area requirements. In addition to these advantages, it is envisioned that DPL designs can also be used to build ultra-high speed pipelined systems without pipelining latches, viz., wave pipelined digital systems, where the achievable throughput is beyond that permitted by the delay of a pipeline stage.

I. INTRODUCTION

High speed adders and multipliers are required to meet the demands of signal processing and multimedia applications. Wave pipelining, or "maximal rate pipelining" [1], is a design method that can increase the throughput of a combinational circuit. In conventional pipelining, the combinational circuit is broken into smaller blocks or pipeline stages, and synchronizing elements such as D flip-flops are used as storage elements. The maximum speed is limited by the number of pipe stages, the size of the pipe stages and the complexity of the clock distribution network.

In the wave pipelining approach, flip-flops are not used as storage elements between pipeline stages. Instead, the internal capacitances of the gates store the intermediate values [1] [3] [4]. Eliminating the storage elements reduces area and power considerably. It also eliminates clock distribution and clock skew problems, as no clock signal is required within the combinational block. New inputs can be applied to the circuit before the outputs are available, effectively allowing multiple waves of data to propagate coherently through the circuit.

Wave pipelining requires all paths from the inputs to the outputs to be balanced. This is achieved by inserting active delay buffers in the paths that have fewer gates than the longest path from input to output. The rough tuning method [6] ensures that the gate count along all paths is the same. However, a rough-tuned circuit is still not balanced, as different fan-outs are bound to produce different delays. The absence of synchronizing elements in the wave pipelined circuit could lead to collisions between adjacent waves of data. The clock period should be chosen so that the waves do not collide with each other, giving each gate enough time to complete its task.

The pipe stages in a wave pipelined circuit are composed of single gates, and the load capacitances of the gates are used for storage. The load capacitance may vary for different gates in the same stage depending on the fan-out. Different load capacitances result in different rise and fall times for the driver gates. This delay variation is reduced by fine tuning [5] [6], which involves sizing the transistors in the output inverters of the driver gate to balance the delay. Once fine tuned, the circuit can be clocked at its maximum speed, limited only by the remaining path delay difference.

Section II discusses the timing constraints of wave pipelining and the features that basic gates need for wave pipelining. Section III gives an overview of the existing logic styles for wave pipelining; the limitations of these logic styles and of the tuning methods are also discussed. Section IV presents the performance of basic gates highly suitable for wave pipelining. The power analysis of an 8-bit multiplier is presented in Section V. Section VI presents the conclusion and further research directions.
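The rough tuning step described in the introduction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the netlist format (a dictionary from gate name to its fan-in signals), the function name rough_tune and the "OUT" sink label are hypothetical and do not come from the paper or from reference [6]. The sketch assigns each gate a level equal to its longest gate count from the primary inputs and reports how many delay buffers each connection would need so that every input-to-output path passes through the same number of gates.

```python
from functools import lru_cache


def rough_tune(netlist, outputs):
    """Return {(driver, sink): n_buffers} needed to equalize gate counts.

    netlist maps each gate name to the list of signals driving it. Any name
    that is not a key of netlist is treated as a primary input (level 0).
    """

    @lru_cache(maxsize=None)
    def level(node):
        # Level = number of gates on the longest path from any primary input.
        if node not in netlist:
            return 0
        return 1 + max(level(src) for src in netlist[node])

    buffers = {}
    # Pad every connection so that it spans exactly one logic level.
    for sink, sources in netlist.items():
        for src in sources:
            gap = level(sink) - level(src) - 1
            if gap > 0:
                buffers[(src, sink)] = gap

    # Pad shallow outputs up to the depth of the deepest output.
    depth = max(level(out) for out in outputs)
    for out in outputs:
        gap = depth - level(out)
        if gap > 0:
            buffers[(out, "OUT")] = gap  # "OUT" is a hypothetical sink label
    return buffers


# Hypothetical three-gate example with primary inputs a, b and c.
example = {"g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g2", "a"]}
plan = rough_tune(example, outputs=["g3", "g1"])
print(plan)
```

Equal gate count is only the rough part of the balancing: as noted above, fan-out differences still leave delay mismatches that fine tuning has to remove.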

II. TIMING CONSTRAINTS IN WAVE PIPELINING

Wave pipelined circuits can be clocked at a much higher frequency than conventionally pipelined circuits because their maximum rate is limited only by the path delay difference instead of the maximum path delay. The minimum clock period for a wave pipelined circuit [7] can be represented by

Tcp ≥ MAX [∆Tp + 2∆C + Tsh + Trf, ∆Tx + ∆C + Tms + Trf]

where Tcp is the clock period of the circuit, ∆Tp is the difference between the longest and shortest path delays in the circuit, ∆C is the worst case clock skew, Tsh is the setup plus hold time for the registers, Trf is the worst case rise/fall time at the last logic stage, ∆Tx is the propagation delay of the longest path from the input to signal X at any intermediate node, and Tms is the minimum time that X must be stable for the next stage of logic to operate correctly. The operating speed is limited by the difference in delay between the shortest and the longest path and not by the total delay of the circuit as in conventional pipelining. The goal of the design process is to reduce ∆Tp and ∆Tx as much as possible; known methods exist for reducing the other parameters.
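As a quick numerical illustration of the bound above, the following Python sketch evaluates both terms of the MAX expression and returns the larger one. The function name and the parameter values are hypothetical; only the formula itself is taken from the text.

```python
def min_clock_period(d_tp, d_tx, d_c, t_sh, t_ms, t_rf):
    """Lower bound on the clock period Tcp (all arguments in the same unit)."""
    # First term: path delay difference, clock skew and register setup/hold.
    register_term = d_tp + 2 * d_c + t_sh + t_rf
    # Second term: constraint at an intermediate node X.
    internal_term = d_tx + d_c + t_ms + t_rf
    return max(register_term, internal_term)


# Illustrative values in picoseconds (hypothetical, not from the paper).
t_cp = min_clock_period(d_tp=40, d_tx=50, d_c=10, t_sh=30, t_ms=20, t_rf=45)
print(f"Tcp >= {t_cp} ps")  # prints "Tcp >= 135 ps"
```

With these numbers the register term (135 ps) dominates the internal-node term (125 ps), so shrinking ∆Tp further would directly raise the achievable clock rate.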

III. EXISTING LOGIC STYLES FOR WAVE PIPELINING

For a balanced wave pipelined circuit, the gates should not have input-dependent or fan-out-dependent delay, and all the gates in a particular logic family should have the same delay. Conventional static CMOS is the logic style most preferred by designers because of its high reliability. A 2-input NAND gate is shown in Fig. 1. The architecture of the basic static CMOS gates results in input-dependent and functionality-dependent delays. Researchers have therefore proposed several design styles that satisfy the timing constraints of wave pipelining.

Fig. 1. Different CMOS NAND logic style

A. Dual rail logic styles

Normal Process Complementary Pass Transistor Logic (NPCPL) [9] and Wave pipeline Transmission Gate Logic (WTGL) [3] are dual rail logic styles used for wave pipelining. NPCPL and WTGL are based on pass transistors, whereas DRSCMOS is based on static CMOS. In NPCPL, a basic building block is used to develop all basic gates by properly choosing the input signals Ai, Aj and B (for an AND/NAND gate (XY/XY), Ai = X, Aj = Y and B = Y). The poor conduction of logic 1 by NMOS transistors in NPCPL results in voltage degradation and poor noise margin. WTGL gates use transmission gates to obtain full logic swing and better noise margin, but they incur static power dissipation because of the NMOS devices used. WTGL and NPCPL are fast because of their high logic functionality and the low input capacitance of separate circuit paths for each possible input combination, thus eliminating pass transistors.

Dual rail logic styles are multi-functional in nature and all the basic gates have the same delay. Systems designed with dual rail styles can be rough tuned because of the similarity in the basic architecture and the availability of "DELAY" gates. All the gates have output inverters for fine tuning purposes. WTGL and NPCPL have unbalanced input capacitances, resulting in complex fine tuning.

B. Double Pass transistor logic (DPL)

Suzuki et al. [8] proposed the double pass transistor logic [9], which overcomes the problems of CPL, namely voltage degradation and poor noise margin. DPL gates give improved circuit performance at reduced supply voltage because they use both NMOS and PMOS transistors. DPL gates are symmetrical, in that the load in any DPL gate is distributed equally among the inputs; the DPL XOR/XNOR gate is perfectly symmetrical. The symmetrical arrangement and the double transmission property suggest that DPL gates will perform very efficiently in wave pipelined circuits.

The PMOS and NMOS transistors are arranged so that a dual current path is set up for each input combination, resulting in the smallest equivalent resistance for DPL gates compared to the other logic styles. In WTGL there are also two paths, but the same input is passed along both. In DPL the inputs on the two paths are different, thereby distributing the load among the inputs. DPL was claimed to be the most energy efficient of the discussed logic styles by Uming Ko et al. [10]. The symmetrical input loading, double transmission property and energy efficiency of DPL gates make the DPL logic family the best suited logic style for wave pipelining.
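The dual current path argument can be illustrated with a first-order resistance estimate. The sketch below assumes that each conducting transistor behaves as a fixed on-resistance, which is a strong simplification, and the resistance values are hypothetical rather than taken from the paper. Under that assumption two conducting paths in parallel always present a lower resistance to the output node than either path alone.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


# Hypothetical on-resistances in ohms for one NMOS and one PMOS path.
R_NMOS = 8.0e3
R_PMOS = 16.0e3

single_path = R_NMOS                    # one conducting path only
dual_path = parallel(R_NMOS, R_PMOS)    # both paths conducting together

print(f"single path: {single_path:.0f} ohm, dual path: {dual_path:.0f} ohm")
```

In an RC delay model the lower equivalent resistance translates into a faster output transition for the same load capacitance.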

IV. PERFORMANCE OF BASIC GATES

The power*delay product is a good measure for comparing the logic styles that are to be used in low power, high speed digital systems. The basic gates of all the logic styles were designed using TANNER EDA V.13 with TSMC 0.18 µm CMOS technology at a 2 V rail-to-rail power supply. Table I gives the power*delay product of the various styles used in wave pipelining. Power measurement was done using the non-invasive power measurement technique suggested by Kang [12]. The results show that DPL has the lowest power*delay product among the dual rail logic styles. The single rail logic styles have low power because of the lower number of transistors and less switching activity.

Table I

Logic        Rise time   Fall time   Tr-Tp   τphl    τplh    Power dissipation   PDP
             Tr (ps)     Tp (ps)     (ps)                    (mW)
DPL NAND     44.46       44.19       0.27    32.65   29.50   0.344               10.68
NPCPL NAND   34.78       56.34       21.56   42.42   61.32   0.118               6.12
WTGL NAND    29.56       27.53       2.03    29.22   28.31   0.34                9.741
DVL NAND     69.77       70.53       0.56    51.62   69.02   6.40                386.048
CMOS NAND    40.73       41.92       1.19    33.16   27.43   2.03                55.68

Although the power*delay products of WTGL and NPCPL are low, NPCPL needs a threshold level restorer and has a low noise margin, while WTGL has constant static power dissipation due to its PMOS devices.
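The PDP column of Table I can be cross-checked with a short Python sketch. The paper does not state how the delay entering the product was defined, so the sketch assumes that PDP is the power dissipation multiplied by the mean of τphl and τplh, with delays in picoseconds and power in milliwatts (giving femtojoules). Under that assumption the DPL, NPCPL and DVL entries are reproduced closely, while the CMOS entry is not, so a different delay definition may have been used for that row.

```python
# (tau_phl, tau_plh, power in mW) as read from Table I; delays assumed in ps.
TABLE_I = {
    "DPL NAND": (32.65, 29.50, 0.344),
    "NPCPL NAND": (42.42, 61.32, 0.118),
    "WTGL NAND": (29.22, 28.31, 0.34),
    "DVL NAND": (51.62, 69.02, 6.40),
    "CMOS NAND": (33.16, 27.43, 2.03),
}


def pdp(tau_phl, tau_plh, power_mw):
    """Power-delay product (mW x ps = fJ) using the mean propagation delay.

    Averaging the two propagation delays is an assumption of this sketch.
    """
    return power_mw * (tau_phl + tau_plh) / 2.0


for name, row in TABLE_I.items():
    print(f"{name:11s} PDP = {pdp(*row):8.3f}")
```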

A. Modification to the DPL gates

The design goal for easier fine tuning is balanced input capacitance, that is, the inputs of the gate should be perfectly symmetrical. The DPL AND/NAND gates and the DPL OR/NOR gates are not perfectly symmetrical. All the inputs in these gates are connected to the gates of one NMOS and one PMOS transistor, but the source connections are to either PMOS or NMOS transistors. The drain capacitances of the NMOS and PMOS transistors are not the same because of the difference in transistor sizes and process parameters. Hence the gates are modified so that the GND and supply connections are replaced by primary inputs.

A delay gate is necessary to develop a complete library of basic gates. Unlike the other gates, the delay gate has just one input, so fewer transistors are enough to design it. To achieve a dual current path for a DELAY/DELAY gate, transmission gates should be used. Dual current paths require the transistors to be on all the time; hence the transistors are driven by the supplies and are not controlled by the inputs.

The MUX/DMUX gate is the only gate where perfect symmetry could not be achieved, because the multiplexer is a three-input gate. The select input drives only the gates of the transistors, while the other two inputs have the same capacitance.

B. Performance of DPL basic gates

The power*delay product is a good measure for comparing the logic styles that are to be used in low power, high speed digital systems. The basic DPL gates listed in Table II were designed using the TANNER layout editor in 0.18 micron technology, and the simulations were done in TSpice using a 2 V supply.

Table II

Logic        Rise time   Fall time   Tr-Tp   τphl     τplh     Power dissipation   PDP
             Tr (ps)     Tp (ps)     (ps)                      (mW)
DPL NAND     44.46       44.19       0.27    32.65    29.50    0.344               10.68
DPL AND      59.68       72.56       12.88   27.25    42.06    0.118               8.17
DPL OR       50.14       24.32       25.82   90.96    52.25    0.114               8.133
DPL XNOR     42.69       43.93       1.24    82.58    107.07   0.147               13.93
DPL XOR      47.01       64.35       17.34   26.90    23.60    0.113               2.85
DPL NOR      52.47       36.00       16.47   117.35   75.50    0.361               34.80
DPL MUX      44.49       40.95       3.54    89.54    86.85    0.112               10.75
DPL DEMUX    25.27       43.05       17.78   15.57    26.56    0.336               7.07
DPL DELAY    81.27       78.74       2.98    200      203.40   0.227               45.78

C. Wallace tree multiplier

Several popular and well-known schemes with the objective of improving the speed of the parallel multiplier have been developed in the past. In 1964, C.S. Wallace observed that it is possible to find a structure, which performs the addition


[11] W. Liu et al., "A 250-MHz Wave Pipelined Adder in 2-µm CMOS," IEEE Journal of Solid-State Circuits, September 1994.

[12] S. M. Kang, "Accurate Simulation of Power Dissipation in VLSI Circuits," IEEE Journal of Solid-State Circuits, Vol. SC-21, No. 5, Oct. 1986.

[13] V. G. Oklobdzija and D. Villeger, "Improving Multiplier Design by Using Improved Column Compression Tree and Optimized Final Adder in CMOS Technology," IEEE Transactions on VLSI Systems, Vol. 3, No. 2, June 1995, 25 pages.

[14] Z. Shun, O. A. Pfander, H.-J. Pfleiderer, and A. Bermak, "A VLSI architecture for a run-time multi-precision reconfigurable Booth multiplier," in Proc. 14th IEEE Int. Conf. Electron., Circuits, Syst., Dec. 2007, pp. 975-978.