
Santimoy Mandal, Shyam Sundar Prasad / International Journal of Engineering Research and

Applications (IJERA) ISSN: 2248-9622 www.ijera.com

Vol. 2, Issue 5, September- October 2012, pp.1285-1289


Double Pass-Transistor Logic for High Performance,

Low Latency Wave Pipeline Circuit

Santimoy Mandal

Dept. of Electronics and Communication Engineering, RVS College of Engineering and Technology

Jamshedpur, India

Shyam Sundar Prasad

Dept. of Electronics and Communication, National Institute of Technology, Jamshedpur, India

Abstract: High-throughput, low-latency designs are required in modern high-performance systems, especially for signal processing applications. Existing logic families cannot provide both simultaneously. We propose Double Pass-Transistor Logic (DPL), which can be used as a universal logic to provide the finest-grain pipelining without affecting overall latency or increasing the area. It does not require any special process steps and hence can be realized in a normal process technology, as against the CPL proposed by Yano et al. [2], which uses threshold voltage adjustment of selected devices. The design procedure is described for (a) low latency, (b) high throughput and (c) low area requirements. In addition to these advantages, it is envisioned that DPL designs can also be used to build ultra-high-speed pipelined systems without pipelining latches, viz., wave pipelined digital systems, where the achievable throughput is beyond that permitted by the delay of a pipeline stage.

I. INTRODUCTION

High-speed adders and multipliers are required to meet the demands of signal processing and multimedia applications. Wave pipelining, or "maximal rate pipelining" [1], is a design method that can increase the throughput of a combinational circuit. In conventional pipelining, the combinational circuit is broken into smaller blocks or pipeline stages, and synchronizing elements such as D flip-flops are used as storage elements. The maximum speed is limited by the number of pipe stages, the size of the pipe stages and the complexity of the clock distribution network.

In the wave pipelining approach, flip-flops are not used as storage elements between pipeline stages. Instead, the internal capacitances of the gates are used for storing the intermediate values [1] [3] [4]. Eliminating the storage elements gives a considerable reduction in area and power. It also eliminates clock distribution and clock skew problems, as no clock signal is required within the combinational block. New inputs can be applied to the circuit before the outputs are available, effectively allowing multiple waves of data to propagate coherently through the circuit.

Wave pipelining requires all paths from the inputs to the outputs to be balanced. This is achieved by inserting active delay buffers in the paths that have fewer gates than the longest path from the input to the output. The rough tuning method [6] ensures that the gate count along all the paths is the same. However, a rough-tuned circuit is still not balanced, as different fan-outs are bound to produce different delays. The absence of synchronizing elements in the wave pipelined circuit could lead to collisions between adjacent waves of data. The clock period should be such that the waves do not collide with each other, giving the gates enough time to complete their task.

The pipe stages in a wave pipelined circuit are composed of single gates, and the load capacitances of the gates are used for storage. The load capacitance may vary for different gates in the same stage depending on the fan-outs. Different load capacitances result in different rise and fall times for the driver gates. This delay variation is reduced by fine tuning [5] [6]. Fine tuning involves sizing the transistors in the output inverters of the driver gate to balance the delay. Once fine tuned, the circuit can be clocked at its maximum speed, limited only by the delay.

Section II discusses the timing constraints of wave pipelining and the features that basic gates designed for wave pipelining must have. Section III gives an overview of the existing logic styles for wave pipelining; the limitations of the logic styles and the tuning methods are also discussed. Section IV presents the performance of basic gates highly suitable for wave pipelining. The power analysis of an 8-bit multiplier is presented in Section V. Section VI presents the conclusion and further research directions.
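The rough tuning step described in this section (padding shorter paths with delay buffers until every path has the same gate count) can be illustrated with a small sketch. This is not the procedure of [6]; it is a minimal levelization example under stated assumptions. The netlist format (`fanin`), the function name `rough_tune` and the example gate names are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def rough_tune(fanin, outputs):
    """Count the delay buffers needed to equalize gate counts on all paths.

    fanin maps each gate name to the list of signals driving it; any name
    that never appears as a key is treated as a primary input (level 0).
    """
    level = {}
    # static_order() yields every driver before the gates it feeds.
    for node in TopologicalSorter(fanin).static_order():
        drivers = fanin.get(node, [])
        level[node] = 1 + max(level[d] for d in drivers) if drivers else 0

    # An edge that skips k levels needs k delay buffers inserted on it.
    edge_buffers = {}
    for gate, drivers in fanin.items():
        for d in drivers:
            missing = level[gate] - level[d] - 1
            if missing > 0:
                edge_buffers[(d, gate)] = missing

    # Outputs that are shallower than the deepest output are padded as well.
    depth = max(level[o] for o in outputs)
    output_buffers = {o: depth - level[o] for o in outputs if level[o] < depth}
    return edge_buffers, output_buffers


# Hypothetical three-gate netlist with primary inputs a, b and c.
example = {"g1": ["a", "b"], "g2": ["g1", "c"], "g3": ["g2", "a"]}
print(rough_tune(example, outputs=["g3", "g1"]))
```

After this padding every path has the same number of gates, but, as noted above, the delays still differ with fan-out, which is what fine tuning addresses.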

II. TIMING CONSTRAINTS IN WAVE PIPELINING

Wave pipelined circuits can be clocked at a much higher frequency than conventionally pipelined circuits because their maximum rate is limited only by the path delay difference instead of the maximum path delay. The minimum clock period for a wave pipelined circuit [7] can be represented by

\[ T_{cp} \ge \max\left[\Delta T_p + 2\Delta C + T_{sh} + T_{rf},\; \Delta T_x + \Delta C + T_{ms} + T_{rf}\right] \]

where Tcp is the clock period of the circuit, ∆Tp is the difference between the longest and shortest path delays in the circuit, ∆C is the worst-case clock skew, Tsh is the setup plus hold time for the registers, Trf is the worst-case rise/fall time at the last logic stage, ∆Tx is the propagation delay of the longest path from the input to signal X at any intermediate node, and Tms is the minimum time that X must be stable for the next stage of logic to operate correctly. The operating speed is limited by the difference in delay between the shortest and the longest path, not by the total delay of the circuit as in conventional pipelining. The goal of the design process is therefore to reduce ∆Tp and ∆Tx as much as possible; known methods exist for reducing the other parameters.
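To make the bound concrete, the following sketch evaluates both terms of the clock period constraint and returns the larger one. The function name and all numeric values are hypothetical and serve only to illustrate the arithmetic; they are not taken from the simulations in this paper.

```python
def min_clock_period(d_tp, d_c, t_sh, t_rf, d_tx, t_ms):
    """Lower bound on Tcp; all arguments share one time unit (ps here)."""
    output_term = d_tp + 2 * d_c + t_sh + t_rf    # constraint at the output register
    internal_term = d_tx + d_c + t_ms + t_rf      # constraint at internal node X
    return max(output_term, internal_term)


# Hypothetical values in picoseconds.
t_cp = min_clock_period(d_tp=50, d_c=10, t_sh=40, t_rf=45, d_tx=30, t_ms=60)
print(t_cp)  # 155: the output term (50 + 20 + 40 + 45) dominates 145
```

With these numbers, halving ∆Tp from 50 ps to 25 ps lowers the first term to 130 ps, so the internal-node term (145 ps) becomes the limit. This is why both ∆Tp and ∆Tx have to be reduced together.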

III. EXISTING LOGIC STYLES FOR WAVE PIPELINING

For a balanced wave pipelined circuit, the gates should have neither input-dependent nor fan-out-dependent delay. All the gates in a particular logic family should have the same delay. Conventional static CMOS is the logic style most preferred by designers because of its high reliability. A 2-input NAND gate is shown in Fig. 1. The architecture of the basic gates results in input-dependent and functionality-dependent delays. Several design styles that satisfy the timing constraints of wave pipelining have been proposed by researchers.

Fig. 1 Different CMOS NAND logic style

A. Dual rail logic styles

Normal Process Complementary Logic (NPCPL) [9] and Wave pipeline Transmission Gate Logic (WTGL) [3] are the dual rail logic styles used for wave pipelining. NPCPL and WTGL are based on pass transistors, and DRSCMOS is based on static CMOS. In NPCPL, a basic building block is used to develop all basic gates by properly choosing the input signals Ai, Aj and B (for an AND/NAND gate (XY/XY), Ai = X, Aj = Y and B = Y). The poor conduction of logic 1 by NMOS transistors in NPCPL results in voltage degradation and poor noise margin. WTGL gates use transmission gates to obtain full logic swing and better noise margin, but they exhibit static power dissipation because of the NMOS devices used. WTGL and NPCPL are fast because of the high logic functionality and low input capacitance of separate circuit paths for each possible input combination, thus eliminating pass transistors.

Dual rail logic styles are multi-functional in nature, and all the basic gates have the same delay. Systems designed with dual rail styles can be rough tuned because of the similarity in the basic architecture and the availability of "DELAY" gates. All the gates have output inverters for fine tuning purposes. WTGL and NPCPL have unbalanced input capacitances, resulting in complex fine tuning.

B. Double Pass transistor logic (DPL)

Suzuki et al. [8] proposed the double pass transistor logic [9], which overcomes the problems of CPL, namely voltage degradation and poor noise margin. DPL gates give improved circuit performance at reduced supply voltage because of the use of both NMOS and PMOS transistors. DPL gates are symmetrical, whereby the load in any DPL gate is distributed equally among the inputs. The DPL XOR/XNOR gate is perfectly symmetrical. The symmetrical arrangement and the double transmission property suggest that DPL gates will perform very efficiently in wave pipelined circuits.

The PMOS and NMOS transistors are used such that a dual current path is set up for each input combination, resulting in the smallest equivalent resistance for DPL gates compared to other logic styles. In WTGL, there are two paths, but the same input is passed along both paths. In DPL the inputs are different in the two paths, thereby distributing the load among the inputs. DPL was claimed to be the most energy-efficient logic style among the discussed logic styles by Uming Ko et al. [10]. The symmetrical input loading, double transmission property and energy efficiency of DPL gates make the DPL logic family the best suited logic style for wave pipelining.
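The effect of the dual current path on equivalent resistance can be shown with a simple first-order sketch that treats the two conducting transistors as resistors in parallel. The on-resistance values below are hypothetical and are not taken from the 0.18 µm process used in this paper.

```python
def parallel_resistance(r_a, r_b):
    """Equivalent resistance of two conducting paths in parallel."""
    return r_a * r_b / (r_a + r_b)


# Hypothetical on-resistances in ohms for the NMOS and PMOS paths.
r_nmos = 8000.0
r_pmos = 12000.0
r_dual = parallel_resistance(r_nmos, r_pmos)
print(r_dual)  # 4800.0, lower than either single path
```

Because the parallel combination is always smaller than the smaller of the two branches, a gate that keeps two paths conducting for every input combination drives its load through less resistance than a single-path pass gate.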

IV. PERFORMANCE OF BASIC GATES

The power*delay product is a good measure for comparing the logic styles that are to be used in low-power, high-speed digital systems. The basic gates of all the logic styles were designed using TANNER EDA V.13 with TSMC 0.18 µm CMOS technology at a 2 V rail-to-rail power supply. Table I gives a measure of the power*delay product of the various styles used in wave pipelining. Power measurement was done using the non-invasive power measurement technique suggested by Kang [12]. The power*delay products of the various styles show that DPL has the lowest power*delay product among the dual rail logic styles. The single rail logic styles have low power because of the lower number of transistors and less switching activity.

Table I

Logic        Tr (ps)  Tp (ps)  Tr-Tp (ps)  τphl    τplh    Power (mW)  PDP
DPL NAND     44.46    44.19    0.27        32.65   29.50   0.344       10.68
NPCPL NAND   34.78    56.34    21.56       42.42   61.32   0.118       6.12
WTGL NAND    29.56    27.53    2.03        29.22   28.31   0.34        9.741
DVL NAND     69.77    70.53    0.56        51.62   69.02   6.40        386.048
CMOS NAND    40.73    41.92    1.19        33.16   27.43   2.03        55.68

Tr: rise time; Tp: fall time; Power: power dissipation; PDP: power*delay product.

Although the power*delay products of the WTGL and NPCPL logic are low, NPCPL needs a threshold level restorer and has a low noise margin, and WTGL has constant static power dissipation due to PMOS.
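The paper does not state how the PDP column is derived from the other columns. As a rough cross-check, the sketch below assumes that PDP is the mean of the two propagation delays multiplied by the power dissipation (mW times ps gives fJ). This definition is an assumption, not the authors' stated method. It reproduces the DPL, NPCPL and DVL rows closely and the WTGL row approximately, while the CMOS NAND row differs noticeably. The delay and power values are as read from Table I.

```python
# (tau_phl in ps, tau_plh in ps, power in mW, PDP as reported in Table I)
TABLE_I = {
    "DPL NAND":   (32.65, 29.50, 0.344, 10.68),
    "NPCPL NAND": (42.42, 61.32, 0.118, 6.12),
    "WTGL NAND":  (29.22, 28.31, 0.34, 9.741),
    "DVL NAND":   (51.62, 69.02, 6.40, 386.048),
    "CMOS NAND":  (33.16, 27.43, 2.03, 55.68),
}


def power_delay_product(t_phl_ps, t_plh_ps, power_mw):
    """Assumed definition: mean propagation delay times power (result in fJ)."""
    return 0.5 * (t_phl_ps + t_plh_ps) * power_mw


for name, (t_phl, t_plh, power, reported) in TABLE_I.items():
    computed = power_delay_product(t_phl, t_plh, power)
    print(f"{name:11s} computed {computed:8.2f}  reported {reported:8.3f}")
```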

A. Modification to the DPL gates

The design goal for easier fine tuning is to have balanced input capacitance, that is, the inputs of the gate should be perfectly symmetrical. The DPL AND/NAND gates and the DPL OR/NOR gates are not perfectly symmetrical. All the inputs in these gates are connected to the gates of one NMOS and one PMOS transistor, but the source connections are to either PMOS or NMOS. The drain capacitances of the NMOS and PMOS transistors are not the same because of the difference in transistor sizes and process parameters. Hence the gates are modified so that the GND and supply connections are replaced by primary inputs.

A delay gate is necessary to develop a complete library of basic gates. The delay gate has just one input, unlike the other gates, so fewer transistors are enough to design it. To achieve a dual current path for a DELAY/DELAY gate, transmission gates should be used. Dual current paths require that the transistors are on all the time. Hence the transistors should be driven by the supplies and not controlled by the inputs.

The MUX/DMUX gate is the only gate where perfect symmetry could not be achieved. This is because the multiplexer is a three-input gate. The select input drives only the gates of the transistors, and the other two inputs have the same capacitance.

B. Performance of DPL basic gates

The power*delay product is a good measure for comparing the logic styles that are to be used in low-power, high-speed digital systems. The basic gates of the DPL logic style, shown in Table II, were designed using the TANNER layout editor in 0.18 micron technology, and the simulations were done in TSpice using a 2 V supply.

Table II

Logic        Tr (ps)  Tp (ps)  Tr-Tp (ps)  τphl    τplh    Power (mW)  PDP
DPL NAND     44.46    44.19    0.27        32.65   29.50   0.344       10.68
DPL AND      59.68    72.56    12.88       27.25   42.06   0.118       8.17
DPL OR       50.14    24.32    25.82       90.96   52.25   0.114       8.133
DPL XNOR     42.69    43.93    1.24        82.58   107.07  0.147       13.93
DPL XOR      47.01    64.35    17.34       26.90   23.60   0.113       2.85
DPL NOR      52.47    36.00    16.47       117.35  75.50   0.361       34.80
DPL MUX      44.49    40.95    3.54        89.54   86.85   0.112       10.75
DPL DEMUX    25.27    43.05    17.78       15.57   26.56   0.336       7.07
DPL DELAY    81.27    78.74    2.98        200     203.40  0.227       45.78

Tr: rise time; Tp: fall time; Power: power dissipation; PDP: power*delay product.

C. Wallace tree multiplier

Several popular and well-known schemes with the objective of improving the speed of the parallel multiplier have been developed in the past. In 1964, C.S. Wallace observed that it is possible to find a structure, which performs the addition


[11] W. Liu et al., "A 250MHz Wave Pipelined Adder in 2 µm CMOS," IEEE Journal of Solid State Circuits, September 1994.

[12] Sung Mo Kang, "Accurate Simulation of Power Dissipation in VLSI Circuits," IEEE Journal of Solid State Circuits, Vol. SC-21, No. 5, Oct. 1986.

[13] V. G. Oklobdzija and D. Villeger, "Improving Multiplier Design By Using Improved Column Compression Tree And Optimized Final Adder In CMOS Technology," IEEE Transactions on VLSI Systems, Vol. 3, No. 2, June 1995, 25 pages.

[14] Z. Shun, O. A. Pfander, H.-J. Pfleiderer, and A. Bermak, "A VLSI architecture for a run-time multi-precision reconfigurable Booth multiplier," in Proc. 14th IEEE Int. Conf. Electron., Circuits, Syst., Dec. 2007, pp. 975-978.
