# ANALYSIS AND DESIGN OF POSITIVE FEEDBACK SOURCE COUPLED LOGIC

Thesis submitted to the Delhi Technological University

for the Award of Degree of

**Doctor of Philosophy** 

in

# **Electronics and Communication Engineering**

by *Ranjana Sivaram* (Enrollment No.:2K16/PHD/E&C/09)

Under the Supervision of

Prof. Neeta Pandey & Prof. Kirti Gupta



# **DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING**

# DELHI TECHNOLOGICAL UNIVERSITY

# **DELHI-110042 INDIA**

January 2022

© Delhi Technological University-2022

All Rights Reserved



Delhi Technological University (Formerly Delhi College of Engineering), Shahbad Daulatpur, Bawana Road, Delhi-110042

# CERTIFICATE

This is to certify that the thesis entitled "ANALYSIS AND DESIGN OF POSITIVE FEEDBACK SOURCE COUPLED LOGIC", submitted by Ranjana Sivaram (2K16/PHD/E&C/09) to the Department of Electronics and Communication Engineering, Delhi Technological University, for the award of the degree of Doctor of Philosophy, is based on the original research work carried out by her under our guidance and supervision. In our opinion, the thesis has reached the standards fulfilling the requirements of the regulations relating to the degree. It is further certified that the work presented in this thesis has not been submitted to any other university or institution for the award of any degree or diploma.


**Prof. Neeta Pandey** Supervisor, Department of ECE Delhi Technological University Delhi-110042 India


**Prof. Kirti Gupta** Co-Supervisor, Department of ECE Bharati Vidyapeeth's College of Engg. Delhi-110063 India

**Prof. N.S. Raghava** Head of Department, Department of ECE Delhi Technological University Delhi-110042 India

### **CANDIDATE'S DECLARATION**

I hereby certify that the research work, which is being presented in the thesis, titled,

"ANALYSIS AND DESIGN OF POSITIVE FEEDBACK SOURCE COUPLED LOGIC" in fulfillment of requirements of the award of the degree of Doctor of Philosophy is an authentic record of my research work carried under the supervision of Prof. Neeta Pandey and Prof. Kirti Gupta. The matter presented in this thesis has not been submitted elsewhere in part or fully to any other University or Institute for the award of any degree.

Ranjana Sivaram

(2K16/PHD/E&C/09)

#### ACKNOWLEDGMENTS

The support and expert advice of my supervisors Prof. Neeta Pandey, Department of Electronics and Communication Engineering (ECE), Delhi Technological University (DTU) and Prof. Kirti Gupta, Department of Electronics and Communication Engineering (ECE), Bharati Vidyapeeth's College of Engineering is acknowledged with deep gratitude. Their continuous motivation, time bound goals and frequent follow-ups have helped me to stay focussed, and I am grateful for their supervision and their humanitarian considerations.

My sincere thanks to Prof. N.S. Raghava, Head, ECE department, DTU, Delhi for providing me with all the necessary facilities for the completion of my work. I am thankful to all the faculty members of the department for their timely advice, support and help on various occasions. I am also grateful to Prof. Jai Prakash Saini, Vice-Chancellor, DTU, Delhi, for providing the research environment in the institute. I am also thankful to all non-teaching staff at DTU, who have helped me directly or indirectly in completion of the Ph.D. work.

I would also like to thank my seniors and colleagues of the ECE department for their support, especially Mr. Gurumurthy, Mr. Bharat, Ms. Parveen, Ms. Garima, Ms. Neetika, Ms. Damyanti and Ms. Monica. I also feel a deep sense of respect for my parents, Dr. N. Sridhar and Mrs. Lalitha Sridhar, for having brought me up this far, and would like to express my gratitude for having inculcated in me a love for research. I would like to especially thank my sister, Shobhana, for motivating me from time to time to go on with my doctorate. Last but not least, I would like to thank my husband, Sivaram, and my family for supporting me through this long journey.

Ranjana Sivaram

#### ABSTRACT

The demand for consumer electronics, biomedical implantable systems, sensor nodes in sensor networks etc. has contributed towards the drive for portable systems with long battery life. This in turn has driven the need for mixed signal integrated circuits with analog and digital logic on the same substrate, as this reduces the dimensions of the integrated circuits, reduces the cost and also allows operation at higher speed. However, the noise transmitted from the digital part to the noise-sensitive analog part becomes a problem, because it degrades the accuracy of the analog circuits or reduces their dynamic range. In CMOS circuits, the noise is due to current peaks and voltage variations during the switching of logic states. To solve this, several solutions are proposed – in terms of layout, placement of pins and routing of signals, and selection of alternate logic styles.

To address the switching noise, various low switching noise logic styles have emerged wherein power supply current is kept nearly constant during the switching event and/or working with smaller voltage swings. Some notable logic styles are current steering logic (CSL) style, current balance logic (CBL) style, folded source-coupled logic (FSCL) style and the source coupled logic (SCL) style.

Among these, the positive feedback source coupled logic (PFSCL) style, a variant of SCL, is an attractive alternative as it addresses the switching noise issue. It is explored in this thesis for design modifications to improve the overall circuit performance. The work in this thesis encapsulates its analysis and design.

The PFSCL fundamental cell (FC) based gates help in accommodating a complex logic function into a single gate, but require a larger footprint. The concept of the multithreshold technique is incorporated and two topologies are presented. The former results in an area advantage while the latter improves power consumption. The static and dynamic parameters are modelled and validated.

Next, a PFSCL Quadtail cell that can accommodate more inputs in a single gate is presented. The usefulness of the proposal is illustrated with a three input PFSCL XOR gate. The proposition is analysed for static and dynamic parameters. It is found that the proposed XOR gate topology outperforms the possible existing counterparts.

The presence of a constant current source in PFSCL causes static power consumption, which restricts its application in battery constrained devices. The dynamic PFSCL (D-PFSCL) style is an effort made in this direction. This style, however, requires cascading of gates for multilevel realisation and also intermittent buffers for correct evaluation. Two schemes are worked out to address this – the former modifies the pull down network of the existing D-PFSCL gate while the latter relies on the inclusion of transmission gates.

The ultra-low power PFSCL gate is introduced in this work wherein low power supply is used and the constituent transistors operate in subthreshold region. The basic principles for design of PFSCL in subthreshold region are identified and trends are noted. From analysis and simulations it is noted that the circuits implemented using PFSCL style in subthreshold region offer more flexibility to the design of ultra-low power applications compared to those implemented using CMOS in subthreshold region.

All the propositions are validated through extensive simulative investigations.

# **Table of Contents**

- Acknowledgments
- Abstract
- Table of Contents
- List of Figures
- List of Tables
- List of Publications
- Chapter 1 Introduction
  - 1.1 Background
  - 1.2 Positive Feedback Source Coupled Logic style
  - 1.3 Motivation
  - 1.4 Thesis Organization
- Chapter 2 Existing PFSCL architectures
  - 2.1 Introduction
  - 2.2 PFSCL Style Fundamentals
    - 2.2.1 PFSCL operation and analysis
    - 2.2.2 Realisation of Conventional PFSCL gates
  - 2.3 PFSCL fundamental cell architecture
    - 2.3.1 Fundamental cell operation and analysis
    - 2.3.2 Implementation of gates
  - 2.4 Dynamic PFSCL (D-PFSCL)
    - 2.4.1 D-PFSCL operation and analysis
    - 2.4.2 Realisation of gates
  - 2.5 Conclusion
- Chapter 3 Multithreshold PFSCL Architecture
  - 3.1 Introduction
  - 3.2 Proposed architecture-1
    - 3.2.1 Analysis
    - 3.2.2 Simulations
  - 3.3 Proposed architecture-2
    - 3.3.1 Analysis
    - 3.3.2 Simulations
  - 3.4 Summary
- Chapter 4 Modified PFSCL architecture for higher fan-in
  - 4.1 Introduction
  - 4.2 Proposed Architecture-3
    - 4.2.1 Analysis
    - 4.2.2 Simulations
  - 4.3 Conclusion
- Chapter 5 Modified Dynamic PFSCL architectures
  - 5.1 Introduction
  - 5.2 Proposed architecture-4
    - 5.2.1 Analysis
    - 5.2.2 Simulations
  - 5.3 Proposed Architecture-5
    - 5.3.1 Analysis
    - 5.3.2 Simulations
  - 5.4 Conclusion
- Chapter 6 Subthreshold PFSCL gates
  - 6.1 Introduction
  - 6.2 Proposed architecture-6
    - 6.2.1 Analysis
    - 6.2.2 Simulations
  - 6.3 Conclusion
- Chapter 7 Conclusion
  - 7.1 Concluding Remarks
  - 7.2 Avenues for future work
- References

# List of Figures

- Fig. 2.1 Basic architecture of a generic PFSCL gate
- Fig. 2.2 PFSCL inverter [37]
- Fig. 2.3 Voltage transfer characteristics of the PFSCL inverter
- Fig. 2.4 Linear half circuit
- Fig. 2.5 PFSCL style a) NOR2 b) NAND2 c) XOR2
- Fig. 2.6 a) PFSCL FC based D-Latch [66] b) Block representation
- Fig. 2.7 Fundamental cell based OR2 gate
- Fig. 2.8 CLB [71]
- Fig. 2.9 D-PFSCL inverter [67]
- Fig. 2.10 STB a) MOS schematic b) symbol
- Fig. 2.11 a) D-PFSCL NOR2 b) D-PFSCL NOR2 symbol c) D-PFSCL XOR2 gate [66]
- Fig. 3.1 Proposed architecture-1 a) XOR2 gate b) Symbol
- Fig. 3.2 Proposed architecture-1 generic gate
- Fig. 3.3 Linear half circuit of proposed architecture-1
- Fig. 3.4 Simulation waveform of proposed architecture-1 based XOR2 gate
- Fig. 3.5 Predicted and Simulated results with error versus $I_{SS}$ for static parameters with $V_{SWING}$ of a) 0.4 V and b) 0.5 V
- Fig. 3.6 Predicted and Simulated results with error in delay versus $I_{SS}$ for a) $V_{SWING}=0.4$ V and $C_L=50$ fF b) $V_{SWING}=0.4$ V and $C_L=500$ fF c) $V_{SWING}=0.4$ V and $C_L=1$ pF d) $V_{SWING}=0.5$ V and $C_L=50$ fF e) $V_{SWING}=0.5$ V and $C_L=500$ fF f) $V_{SWING}=0.5$ V and $C_L=1$ pF
- Fig. 3.7 Area vs $I_{C2}/I_{D3}$ vs $\alpha$ for proposed architecture-1 based XOR2 gate
- Fig. 3.8 Monte Carlo results for $V_{SWING}$ and Delay for a) proposed and b) existing PFSCL FC based XOR2 gate
- Fig. 3.9 Impact of process corners on a) Delay and b) $V_{SWING}$
- Fig. 3.10 Proposed architecture-1 based a) Sum and b) Carry c) AND2 gate and d) OR2 gate
- Fig. 3.11 Simulation waveforms for Proposed architecture-1 full adder
- Fig. 3.12 Existing PFSCL FC based XOR2 gate
- Fig. 3.13 Proposed architecture-2 based XOR2 gate
- Fig. 3.14 Propagation delay versus bias current for XOR2 gate
- Fig. 3.15 Power consumption versus bias current for XOR2 gate
- Fig. 3.16 PDP versus bias current for XOR2 gate
- Fig. 3.17 Monte Carlo analysis on Delay and Voltage swing of XOR2 gate based on a) proposed architecture-2 b) PFSCL FC c) conventional PFSCL
- Fig. 3.18 Effect of process corners on XOR2 gate a) Delay b) Voltage Swing
- Fig. 4.1 Quadtail cell: Basic structure of the proposed architecture-3
- Fig. 4.2 XOR3 based on a) proposed architecture-3 b) conventional PFSCL c) existing PFSCL FC
- Fig. 4.3 Proposed architecture-3 generic gate
- Fig. 4.4 Linear half circuit of proposed architecture-3
- Fig. 4.5 Simulation waveforms of the proposed architecture-3 XOR3 gate
- Fig. 4.6 Performance comparison versus $I_{SS}$ a) Delay b) Power dissipation c) PDP of XOR3 gate
- Fig. 4.7 Impact of Monte Carlo for 500 runs a) Proposed architecture-3 b) PFSCL FC c) conventional PFSCL
- Fig. 4.8 Impact of process corners on the three architectures a) Delay and b) $V_{SWING}$
- Fig. 4.9 Carry a) Proposed architecture-3 b) Conventional PFSCL c) PFSCL FC
- Fig. 5.1 Modified schematic of a D-PFSCL with the addition of Mc1
- Fig. 5.2 Voltages at different nodes of modified D-PFSCL
- Fig. 5.3 Complete schematic of the proposed architecture-4 MUX2
- Fig. 5.4 MUX2 a) Gate level schematic b) Existing D-PFSCL
- Fig. 5.5 Proposed architecture-4 generic gate
- Fig. 5.6 Proposed architecture-4 XOR2 gate
- Fig. 5.7 $I_{Md3}/I_{Mc2}$ for different values of $\alpha$
- Fig. 5.8 Existing D-PFSCL XOR2 gate
- Fig. 5.9 Simulation waveforms of the XOR2 gate
- Fig. 5.10 Proposed architecture-4 based XOR2 gate output under Monte Carlo analysis
- Fig. 5.11 Monte Carlo variation in Proposed architecture-4 XOR2 gate a) $\tau_{PHL}$ b) $\tau_{pre}$ c) $V_{SWING}$ d) Dynamic power dissipation
- Fig. 5.12 Process corner results of XOR2 gate a) $\tau_{PHL}$ b) $\tau_{pre}$ c) power
- Fig. 5.13 Block diagram of the proposed architecture-4 MUX8
- Fig. 5.14 Proposed architecture-5 generic gate
- Fig. 5.15 Proposed architecture-5 MUX16 gate
- Fig. 5.16 XOR2 gate a) Proposed architecture-5 b) Proposed architecture-4 c) D-PFSCL [67] d) Simulation waveforms e) Transitions at the proposed architecture-5 XOR2 gate output
- Fig. 5.17 Performance with respect to power supply variations a) $\tau_{pre}$ b) $\tau_{PHL}$ c) dynamic power d) EDP
- Fig. 5.18 a) Maximum Operating frequency with respect to different supply voltages b) Operating frequency with respect to dynamic power consumption
- Fig. 5.19 Monte Carlo simulation results (500 runs) for proposed architecture-5 XOR2 a) $\tau_{pre}$ b) $\tau_{PHL}$ c) dynamic power
- Fig. 5.20 Process corner results of the XOR2 gate in all the architectures a) $\tau_{pre}$ b) $\tau_{PHL}$ c) dynamic power
- Fig. 5.21 Full adder based on proposed architecture-5 a) XOR3 b) Carry
- Fig. 6.1 ST-PFSCL inverter
- Fig. 6.2 Linear half circuit of ST-PFSCL inverter
- Fig. 6.3 Simulation waveform of ST-PFSCL inverter
- Fig. 6.4 ST-PFSCL a) NOR2 gate b) XOR2 gate
- Fig. 6.5 Delay versus $I_{SS}$ of the ST-PFSCL XOR2 gate
- Fig. 6.6 PDP versus delay of the ST-PFSCL XOR2 gate
- Fig. 6.7 Monte Carlo results for ST-PFSCL XOR2 a) Delay b) Voltage swing
- Fig. 6.8 Process corner results of the ST-PFSCL XOR2 a) Delay b) $V_{SWING}$
- Fig. 6.9 a) ST-PFSCL D latch gate b) ST-PFSCL divide-by-8 circuit
- Fig. 6.10 Frequency of operation versus power dissipation of the divide-by-8 circuit
- Fig. 6.11 PDP versus delay of the divide-by-8 circuit

# List of Tables

- Table 2.1 Output voltages for various combinations of inputs
- Table 2.2 Realisation of different logic functions using CLB
- Table 3.1 Performance summary of the full adder
- Table 3.2 Power Supply voltage $V_{DDmin}$
- Table 4.1 Realisation of different 3 input logic functions based on proposed architecture-3
- Table 4.2 Working of proposed architecture-3 based XOR3 gate
- Table 4.3 Summary of results for a Full Adder
- Table 5.1 Summary of the proposed architecture-4 MUX2 operation
- Table 5.2 Performance comparison
- Table 5.3 Monte Carlo simulation results for the XOR2 gate in different styles
- Table 5.4 Performance summary of MUX8
- Table 5.5 Performance Comparison of gates based on D-PFSCL, proposed architecture-4 and proposed architecture-5
- Table 5.6 Monte Carlo results for XOR2
- Table 5.7 Performance Comparison of full adder based on existing D-PFSCL, proposed architecture-4 and proposed architecture-5

# **List of Publications**

### **List of Journal Papers:**

1. Sivaram, Ranjana; Gupta, Kirti; Pandey, Neeta. "A new realisation scheme for dynamic PFSCL" *Integration* 75C (2020) pp. 169-177. (SCI Journal Impact Factor: 1.214)
2. Sivaram, Ranjana; Gupta, Kirti; Pandey, Neeta. "Impact of multi threshold transistor in positive feedback source coupled logic (PFSCL) fundamental cell" *Analog Integrated Circuits and Signal Processing* 109 (2021) pp. 173-185. (SCI Journal Impact Factor: 1.337)

### List of International Conference Papers:

1. Sivaram, Ranjana; Gupta, Kirti; Pandey, Neeta. "New improved low power triple tail cell with controlled current source" *2018 IEEE International Conference on Computing, Power and Communication Technologies (GUCON)*, pp. 1-5. IEEE, 2018.
2. Sivaram, Ranjana; Gupta, Kirti; Pandey, Neeta. "Low power design for source coupled logic gates" *2018 15th IEEE India Council International Conference (INDICON)*, pp. 1-5. IEEE, 2018.
3. Sivaram, Ranjana; Gupta, Kirti; Pandey, Neeta. "Power efficient architecture for PFSCL" *2021 International Conference on Emerging Trends in Industry 4.0 (ETI 4.0)*, pp. 1-6. IEEE, 2021.
4. Sivaram, Ranjana; Gupta, Kirti; Pandey, Neeta. "Exploration of PFSCL in subthreshold region of operation for use in ultra low power applications" *2021 2nd International Conference for Emerging Technology (INCET)*. IEEE, 2021.

### List of Journal papers under Communication:

1. Sivaram, Ranjana; Gupta, Kirti; Pandey, Neeta. "On Improving the Performance of Dynamic Positive-Feedback Source-Coupled Logic (D-PFSCL) through inclusion of Transmission Gates" *Microprocessors and Microsystems Journal*. Communicated. (SCI-E Journal Impact Factor: 1.525)

# Chapter 1 Introduction

#### 1.1 Background

There is a huge demand for electronic devices, especially portable and consumer electronics, which requires that as many logical functions as possible be implemented on the same chip in the form of a System on Chip (SoC) so as to meet the speed and area requirements [1-7]. SoC designs reduce cost by saving on the total number of chips, and can yield significant performance improvements by reducing inter-chip communication time. As a result, SoCs in which both the analog and digital circuitry are on the same semiconductor die have become pervasive. The conventional CMOS logic style, used widely in SoCs, exhibits a large switching noise that lowers the performance of the analog building blocks, and therefore it cannot be used on the same substrate with high-resolution analog circuits [8-10]. In order to successfully integrate the digital and analog components of a system onto the same die, analog designers must either accept the noise inherent to CMOS digital circuits or try to isolate the digital and analog components as much as possible, making system level integration of SoC designs a challenging task. Several techniques are available to isolate the analog part from the noise generated by the digital system, or to mitigate that noise. These can be broadly categorised as layout related methods, placement of pins and routing of signals, usage of alternate substrate material and usage of alternate logic styles [11-34].

Alternate logic styles suited to a mixed signal environment need to maintain a nearly constant supply current during the switching event and use a reduced voltage swing in order to reduce the switching noise. The logic styles that maintain a constant current during switching are classified into four categories, namely current balance logic (CBL) style [21-24], current steering logic (CSL) style [25-27], folded source coupled logic (FSCL) style [7,28-30] and the MOS current mode logic style [31-34]. The CBL style is further classified into enhanced current balance logic (E-CBL) [22], complementary output current balance logic (C-CBL) [23] and modified current balance logic (M-CBL) [24]. Similarly, the CSL style includes folded current steering logic (FCSL) [26] and differential current steering logic (D-CSL) [27], whereas the enhanced folded source coupled logic (E<sup>2</sup>FSCL) [29] and enhancement source coupled logic [30] belong to the FSCL style. The CBL and FSCL based circuits use multiple current sources for logic function realisation, whereas the CSL circuits work with a larger voltage swing. The MOS current mode logic based circuits, which are based on the source-coupled pair of NMOS transistors, need a single current source and work with a smaller voltage swing; they are therefore a preferred choice for mixed signal IC applications. Also, MOS current mode logic, or source coupled logic (SCL), permits switching noise reduction by two orders of magnitude compared to standard CMOS logic [14-16].

The SCL gates are classified as differential or single-ended [35-37]. The former is based on the series-gating approach, whereas in the latter the input transistors share common source and drain terminals and are source-coupled to a transistor whose gate is connected to a reference voltage source [35, 37]. SCL gates are explored widely for building complex circuits such as ADCs, DACs, front end receivers, optical fiber front end transceivers, arithmetic circuits etc. [38-59].

#### 1.2 Positive Feedback Source Coupled Logic style

An improved form of the single-ended SCL style, named the positive feedback source coupled logic (PFSCL) style, is suggested in [37]. The logic style replaces the reference voltage source used in conventional single-ended SCL gates with a positive feedback to improve their performance. It is targeted towards high speed, low power dissipation and power efficient behaviour, and shows better performance compared to SCL. The logic style provides a single-ended output and uses a NOR based architecture for implementing logic with a constant bias current per gate. The NOR based architecture is an advantage in that it allows operation of the gate at minimum supply voltage, which allows low power consumption requirements to be met. It also means that PFSCL can work easily in deep submicron technologies, where the maximum available power supply is low. The PFSCL style is explored in [60-68] in terms of design and analysis, and further studied in terms of applications in arithmetic circuits, buffers, pipelines, latches, and error detection and correction circuits in [69-79].

The paper [37] explains the working of the basic PFSCL gate and derives the expressions for static design parameters like small signal gain, noise margin and the voltage swing. The analytical delay model of PFSCL gates is also given in terms of the bias current, process parameters and the transistors' aspect ratios. Simulation results are shared which confirm that the proposed models are sufficiently accurate.

The analytical gate delay model for PFSCL is derived in [60], where the delay is given in terms of the associated parasitic capacitances at the output node. To derive the delay model of PFSCL gates, the circuit is linearised around the logic threshold and the circuit analysis is simplified through suitable approximations. Since the positive feedback in PFSCL gates significantly enhances the small-signal voltage gain, a much lower NMOS aspect ratio can be used to achieve the same value of small signal voltage gain, compared to conventional SCL logic. As a consequence, the contribution of NMOS parasitic capacitances to the output node is strongly reduced, and so the delay reduces. Simulations in a 350 nm technology are carried out to validate the delay model for PFSCL gates. For comparison, a 5 stage ring oscillator based on PFSCL gates is simulated against a 5 stage ring oscillator based on conventional SCL gates for the same design parameters, and it is found that PFSCL gates show better performance.
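The link between per-gate delay and the ring oscillator comparison mentioned above can be sketched with the standard relation $f = 1/(2N\tau)$ for a ring of $N$ inverting stages, each with average delay $\tau$. The snippet below is only an illustrative sketch: the function name and the two delay values are hypothetical and are not taken from [60].

```python
def ring_oscillator_frequency(stage_delay_s: float, stages: int = 5) -> float:
    """Oscillation frequency of a ring of identical inverting stages.

    One oscillation period corresponds to two traversals of the ring,
    so T = 2 * N * t_pd and f = 1 / T.
    """
    if stages < 3 or stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of stages (>= 3)")
    if stage_delay_s <= 0:
        raise ValueError("stage delay must be positive")
    return 1.0 / (2 * stages * stage_delay_s)


# Hypothetical per-gate delays, chosen only for illustration (not from [60]).
delay_scl = 200e-12    # assumed delay of a conventional SCL gate, in seconds
delay_pfscl = 150e-12  # assumed delay of a PFSCL gate, in seconds

f_scl = ring_oscillator_frequency(delay_scl)
f_pfscl = ring_oscillator_frequency(delay_pfscl)
print(f"SCL ring:   {f_scl / 1e6:.1f} MHz")
print(f"PFSCL ring: {f_pfscl / 1e6:.1f} MHz")
```

Under this relation, any reduction in gate delay shows up directly as a proportionally higher oscillation frequency, which is why a ring oscillator is a convenient vehicle for comparing logic styles.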

In [61-62], the design of PFSCL gates is addressed, since delay, power and area are all important design parameters that can be traded off against each other depending upon the specific application. This requires that the interrelation between power, delay and area be quantified, and that the impact of variables such as the bias current and the NMOS and PMOS transistor dimensions on power, delay and area be analysed. PFSCL gates offer lower delay compared to the traditional SCL gates. This excess of speed could also be exploited to save power for an assigned speed requirement, thus allowing for more flexible management of the power-delay trade-off. Since the overall power consumption usually limits the number of gates that can be integrated in a mixed-signal chip, the reduction in the power consumption per gate allows for implementing more complex logic circuits for a given power budget of the digital section.

The delay expression is derived using the parasitic node capacitances and then, using the standard expressions for capacitances and some approximations according to meaningful cases in the design space, i.e., power-efficient, high-speed and low-power design, the delay is expressed in terms of the voltage swing and the bias current, which in turn define the gate power consumption. To validate the analytical expressions, simulations for different design cases like high speed, low power etc. are carried out and it is found that the analytical expressions are suitably accurate.
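The way voltage swing and bias current jointly set delay and power can be illustrated with a generic single-pole approximation for a source-coupled gate. This is not the model derived in [61-62]: it assumes a load resistance $R = V_{SWING}/I_{SS}$, a delay of $0.69\,RC_L$, a static power of $V_{DD} I_{SS}$, and it ignores the dependence of the parasitic capacitances on transistor sizing. The supply voltage and bias currents below are hypothetical.

```python
import math


def first_order_delay(v_swing: float, i_ss: float, c_load: float) -> float:
    """Single-pole delay estimate, ln(2) * R * C, with R = V_SWING / I_SS."""
    r_load = v_swing / i_ss  # load resistance needed to obtain the swing
    return math.log(2) * r_load * c_load


def static_power(v_dd: float, i_ss: float) -> float:
    """Static power of a gate biased with a constant current I_SS."""
    return v_dd * i_ss


V_DD = 1.8        # hypothetical supply voltage, in volts
V_SWING = 0.4     # voltage swing, in volts
C_LOAD = 50e-15   # load capacitance, in farads
bias_currents = [10e-6, 50e-6, 100e-6]  # hypothetical bias currents, in amperes

results = [
    (i_ss, first_order_delay(V_SWING, i_ss, C_LOAD), static_power(V_DD, i_ss))
    for i_ss in bias_currents
]
for i_ss, delay, power in results:
    print(f"I_SS={i_ss * 1e6:5.0f} uA  delay={delay * 1e12:8.1f} ps  "
          f"power={power * 1e6:6.1f} uW  PDP={delay * power * 1e15:6.2f} fJ")
```

In this simplified picture, raising the bias current buys speed at a proportional cost in power, so the power-delay product depends only on $V_{DD}$, $V_{SWING}$ and $C_L$. The more detailed models in the cited work account for effects this sketch leaves out.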

In [63], comparison between MOS Current Mode Logic (MCML) and PFSCL AND/OR/NOR gates is carried out through simulations in 180 nm to quantify the improvement offered by PFSCL gates in terms of delay, power and area. On the basis of simulations, it is found that for given power supply and bias current, PFSCL gates are faster and occupy lesser area compared to MCML gates.


In [64], hysteresis is utilised to improve the performance of PFSCL gates. To understand the impact of hysteresis, the noise margin of the PFSCL gate is modelled, from which it is found that hysteresis improves the noise margin. This implies that a lower voltage swing will satisfy a given noise margin requirement, in turn reducing the gate delay. For a given speed constraint, this reduction in delay due to hysteresis can be traded for a reduction in power by reducing the bias current. Thus, the use of hysteresis can make PFSCL gates more power efficient, which is important for high speed low power applications. Simulations were carried out to evaluate the accuracy of the noise margin model, which was verified. PFSCL gates were also simulated to verify the speed advantage under hysteresis.

In [65], a PFSCL style with higher speed than the existing PFSCL style is proposed, in which the load of the existing PFSCL gate is replaced with a new load that exhibits capacitive coupling and thereby enhances the switching speed of the circuits. The capacitive coupling occurs between the output node and the gate of the PMOS load, which speeds up the charging/discharging of the PFSCL gate. The mechanism of capacitive coupling is modelled and its effect on the propagation delay is described. On the basis of simulations of AND/NOR/OR gates in the proposed style against conventional PFSCL based AND/NOR/OR gates, it is shown that the logic gates based on the proposed style reduce the propagation delay by 31 percent compared to existing PFSCL logic gates.

Implementation of logic gates based on conventional PFSCL is a NOR based implementation. This implies that, while PFSCL can implement OR/NOR functions easily as a single gate implementation, complex logical functions having sum-of-minterms expression requiring both AND/OR implementations are more difficult and require a multi-gate cascaded implementation. PFSCL multi-gate cascaded implementation adds to the delay and also increases the power dissipation due to the use of constant bias currents by each gate.

This factor negates a lot of the speed and power dissipation advantages of PFSCL. In [66], the use of triple-tail cell concept in conjunction with PFSCL, called the PFSCL fundamental cell, is proposed to solve the issue of multi gate implementation of complex logic. This is because triple-tail cell through the addition of control branch easily provides the AND functionality. Thus, by implementing PFSCL fundamental cell based gates, complex two input logic functions like XOR, multiplexer etc. including AND/NAND terms can be implemented without need for cascading the gates which in turn improves the delay and the power dissipation.

An efficient PFSCL fundamental cell based D-latch is proposed and its performance is compared to a PFSCL based D-latch [66]. It is found that the proposed D-latch shows lesser delay and lower power consumption compared to a PFSCL based D-latch. Use of the triple-tail concept also reduces the number of stages and the gate count compared to the conventional PFSCL based D-latch.

In [69-74], an efficient circuit realisation scheme based on PFSCL style using generalized PFSCL fundamental cell, called the configurable logic block (CLB) is given and used to implement adder, serialiser, demultiplexer, linear feedback shift register (LFSR) and razor flip-flop respectively. From simulations carried out in CMOS 180 nm technology, it is seen that the proposed circuit utilising PFSCL fundamental cell architecture shows the best delay and power dissipation compared to conventional PFSCL based implementation.

PFSCL style suffers from the disadvantage of having constant bias current which contributes towards the power dissipation. With the increased need for power efficient circuits, a modification to the existing PFSCL style is proposed in [67] which addresses the issue. Here, a logic style called dynamic PFSCL (D-PFSCL) is introduced that uses dynamic current source in contrast to constant current source of PFSCL to attain lower power consumption. Two techniques to implement multi-stage D-PFSCL application are also suggested. Several D-PFSCL gates are simulated and compared with the respective PFSCL and significant power reduction is achieved for D-PFSCL gates. These gates also show an improvement in speed compared to existing PFSCL gates.

In [75], high speed and low power dissipation PFSCL based tristate buffers are proposed, which are important for bus implementation. The paper discusses existing PFSCL tristate buffers i.e. switch based and sleep based. While the sleep based tri state buffer is more power efficient than the switch based tristate, it suffers from incomplete isolation of the common output node from the tristate disabled buffers. Next in the paper, PFSCL tristate buffer switch based topologies are proposed where the output enable/disable switch is used to maintain the high impedance state and the load or the current source section is also modified to restrict the current flow during high impedance state. Four proposed topologies are simulated in CMOS 180nm and their parameters like propagation delay, power dissipation and output enable time are given. From simulation results, it is seen that one of the proposed topologies with modified current source section performs the best.

In [76], further improvements to the high speed and low power dissipation PFSCL based tristate buffers of [75] are proposed. In the paper, the load section is modified to feed the enable signal to the PMOS load directly, instead of adding a transistor in series with the load section to cut off the power supply and obtain the tristate output. With this modified load section, three topologies are presented. Simulations are done to characterise the propagation delay, power dissipation and output enable time of the proposed topologies, and these are compared against the topologies given in [75] as well as existing switch based topologies. It is found that one of the proposed topologies gives the best performance.

In [77], PFSCL based asynchronous pipeline implementation is explored based on both the existing conventional PFSCL and the more efficient fundamental cell based PFSCL. Further, a new hybrid implementation of the pipeline is proposed. A FIFO sequencer is implemented using the three different architectures and it is observed that the hybrid implementation of the asynchronous pipeline results in fewer gates as well as lower average power dissipation.

In [78], a clocked current comparator based on PFSCL is presented and its operation is verified through simulations using 90 nm CMOS technology parameters. From simulations, it is seen that the proposed clocked current comparator is a power efficient design.

In [79], a modified configurable cell in PFSCL style is presented that enhances the capability of the existing configurable cell in realizing complex logic functions when the fan-in increases. Three different realisations of a magnitude comparator, based on the conventional NOR/OR gate, the existing configurable cell and the proposed modified configurable cell, are introduced. The proposed modified configurable cell introduces one more transistor in the outer branch in parallel to the existing transistor, and this leads to a reduction in the number of gates required to implement the magnitude comparator. SPICE simulations using TSMC CMOS 180 nm technology parameters are used to verify functionality and compare performance; the simulation results show that the performance of the modified configurable cell based comparator is superior to its counterparts, thereby establishing the concept and its usefulness.

### 1.3 Motivation

One of the issues with PFSCL is that the conventional NOR based implementation [37] realises complex logic using multiple cascaded gates, each biased by a constant current source, which in turn increases the delay and power consumption. This issue can be resolved by exploring alternate implementation architectures for the PFSCL style that support higher fan-in, optimise the implementation of complex logic circuits and thus lower the delay while also having a beneficial effect on power consumption. One alternate implementation that requires fewer gates for complex logic is the PFSCL fundamental cell architecture [66]. Based on [66], any two input logic function can be implemented in a single stage, leading to a power efficient design. Some improvement is suggested in [79], leading to a further reduction in the number of gates required to implement a complex logic function. However, there is scope for more improvement in performance in terms of delay, power and area. In the SCL style, the multithreshold technique has been explored with positive impact on performance parameters [48]. This has not yet been explored in the PFSCL style. The impact on parameters like area, power and delay of introducing the multithreshold technique in the PFSCL fundamental cell architecture is explored in this thesis. Also, while any two input complex logic function can be implemented in a single stage using the PFSCL fundamental cell architecture, the architecture can be extended to logic functions requiring higher fan-in with a resultant advantage in performance, and this is also explored in this thesis.

Another concern in modern VLSI design is that with the increased demand for portable electronics, the power consumption has to be as low as possible in order to limit overheating and facilitate portability, thus simplifying the design of packaging and heat dissipation. In the case of PFSCL style, where the static power consumption is a direct product of power supply and bias current, one of the ways to reduce the static power consumption is to explore D-PFSCL [67] further. The application of dynamic current source to various implementation architectures based on PFSCL style is expanded further in this thesis.

Another way in which the static power consumption can be reduced is by correspondingly reducing the supply voltage, and when the supply voltage is drastically reduced to the point where it is lower than the threshold voltage, the operation is in subthreshold region. Operation in this region has its own advantages and disadvantages. Logic circuit operation in the subthreshold region is explored for logic styles such as CMOS, SCL etc. in [81-100]. Operation of complex arithmetic circuits based on PFSCL has not yet been explored in the sub threshold region and this is explored in this thesis.

In this work, implementation of PFSCL style using different architectures is investigated to achieve the following objectives:

- a. Investigation into the effect of lower threshold voltage transistor in PFSCL based implementation
- b. Enhancement of conventional circuit realisation schemes for high fan in.
- c. Design of improved D-PFSCL architectures.
- d. Investigation into the use of PFSCL in sub threshold region.

In order to achieve the objectives, the following are the highlights of the work carried out:

- a. Two different architectures are proposed with the introduction of multithreshold technique in PFSCL based implementation leading to reduction in area and reduction in minimum power supply voltage respectively.
- b. PFSCL fundamental cell architecture is extended for higher fan-in such that three input complex logic circuits can be implemented in a single stage with advantage in delay and power consumption.
- c. D-PFSCL architecture is modified so that any two input logic function can be implemented in a single stage, giving a reduction in power consumption and delay.
- d. Operation of PFSCL circuits in the subthreshold region and its advantages compared to the performance of CMOS circuits in the subthreshold region are investigated.

### **1.4 Thesis Organization**

This thesis examines the proposed PFSCL based architectures so as to be able to implement complex logic with optimised delay, power and area. It also explores the PFSCL based architectures that can support higher fan-in and work as a dynamic logic. It further explores the PFSCL in subthreshold region of operation and details how the performance in subthreshold region may be improved. Chapter 2 investigates basic PFSCL operation and provides an overview of the logic style and includes an examination of performance characteristics of basic PFSCL gates. It also discusses in detail the PFSCL fundamental cell and dynamic PFSCL.

Chapter 3 discusses two new PFSCL based architectures based on multithreshold technique that introduce the low threshold voltage transistor— i) in pull down network and ii) in constant current source. The two new architectures are analysed and it is seen that the first architecture leads to reduction in area while the second leads to reduction in power consumption.

Chapter 4 discusses a new PFSCL architecture that increases the fan-in and thus enables the implementation of complex logic in reduced number of cascaded stages leading to reduction in delay and power dissipation.

Chapter 5 builds upon the work done in D-PFSCL and presents ways to implement complex logic circuits in PFSCL style using dynamic current source and requiring lesser number of stages for implementation with improvement in delay and power consumption.

Chapter 6 examines the behaviour of circuits implemented using the PFSCL style operating in the subthreshold region. This region of operation is suitable for ultra low power applications such as biomedical and implantable devices and devices used in sensor networks. Circuits implemented using the PFSCL style in the subthreshold region offer benefits compared to CMOS in the subthreshold region, and these results are presented.

Chapter 7 provides a final summary of the work throughout the thesis and summarizes avenues of potential future work where many projects or theses could build upon the base line principles established here.

# Chapter 2 Existing PFSCL architectures

### 2.1 Introduction

In the previous chapter, the PFSCL literature survey is discussed followed by the observations on research gaps for further work in this thesis. The existing structures that have been picked are therefore discussed to better understand their working principle, design and their benefits. The architectures that are discussed in this chapter include basic PFSCL style with NOR based architecture, PFSCL fundamental cell architecture, D-PFSCL architecture.

### 2.2 PFSCL Style Fundamentals

The Positive Feedback Source Coupled Logic (PFSCL) [37] is a modified form of the single-ended SCL style. This logic style introduces positive feedback into the conventional single-ended SCL style to improve the switching speed of the gates and is targeted towards high speed and low power dissipation. The basic architecture of a generic PFSCL gate is given in Fig. 2.1.



Fig. 2.1 Basic architecture of a generic PFSCL gate

It consists of three major parts: a pull-down network (PDN) comprising of transistors Md1-MdN along with feedback transistor Mf, a constant current source realized by transistor (Ms) and a load transistor (Mr1). The circuit works on the principle of current steering.

Based on the logic level of the inputs $A_1$-$A_N$, the bias current $I_{SS}$ flows through either Md1-MdN or Mf. If any of the inputs $A_1$-$A_N$ is high, $I_{SS}$ flows through the left side branch and is converted into an equivalent output voltage by the PMOS Mr1 presenting equivalent resistance $R_P$. This is the output low voltage $V_{OL}$, given by $V_{DD}$-$I_{SS}R_P$. For the case where all logic inputs $A_1$-$A_N$ are low, no current is drained from the left side branch and consequently there is no voltage drop across the resistance $R_P$, so the output remains high. This is the output high voltage $V_{OH}$, given by $V_{DD}$. The difference between $V_{OH}$ and $V_{OL}$ is called the voltage swing, $V_{SWING}$. From Fig. 2.1, it can be observed that output Q generates the NOR of all the inputs $A_1$-$A_N$. For the case where the inputs are $\overline{A_1}$-$\overline{A_N}$, the output Q is the AND of all the inputs. Correspondingly, the OR and NAND outputs can be taken from the drain of Mf, with the help of De Morgan's law. Thus, the architecture of the conventional PFSCL based gate leads to direct implementation of NOR/OR functions. However, from the architecture, it is observed that logic functions expressed as a sum of minterms need multiple gates, each gate implementing either an AND/NAND or an OR/NOR function.
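The current-steering behaviour described above can be illustrated with an idealised behavioural sketch. The function name `pfscl_nor_output` and the numeric values are illustrative assumptions rather than values from this work, and the model assumes that $I_{SS}$ is steered completely into one branch.

```python
def pfscl_nor_output(inputs, v_dd, i_ss, r_p):
    """Ideal voltage at output Q of an N-input PFSCL NOR gate.

    inputs: iterable of booleans, True meaning a logic-high A_i.
    Assumes complete current steering (no finite-gain effects).
    """
    if any(inputs):
        # I_SS flows through the Md1-MdN branch and drops across R_P.
        return v_dd - i_ss * r_p  # V_OL
    # I_SS flows through Mf, so no current is drawn through the load.
    return v_dd  # V_OH


# Assumed example values: 1.8 V supply, 100 uA bias, 4 kOhm load.
V_DD, I_SS, R_P = 1.8, 100e-6, 4e3
v_oh = pfscl_nor_output([False, False], V_DD, I_SS, R_P)
v_ol = pfscl_nor_output([True, False], V_DD, I_SS, R_P)
v_swing = v_oh - v_ol
print(f"V_OH = {v_oh:.2f} V, V_OL = {v_ol:.2f} V, V_SWING = {v_swing:.2f} V")
```

With these assumed values the swing equals $I_{SS}R_P$ = 0.4 V, and the output is low for every input combination except all-low, which is the NOR behaviour.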

#### 2.2.1 PFSCL operation and analysis

To analyze its behaviour, a PFSCL inverter as shown in Fig. 2.2 is considered. The operation of the PFSCL inverter is explained as follows: Based on the logic level of the input A, the bias current $I_{SS}$ flows through either Md1 or Mf. For the case when input A is high, $I_{SS}$ flows through Md1 and is converted into an equivalent output voltage by the load transistor Mr1 presenting equivalent resistance $R_P$. This is the output low voltage $V_{OL}$, given by $V_{DD}$-$I_{SS}R_P$.

For the case where logic input A is low, no current is drained from the left side branch and consequently there is no voltage drop across the resistance $R_P$, so the output remains high. This is the output high voltage $V_{OH}$, given by $V_{DD}$. The difference between $V_{OH}$ and $V_{OL}$ is called the voltage swing, $V_{SWING}$.



Fig. 2.2 PFSCL inverter [37]

The analysis of the PFSCL inverter is done with approximations such that any PFSCL circuit can be designed through hand calculations, as described in [37]. The analysis is presented in two parts: the static model and the delay model. Quantities such as the voltage swing $(V_{SWING})$, the small-signal voltage gain $(A_V)$ and the noise margin (NM) are constituents of the static model and are represented mathematically.

The values of high, low output voltages ( $V_{OH}$  and  $V_{OL}$ ) and  $V_{SWING}$  are respectively given as

$$V_{OH} = V_{DD}$$
(2.1a)

$$V_{OL} = V_{DD} - I_{SS} R_P$$
(2.1b)

$$V_{SWING} = I_{SS}R_{P}$$
(2.2)



Fig. 2.3 Voltage transfer characteristics of the PFSCL inverter

The voltage transfer characteristic (VTC) of the PFSCL inverter is shown in Fig. 2.3. It may be noted that the VTC is symmetrical around the logic threshold voltage ($V_{LT}$). The small-signal voltage gain ($A_V$) is evaluated around $V_{LT}$ by superposition of the input voltages at the gates of the transistors Md1 and Mf, and is given as

$$A_{\rm V} = \frac{\frac{g_{\rm mn}R_{\rm P}}{2}}{1 - \frac{g_{\rm mn}R_{\rm P}}{2}}$$
(2.3)

where $g_{mn}$ corresponds to the transconductance of the NMOS transistor and its value around $V_{LT}$ is $\sqrt{\mu_n C_{ox} \frac{W_N}{L_N} I_{SS}}$.
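As a rough numerical illustration of (2.3), the sketch below evaluates $g_{mn}$ and $A_V$ for one operating point. The function names and all device values are assumptions chosen for illustration; raising an error for $g_{mn}R_P/2 \geq 1$ is likewise a choice made for this sketch.

```python
import math


def g_mn(mu_n_cox, w_n, l_n, i_ss):
    """Transconductance around V_LT: sqrt(mu_n*C_ox*(W_N/L_N)*I_SS)."""
    return math.sqrt(mu_n_cox * (w_n / l_n) * i_ss)


def small_signal_gain(gm, r_p):
    """A_V from (2.3), written in terms of x = g_mn*R_P/2."""
    x = gm * r_p / 2.0
    if x >= 1.0:
        raise ValueError("x = g_mn*R_P/2 must be below 1 for (2.3) to apply")
    return x / (1.0 - x)


# Assumed example values (not taken from this work).
gm = g_mn(mu_n_cox=200e-6, w_n=1.44e-6, l_n=0.18e-6, i_ss=100e-6)
a_v = small_signal_gain(gm, r_p=4e3)
print(f"g_mn = {gm * 1e6:.0f} uS, A_V = {a_v:.2f}")
```

For these assumed values $g_{mn}R_P/2$ = 0.8 and $A_V$ is about 4. The denominator $1 - g_{mn}R_P/2$ is the term through which the positive feedback raises the gain.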

The NM of a PFSCL inverter [61] is given by (2.4).

$$NM = \frac{V_{SWING}}{2} f \left( \frac{g_{mn} R_P}{2} \right)$$
(2.4)

where function f is expressed as

$$f(x) = 2\sqrt{\frac{1}{2}\left(1-\frac{1}{16x^{2}}\right)\left(1-\sqrt{1-\frac{1-\frac{1}{4x^{2}}}{\left(1-\frac{1}{16x^{2}}\right)^{2}}}\right)}\left\{2\sqrt{1-\frac{1}{2}\left(1-\frac{1}{16x^{2}}\right)\left(1-\sqrt{1-\frac{1-\frac{1}{4x^{2}}}{\left(1-\frac{1}{16x^{2}}\right)^{2}}}\right)}-\frac{1}{x}\right\}$$
(2.5)

For  $x = \frac{g_{mn}R_P}{2} < 1$ , the function of (2.5) can be approximated by (2.6)

$$f(x) = 1.4x - 0.65$$
 (2.6)

The gate exhibits hysteresis for x greater than unity, so the above relation is valid only for $x = \frac{g_{mn}R_P}{2} < 1$. Alternatively, by using the piecewise-linear approximation of the VTC [61], the NM can also be computed as

$$NM = \frac{V_{SWING}}{2} \left( 1 - \frac{1}{A_V} \right)$$
(2.7)
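The noise margin expressions can be evaluated numerically as sketched below. The grouping of the nested radicals in `f_exact` is an assumption about how (2.5) reads, namely $f(x) = 2\sqrt{y}\,(2\sqrt{1-y} - 1/x)$ with $y$ the repeated inner term, and all function names are illustrative.

```python
import math


def f_exact(x):
    """Nested-radical f(x) as read from (2.5); real-valued only for x > 0.5."""
    a = 1.0 - 1.0 / (16.0 * x**2)
    b = 1.0 - 1.0 / (4.0 * x**2)
    y = 0.5 * a * (1.0 - math.sqrt(1.0 - b / a**2))
    return 2.0 * math.sqrt(y) * (2.0 * math.sqrt(1.0 - y) - 1.0 / x)


def f_linear(x):
    """Linear approximation (2.6)."""
    return 1.4 * x - 0.65


def nm_from_f(v_swing, x, f=f_linear):
    """Noise margin from (2.4), using either form of f."""
    return 0.5 * v_swing * f(x)


def nm_piecewise(v_swing, x):
    """Noise margin from (2.7), with A_V taken from (2.3)."""
    a_v = x / (1.0 - x)
    return 0.5 * v_swing * (1.0 - 1.0 / a_v)


for x in (0.7, 0.8, 0.9, 1.0):
    print(f"x = {x:.1f}: f_exact = {f_exact(x):.3f}, f_linear = {f_linear(x):.3f}")
```

Under this reading, `f_exact(1.0)` evaluates to about 0.74, close to the 0.75 given by (2.6).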



Fig. 2.4 Linear half circuit

To evaluate the propagation delay, the circuit may be linearised around the logic threshold. The presence of feedback, however, makes the analysis complex and an alternate approach is given in [62] which suggests the use of half circuit model depicted in Fig. 2.4. The propagation delay is computed as

$$\tau_{PD} = R_P C_{out} = R_P (C_{db,d1} + C_{gd,d1} + C_{db,r1} + C_{gd,r1} + C_{gd,f1} + \frac{1}{2} C_{gs,f} + C_L)$$
(2.8)

Here, $C_{out}$ represents the overall capacitance at the output node and includes the capacitive contributions of the transistors and the external load capacitance $C_L$. The drain-bulk junction capacitance ($C_{db}$), the gate-to-source capacitance ($C_{gs}$) and the gate-to-drain capacitance ($C_{gd}$) are the capacitive contributions from the transistors in the PDN (Md1, Mf) and the load section (Mr1). For the transistors operating in the saturation region, only the overlap capacitance need be considered. Therefore, the $C_{gd}$ contribution from the NMOS transistors is equal to $C_{gdo}W_N$, where $C_{gdo}$ is the gate-drain overlap capacitance per unit width and $W_N$ represents the width of the NMOS transistor in the PDN [62]. As the PMOS transistor in the load section operates in the linear region, both the overlap capacitance and the intrinsic contribution associated with its channel charge [62] are taken into account. The contribution of junction capacitance for the transistors is adopted from [62]. The Miller effect associated with the gain from gate to source reduces the capacitance contribution of transistor Mf to $\frac{1}{2}C_{gs}$ [62]. Further, $R_P$ in Fig. 2.4 is the equivalent resistance of the PMOS Mr1, given in (2.11).
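The delay expression (2.8) is a single RC product, which the following sketch evaluates. The capacitance values are placeholders assumed for illustration only and the dictionary keys are hypothetical names for the terms of (2.8).

```python
def output_capacitance(caps, c_load):
    """C_out from (2.8); the C_gs of Mf enters with a factor of 1/2."""
    return (caps["c_db_d1"] + caps["c_gd_d1"]
            + caps["c_db_r1"] + caps["c_gd_r1"]
            + caps["c_gd_f"] + 0.5 * caps["c_gs_f"]
            + c_load)


def propagation_delay(r_p, caps, c_load):
    """tau_PD = R_P * C_out, as in (2.8)."""
    return r_p * output_capacitance(caps, c_load)


# Placeholder parasitics in farads (assumed, not extracted from any process).
example_caps = {
    "c_db_d1": 1.0e-15, "c_gd_d1": 0.5e-15,
    "c_db_r1": 1.5e-15, "c_gd_r1": 0.5e-15,
    "c_gd_f": 0.5e-15, "c_gs_f": 2.0e-15,
}
tau_pd = propagation_delay(r_p=4e3, caps=example_caps, c_load=10e-15)
print(f"tau_PD = {tau_pd * 1e12:.1f} ps")
```

With these placeholder values $C_{out}$ is 15 fF and the delay is 60 ps; the external load dominates, so the benefit of smaller NMOS parasitics is most visible at light loads.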

Based on the expressions for small signal voltage gain, voltage swing etc. the design approach of a PFSCL inverter for a given value of the bias current  $I_{ss}$  and the noise margin is presented in [62]. Considering saturation region operation of NMOS transistors in the PDN and  $g_{mn}R_P/2=1$ , the value of voltage swing for a specified value of NM is computed using (2.4) as

$$V_{SWING} = \frac{2NM}{f(1)} = 2.7NM$$
 (2.9)

Once the voltage swing is obtained, the next step is to size load transistor with equivalent resistance  $R_P$  (=V<sub>SWING</sub>/I<sub>SS</sub>). The equivalent resistance, of minimum sized PMOS transistor ( $R_{Pmin}$ ), is assessed first which is followed by determining the bias current I<sub>HIGH</sub> using (2.10)

$$I_{\text{HIGH}} = \frac{V_{\text{SWING}}}{R_{\text{Pmin}}}$$
(2.10)

If the bias current $I_{SS}$ is higher than $I_{HIGH}$, then $R_P < R_{Pmin}$. To achieve this, $L_P$ is set to its minimum value ($L_{Pmin}$) and the required $W_P$ is computed using the standard BSIM3v3 MOSFET model [80], in which $R_P$ is given by (2.11) as

$$R_{\rm P} = \frac{R_{\rm int}}{1 - \frac{R_{\rm DS}}{R_{\rm int}}}$$
(2.11)

where $R_{DS} = \frac{R_{DSW} \cdot 10^{-6}}{W_P}$ models the source/drain parasitic resistance with $R_{DSW}$ as the empirical model parameter, $W_P$ is the width of the PMOS load transistor and $R_{int}$ is the intrinsic resistance of the load transistor in the linear region, given as

$$R_{int} = \left[\mu_{effp} C_{ox} \frac{W_{P}}{L_{P}} (V_{DD} - |V_{TP}|)\right]^{-1}$$
(2.12a)

$$W_{P} = \frac{I_{SS}}{V_{SWING}} \frac{L_{Pmin}}{\mu_{effp} C_{ox} (V_{DD} - |V_{TP}|) \left( 1 - \frac{R_{DSW} 10^{-6} \mu_{effp} C_{ox} (V_{DD} - |V_{TP}|)}{L_{Pmin}} \right)}$$
(2.12b)

Similarly, if $I_{SS} < I_{HIGH}$, then $R_P > R_{Pmin}$. In such a case $W_P$ is set to its minimum value ($W_{Pmin}$) and $L_P$ is calculated from (2.11) as

$$L_{P} = \mu_{effp} C_{ox} W_{Pmin} \left( V_{DD} - |V_{TP}| \right) \left[ \frac{V_{SWING}}{I_{SS}} - \frac{R_{DSW} 10^{-6}}{W_{Pmin}} \right]$$
(2.13)
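The load sizing steps of (2.9) to (2.13) can be sketched as follows. All names and numeric values are illustrative assumptions; in particular, $R_{Pmin}$ is assumed here to be obtained by evaluating (2.11) at minimum dimensions, `k_p` stands for $\mu_{effp}C_{ox}$ and `v_ov` for $V_{DD} - |V_{TP}|$.

```python
def swing_from_noise_margin(nm):
    """V_SWING for a required noise margin, from (2.9)."""
    return 2.7 * nm


def r_p_model(w_p, l_p, k_p, v_ov, r_dsw):
    """Equivalent PMOS resistance from (2.11) and (2.12a)."""
    r_int = l_p / (k_p * w_p * v_ov)
    r_ds = r_dsw * 1e-6 / w_p
    return r_int / (1.0 - r_ds / r_int)


def size_load(i_ss, v_swing, k_p, v_ov, r_dsw, w_p_min, l_p_min):
    """Return (W_P, L_P) for the load transistor."""
    r_p_min = r_p_model(w_p_min, l_p_min, k_p, v_ov, r_dsw)
    i_high = v_swing / r_p_min  # (2.10)
    if i_ss > i_high:
        # R_P < R_Pmin: minimum length, width from (2.12b).
        corr = 1.0 - r_dsw * 1e-6 * k_p * v_ov / l_p_min
        w_p = (i_ss / v_swing) * l_p_min / (k_p * v_ov * corr)
        return w_p, l_p_min
    # R_P >= R_Pmin: minimum width, length from (2.13).
    l_p = k_p * w_p_min * v_ov * (v_swing / i_ss - r_dsw * 1e-6 / w_p_min)
    return w_p_min, l_p


# Assumed process-like values, for illustration only (SI units, R_DSW in ohm*um).
PARAMS = dict(k_p=50e-6, v_ov=1.3, r_dsw=500.0, w_p_min=0.4e-6, l_p_min=0.2e-6)
v_swing = swing_from_noise_margin(0.15)
w_hi, l_hi = size_load(100e-6, v_swing, **PARAMS)
w_lo, l_lo = size_load(20e-6, v_swing, **PARAMS)
print(f"100 uA: W_P = {w_hi * 1e6:.3f} um, L_P = {l_hi * 1e6:.3f} um")
print(f" 20 uA: W_P = {w_lo * 1e6:.3f} um, L_P = {l_lo * 1e6:.3f} um")
```

Substituting the width from (2.12b) back into (2.11) returns exactly $V_{SWING}/I_{SS}$. The length from (2.13) reproduces the target only approximately under (2.11), since (2.13) corresponds to treating $R_P$ as $R_{int} + R_{DS}$.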

After this, the dimensions of the transistors in the PDN (Md1, Mf) are derived by substituting $\frac{g_{mn}R_{P}}{2} = \sqrt{\mu_{n}C_{ox}\frac{W_{N}}{L_{N}}\frac{1}{I_{SS}}} \cdot \frac{V_{SWING}}{2}$ in the expression for $A_V$ in (2.3). The width $W_N$ of the PDN transistors is calculated as

$$W_{N} = \left(\frac{A_{V}}{1 + A_{V}}\right)^{2} \frac{4L_{Nmin}I_{SS}}{\mu_{n}C_{ox}V_{SWING}^{2}}$$
(2.14)

If the bias current is lower than the value supported by a minimum-sized NMOS transistor ($I_{LOW}$), then (2.14) yields a value smaller than the minimum channel width. In such cases $W_N$ is set to its minimum value $W_{Nmin}$. Using (2.14), $I_{LOW}$ is given by (2.15).

$$I_{LOW} = \frac{1}{4} \frac{W_{Nmin}}{L_{min}} \mu_n C_{ox} V_{SWING}^2$$
(2.15)
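A quick numerical check of (2.15) is sketched below: at $I_{SS} = I_{LOW}$, a minimum-width NMOS transistor gives $g_{mn}R_P/2 = 1$, which is the condition assumed in the design approach. The function names and device values are assumptions for illustration.

```python
import math


def i_low(w_n_min, l_min, mu_n_cox, v_swing):
    """Lowest bias current supported by a minimum-sized NMOS, from (2.15)."""
    return 0.25 * (w_n_min / l_min) * mu_n_cox * v_swing**2


def loop_factor(w_n, l_n, mu_n_cox, i_ss, v_swing):
    """x = g_mn*R_P/2 with g_mn = sqrt(mu_n*C_ox*(W_N/L_N)*I_SS), R_P = V_SWING/I_SS."""
    return math.sqrt(mu_n_cox * (w_n / l_n) / i_ss) * v_swing / 2.0


# Assumed example values (not taken from this work).
W_N_MIN, L_MIN, MU_N_COX, V_SWING = 0.4e-6, 0.2e-6, 200e-6, 0.405
i_low_val = i_low(W_N_MIN, L_MIN, MU_N_COX, V_SWING)
x_at_i_low = loop_factor(W_N_MIN, L_MIN, MU_N_COX, i_low_val, V_SWING)
print(f"I_LOW = {i_low_val * 1e6:.1f} uA, g_mn*R_P/2 at I_LOW = {x_at_i_low:.3f}")
```

For any bias current below this value, a minimum-width transistor gives $g_{mn}R_P/2$ greater than unity, which is the hysteresis regime noted earlier.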

Further to the discussion on the design of the gates, the minimum power supply, $V_{DDmin}$, under which the PFSCL inverter can function correctly is also calculated. This is an important parameter, as knowing the minimum power supply provides scope for reducing the static power dissipation, which is directly proportional to the power supply and is given by $V_{DD}I_{SS}$. For any source coupled logic gate with N levels of source coupled transistor pairs, $V_{DDmin}$ can be expressed as:

$$V_{DDmin} = (N+1)V_{B} - (N-1)V_{TN} - V_{Tcurr}$$
 (2.16)

where $V_{TN}$, $V_B$, and $V_{Tcurr}$ represent the typical threshold voltage, the bias voltage and the threshold voltage for the current source respectively. With the PFSCL inverter having a single level of source coupled transistor pair, i.e. N=1, the minimum power supply ($V_{DDmin}$) given by (2.16) reduces to:

$$V_{\text{DDmin}} = 2V_{\text{B}} - V_{\text{Tcurr}}$$
(2.17)
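The supply limit of (2.16) and the resulting static power can be evaluated as in the sketch below. The voltage and current values are assumed for illustration and do not correspond to a particular technology.

```python
def v_dd_min(n_levels, v_b, v_tn, v_tcurr):
    """Minimum supply voltage from (2.16) for N source-coupled levels."""
    return (n_levels + 1) * v_b - (n_levels - 1) * v_tn - v_tcurr


def static_power(v_dd, i_ss):
    """Static power dissipation V_DD * I_SS of one gate."""
    return v_dd * i_ss


# Assumed example values: V_B = 0.8 V, V_TN = V_Tcurr = 0.45 V, I_SS = 100 uA.
V_B, V_TN, V_TCURR, I_SS = 0.8, 0.45, 0.45, 100e-6
for n in (1, 2):
    v_min = v_dd_min(n, V_B, V_TN, V_TCURR)
    p_min = static_power(v_min, I_SS)
    print(f"N = {n}: V_DDmin = {v_min:.2f} V, static power = {p_min * 1e6:.0f} uW")
```

For N = 1 the function reduces to $2V_B - V_{Tcurr}$, in agreement with (2.17), and each additional level raises the limit by $V_B - V_{TN}$.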

#### 2.2.2 Realisation of Conventional PFSCL gates

The PFSCL realisations of the generic N-input NOR and NAND gates are discussed in Section 2.2; their two-input versions are shown in Fig. 2.5a and Fig. 2.5b respectively. Based on this, the 2-input XOR (XOR2) PFSCL gate requires two levels, as shown in Fig. 2.5c. Other complex gates can also be implemented in a similar way.





Fig. 2.5 PFSCL style a) NOR2 b) NAND2 c) XOR2

### 2.3 PFSCL fundamental cell architecture

In the previous section, we discussed the conventional PFSCL style, where the gates are implemented using a NOR based architecture [37]. As we saw for the XOR2 gate, and similarly for other complex logic, conventional PFSCL based implementation leads to multiple cascaded PFSCL gates with a corresponding increase in delay and power. This issue is mitigated in [66], which proposes a new architecture, called the fundamental cell, that can implement any 2-input complex logic function in a single stage, thus reducing the number of cascaded stages needed for function implementation.

A D-latch based on the PFSCL fundamental cell architecture (PFSCL FC) is shown in Fig. 2.6a [66], with the associated block diagram in Fig. 2.6b. The fundamental cell is based on the triple-tail cell concept, consisting of two triple-tail cells TT-1: (Md3, Mc1, Md4) and TT-2: (Md5, Mc2, Md6) biased by separate current sources of $I_{SS}/2$ value such that the new cell draws the same current as a single traditional PFSCL gate. The said D-latch also uses a PFSCL inverter in the first stage to generate the complement of the CLK input. Comparing this implementation with a NOR based implementation that needs 6 NOR gates and 3 stages, it is observed that the PFSCL FC offers advantages in terms of delay and power dissipation.



Fig. 2.6 a) PFSCL FC based D-Latch [66] b) Block representation

The transistors Ms1 and Ms2 operate in saturation in order to maintain a constant bias current of $I_{SS}/2$ value. The four PMOS transistors (Mr1, Mr2, Mr3 and Mr4) work as loads. To realize a D-latch, the transistors Mc2 and Mc1 are driven by the input clock (CLK) and its complement. The complement of the CLK is generated by the PFSCL inverter in the first stage. Mc1 and Mc2 are connected between the supply terminal and the common source terminal of transistor pairs Md3-Md4 and Md5-Md6 respectively, as shown in Fig. 2.6. A high voltage on CLK turns ON the transistor Mc2 and deactivates the transistor pair Md5-Md6. At the same time, the transistor Mc1 turns OFF so that the transistor pair Md3-Md4 generates the output according to the input D. For the case when CLK is low, the transistor pair Md5-Md6 gets activated and preserves the previous output. Thus, the PFSCL FC based D-latch models the positive level sensitive D-latch.
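The latch behaviour described above can be summarised by a purely behavioural sketch: the output follows D while CLK is high and holds its previous value while CLK is low. The function names are illustrative assumptions, and the model ignores voltage levels and delay.

```python
def d_latch_step(clk, d, q_prev):
    """Positive level-sensitive latch: transparent when CLK is high."""
    return d if clk else q_prev


def run_latch(clk_seq, d_seq, q0=0):
    """Apply a sequence of (CLK, D) samples and return the Q trace."""
    q, trace = q0, []
    for clk, d in zip(clk_seq, d_seq):
        q = d_latch_step(clk, d, q)
        trace.append(q)
    return trace


clk = [1, 1, 0, 0, 1]
d = [1, 0, 1, 1, 1]
print(run_latch(clk, d))  # Q changes only in samples where CLK is 1
```

In the third and fourth samples D is high but CLK is low, so the previously stored value is retained.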

Proper operation demands that when either Mc1 or Mc2 is ON, the side transistors Md3-Md4 or Md5-Md6 should be OFF. However, if all transistors are of the same dimension, then this would not be possible. When a triple-tail cell is inactive i.e. Mc1 or Mc2 is ON, it has to be ensured that all of the bias current flows through the central branch and not through the side branches. If the dimensions of all the transistors in the PDN are the same, then the current would get equally divided between the side branch and the central branch for the case where side branch is also ON along with the central branch. To avoid such a situation, the central branch in the triple-tail cell is designed to have a dimension which is N times the dimensions of the side branch, ensuring the majority of the bias current flows through the central branch deactivating the cell properly. Further, at any given time, either of the two cells (TT-1/TT-2) gets activated and determines the output of the gate.

#### 2.3.1 Fundamental cell operation and analysis

To analyse the operation of the fundamental cell, an OR2 gate is considered, as in Fig. 2.7.



Fig. 2.7 Fundamental cell based OR2 gate

In the OR2 gate, when B is asserted high, TT-1 is activated while TT-2 is deactivated. Since the OR2 output should be logic high irrespective of the value of input A in this case, TT-1 is driven by $V_{DD}$. TT-2 does not contribute to the output since the whole of its bias current $I_{SS}/2$ flows through Mc2. When input B is low, TT-1 is deactivated and TT-2 contributes to the output, which then depends on input A. The general expressions for the currents flowing through the central branch (Mci) and the side branch (Mdi), keeping in mind that the ratio of dimensions of the central branch to the side branch is N, are as in (2.18). The value of N is generally taken between 5 and 20.

$$I_{Mdi} = \frac{I_{SS}}{2} \frac{1}{N+1}$$

$$I_{Mci} = \frac{I_{SS}}{2} \frac{N}{N+1}$$
(2.18)
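The current division of (2.18) can be tabulated for the stated range of N with the short sketch below. The function name and the bias current value are assumptions for illustration.

```python
def triple_tail_currents(i_ss, n):
    """Side-branch and central-branch currents from (2.18).

    The tail current of each triple-tail cell is I_SS/2 and the central
    transistor is N times the size of a side transistor.
    """
    half = i_ss / 2.0
    i_side = half / (n + 1)
    i_centre = half * n / (n + 1)
    return i_side, i_centre


I_SS = 100e-6  # assumed bias current
for n in (5, 10, 20):
    i_side, i_centre = triple_tail_currents(I_SS, n)
    share = i_side / (I_SS / 2.0)
    print(f"N = {n:2d}: I_Md = {i_side * 1e6:.2f} uA, "
          f"I_Mc = {i_centre * 1e6:.2f} uA, side share = {share:.1%}")
```

The two currents always sum to $I_{SS}/2$, and the side-branch share falls as $1/(N+1)$, which is why a larger N deactivates the cell more completely at the cost of a larger central transistor.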

Considering all the input combinations for the OR2 gate, the currents through the central branch and the side branches were derived, using which the output high voltage and output low voltage were derived. The output voltage is decided by the currents flowing through Md2 and Md4. Table 2.1 gives the output voltages for various combinations of inputs.

| S.No. | Input A | Input B | Output voltage $V_Q$                                                     |
|-------|---------|---------|--------------------------------------------------------------------------|
| 1.    | High    | High    | $V_{OH} = V_{DD}$                                                        |
| 2.    | Low     | Low     | $V_{OL1} = V_{DD} - \frac{R_P I_{SS}}{2} \left(1 + \frac{1}{N+1}\right)$ |
| 3.    | High    | Low     | $V_{OL2} = V_{DD} - \frac{R_P I_{SS}}{2}$                                |
| 4.    | Low     | High    | $V_{OL2} = V_{DD} - \frac{R_P I_{SS}}{2}$                                |

Table 2.1 Output voltages for various combinations of inputs

Since there are two values of output low voltage  $V_{OL1}$  and  $V_{OL2}$ , there are two values of voltage swing, as given in (2.19).

$$V_{SWING1} = \frac{R_{P}I_{SS}}{2} \left( 1 + \frac{1}{N+1} \right)$$
(2.19a)

$$V_{SWING2} = \frac{R_P I_{SS}}{2}$$
(2.19b)

For large values of N, the voltage swing V<sub>SWING</sub> can be approximated as:

$$V_{SWING} = \frac{R_P I_{SS}}{2}$$
(2.20)

where  $R_P$  is the equivalent PMOS resistance given in [37].

The small signal voltage gain,  $A_V$  and the noise margin, NM for the new fundamental cell are computed by the method outlined in [37] and are given in (2.21) and (2.22) respectively.

$$A_{V} = \frac{g_{mn}R_{P}/2}{1-g_{mn}R_{P}/2}$$
(2.21)

$$NM = \frac{V_{SWING}}{2} \left( 1 - \frac{1}{A_V} \right)$$
(2.22)

where $g_{mn} = \sqrt{\mu_n C_{ox} \frac{W_N}{L_N} \frac{I_{SS}}{2}}$ is the transconductance of the transistors Md1-Md4, and $\mu_n$, $W_N$ and $L_N$ are the effective electron mobility, the effective channel width and the effective channel length of these transistors, respectively.
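The static expressions (2.19)-(2.22) can be evaluated together as in the following sketch. The function name, the dictionary keys and all numerical values are assumptions chosen for illustration; only the formulas are taken from the text.

```python
import math


def static_parameters(i_ss, r_p, n, mu_n_cox, w_over_l):
    """Evaluate (2.19)-(2.22) for the fundamental cell.

    i_ss     : bias current I_SS (A)
    r_p      : equivalent PMOS load resistance R_P (ohm)
    n        : centre-to-side sizing ratio N
    mu_n_cox : product mu_n * C_ox (A/V^2)
    w_over_l : aspect ratio W_N / L_N of Md1-Md4
    """
    v_swing1 = r_p * i_ss / 2.0 * (1.0 + 1.0 / (n + 1))  # (2.19a)
    v_swing2 = r_p * i_ss / 2.0                          # (2.19b), (2.20)
    g_mn = math.sqrt(mu_n_cox * w_over_l * i_ss / 2.0)   # transconductance
    x = g_mn * r_p / 2.0
    if x >= 1.0:
        raise ValueError("g_mn*R_P/2 must be below 1 for (2.21) to be finite")
    a_v = x / (1.0 - x)                                  # (2.21)
    nm = v_swing2 / 2.0 * (1.0 - 1.0 / a_v)              # (2.22)
    return {"v_swing1": v_swing1, "v_swing2": v_swing2,
            "g_mn": g_mn, "a_v": a_v, "nm": nm}


if __name__ == "__main__":
    # All values below are assumed, not taken from the thesis.
    print(static_parameters(100e-6, 8e3, 10, 3.0e-4, 4.0))
```

Note that (2.22) gives a positive noise margin only when $A_V > 1$, which the sketch leaves to the caller to check.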

The use of the fundamental cell offers high performance circuits, but it has a limitation in terms of area. Proper operation of the fundamental cell requires that the complete bias current $I_{SS}/2$ flow through the centre transistor of a deactivated triple-tail cell. In practice this is difficult to achieve, since the bias current $I_{SS}/2$ divides between the centre branch and one of the two outer branches when both are driven by high inputs. To address this limitation and facilitate proper activation/deactivation, the aspect ratio of the centre transistors is made N times that of the outer transistors [66]. However, when realizing complex functions, this approach leads to significant area overhead due to the larger aspect ratio of the centre transistors.

### 2.3.2 Implementation of gates

The concept of fundamental cell is generalized by defining a configurable logic block (CLB) [69], as shown in Fig. 2.8. It consists of a PFSCL inverter to generate complement of input M and two triple-tail cells (TT-1, TT-2) and can be configured to realise various two input logic functions. Its usage has also been extended to efficiently realize complex circuits such as comparators, adders, multipliers, LFSR etc. [70-74].



Fig. 2.8 CLB [71]

The mapping of various inputs to X, Y and  $\overline{M}$  and interconnection of the outputs Q1-Q4 for the implementation of various two input logic functions like XOR2, MUX2 etc. is given in the following Table 2.2.

| Logic function | Actual inputs | CLB input M | CLB input X | CLB input Y | Output nodes |
|----------------|---------------|-------------|-------------|-------------|--------------|
| OR             | A, B          | B           | B           | A           | Q2, Q4       |
| NOR            | A, B          | B           | B           | A           | Q1, Q3       |
| NAND           | A, B          | B           | A           | B           | Q1, Q3       |
| AND            | A, B          | B           | A           | B           | Q2, Q4       |
| XOR            | A, B          | B           | A           | A           | Q1, Q4       |
| XNOR           | A, B          | B           | A           | A           | Q2, Q3       |
| MUX            | SEL, I0, I1   | SEL         | I1          | I0          | Q2, Q4       |

Table 2.2 Realisation of different logic functions using CLB
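For use in a design script, the mapping of Table 2.2 can be held in a simple lookup structure. The sketch below is only a transcription of the table; the names `CLB_CONFIG` and `configure_clb` are hypothetical and it makes no claim about the electrical behaviour of the nodes Q1-Q4.

```python
# Transcription of Table 2.2: CLB input mapping and output node pair
# for each two-input function (names are illustrative assumptions).
CLB_CONFIG = {
    "OR":   {"M": "B",   "X": "B",  "Y": "A",  "outputs": ("Q2", "Q4")},
    "NOR":  {"M": "B",   "X": "B",  "Y": "A",  "outputs": ("Q1", "Q3")},
    "NAND": {"M": "B",   "X": "A",  "Y": "B",  "outputs": ("Q1", "Q3")},
    "AND":  {"M": "B",   "X": "A",  "Y": "B",  "outputs": ("Q2", "Q4")},
    "XOR":  {"M": "B",   "X": "A",  "Y": "A",  "outputs": ("Q1", "Q4")},
    "XNOR": {"M": "B",   "X": "A",  "Y": "A",  "outputs": ("Q2", "Q3")},
    "MUX":  {"M": "SEL", "X": "I1", "Y": "I0", "outputs": ("Q2", "Q4")},
}


def configure_clb(function):
    """Return the CLB input mapping and output nodes for a logic function."""
    key = function.upper()
    if key not in CLB_CONFIG:
        raise ValueError(f"no CLB configuration listed for {function!r}")
    return CLB_CONFIG[key]


if __name__ == "__main__":
    print(configure_clb("xor"))
```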

From the architecture, it is further observed that any two-input complex function can be easily implemented using the fundamental cell; three-input complex logic functions, however, require cascading of two fundamental cell based gates.

## 2.4 Dynamic PFSCL (D-PFSCL)

The Dynamic Positive Feedback Source Coupled Logic (D-PFSCL) gates [67] are designed by adapting the method suggested for differential dynamic gates in [52, 56], and they show an improvement in speed and power consumption over their static PFSCL counterparts. Like static PFSCL, D-PFSCL supports only the NOR/OR operation. The structure of a D-PFSCL inverter is shown in Fig. 2.9. It comprises three main parts, namely a pull down network (PDN), precharge transistors and a dynamic current source (DCS).



Fig. 2.9 D-PFSCL inverter [67]

The PMOS transistor $M_{r1}$ is the precharge transistor, while the transistors $M_{s1}$, $M_{s2}$ and the capacitor $C_1$ form the DCS. Both are driven by a clock (CLK) input. The PDN consists of the transistors Md1 and $M_f$, with Md1 driven by input A and $M_f$ driven by the voltage at the output node Q, which provides the feedback connection. The output node Q provides the inverted output. Where a buffered output is required, it is obtained by placing an additional precharge transistor Mr2 in Fig. 2.9 and taking the output from its drain terminal (the drain of $M_f$). At any given instant, output from only one node is obtained, as these gates belong to the single-ended SCL gates.

A D-PFSCL gate operates in two phases, the precharge phase and the evaluation phase, depending on the CLK signal. For a low value of CLK, the circuit works in the precharge phase, wherein the output node is charged to $V_{DD}$ since the precharge transistor is ON. Simultaneously, the capacitor C<sub>1</sub> is discharged to ground potential via the conducting M<sub>s2</sub> transistor. Further, changes in the applied input A do not influence the output, as M<sub>s1</sub> is OFF and no conductive path can be established between the output node and ground. For a high value of CLK, the logic function is evaluated since M<sub>s1</sub> is ON and the transistors M<sub>s2</sub> and M<sub>r1</sub> are OFF. In this evaluation phase, the output either remains at logic high, i.e. V<sub>DD</sub>, or discharges to logic low, i.e. V<sub>DD</sub>-V<sub>SWING</sub>. The mechanism for fixing the voltage swing is discussed in the next section.

### 2.4.1 D-PFSCL operation and analysis

To achieve the required voltage swing  $V_{SWING}$  during the evaluation, the capacitor  $C_1$  is sized such that the output is discharged to the low level i.e.  $V_{DD}$ - $V_{SWING}$ . Using the charge conservation principle, we can write:

$$V_{DD}C_{OUT} = (C_{OUT} + C_1)(V_{DD} - V_{SWING})$$
(2.23)

where $C_{OUT}$ is the total load capacitance at the output node, which includes the parasitic capacitances of the transistors and the external load capacitance $C_L$. Solving (2.23) for $C_1$ gives:

$$C_1 = \frac{(V_{SWING}C_{OUT})}{(V_{DD}-V_{SWING})}$$
(2.24)

In practice, the capacitor  $C_1$  is realized by a dummy transistor with its source and drain terminals connected together. Considering the width and length of dummy transistor to be  $W_{C1}$  and  $L_{C1}$ , the size of dummy transistor can be calculated as:

$$W_{C1}L_{C1} = \frac{(V_{SWING}C_{OUT})}{C_{ox}(V_{DD}-V_{SWING})}$$
(2.25)

where $C_{ox}$ is the gate oxide capacitance per unit area.
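The sizing steps (2.23)-(2.25) can be sketched numerically as follows. The function name and the example values (including the oxide capacitance per unit area) are assumptions made for illustration only.

```python
def size_dcs_capacitor(v_dd, v_swing, c_out, c_ox):
    """Size the DCS capacitor C_1 and its dummy transistor.

    v_dd    : supply voltage (V)
    v_swing : target voltage swing (V), must satisfy 0 < v_swing < v_dd
    c_out   : total capacitance at the output node C_OUT (F)
    c_ox    : gate oxide capacitance per unit area (F/m^2)
    Returns (C_1 in farads, gate area W_C1*L_C1 in m^2).
    """
    if not 0.0 < v_swing < v_dd:
        raise ValueError("v_swing must lie strictly between 0 and v_dd")
    c1 = v_swing * c_out / (v_dd - v_swing)  # (2.24)
    gate_area = c1 / c_ox                    # (2.25)
    return c1, gate_area


if __name__ == "__main__":
    # Assumed values: 1.1 V supply, 0.4 V swing, 50 fF load, 10 fF/um^2 oxide.
    c1, area = size_dcs_capacitor(1.1, 0.4, 50e-15, 1e-2)
    print(f"C1 = {c1 * 1e15:.2f} fF, W*L = {area * 1e12:.3f} um^2")
```

Substituting the returned $C_1$ back into (2.23) reproduces the precharged charge $V_{DD}C_{OUT}$, which is a convenient sanity check on the sizing.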

The D-PFSCL gates do not consume static power. This is due to the fact that there is no constant current path from the power supply to the ground because the transistor pairs  $M_{s1}$  and  $M_{s2}$  (Fig. 2.9) in the dynamic current source would never turn ON simultaneously. The D-PFSCL gates, however, consume dynamic power which is

$$P_{dyn} = C_{OUT} V_{DD} V_{SWING} f_{CLK}$$
(2.26)

where  $f_{CLK}$  is frequency of the clock signal.

The dynamic power consumption of the logic gates depends on the switching activity of the output node [1]. It is already known that for uniformly distributed inputs, the low-to-high transition probability for an N input gate is

$$\alpha_{L \to H} = \frac{N_0}{2^N} \tag{2.27}$$

where  $N_0$  is the number of zero entries in the truth table of the logic function. Thus, the power consumption of an N-input D-PFSCL gate with transition probability  $\alpha_{L \to H}$  is given as:

$$P_{dyn} = \alpha_{L \to H} C_{OUT} V_{DD} V_{SWING} f_{CLK}$$
(2.28)
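The transition probability in (2.27) and the power estimate in (2.28) can be computed directly from a truth table, as in the sketch below. The helper names and the numerical values in the example are assumptions for illustration.

```python
from itertools import product


def low_to_high_probability(func, n_inputs):
    """Return N_0 / 2^N as in (2.27) for a Boolean function of n_inputs."""
    rows = list(product((0, 1), repeat=n_inputs))
    n0 = sum(1 for bits in rows if func(*bits) == 0)  # zero entries
    return n0 / len(rows)


def dynamic_power(alpha, c_out, v_dd, v_swing, f_clk):
    """Dynamic power of a D-PFSCL gate as in (2.28), in watts."""
    return alpha * c_out * v_dd * v_swing * f_clk


def nor2(a, b):
    """Two-input NOR used as the example function."""
    return int(not (a or b))


if __name__ == "__main__":
    alpha = low_to_high_probability(nor2, 2)
    # Assumed values: 50 fF load, 1.1 V supply, 0.4 V swing, 100 MHz clock.
    p_dyn = dynamic_power(alpha, 50e-15, 1.1, 0.4, 100e6)
    print(f"alpha = {alpha}, P_dyn = {p_dyn * 1e6:.3f} uW")
```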

### 2.4.2 Realisation of gates

To implement complex logic, multiple D-PFSCL gates need to be cascaded, which can be done by two methods. The first inserts a PFSCL inverter between consecutive stages so that only a low-to-high transition occurs at the output, which avoids erroneous outputs in the subsequent stages. However, the PFSCL inverter consumes static power, which contradicts the basic aim of the logic style. The second method inserts a self-timed buffer (STB), which ensures that the evaluation of a succeeding stage starts only after the evaluation in its preceding stage is complete and its output has stabilized [56]. The STB uses two clocked cascaded inverters as shown in Fig. 2.10a, with its symbol shown in Fig. 2.10b. The CLK signal (CLK<sub>i</sub>) and the node Y voltage in the dynamic current source of the preceding stage (stage i) drive the first inverter. The output of the STB (CLK<sub>i+1</sub>) is given as the CLK signal to the subsequent stage (stage i+1).



Fig. 2.10 STB a) MOS schematic b) symbol

The overall power consumption for multi-stage D-PFSCL gate based on use of STB between the stages with K D-PFSCL identical gates and M identical STBs can be approximated as Where  $C_{STB}$  is the capacitance at drain of M1-M2 of STB (and consequently of M3-M4 of STB) and the factor of 2 is due to the two transistor pairs M1-M2, M3-M4.

An implementation of a 2-input D-PFSCL XOR gate is shown in Fig. 2.11c. It uses 2-input D-PFSCL NOR gates (Fig. 2.11a, with the symbol in Fig. 2.11b) and an intermediate STB (Fig. 2.10b). Similarly, a 3-input D-PFSCL XOR gate can be implemented by connecting two identical XOR gates.



Fig. 2.11 a) D-PFSCL NOR2 b) D-PFSCL NOR2 symbol c) D-PFSCL XOR2 gate [66]

## 2.5 Conclusion

In this chapter, the analysis and design of the existing PFSCL architectures are presented. The modelling of static parameters and delay is discussed under the fundamentals of the PFSCL style, followed by design considerations. The PFSCL FC is then described and its advantages in realizing the XOR2 gate are elucidated. The D-PFSCL, a dynamic variant of PFSCL that avoids static power consumption, is discussed next. The operation, design and methods for cascading D-PFSCL gates are also elaborated.

# Chapter 3 Multithreshold PFSCL Architecture

## 3.1 Introduction

Multithreshold technique is widely used for achieving higher speed in active mode of operation and lowering power in sleep mode [49]. The use of lower threshold transistor provides higher current drive and this fact may be exploited in reducing device dimensions. It is also seen that the minimum power supply for the operation for SCL and its variants depends on threshold voltage of the current source,  $V_{Tcurr}$ . Thus, the introduction of lower threshold transistor may help in reducing footprint as well as power supply.

This chapter focuses on two aspects of using low threshold transistor in PFSCL fundamental cell. It is introduced in central branch transistor to reduce the footprint, and in current source transistor to lower minimum power supply. Both their behaviours are analysed and supported by mathematical formulations followed by simulations.

## 3.2 Proposed architecture-1

The existing PFSCL fundamental cell (PFSCL FC) [66] requires that for proper operation, the dimension of the centre transistor be N times the dimension of the outer transistors and this approach leads to significant area overhead due to larger aspect ratio of centre transistors. Proposed architecture-1 modifies the existing PFSCL FC [66] to include a low threshold voltage transistor as the centre transistor, instead of normal threshold voltage transistor.

The proposed architecture-1 achieves activation/deactivation by lowering the threshold voltage of the centre transistor by a factor $\alpha$ in comparison to the outer transistors. To understand its working, consider the proposed architecture-1 based XOR2 gate given in Fig. 3.1a, with its symbol in Fig. 3.1b. The PDN has two modified triple-tail cells, MTT-1 (Md1, Mc1, Md2) and MTT-2 (Md3, Mc2, Md4), each biased by I<sub>SS</sub>/2, with the inputs A, B and $\overline{B}$. In the schematic, the low threshold voltage centre transistors Mc1-Mc2 are drawn in bold to differentiate them from the others, which have the typical threshold voltage. The transistors Mr1-Mr4 act as loads and generate the output voltage Q.



Fig. 3.1 Proposed architecture-1 a) XOR2 gate b) Symbol

From the schematic, the working of the proposed architecture-1 based XOR2 gate is as follows. As in the existing PFSCL FC, proper operation requires that only one of the modified triple-tail cells MTT-1 or MTT-2 is active at a time, that is, one of the centre transistors (Mc1-Mc2) is OFF (cell active) and the other is ON (cell inactive). When input B is high, MTT-2 is inactive, as Mc2 draws the bias current $I_{SS}/2$ from $V_{DD}$ and the cell does not affect the output. This happens because Mc2 has a lower threshold voltage than Md3-Md4 and conducts the majority of the current even when Md3 or Md4 is ON. Thus, the low threshold voltage of Mc1-Mc2 enables correct operation. Correspondingly, MTT-1 is active when input B is high, and the output Q is high if A is low, else it is low. Similarly, when input B is low, MTT-1 is inactive and MTT-2 is active. In this case, the output Q is high if A is high, else it is low. Thus, the functionality of the gate can be modeled as:

$$Q = \begin{cases} A & \text{if } B = 0 \\ \\ \overline{A} & \text{if } B = 1 \end{cases}$$
(3.1)
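The piecewise description in (3.1) can be checked against the XOR truth table with a short behavioural sketch. The function name `xor2_model` is an assumption; the model captures only the logic, not the electrical behaviour of the gate.

```python
def xor2_model(a, b):
    """Behavioural model of (3.1): Q = A if B == 0, else NOT A."""
    return a if b == 0 else 1 - a


if __name__ == "__main__":
    for b in (0, 1):
        for a in (0, 1):
            print(f"A={a} B={b} -> Q={xor2_model(a, b)}")
```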

The schematic of a generic gate based on proposed architecture-1 is shown in Fig. 3.2. The PDN of the generic gate has two modified triple-tail cells, MTT-1 and MTT-2, with the generic inputs A, B and M. The output voltage is generated by combining one output node from each modified triple-tail cell (MTT-1: either Q1 or Q2; MTT-2: either Q3 or Q4). To generate any two-input logic function, the inputs A, B and M are mapped appropriately as per Table 2.2.



Fig. 3.2 Proposed architecture-1 generic gate

### 3.2.1 Analysis

To analyze the proposed architecture-1, an XOR2 gate is considered for the derivation of the static model and the delay model, as discussed in detail below.

The static behaviour is modelled in terms of three parameters, namely the voltage swing $V_{SWING}$, the small signal voltage gain $A_V$ and the noise margin NM. Based on the working of the proposed XOR2 gate given in Fig. 3.1, the currents in its MTTs are derived. In a deactivated MTT, one of the outer transistors and the centre transistor are ON. Under the design assumption that the threshold voltage of the centre transistor is lower than that of the outer transistors by the factor $\alpha$, the currents through the i<sup>th</sup> centre transistor (I<sub>Ci</sub>) and the j<sup>th</sup> ON outer transistor (I<sub>Dj</sub>), where $i \in \{1,2\}$ and $j \in \{1,2,3,4\}$, can be expressed as:

$$I_{Ci} = \frac{\mu_{n} C_{ox}}{2} \frac{W_{N}}{L_{N}} \left( V_{GS} - \frac{V_{TN}}{\alpha} \right)^{2}$$
(3.2)

$$I_{Dj} = \frac{\mu_n C_{ox}}{2} \frac{W_N}{L_N} (V_{GS} - V_{TN})^2$$
(3.3)

 $\mu_n$ ,  $V_{GS}$  and  $V_{TN}$  are the effective electron mobility, the gate source voltage and the threshold voltage of NMOS transistor respectively. As each MTT-1 and MTT-2 are biased by the current source with bias current  $I_{SS}/2$ , the two currents can be related as:

$$I_{Ci} + I_{Dj} = \frac{I_{SS}}{2}$$
 (3.4)

Using (3.2)-(3.4), I<sub>Ci</sub> and I<sub>Dj</sub> are derived as:

$$I_{Ci} = \frac{I_{SS}}{2} \left( \frac{1}{2} + \frac{\sqrt{\frac{\mu_{n}C_{ox}W_{N}}{2L_{N}}V_{TN}^{2}\left(\frac{\alpha-1}{\alpha}\right)^{2}} \sqrt{I_{SS} - \frac{\mu_{n}C_{ox}W_{N}}{2L_{N}}V_{TN}^{2}\left(\frac{\alpha-1}{\alpha}\right)^{2}}}{I_{SS}} \right)$$
(3.5)

$$I_{Dj} = \frac{I_{SS}}{2} \left(1 - \frac{1}{2} - \frac{\sqrt{\frac{\mu_{n}C_{ox}W_{N}}{2L_{N}}V_{TN}^{2}\left(\frac{\alpha - 1}{\alpha}\right)^{2}} \sqrt{I_{SS} - \frac{\mu_{n}C_{ox}W_{N}}{2L_{N}}V_{TN}^{2}\left(\frac{\alpha - 1}{\alpha}\right)^{2}}}{I_{SS}}\right)$$
(3.6)

Substituting $p = \frac{1}{2} + \frac{\sqrt{\frac{\mu_n C_{ox} W_N}{2L_N} V_{TN}^2 \left(\frac{\alpha - 1}{\alpha}\right)^2} \sqrt{I_{SS} - \frac{\mu_n C_{ox} W_N}{2L_N} V_{TN}^2 \left(\frac{\alpha - 1}{\alpha}\right)^2}}{I_{SS}}$, the current equations (3.5) and (3.6) are simplified as:

$$I_{Ci} = \frac{I_{SS}}{2}\, p \tag{3.7}$$

$$I_{Dj} = \frac{I_{SS}}{2} (1-p) \tag{3.8}$$
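As a consistency check, the factor p can be evaluated numerically and compared with the square-law currents of (3.2)-(3.3) under the constraint (3.4). The sketch below does this with $K = \mu_n C_{ox} W_N / (2L_N)$ and $\delta = V_{TN}(\alpha-1)/\alpha$. The function names, the device values and the clamping of p to 1 once the outer transistor would be cut off are assumptions, not part of the thesis.

```python
import math


def split_factor(i_ss, mu_n_cox, w_n, l_n, v_tn, alpha):
    """Fraction p of I_SS/2 taken by the low-threshold centre transistor."""
    k = mu_n_cox * w_n / (2.0 * l_n)
    delta = v_tn * (alpha - 1.0) / alpha
    k_d2 = k * delta ** 2
    if k_d2 >= i_ss / 2.0:
        # Assumption beyond the text: here the square-law model predicts
        # that the outer transistor is cut off, so the centre takes all.
        return 1.0
    return 0.5 + math.sqrt(k_d2) * math.sqrt(i_ss - k_d2) / i_ss


def square_law_currents(i_ss, mu_n_cox, w_n, l_n, v_tn, alpha):
    """Solve (3.2)-(3.4) directly; valid while both transistors conduct."""
    k = mu_n_cox * w_n / (2.0 * l_n)
    delta = v_tn * (alpha - 1.0) / alpha
    # Overdrive x = V_GS - V_TN from K*(x + delta)^2 + K*x^2 = I_SS/2
    x = (-delta + math.sqrt(i_ss / k - delta ** 2)) / 2.0
    i_c = k * (x + delta) ** 2  # (3.2)
    i_d = k * x ** 2            # (3.3)
    return i_c, i_d


if __name__ == "__main__":
    # Assumed device values, for illustration only.
    args = (100e-6, 300e-6, 0.2e-6, 0.1e-6, 0.4, 1.3)
    p = split_factor(*args)
    i_c, i_d = square_law_currents(*args)
    print(f"p = {p:.4f}, I_C = {i_c * 1e6:.2f} uA, I_D = {i_d * 1e6:.2f} uA")
```

For $\alpha = 1$ the two transistors are identical and p reduces to 1/2, as expected from symmetry.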

Based on the derived current expressions, the output voltage for each input combination of the proposed architecture-1 based XOR2 gate (Fig. 3.1) is obtained and then used to derive $V_{SWING}$, defined as the difference between the high output voltage ($V_{OH}$) and the low output voltage ($V_{OL}$).

Case 1. Both inputs (A and B) are at high logic level: In this condition, the transistors Md1, Md3 and Mc2 are ON and transistors Mc1, Md2, Md4 are OFF. The current through the transistors Md1, Md3 and Mc2 is written as:

$$I_{D1} = \frac{I_{SS}}{2}; I_{C2} = \frac{I_{SS}}{2} .p; I_{D3} = \frac{I_{SS}}{2} (1-p)$$
(3.9)

This input condition produces a low output voltage V<sub>OL</sub> computed as:

$$V_{OL} = V_{DD} - \frac{R_{P}I_{SS}}{2}$$
(3.10)

Case 2. Both inputs (A and B) are at low logic level: The transistors Md2, Mc1 and Md4 are ON and transistors Md1, Mc2, Md3 are OFF in this case. Therefore, the current through the transistors Md2, Mc1 and Md4 is found as:

$$I_{D2} = \frac{I_{SS}}{2} (1-p); I_{C1} = \frac{I_{SS}}{2} \cdot p; I_{D4} = \frac{I_{SS}}{2}$$
(3.11)

This input condition corresponds to low output voltage  $V_{\text{OL}}$  given as:

$$V_{OL} = V_{DD} - \frac{R_P I_{SS}}{2}$$
(3.12)

Case 3. Input A is at high and input B is at low logic level: Under this condition, the transistors Md1, Mc1, Md3 are ON and transistors Md2, Mc2, Md4 are OFF. The current through the ON transistors Md1, Mc1, Md3 is computed as:

$$I_{D1} = \frac{I_{SS}}{2} (1-p); I_{C1} = \frac{I_{SS}}{2} \cdot p; I_{D3} = \frac{I_{SS}}{2}$$
(3.13)

Consequently, the expression for the high output voltage  $V_{OH}$  is evaluated as:

$$V_{OH} = V_{DD} - \frac{R_{P}I_{SS}(1-p)}{2}$$
(3.14)

Case 4. Input A is low and input B is high: Here, the transistors Md2, Mc2 and Md4 are ON and the transistors Md1, Mc1, Md3 are OFF. The currents through the ON transistors Md2, Mc2 and Md4 are expressed as:

$$I_{D2} = \frac{I_{SS}}{2}; I_{C2} = \frac{I_{SS}}{2}.p; I_{D4} = \frac{I_{SS}}{2} (1-p)$$
(3.15)

Consequently, the expression for the high output voltage  $V_{OH}$  is evaluated as:

$$V_{OH} = V_{DD} - \frac{R_{P}I_{SS}(1-p)}{2}$$
(3.16)

Using the above equations (3.9)-(3.16), the V<sub>SWING</sub> is expressed as:

$$V_{SWING} = V_{OH} - V_{OL} = \frac{pR_P I_{SS}}{2}$$
(3.17)
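The four cases can be summarised numerically as in the sketch below, which evaluates the output levels and confirms that their difference matches (3.17). The function name and the example values are assumptions for illustration.

```python
def xor2_output_levels(v_dd, r_p, i_ss, p):
    """Output levels of the proposed architecture-1 XOR2 gate.

    Returns (V_OH, V_OL, V_SWING) in volts.
    """
    v_ol = v_dd - r_p * i_ss / 2.0              # (3.10), (3.12)
    v_oh = v_dd - r_p * i_ss * (1.0 - p) / 2.0  # (3.14), (3.16)
    return v_oh, v_ol, v_oh - v_ol


if __name__ == "__main__":
    # Assumed values: 1.1 V supply, 10 kOhm load, 100 uA bias, p = 0.8.
    v_oh, v_ol, swing = xor2_output_levels(1.1, 10e3, 100e-6, 0.8)
    print(f"V_OH = {v_oh:.3f} V, V_OL = {v_ol:.3f} V, swing = {swing:.3f} V")
```

Because p is less than 1, $V_{OH}$ sits slightly below $V_{DD}$, and the swing is reduced by the same factor p relative to $R_P I_{SS}/2$.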

The $A_v$ and NM for the proposed architecture-1 based XOR2 gate are calculated as per [37] and are given in (3.18)-(3.19).

$$A_{v} = \frac{g_{mn}R_{p}/2}{1-g_{mn}R_{p}/2}$$
(3.18)

where g<sub>mn</sub> is the transconductance of the modified triple-tail cell, given by

$$g_{mn} = \sqrt{\mu_n C_{ox} \frac{W_N}{L_N} \frac{I_{ss}}{2}}$$

$$NM = \frac{V_{SWING}}{2} (1 - \frac{1}{A_v})$$
(3.19)

Now, the propagation delay of the proposed architecture-1 is derived on the basis of the behaviour of the proposed architecture-1 XOR2 gate. The propagation delay depends on the contribution of parasitic MOS capacitances at the output node and the load capacitance. The parasitic capacitance for the proposed XOR2 gate (Fig. 3.1) is calculated by considering input B as low, such that MTT-2 is activated and MTT-1 is deactivated. Now, for a low-to-high transition on input A, total capacitance at the output node is depicted in Fig. 3.3, consisting of the parasitic capacitances and the load capacitance  $C_L$ .



Fig. 3.3 Linear half circuit of proposed architecture-1

The propagation delay  $\tau_{PD}$  can be expressed as

$$\tau_{PD} = R_P C_{out} = R_{P}\left(C_{db,d1}+C_{gd,d1}+C_{db,r1}+C_{gd,r1}+C_{gd,d2}+\frac{1}{2}C_{gs,d2}+C_{db,d4}+C_{gd,d4}+C_{db,r4}+C_{gd,r4}+C_{L}\right)$$
(3.20)

where $R_P$ is the equivalent resistance of the PMOS load Mr1, and $C_{db}$, $C_{gd}$ and $C_{gs}$ are the parasitic capacitances discussed in section 2.2.1, contributed by the transistors in the PDN (Md1, Md4) and in the load section (Mr1, Mr4). The gate-to-source capacitance contribution of transistor Md2 is $\frac{1}{2}C_{gs}$ due to the Miller effect [62]. Using [81], the delay is related analytically to V<sub>SWING</sub> and I<sub>SS</sub> as:

$$\tau_{PD} = V_{SWING} \left( a V_{SWING}^{2} + b \frac{V_{SWING}}{I_{SS}^{2}} + c \frac{1}{I_{SS}} \right)$$
(3.21)

where

$$a = \frac{2}{p} \left( \frac{L_{Nmin}}{2\mu_{n} C_{ox} \left( \frac{g_{mn} R_{P}}{2} \right)^{2} \left( V_{GS} - \frac{V_{TN}}{\alpha} \right)^{4}} \right) \left( 2L_{dn} C_{j} K_{eqn} + 4C_{jsw} K_{eqsw} + 3C_{gd0} + L_{N} C_{ox} \right)$$

$$b = \frac{2}{p^{2}} \left( 3A_{bulkmax} C_{ox} \mu_{p} W_{Pmin} \left( V_{DD} - |V_{TP}| \right) \right)$$

$$c = \frac{2}{p} \left( 4L_{dn} C_{jsw} K_{eqsw} + 4L_{dp} C_{jsw} K_{eqsw} + 2W_{Pmin} \left( L_{dp} C_{j} K_{eqp} + 2C_{jswp} K_{eqswp} \right) + 4L_{dp} C_{jswp} K_{eqswp} + 2W_{Pmin} C_{gd0p} - \frac{3}{2} A_{bulkmax} C_{ox}^{2} \mu_{p} \left( V_{DD} - |V_{TP}| \right) R_{DSW} 10^{-6} \right)$$

and the symbols have their usual meanings.

For given values of NM and $A_v$ and a given bias current $I_{SS}$, the design approach for the proposed architecture-1 based XOR2 gate follows from the static model expressions. Firstly, the required values of $V_{SWING}$ and $R_P$ are calculated as:

$$V_{\rm SWING} = \frac{2NM}{1 - \frac{1}{A_{\rm v}}}$$
(3.22)

$$R_{\rm P} = \frac{2V_{\rm SWING}}{p.I_{\rm SS}}$$
(3.23)

The bias current for which $R_P$ equals $R_{Pmin}$, denoted $I_{HIGH}$, can thus be written as:

$$I_{\text{HIGH}} = \frac{2 \, V_{\text{SWING}}}{p \, R_{\text{Pmin}}} \tag{3.24}$$

where R<sub>Pmin</sub> represents the resistance of the minimum sized PMOS load transistor (Mr1-Mr4).

The calculated  $I_{HIGH}$  is compared with the required bias current  $I_{SS}$  value. For values of  $I_{SS}>I_{HIGH}$ ,  $R_P$  will be less than  $R_{Pmin}$  and to calculate its value,  $L_P$  is set to minimum  $L_{Pmin}$  and  $W_P$  is calculated using (3.23) and (2.11).

$$W_{P} = \frac{pI_{SS}}{2V_{SWING}} \frac{L_{Pmin}}{\mu_{p}C_{ox}(V_{DD} - |V_{TP}|) \left(1 - R_{DSW} 10^{-6} \mu_{p}C_{ox}(V_{DD} - |V_{TP}|)\right)}$$
(3.25)

Similarly, for values of  $I_{SS} < I_{HIGH}$ ,  $R_P$  will be greater than  $R_{Pmin}$  and to calculate its value,  $W_P$  is set to  $W_{Pmin}$  and  $L_P$  is calculated as per following expression, derived using (3.23) and (2.11) and mathematical simplification.

$$L_{P} = \mu_{p} C_{ox} W_{Pmin} \left( V_{DD} - |V_{TP}| \right) \left[ \frac{2V_{SWING}}{pI_{SS}} - \frac{R_{DSW} 10^{-6}}{W_{Pmin}} \right]$$
(3.26)

After this, the dimensions of the transistors in the PDN are derived by substituting $\frac{g_{mn}R_{P}}{2} = \sqrt{2\mu_{n}C_{ox}\frac{W_{N}}{L_{N}}\frac{1}{I_{SS}}} \cdot \frac{V_{SWING}}{p}$ in the expression for $A_{v}$ in (3.18) for $I_{SS} > I_{LOW}$. The width $W_{N}$ of the PDN transistors is calculated as:

$$W_{N} = p^{2} \left(\frac{A_{v}}{1 - A_{v}}\right)^{2} \frac{L_{Nmin} I_{SS}}{2\mu_{n} C_{ox} V_{SWING}^{2}}$$
(3.27)

where  $L_{Nmin}$  is the minimum length of the NMOS transistor, and all other variables are as previously defined. For the case having  $I_{SS} < I_{LOW}$ , the  $W_N$  for all the PDN transistors is kept at their minimum value,  $W_{Nmin}$ .
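The first steps of this design flow, (3.22)-(3.24) and the choice between the two load sizing branches, can be sketched as follows. The function name, the returned dictionary and the numerical values (in particular the assumed $R_{Pmin}$) are illustrative; the actual $W_P$, $L_P$ and $W_N$ values would still come from (3.25)-(3.27).

```python
def design_static(nm, a_v, i_ss, p, r_p_min):
    """Swing, load resistance and sizing branch from (3.22)-(3.24).

    nm      : required noise margin (V)
    a_v     : required small signal gain, must exceed 1
    i_ss    : chosen bias current I_SS (A)
    p       : current split factor from (3.7)
    r_p_min : resistance of the minimum sized PMOS load (ohm)
    """
    if a_v <= 1.0:
        raise ValueError("a_v must exceed 1 for a positive swing in (3.22)")
    v_swing = 2.0 * nm / (1.0 - 1.0 / a_v)  # (3.22)
    r_p = 2.0 * v_swing / (p * i_ss)        # (3.23)
    i_high = 2.0 * v_swing / (p * r_p_min)  # (3.24)
    if i_ss > i_high:
        branch = "set L_P = L_Pmin and compute W_P from (3.25)"
    else:
        branch = "set W_P = W_Pmin and compute L_P from (3.26)"
    return {"v_swing": v_swing, "r_p": r_p, "i_high": i_high, "branch": branch}


if __name__ == "__main__":
    # Assumed targets: NM = 0.19 V, A_v = 20, I_SS = 100 uA, p = 0.8,
    # R_Pmin = 5 kOhm.
    print(design_static(0.19, 20.0, 100e-6, 0.8, 5e3))
```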

### 3.2.2 Simulations

In this section, the behaviour of the proposed architecture-1 based XOR2 gate is verified through simulations and the results are shown in Fig. 3.4 by considering power supply and voltage swing of 1.1 V and 0.4 V respectively. It can be observed that for the cases when input B is at low logic level, the output is same as input A while it is complement of input A otherwise. Thus, the waveforms confirm the correct behaviour.



Fig. 3.4 Simulation waveform of proposed architecture-1 based XOR2 gate

Next, the static and delay models set forth in section 3.2.1 are verified through simulations. This is followed by a study of the behaviour of the proposed architecture-1 XOR2 gate under process variations. Finally, a full adder is presented as an application of the proposed architecture-1. The proposed XOR2 gate is designed using equations (3.17)-(3.19) with $V_{DD}$ = 1.1 V and $\alpha$ = 1.3, for a wide range of I<sub>SS</sub> (10 µA to 200 µA) and for V<sub>SWING</sub> of 0.4 V and 0.5 V.

The simulated variation of $A_V$, NM and $V_{SWING}$ with bias current is shown in Figs. 3.5a and 3.5b for $V_{SWING}$ of 0.4 V and 0.5 V respectively. The values of $V_{SWING}$, $A_V$ and NM predicted using (3.17)-(3.19) are plotted along with the corresponding simulated values. The percentage error in the static parameters is also plotted in Fig. 3.5, and a maximum error of 17.4% is observed.



Fig. 3.5 Predicted and Simulated results with error versus  $I_{SS}$  for static parameters with  $V_{SWING}$  of a) 0.4V and b) 0.5V

The derived delay expression in (3.21) is validated by designing and simulating the gate with $V_{DD}$ = 1.1 V, $A_v$ = 19 and $\alpha$ = 1.3. The delay is measured for I<sub>SS</sub> ranging from 10 µA to 200 µA with V<sub>SWING</sub> of 0.4 V and 0.5 V. The simulated delay and the delay predicted using (3.21) for load capacitances of 50 fF, 500 fF and 1 pF are plotted in Fig. 3.6. It is observed that the propagation delay increases with increasing load capacitance. For a given load capacitance, the delay decreases with increasing I<sub>SS</sub>, due to the availability of higher current for charging/discharging the load capacitance. Further, a maximum error of 27% between the predicted and simulated values can be observed from the error plots in Fig. 3.6.



Fig. 3.6 Predicted and Simulated results with error in delay versus I<sub>SS</sub> for a) $V_{SWING} = 0.4V$ and $C_L = 50$ fF b) $V_{SWING} = 0.4V$ and $C_L = 500$ fF c) $V_{SWING} = 0.4V$ and $C_L = 1$ pF d) $V_{SWING} = 0.5V$ and $C_L = 50$ fF e) $V_{SWING} = 0.5V$ and $C_L = 500$ fF f) $V_{SWING} = 0.5V$ and $C_L = 1$ pF

To showcase the advantage of the proposed architecture-1 based XOR2 gate, it is compared with the existing PFSCL FC based XOR2 gate. For this, an optimum value of  $\alpha$ , the threshold voltage reduction factor, is determined. The proposed architecture-1 based XOR2 gate is simulated for  $\alpha$  ranging from 1.1 to 1.9, for a bias current I<sub>SS</sub> of 100µA. For the particular case when input A and input B are high and MTT-2 is deactivated, the ratio of currents through Mc2 and Md3 i.e. I<sub>C2</sub>/I<sub>D3</sub> was measured. The area and current ratio against  $\alpha$  is plotted in Fig. 3.7. It is observed that the area reduces with the lowering threshold voltage of centre transistor. An optimum value of  $\alpha$ =1.7 (I<sub>C2</sub>/I<sub>D3</sub>=6) is chosen as it provides good activation/deactivation.



Fig. 3.7 Area vs  $I_{C2}/I_{D3}$  vs  $\alpha$  for proposed architecture-1 based XOR2 gate

To compare the performance of the proposed architecture-1 with its existing counterpart, simulations are performed for measuring the propagation delay of XOR2 gate by keeping same current ratio ( $I_{C2}/I_{D3} = 6$ ) in both the implementations. The delay of XOR2 is observed to be 405 ps in proposed architecture-1 based gate while it is 407 ps in existing counterpart. The corresponding area for the proposed architecture-1 based XOR2 gate is 1.048µm<sup>2</sup> while for the existing PFSCL FC based XOR2 counterpart is 1.656 µm<sup>2</sup>. Thus, the proposed architecture-1 based XOR2 gate shows an area advantage of 36.7% with no negative impact on the propagation delay.

Further, the effect of parameter variations on voltage swing and delay of proposed architecture-1 based XOR2 gate and existing PFSCL FC based XOR2 gate is studied by performing Monte Carlo analysis for 500 simulation runs. The corresponding variation in voltage swing and delay for both the gates are plotted in Figs. 3.8a and 3.8b respectively. It is seen that voltage swing variations for the proposed and existing PFSCL FC based XOR2 gate are 37.8% and 38.6% respectively and are of the same order. However, the delay shows lesser variation for the proposed architecture-1 based XOR2 gate (29.2%) as compared to the existing PFSCL FC based XOR2 gate (34.7%).



Fig. 3.8 Monte Carlo results for  $V_{SWING}$  and Delay for a) proposed and b) existing PFSCL FC based XOR2 gate

The process corner analysis for both the proposed architecture-1 based XOR2 gate and the existing PFSCL FC based XOR2 gate is plotted in Fig. 3.9. Fig. 3.9a shows the delay at different process corners, while Fig. 3.9b shows the variation in the voltage swing. It can be observed that the SS process corner gives the highest delay while the FF process corner leads to the lowest delay. With respect to voltage swing, the ratio between the highest and lowest voltage swing is the same for the proposed architecture-1 and the existing PFSCL FC. The proposed architecture-1 functions correctly under the various process corners.



Fig. 3.9 Impact of process corners on a) Delay and b) V<sub>SWING</sub>

As an application, a full adder was designed and simulated using both the existing architecture and the proposed architecture-1 for $I_{SS}$ = 100 $\mu$A and $V_{SWING}$ = 0.4 V, with the same $I_{Ci}/I_{Dj}$ = 6. The gate level schematics for the sum and carry logic are shown in Figs. 3.10a and 3.10b. The sum logic in Fig. 3.10a consists of two cascaded XOR2 gates, each realised as the proposed architecture-1 XOR2 gate of Fig. 3.1. The proposed architecture-1 based realisations of the two-input AND (AND2) and OR (OR2) gates are drawn in Figs. 3.10c and 3.10d respectively. The simulation waveforms for the full adder using proposed architecture-1 are shown in Fig. 3.11 and the performance summary is tabulated in Table 3.1. It is seen that the proposed architecture-1 based design provides an area advantage of 66% while maintaining the same power and delay performance.





Fig. 3.10 Proposed architecture-1 based a) Sum and b) Carry c) AND2 gate and d) OR2 gate



Fig. 3.11 Simulation waveforms for Proposed architecture-1 full adder

| Function | Scheme   | Delay (ns) | Power (µW) | PDP (fJ) | Area (µm <sup>2</sup> ) |
|----------|----------|------------|------------|----------|-------------------------|
| Sum      | Proposed | 0.871      | 220        | 191.6    | 2.088                   |
|          | Existing | 0.870      | 220        | 191.4    | 6.148                   |
| Carry    | Proposed | 2.07       | 550        | 1138.5   | 5.22                    |
|          | Existing | 2.147      | 550        | 1180.85  | 15.37                   |

Table 3.1 Performance summary of the full adder
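The derived columns of Table 3.1 can be cross-checked with a few lines of arithmetic: the power-delay product is delay times power (ns times µW gives fJ), and the area advantage is the relative reduction with respect to the existing design. The sketch below uses only the tabulated numbers; the helper names are assumptions.

```python
# Rows of Table 3.1: (delay in ns, power in uW, PDP in fJ, area in um^2)
TABLE_3_1 = {
    ("Sum", "Proposed"):   (0.871, 220.0, 191.6, 2.088),
    ("Sum", "Existing"):   (0.870, 220.0, 191.4, 6.148),
    ("Carry", "Proposed"): (2.07, 550.0, 1138.5, 5.22),
    ("Carry", "Existing"): (2.147, 550.0, 1180.85, 15.37),
}


def pdp_fj(delay_ns, power_uw):
    """Power-delay product in fJ (1 ns * 1 uW = 1 fJ)."""
    return delay_ns * power_uw


def area_saving_percent(proposed, existing):
    """Relative area reduction of the proposed design, in percent."""
    return 100.0 * (existing - proposed) / existing


if __name__ == "__main__":
    for function in ("Sum", "Carry"):
        prop = TABLE_3_1[(function, "Proposed")]
        exist = TABLE_3_1[(function, "Existing")]
        print(function,
              f"PDP = {pdp_fj(prop[0], prop[1]):.2f} fJ,",
              f"area saving = {area_saving_percent(prop[3], exist[3]):.1f} %")
```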

## 3.3 Proposed architecture-2

In proposed architecture-2, a low threshold voltage NMOS is introduced for generating the constant current used to bias the PFSCL FC based gate [66]. The diagram of the existing PFSCL FC based XOR2 gate is shown in Fig.3.12.



Fig. 3.12 Existing PFSCL FC based XOR2 gate

As was discussed in (2.16), the minimum power supply ($V_{DDmin}$) for any gate based on the PFSCL style is given by $2V_B-V_{Tcurr}$, where $V_B$ and $V_{Tcurr}$ represent the bias voltage and the threshold voltage of the current source respectively. Therefore, one way to reduce the power dissipation is to reduce $V_{Tcurr}$, as this directly reduces the required $V_B$ and thus the power supply.

In the proposed architecture-2, low threshold voltage transistors Ms1-Ms2 are introduced in the existing PFSCL FC architecture as shown in Fig. 3.13. This implies that for a given bias current, the required  $V_B$  also reduces, leading to corresponding reduction in the power supply.



Fig. 3.13 Proposed architecture-2 based XOR2 gate

The working of the proposed architecture-2 based XOR2 gate of Fig. 3.13 is the same as that of the PFSCL FC based architectures discussed in Chapter 2, since the structure of the gate is unchanged. As before, the ratio of the dimensions of the central branch to the side branch is kept as N, with N generally taken in the range 5-20, for proper activation and deactivation of the constituent triple-tail cells. For completeness, the operation is summarised again here. The proposed architecture-2 consists of two triple-tail cells, TT-1 (Md1-Mc1-Md2) and TT-2 (Md3-Mc2-Md4), each biased by a constant current of $I_{SS}/2$, which here is generated by the low threshold voltage transistors Ms1 and Ms2. The central branch transistors Mc1 and Mc2 are connected directly to the power supply, so a cell whose central branch is ON does not contribute to the output; in other words, that cell is deactivated. Conversely, when Mc1 (Mc2) is OFF, TT-1 (TT-2) is active and contributes to the output according to input A. For example, when input B is high, Fig. 3.13 shows that TT-1 is active and the output Q is low or high depending on whether input A is high or low, which is the expected XOR2 output. Similarly, when input B is low, TT-2 is active and the output Q is high or low depending on whether input A is high or low, again as expected for an XOR2 gate.

### 3.3.1 Analysis

The behaviour of the proposed architecture-2 XOR2 gate is now analysed in terms of the static parameters, namely the voltage swing $V_{SWING}$, the small-signal voltage gain $A_V$ and the noise margin NM, and in terms of the delay. From Fig. 3.13, it is observed that the PDN remains the same as in the existing PFSCL FC architecture. Thus, $V_{SWING}$, $A_V$ and NM can be expressed as in (2.20)-(2.22).

The propagation delay  $\tau_{PD}$  can be expressed as in (3.20), which is reproduced below.

$$\tau_{\rm PD} = R_{\rm P} C_{\rm out} \tag{3.28}$$

Here, the overall capacitance $C_{out}$ at the output node remains the same as in (3.20), since it is determined by the capacitive effects of the transistors and the external load capacitance $C_L$, and the structure of the PDN is unchanged. However, $R_P$ in (3.28) is different for proposed architecture-2. As a first-order approximation, the PMOS equivalent resistance $R_P$ can be expressed as in (3.29), which shows its dependence on the power supply voltage and on the dimensions of the transistor.

$$R_{\rm P} = \frac{L_{\rm P}}{\mu_{\rm p} C_{\rm ox} (V_{\rm DD} - |V_{\rm TP}|) W_{\rm P}} \tag{3.29}$$

Thus, from (3.29), because the resistance of the PMOS load depends on the power supply voltage, the dimensions of the PMOS load in the proposed architecture-2 based XOR2 gate differ from those in the conventional PFSCL and the existing PFSCL FC based gates for a given voltage swing and bias current. For a gate operated at the minimum power supply voltage, $V_{DDmin}$ is reduced as per (2.16), so the overdrive term $(V_{DD} - |V_{TP}|)$ is smaller and $W_P$ has to be increased to keep the same $R_P$, and hence the same voltage swing, as in the conventional PFSCL and the existing PFSCL FC gates.

Since the delay is proportional to the parasitic capacitance at the output node, as in (3.28), the increase in $W_P$ for the proposed architecture-2 XOR2 gate corresponds to a slight increase in the delay compared to the existing PFSCL FC based gates. However, the lower supply voltage reduces the power consumption and, correspondingly, the PDP, so the proposed architecture-2 gate is more power efficient than the conventional PFSCL and the existing PFSCL FC gates.
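
The sizing argument can be made explicit with a first-order sketch based on the expression for $R_P$ given above. Assuming that $L_P$, $\mu_p C_{ox}$ and the required $R_P$ are held fixed (these are simplifying assumptions for illustration), lowering the supply from $V_{DD}$ to $V'_{DD}$ scales the required load width from $W_P$ to $W'_P$ as

$$\frac{W'_P}{W_P} = \frac{V_{DD} - |V_{TP}|}{V'_{DD} - |V_{TP}|} > 1 \quad \text{for } V'_{DD} < V_{DD}$$

The wider load transistor adds drain parasitics at the output node, which is the origin of the slight delay increase.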

### 3.3.2 Simulations

The proposed architecture-2 XOR2 gate was simulated in SPICE. The threshold voltage of the current source NMOS was set to 0.3 V, while the threshold voltages of the other NMOS transistors were set to their typical value for the technology node. The XOR2 gate shown in Fig. 3.13 was designed for a voltage swing of 400 mV.

For comparison, the conventional PFSCL and the existing PFSCL FC based XOR2 gates were also simulated under the same conditions. Each gate was simulated for bias currents ranging from 20 $\mu$A to 100 $\mu$A at the minimum required power supply voltage according to (2.16), which is given in Table 3.2. The $V_{DD}$ for the proposed architecture-2 based XOR2 gate comes out to be 0.9 V as per (2.16).

| S.No. | Architecture                    | $2V_B - V_{Tcurr}$                       | $V_{DDmin}$ |
|-------|---------------------------------|------------------------------------------|-------------|
| 1.    | Proposed architecture-2         | $2 \times 0.6\,\text{V} - 0.3\,\text{V}$ | 0.9 V       |
| 2.    | Existing PFSCL FC               | $2 \times 0.8\,\text{V} - 0.5\,\text{V}$ | 1.1 V       |
| 3.    | PFSCL conventional architecture | $2 \times 0.8\,\text{V} - 0.5\,\text{V}$ | 1.1 V       |

Table 3.2 Power Supply voltage V<sub>DDmin</sub>
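
The entries of Table 3.2 follow directly from (2.16). The short sketch below reproduces them; the function and dictionary names are illustrative only and do not come from the thesis.

```python
# Minimum supply voltage per (2.16): V_DDmin = 2*V_B - V_Tcurr.
# The (V_B, V_Tcurr) pairs are the values listed in Table 3.2.

def vdd_min(v_b: float, v_t_curr: float) -> float:
    """Return the minimum supply voltage in volts."""
    return 2.0 * v_b - v_t_curr

BIAS_POINTS = {
    "Proposed architecture-2": (0.6, 0.3),
    "Existing PFSCL FC": (0.8, 0.5),
    "Conventional PFSCL": (0.8, 0.5),
}

V_DD_MIN = {name: vdd_min(v_b, v_t) for name, (v_b, v_t) in BIAS_POINTS.items()}

for name, value in V_DD_MIN.items():
    print(f"{name}: V_DDmin = {value:.2f} V")
```

Lowering both $V_B$ and $V_{Tcurr}$ by 0.2 V lowers $V_{DDmin}$ by 0.2 V, since $V_B$ enters the expression twice and $V_{Tcurr}$ once with opposite sign.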

The propagation delay, the power dissipation and the power delay product were measured for the conventional PFSCL (given in Fig. 2.5c), existing PFSCL FC and proposed architecture-2 XOR2 gates. The simulation results for delay are shown in Fig. 3.14, where the notation PFSCL FC refers to the existing PFSCL FC. From Fig. 3.14, it is seen that the conventional PFSCL gate shows a much higher delay than both the existing PFSCL FC gate and the proposed gate. It is also noted that the delay of the proposed architecture-2 XOR2 gate is slightly higher than the delay of the PFSCL FC XOR2 gate, in line with the analysis in section 3.3.1.



Fig. 3.14 Propagation delay versus bias current for XOR2 gate

With respect to power dissipation, it is observed from Fig. 3.15 that the power dissipation of the proposed architecture-2 XOR2 gate is lower than that of the existing PFSCL FC based gate, shown as PFSCL FC in the figure, due to the reduction in the minimum required supply voltage. The same factor also leads to an improvement in the PDP for the proposed architecture-2 XOR2 gate, which is shown in Fig. 3.16.



Fig. 3.15 Power consumption versus bias current for XOR2 gate



Fig. 3.16 PDP versus bias current for XOR2 gate

The proposed architecture-2 based XOR2 gate shows a maximum improvement in the power consumption and the PDP of 18.18% and 8.05% respectively with respect to the existing PFSCL FC based XOR2 gate. It also shows a maximum improvement in the power dissipation and the PDP of 72% and 95% respectively with respect to the conventional PFSCL based XOR2 gate.
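
The 18.18% figure agrees with a simple first-order estimate. Assuming (as an illustrative simplification) that the static power of each gate is $P = V_{DD} I_{SS}$ and that both gates are compared at the same bias current, the supply voltages of Table 3.2 give

$$\frac{P_{FC} - P_{prop}}{P_{FC}} = \frac{1.1 - 0.9}{1.1} \approx 18.18\%$$

where $P_{FC}$ and $P_{prop}$ are shorthand introduced here for the power of the existing PFSCL FC gate and of the proposed architecture-2 gate.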

To study the behaviour of the architecture statistically under process variations, a Monte Carlo simulation of 500 runs was carried out with respect to delay and voltage swing. The results of the Monte Carlo simulation on delay and voltage swing are shown in Fig. 3.17.



Fig. 3.17 Monte Carlo analysis on Delay and Voltage swing of XOR2 gate based on a) proposed architecture-2 b) PFSCL FC c) conventional PFSCL


The results in Fig. 3.17 bring out the fact that the response of proposed architecture-2 is similar to that of the existing PFSCL FC under process variations. This implies that the proposed architecture-2, which leads to a reduction in the power supply, does not otherwise impact the working of the gate.

To check the impact of process corners on the behaviour of the proposed architecture-2 XOR2 gate, the variation in voltage swing and delay were measured for the different process corners and the results are plotted in Fig. 3.18. The corresponding impact on XOR2 gate based on existing PFSCL FC and conventional PFSCL have also been plotted.




Fig. 3.18 Effect of process corners on XOR2 gate a) Delay b) Voltage Swing

From Fig. 3.18a, we see that the impact of the process corners on delay is as expected, with the FF case showing the least delay. From Fig. 3.18b, we observe that the voltage swing shows the maximum variation for the SF and FS cases, which correspond to the case where the driving power of the NMOS is reduced along with a reduction in the PMOS equivalent resistance, and vice versa. However, the architectures function correctly at all process corners.

## 3.4 Summary

In this chapter, the use of a low threshold voltage transistor in the existing PFSCL FC based gate is explored. It is introduced in the central branch to reduce the footprint, and in the current source to lower the minimum power supply. In proposed architecture-1, a low threshold voltage transistor is used in the central branch and its behaviour is analysed on the basis of the static model and the delay model. Through simulations, it is observed that the proposed architecture-1 based full adder provides an area advantage of 66% while maintaining the same power and delay performance. In proposed architecture-2, a low threshold voltage transistor is introduced in the constant current source of the existing PFSCL FC architecture in order to reduce the power dissipation. This premise is supported by the simulation results, where the proposed architecture-2 based XOR2 gate is compared to the existing PFSCL FC based XOR2 gate and the conventional PFSCL XOR2 gate. From the simulation results, it is clear that the use of proposed architecture-2 leads to a power saving of 18.18% and a PDP improvement of about 8% with respect to the PFSCL FC based XOR2 gate. The reduction in power dissipation is a direct consequence of the reduction in the minimum power supply voltage made possible by the low threshold voltage transistor that generates the bias current. For both proposed architecture-1 and proposed architecture-2, the effect of process variations is also studied through Monte Carlo analysis and process corner analysis, and the simulation results show that both the proposed architectures function correctly.

# Chapter 4 Modified PFSCL architecture for higher fan-in

## 4.1 Introduction

The conventional PFSCL gate is a NOR/OR based generic gate, which implies a multistage implementation for sum-of-minterms expressions that require an AND-OR implementation, and this results in higher delay and power consumption. An alternative way to realise AND-OR functions is the fundamental cell based approach. It achieves an overall improvement in performance; however, it allows only two-input logic functions to be realised in a single stage. This chapter focuses on increasing the fan-in of a PFSCL-style gate. A new architecture capable of implementing AND-OR functions with higher fan-in is presented, which extends the fundamental cell so that it can accept a higher number of inputs.

The existing PFSCL fundamental cell architecture (PFSCL FC) is modified to accept three inputs by proposing a Quadtail cell. This would lead to a corresponding reduction in number of stages needed to implement any logic function. The concept, its analysis and simulations are presented further.

## 4.2 Proposed Architecture-3

This architecture is an extension of the PFSCL FC [66], which consists of two triple-tail cells. Essentially, the triple-tail cell is modified by adding a central branch to increase the number of logic inputs that can be handled by a single cell. The basic structure of the proposed architecture, called the Quadtail cell, is shown in Fig. 4.1.



Fig. 4.1 Quadtail cell- Basic structure of the proposed architecture-3

The Quadtail cell has two central branches and two side branches. The two central branches are connected directly to the power supply, and the two side branches are connected to PMOS based resistances that convert the bias current $I_{BIAS}$ flowing through a branch into the equivalent output voltage. Since these gates are single ended, the output is taken from either the drain of Md1 or Mf. The cell is inactive if either or both of the inputs B and C are high, since the majority of the bias current then flows through the central branch(es) irrespective of whether A is at the logic high or low level. To ensure this, the dimensions of Mc1-Mc2 are kept at N times the dimensions of Md1-Md2. The cell is active only when both B and C are at the logic low level, in which case the bias current flows through either Md1 or Mf depending on whether A is high or low.

It is worth mentioning here that the Quadtail cell of Fig. 4.1 gives the appropriate output only if both the inputs of the central branches are at logic low. Therefore, three more such cells, one for each remaining combination of B and C, need to be connected in a similar way as discussed in Chapter 2.

To illustrate the point further, a Quadtail based three-input XOR (XOR3) gate implementation is given in Fig. 4.2a. The Quadtail cells QTT-1, QTT-2, QTT-3 and QTT-4 respectively implement the minterms $ABC$, $\overline{A}\,\overline{B}C$, $\overline{A}B\overline{C}$ and $A\overline{B}\,\overline{C}$. Thus, the proposed architecture-3 based XOR3 gate can be expressed in terms of the minterms generated by each of the four QTTs as follows.

$$XOR3 = ABC + \overline{A}\,\overline{B}C + \overline{A}B\overline{C} + A\overline{B}\,\overline{C} \tag{4.1}$$
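
The way the four cells partition the input space can be illustrated with a small logic-level sketch. It is not a circuit simulation: the cell table below is a hypothetical encoding in which B and C select the single active cell and A sets the output of that cell, following the active-cell assignment used in the analysis of the XOR3 gate.

```python
from itertools import product

# Hypothetical logic-level description of each Quadtail cell:
# "b", "c"      -> levels of B and C for which the cell is active
# "a_for_high"  -> level of A for which the active cell drives the output high
CELLS = {
    "QTT-1": {"b": 1, "c": 1, "a_for_high": 1},  # minterm A.B.C
    "QTT-2": {"b": 0, "c": 1, "a_for_high": 0},  # minterm A'.B'.C
    "QTT-3": {"b": 1, "c": 0, "a_for_high": 0},  # minterm A'.B.C'
    "QTT-4": {"b": 0, "c": 0, "a_for_high": 1},  # minterm A.B'.C'
}

def active_cell(b: int, c: int) -> str:
    """Return the name of the only cell whose central branches are both OFF."""
    matches = [name for name, cell in CELLS.items()
               if (cell["b"], cell["c"]) == (b, c)]
    assert len(matches) == 1  # exactly one cell is active for any (B, C)
    return matches[0]

def xor3_quadtail(a: int, b: int, c: int) -> int:
    """Output of the gate: high when A matches the active cell's minterm."""
    cell = CELLS[active_cell(b, c)]
    return int(a == cell["a_for_high"])

# The sum of the four minterms equals the three-input XOR for all inputs.
for a, b, c in product((0, 1), repeat=3):
    assert xor3_quadtail(a, b, c) == a ^ b ^ c
```

Because exactly one cell is active for every combination of B and C, the four minterms never overlap and a single gate covers all eight input combinations.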

Considering the total current drawn by an XOR3 gate as $I_{SS}$, each Quadtail cell is biased by a current $I_{BIAS}$ equal to $I_{SS}/4$. For comparison, the XOR3 gates based on conventional PFSCL and existing PFSCL FC are shown in Fig. 4.2b and Fig. 4.2c respectively. While the proposed architecture-3 based XOR3 is a single-gate implementation, the conventional PFSCL XOR3 is a four-stage implementation with seven gates and the existing PFSCL FC based XOR3 is a two-stage implementation with two gates. The single-gate implementation of the proposed architecture-3 therefore reduces both the gate count and the number of cascaded stages.










Fig. 4.2 XOR3 based on a) proposed architecture-3 b) conventional PFSCL c) existing PFSCL FC

The schematic of a generic gate based on proposed architecture-3 is shown in Fig. 4.3. The PDN of the generic gate has four Quadtail cells, QTT-1 to QTT-4, with the generic inputs X1, X2, X3, X4 and the central branch inputs M1, M2. Here, the output voltage is generated by combining one of the two output nodes of each Quadtail cell (QTT-1: either Q1 or Q2; QTT-2: either Q3 or Q4; and so on). The generic gate can generate any three-input logic function by an appropriate choice of inputs. Table 4.1 lists the input combinations for the sum and carry outputs of a full adder and for a 4:1 MUX (MUX4).



Fig. 4.3 Proposed architecture-3 generic gate

Table 4.1 Realisation of different 3 input logic functions based on proposed architecture-3

| Logic function | Actual inputs        | X1  | X2 | X3 | X4  | M1 | M2 | Output nodes   |
|----------------|----------------------|-----|----|----|-----|----|----|----------------|
| Carry          | A, B, C              | '0' | Ā  | Ā  | '1' | B  | C  | Q1, Q3, Q5, Q7 |
| MUX4           | I0, I1, I2, I3, B, C | I0  | I1 | I2 | I3  | B  | C  | Q1, Q3, Q5, Q7 |
| SUM            | A, B, C              | Ā   | A  | A  | Ā   | B  | C  | Q1, Q3, Q5, Q7 |

### 4.2.1 Analysis

To analyze the proposed architecture-3, the static model and the delay model are derived on the basis of the working of the proposed architecture-3 based XOR3 gate and are discussed in detail further.

The static behaviour is modelled in terms of three parameters, namely the voltage swing $V_{SWING}$, the small-signal voltage gain $A_V$ and the noise margin NM. Based on the working of the proposed architecture-3 XOR3 gate given in Fig. 4.2, the currents in the QTTs of the gate are derived first, and the static parameters $V_{SWING}$, $A_V$ and NM are then derived from them.

Based on the working of the Quadtail cell described previously, only one of the four Quadtail cells QTT-1 to QTT-4 is active at any time, while the others draw most of their bias current of $I_{SS}/4$ directly from the power supply. In an active cell, both central branches are OFF and the bias current of $I_{SS}/4$ flows through one of the side branches, as per the logic input. In an inactive cell, one of the side branches and one or both of the central branches are ON. Using the design assumption that the central branch transistors are N times as wide as the side branch transistors, the currents flowing through the branches are derived below for the different scenarios.

Case 1: Both central branches of a Quadtail cell are ON, i.e. both B and C are logic high. Both central branches of QTT-4 are then ON along with one of the side branches, depending upon the logic level of input A. For A at logic high, we may write

$$I_{Mc7} + I_{Mc8} + I_{Md8} = \frac{I_{SS}}{4}; \quad I_{Md7} = 0 \tag{4.2}$$

As the central branch transistors are N times wider than the side branch transistors, each carries N times the current of the conducting side branch transistor, so

$$I_{Mc7} = I_{Mc8} = NI_{Md8} \tag{4.3}$$

Substituting (4.3) into (4.2) gives $(2N+1)I_{Md8} = I_{SS}/4$, so the currents through the central and side branches are obtained as

$$I_{Mc7} = I_{Mc8} = \frac{NI_{SS}}{4(2N+1)}; \quad I_{Md8} = \frac{I_{SS}}{4(2N+1)}; \quad I_{Md7} = 0 \tag{4.4a}$$

If input A assumes logic low, then (4.4a) may be rewritten as

$$I_{Mc7} = I_{Mc8} = \frac{NI_{SS}}{4(2N+1)}; \quad I_{Md7} = \frac{I_{SS}}{4(2N+1)}; \quad I_{Md8} = 0 \tag{4.4b}$$

Case 2: Only one of the central branches of a Quadtail cell is ON, i.e. either B or C (but not both) is logic high.

Specifically, let us take the case where B is logic high and C is logic low. Then for Quadtail cell QTT-4, for A at logic high, we may write

$$I_{Mc7} + I_{Md8} = \frac{I_{SS}}{4}; \quad I_{Md7} = 0; \quad I_{Mc8} = 0 \tag{4.5}$$

As the central branch transistors are N times wider than the side branch transistors, the conducting central branch carries N times the current of the conducting side branch transistor, so

$$I_{Mc7} = NI_{Md8} \tag{4.6}$$

Substituting (4.6) into (4.5) gives $(N+1)I_{Md8} = I_{SS}/4$, so the currents through the central and side branches are obtained as

$$I_{Mc7} = \frac{NI_{SS}}{4(N+1)}; \quad I_{Md8} = \frac{I_{SS}}{4(N+1)}; \quad I_{Md7} = 0; \quad I_{Mc8} = 0 \tag{4.7a}$$

If input A assumes logic low, then (4.7a) may be rewritten as

$$I_{Mc7} = \frac{NI_{SS}}{4(N+1)}; \quad I_{Md7} = \frac{I_{SS}}{4(N+1)}; \quad I_{Md8} = 0; \quad I_{Mc8} = 0 \tag{4.7b}$$

Case 3: Both central branches of a Quadtail cell are OFF. Specifically, let us take the case where B and C are logic low. Then for Quadtail cell QTT-4, for A at logic high, we may write

$$I_{Md7} + I_{Mc7} + I_{Mc8} + I_{Md8} = \frac{I_{SS}}{4}; \quad I_{Mc7} = I_{Mc8} = I_{Md7} = 0 \tag{4.8a}$$

$$I_{Md8} = \frac{I_{SS}}{4} \tag{4.8b}$$

If input A assumes logic low, then (4.8a) and (4.8b) may respectively be rewritten as

$$I_{Md7} + I_{Mc7} + I_{Mc8} + I_{Md8} = \frac{I_{SS}}{4}; \quad I_{Mc7} = I_{Mc8} = I_{Md8} = 0 \tag{4.9a}$$

$$I_{Md7} = \frac{I_{SS}}{4} \tag{4.9b}$$

The following Table 4.2 shows the active and inactive cells and current through the different branches using (4.2)-(4.9).

| S.No. | A | B | C | Inactive  | Active | Output* | Current through branches connected to output node XOR3<sup>+</sup> |
|-------|---|---|---|-----------|--------|---------|---------------------------------------------------------------------|
| 1.    | L | L | L | QTT-1,2,3 | QTT-4  | L       | $I_{d1}=I_a$, $I_{d3}=I_{d5}=0$, $I_{d7}=I_b$                       |
| 2.    | L | L | H | QTT-1,3,4 | QTT-2  | H       | $I_{d1}=I_c$, $I_{d3}=I_{d5}=0$, $I_{d7}=I_c$                       |
| 3.    | L | H | L | QTT-1,2,4 | QTT-3  | H       | $I_{d1}=I_c$, $I_{d3}=I_{d5}=0$, $I_{d7}=I_c$                       |
| 4.    | L | H | H | QTT-2,3,4 | QTT-1  | L       | $I_{d1}=I_b$, $I_{d3}=I_{d5}=0$, $I_{d7}=I_a$                       |
| 5.    | H | L | L | QTT-1,2,3 | QTT-4  | H       | $I_{d1}=0$, $I_{d3}=I_{d5}=I_c$, $I_{d7}=0$                         |
| 6.    | H | L | H | QTT-1,3,4 | QTT-2  | L       | $I_{d1}=0$, $I_{d3}=I_b$, $I_{d5}=I_a$, $I_{d7}=0$                  |
| 7.    | H | H | L | QTT-1,2,4 | QTT-3  | L       | $I_{d1}=0$, $I_{d3}=I_a$, $I_{d5}=I_b$, $I_{d7}=0$                  |
| 8.    | H | H | H | QTT-2,3,4 | QTT-1  | H       | $I_{d1}=0$, $I_{d3}=I_{d5}=I_c$, $I_{d7}=0$                         |

Table 4.2 Working of proposed architecture-3 based XOR3 gate

\* L = Low; H = High

<sup>+</sup> $I_a = \frac{I_{SS}}{4(2N+1)}$; $I_b = \frac{I_{SS}}{4}$; $I_c = \frac{I_{SS}}{4(N+1)}$, with $N = \frac{\text{Width of central branch transistor}}{\text{Width of side branch transistor}}$

From the Table 4.2, the output high voltage  $V_{OH}$ , for the case where the logic inputs A, B and C are low, low and high respectively, is given by (4.10). The same expression is obtained for other combinations of the input that lead to output logic high level.

$$V_{OH} = V_{DD} - R_P \left[ \frac{I_{SS}}{4(N+1)} + \frac{I_{SS}}{4(N+1)} \right] = V_{DD} - R_P \frac{I_{SS}}{2(N+1)}$$
(4.10)

Further, the output low voltage  $V_{OL}$ , for the case where the logic inputs A, B and C are Low, High and High respectively, is given by (4.11). The  $V_{OL}$  is the same for other input combinations that lead to output low voltage.

$$V_{OL} = V_{DD} - R_P \left[ \frac{I_{SS}}{4} + \frac{I_{SS}}{4(2N+1)} \right] = V_{DD} - R_P \frac{I_{SS}(N+1)}{2(2N+1)}$$
(4.11)

From (4.10)-(4.11), the voltage swing, V<sub>SWING</sub> can be calculated as follows.

$$V_{\text{SWING}} = \frac{R_{\text{P}}I_{\text{SS}}}{2} \left[ \frac{N^2}{(2N+1)(N+1)} \right]$$
(4.12)
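
The expressions (4.10)-(4.12) can be cross-checked numerically against the branch currents of Table 4.2. The sketch below is a first-order check, not a circuit simulation: it treats the PMOS load as an ideal resistor $R_P$, uses the values $V_{DD}$ = 1.1 V, $V_{SWING}$ = 0.4 V, N = 15 and $I_{SS}$ = 100 µA quoted for the simulations, and assumes that the output-side transistor of QTT-1 and QTT-4 is driven by $\overline{A}$ and that of QTT-2 and QTT-3 by A (the SUM row of Table 4.1). All Python names are illustrative.

```python
from itertools import product

I_SS = 100e-6          # total bias current (A)
N = 15                 # central-to-side branch width ratio
V_DD = 1.1             # supply voltage (V)
V_SWING_TARGET = 0.4   # desired voltage swing (V)

I_a = I_SS / (4 * (2 * N + 1))  # side branch current, both central branches ON
I_b = I_SS / 4                  # side branch current of the active cell
I_c = I_SS / (4 * (N + 1))      # side branch current, one central branch ON

# Load resistance that gives the target swing according to (4.12).
R_P = 2 * V_SWING_TARGET / I_SS * (2 * N + 1) * (N + 1) / N**2

# Each cell: (B level that activates it, C level that activates it,
#             True if its output-side transistor is driven by A, False for A-bar)
CELLS = [(1, 1, False), (0, 1, True), (1, 0, True), (0, 0, False)]  # QTT-1..QTT-4

def output_branch_current(a, b, c, cell):
    """Current drawn from the output node by one cell (Id1, Id3, Id5 or Id7)."""
    b_sel, c_sel, driven_by_a = cell
    gate_high = (a == 1) if driven_by_a else (a == 0)
    if not gate_high:
        return 0.0
    central_on = (b != b_sel) + (c != c_sel)  # number of conducting central branches
    return {0: I_b, 1: I_c, 2: I_a}[central_on]

def v_out(a, b, c):
    total = sum(output_branch_current(a, b, c, cell) for cell in CELLS)
    return V_DD - R_P * total

V_OH = V_DD - R_P * I_SS / (2 * (N + 1))                    # (4.10)
V_OL = V_DD - R_P * I_SS * (N + 1) / (2 * (2 * N + 1))      # (4.11)

for a, b, c in product((0, 1), repeat=3):
    expected = V_OH if (a ^ b ^ c) else V_OL
    assert abs(v_out(a, b, c) - expected) < 1e-9

print(f"V_OH = {V_OH:.4f} V, V_OL = {V_OL:.4f} V, swing = {V_OH - V_OL:.4f} V")
```

Under these assumptions every input combination settles at one of only two output levels, and their difference reproduces the target swing, as (4.12) predicts.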

Further, analysing the working of the proposed architecture-3, it is observed that at any instant only one of the QTTs is active, meaning that its central branches are OFF and its structure is similar to that of a PFSCL gate. The $A_V$ and NM for the proposed architecture-3 based XOR3 gate are therefore calculated as per [37] and are given in (4.13)-(4.14).

$$A_{\rm V} = \frac{g_{\rm m} \frac{R_{\rm P}}{2}}{1 - g_{\rm m} \frac{R_{\rm P}}{2}}$$
(4.13)

where $g_m = \sqrt{\mu_n C_{ox} \frac{W_N}{L_N} \frac{I_{SS}}{4}}$ is the transconductance of the Quadtail cell.

$$NM = \frac{V_{SWING}}{2} \left(1 - \frac{1}{A_v}\right)$$
(4.14)

Now, the propagation delay of the proposed architecture-3 is derived on the basis of the behaviour of the proposed architecture-3 based XOR3 gate. The propagation delay depends on the contribution of the parasitic MOS capacitances at the output node and on the load capacitance. The parasitic capacitance for the proposed architecture-3 based XOR3 gate is calculated by considering all the inputs A, B, C as logic high, such that QTT-1 is activated and QTT-2 to QTT-4 are deactivated. For a high-to-low transition on input A, the total capacitance at the output node, $C_{out}$, is depicted in Fig. 4.4.



Fig. 4.4 Linear half circuit of proposed architecture-3

As per [37], the propagation delay,  $\tau_{PD}$  can be expressed as in (4.15).

$$\tau_{\rm PD} = R_{\rm P} C_{\rm out} \tag{4.15a}$$

$$= R_{P}(C_{db,d1}+C_{db,d3}+C_{db,d5}+C_{db,d7}+C_{gd,d1}+C_{gd,d2}+C_{gd,d3}+C_{gd,d4}+C_{gd,d5}+C_{gd,d6}+C_{gd,d7}+C_{gd,d8}+C_{dbr1}+C_{gdr1}+C_{gs,d2}+C_{gs,d4}+C_{gs,d8}+C_{L})$$
(4.15b)

where $R_P$ is the equivalent resistance of the PMOS Mr1 and $C_{out}$ is expressed in terms of the parasitic capacitances associated with the transistors and the external load capacitance $C_L$.

The parasitic capacitances are $C_{db}$, $C_{gd}$ and $C_{gs}$, as given in section 2.2.1, of the transistors in the PDN (Md1-Md8) and the load section (Mr1, Mr2). The gate-to-source capacitance contribution of transistors Md2-Md8 is $\frac{1}{2}C_{gs}$ due to the Miller effect [61], leading to the factor of $2C_{gs2}$. Using [78] and the fact that the dimensions of all the transistors in the side branches are the same, we get an expression relating the delay analytically to $V_{SWING}$ and $I_{SS}$.

$$\tau_{\rm PD} = \frac{1}{V_{\rm SWING}} q_1 + \frac{V_{\rm SWING}}{I_{\rm SS}} q_2 + q_3 \tag{4.16}$$

where

$$q_1 = \left(\frac{N^{2}}{(2N+1)(N+1)}\right)^{2} \left[ 4\left(\frac{A_{V}}{1+A_{V}}\right)^{2}\frac{4L_{n}}{\mu_{n}C_{ox}}\left(L_{dn}C_{j}K_{eqn} + 2C_{jsw}K_{eqsw}\right) + 10\left(\frac{A_{V}}{1+A_{V}}\right)^{2}\frac{4L_{n}}{\mu_{n}C_{ox}}C_{gd0} + \frac{8}{3}\left(\frac{A_{V}}{1+A_{V}}\right)^{2}\frac{4L_{n}}{\mu_{n}C_{ox}}L_{n}C_{ox} \right]$$

$$q_2 = 8L_{dn}C_{jsw}K_{eqsw} + 2L_{dp}C_{jswp}K_{eqswp} + C_L$$

$$q_3 = \left(\frac{N^{2}}{(2N+1)(N+1)}\right) \left(\frac{L_{P}}{\mu_{p}C_{ox}(V_{DD}-|V_{TP}|)}\right) \left(L_{dp}C_{jp}K_{eqp} + 2C_{jswp}K_{eqswp} + C_{gd0p} + \frac{3}{4}A_{bulkmax}L_{pmin}C_{ox}\right)$$

For given values of NM and $A_V$, and using the static model expressions, the design approach for a proposed architecture-3 based gate at a given bias current $I_{SS}$ is now presented. Firstly, the required values of $V_{SWING}$ and $R_P$ are calculated as:

$$V_{\rm SWING} = \frac{2NM}{1 - \frac{1}{A_{\rm V}}}$$
(4.17)

$$R_{P} = \frac{2V_{SWING}}{I_{SS}} \left[ \frac{(2N+1)(N+1)}{N^{2}} \right]$$
(4.18)

Thus, the bias current $I_{HIGH}$ at which the required $R_P$ equals $R_{Pmin}$ can be written as:

$$I_{\text{HIGH}} = \frac{2 \, V_{\text{SWING}}}{R_{\text{Pmin}}} \left[ \frac{(2N+1)(N+1)}{N^2} \right]$$
(4.19)


where R<sub>Pmin</sub> represents the resistance of the minimum-sized PMOS load transistor (Mr1-Mr2).

The calculated  $I_{HIGH}$  is compared with the required bias current  $I_{SS}$  value. For values of  $I_{SS}>I_{HIGH}$ ,  $R_P$  will be less than  $R_{Pmin}$  and to calculate its value,  $L_P$  is set to minimum  $L_{Pmin}$  and  $W_P$  is calculated using (4.18) and (2.11).

$$W_{P} = \left[\frac{N^{2}}{(2N+1)(N+1)}\right] \frac{I_{SS}}{2V_{SWING}} \frac{L_{Pmin}}{\mu_{p}C_{ox}(V_{DD}-|V_{TP}|)\left(1-R_{DSW}10^{-6}\mu_{p}C_{ox}(V_{DD}-|V_{TP}|)\right)}$$
(4.20)

Similarly, for values of  $I_{SS} < I_{HIGH}$ ,  $R_P$  will be greater than  $R_{Pmin}$  and to calculate its value,  $W_P$  is set to  $W_{Pmin}$  and  $L_P$  is calculated as per following expression, derived using (4.18) and (2.11) and mathematical simplification.

$$L_{P} = \mu_{p} C_{ox} W_{Pmin} \left( V_{DD} - |V_{TP}| \right) \left[ \frac{((2N+1)(N+1))2V_{SWING}}{N^{2} I_{SS}} - \frac{R_{DSW} 10^{-6}}{W_{Pmin}} \right]$$
(4.21)

From (4.20), it is observed that $W_P$ is smaller than the value of $W_P$ in the case of conventional PFSCL, which is due to the bias current of $I_{SS}/4$ used in each QTT cell.

Further, the dimension of transistors in the PDN is derived using (4.13)-(4.14), as follows.

$$W_{N} = \frac{N^{4}}{[(2N+1)(N+1)]^{2}} \left(\frac{A_{V}}{(1+A_{V})}\right)^{2} \frac{4I_{SS}L_{N}}{\mu_{n}C_{ox}V_{SWING}^{2}}$$
(4.22)

Looking at the expression of  $R_P$  in (4.18), it is higher for proposed architecture-3 compared to [37] and [66].

Using the expression for transistor dimensions and a given voltage swing, the circuits are designed for various logic functions. For a logic function like XOR3, which is implemented in a single stage using proposed architecture-3, the delay for [66] and [37] is higher due to the multistage implementation.
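
The design flow of (4.17)-(4.19) and (4.22) can be sketched as a short calculation. This is a minimal illustration under stated assumptions, not the sizing script used for the reported simulations: the technology values `un_cox`, `l_n` and `r_p_min` are hypothetical placeholders, and NM = 0.15 V with $A_V$ = 4 is chosen only because it reproduces the 0.4 V swing used elsewhere in this chapter. The second-order $R_{DSW}$ terms of (4.20)-(4.21) are not modelled.

```python
import math

def design_quadtail_gate(nm, a_v, i_ss, n, un_cox, l_n, r_p_min):
    """First-order sizing following (4.17), (4.18), (4.19) and (4.22), SI units."""
    ratio = (2 * n + 1) * (n + 1) / n**2
    v_swing = 2 * nm / (1 - 1 / a_v)                 # (4.17)
    r_p = 2 * v_swing / i_ss * ratio                 # (4.18)
    i_high = 2 * v_swing / r_p_min * ratio           # (4.19)
    w_n = ((a_v / (1 + a_v)) ** 2 * 4 * i_ss * l_n
           / (ratio**2 * un_cox * v_swing**2))       # (4.22)
    # I_SS > I_HIGH: R_P < R_Pmin, so L_P is fixed at L_Pmin and W_P is solved.
    # Otherwise W_P is fixed at W_Pmin and L_P is solved.
    load_sizing = "width" if i_ss > i_high else "length"
    return {"v_swing": v_swing, "r_p": r_p, "i_high": i_high,
            "w_n": w_n, "load_sizing": load_sizing}

# Hypothetical technology values, for illustration only.
UN_COX = 300e-6    # mu_n * C_ox (A/V^2)
L_N = 180e-9       # NMOS channel length (m)
R_P_MIN = 10e3     # resistance of a minimum-sized PMOS load (ohm)
I_SS = 100e-6      # bias current (A)

design = design_quadtail_gate(nm=0.15, a_v=4.0, i_ss=I_SS, n=15,
                              un_cox=UN_COX, l_n=L_N, r_p_min=R_P_MIN)

# Consistency check: the sized transistor should give back the target gain (4.13).
g_m = math.sqrt(UN_COX * design["w_n"] / L_N * I_SS / 4)
x = g_m * design["r_p"] / 2
a_v_check = x / (1 - x)

print(f"V_SWING = {design['v_swing']:.3f} V, R_P = {design['r_p']:.0f} ohm")
print(f"W_N = {design['w_n'] * 1e6:.3f} um, size load by {design['load_sizing']}")
print(f"A_V recovered from (4.13) = {a_v_check:.3f}")
```

Recovering the target gain from the sized transistor confirms that (4.22) is the inverse of (4.13) once $R_P$ is fixed by (4.18).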

### 4.2.2 Simulations

Firstly, the behaviour of the proposed architecture-3 based XOR3 gate is verified through simulations and the results are shown in Fig.4.5 confirming the correct behaviour. The simulations are carried out considering a power supply and voltage swing of 1.1 V and 0.4 V respectively.



Fig. 4.5 Simulation waveforms of the proposed architecture-3 XOR3 gate

Next, to compare the behaviour of the proposed architecture-3 with the existing architectures i.e. PFSCL FC [66] and conventional PFSCL [37], a XOR3 gate is designed and simulated. For purpose of comparison, simulation conditions were kept uniform with supply voltage  $V_{DD}$  of 1.1V, voltage swing  $V_{SWING}$  of 400mV and N=15 and a load capacitance C<sub>L</sub> of 50fF.

To study the behaviour of delay with respect to the bias current, the bias current was varied from  $20\mu$ A to  $100\mu$ A and the corresponding power dissipation and PDP were also noted. The results are shown in Fig. 4.6.



Fig. 4.6 Performance comparison versus $I_{SS}$ a) Delay b) Power dissipation c) PDP of XOR3 gate

From the results shown in Fig. 4.6, it is seen that the delays of all three architectures are very similar; however, the power dissipation is the least for the proposed architecture-3, leading to the minimum PDP, while the conventional PFSCL NOR-based architecture shows the maximum power dissipation and hence the maximum PDP. Thus the proposed architecture-3 is power efficient. For $I_{SS}$ of 100 µA, the proposed architecture-3 shows a maximum reduction in delay, power dissipation and PDP of 6.7%, 85% and 87% respectively with respect to [37].

Next, to verify the behaviour of the proposed architecture-3 XOR3 gate under effect of process variations, Monte Carlo analysis was also carried out on XOR3 based on proposed architecture-3 followed by simulations under various process corners.

The results of the Monte Carlo analysis, carried out for 500 runs, are shown in Fig. 4.7. The results for the existing PFSCL FC [66] and conventional PFSCL [37] are also presented for comparison. From the results, it is seen that with respect to delay, all three architectures show similar order of variation; however, the voltage swing shows maximum variation for conventional PFSCL NOR based architecture while the proposed architecture-3 shows least variation.




Fig. 4.7 Impact of Monte Carlo for 500 runs a) Proposed architecture-3 b) PFSCL FC c) conventional PFSCL

The simulation results for the process corners are shown in Fig. 4.8. It is seen that proposed architecture-3 based XOR3 gate shows the least variation.





Fig. 4.8 Impact of process corners on the three architectures a) Delay and b) V<sub>SWING</sub>

To illustrate the usefulness of the proposed architecture-3, a full adder is implemented in all three architectures. Since the sum functionality is implemented by the XOR3 function, which has already been shown, only the carry circuit remains to be designed and simulated. The carry, expressed as in (4.24), can be implemented using proposed architecture-3 as shown in Fig. 4.9a. The carry circuits based on conventional PFSCL [37] and the existing PFSCL FC [66] are shown in Fig. 4.9b and Fig. 4.9c respectively.





Fig. 4.9 Carry a) Proposed architecture-3 b) Conventional PFSCL c) PFSCL FC

These gates are simulated under earlier simulation conditions and the performance parameters for  $I_{SS}$  of 100µA are summarised in Table 4.3.

| Parameter              | Conventional PFSCL | PFSCL FC | Proposed architecture-3 |
|------------------------|--------------------|----------|-------------------------|
| **Sum**                |                    |          |                         |
| Delay (ns)             | 0.84               | 0.82     | 0.78                    |
| Power Dissipation (µW) | 770.00             | 220.00   | 110.00                  |
| PDP (fJ)               | 646.00             | 180.00   | 86                      |
| **Carry**              |                    |          |                         |
| Delay (ns)             | 1.45               | 0.92     | 0.81                    |
| Power Dissipation (µW) | 550.00             | 550.00   | 110.00                  |
| PDP (fJ)               | 798.00             | 506.55   | 88.55                   |

Table 4.3 Summary of results for a Full Adder

From the results in Table 4.3, it is observed that the proposed architecture-3 gives the best performance, with the lowest delay and the minimum PDP for both the sum and the carry outputs.
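The PDP entries and the relative improvements can be cross-checked directly from the delay and power columns. The following is a minimal sketch, assuming only the values transcribed from Table 4.3; the dictionary keys and function names are illustrative and do not come from the thesis. Since the delays are in ns and the powers in µW, their product is in femtojoules. Recomputed values may differ slightly from the tabulated PDP because the tabulated delays are rounded.

```python
# Delay (ns) and power dissipation (µW) transcribed from Table 4.3.
FULL_ADDER = {
    "sum": {
        "conventional": (0.84, 770.0),
        "pfscl_fc": (0.82, 220.0),
        "proposed_3": (0.78, 110.0),
    },
    "carry": {
        "conventional": (1.45, 550.0),
        "pfscl_fc": (0.92, 550.0),
        "proposed_3": (0.81, 110.0),
    },
}


def pdp_fj(delay_ns, power_uw):
    """Power-delay product; ns * µW = 1e-15 J, i.e. femtojoules."""
    return delay_ns * power_uw


def reduction_percent(reference, value):
    """Relative reduction of `value` with respect to `reference`, in percent."""
    return 100.0 * (reference - value) / reference


for output, styles in FULL_ADDER.items():
    ref_delay, ref_power = styles["conventional"]
    new_delay, new_power = styles["proposed_3"]
    ref_pdp = pdp_fj(ref_delay, ref_power)
    new_pdp = pdp_fj(new_delay, new_power)
    print(
        output,
        f"delay: {reduction_percent(ref_delay, new_delay):.1f}% lower,",
        f"power: {reduction_percent(ref_power, new_power):.1f}% lower,",
        f"PDP: {reduction_percent(ref_pdp, new_pdp):.1f}% lower",
    )
```

The same two helpers can be applied to the PFSCL FC column by changing the reference key.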

# 4.3 Conclusion

In this chapter, an architecture is proposed that supports higher fan-in than the existing PFSCL FC [66]. This reduces the number of gates required to implement a complex logic function and lowers delay and power dissipation. The structure and working of an XOR3 gate are presented, followed by an analysis of its voltage swing, gain and delay. Simulations are carried out for a full adder and, through comparison with existing PFSCL architectures, it is observed that proposed architecture-3 based gates lead to a maximum reduction with respect to conventional PFSCL in delay, power dissipation and PDP of 6.7%, 85% and 87% respectively. Hence, the use of proposed architecture-3 can lead to efficient PFSCL circuit design.

# Chapter 5 Modified Dynamic PFSCL architectures

### 5.1 Introduction

The analysis and design of static PFSCL combinational gates is the theme of the previous chapters. The emphasis is placed on improving the area and power consumption of PFSCL FC [66] and on presenting a new cell for optimizing the performance of complex logic gates. This approach, however, requires constant current source(s) for proper operation and incurs static power consumption.

The reduction of power consumption remains a mainstay in designs operating with limited battery resources, and the D-PFSCL circuits discussed in section 2.3 fill this space. These designs are primarily based on precharge-evaluate logic and employ a dynamic current source to mitigate static power consumption while also conferring a speed advantage. Existing D-PFSCL gates are limited to implementing logic functions in NOR/OR form only. In practice, however, most functions are expressed as sum-of-minterms expressions that need an AND-OR implementation. In such cases, the existing architecture incurs a high gate count as well as cascading of multiple gates, which degrades performance in terms of both power and delay. Two modified D-PFSCL architectures are proposed in this chapter to overcome this drawback. The first, namely proposed architecture-4, enables the implementation of any 2-input logic function in a single stage by modifying the PDN of the existing D-PFSCL gate. The second introduces transmission gates to embed AND-OR functionality in the existing D-PFSCL gate and is termed proposed architecture-5. The functionality of the proposals is verified through simulations.

## 5.2 Proposed architecture-4

The concept behind proposed architecture-4 is to generate a logical AND term by modifying the existing D-PFSCL gate. This is done by inserting a low threshold voltage NMOS transistor (Mc1), connected from the power supply to the common source node X, between the source-coupled transistor pair (Md1, Md2). The resulting structure, called the modified triple-tail (MTT), is shown in Fig. 5.1. For clear distinction, the low threshold voltage transistor Mc1 is marked with a bold line while the usual MOS symbol is used for the remaining transistors. The threshold voltage of the middle transistor Mc1 is kept lower than that of the other transistors in the PDN by a factor $\alpha$. The PMOS transistors Mr1-Mr2 act as precharge transistors, and the output Q is taken from the drain of Mr2. The effect of introducing Mc1 as the central branch in the existing D-PFSCL gate is analysed by connecting inputs I0 and SEL to Md1 and Mc1 respectively and taking the output from node Q.

In the precharge phase, i.e. when the clock signal is at low logic level (CLK=0), the output node Q is charged to V<sub>DD</sub>. The capacitor C<sub>1</sub>, which is part of the dynamic current source (DCS), is discharged to ground through Ms2, which is driven by $\overline{\text{CLK}}$ and hence ON, in the same way as in the existing D-PFSCL gate. The potential of node X depends on the SEL input during this phase: it becomes V<sub>X</sub>=V<sub>DD</sub>-V<sub>TN</sub>/$\alpha$ for the high value of SEL and V<sub>X</sub>=V<sub>DD</sub>-V<sub>TN</sub> for the low value. The inputs I0 and SEL do not influence the output as Ms1 is OFF.



Fig. 5.1 Modified schematic of a D-PFSCL with the addition of Mc1

During the evaluation phase, the transistors Ms2, Mr1 and Mr2 are OFF as the clock signal is at high logic level. The output is now evaluated since Ms1 is ON. The capacitor $C_1$ is charged until its potential equals that of node X, at which point further charge transfer stops. The impact of adding Mc1 is studied by assuming low and high values of the SEL signal.

Case 1: When SEL is low

In this case, Mc1 is OFF and the output is evaluated according to input I0. If I0 is high (low), the output node Q will attain high (low) logic level.

Case 2: When SEL is high

In this condition, Mc1 is ON and, depending on the logic level at I0, either Md1 or Md2 will be ON. Let us assume that I0 is at low logic level such that Md2 is ON and Md1 is OFF. As the transistors Md2 and Mc1 both conduct, both of them participate in the charging of C<sub>1</sub>. But since the threshold voltage of Mc1 is lower than that of Md2, a larger current flows through Mc1 and charges C<sub>1</sub> from the power supply without much contribution from Md2. Thus, the charge transfer from C<sub>L</sub> to C<sub>1</sub> does not happen and the output node Q remains at high logic level. Similarly, for a high value of I0, the output remains high because C<sub>1</sub> is charged by Mc1 instead of Md1. Thus, the output remains high irrespective of the value of I0. The node voltages are shown in graphical form for both Case 1 and Case 2 in Fig. 5.2.
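The precharge phase and the two evaluation cases can be summarised as a purely logical model of the single modified stage. This is a behavioural sketch only, assuming ideal logic levels; the function name is illustrative and the model says nothing about node voltages or timing.

```python
from itertools import product


def modified_dpfscl_output(clk, i0, sel):
    """Logic level at node Q of the single modified triple-tail stage.

    clk = 0: precharge, Q is at the high level regardless of the inputs.
    clk = 1: evaluation.
        sel = 0 (Case 1): the middle transistor is OFF and Q follows I0.
        sel = 1 (Case 2): the middle transistor charges C1 from the
                          supply, so Q stays high irrespective of I0.
    """
    if clk == 0:
        return 1
    if sel == 1:
        return 1
    return i0


for clk, sel, i0 in product((0, 1), repeat=3):
    q = modified_dpfscl_output(clk, i0, sel)
    print(f"CLK={clk} SEL={sel} I0={i0} -> Q={q}")
```

During evaluation the stage therefore behaves as Q = I0 + SEL, which is why a second stage is needed before the output can respond to a data input when SEL is high.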



Fig. 5.2 Voltages at different nodes of modified D-PFSCL

It is to be noted that $C_1$ discharges to ground in the precharge phase and is charged until its potential equals the potential V2 of node X. It is clear from the above discussion that for a high value of SEL the output of the modified gate remains independent of I0. Thus, extra circuitry is needed so that the output node responds to input transitions when SEL is high. This is accomplished by adding a second circuit, identical to the one shown in Fig. 5.1, with input I1, the complement of SEL ($\overline{\text{SEL}}$) and its own DCS. The complete schematic of the proposed architecture-4 gate is drawn in Fig. 5.3. It now consists of two MTTs, namely MTT1: (Md1, Md2, Mc1) and MTT2: (Md3, Md4, Mc2). The signals I0, I1, SEL and $\overline{\text{SEL}}$ are the inputs, and the CLK signal drives the precharge transistors and the DCSs. For a high value of SEL, $\overline{\text{SEL}}$ is at low logic level, so the output is obtained according to the I1 input. The operation of the proposed architecture-4 gate for various combinations of the inputs is summarized in Table 5.1. From Table 5.1, it is seen that the complete functionality of the proposed architecture-4 gate can be modelled as

$$Q = \begin{cases} I0, & \text{if SEL}=0 \\ I1, & \text{if SEL}=1 \end{cases} \tag{5.1}$$

which can be written in Boolean expression as:

$$Q = I0\,\overline{\text{SEL}} + I1\,\text{SEL} \tag{5.2}$$

Thus, the proposed architecture-4 gate incorporates the AND-OR functionality. Further, (5.2) represents the function of a MUX2 with data inputs I0 and I1 and select input SEL. Comparing the proposed architecture-4 MUX2 realisation with the existing D-PFSCL MUX2 realisation in Fig. 5.4 clearly indicates the reduction in gate count and STBs, which leads to performance improvement.



Fig. 5.3 Complete schematic of the proposed architecture-4 MUX2

| SEL | I<sub>0</sub> | I<sub>1</sub> | MTT-1: M<sub>d1</sub> | MTT-1: M<sub>c1</sub> | MTT-1: M<sub>d2</sub> | MTT-2: M<sub>d3</sub> | MTT-2: M<sub>c2</sub> | MTT-2: M<sub>d4</sub> | OUTPUT Q |
|-----|----|----|-----|-----|-----|-----|-----|-----|---|
| L   | L  | L  | OFF | OFF | ON  | OFF | ON  | ON  | L |
| L   | L  | H  | OFF | OFF | ON  | ON  | ON  | OFF | L |
| L   | H  | L  | ON  | OFF | OFF | OFF | ON  | ON  | H |
| L   | H  | H  | ON  | OFF | OFF | ON  | ON  | OFF | H |
| H   | L  | L  | OFF | ON  | ON  | OFF | OFF | ON  | L |
| H   | L  | H  | OFF | ON  | ON  | ON  | OFF | OFF | H |
| H   | H  | L  | ON  | ON  | OFF | OFF | OFF | ON  | L |
| H   | H  | H  | ON  | ON  | OFF | ON  | OFF | OFF | H |

SEL, I<sub>0</sub> and I<sub>1</sub> are the inputs; the M columns give the state of the transistors in MTT-1 and MTT-2. H = High logic level ($V_{DD}$), L = Low logic level ($V_{DD}$ - $V_{SWING}$).

Table 5.1 Summary of the proposed architecture-4 MUX2 operation
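The entries of Table 5.1 follow a simple pattern that can be checked exhaustively. The sketch below regenerates the transistor states and compares the resulting output with (5.2). The function names are illustrative, and the way the two cells combine at the shared output node (Q stays high only if neither cell discharges it) is an assumption made for this model, not a statement taken from the thesis.

```python
from itertools import product


def transistor_states(sel, i0, i1):
    """ON (True) / OFF (False) state of the six PDN transistors of Table 5.1."""
    return {
        "Md1": bool(i0), "Mc1": bool(sel), "Md2": not i0,   # MTT-1
        "Md3": bool(i1), "Mc2": not sel, "Md4": not i1,     # MTT-2
    }


def mux2_output(sel, i0, i1):
    """Q from (5.2): I0 when SEL = 0, I1 when SEL = 1."""
    return bool((i0 and not sel) or (i1 and sel))


def wired_output(sel, i0, i1):
    """Assumed cell-level view: each cell keeps Q high if its input
    transistor or its middle transistor conducts; Q is high only if
    both cells keep it high."""
    s = transistor_states(sel, i0, i1)
    cell1_high = s["Md1"] or s["Mc1"]
    cell2_high = s["Md3"] or s["Mc2"]
    return cell1_high and cell2_high


for sel, i0, i1 in product((0, 1), repeat=3):
    assert wired_output(sel, i0, i1) == mux2_output(sel, i0, i1)
    level = "H" if mux2_output(sel, i0, i1) else "L"
    print(f"SEL={sel} I0={i0} I1={i1} -> Q={level}")
```

Under this assumption the cell-level expression is (I0 + SEL)(I1 + SEL'), which reduces to the MUX2 function of (5.2) for every input combination.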






Fig. 5.4 MUX2 a) Gate level schematic b) Existing D-PFSCL

The proposed architecture-4 gate (Fig. 5.3) can be transformed into a generalized structure by inserting separate precharge transistors such that four nodes (O1-O4) are available to configure the gate for any given functionality. The complete schematic of the proposed architecture-4 generic gate is given in Fig. 5.5. At any given instant, only one output node from each cell, i.e. either O1 or O2 and either O3 or O4, is combined to define the output. The generic structure also offers the advantage of providing the output either in true form or as the complement of the implemented function. To illustrate this, the implementation of a D-PFSCL XOR2 gate is considered. The XOR2 gate functionality with inputs A and B can be expressed as:

$$Q = \begin{cases} A, & \text{if } B = 0 \\ \overline{A}, & \text{if } B = 1 \end{cases} \tag{5.3}$$

So, mapping it onto Fig. 5.5, $\overline{M}$ in MTT-1 is replaced by input $\overline{B}$ and M in MTT-2 is replaced by input B, while the inputs A1 and A2 are both replaced by input A, as shown in Fig. 5.6. Extending this concept, any two-input gate can be implemented using the proposed architecture-4 generic gate by mapping appropriate inputs, as given in Table 2.2.
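The claim that any two-input function fits a select-based structure can be illustrated with the Shannon expansion about B: each function reduces to one of 0, 1, A or the complement of A for each value of B. The sketch below is a logic-level illustration only. It does not reproduce the thesis's input mapping table, and it assumes that constant levels and the complement of A can be presented to the gate; all names are hypothetical.

```python
from itertools import product


def mux2(sel, i0, i1):
    """MUX2 behaviour of (5.2): i0 when sel = 0, i1 when sel = 1."""
    return i1 if sel else i0


def cofactor_label(f, b):
    """Classify f(A, b) for fixed b as one of '0', '1', 'A', 'not A'."""
    values = (f(0, b), f(1, b))
    return {(0, 0): "0", (1, 1): "1", (0, 1): "A", (1, 0): "not A"}[values]


def realise_with_mux(f):
    """MUX2 with B as select and the two cofactors of f as data inputs."""
    return lambda a, b: mux2(b, f(a, 0), f(a, 1))


def xor2(a, b):
    return a ^ b


def nand2(a, b):
    return 1 - (a & b)


for name, f in (("XOR2", xor2), ("NAND2", nand2)):
    g = realise_with_mux(f)
    assert all(g(a, b) == f(a, b) for a, b in product((0, 1), repeat=2))
    print(name, "| B=0:", cofactor_label(f, 0), "| B=1:", cofactor_label(f, 1))
```

For XOR2 the cofactors are A for B=0 and the complement of A for B=1, which is exactly the form of (5.3).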



Fig. 5.5 Proposed architecture-4 generic gate



Fig. 5.6 Proposed architecture-4 XOR2 gate

## 5.2.1 Analysis

In the design of the proposed architecture-4 gate, the value of $C_1$ plays an important role since it affects the voltage swing of the gate. Since the structure of the DCS remains the same as in the existing D-PFSCL, the dimensioning of $C_1$ remains as per (2.23)-(2.25).

Further, since $C_1$ determines the required voltage swing at the output node, the PMOS transistors Mr1-Mr4 in Fig. 5.3 are kept at minimum dimensions. Also, since the charging and discharging of nodes in a dynamic circuit occur through instantaneous currents, the transistors of the PDN are kept at the minimum dimensions allowed by the technology node used.

The static power consumption in the proposed architecture-4 gate is negligible as the transistors Mr1-Mr2 (Mr3-Mr4) and Ms1-Ms2 (Ms3-Ms4) never turn ON simultaneously, and a direct path from power supply to ground is never established. Further, the capacitors will charge up during different phases of clock and therefore will consume dynamic power ( $P_{dyn}$ ). During the precharge phase, the load capacitor ( $C_L$ ) is precharged by the power supply ( $V_{DD}$ ). In the evaluation phase, depending on the input,  $C_1$  is charged via middle transistors of PDN or by charge transfer from the  $C_L$ . The dynamic power consumption of the logic gate depends on the switching activity of the output node [1]. It is already known that for uniformly distributed inputs, the low-to-high transition probability for an N input gate is

$$\alpha_{\text{L-H}} = \frac{N_0}{2^N} \tag{5.4}$$

where  $N_0$  is the number of zero entries in the truth table of the logic function. Thus, the power consumption of the proposed architecture-4 gate with transition probability  $\alpha_{L-H}$  is given as:

$$P_{dyn} = \alpha_{L-H} C_L V_{DD} V_{SWING} f_{CLK} + C_1 V_{DD} (V_{DD} - V_{T,n}) f_{CLK} \tag{5.5}$$

where $f_{CLK}$ is the clock frequency.

If K identical gates are used in the realisation of a multi-stage D-PFSCL circuit, then the power can be written as (5.6), which includes the dynamic power consumption of the M STBs used.

$$P_{dyn} = K\alpha_{L-H}C_LV_{DD}V_{SWING}f_{CLK} + KC_1V_{DD}(V_{DD} - V_{T,n})f_{CLK} + 2MC_{STB}V_{DD}^2f_{CLK} \tag{5.6}$$

where $C_{STB}$ is the capacitance at the drain of M1-M2 of the STB (and consequently of M3-M4 of the STB), and the factor of 2 is due to the two transistor pairs M1-M2 and M3-M4.
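Equations (5.4) to (5.6) can be evaluated numerically as a quick sanity check. The sketch below is a direct transcription of those expressions. The supply, swing, clock frequency and load capacitance are the simulation settings quoted in this chapter, whereas the values used for $C_1$, $V_{T,n}$ and $C_{STB}$ are placeholders chosen only for illustration and are not taken from the thesis.

```python
def transition_probability(truth_table_outputs):
    """alpha_L-H of (5.4): fraction of zero entries in the truth table."""
    n_zero = sum(1 for q in truth_table_outputs if q == 0)
    return n_zero / len(truth_table_outputs)


def dynamic_power(alpha_lh, c_load, c1, vdd, vswing, vtn, f_clk,
                  gates=1, stbs=0, c_stb=0.0):
    """Dynamic power in watts per (5.5); with gates = K and stbs = M it
    evaluates the multi-stage expression (5.6)."""
    load_term = gates * alpha_lh * c_load * vdd * vswing * f_clk
    dcs_term = gates * c1 * vdd * (vdd - vtn) * f_clk
    stb_term = 2 * stbs * c_stb * vdd ** 2 * f_clk
    return load_term + dcs_term + stb_term


# XOR2 truth table outputs for (A, B) = 00, 01, 10, 11.
alpha_xor2 = transition_probability([0, 1, 1, 0])

p_single = dynamic_power(
    alpha_lh=alpha_xor2,
    c_load=100e-15,   # 100 fF load, from the simulation settings
    c1=10e-15,        # placeholder value for C1 (assumption)
    vdd=1.8,          # V_DD, from the simulation settings
    vswing=0.4,       # V_SWING, from the simulation settings
    vtn=0.4,          # placeholder NMOS threshold voltage (assumption)
    f_clk=1e9,        # 1 GHz clock, from the simulation settings
)
print(f"alpha_L-H = {alpha_xor2}, P_dyn = {p_single * 1e6:.1f} uW")
```

Because the placeholder values are arbitrary, the printed power is not comparable with the simulated figures; the sketch only shows how the three terms scale with K, M and the capacitances.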

## 5.2.2 Simulations

In this section, the performance of the proposed architecture-4 gates is first compared with that of existing gates. This is followed by an examination of the performance of a proposed architecture-4 gate under different conditions.

Different logic functions are implemented using the proposed architecture-4 and the existing D-PFSCL, static PFSCL and dynamic CMOS styles. The simulations are performed with a power supply $V_{DD}$, voltage swing $V_{SWING}$, clock frequency and load capacitance of 1.8 V, 0.4 V, 1 GHz and 100 fF respectively. During simulations, it is assumed that the complement of every input is available. The proposed architecture-4 XOR2 gate is first simulated to gain insight into the impact of $\alpha$ (the factor by which the threshold voltage of the middle transistor Mc1/Mc2 is reduced with respect to the outer transistors Md1-Md4) on current flow. Assuming inputs A=$\overline{B}$=1 in the proposed XOR2 gate (Fig. 5.6), the ratio of the currents in Md3 and Mc2 (I<sub>Md3</sub>/I<sub>Mc2</sub>) is plotted in Fig. 5.7 for various values of $\alpha$. Correct operation requires that most of the current flows through Mc2. It is observed that a higher value of $\alpha$ increases the current through the middle transistor, thereby reducing the current ratio. A value of $\alpha$=1.5 is chosen for simulation since a further increase in its value does not cause much reduction in the current ratio.



Fig. 5.7  $I_{Md3}/I_{Mc2}$  for different values of  $\alpha$ 

The realisation of the XOR2 gate based on existing D-PFSCL is shown in Fig. 5.8. An STB is inserted between the stages in accordance with the existing scheme. The simulation waveforms of the XOR2 gate realized using the existing D-PFSCL and the proposed architecture-4 are shown in Fig. 5.9 and are examined for the marked time duration (A-B-C-D-E). In the durations (A-B) and (C-D), the CLK input is at low logic level (0 V), the circuit works in the precharge phase and the XOR2 gate outputs for both schemes are precharged to high logic level (1.8 V). Any changes in the inputs do not influence the output in these durations. In the intervals (B-C) and (D-E), the circuit works in the evaluation phase. In the interval B-C, for high values of inputs A and B, the output is at low logic level (1.4 V). In the interval D-E, for a low value of input A and a high value of input B, the output is at high logic level (1.8 V). So, the XOR2 gates based on both schemes exhibit the same behaviour. However, a close observation of the waveforms reveals a reduction in delay for the proposed architecture-4 XOR2 in comparison to the existing D-PFSCL XOR2. The other logic functions, namely MUX2, NAND2 and XOR3, are also realized and similar behaviour is observed.
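The output levels described for the four intervals can be reproduced with an idealised, delay-free model of the clocked XOR2 gate. This is a sketch under stated assumptions: the levels 1.8 V and 1.4 V are those quoted above, the input values during the precharge intervals are not given in the text and are chosen arbitrarily here, and the function name is illustrative.

```python
V_HIGH = 1.8  # precharged / high output level in volts (V_DD)
V_LOW = 1.4   # low output level in volts (V_DD - V_SWING)


def xor2_output_voltage(clk, a, b):
    """Ideal output level of a precharge-evaluate XOR2 gate.

    clk = 0: precharge, the output is at V_HIGH whatever the inputs are.
    clk = 1: evaluation, the output is V_HIGH if A xor B is 1, else V_LOW.
    """
    if clk == 0:
        return V_HIGH
    return V_HIGH if (a ^ b) else V_LOW


# (interval, CLK, A, B); inputs in the precharge intervals are arbitrary.
intervals = [
    ("A-B", 0, 1, 1),
    ("B-C", 1, 1, 1),
    ("C-D", 0, 0, 1),
    ("D-E", 1, 0, 1),
]

for name, clk, a, b in intervals:
    print(f"{name}: CLK={clk} A={a} B={b} -> {xor2_output_voltage(clk, a, b)} V")
```

The model captures only the steady-state levels; the delay difference between the two realisations is visible only in the simulated waveforms.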







Fig. 5.9 Simulation waveforms of the XOR2 gate

Further, the above logic functions are also implemented and simulated using the existing static PFSCL [37] and dynamic CMOS [81] styles. Their performance is compared in terms of gate count, power, precharge delay ($\tau_{pre}$), evaluation delay ($\tau_{PHL}$) and other parameters. For the sake of fair comparison, the aspect ratios of the transistors are kept the same in all the styles. The simulation results are listed in Table 5.2, and the following conclusions are drawn from them.

- a. The realisations of NAND2, XOR2, XOR3 and MUX2 using the existing architectures require three to nine D-PFSCL gates, in contrast to one to three in the proposed architecture-4. Thus, the gate count in the proposed architecture-4 is the minimum among the PFSCL variants, which directly benefits the performance of the circuits. It can be seen that the circuits based on the proposed architecture-4 outperform the existing D-PFSCL gates in all performance parameters.
- b. It can also be observed that the proposed architecture-4 gates consume less power than their static PFSCL counterparts. In terms of $\tau_{PHL}$, the static PFSCL circuits have larger delay values as the cascading of NOR gates adds to the delay.
- c. The simulation results of the proposed architecture-4 gates indicate better performance than the dynamic CMOS circuits in terms of power and energy delay product (EDP). This is because the proposed architecture-4 gates have a reduced swing in comparison to the full-swing behaviour of the dynamic CMOS circuits.

Table 5.2 Performance comparison

| Dynamic CMOS | Static PFSCL | Existing D-PFSCL | Proposed architecture-4 |
|--------------|--------------|------------------|-------------------------|
| **Circuit 1: 2:1 Multiplexer** | | | |
| 1            | 3            | 3                | 1                       |
| 0            |              | 1                | 0                       |
| 98           | 194          | 334              | 82                      |
| 76           |              | 207              | 45                      |
| 389          | 600          | 182              | 88                      |
| 933          | 5645         | 5075             | 147                     |
| -            | 300 µA       |                  |                         |
| **Circuit 2: XOR2** | | | |
| 1            | 3            | 3                | 1                       |
| 0            |              | 1                | 0                       |
| 119          | 156          | 197              | 86                      |
| 78              |                                                                                                                       | 150                                                                                                                                                                                                                                                                                                                                                                                                                                           | 45                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |  |  |  |  |  |
| 364             | 590                                                                                                                   | 290                                                                                                                                                                                                                                                                                                                                                                                                                                           | 85                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |  |  |  |  |  |
| 1288            | 3589                                                                                                                  | 2813                                                                                                                                                                                                                                                                                                                                                                                                                                          | 157                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |  |  |  |  |  |
|                 | 300                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |  |  |  |  |  |
|                 | Circuit 3: NAND2 |  |  |  |  |  |  |  |
| 1               | 1                                                                                                                     | 1                                                                                                                                                                                                                                                                                                                                                                                                                                             | 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |  |  |  |  |  |
| 0               |                                                                                                                       | 0                                                                                                                                                                                                                                                                                                                                                                                                                                             | 0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |  |  |  |  |  |
| 105             | 101                                                                                                                   | 96                                                                                                                                                                                                                                                                                                                                                                                                                                            | 78                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |  |  |  |  |  |
| 75              |                                                                                                                       | 88                                                                                                                                                                                                                                                                                                                                                                                                                                            | 46                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |  |  |  |  |  |
| 243             | 180                                                                                                                   | 54                                                                                                                                                                                                                                                                                                                                                                                                                                            | 114.3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |  |  |  |  |  |
| 669.7           | 459                                                                                                                   | 124.4                                                                                                                                                                                                                                                                                                                                                                                                                                         | 173.8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |  |  |  |  |  |
|                 | 100                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |  |  |  |  |  |
| Circuit 4: XOR3 |                                                                                                                       |                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |  |  |  |  |  |
| 3               | 9                                                                                                                     | 9                                                                                                                                                                                                                                                                                                                                                                                                                                             | 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |  |  |  |  |  |
| 0               | 0                                                                                                                     | 3                                                                                                                                                                                                                                                                                                                                                                                                                                             | 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |  |  |  |  |  |

| $\tau_{PHL}$ (ps)           | 246    | 369   | 335   | 134   |
|-----------------------------|--------|-------|-------|-------|
| $\tau_{pre}$ (ps)           | 257    | 398   | 298.7 | 163.5 |
| Power (µW)                  | 477    | 1620  | 338   | 191   |
| EDP ($\times 10^{-27}$ J·s) | 7216.5 | 55145 | 9483  | 857.4 |
| $I_{static}$ (µA)           |        | 900   |       |       |

Next, the impact of process variations is analysed by performing a Monte Carlo analysis with 500 simulation runs. The variations in the output of the proposed architecture-4 XOR2 gate when both the inputs A and B are at logic high level are shown in Fig. 5.10. Histograms showing the number of samples and the variation in $\tau_{PHL}$, $\tau_{pre}$, $V_{SWING}$ and the dynamic power dissipation for the proposed architecture-4 XOR2 gate are shown in Fig. 5.11. Similar variations in $\tau_{PHL}$, $\tau_{pre}$ and $V_{SWING}$ were observed for XOR2 gates in all the styles. The mean and the 3σ variation of the performance parameters for the different logic styles are listed in Table 5.3. The results indicate that the proposed architecture-4 gate shows a larger 3σ spread in $\tau_{PHL}$ and $\tau_{pre}$ than the existing D-PFSCL gate.



Fig. 5.10 Proposed architecture-4 based XOR2 gate output under Monte Carlo analysis



Fig. 5.11 Monte Carlo variation in Proposed architecture-4 XOR2 gate a)  $\tau_{PHL}$  b)  $\tau_{pre}$  c)  $V_{SWING}$  d) Dynamic power dissipation

| Style                       | Dynamic<br>CMOS |      | Static<br>PFSCL |       | Existing<br>D-PFSCL |      | Proposed<br>Architecture-4 |       |
|-----------------------------|-----------------|------|-----------------|-------|---------------------|------|----------------------------|-------|
| Parameter                   | Mean            | 3σ   | Mean            | 3σ    | Mean                | 3σ   | Mean                       | 3σ    |
| τ <sub>PHL</sub> (ps)       | 120             | 7.57 | 162             | 70.4  | 168.3               | 17.9 | 116                        | 24.6  |
| $\tau_{\rm pre}~({\rm ps})$ | 69.7            | 5.8  |                 |       | 159.7               | 12.4 | 28.9                       | 34.4  |
| Power (µW)                  | 321             | 3.45 | 580             | 23    | 1250                | 27.6 | 193                        | 11.6  |
| V <sub>SWING</sub> (V)      | 1.8             | 0.01 | 0.388           | 0.057 | 0.36                | 0.1  | 0.33                       | 0.031 |

Table 5.3 Monte Carlo simulation results for the XOR2 gate in different styles
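As a reading aid for Table 5.3, the 3σ values can be normalised by the corresponding mean so that styles with very different absolute values can be compared on a common scale. The following Python sketch does this using only the table entries; the normalisation itself and the names `relative_spread` and `spread_report` are illustrative assumptions, not part of the original analysis.

```python
# (mean, 3-sigma) pairs copied from Table 5.3 for the XOR2 gate.
TABLE_5_3 = {
    "tau_PHL (ps)": {
        "Dynamic CMOS": (120, 7.57),
        "Static PFSCL": (162, 70.4),
        "Existing D-PFSCL": (168.3, 17.9),
        "Proposed Architecture-4": (116, 24.6),
    },
    "tau_pre (ps)": {  # Table 5.3 reports no entry for static PFSCL
        "Dynamic CMOS": (69.7, 5.8),
        "Existing D-PFSCL": (159.7, 12.4),
        "Proposed Architecture-4": (28.9, 34.4),
    },
    "Power (uW)": {
        "Dynamic CMOS": (321, 3.45),
        "Static PFSCL": (580, 23),
        "Existing D-PFSCL": (1250, 27.6),
        "Proposed Architecture-4": (193, 11.6),
    },
    "V_SWING (V)": {
        "Dynamic CMOS": (1.8, 0.01),
        "Static PFSCL": (0.388, 0.057),
        "Existing D-PFSCL": (0.36, 0.1),
        "Proposed Architecture-4": (0.33, 0.031),
    },
}


def relative_spread(mean, three_sigma):
    """Return the 3-sigma value as a percentage of the mean."""
    return 100.0 * three_sigma / mean


def spread_report(table):
    """Map every parameter and style to its relative 3-sigma spread."""
    return {
        parameter: {
            style: relative_spread(mean, three_sigma)
            for style, (mean, three_sigma) in styles.items()
        }
        for parameter, styles in table.items()
    }


if __name__ == "__main__":
    for parameter, styles in spread_report(TABLE_5_3).items():
        for style, percent in styles.items():
            print(f"{parameter:14s} {style:24s} {percent:6.1f} %")
```

Relative spread is only one possible figure of merit; the absolute 3σ values in Table 5.3 remain the reported results.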

Further, the XOR2 gate was simulated in all the logic styles under all the process corners to check its behaviour, and the results are shown in Fig. 5.12. It may be observed that the proposed architecture-4 based XOR2 gate works correctly under all process corners.




Fig. 5.12 Process corner results of XOR2 gate a)  $\tau_{PHL}$  b)  $\tau_{pre}$  c) power

To further illustrate the usefulness of the proposed architecture-4, an 8:1 multiplexer (MUX8) is considered. It is implemented and simulated in all the logic styles and the performance is compared. The block diagram of the MUX8 in the proposed architecture-4 is shown in Fig. 5.13, and a performance summary is given in Table 5.4. It can be observed that the MUX8 realized using the proposed architecture-4 has the lowest delay, power and EDP among the styles considered. Further, the areas occupied by the XOR2 gate realized using dynamic CMOS, static PFSCL, existing D-PFSCL and the proposed architecture-4 are 50 µm<sup>2</sup>, 60 µm<sup>2</sup>, 250 µm<sup>2</sup> and 100 µm<sup>2</sup> respectively. The results show that the proposed architecture-4 based implementation occupies less area than the existing D-PFSCL based implementation.



Fig. 5.13 Block diagram of the proposed architecture-4 MUX8

Table 5.4 Performance summary of MUX8

| Parameter / Style           | Dynamic CMOS | Static PFSCL | Existing D-PFSCL | Proposed Architecture-4 |
|-----------------------------|--------------|--------------|------------------|-------------------------|
| Gate count                  | 14           | 21           | 21               | 7                       |
| STB count                   | 0            | 0            | 5                | 2                       |
| $\tau_{PHL}$ (ps)           | 754          | 864          | 831              | 400                     |
| $\tau_{pre}$ (ps)           | 562          |              | 620              | 250                     |
| Power (µW)                  | 2268         | 3780         | 1134             | 700                     |
| EDP ($\times 10^{-24}$ J·s) | 322          | 705          | 195              | 28                      |
| $I_{static}$ (µA)           |              | 2100         |                  |                         |
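To put the MUX8 comparison in Table 5.4 into relative terms, the percentage reduction achieved by the proposed architecture-4 with respect to each other style can be computed directly from the table entries. The sketch below is plain arithmetic on the reported numbers, not an additional simulation result; the helper names `percent_reduction` and `reductions` are illustrative.

```python
# Delay and power values copied from Table 5.4 (MUX8).
TABLE_5_4 = {
    "tau_PHL (ps)": {
        "Dynamic CMOS": 754,
        "Static PFSCL": 864,
        "Existing D-PFSCL": 831,
        "Proposed Architecture-4": 400,
    },
    "Power (uW)": {
        "Dynamic CMOS": 2268,
        "Static PFSCL": 3780,
        "Existing D-PFSCL": 1134,
        "Proposed Architecture-4": 700,
    },
}

PROPOSED = "Proposed Architecture-4"


def percent_reduction(reference, proposed):
    """Reduction of `proposed` relative to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


def reductions(table, proposed_style=PROPOSED):
    """Percentage reduction of the proposed style versus every other style."""
    result = {}
    for parameter, styles in table.items():
        proposed_value = styles[proposed_style]
        result[parameter] = {
            style: percent_reduction(value, proposed_value)
            for style, value in styles.items()
            if style != proposed_style
        }
    return result


if __name__ == "__main__":
    for parameter, styles in reductions(TABLE_5_4).items():
        for style, percent in styles.items():
            print(f"{parameter:13s} vs {style:17s}: {percent:5.1f} % lower")
```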

# 5.3 Proposed Architecture-5

The proposed architecture-4 presented in the previous section has no static power consumption and, specifically, can implement any two-input complex logic function in a single stage. It thus avoids intermediate STB(s), with a corresponding reduction in delay and dynamic power consumption. However, complex logic with a higher number of inputs requires cascading of gates based on the proposed architecture-4. Hence, in order to implement n-input complex logic in as few gates as possible, an alternative based on the D-PFSCL architecture that can support a higher fan-in is proposed.

The proposed architecture-5 is based on the inclusion of transmission gates in the PDN of a D-PFSCL gate. It consists of a transmission gate based logic network and a D-PFSCL gate. A generalized gate based on the proposed scheme is shown in Fig. 5.14. The logic function is implemented using the transmission gates and the corresponding output is fed to the D-PFSCL gate. In Fig. 5.14, the transmission gate based Boolean function f with n inputs, which could be any 2-input or 3-input function, produces the output $Q_t$ that is fed to transistor Md1 of the PDN. The working of the proposed gate is similar to that of existing D-PFSCL gates; however, it is explained here again for clarity.

The dynamic current source DCS1 shown in Fig. 5.14 consists of Ms1 and Ms2, driven by the clock (CLK) and its complement ($\overline{\text{CLK}}$) respectively, with capacitance $C_1$ connected to node Y. During the precharge phase, i.e. CLK=0, the output node Q is charged to $V_{DD}$ through Mr2 while the capacitor $C_1$ is discharged to ground potential through Ms2. Additionally, the logic function is evaluated by the transmission gate based network and its final output $Q_t$ is available at the input of the D-PFSCL gate, i.e. at Md1.

In the subsequent evaluation phase, i.e. CLK=1, the D-PFSCL gate evaluates. In this phase, Ms1 is ON and is connected to virtual ground through $C_1$. Depending on the value of $Q_t$, the output of the transmission gate network, either Md1 or Md2 conducts. When $Q_t$ is low, Md2 conducts, pulling the voltage of node Q down from $V_{DD}$ to $V_{DD}-V_{SWING}$ while $C_1$ charges up. When $Q_t$ is high, Md1 conducts and $C_1$ charges up through Md1; the output Q in this case remains at $V_{DD}$, which is the output high voltage.
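The two-phase operation described above can be summarised by an idealised behavioural model. The Python sketch below ignores all transients and device effects and only returns the settled level of node Q in each phase; the function name `gate_output` is hypothetical, and the default supply and swing values (1.1 V and 0.4 V) are taken from the simulation conditions of Section 5.3.2.

```python
def gate_output(clk, q_t, v_dd=1.1, v_swing=0.4):
    """Idealised settled voltage at node Q of the architecture-5 gate.

    clk : 0 for the precharge phase, 1 for the evaluation phase.
    q_t : logic value (0 or 1) delivered by the transmission gate network.
    """
    if clk == 0:
        # Precharge: Q is charged to V_DD through Mr2, C1 is discharged by Ms2.
        return v_dd
    if q_t:
        # Evaluation with Q_t high: Md1 conducts, Q stays at V_DD (output high).
        return v_dd
    # Evaluation with Q_t low: Md2 conducts, Q falls to V_DD - V_SWING.
    return v_dd - v_swing


if __name__ == "__main__":
    for clk in (0, 1):
        for q_t in (0, 1):
            print(f"CLK={clk} Q_t={q_t} -> Q = {gate_output(clk, q_t):.2f} V")
```

In this simplified view the output during evaluation follows the logic value of $Q_t$, with the low level set by the voltage swing rather than by ground.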



Fig. 5.14 Proposed architecture-5 generic gate

Further, it follows from the above description that the output of the transmission gate network has to be ready before the start of the evaluation phase; that is, the propagation delay through the transmission gate network should be less than the duration of the precharge phase of the clock. Simulations show that logic circuits up to MUX8 can be implemented in a single stage using the proposed architecture-5.

However, larger logic circuits such as MUX16 or MUX32 require cascading of multiple proposed architecture-5 based gates.

Since the proposed architecture-5 is a dynamic clock based circuit, cascading of multiple gates should be done such that the evaluation of each subsequent stage starts only after the output of the previous stage has stabilized. This is achieved by inserting a self-timed buffer (STB), which avoids malfunction due to the simultaneous evaluation of all stages [67]. An implementation of MUX16 using 3 proposed architecture-5 based gates and 1 STB is shown in Fig. 5.15.



Fig. 5.15 Proposed architecture-5 MUX16 gate

## 5.3.1 Analysis

In proposed architecture-5, the capacitor in the DCS,  $C_1$ , controls the voltage swing and its dimensioning is done as per (2.23)-(2.25) since the DCS is the same as in existing D-PFSCL.

Further, since $C_1$ determines the required voltage swing at the output node, the PMOS transistors Mr1-Mr2 in Fig. 5.14 are kept at minimum dimensions. Moreover, since the charging and discharging of the nodes in a dynamic circuit occur through instantaneous currents, the dimensions of the transistors Md1-Md2 are kept at the minimum allowed by the technology node used.

The static power consumption in the proposed architecture-5 gate is negligible as the transistors Ms1-Ms2 and Mr1-Mr2 never turn ON simultaneously, and a direct path from power supply to ground is never established. Further, the capacitors will charge up during different phases of clock and therefore will consume dynamic power ( $P_{dyn}$ ). During the precharge phase, the load capacitor ( $C_L$ ) is precharged by the power supply ( $V_{DD}$ ). Thus, the working of the proposed architecture-5 based gate is exactly the same as existing D-PFSCL architecture and  $P_{dyn}$  is given by (2.28) for single gate and (2.29) for multi-gate implementation.

## 5.3.2 Simulations

The effectiveness of the proposed architecture-5 in the implementation of various complex logic functions is evaluated by designing and simulating multiple gates. Simulations were carried out with power supply, voltage swing, clock frequency and load capacitance of 1.1V, 0.4V, 1 GHz and 50fF respectively. Firstly, the performance of the XOR2 gate based on proposed architecture-5 (Fig. 5.16a) is compared to XOR2 gates based on the existing architectures. The corresponding schematic for proposed architecture-4, shown in Fig. 5.16b, employs two modified triple-tail cells having individual and separate DCSs. The third realisation, as per existing D-PFSCL, is shown in Fig. 5.16c and has two-input D-PFSCL gates arranged in two stages with an intermediate STB. Thus, both of them employ multiple DCSs, leading to increased dynamic power consumption. In contrast, the XOR2 gate realized using the proposed architecture-5 implements the functionality in a single stage. The input-output waveform of the proposed architecture-5 based XOR2 gate is shown in Fig. 5.16d. The output timing transitions are depicted in Fig. 5.16e, which clearly shows the faster transition in the proposed architecture-5 based XOR2 gate in comparison to the XOR2 gates based on existing D-PFSCL based architectures.



(a)



(b)



(c)



(d)



(e)

Fig. 5.16 XOR2 gate a) Proposed architecture-5 b) Proposed architecture-4 c) D-PFSCL [67] d) Simulation waveforms e) Transitions at the proposed architecture-5 XOR2 gate output

To provide a complete view of the investigation, other common functions such as two-input AND (AND2), two-input OR (OR2), 2:1 multiplexer (MUX2) and 8:1 multiplexer (MUX8) are implemented and simulated in all three architectures under the same simulation conditions. The findings are summarized in Table 5.5. The advantage in terms of delay and power of the proposed architecture-5 based implementations is evident from the results. Specifically, it is observed from the simulation results that for the proposed architecture-5 MUX8, the maximum reduction in $\tau_{PHL}$ and dynamic power dissipation is 10.8% and 95.1% with respect to proposed architecture-4 MUX8, and 92.7% and 98.3% with respect to existing D-PFSCL MUX8.

Table 5.5 Performance Comparison of gates based on D-PFSCL, proposed architecture-4 and proposed architecture-5

| Function | Parameters          | D-PFSCL | Proposed architecture-4 | Proposed architecture-5 |
|----------|---------------------|---------|-------------------------|-------------------------|
| AND2     | $\tau_{pre}$ (ps)   | 34.2    | 31.7                    | 27.9                    |
|          | $\tau_{PHL}$ (ps)   | 36      | 33.4                    | 32.5                    |
|          | $P_{dyn}$ ($\mu$W)  | 90.42   | 61.6                    | 14.08                   |
| OR2      | $\tau_{pre}$ (ps)   | 28.7    | 30.9                    | 27.7                    |
|          | $\tau_{PHL}$ (ps)   | 34      | 32.5                    | 32.5                    |
|          | $P_{dyn}$ ($\mu$W)  | 29.6    | 64.089                  | 11.99                   |
| XOR2     | $\tau_{pre}$ (ps)   | 46.7    | 27.1                    | 27.6                    |
|          | $\tau_{PHL}$ (ps)   | 76.9    | 44.1                    | 33.09                   |
|          | $P_{dyn}$ ($\mu$W)  | 93.96   | 61.26                   | 25.34                   |
| MUX2     | $\tau_{pre}$ (ps)   | 44.02   | 30.8                    | 27.68                   |
|          | $\tau_{PHL}$ (ps)   | 109.7   | 32.6                    | 33.09                   |
|          | $P_{dyn}$ ($\mu$W)  | 90.81   | 64.056                  | 31.067                  |
| MUX8     | $\tau_{pre}$ (ps)   | 88.8    | 41.3                    | 28.1                    |
|          | $\tau_{PHL}$ (ps)   | 454.6   | 37.1                    | 33.1                    |
|          | $P_{dyn}$ ($\mu$W)  | 1410    | 488.2                   | 23.84                   |
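The percentage reductions quoted for MUX8 follow directly from the entries of Table 5.5. As a short illustration, the sketch below recomputes them; the helper name `reduction_pct` and the dictionary layout are illustrative choices, while the numerical values are taken from the MUX8 rows of the table.

```python
def reduction_pct(reference: float, proposed: float) -> float:
    """Percentage reduction of `proposed` relative to `reference`."""
    return 100.0 * (reference - proposed) / reference


# MUX8 rows of Table 5.5 (tau_PHL in ps, P_dyn in uW).
mux8 = {
    "D-PFSCL": {"tau_phl_ps": 454.6, "p_dyn_uw": 1410.0},
    "architecture-4": {"tau_phl_ps": 37.1, "p_dyn_uw": 488.2},
    "architecture-5": {"tau_phl_ps": 33.1, "p_dyn_uw": 23.84},
}

proposed = mux8["architecture-5"]
for name in ("architecture-4", "D-PFSCL"):
    ref = mux8[name]
    delay_red = reduction_pct(ref["tau_phl_ps"], proposed["tau_phl_ps"])
    power_red = reduction_pct(ref["p_dyn_uw"], proposed["p_dyn_uw"])
    print(f"vs {name}: tau_PHL reduced by {delay_red:.1f}%, P_dyn reduced by {power_red:.1f}%")
```

The same helper can be applied to any other row of the table to compare the architectures for AND2, OR2, XOR2 or MUX2.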

Continuing with the study of the proposed architecture-5 based XOR2 gate, the effect of supply voltage reduction on precharge delay, evaluation delay, dynamic power dissipation and EDP is simulated and plotted in Fig. 5.17. The delay values in Fig. 5.17a and Fig. 5.17b increase with reducing supply voltage, while the dynamic power decreases with reducing supply voltage, as shown in Fig. 5.17c. The changes in EDP are also noted and a minimum EDP point is identified for all the XOR2 gates, as shown in Fig. 5.17d. At the minimum EDP point, the proposed architecture-5 XOR2 gate shows a reduction of 98% and 72% in EDP value with respect to the D-PFSCL [66] and proposed architecture-4 based XOR2 gates respectively.



Fig. 5.17 Performance with respect to power supply variations a)  $\tau_{pre}$  b)  $\tau_{PHL}$  c) dynamic power d) EDP

Carrying the analysis further, the impact of reduction in supply voltage on the maximum clock frequency at which the gate can operate correctly is analysed through the results plotted in Fig. 5.18.



(a)



(b)

Fig. 5.18 a) Maximum Operating frequency with respect to different supply voltages b) Operating frequency with respect to dynamic power consumption

From Fig. 5.18a, it is seen that the proposed architecture-5 XOR2 gate can operate at a higher frequency than the XOR2 gates based on the other architectures. It is also observed that the maximum operating frequency increases with increasing supply voltage, reaching 11 GHz for the proposed architecture-5 XOR2 gate at a supply voltage of 1.1V. Further, insight into the dependence of dynamic power consumption on clock frequency for an XOR2 gate is provided in Fig. 5.18b. It is seen that for the same power consumption, the proposed architecture-5 offers an improvement in speed of 450%, while for the same speed, it offers an 80% reduction in power consumption with respect to the existing D-PFSCL style.

Further, to study the impact of variations on the performance parameters, namely $\tau_{pre}$, $\tau_{PHL}$ and dynamic power, Monte Carlo simulations of 500 runs are carried out. The observations of the Monte Carlo simulations are summarized in Table 5.6. The results signify that the performance of the circuit is overall improved by following the logic implementation based on the proposed architecture-5. Also, the proposed architecture-5 XOR2 gate shows a 0.29%, 0.33% and 1.3% variation in precharge delay, evaluation delay and dynamic power respectively. The delay variations are the lowest among the compared XOR2 designs, while the dynamic power variation matches that of proposed architecture-4. The Monte Carlo results for 500 runs of the proposed architecture-5 XOR2 gate are plotted in Fig. 5.19.

| Parameter           | D-PFSCL |      |      | Proposed architecture-4 |      |      | Proposed architecture-5 |        |       |
|---------------------|---------|------|------|-------------------------|------|------|-------------------------|--------|-------|
|                     | μ       | σ    | %    | μ                       | σ    | %    | μ                       | σ      | %     |
| $\tau_{pre}$ (ps)   | 44.2    | 0.79 | 1.7% | 25.8                    | 1.1  | 4.2% | 27.6                    | 0.081  | 0.29% |
| $\tau_{PHL}$ (ps)   | 107     | 3.5  | 3.2% | 51.0                    | 3.2  | 6.2% | 33.3                    | 0.11   | 0.33% |
| $P_{dyn}$ ($\mu$W)  | 89.7    | 0.73 | 0.8% | 59.5                    | 0.79 | 1.3% | 24.6                    | 0.3387 | 1.3%  |

Table 5.6 Monte Carlo results for XOR2

 $\mu$  = Mean,  $\sigma$  = Standard deviation, % = percentage variation
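The percentage variation listed in Table 5.6 appears to be the ratio of standard deviation to mean. As a sketch under that assumption, the values for the proposed architecture-5 XOR2 gate can be recomputed as follows; the helper name `variation_pct` is hypothetical and the numbers are taken from the table.

```python
def variation_pct(mean: float, sigma: float) -> float:
    """Percentage variation, assumed here to be 100 * sigma / mean."""
    return 100.0 * sigma / mean


# (mean, sigma) pairs for the proposed architecture-5 XOR2 gate from Table 5.6.
arch5_xor2 = {
    "tau_pre (ps)": (27.6, 0.081),
    "tau_PHL (ps)": (33.3, 0.11),
    "P_dyn (uW)": (24.6, 0.3387),
}

for name, (mu, sigma) in arch5_xor2.items():
    print(f"{name}: {variation_pct(mu, sigma):.2f}%")
```

Under this assumption the delay entries reproduce the tabulated 0.29% and 0.33%, and the dynamic power entry evaluates to slightly under 1.4%, which the table lists as 1.3%.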



Fig. 5.19 Monte Carlo simulation results (500 runs) for proposed architecture-5 XOR2 a)  $\tau_{pre}$  b)  $\tau_{PHL}$  c) dynamic power

The process corner analysis at FF, FS, SF and SS for all the XOR2 designs is summarized in Fig. 5.20. It is observed that the proposed architecture-5 XOR2 gate operates correctly in all corners. Its evaluation delay is 29ps in the best case, which occurs at the FF corner, and 45ps in the worst case. The dynamic power consumption is higher in the best case, at $28.5\mu$W, and lower in the worst case, at $22.4\mu$W. For the precharge delay, the best case corresponds to the SF process corner with 19.2ps and the worst case corresponds to the FS process corner with 35.4ps.



(a)



Fig. 5.20 Process corner results of the XOR2 gate in all the architectures a)  $\tau_{pre}$  b)  $\tau_{PHL}$  c) dynamic power

To illustrate further the performance of proposed architecture-5, a full adder is designed and simulated in all the logic styles and the performance is noted. The block diagram of the full adder based on the proposed architecture-5 is shown in Fig. 5.21. The performance summary is shown in Table 5.7. It is observed that the full adder realized using proposed architecture-5 performs better in comparison to the other styles.



(a)



(b)

Fig. 5.21 Full adder based on proposed architecture-5 a) XOR3 b) Carry


| Function           | Parameters          | D-PFSCL | Proposed architecture-4 | Proposed architecture-5 |
|--------------------|---------------------|---------|-------------------------|-------------------------|
| Full adder (SUM)   | $\tau_{pre}$ (ps)   | 75.9    | 38.9                    | 26.7                    |
|                    | $\tau_{PHL}$ (ps)   | 361.1   | 126                     | 34.02                   |
|                    | $P_{dyn}$ ($\mu$W)  | 524.7   | 185.5                   | 26.74                   |
| Full adder (Carry) | $\tau_{pre}$ (ps)   | 45.7    | 46.5                    | 28.04                   |
|                    | $\tau_{PHL}$ (ps)   | 100.5   | 113.8                   | 33.12                   |
|                    | $P_{dyn}$ ($\mu$W)  | 210.66  | 276.96                  | 26.23                   |

Table 5.7 Performance Comparison of full adder based on existing D-PFSCL, proposed architecture-4 and proposed architecture-5

## 5.4 Conclusion

In this chapter, two new D-PFSCL architectures are presented. The proposed architecture-4 modifies the existing D-PFSCL gate architecture so that any complex two-input logic expression can be generated using a single gate, compared to the multistage implementation required for existing D-PFSCL. The behaviour of the proposed architecture-4 is analysed and the expression for dynamic power dissipation is derived. With the reduction in the number of gates and STBs needed to implement any complex two-input logic compared to conventional D-PFSCL, the advantage in delay and dynamic power consumption is apparent and is verified through simulations. It is observed that there is a maximum reduction in delay and power consumption of 51.8% and 38.2% respectively for the proposed architecture-4 MUX8 gate compared to existing D-PFSCL. In proposed architecture-5, a new way of realising dynamic gates in the PFSCL style is presented, which can easily implement different complex logic functions with fewer cascaded gates and STBs than existing D-PFSCL and proposed architecture-4. Further, the proposed architecture-5 uses a single DCS per gate, leading to lower dynamic power dissipation. Thus, the proposed architecture is an efficient method for the realisation of logic functions in terms of performance parameters. This is also observed through simulation results, where for the proposed architecture-5 based MUX8 gate, the maximum reduction in delay and power dissipation is 10.8% and 95.1% with respect to proposed architecture-4 MUX8, and 92.7% and 98.3% with respect to existing D-PFSCL MUX8. Comparing the delay and power consumption of MUX8 based on proposed architecture-4 and proposed architecture-5, it is observed that proposed architecture-5 performs better. For both proposed architecture-4 and proposed architecture-5, the effect of process variations has also been studied through Monte Carlo analysis and process corner analysis, from which it is ascertained that both the proposed architectures function correctly.

# Chapter 6 Subthreshold PFSCL gates

# 6.1 Introduction

In the previous chapters, various methods to reduce power dissipation were discussed so as to make the circuits more suitable for portable electronic applications. In this chapter, an alternate method for reduction in power dissipation is explored, wherein a specific class of circuits operating with a power supply lower than the threshold voltage is considered. These circuits, called subthreshold circuits, operate in the subthreshold region and provide a proportional reduction in power dissipation. Such circuits have very specific target applications that require operation at extremely low voltages with low power dissipation and that can tolerate a low speed of operation. Circuits that operate in the subthreshold region have been widely studied in logic styles such as CMOS, SCL etc. [81-100], but not in the PFSCL style. Given the advantages of the PFSCL style, it is expected that PFSCL gates operating in the subthreshold region, referred to as ST-PFSCL, will also offer certain unique advantages compared to their subthreshold CMOS (ST-CMOS) counterparts.

# 6.2 Proposed architecture-6

Proposed architecture-6 is based on ST-PFSCL. The operation of the ST-PFSCL gate is discussed first, followed by an analysis of its behaviour. The proposed architecture-6 gate uses a power supply $V_{DD}$ which is less than the threshold voltage of the MOS [101-102]. An ST-PFSCL inverter is shown in Fig. 6.1. It consists of a source coupled pair of transistors Md1-Mf, biased by a constant current $I_{SS}$. The constant current $I_{SS}$ is typically in the range of a few picoamperes to hundreds of nanoamperes. The transistor Md1 is driven by logic input A while the positive feedback drives transistor Mf using the output voltage Q. The constant current is generated by transistor Ms, which is biased by $V_{BN}$. The voltages $V_{BN}$ and $V_{BP}$ are generated by a replica bias circuit [104] that accurately controls the bias current $I_{SS}$ and correspondingly sets $V_{BP}$ for a given voltage swing.



Fig. 6.1 ST-PFSCL inverter

The PMOS Mr1, biased by an appropriate potential $V_{BP}$, acts as a load resistance $R_P$ generating the single ended output Q. The PMOS resistance $R_P$ is implemented as per [103], with a bulk-drain connected PMOS, as it is capable of generating resistance in the gigaohm (G$\Omega$) range without requiring large dimensions. This requirement for resistance in the G$\Omega$ range arises because the bias current is in the picoampere to nanoampere range with a voltage drop of a few hundred millivolts. The working of the ST-PFSCL inverter is as follows. The transistor pair Md1-Mf steers the bias current I<sub>SS</sub> through either of the branches depending on the input A, and this current is then converted into an equivalent output voltage by the PMOS resistance $R_P$. When input A is low, the bias current I<sub>SS</sub> does not flow through Md1 and hence the output Q remains at $V_{DD}$, which is the logic high level. In this case, the bias current is drained by Mf, whose input is at the high level, $V_{DD}$. When input A is high, the bias current I<sub>SS</sub> flows through Md1 and hence the output Q drops to $V_{DD}$-I<sub>SS</sub>R<sub>P</sub>, which is the logic low level. Thus, the functionality of the gate can be modeled as per (6.1), showing that the proposed architecture-6 based inverter functions as expected.

$$Q = \begin{cases} V_{DD} & \text{if } A = 0 \\ V_{DD} - I_{SS} R_P & \text{if } A = 1 \end{cases}$$
(6.1)
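The model in (6.1) can be expressed as a short function. This is a minimal sketch: the function name `st_pfscl_inverter` is hypothetical, and the example bias current of 1 nA and load resistance of 200 MΩ are illustrative values chosen only so that the drop I<sub>SS</sub>R<sub>P</sub> equals the 0.2 V swing used later in the simulations.

```python
def st_pfscl_inverter(a: int, vdd: float, i_ss: float, r_p: float) -> float:
    """Output voltage Q of the ST-PFSCL inverter as modelled by (6.1)."""
    if a == 0:
        # Bias current does not flow through Md1, so Q stays at VDD (logic high).
        return vdd
    # Bias current flows through Md1 and drops I_SS * R_P across the PMOS load.
    return vdd - i_ss * r_p


V_DD = 0.4      # supply voltage (V), as used in Section 6.2.2
I_SS = 1e-9     # illustrative bias current (A)
R_P = 200e6     # illustrative load resistance (ohm), giving a 0.2 V drop

for a in (0, 1):
    print(f"A={a} -> Q={st_pfscl_inverter(a, V_DD, I_SS, R_P):.2f} V")
```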

The behaviour of the proposed architecture-6 based inverter is modelled in terms of static and delay parameters to further understand its working.

#### 6.2.1 Analysis

To analyze the behaviour of ST-PFSCL gates, the behaviour of the MOS in subthreshold region is first discussed. This is followed by discussion on the parameters - the voltage swing  $V_{SWING}$ , the small signal voltage gain,  $A_V$  and the delay model, which were derived based on the working of the proposed architecture-6 based inverter.

The transistor drain current, I<sub>Dsub</sub>, is related to the gate and drain voltages as [104].

$$I_{\text{Dsub}} = I_0 \frac{W_{\text{N}}}{L_{\text{N}}} e^{\frac{(V_{\text{GS}} - V_{\text{T0}})}{\eta_{\text{n}} U_{\text{T}}}} \left( 1 - e^{\frac{-V_{\text{DS}}}{U_{\text{T}}}} \right)$$
(6.2)

where  $I_0$ , the specific current of the device is written as

$$I_0 = 2\eta_n \mu_n C_{ox} \frac{W_N}{L_N} U_T^2$$
(6.3)

 $\eta_n$  is the subthreshold slope factor for the NMOS,  $U_T = kT/q$  is the thermal voltage,  $W_N$  and  $L_N$  are the effective channel width and length of the device and other symbols have their usual meaning.

In saturation, $I_{Dsub}$ reduces to equation (6.3) and loses its dependence on the drain voltage, with the condition that $V_{DS} > 3\eta_n V_{TN}$.

$$I_{\text{Dsub}} = I_0 \frac{W_N}{L_N} e^{\frac{(V_{\text{GS}} - V_{\text{TN}})}{\eta_n U_T}}$$
(6.3)
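How quickly the drain-voltage dependence of (6.2) disappears can be checked numerically. The sketch below evaluates the factor $1 - e^{-V_{DS}/U_T}$ relative to the saturation form; the device parameters (`i0`, `w_over_l`, `v_tn`, `eta_n`) are hypothetical placeholders and not values extracted from the technology used in this work, and the thermal voltage is taken as roughly 25.85 mV near 300 K.

```python
import math

U_T = 0.02585  # thermal voltage kT/q near 300 K (V)


def i_dsub(v_gs, v_ds, i0=1e-7, w_over_l=2.0, v_tn=0.3, eta_n=1.3):
    """Subthreshold drain current in the form of (6.2), hypothetical parameters."""
    gate_term = math.exp((v_gs - v_tn) / (eta_n * U_T))
    drain_term = 1.0 - math.exp(-v_ds / U_T)
    return i0 * w_over_l * gate_term * drain_term


def i_dsub_sat(v_gs, i0=1e-7, w_over_l=2.0, v_tn=0.3, eta_n=1.3):
    """Saturation form of the current, with the drain-voltage term dropped."""
    return i0 * w_over_l * math.exp((v_gs - v_tn) / (eta_n * U_T))


v_gs = 0.2
for k in (1, 2, 3, 4, 5):
    ratio = i_dsub(v_gs, k * U_T) / i_dsub_sat(v_gs)
    print(f"V_DS = {k} U_T: I_Dsub / I_Dsub,sat = {ratio:.3f}")
```

The ratio depends only on $V_{DS}/U_T$, so the choice of the placeholder device parameters does not affect it; at $V_{DS} = 3U_T$ the current is already within about 5% of its saturation value.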


In the ST-PFSCL inverter, the transistor Ms operates in the saturation region while the transistors of the pair Md1-Mf operate in either the saturation or the cut-off region depending on the logic input. Therefore, the bias current $I_{SS}$ shown in Fig. 6.1 is equal to $I_{Dsub}$.

$$I_{SS} = I_0 \frac{W_N}{L_N} e^{\frac{(V_{GS} - V_{TN})}{\eta_n U_T}}$$
(6.4)

Further, looking at the flow of the bias current  $I_{SS}$  through transistor pair Md1-Mf, the potential at the output node Q can be expressed as in (6.5) for output high voltage  $V_{OH}$  and output low voltage  $V_{OL}$ .

$$V_{OH} = V_{DD}$$
 (6.5a)

$$V_{OL} = V_{DD} - \left( I_0 \frac{W_N}{L_N} e^{\frac{(V_{GS} - V_{TN})}{\eta_n U_T}} \right) R_P$$
(6.5b)

The difference between the logic high level and the logic low level is the voltage swing, $V_{SWING}$, given by (6.6) [104]. This value is generally set to a few hundred mV.

$$V_{SWING} = V_{OH} - V_{OL} = \left( I_0 \frac{W_N}{L_N} e^{\frac{(V_{GS} - V_{TN})}{\eta_n U_T}} \right) R_P$$
(6.6)
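Since the bracketed term in (6.6) is the bias current $I_{SS}$ of (6.4), the load resistance needed for a given swing is $R_P = V_{SWING}/I_{SS}$. The following sketch evaluates this for a 0.2 V swing over the bias current range mentioned earlier; the function name `load_resistance` and the three sample currents are illustrative choices.

```python
def load_resistance(v_swing: float, i_ss: float) -> float:
    """Load resistance required for a given swing, from V_SWING = I_SS * R_P."""
    return v_swing / i_ss


V_SWING = 0.2  # voltage swing (V)

# Sample bias currents spanning picoamperes to hundreds of nanoamperes.
for i_ss in (10e-12, 1e-9, 100e-9):
    r_p = load_resistance(V_SWING, i_ss)
    print(f"I_SS = {i_ss:.0e} A -> R_P = {r_p:.1e} ohm")
```

For a bias current of 10 pA the required resistance is 20 GΩ, which illustrates why a load capable of reaching the gigaohm range is needed at the low end of the bias current range.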

The load $R_P$ is a bulk-drain connected PMOS. Based on the current flowing through the PMOS and the voltage drop $V_{SWING}$ across it, which equals $V_{SD}$, its resistance is expressed as per [103] in (6.7).

$$R_{Psub} = \left(\frac{\eta_p U_T}{I_{SS}}\right) \left(\frac{e^{V_{SD}/U_T} - 1}{\left(\eta_p - 1\right)e^{V_{SD}/U_T} + 1}\right)$$
(6.7)

 $\eta_p$  is the subthreshold slope factor for the PMOS Mr1,  $U_T = kT/q$  is the thermal voltage,  $V_{SD}$  is the PMOS source-drain voltage and  $I_{SS}$  is the drain current.


Considering the small signal voltage gain,  $A_V$ , it is observed from the working of the ST-PFSCL inverter that  $A_V$  can be expressed as per [37] in (6.8).

$$A_{\rm V} = \frac{g_{\rm m,ST}R_{\rm P}}{1-g_{\rm m,ST}R_{\rm P}/2}$$
(6.8)

where  $g_{m,ST}$  is the transconductance of the NMOS transistor in the subthreshold region of operation. As per [104],  $g_{m,ST}$  is given by

$$g_{m,ST} = \frac{I_{SS}}{\eta_n U_T}$$
(6.9)

Using (6.9) and the expression for  $R_P$  as in (6.7) and simplifying, we get expression for  $A_V$  as

$$A_V = \frac{1}{\frac{\eta_n}{\eta_p} \left[ \frac{(\eta_p - 1)e^{V_{SD}/U_T} + 1}{e^{V_{SD}/U_T} - 1} \right] - \frac{1}{2}}$$
(6.10)

From (6.10), we observe that for ST-PFSCL, the small signal voltage gain depends on the subthreshold slope factor for PMOS and NMOS,  $\eta_p$  and  $\eta_n$  and not on the dimensions of the transistors. This is also supported by the simulations.
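The reason for this independence can be made explicit with one intermediate step. As a sketch, assuming the load resistance of (6.7) has the form $R_{Psub} = \frac{\eta_p U_T}{I_{SS}} \cdot \frac{e^{V_{SD}/U_T} - 1}{(\eta_p - 1)e^{V_{SD}/U_T} + 1}$, its product with the transconductance of (6.9) is

$$g_{m,ST} R_{Psub} = \frac{I_{SS}}{\eta_n U_T} \cdot \frac{\eta_p U_T}{I_{SS}} \cdot \frac{e^{V_{SD}/U_T} - 1}{(\eta_p - 1)e^{V_{SD}/U_T} + 1} = \frac{\eta_p}{\eta_n} \cdot \frac{e^{V_{SD}/U_T} - 1}{(\eta_p - 1)e^{V_{SD}/U_T} + 1}$$

The bias current $I_{SS}$, which is the only quantity carrying the transistor dimensions, cancels in this product. Since (6.8) depends on the circuit only through $g_{m,ST} R_P$, the gain is left as a function of $\eta_n$, $\eta_p$ and the ratio $V_{SD}/U_T$.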

The propagation delay of the proposed architecture-6 is derived on the basis of the behaviour of the ST-PFSCL inverter. The propagation delay depends on the contribution of the parasitic MOS capacitances at the output node and the load capacitance. The parasitic capacitance for the ST-PFSCL inverter (Fig. 6.1) is calculated by considering that input A is initially low. For a low-to-high transition on input A, the total capacitance at the output node is depicted in Fig. 6.2.



Fig. 6.2 Linear half circuit of ST-PFSCL inverter

The propagation delay $\tau_{PD,ST}$ can be expressed as

$$\tau_{\text{PD, ST}} = R_{\text{Psub}} C_{\text{out}}$$
(6.11a)

$$\tau_{\text{PD, ST}} = R_{\text{Psub}} (C_{\text{gd,d1}} + C_{\text{db,d1}} + C_{\text{dbr1}} + C_{\text{gdr1}} + C_{\text{gd,f}} + \frac{1}{2}C_{\text{gs,f}} + C_{\text{L}})$$
(6.11b)

where $C_{out}$ is the sum of the constituent parasitic capacitances and the load capacitance $C_L$. However, in the subthreshold region, since the dimensions and the voltages are both very low, the load capacitance dominates [104] and $\tau_{PD,ST}$ can be expressed as in (6.12).

$$\tau_{\text{PD, ST}} = C_L \frac{V_{\text{SWING}}}{I_{\text{SS}}}$$
(6.12)

In (6.12), the resistance $R_{Psub}$ has been replaced by $\frac{V_{SWING}}{I_{SS}}$, which shows the inverse dependence of the propagation delay on the bias current. Further, due to the constant current source, there is static power dissipation, $P_{D,ST}$, for ST-PFSCL, which is constant and is given by (6.13). Using (6.12) and (6.13), the power delay product (PDP<sub>ST</sub>) can be derived as in (6.14).

$$P_{D,ST} = V_{DD} I_{SS} = V_{DD} \left( I_0 \frac{W_N}{L_N} e^{\frac{(V_{GS} - V_{TN})}{\eta_n U_T}} \right)$$
(6.13)

$$PDP_{ST} = P_{D,ST} \, \tau_{PD,ST} = V_{DD} C_L V_{SWING}$$
(6.14)

From (6.14), it is seen that the $PDP_{ST}$ of an ST-PFSCL inverter depends only on $C_L$, $V_{SWING}$ and $V_{DD}$ and is independent of the bias current. Thus, it is nearly constant for a given $V_{DD}$, while the delay can be independently controlled as per requirement through I<sub>SS</sub>, allowing flexibility in the design for ultra low power applications.
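The trade-off described above can be illustrated numerically by multiplying the delay of (6.12) with the static power of (6.13) for several bias currents. The sketch below is illustrative only: the load capacitance of 10 fF is a hypothetical value, the supply and swing are those used in the simulations of Section 6.2.2, and the function names are not from the original work.

```python
C_L = 10e-15    # hypothetical load capacitance (F)
V_DD = 0.4      # supply voltage (V)
V_SWING = 0.2   # voltage swing (V)


def delay_st(i_ss, c_l=C_L, v_swing=V_SWING):
    """Propagation delay as per (6.12): C_L * V_SWING / I_SS."""
    return c_l * v_swing / i_ss


def power_st(i_ss, v_dd=V_DD):
    """Static power dissipation as per (6.13): V_DD * I_SS."""
    return v_dd * i_ss


for i_ss in (10e-12, 1e-9, 100e-9):
    tau = delay_st(i_ss)
    p = power_st(i_ss)
    print(f"I_SS = {i_ss:.0e} A: delay = {tau:.2e} s, "
          f"power = {p:.2e} W, PDP = {tau * p:.2e} J")
```

Across four decades of bias current the delay and power each change by four orders of magnitude in opposite directions, while their product stays at $V_{DD} C_L V_{SWING}$.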

The corresponding power dissipation and PDP for ST-CMOS are given in (6.15) and (6.16), where  $\alpha$  is the activity rate factor [104].

$$P_{D \text{ STCMOS}} = C_L V_{DD}^2 \left( 1 + \frac{2}{\alpha} e^{\frac{-V_{DD}}{\eta_n U_T}} \right) f_{op}$$
(6.15)

$$PDP_{\text{STCMOS}} = C_L V_{DD}^2 \left( 1 + \frac{2}{\alpha} e^{\frac{-V_{DD}}{\eta_n U_T}} \right)$$
(6.16)

From (6.15) and (6.16), it is seen that  $V_{DD}$  impacts delay,  $P_D$  and PDP for ST-CMOS, in which case any change in  $V_{DD}$  to reduce the  $P_D$  will automatically increase the delay and impact the PDP.
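The supply dependence of the ST-CMOS power delay product can be seen by evaluating (6.16) at a few supply voltages. This is a rough sketch under stated assumptions: the load capacitance, activity factor `alpha` and slope factor `eta_n` below are hypothetical placeholders, and the thermal voltage is taken as roughly 25.85 mV near 300 K.

```python
import math

U_T = 0.02585  # thermal voltage kT/q near 300 K (V)


def pdp_stcmos(c_l, v_dd, alpha, eta_n=1.3):
    """Power delay product of ST-CMOS in the form of (6.16)."""
    leakage_term = (2.0 / alpha) * math.exp(-v_dd / (eta_n * U_T))
    return c_l * v_dd ** 2 * (1.0 + leakage_term)


C_L = 10e-15   # hypothetical load capacitance (F)
ALPHA = 0.1    # hypothetical activity factor

for v_dd in (0.2, 0.3, 0.4, 0.5):
    print(f"V_DD = {v_dd:.1f} V -> PDP = {pdp_stcmos(C_L, v_dd, ALPHA):.2e} J")
```

With these placeholder values the $V_{DD}^2$ factor dominates, so the PDP grows steadily as the supply is raised, in contrast to the ST-PFSCL case where the bias current can be changed without affecting the PDP.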

#### 6.2.2 Simulations

To verify the behaviour of the proposed architecture-6 based inverter, simulation is carried out in PTM 90nm CMOS technology using a power supply and voltage swing of 0.4 V and 0.2 V respectively. The input-output waveform is shown in Fig. 6.3. It can be observed that when input A is at the low logic level, the output is high and vice versa, confirming the inverter functionality.



Fig. 6.3 Simulation waveform of ST-PFSCL inverter

To showcase the performance of ST-PFSCL, an XOR2 gate is designed, as it is an integral part of arithmetic circuits, error detection, random number generation etc. Fig. 6.4 shows the ST-PFSCL XOR2 gate, where three ST-PFSCL NOR2 gates are used to generate the output and $V_{BN}$ and $V_{BP}$ are the bias voltages generated by the replica bias circuit to set I<sub>SS</sub> and the PMOS resistance respectively as required. The ST-PFSCL XOR2 gate is designed for $V_{DD}$=0.4V with voltage swing $V_{SWING}$=0.2V for varying I<sub>SS</sub>, ranging from a few picoamperes to hundreds of nanoamperes. The delay $\tau_{PD, ST}$ versus I<sub>SS</sub> is plotted in Fig. 6.5, from which it is observed that $\tau_{PD, ST}$ is inversely proportional to I<sub>SS</sub>, as given in (6.12).







(b)

Fig. 6.4 ST-PFSCL a) NOR2 gate b) XOR2 gate



Fig. 6.5 Delay versus  $I_{SS}$  of the ST-PFSCL XOR2 gate



Fig. 6.6 PDP versus delay of the ST-PFSCL XOR2 gate

The delay curves for $V_{DD} = 0.3V$ and 0.5V have also been plotted in Fig. 6.5 and it is observed that the delay is not dependent on $V_{DD}$. The ST-PFSCL XOR2 gate PDP versus delay is plotted in Fig. 6.6. It is seen that for a particular $V_{DD}$, the PDP remains constant while allowing wide variation in the delay through control of I<sub>SS</sub>, which is as per equations (6.12) and (6.13).

The behaviour of the proposed architecture-6 based XOR2 gate under the effect of process variations is studied through Monte Carlo simulations and the results for 500 simulation runs are presented in Fig. 6.7. It is observed that the proposed architecture-6 based XOR2 gate shows a maximum variation of 29.8% for delay and 15.4% for $V_{SWING}$.



Fig. 6.7 Monte Carlo results for ST-PFSCL XOR2 a) Delay b) Voltage swing

The process corner analysis at FF, FS, SF and SS for ST-PFSCL XOR2 gate is carried out and the effect on delay and voltage swing is plotted at different design corners and different temperatures in Fig. 6.8. It is observed that the SS process corner and  $T=0^{\circ}C$  gives the highest delay while the FF process corner with  $T=125^{\circ}C$  leads to the lowest delay. Also the FS process corner leads to highest voltage swing for  $T=0^{\circ}C$  while the SF process corner leads to lowest voltage swing for  $T=125^{\circ}C$ . The proposed architecture-6 XOR2 gate functions correctly under various process corners. The behaviour with respect to variation in temperature is also as expected.



Fig. 6.8 Process corner results of the ST-PFSCL XOR2 (a) Delay (b) V<sub>SWING</sub>

As an application, a divide-by-8 circuit is implemented in ST-PFSCL as shown in Fig. 6.9 and the frequency of operation versus power dissipation is plotted in Fig. 6.10, with the curve for ST-CMOS also shown for comparison. Further, the PDP versus delay has also been plotted, as shown in Fig. 6.11. The simulation results for ST-PFSCL are presented for  $V_{DD} = 0.4V$  and  $V_{SWING}=0.2V$  for varying I<sub>SS</sub> while in the case of ST-CMOS, the  $V_{DD}$  is varied from 0.2V to 0.8V.





(a)



Fig. 6.9 a) ST-PFSCL D latch gate b) ST-PFSCL divide-by-8 circuit



Fig. 6.10 Frequency of operation versus power dissipation of the divide-by-8 circuit



Fig. 6.11 PDP versus delay of the divide-by-8 circuit

Thus, as the bias current increases, the frequency of operation, $f_{op,ST}$, and $P_{D,ST}$ both increase for ST-PFSCL, as shown in (6.12) and (6.13). However, from Fig. 6.10, it is further noted that for the ST-PFSCL circuit, $f_{op,ST}$ is an order of magnitude higher than for the ST-CMOS circuit at a given $P_{D,ST}$, and for a given $f_{op,ST}$, $P_{D,ST}$ is an order of magnitude lower than for ST-CMOS. From Fig. 6.11, it is seen that the PDP<sub>ST</sub> for ST-PFSCL remains nearly constant at 0.4pJ as the bias current $I_{SS}$ changes from a few picoamperes to hundreds of nanoamperes, as given in (6.14). However, for ST-CMOS, the PDP<sub>ST-CMOS</sub> changes with $V_{DD}$ as per (6.16), varying from 3pJ at $V_{DD}$ = 0.2V to 18pJ at $V_{DD}$ = 0.5V. This shows that for ST-CMOS the only variable parameter is $V_{DD}$, through which design conditions on delay, $P_{D \text{ ST-CMOS}}$ and PDP <sub>ST-CMOS</sub> have to be met.

### 6.3 Conclusion

The implementation of digital logic in the subthreshold region for ultra-low power applications is explored for the PFSCL style, and circuits such as the ST-PFSCL XOR2 gate and divide-by-8 circuit are simulated. From the simulations, it is observed that for ST-PFSCL, the delay can be varied independently of the supply voltage by varying the bias current over a wide range, leading to nearly constant PDP versus operating frequency. For example, the PDP of the ST-PFSCL based divide-by-8 circuit is around 0.4pJ for $V_{DD}$= 0.4V, independent of the bias current, in comparison to the ST-CMOS based divide-by-8 circuit with a PDP of 8.52pJ at $V_{DD}$= 0.4V, which also varies with the power supply. The results indicate that ST-PFSCL based gates add flexibility to the design of ultra-low power applications compared to ST-CMOS, while also being power efficient. This is in contrast to ST-CMOS, where the supply voltage impacts both the delay and the PDP. Thus, the use of ST-PFSCL to implement ultra-low power applications is beneficial as it provides two important variables, namely the bias current and the power supply, through which the behaviour of the gate can be designed to satisfy the design conditions.

# Chapter 7 Conclusion

This chapter provides a final summary of the work done throughout the thesis and summarizes avenues of potential future work that could be built upon the base line principles established here.

## 7.1 Concluding Remarks

This thesis examines PFSCL architectures so as to be able to implement complex logic with reduced delay, power and area. Chapter 2 investigates basic PFSCL operation, provides an overview of the logic style and includes an examination of the performance characteristics of basic PFSCL gates. The analysis and design of the PFSCL fundamental cell (PFSCL FC) based gate is also detailed to set the background for further elaboration on the topic. The dynamic PFSCL (D-PFSCL) is yet another available style that is described in this chapter and dealt with in this work.

In the third chapter, multithreshold PFSCL architectures are proposed that introduce low threshold voltage transistors in the PFSCL FC i) in the PDN in order to reduce the footprint and ii) in the constant current source to lower the minimum power supply and hence the power dissipation. The proposed architecture-1, where the PDN is modified, is analysed and its static model for the XOR2 gate, including small signal voltage gain and noise margin, and its delay model are derived so that the behaviour can be predicted. It is also confirmed from the simulation results that the proposed architecture-1 based full adder leads to an area reduction of 66% while maintaining the same power and delay performance with respect to the existing PFSCL FC based full adder. The proposed architecture-2, where the constant current source is modified, is analysed with respect to its impact on the reduction in power supply. The expressions for the small signal voltage gain, noise margin and delay for proposed architecture-2 remain the same as for conventional PFSCL. From simulation results it is observed that proposed architecture-2 leads to a power saving of 18.18% and a PDP reduction of 8% with respect to the existing PFSCL FC based XOR2 gate.

In chapter four, the existing PFSCL FC, which can implement two input logic in a single stage, is modified to increase the fan-in and thus enable the implementation of complex three input logic in a single stage, leading to a reduction in delay and power dissipation. The proposed architecture-3 introduces an additional transistor in the central branch of the triple-tail cell, and the resulting structure is named the Quadtail cell. The behaviour of proposed architecture-3 is captured in terms of the output voltage levels, small signal voltage gain, noise margin and delay. Complex three input logic functions such as XOR3 and carry are implemented in a single stage based on proposed architecture-3. In comparison to the existing PFSCL FC, a reduction in delay, power dissipation and PDP of 4.8%, 50% and 52% is observed for XOR3, and of 12%, 80% and 82% for carry, respectively. Hence, the use of proposed architecture-3 can lead to efficient PFSCL circuit design.
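The reported PDP figures can be cross-checked against the reported delay and power figures. The following is a minimal sketch, assuming that PDP is simply the product of delay and power dissipation and that all three percentages refer to the same operating point; the function name `pdp_reduction` is introduced here only for illustration.

```python
def pdp_reduction(delay_reduction_pct, power_reduction_pct):
    """Percentage PDP reduction implied by separate delay and power reductions.

    Assumes PDP = delay * power, so the remaining fractions multiply.
    """
    delay_ratio = 1.0 - delay_reduction_pct / 100.0
    power_ratio = 1.0 - power_reduction_pct / 100.0
    return (1.0 - delay_ratio * power_ratio) * 100.0


# Delay and power reductions quoted for proposed architecture-3
xor3 = pdp_reduction(4.8, 50.0)
carry = pdp_reduction(12.0, 80.0)

print(f"XOR3 PDP reduction:  {xor3:.1f}%")   # 52.4%
print(f"Carry PDP reduction: {carry:.1f}%")  # 82.4%
```

Under this assumption the implied PDP reductions are about 52.4% and 82.4%, which agree with the quoted values of 52% and 82%.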

To mitigate the static power consumption of PFSCL circuits, D-PFSCL circuits are worked upon in Chapter 5. The existing D-PFSCL provides an advantage in terms of reduced static power; however, it needs multiple stages to implement even two input complex logic such as XOR2. In proposed architecture-4, the PDN of the existing D-PFSCL is modified by adding a transistor between the power supply and the common source node so that expressions needing AND-OR functionality can be implemented. The proposed architecture-4 is analysed, and the design of the capacitor is discussed, followed by the expression for the dynamic power. Multiple gates based on proposed architecture-4 are simulated, and it is observed that such implementation leads to a reduction in delay and power consumption compared to the existing D-PFSCL. For the proposed architecture-4 MUX8, the reduction in delay and power consumption is 51.8% and 38.2% respectively compared to the existing D-PFSCL.

Another approach is also suggested to embed AND-OR functionality. It introduces transmission gates to D-PFSCL and results in proposed architecture-5. The design of the capacitor and the expression for dynamic power are included, followed by an illustration of the usability and advantages of proposed architecture-5 through simulation of gates such as XOR3, MUX8 and full adder. For example, the simulation results show that for the proposed architecture-5 based MUX8 gate, the maximum reduction in delay and power dissipation is 10.8% and 95.1% with respect to the proposed architecture-4 MUX8, and 92.7% and 98.3% with respect to the existing D-PFSCL MUX8. Comparing the delay and power consumption of the MUX8 based on proposed architecture-4 and proposed architecture-5, it is observed that proposed architecture-5 performs better.

In chapter six, the behaviour of circuits implemented using the PFSCL style operating in the subthreshold region is explored as proposed architecture-6. Using the expression for bias current in the subthreshold region, the basic principles for the design of PFSCL in the subthreshold region are identified, and an XOR2 gate and a divide-by-8 circuit are designed using them. Through simulations, the frequency of operation and the power dissipation are also analysed, and trends are noted that will help in the design of ST-PFSCL circuits for specific ultra-low power applications. It is also noted that circuits implemented using the PFSCL style in the subthreshold region offer benefits compared to CMOS in the subthreshold region, in that they add flexibility to the design of ultra-low power applications by providing two variables, namely the bias current and the power supply, through which the behaviour of the gate can be designed to satisfy the design conditions.
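The observation that the PDP stays nearly constant as the bias current is varied can be illustrated with a first-order model of a constant-current source coupled gate. This is a sketch under stated assumptions rather than the model derived in this thesis: the delay is taken as $\ln 2 \cdot V_{SW} C_L / I_{SS}$ and the static power as $V_{DD} I_{SS}$, and the output swing `v_sw` and load capacitance `c_load` are hypothetical values chosen only for illustration.

```python
import math


def st_scl_metrics(i_ss, v_dd, v_sw=0.2, c_load=10e-15):
    """First-order delay, power and PDP of a constant-current SCL gate.

    i_ss   : bias (tail) current in A
    v_dd   : supply voltage in V
    v_sw   : output voltage swing in V (hypothetical value)
    c_load : load capacitance in F (hypothetical value)
    """
    delay = math.log(2) * v_sw * c_load / i_ss  # delay scales as 1 / i_ss
    power = v_dd * i_ss                         # static power scales as i_ss
    return delay, power, delay * power          # i_ss cancels in the product


# Sweep the bias current over two decades at a fixed supply voltage
for i_ss in (1e-9, 10e-9, 100e-9):
    delay, power, pdp = st_scl_metrics(i_ss, v_dd=0.4)
    print(f"I_SS = {i_ss:.0e} A  delay = {delay:.3e} s  "
          f"power = {power:.3e} W  PDP = {pdp:.3e} J")
```

In this simplified model the bias current sets the speed while the product of delay and power depends only on the supply voltage, the swing and the load, which is consistent with the trend noted above. The absolute values printed here are not comparable with the simulated divide-by-8 results.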

## 7.2 Avenues for Future Work

Source coupled logic is an efficient way to implement digital circuits used in mixed-signal applications. It has the advantage of eliminating or lowering the switching noise generated by its CMOS based counterparts, and its power consumption is lower at higher frequencies.

In this work, the fan-in capability of PFSCL gates is increased and their power consumption is reduced. Further, the impact of using multithreshold transistors in the PFSCL style is explored.

Some of the avenues for future work that can be taken up are:

- 1. The proposed architectures may be used to develop applications such as ring oscillators, high performance arithmetic circuits, digital filters etc.
- 2. The PFSCL circuits may further be designed to accommodate higher fan-in in a single gate for improving performance in terms of power consumption and delay.
- 3. Carbon nanotube FETs (CNTFETs) and multigate devices such as fin FETs (FinFETs) have emerged as alternatives to conventional CMOS technology due to their good scaling ability, high ON current, reduced threshold voltage variations, better subthreshold slope and better short-channel behaviour. The PFSCL circuits may be designed and developed using these technologies.
- 4. The work on ST-PFSCL may be extended to explore architectures that need fewer gates for implementation and to develop circuits for biomedical applications.

# References

- [1] G. Yeap, *Practical Low Power Digital VLSI Design*. Springer US, 1998.
- [2] I. Fujimori, K. Koyama, D. Trager, F. Tam, and L. Longo, "A 5-V single-chip deltasigma audio A/D converter with 111 dB dynamic range," *IEEE J. Solid-State Circuits*, vol. 32, no. 3, pp. 329–336, Mar. 1997.
- [3] S. A. Jantzi, K. W. Martin, and A. S. Sedra, "Quadrature bandpass ΔΣ modulation for digital radio," *IEEE J. Solid-State Circuits*, vol. 32, no. 12, pp. 1935–1949, Dec. 1997.
- [4] J. Sneep and P. J. A. Naus, "A Bit-Stream Digital-to-Analog Converter with 18-b Resolution," *IEEE J. Solid-State Circuits*, vol. 26, no. 12, pp. 1757–1763, 1991.
- [5] H. A. Leopold, G. Winkler, P. O'Leary, K. Ilzer, and J. Jernej, "A Monolithic Cmos
   20-B Analog-To-Digital Converter," *IEEE J. Solid-State Circuits*, vol. 26, no. 7, pp.
   910–916, 1991.
- [6] B. P. Del Signore, D. A. Kerth, N. S. Sooch, and E. J. Swanson, "A Monolithic 20-b Delta-Sigma A/D Converter," *IEEE J. Solid-State Circuits*, vol. 25, no. 6, pp. 1311–1317, 1990.
- [7] N. H. E. Weste and D. M. Harris, *CMOS VLSI design : A circuits and systems perspective*, 4th ed. Pearson Education India.
- [8] D. J. Allstot, San-Hwa Chee, S. Kiaei, and M. Shrivastawa, "Folded source-coupled logic vs. CMOS static logic for low-noise mixed-signal ICs," *IEEE Trans. Circuits Syst. I Fundam. Theory Appl.*, vol. 40, no. 9, pp. 553–563, 1993.
- [9] D. K. Su, M. J. Loinaz, S. Masui, and B. A. Wooley, "Experimental results and modeling techniques for substrate noise in mixed-signal integrated circuits," *IEEE J. Solid-State Circuits*, vol. 28, no. 4, pp. 420–430, Apr. 1993.

- S. Masui, "Simulation of substrate coupling in mixed-signal MOS circuits," in 1992
   Symposium on VLSI Circuits Digest of Technical Papers, 1992, pp. 42–43.
- [11] B. R. Stanisic, N. K. Verghese, R. A. Rutenbar, L. R. Carley, and D. J. Allstot, "Addressing substrate coupling in mixed-mode ICs: simulation and power distribution synthesis," *IEEE J. Solid-State Circuits*, vol. 29, no. 3, pp. 226–238, Mar. 1994.
- [12] C. S. Choy, C. F. Chan, and M. H. Ku, "Feedback control circuit design technique to suppress power noise in high speed output driver," *Proc. - IEEE Int. Symp. Circuits Syst.*, vol. 1, pp. 307–310, 1995.
- [13] C. S. Choy, C. F. Chan, M. H. Ku, and J. Povazanec, "Design procedure of low-noise high-speed adaptive output drivers," *Proc. - IEEE Int. Symp. Circuits Syst.*, vol. 3, pp. 1796–1799, 1997.
- [14] S. Kiaei, S. H. Chee, and D. Allstot, "CMOS source-coupled logic for mixed-mode VLSI," in *Proceedings IEEE International Symposium on Circuits and Systems*, 1990, vol. 2, pp. 1608–1611.
- [15] S. Kiaei and D. J. Allstot, "Low-noise logic for mixed-mode VLSI circuits," *Microelectronics J.*, vol. 23, no. 2, pp. 103–114, Apr. 1992.
- [16] R. T. L. Saez, M. Kayal, M. Declercq, and M. C. Schneider, "Digital circuit techniques for mixed analog/digital circuits applications," *Proc. IEEE Int. Conf. Electron. Circuits, Syst.*, vol. 2, pp. 956–959, 1996.
- [17] R. Senthinathan and J. L. Prince, "Application Specific CMOS Output Driver Circuit Design Techniques to Reduce Simultaneous Switching Noise," *IEEE J. Solid-State Circuits*, vol. 28, no. 12, pp. 1383–1388, 1993.
- [18] J. Kundan and S. M. R. Hasan, "Current mode BiCMOS folded source-coupled logic circuits," Proc. - IEEE Int. Symp. Circuits Syst., vol. 3, pp. 1880–1883, 1997.

- [19] X. Bai and M. Kameyama, "Low-power multiple-valued source-coupled logic circuits using dual-supply voltages for a reconfigurable VLSI," in *Proceedings of The International Symposium on Multiple-Valued Logic*, 2013, pp. 164–169.
- [20] R. T. L. Saez, M. Kayal, M. Declercq, and M. C. Schneider, "Design guidelines for CMOS current steering logic," *Proc. - IEEE Int. Symp. Circuits Syst.*, vol. 3, pp. 1872–1875, 1997.
- [21] E. Albuquerque, J. Fernandes, and M. Silva, "NMOS current-balanced logic," *Electron. Lett.*, vol. 32, no. 11, pp. 997–998, May 1996.
- [22] L. Yang and J. S. Yuan, "Enhanced techniques for current balanced logic in mixedsignal ICs," *Proc. IEEE Comput. Soc. Annu. Symp. VLSI, ISVLSI*, vol. 2003-January, pp. 278–279, 2003.
- [23] E. F. M. Albuquerque and M. M. Silva, "An experimental comparison of substrate noise generated by CMOS and by low-noise digital circuits," *Proc. - IEEE Int. Symp. Circuits Syst.*, vol. 2, 2004.
- [24] P. Saxena, S. K. M, and C. V. B, "Design of a Novel Current Balanced Voltage Controlled Delay Element," *Int. J. VLSI Des. Commun. Syst.*, vol. 5, no. 3, pp. 37–45, Jun. 2014.
- [25] H. T. Ng and D. J. Allstot, "CMOS current steering logic for low-voltage mixedsignal integrated circuits," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 5, no. 3, pp. 301–308, 1997.
- [26] S. Radiom, B. Sheikholeslami, H. Aminzadeh, and R. Lotfi, "Folded-current-steering DAC: An approach to low-voltage high-speed high-resolution D/A converters," *Proc. IEEE Int. Symp. Circuits Syst.*, pp. 4783–4786, 2006.

- [27] D. Y. Jeong, S. H. Chai, W. C. Song, and G. H. Cho, "CMOS current-controlled oscillators using multiple-feedback-loop ring architectures," *Dig. Tech. Pap. - IEEE Int. Solid-State Circuits Conf.*, vol. 40, pp. 386–387, Feb. 1997.
- [28] S. R. Maskai, S. Kiaei, and D. J. Allstot, "Synthesis Techniques for CMOS Folded Source-Coupled Logic Circuits," *IEEE J. Solid-State Circuits*, vol. 27, no. 8, pp. 1157–1167, 1992.
- [29] J. Kundan and S. M. Rezaul Hasan, "Enhanced folded source-coupled logic technique for low-voltage mixed-signal integrated circuits," *IEEE Trans. Circuits Syst. II Analog Digit. Signal Process.*, vol. 47, no. 8, pp. 810–817, Aug. 2000.
- [30] M. Maleki and S. Kiaei, "Enhancement Source-Coupled Logic for Mixed-Mode VLSI Circuits," *IEEE Trans. Circuits Syst. II Analog Digit. Signal Process.*, vol. 39, no. 6, pp. 399–402, 1992.
- [31] M. Yamashina and H. Yamada, "An MOS Current Mode Logic (MCML) Circuit for Low-Power Sub-GHz Processors," IEICE Trans. Electron., vol. E75-C, no. 10, pp. 1181–1187, 1992.
- [32] M. Yamashina *et al.*, "Low-supply voltage GHz MOS integrated circuit for mobile computing systems," *IEEE Symp. Low Power Electron.*, pp. 80–81, 1994.
- [33] M. Mizuno et al., "A GHz MOS adaptive pipeline technique using MOS currentmode logic," IEEE J. Solid-State Circuits, vol. 31, no. 6, pp. 784–790, Jun. 1996.
- [34] J. M. Musicer and J. Rabaey, "MOS current mode logic for low power, low noise CORDIC computation in mixed-signal environments," in *Proceedings of the 2000 international symposium on Low power electronics and design - ISLPED '00*, 2000, pp. 102–107.
- [35] M. Alioto and G. Palumbo, *Model and design of bipolar and MOS current-mode logic : CML, ECL and SCL digital circuits.* Springer, 2005.

- [36] O. Musa and M. Shams, "An efficient delay model for MOS current-mode logic automated design and optimization," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 57, no. 8, pp. 2041–2052, 2010.
- [37] M. Alioto, L. Pancioni, S. Rocchi, and V. Vignoli, "Modeling and Evaluation of Positive-Feedback Source-Coupled Logic," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 51, no. 12, pp. 2345–2355, Dec. 2004.
- [38] A. H. Ismail, M. Sharifkhani, and M. I. Elmasry, "On the design of low power MCML based ring oscillators," in *Canadian Conference on Electrical and Computer Engineering*, 2004, vol. 4, pp. 2383–2386.
- [39] M. Alioto and G. Palumbo, "Power-delay optimization of D-latch/MUX source coupled logic gates," *Int. J. Circuit Theory Appl.*, vol. 33, no. 1, pp. 65–86, Jan. 2005.
- [40] A. Tanabe *et al.*, "0.18-μm CMOS 10-Gb/s multiplexer/demultiplexer ICs using current mode logic with tolerance to threshold voltage fluctuation," *IEEE J. Solid-State Circuits*, vol. 36, no. 6, pp. 988–996, Jun. 2001.
- [41] P. Heydari and R. Mohavavelu, "Design of ultra high-speed CMOS CML buffers and latches," Proc. - IEEE Int. Symp. Circuits Syst., vol. 2, 2003.
- [42] M. Alioto, R. Mita, and G. Palumbo, "Design of High-Speed Power-Efficient MOS Current-Mode Logic Frequency Dividers," *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 53, no. 11, pp. 1165–1169, Nov. 2006.
- [43] K. Gupta, Radhika, N. Pandey, and M. Gupta, "A novel high speed MCML square root carry select adder for mixed-signal applications," in *IMPACT 2013 - Proceedings* of the International Conference on Multimedia Signal Processing and Communication Technologies, 2013, pp. 194–197.

- [44] B. Liang, K. Ma, Z. Ding, and X. Fu, "The structure design of MOS current mode logic adder," 2012 Int. Conf. Microw. Millim. Wave Technol. ICMMT 2012 Proc., vol. 4, pp. 1396–1399, 2012.
- [45] M. Alioto and Y. Leblebici, "Analysis and design of ultra-low power subthreshold MCML gates," *Proc. - IEEE Int. Symp. Circuits Syst.*, pp. 2557–2560, 2009.
- [46] S. Badel and Y. Leblebici, "Tri-state buffer/bus driver circuits in MOS current-mode logic," *Proc. 2007 Ph.D Res. Microelectron. Electron. Conf. PRIME 2007*, pp. 237–240, 2007.
- [47] K. Gupta, N. Pandey, and M. Gupta, "Low-power tri-state buffer in MOS current mode logic," *Analog Integr. Circuits Signal Process. 2013 751*, vol. 75, no. 1, pp. 157–160, Jan. 2013.
- [48] H. Hassan, M. Anis, and M. Elmasry, "Analysis and design of low-power multithreshold MCML," in *Proceedings - IEEE International SOC Conference*, 2004, pp. 25–29.
- [49] M. Anis, M.Elmasry, *Multi-Threshold CMOS Digital Circuits*. Springer 2003.
- [50] A. Tajalli, E. Vittoz, Y. Leblebici, and E. J. Brauer, "Ultra low power subthreshold MOS current mode logic circuits using a novel load device concept," in *ESSCIRC* 2007 - Proceedings of the 33rd European Solid-State Circuits Conference, 2007, pp. 304–307.
- [51] G. Scotti, D. Bellizia, A. Trifiletti, and G. Palumbo, "Design of Low-Voltage High-Speed CML D-Latches in Nanometer CMOS Technologies," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 25, no. 12, pp. 3509–3520, Dec. 2017.
- [52] M. H. Anis and M. I. Elmasry, "Self-timed MOS current mode logic for digital applications," *Proc. IEEE Int. Symp. Circuits Syst.*, vol. 5, 2002.

- [53] N. Kalantari and M. M. Green, "All-CMOS high-speed CML gates with active shuntpeaking," *Proc. - IEEE Int. Symp. Circuits Syst.*, pp. 2554–2557, 2007.
- [54] S. Badel and Y. Leblebici, "An inductorless peaking technique applied to MOS current-mode logic gates," *Proc. - Norchip*, pp. 36–39, 2004.
- [55] J. B. Kim, "Low-power MCML circuit with sleep-transistor," ASICON 2009 Proc.
  2009 8th IEEE Int. Conf. ASIC, pp. 25–28, 2009.
- [56] M. W. Allam and M. I. Elmasry, "Dynamic current mode logic (DyCML): a new lowpower high-performance logic style," *IEEE J. Solid-State Circuits*, vol. 36, no. 3, pp. 550–558, Mar. 2001.
- [57] K. Gupta, N. Pandey, and M. Gupta, "Analysis and design of MOS current mode logic exclusive-OR gate using triple-tail cells," *Microelectronics J.*, vol. 44, no. 6, pp. 561–567, Jun. 2013.
- [58] N. Pandey, K. Gupta, and B. Choudhary, "New Proposal for MCML Based Three-Input Logic Implementation," VLSI Des., vol. 2016, 2016.
- [59] N. Saxena, S. Dutta, N. Pandey, and K. Gupta, "Implementation and Performance Comparison of a Four-Bit Ripple-Carry Adder Using Different MOS Current Mode Logic Topologies," in *ICCSA 2017 Lecture Notes in Computer Science*, 2017, vol. 10409 LNCS, pp. 299–313.
- [60] M. Alioto, A. Fort, L. Pancioni, S. Rocchi, and V. Vignoli, "Positive-Feedback Source-Coupled Logic: a delay model," in 2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No.04CH37512), pp. II-641–4.
- [61] M. Alioto, A. Fort, L. Pancioni, S. Rocchi, and V. Vignoli, "An approach to the design of PFSCL gates," in *Proceedings - IEEE International Symposium on Circuits* and Systems, 2005, pp. 2437–2440.

- [62] M. Alioto, L. Pancioni, S. Rocchi, and V. Vignoli, "Power–Delay–Area–Noise Margin Tradeoffs in Positive-Feedback MOS Current-Mode Logic," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 54, no. 9, pp. 1916–1928, Sep. 2007.
- [63] K. Gupta, R. Sridhar, J. Chaudhary, N. Pandey, and M. Gupta, "Performance comparison of MCML and PFSCL gates in 0.18 m CMOS technology," in 2011 2nd International Conference on Computer and Communication Technology, ICCCT-2011, 2011, pp. 230–233.
- [64] M. Alioto, L. Pancioni, S. Rocchi, and V. Vignoli, "Exploiting Hysteresys in MCML Circuits," *IEEE Trans. Circuits Syst. II Express Briefs*, vol. 53, no. 11, pp. 1170–1174, 2006.
- [65] K. Gupta, N. Pandey, and M. Gupta, "Performance improvement of PFSCL gates through capacitive coupling," in *IMPACT 2013 - Proceedings of the International Conference on Multimedia Signal Processing and Communication Technologies*, 2013, pp. 185–188.
- [66] N. Pandey, K. Gupta, and M. Gupta, "An efficient triple-tail cell based PFSCL D latch," *Microelectronics J.*, vol. 45, no. 8, pp. 1001–1007, Aug. 2014.
- [67] K. Gupta, N. Pandey, and M. Gupta, "Dynamic positive feedback source-coupled logic (D-PFSCL)," *Int. J. Electron.*, vol. 103, no. 10, pp. 1626–1638, Oct. 2016.
- [68] Kirti Gupta, Neeta Pandey, Maneesha Gupta, Model and Design of Improved Current Mode Logic Gates: Differential and Single-ended. Springer 2020.
- [69] N. Pandey, M. Gupta, and K. Gupta, "A PFSCL based configurable logic block," in 2015 Annual IEEE India Conference (INDICON), 2015, pp. 1–4.
- [70] K. Gupta, P. Shukla, and N. Pandey, "On the implementation of PFSCL adders," in 2016 Second International Innovative Applications of Computational Intelligence on

Power, Energy and Controls with their Impact on Humanity (CIPECH), 2016, pp. 287–291.

- [71] K. Gupta, U. Mittal, R. Baghla, P. Shukla, and N. Pandey, "On the implementation of PFSCL serializer," in 2016 3rd International Conference on Signal Processing and Integrated Networks (SPIN), 2016, pp. 436–440.
- [72] K. Gupta, U. Mittal, R. Baghla, and N. Pandey, "Implementation of PFSCL demultiplexer," in 2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT), 2016, pp. 490–494.
- [73] A. Tyagi, N. Pandey, and K. Gupta, "PFSCL based Linear Feedback Shift Register," in 2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT), 2016, pp. 580–585
- [74] R. K. Agrawal, N. Pandey, and K. Gupta, "Implementation of PFSCL razor flipflop," in 2017 International Conference on Computing Methodologies and Communication (ICCMC), 2017, pp. 6–11.
- [75] N. Pandey, B. Choudhary, K. Gupta, and A. Mittal, "Bus implementation using new low power PFSCL tristate buffers," *Act. Passiv. Electron. Components*, vol. 2016, 2016.
- [76] N. Pandey, B. Choudhary, K. Gupta, and A. Mittal, "New Sleep-Based PFSCL Tri-State Inverter/Buffer Topologies," *J. Circuits, Syst. Comput.*, vol. 26, no. 12, Dec. 2017.
- [77] S. Naman, D. Shruti, and P. Neeta, "An efficient hybrid pfscl based implementation of asynchronous pipeline," *i-manager's J. Circuits Syst.*, vol. 4, no. 3, p. 6, 2016.
- [78] F. Besharati, A. Golmakani, and S. Babayan-Mashhadi, "Design of a LP Currentmode Comparator Based on Positive-Feedback Source Coupled Logic," in *Electrical Engineering (ICEE), Iranian Conference on*, 2018, pp. 224–227.

- [79] N. Pandey and A. Tyagi, "A Modified Configurable Cell for Complex Function Realization in PFSCL Style," J. Multi Discip. Eng. Technol., vol. 13, no. 2, pp. 63–70, 2019.
- [80] Yuhua Cheng, Chenming Hu, Mosfet Modeling & BSIM3 User's Guide. Springer 2002.
- [81] S.-M. Kang, Y. Leblebici, and C. Kim, *CMOS digital integrated circuits : Analysis and Design*, 4th ed. McGraw-Hill Higher Education, 2014.
- [82] A. Tajalli, E. J. Brauer, and Y. Leblebici, "Ultra-low power 32-bit pipelined adder using subthreshold source-coupled logic with 5 fJ/stage PDP," *Microelectronics J.*, vol. 40, no. 6, pp. 973–978, Jun. 2009.
- [83] A. Tajalli and Y. Leblebici, "Subthreshold current-mode oscillator-based quantizer with 3-decade scalable sampling rate and pico-Ampere range resolution," in 2010 Proceedings of ESSCIRC, 2010, pp. 174–177.
- [84] A. Tajalli and Y. Leblebici, "Ultra-low power mixed-signal design platform using subthreshold source-coupled circuits," in *Proceedings of the Conference on Design*, *Automation and Test in Europe*, 2010, pp. 711–716.
- [85] A. Tajalli, E. J. Brauer, Y. Leblebici, and E. Vittoz, "Subthreshold Source-Coupled Logic Circuits for Ultra-Low-Power Applications," *IEEE J. Solid-State Circuits*, vol. 43, no. 7, pp. 1699–1710, Jul. 2008.
- [86] A. Tajalli and Y. Leblebici, "Design trade-offs in ultra-low-power CMOS and STSCL digital systems," in 2011 20th European Conference on Circuit Theory and Design (ECCTD), 2011, pp. 544–547.
- [87] M. Shoaran, A. Tajalli, M. Alioto, A. Schmid, and Y. Leblebici, "Analysis and Characterization of Variability in Subthreshold Source-Coupled Logic Circuits," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 62, no. 2, pp. 458–467, Feb. 2015.

- [88] N. Katic, I. Kazi, A. Tajalli, A. Schmid, and Y. Leblebici, "A subthreshold currentsensing ΣΔ modulator for low-voltage and low-power sensor interfaces," *Int. J. Circuit Theory Appl.*, vol. 43, no. 11, pp. 1597–1614, Nov. 2015.
- [89] M. Beikahmadi, A. Tajalli, and Y. Leblebici, "A Subthreshold SCL Based Pipelined Encoder for Ultra-Low Power 8-bit Folding/Interpolating ADC," in 2008 NORCHIP, 2008, pp. 9–12.
- [90] H. Hassan, H. W. Kim, and S. Ibrahim, "Design and Investigation of Configurable Source Coupled Logic," in *Proceedings of the International Conference on Microelectronics, ICM*, 2018, vol. 2018-December, pp. 283–286.
- [91] J. Ahmadi-Farsani, H. Sadjedi, and M. B. Ghaznavi-Ghoushchi, "An ultra low-power current-mode clock and data recovery design with input bit-rate adaptability for biomedical applications in CMOS 90 nm," *Integration*, vol. 62, pp. 238–245, Jun. 2018.
- [92] M. Zangeneh and A. Joshi, "Designing Tunable Subthreshold Logic Circuits Using Adaptive Feedback Equalization," *IEEE Trans. Very Large Scale Integr. Syst.*, vol. 24, no. 3, pp. 884–896, Mar. 2016.
- [93] A. G. Andreou, K. A. Boahen, P. O. Pouliquen, A. Pavasovic, R. E. Jenkins, and K. Strohbehn, "Current-mode subthreshold MOS circuits for analog VLSI neural systems," *IEEE Trans. Neural Networks*, vol. 2, no. 2, pp. 205–213, Mar. 1991.
- [94] K. P. Cheung, "On the 60 mV/dec @300 K limit for MOSFET subthreshold swing," in Proceedings of 2010 International Symposium on VLSI Technology, System and Application, VLSI-TSA 2010, 2010.
- [95] S. H. Zadeh, T. Ytterdal, and S. Aunet, "Ultra-Low Voltage Subthreshold Binary Adder Architectures for IoT Applications: Ripple Carry Adder or Kogge Stone Adder," in 2019 IEEE Nordic Circuits and Systems Conference, NORCAS 2019: 159

NORCHIP and International Symposium of System-on-Chip, SoC 2019 - Proceedings, 2019.

- [96] H. Soeleman, K. Roy, and B. C. Paul, "Robust subthreshold logic for ultra-low power operation," *IEEE Trans. Very Large Scale Integr. Syst.*, 2001.
- [97] B. Zhai *et al.*, "Energy-efficient subthreshold processor design," *IEEE Trans. Very Large Scale Integr. Syst.*, 2009.
- [98] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," in *IEEE Journal of Solid-State Circuits*, 2005, vol. 40, no. 9, pp. 1778–1785.
- [99] M. S. Ansari, R. Sinha, and S. Khot, "Ultra-low power 50/60Hz notch filter for biomedical signal acquisition using 32nm ± 0.15V Bulk-Driven subthreshold CMOS OTAs," in 2017 4th International Conference on Electrical and Electronics Engineering, ICEEE 2017, 2017, pp. 309–313.
- [100] N. Pandey, R. Pandey, T. Mittal, K. Gupta, and R. Pandey, "Ring and coupled ring oscillator in subthreshold region," in 2014 International Conference on Signal Propagation and Computer Technology, ICSPCT 2014, 2014, pp. 132–136.
- [101] C. C. Enz, F. Krummenacher, and E. A. Vittoz, "An analytical MOS transistor model valid in all regions of operation and dedicated to low-voltage and low-current applications," *Analog Integr. Circuits Signal Process*, vol. 8, no. 1, pp. 83–114, Jul. 1995.
- [102] E. A. Vittoz, "Weak Inversion for Ultimate Low-Power Logic," in Low-Power CMOS Circuits, CRC Press, 2018, pp. 16-1-16–18.
- [103] F. Cannillo, C. Toumazou, and T. S. Lande, "Nanopower Subthreshold MCML in Submicrometer CMOS Technology," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 56, no. 8, pp. 1598–1611, Aug. 2009.

[104] A. Tajalli and Y. Leblebici, Extreme low-power mixed signal IC design: Subthreshold cource-coupled circuits. Springer New York, 2010.