Multi-Port Memory Design in Quantum Cellular Automata Using Logical Crossing

Memory and its data communication play a vital role in determining the performance of a processor. To obtain a high-performance computing machine, memory access has to be equally fast. In this paper, a dual port memory with Set/Reset is designed using majority voters in Quantum-dot Cellular Automata (QCA). The dual port memory consists of basic functional blocks such as a 2 to 4 decoder, Control Logic Block (CLB), Address Checker Block (ACB), Memory Cell (MC), Data Router block and Input/Output block. These functional units are constructed using 3-input majority voters. QCA is one of the recent technologies for the design of nanometer level digital components. The functionality of the dual port memory has been simulated and verified in QCADesigner 2.0.3. A novel crossover method called Logical Crossing is utilized to reduce the area of the proposed design. Logical crossing performs data transmission with the support of proper clock zone assignment. The logical crossing based QCA layouts are optimized in terms of area and cell count. The number of cells is reduced by 29.81%, 18.27%, 8.32%, 11.57% and 3.69% in the Decoder, ACB, CLB, Data Router and Memory Cell respectively. Also, the area is reduced by 25.71%, 16.83%, 8.62%, 4.74% and 3.73% for the Decoder, ACB, CLB, Data Router and Memory Cell respectively. In addition, the proposed dual port memory using logical crossing improves the area by 8.26%, which is made possible by the 8.65% reduction in the number of cells required for its construction. Moreover, the quantum circuits of the RAM are obtained using the RCViewer+ tool. The quantum cost, constant inputs, number of gates, garbage outputs and total cost are estimated as 285, 67, 57, 50 and 516 respectively.


Introduction
In a processor, the memory plays a critical role in deciding the performance of computation. The main characteristics of RAM are Reliability, Availability and Maintainability. The computation speed depends upon the design of memory architecture and the speed of communication. The communication speed indirectly depends on the architectural design of memory. The improvement in architectural design increases the performance of the system.
In CMOS (Complementary Metal Oxide Semiconductor) technology, memory architecture has nearly reached its saturation point. Further technological improvement is therefore needed, which can be achieved by realizing digital structures in nanometer-scale quantum cellular automata (QCA) [1]. QCA is fast, consumes ultra-low power and provides high packing density compared to other emerging nanotechnologies [1]. Further, the performance of QCA can be enhanced by incorporating novel architectures. Generally, coplanar (single layer) and multilayer crossings are adopted in QCA wiring. Multilayer architecture consumes less area and exhibits higher performance than coplanar wiring [2]. Logical Crossing is a new kind of wire crossing that exhibits better performance than its predecessors [3].
Two kinds of memory architecture can be realized in QCA: (1) line-based and (2) loop-based memory. In line-based memory, the four QCA clocking signals are used for storing the values in the cell, whereas the loop-based structure maintains the data using feedback in the circuit [1,4]. The line-based method is very simple, but its reliability is questionable. The loop-based method has better reliability and is realized using multiplexer logic and latches [5].
The remainder of the paper is organized as follows. Section 2 explains the definitions of the performance measures. Section 3 discusses the outcomes of the RAM related literature review. Section 4 deals with the novel logical crossing for interconnections in QCA. Section 5 elaborates on the proposed RAM design and its QCA realization. The simulation results are discussed in Section 6. Finally, the paper concludes with suggestions for future research.

Preliminaries: performance metrics

Quantum Cost
The Quantum Cost of a circuit is defined as the total number of elementary quantum gates (primitive gates) needed to realize a given function [6]. The quantum costs of reversible primitive gates such as the Feynman, Toffoli and Fredkin gates are 1, 5 and 5 respectively [6].

Garbage Output
Garbage Output is defined as the number of unused outputs of the reversible circuit. Based on the requirement, these outputs are introduced to maintain the property of reversibility in the circuit [7].

Constant Input
Constant Input is a predefined input (Logic '0' or '1') in order to obtain the desired output function from the reversible gate. The input is kept constant at either '0' or '1' during the entire computation.

Logical Calculations
It is defined as the number of NOT (γ), AND (β) and XOR (α) operations that are required to obtain a desirable output function in reversible logic [8]. It indicates the hardware complexity of the circuit.

Number of Gates
It is the total number of gates required to realize the desired function. It is measured from the circuit's input to its output.

Total Cost
It is the sum of the Quantum Cost, Constant Input, Garbage Output and Number of Gates.
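The definitions above can be illustrated with a minimal sketch. It assumes that the quantum cost of a circuit is the sum of the costs of its gates, using the per-gate values quoted earlier (Feynman 1, Toffoli 5, Fredkin 5). The function names and the example circuit are hypothetical and are not taken from the proposed RAM.

```python
# Quantum costs of the primitive reversible gates quoted in the text [6].
GATE_QUANTUM_COST = {"feynman": 1, "toffoli": 5, "fredkin": 5}


def quantum_cost(gates):
    """Sum the quantum cost of every gate in the circuit (assumed additive)."""
    return sum(GATE_QUANTUM_COST[g] for g in gates)


def total_cost(gates, constant_inputs, garbage_outputs):
    """Total cost = quantum cost + constant inputs + garbage outputs + gate count."""
    return quantum_cost(gates) + constant_inputs + garbage_outputs + len(gates)


# Hypothetical circuit, used only to exercise the definitions.
example_gates = ["feynman", "toffoli", "fredkin", "feynman"]
print(quantum_cost(example_gates))      # 1 + 5 + 5 + 1 = 12
print(total_cost(example_gates, 2, 3))  # 12 + 2 + 3 + 4 = 21
```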

QCA
With the increasing growth of electronic tools, parameters such as speed, area, processing power, energy consumption and density are highly important in their design [9]. In this regard, new technologies and designs are continually presented to resolve the disadvantages and make the necessary improvements. One of the technologies proposed to advance the digital electronics industry is quantum cellular automata (QCA) nanotechnology. This technology, which progresses constantly, offers higher speed and density and consumes far less area and energy than the existing technologies [10].
In QCA technology, there are two types of cells: 45° and 90° cells. A QCA cell contains four quantum dots located at the corners of a square. The electrons occupy two diagonally opposite corners of the cell. A QCA wire can be designed simply by placing QCA cells next to each other. QCA wires should not be too long, since this leads to signal drop and can disturb the circuit operation. Once the polarization of a QCA cell is fixed, the encoded binary information is transferred to the adjacent cells [11].
The three input majority gate and inverters are the basic structures of QCA circuits. Designing this gate is difficult in other technologies, but in this technology, the five QCA cells are arranged in a way to generate the majority gate. According to the structure and equation of three input majority gate, it is observed that the AND and OR gates can be constructed based on two inputs by inserting a fixed value into one of the three input cells (logical one for OR and logical zero for AND) [12].
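As a brief illustration of the paragraph above, the following sketch models the 3-input majority function at the logic level, M(a, b, c) = ab + bc + ca, and checks that fixing one input to 0 or 1 yields AND and OR. The function names are illustrative only; the code does not model the physical cell layout.

```python
from itertools import product


def majority(a, b, c):
    """3-input majority voter: M(a, b, c) = ab + bc + ca."""
    return (a & b) | (b & c) | (c & a)


def and2(a, b):
    """2-input AND: majority voter with one input fixed to logic 0."""
    return majority(a, b, 0)


def or2(a, b):
    """2-input OR: majority voter with one input fixed to logic 1."""
    return majority(a, b, 1)


# Exhaustive check over all input combinations.
for a, b in product((0, 1), repeat=2):
    assert and2(a, b) == (a & b)
    assert or2(a, b) == (a | b)
```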

QCA Clocking
Every QCA based circuit requires a clocking mechanism for synchronization, flow control management, and provision of power to stimulate the circuit. This synchronization is performed through QCA clocking [13], as shown in Fig. 1. The clocking of QCA can be accomplished by controlling the potential barriers between adjacent quantum dots. In QCA, data flow along the path on which the clocking phase increases [14]. Note that the clocking phases must increase in sequence; otherwise the circuit does not function as expected. The control of data flow is one of the specific characteristics of QCA. This inherent characteristic helps designers develop more optimized novel structures for digital circuits [15].
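The ordering rule for clock phases can be expressed as a simple check on a single wire. This is a minimal sketch under the assumption that the four clock zones are numbered 0 to 3 and wrap around, and that a cell either shares the zone of its predecessor or sits in the next zone. The function name and the zone numbering are hypothetical and are not part of QCADesigner.

```python
NUM_CLOCK_ZONES = 4  # assumption: zones are numbered 0..3 and wrap around


def follows_clock_order(zones):
    """Return True if every cell stays in its predecessor's zone or moves to the next one."""
    for prev, curr in zip(zones, zones[1:]):
        step = (curr - prev) % NUM_CLOCK_ZONES
        if step not in (0, 1):
            return False
    return True


print(follows_clock_order([0, 0, 1, 1, 2, 3, 0]))  # True: phases increase in sequence
print(follows_clock_order([0, 2, 3]))              # False: a zone is skipped
```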

Related works in RAM
Timing of the clocking zones requires two additional clocks to implement a four step process for reading/writing data to the memory in line based parallel memory [16]. The parallel hybrid memory architecture reduces the area and latency. The area, number of interconnection and latency can be improved by proper QCA layout [17].
Various kinds of RAM architectures are presented and their performances are analyzed in terms of number of cells, area, number of clocks and cell delay as shown in Table 1. Then the best and worst-case performances are identified based on the latency and area of the presented layout design [5]. Set/Reset signals are introduced in the recent RAM cell designs. The inclusion of Set/Reset does not increase the number of gates [18], but the number of gates is reduced in [19] through optimum realization. The number of QCA cells increases when the RAM layout is realized using regular clock zones with Latches (D or SR), but it reduces the clock latency [20]. The number of QCA cells, area, the number of gates and latency of RAM are reduced with the usage of 3-input and 5-input majority gate [19] in the architecture. Also, the removal of coplanar wires (crossover) [20,21] and the effective layout arrangement of QCA cells make it a robust and noise free design [18,19].
Initially, single port RAM was designed using an SR/D latch with the loop-based concept in QCA. The performance was improved by incorporating the 5-input majority gate and an efficient layout design. However, present-day processors require RAM with multiple ports and high capacity. Recently, a 4×4 RAM with two ports was designed using majority voters in QCA [22,23]. The major objective of the RAM design in [22] is to avoid crossings in the entire layout, whereas multilayer crossing is adopted in [23] to reduce the number of cells and the area.

Limitations of the existing design
From the above analysis of the existing RAM designs, the following observations are made:
-Single port memory is designed with or without crossover in [19][20][21].
In order to overcome the limitations of the existing designs, a Multiport 4×4 RAM is proposed in this paper, which is realized in the QCA layout using Logical Crossing.

Logical crossing
Two major wire crossing techniques are popularly used in QCA data transmission: coplanar and multilayer. Each of them has its own advantages and disadvantages as shown in Table 2 [3,24].
In order to combine the advantages of both methods, a new wire crossing technique named Logical Crossing is introduced in [3]. In logical crossing, the data are transmitted to adjacent cells by operating them with clock signals shifted in phase by 180°. When two cells are in the locked and locking stages, the Coulomb repulsion between them makes the data transmission possible. In other clock zones, the data transmission does not occur. The detailed clock zone information for data transfer in QCA is shown in Table 3.
Hence, cells in the hold and relax phases can cross each other without any polarization effect. The diagrammatic views of data transmission in coplanar, multilayer and logical crossing are shown in Fig. 2, and their corresponding QCA simulation waveforms are shown in Fig. 3. The logical crossing method has been used to construct the XOR function [3], a 2×1 multiplexer [25], a Full Adder [26] and complex logical functions [24]. The incorporation of logical crossing in all these references reduces the number of cells and the area of the QCA layout.

Proposed dual port RAM design
In dual port memory, a user can access two memory locations at a time. A data conflict may occur when two users access the same memory location and at least one of the accesses is a write operation. The functional blocks of the design are shown in Fig. 5 (a to e). The decoder provides the address to the two ports (Port A and Port B). The ACB checks whether the addresses of the two ports are the same. The CLB generates the priority for the two ports; if Port A has higher priority, it accesses the memory cell first, followed by Port B. The Data Router block (DRB) provides the write/read operation (data route) to the two ports. The memory cell stores a single bit and performs write/read operations.

Decoder
In dual port memory, two memory cells are selected at the same time for write/read operation. Therefore, two decoders are used to generate the addresses for the two ports, as shown in Fig. 5a. Each decoder generates a 'row select' signal for addressing the appropriate memory array, as shown in Table 4.
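A logic-level sketch of one 2 to 4 decoder built from majority voters is given here. Since Table 4 is not reproduced in this text, the sketch assumes active-high, one-hot row select outputs ordered D0 to D3 and an address written as (a1, a0). All names are illustrative and do not describe the actual layout of Fig. 5a.

```python
def majority(a, b, c):
    """3-input majority voter: M(a, b, c) = ab + bc + ca."""
    return (a & b) | (b & c) | (c & a)


def and2(a, b):
    """AND realized as a majority voter with one input fixed to 0."""
    return majority(a, b, 0)


def inv(a):
    """Logic-level model of the QCA inverter."""
    return 1 - a


def decoder_2to4(a1, a0):
    """Return the one-hot row select tuple (D0, D1, D2, D3) for address (a1, a0)."""
    return (
        and2(inv(a1), inv(a0)),  # D0 selects address 00
        and2(inv(a1), a0),       # D1 selects address 01
        and2(a1, inv(a0)),       # D2 selects address 10
        and2(a1, a0),            # D3 selects address 11
    )


print(decoder_2to4(1, 0))  # (0, 0, 1, 0)
```

In a dual port memory, two such decoders would be evaluated independently, one per port.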

Address Checker Block (ACB)
The addresses of the two decoders are compared to check whether they are the same. If the output X of the Address Checker Block (ACB), shown in Fig. 5b, is '1', then the addresses of the two ports are the same. Table 5 shows the operational output of the ACB, where D_iR and D_iL represent the right and left decoder outputs respectively. If D_0R and
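The address comparison performed by the ACB can be sketched at the logic level as follows. This is an assumption about the Boolean function rather than a transcription of Table 5: with one-hot decoder outputs, X is taken as the OR over all rows of (D_iL AND D_iR), with AND and OR built from majority voters. The names are hypothetical.

```python
def majority(a, b, c):
    """3-input majority voter: M(a, b, c) = ab + bc + ca."""
    return (a & b) | (b & c) | (c & a)


def and2(a, b):
    """AND realized as a majority voter with one input fixed to 0."""
    return majority(a, b, 0)


def or2(a, b):
    """OR realized as a majority voter with one input fixed to 1."""
    return majority(a, b, 1)


def address_match(left_rows, right_rows):
    """Return X = 1 if both one-hot decoder outputs select the same row, else 0."""
    x = 0
    for d_left, d_right in zip(left_rows, right_rows):
        x = or2(x, and2(d_left, d_right))
    return x


print(address_match((0, 1, 0, 0), (0, 1, 0, 0)))  # 1: both ports address row 1
print(address_match((0, 1, 0, 0), (0, 0, 0, 1)))  # 0: different rows
```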

Control Logic Block (CLB)
The control logic block (CLB) takes as inputs the Data Router signals of the two ports and a control input. The CLB produces the priorities as well as the Data Router signals for both ports.
If the addresses are different, no conflict occurs at all. So, after the ACB block, if the addresses are unmatched, the control logic block simply passes on the same Data Router data for both ports as supplied to the input lines of this block. For the same addresses, the outputs are chosen based on the control input (S_0) as per Table 6. If the addresses are the same (X is logic '1') and WR_A and WR_B are both logic '0', then the priority outputs P_1 and P_2 are logic '0'; S_0 acts as the conflict resolver. The priority and the port for write/read A/B are selected based on the address port and the WR signal. In order to remove conflicts, the conflict resolve factor (0 or 1) is used. When the same memory location is selected for both ports and at least one operation is a write operation, the priorities of the two ports must be different. These priorities also work as the final write/read signals, as shown in equations 1 to 4. B_1 and B_2 can be defined by the following functions.
Therefore, the conflict resolver functions as the write/read signal that satisfies all the following conditions (access of the same or different memory locations), where P_1 and P_2 are the priorities of port A and port B respectively.

Data Router
The priority generated in the CLB is used as the Data Router signal for ports A and B. At any point of time, one row is selected for port A/port B for the intended operation. Hence, the Data Router and the address lines have to be combined as shown in Fig. 5d. In Table 7, d_i is the left address decoder output while D_i is the right address decoder output, P_i is Port A/B and RS_i is memory array i for write/read operation. If the same memory location is selected for both ports, then the write/read signal of the port with priority is selected. The output signals RS_1, RS_2, RS_3 and RS_4 are the Data Router signals for the four rows.

Memory Cell
The memory architecture used in [27,28] is modified according to dual port mode as shown in Fig. 5e. In Table 8, WR_i and RS_i are the write and the read operation of Port A and B, respectively.

I/O Operation
There are individual data inputs for each port. The selected input port data are forwarded to the Memory cell or data are transferred to the output port based on the CLB control signals. The operation is performed using AND-OR combinational logic.

Results and discussions
The dual port memory components discussed above are realized in quantum cellular automata (QCA) using logical crossing, as shown in Fig. 6 (a to e), which presents the QCA realization of the Decoder, ACB, CLB, Data Router and Memory cell using logical crossing based crossover. QCADesigner 2.0.3 [29] is used for this QCA layout design. The simulation waveforms of the individual modules of the RAM are shown in Fig. 7. The simulation waveforms confirm the functional verification of the proposed RAM design. In Fig. 7, sample input and output simulation waveforms are shown in which, for each module, one of the possible inputs and outputs is highlighted on the waveform and its logical value is represented on the right side.
The QCA layout of the individual modules of the dual port RAM is realized using logical crossing. AND, OR and NOT are the basic logic units of the functional blocks of the RAM. These logic units are constructed using 3-input majority voters and are incorporated to build the decoder, ACB, CLB, Data Router and Memory cell. The interconnections between them are made by logical crossing, which has a positive impact on the performance measures.
In [18][19][20], the memory cell structure alone is implemented in QCA using multilayer/coplanar architecture. In [20], the inherent QCA capability is used to derive the latch function. From Table 9, it is observed that logical crossing in the memory cell design reduces the required number of cells. When observing the logical crossing realization in Table 10, the inverter realization has 20% ... where the area is not reduced with logical crossing. Hence, the area reduction was possible only in the interconnections for the CLB. The data router is a combination of the CLB and the ACB (for which the area is already reduced). This has a positive effect on the construction of the data router, resulting in an area reduction of up to 11.27%. The number of cells required for wire connections in the proposed methodology has been reduced significantly.
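The percentage improvements quoted in this section are assumed here to follow the usual relative-reduction formula, (reference − proposed) / reference × 100. The sketch below uses hypothetical cell counts, not values from the comparison tables.

```python
def percent_reduction(reference, proposed):
    """Relative reduction of the proposed value with respect to the reference, in percent."""
    return 100.0 * (reference - proposed) / reference


# Hypothetical cell counts, used only to illustrate the formula.
print(percent_reduction(200, 150))  # 25.0
```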
The realization of the memory cell leads to a small area reduction of 3.42%. The area occupied by the 4×4 memory array is reduced by 3.69%. The complete architecture of the RAM is obtained by combining all the individual modules. The quantum cost, constant inputs, number of gates and garbage outputs are listed in Table 12.
From Table 12, the total cost of RAM is 516, which is the sum of quantum cost (285), number of constant inputs (67), number of gates (57) and number of garbage outputs (50). The number of logical calculations shows the hardware complexity of the circuits.

Conclusion and future work
Quantum Cellular Automata is a low-power technology compared to present-day CMOS technology. In this paper, the functional blocks of the dual port memory, namely the 2 to 4 decoder, Control Logic Block (CLB), Address Checker Block (ACB), Data Router block and 4×4 memory cell block, are designed using majority voters. Each design unit consists of basic logic gates such as AND, OR and Inverter, together with connecting wires. All the functional modules are realized in the QCA layout with Logical Crossing. In the logical crossing method, alternate clock signals are used to control the flow of data transmission, rather than cell orientation or multiple layers, for crossover. This has a positive impact on the cell count and reduces the area.
In comparison to the best published results from recent literature, it is worth mentioning that the number of cells is reduced by 29.81% (Decoder), 18.27% (ACB), 8.32% (CLB), 11.57% (Data Router) and 3.69% (Memory Cell) in the proposed work. Also, the area of the aforementioned blocks is reduced by 25.71%, 16.83%, 8.62%, 4.74% and 3.73%, respectively. In addition to that, the proposed logical crossing based complete Dual port memory achieves an improvement of 8.26% in area and 8.65% in number of cells. Moreover, the quantum cost, the number of constant inputs, the number of gates, the number of garbage output and the total cost are 285, 67, 57, 50 and 516 respectively. The work can be extended towards adding Asynchronous/Synchronous Set/Reset abilities to the dual port memory with increased memory array size.