Design of Priority Based Reconfigurable Router in Network on Chip

Network on Chip (NoC) is an advanced integration approach for on-chip communication networks and offers an alternative to traditional bus based System on Chip (SoC) designs. The router is a key component and is considered the backbone of communication in NoC. The objective of this work is to design a priority based reconfigurable router for NoC. Initially, a 4x4 Baseline Router is designed and synthesised, and then the channels inside the router are modified to achieve reconfiguration and improve the router's efficiency. In the 4x4 Reconfigurable Router the slots are well utilized, but prioritization is not considered. Routers are associated with switches to take data transfer decisions, resulting in high power consumption. To overcome this problem, a new priority based reconfigurable router is designed. The design is carried out in Verilog HDL, synthesized using Xilinx ISE Design Suite 14.3 and simulated using ModelSim-Altera 6.5b. The results in terms of power, energy efficiency, area and delay are analysed, and the proposed design gives better results than the conventional Baseline Router.


Introduction
Moore's law has been the key driver of Integrated Circuit technology for almost five decades. Although the growth in transistor count is projected to slow to a doubling every 3 years in the next few years for fixed chip sizes, the exponential trend is still in force. Because of this evolution, the system level focus moves in steps. Each step is a paradigm shift that occurs when the technology matures for a given implementation style. Past examples of such shifts were the move from room to rack level systems (LSI, 1970s) and later from rack to board level systems (VLSI, 1980s). This trend allowed the introduction of Systems on Chip (SoC) in the 1990s, the integration of many components such as microprocessors and custom IPs in a single chip. Hence the integration of many processing elements along with memory cores in a single chip [1,2] was introduced. In turn, it created a communication overhead that traditional bus-based architectures cannot handle for a number of reasons. Network on Chip (NoC) [3,4] is a good paradigm for solving these problems. NoC is an integrated network that uses routers to allow communication among those blocks. It applies methods from networking theory to on chip communication, where the blocks exchange information on a chip [5].

Network on chip
NoC technology is a new approach to communication [6,7] that enables not only more efficient interconnects but also more efficient design and verification processes for modern SoCs. NoC addresses the signalling needs of various communication protocols while reducing the complexity of the chip's interconnects. Processing Elements (PEs) communicate over a network fabric composed of switches or routers [8] connected by physical channels. A typical NoC based MPSoC is composed of a number of PEs such as CPUs, custom IPs, DSPs and storage elements (embedded memory blocks).

Proposed design
The proposed architecture is a Priority based Reconfigurable Router for NoC. The architecture consists of a Reconfigurable Router with four channels, a Priority based Scheduler, a Buffer and an Arbiter Unit. This section presents the design of the Reconfigurable Router along with the buffer and arbiter unit.

Router design
Router architectures dominated early NoC research [9], and the first NoC designs proposed the use of simplistic routers with deterministic routing algorithms in terms of RTL design. Since the router [10] is a component that will be used in every future version of the system, and its architecture options may either be revised or coexist in the same architecture (heterogeneous NoCs [11]), it should be designed as a reusable IP block [12].
The basic block diagram of a NoC Router is shown in Figure 1 and its major components are listed below.
- The input/output buffers that temporarily store flits.
- The output port allocation logic which selects the output port for each flit/packet.
- The switch fabric which makes the physical connection from input to output port.
- The control logic that is responsible for overall synchronization.

Buffer
Buffers are the greatest power consumers [13,14]. Thus, efficient buffer design is critical for achieving a good performance/area/power trade-off. To minimize the implementation cost, the on-chip network has to be implemented with little area overhead. Thus, unlike off chip networks which feature large memories, NoCs typically use small registers for buffering. A major advantage of using registers over large memories is the significant reduction of the address decoding/encoding latency and the access latency [15]. The FIFO scheduler that has to fit inside the router is designed first. Four FIFOs are used in the design of this router. Each FIFO is 9 bits wide and 16 locations deep. The FIFO operates on the system clock and has a synchronous, active-low reset. Sizing the buffers for the worst case scenario improves throughput but compromises routing area and power consumption. If the depth is small, latency increases, leading to a compromise in QoS.
Figure 2 shows the internal design of the FIFO Buffer module of our proposed design. It has two major units, namely the Data Path unit and the Control unit. The Control unit receives control signals such as Credit_in, In_request and Out_request. It has sub modules such as the FSM, Counter and Decoder. The minimum buffer dimensions (width and depth) are functions of the switching mode, packet size, flit size and expected traffic.
One of the vital goals in designing NoCs is to minimize the buffer size by trading it off against the resulting degradation in latency and throughput.
Upon arrival of a packet at the input port, it is stored in the FIFO Buffer flit by flit, one flit per clock cycle, provided that free space is available. Simultaneously, the FIFO Buffer is evacuated as long as the output port has free space. The evacuation stops when the control signal credit_in entering the arbiter becomes invalid, indicating that no free space is available. However, storage in the current buffer continues. The control signal credit_out for this port is sent as back pressure to the adjacent source node when the FIFO Buffer becomes almost full. As soon as the first flit reaches the first location (Shift Register SR) of the FIFO, which is connected directly to the router, the header flit notifies the arbiter of the arrival of the flit/packet at the port and provides the destination address to the Direction Decoder to calculate the direction.

Figure 1: NoC Router Block Diagram
With this mechanism, eight buffer locations are sufficient to achieve the minimum latency in the absence of blocking. However, the buffer size is made parameterizable and chosen as 16 to allow fair reallocation among the four channels during the reconfiguration process under contention and to make a fair comparison with other designs. Hence, through systematic Design Space Exploration (DSE), the buffer depth is fixed at 16 locations for our design.
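The store, evacuate and back pressure behaviour described above can be illustrated with a small behavioural model. This is only a sketch in Python, not the Verilog used in this work: the class and function names are hypothetical, and the almost-full threshold of 15 locations is an assumption, since the text does not give one. The 9 bit width and the depth of 16 are taken from the text.

```python
from collections import deque

FLIT_WIDTH = 9                 # bits per flit, as stated in the text
FIFO_DEPTH = 16                # buffer locations, as stated in the text
ALMOST_FULL = FIFO_DEPTH - 1   # assumed threshold; not given in the text


class FifoBuffer:
    """Behavioural model of one input FIFO with credit based back pressure."""

    def __init__(self, depth=FIFO_DEPTH, almost_full=ALMOST_FULL):
        self.depth = depth
        self.almost_full = almost_full
        self.slots = deque()

    @property
    def credit_out(self):
        """True while the upstream node may keep sending flits."""
        return len(self.slots) < self.almost_full

    def push(self, flit):
        """Store one flit if a location is free; return whether it was stored."""
        if len(self.slots) >= self.depth:
            return False
        self.slots.append(flit & ((1 << FLIT_WIDTH) - 1))  # keep 9 bits only
        return True

    def pop(self, credit_in):
        """Evacuate the oldest flit only while the downstream port has space."""
        if credit_in and self.slots:
            return self.slots.popleft()
        return None


def clock_cycle(fifo, incoming, credit_in):
    """One clock: evacuate toward the output, then store the arriving flit."""
    sent = fifo.pop(credit_in)
    stored = incoming is not None and fifo.push(incoming)
    return sent, stored
```

In this model, storage continues while evacuation is blocked (credit_in low), and credit_out drops once the buffer is almost full, which mirrors the back pressure sent to the adjacent source node.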

Arbiter unit
A Finite State Machine (FSM) is, in general, simply another name for a sequential circuit. Finite refers to the fact that the number of states the circuit can take is finite. A synchronous clocked FSM changes state only when a triggering edge occurs on the clock signal. It selects the operation to be carried out by the router. The FSM controller here is used to indicate the work to be done by the router [12]. In general, a synchronous circuit is a digital circuit in which the changes in the state of the memory elements are synchronized by a clock signal. In a sequential digital logic circuit, data is stored in simple memory devices termed latches.
A synchronizer is a digital circuit that transfers an asynchronous signal into the recipient clock domain without causing any stability issues. This module provides synchronization between the router Finite State Machine and the router FIFO modules. Thus, it provides reliable communication between all four input ports and output ports. The register module is designed using four internal registers that hold a header byte, a FIFO full state byte, an internal parity byte and a packet parity byte. All the registers are latched on the rising edge of the clock.

Reconfigurable router
If an NoC router has a larger FIFO buffer, the network will have higher throughput [16] and lower latency, as there will be fewer flits stalled in the network. Since each communication has its own peculiarities, sizing the FIFO for the worst case communication scenario will compromise not only the routing area but power [17] as well. However, if the router has a small FIFO depth, the latency will be larger and the quality of service (QoS) can be compromised. The solution is to have a heterogeneous router [18], in which each channel can have a different buffer size. In this situation, if a channel has a communication rate smaller than its neighbour, it may lend some of its unused buffer slots, as explained in stages in Figure 3. In Figure 3.a all the channels are designed with a depth of 4 slots. The south channel is filled with 9 slots by borrowing three free slots from west and two free slots from east, along with its own four slots, as shown in Figure 3.b. Thus, the final reconfiguration of the south channel with depth 9 is shown in Figure 3.c. In a different communication pattern, the roles may be reversed or changed at run time, without a redesign step [19]. When traffic is light, there is no waiting time and no reconfiguration, so the buffer depth has no impact on latency.

Our proposed idea is to design a router having 4 channels, namely the East Channel, West Channel, South Channel and North Channel. All four channels have separate buffer slots along with an FSM and Registers, as in the 4x4 Baseline Router. The channels inside the router are then modified to achieve reconfiguration. Reconfiguration in a router works according to the bandwidth needed in the channel. Initially, the buffer depth of all channels is decided during design. In our design, the buffer depth is fixed at 16. If the east channel is full, incoming data can be transferred to a neighbouring channel that is using fewer slots. When a large amount of data has to be stored, data loss is avoided by this process, known as the reconfiguration mechanism.

Figure 2: Internal design of an FIFO Buffer module
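The slot lending in the Figure 3 example can be sketched as a simple allocation step. The following Python model is illustrative only and is not the Verilog implementation: the function name, the order in which neighbours are tried, and the per-channel demand values are assumptions, chosen so that west has three free slots and east has two, as in the example.

```python
def reconfigure(own_slots, demand, channel, neighbours):
    """Return (loans, unmet) for `channel`.

    own_slots  : dict channel -> slots owned at design time
    demand     : dict channel -> slots currently needed
    neighbours : channels to borrow from, in the order they are tried
    """
    shortfall = max(0, demand[channel] - own_slots[channel])
    loans = {}
    for donor in neighbours:
        if shortfall == 0:
            break
        free = max(0, own_slots[donor] - demand[donor])  # only unused slots
        lent = min(free, shortfall)
        if lent:
            loans[donor] = lent
            shortfall -= lent
    return loans, shortfall


# Figure 3 example: every channel owns 4 slots and south needs 9.
own = {"north": 4, "east": 4, "south": 4, "west": 4}
demand = {"north": 4, "east": 2, "south": 9, "west": 1}  # assumed usage
loans, unmet = reconfigure(own, demand, "south", ["west", "east"])
south_depth = own["south"] + sum(loans.values())
print(loans, unmet, south_depth)  # {'west': 3, 'east': 2} 0 9
```

A donor only lends slots it is not using itself, so the lending channel's own traffic is not affected, and the roles can change whenever the demand values change.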
The design of the east channel with the novel reconfiguration mechanism is shown in Figure 4. Priority [20], a crucial part of networking, was not considered in our primary design. Routers are associated with switches to take data transmission decisions, resulting in higher power consumption [21]. In order to overcome this problem of higher power consumption, a new priority based reconfigurable router is designed.

During typical router operation, an incoming flit is first received and possibly stored in the input queue. Second, the output port request for the incoming flit is determined from the flit destination address according to the routing algorithm. Third, the output port allocator receives the flit requests [22,23] and allocates the output ports according to priority. Finally, as soon as a flit is granted a port, it is routed through the switch fabric to the granted output port to reach the neighbouring router or destination node.
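The four steps of router operation can be illustrated with a short behavioural sketch. It is a Python model under stated assumptions rather than the design itself: the `Flit` fields, the convention that a lower priority value wins, the tie-break by request order, and the destination-to-port table are all hypothetical, since the text does not specify the priority encoding or the routing algorithm.

```python
from collections import namedtuple

# Hypothetical flit record; field names are illustrative only.
Flit = namedtuple("Flit", "in_port dest priority payload")


def allocate(requests):
    """Grant each requested output port to the highest priority flit.

    Lower `priority` values win (an assumed convention); on a tie the
    earlier request keeps the grant.
    """
    grants = {}
    for out_port, flit in requests:
        holder = grants.get(out_port)
        if holder is None or flit.priority < holder.priority:
            grants[out_port] = flit
    return grants


def router_cycle(input_queues, decode):
    """One allocation round over the head flit of every input queue."""
    heads = [q[0] for q in input_queues.values() if q]   # step 1: received flits
    requests = [(decode(f.dest), f) for f in heads]      # step 2: output port request
    grants = allocate(requests)                          # step 3: priority allocation
    for flit in grants.values():                         # step 4: switch traversal
        input_queues[flit.in_port].pop(0)
    return grants


queues = {
    "north": [Flit("north", "A", 2, 20)],
    "east": [Flit("east", "A", 0, 21)],
    "west": [Flit("west", "B", 1, 22)],
    "south": [],
}
table = {"A": "south", "B": "east"}   # hypothetical destination-to-port map
grants = router_cycle(queues, table.get)
```

Here the north and east flits compete for the south output port; the east flit holds the better priority and is granted the port, while the north flit stays in its queue for the next round.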

Results and discussion
The 4x4 Baseline Router is designed in Verilog using the Xilinx ISE software and simulated using ModelSim. The on chip power consumption is found to be 37.41 mW, as shown in Figure 5. The timing report in Figure 6 shows a delay of about 9.012 ns for the synthesized design.
For efficient utilization of the slots in the 4x4 Baseline Router, the need to reconfigure the slots in each channel is identified. The slots are reconfigured according to the bandwidth requirement of the channel. For example, if the south channel is overflowing with data, the excess can be transferred to a neighbouring channel that is using fewer slots. Four channels are used in the reconfigurable router, one for each of the four directions. The register inside the east channel is provided with push and pop input signals to perform the write and read operations respectively. The function of the 4x4 Reconfigurable Router is obtained from the simulation results in Figure 7. It shows that the west data inputs 77 and 74 cannot be stored in the corresponding west output port due to the scarcity of memory slots in the west channel. These two data values, 77 and 74, are instead stored in the empty slots of the north and south channels respectively. Thus, the concept of reconfiguration is achieved in the simulations. Also, the on chip power consumption is found to be 33.66 mW, which is about 10% lower than the 37.41 mW of the Baseline Router. The timing delay is ...

The function of the 4x4 Priority based Reconfigurable Router is obtained from the simulation results shown in Figure 8. The north data inputs 20, 21 and 22 cannot be stored in the corresponding north output port due to the scarcity of memory in the north channel, so they are stored in the empty slots of the east, west and south channels respectively. Priority based reconfiguration is achieved in the sense that the data inputs of any channel can be accessed from any of the slots. The on chip power consumption of the 4x4 Priority based Reconfigurable Router is found to be much lower at 15.54 mW, a 58.5% reduction in comparison with the 4x4 Baseline Router, and it has a further 36% reduction in delay at 3.050 ns.
Figure 9 shows the overall comparison of the designed routers. It shows that the Priority based Reconfigurable Router consumes 58.5% less power than the 4x4 Baseline Router. The number of Look Up Tables (LUTs) in the 4x4 Priority based Reconfigurable Router is reduced from 86 to 38, leading to an optimized router architecture with reduced area and power consumption. When the proposed design is implemented on a Spartan-3E FPGA, the delay among the configurable logic blocks is reduced from 9.012 ns to 3.05 ns. Thus, this design also leads to a high efficiency router. The utilization of the available energy is also significantly improved: the energy efficiency rises from 42.02% to a maximum of 86.46%.
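The relative improvements can be checked directly from the reported values. The short Python calculation below uses only the power, delay and LUT figures quoted in this section for the Baseline and Priority based Reconfigurable routers; the helper name is illustrative.

```python
def reduction_pct(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


power = reduction_pct(37.41, 15.54)   # on chip power in mW
delay = reduction_pct(9.012, 3.05)    # delay in ns
luts = reduction_pct(86, 38)          # LUT count

print(f"power: {power:.1f}%  delay: {delay:.1f}%  LUTs: {luts:.1f}%")
# power: 58.5%  delay: 66.2%  LUTs: 55.8%
```

The power figure reproduces the 58.5% reduction stated for the comparison with the Baseline Router.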

Conclusion
The Baseline Router is designed using the FIFO, FSM, Synchronizers and Registers with a single input channel and three output channels. Then, the channels are extended in all four directions to form the 4x4 Baseline Router. The design entry of this router is done in Verilog. The design and its test fixtures are synthesized and implemented using Xilinx ISE Design Suite 14.3. The on chip supply power consumption is obtained using XPower Analyzer. The functionality of the router is verified through simulation waveforms from ModelSim-Altera 6.5b. The analysis shows that reconfiguration of the buffer slots reduces the power consumption of the router. Also, efficient utilization of the slots in the 4x4 Baseline Router improves its performance. So, the channels of the 4x4 Baseline Router are modified to achieve reconfiguration. The on chip power consumption of the 4x4 Reconfigurable Router is found to be reduced to 33.66 mW, which is about 10% lower than the 37.41 mW of the Baseline Router.
In the 4x4 Reconfigurable Router the slots are well utilized, but priority, a crucial part of networking, was not considered. Routers are associated with switches to take data transmission decisions, resulting in higher power consumption. In order to overcome this problem, a new priority based reconfigurable router is designed, and it is found that the power consumption is reduced by 58.5% while the delay is reduced to 3.05 ns. The number of LUTs is reduced to 38. The utilization of the available energy is also significantly improved: the energy efficiency rises from 42.02% to a maximum of 86.46%. Thus, this design also leads to a high efficiency router. The design of a multiple priority traffic class router [24] can be considered for future work.