Implementation of VIP for bus interface logic of 32-bit processor using SystemVerilog

Abstract: A verification environment for an ARM-based SoC is proposed in this work, which introduces the design of a Verification Intellectual Property (VIP) for the Advanced Microcontroller Bus Architecture (AMBA). AMBA protocols are today a de facto standard for 32-bit processors because they are well documented and can be used without royalties. The VIP provides Coverage Driven Verification (CDV), which significantly reduces design verification time. Code coverage verification is carried out for the AHB bus master, the Icache controller, the Dcache controller and APB peripherals such as the APB bridge, timer, UART, and ACE. The test cases for the APB peripherals cover the ACE with the mil_std_protocol, the timers for interrupt generation and watchdog reset, the UART for transmitting and receiving messages, and the interrupt registers for read and write. The functional verification of AMBA is carried out using the Mentor Graphics Questasim tool with the SystemVerilog language.


Introduction
With the continued progression of chip geometries to ever smaller sizes, designers are finding themselves with a wealth of available gates in which to create their latest designs [1][2][3][4]. With design from scratch entirely out of the question, designers now build these systems with off-the-shelf IP blocks that are pre-designed and verified, helping them meet their goals of differentiation, cost control and time to market. VIP blocks are well-tested simulation models of industry-standard buses and protocols that generate and respond to stimulus and check protocol rule adherence. VIP reduces system verification time and improves quality.
VIP design for the AMBA AXI bus has been reported in previous works [5][6][7][8][9][11], but a VIP for the Dcache controller, the Icache controller and APB peripherals such as the APB bridge, timer, UART, and ACE is presented here for the first time. The implemented VIP finds application in the realization of onboard computers for navigation, guidance, and control processing in flight applications as well as in general purpose processing applications.
The verification environment is run in the Questa Sim simulator, version 10.0, with the test bench and SystemVerilog Assertions (SVA) written in SystemVerilog and the DUT written in VHDL. Separate SystemVerilog assertion files are bound to the corresponding test benches to validate the design specifications.

Proposed system design

AMBA Bus
The Advanced Microcontroller Bus Architecture specification defines an on-chip communications standard regarding bus protocols for communication between various system devices and peripherals. AMBA is a registered trademark of ARM Limited and is an open standard, on-chip interconnect specification for the connection and management of functional blocks in a System-on-Chip (SoC) [2]. This work provides Coverage Driven Verification (CDV) for the implementation of Verification Intellectual Property (VIP) for the AMBA bus. In this paper, the verification of the APB peripherals such as APB Bridge, timer, UART, and ACE is done.

AMBA Architecture
An AMBA-based microcontroller typically consists of a high-performance system backbone bus (AMBA AHB or AMBA ASB), able to sustain the external memory bandwidth, on which the CPU, on-chip memory, and other Direct Memory Access (DMA) devices reside. A typical AMBA architecture is shown in Figure 1 [2].

AHB Master
AHB_master generates the chip select signals and the external memory addresses for instruction and data memory. The FSM of AHB_master is shown in Figure 2.
The FSM works on clk_x2_pos. Wait states are added for external instruction and data memory access. They are selected from the Memory Configuration Register depending on the memory bank accessed.
Instr_state: It is selected when instruction access is requested by fetch stage of the pipeline when instruction cache is disabled. Chip select signals and address for external memory access are generated. The instruction read from external memory is sent to the fetch stage of the pipeline. FSM waits in the instr_state till the specified wait states are over and transitions back to idle state.

Figure 2: FSM of AHB_master
Data_state: It is selected when data memory access is requested by the memory stage of the pipeline when data cache is disabled. Chip select signals for load/store and address for external memory access are generated. For load, the data read from external memory is sent to the memory stage of the pipeline. For the store, the data from the memory stage of the pipeline is sent to the external memory. FSM waits in the data_state till the specified wait states are over and transitions back to idle state.
ICache_state: This state is selected if Icache is enabled and a cache miss occurs. FSM2, which handles the miss in icache_controller.vhd, is active here. The chip select signals are generated when fsm2 in icache_controller.vhd is in the wait_for_mfc state (wait_for_mfc_icache = 1). The FSM transitions to idle when the icache access is complete (icache_access_complete = 1, from icache_controller.vhd).
DCache_state: This state is selected if Dcache is enabled and a cache miss occurs. FSM2, which handles the miss in dcache_controller.vhd, is active here. The chip select signals are generated when fsm2 in dcache_controller.vhd is in the wait_for_mfc state (wait_for_mfc_dcache = 1). The FSM transitions to idle when the dcache access is complete (dcache_access_complete = 1, from dcache_controller.vhd).
APB_sel: If a memory-mapped peripheral is selected, the apb_sel state is entered. The FSM transitions to idle when hready_apb = '1' is received from apb_bridge.vhd.
The external memory address for instruction access is generated in instr_state or icache_state. The external memory address for data access is generated in data_state or dcache_state.
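The state descriptions above can be summarized in a small behavioural model. The following Python sketch is illustrative only and is not the authors' VHDL or SystemVerilog code: the request input names (instr_req, data_req, icache_miss, dcache_miss, apb_req), the priority order among simultaneous requests, and the exact cycle on which a wait-state count expires are all assumptions. Each call to step() stands for one clk_x2_pos edge.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    INSTR = "instr_state"
    DATA = "data_state"
    ICACHE = "icache_state"
    DCACHE = "dcache_state"
    APB_SEL = "apb_sel"


class AhbMasterFsm:
    """Behavioural sketch of the AHB_master FSM (one step per clk_x2_pos edge)."""

    def __init__(self, wait_states=1):
        # wait_states stands in for the Memory Configuration Register value.
        self.wait_states = wait_states
        self.state = State.IDLE
        self.remaining = 0

    def step(self, instr_req=False, data_req=False, icache_miss=False,
             dcache_miss=False, apb_req=False,
             icache_access_complete=False, dcache_access_complete=False,
             hready_apb=False):
        if self.state is State.IDLE:
            # ASSUMPTION: this priority order is illustrative, not from the paper.
            if apb_req:
                self.state = State.APB_SEL
            elif dcache_miss:
                self.state = State.DCACHE
            elif data_req:
                self.state, self.remaining = State.DATA, self.wait_states
            elif icache_miss:
                self.state = State.ICACHE
            elif instr_req:
                self.state, self.remaining = State.INSTR, self.wait_states
        elif self.state in (State.INSTR, State.DATA):
            # Stay until the configured wait states are over, then go idle.
            self.remaining -= 1
            if self.remaining <= 0:
                self.state = State.IDLE
        elif self.state is State.ICACHE and icache_access_complete:
            self.state = State.IDLE
        elif self.state is State.DCACHE and dcache_access_complete:
            self.state = State.IDLE
        elif self.state is State.APB_SEL and hready_apb:
            self.state = State.IDLE
        return self.state
```

A model of this kind can serve as a reference when choosing which state transitions a test sequence should exercise.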

Icache controller
The instruction cache controller is used to cache copies of frequently accessed instructions, thus eliminating the program memory access bottleneck. The cache controller receives an instruction read request and an address from the Fetch stage. Depending on hit or miss, the cache controller either supplies the instruction from the on-chip cache or reads it from external program memory via AHB and then provides it to the pipeline. The pipeline is stalled until the external memory access is complete.

LRU Replacement Algorithm
The LRU (Least Recently Used) algorithm is implemented for the two-way associative cache configuration. This algorithm selects a block for replacement based on its usage, thus benefiting from the temporal locality principle. A single bit is added as part of the tag entry in the tag ram. Whenever a tag match is found in a block, the LRU bit of that block is cleared and the LRU bit of the second block in the set is set to '1'. When a block is to be evicted, the tag entry in the set which has its LRU bit set to 1 is selected.
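The single-bit LRU rule can be illustrated with a minimal Python sketch of one cache set. This is a behavioural illustration under stated assumptions, not the authors' implementation: the class name TwoWaySet, the reset value of the LRU bits, and the choice to treat a refill as a use of the refilled way are all assumptions.

```python
class TwoWaySet:
    """One set of a two-way associative cache with a single LRU bit per way."""

    def __init__(self):
        self.tags = [None, None]
        # ASSUMPTION: both ways start as eviction candidates (LRU bit = 1).
        self.lru = [1, 1]

    def _touch(self, way):
        # Tag match: clear this way's LRU bit and set the other way's bit.
        self.lru[way] = 0
        self.lru[1 - way] = 1

    def access(self, tag):
        """Return True on a hit; on a miss, refill the victim way and return False."""
        for way in (0, 1):
            if self.tags[way] == tag:
                self._touch(way)
                return True
        # Eviction: pick the way whose LRU bit is 1.
        victim = self.lru.index(1)
        self.tags[victim] = tag
        self._touch(victim)  # ASSUMPTION: a refill counts as a use.
        return False
```

For example, after accessing tags A, B and then A again, a miss on a third tag C evicts B, because the hit on A cleared A's LRU bit and set B's.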

Dcache controller
The data cache controller caches frequently used data items. The data cache implements copy back with write allocate on a write miss. The dirty blocks are written back to external memory only when they need to be evicted.
- Uses an LRU replacement algorithm
- Copy back policy
- Write allocate on a write miss
- Two-way set associative
Hardware Organization: The data cache is identical to the instruction cache, except that each tag ram location has an additional bit called the dirty bit, which indicates whether the cache block has been modified during its cache residency. Thus each tag ram array has 23 bits * 1024 locations.
Address Decoding: This is identical to the two-way associative instruction cache implementation. The 10-bit address is used to index the tag ram arrays. The two-word locator bits are appended to the 10-bit set address to address the cache ram arrays.
Copy Back Architecture: When there is a store request from the memory stage of the pipeline, the corresponding cache array entry is updated if the block already exists in the cache. On a cache miss, the block which includes the address requested by the store operation is brought into the cache from the external memory (write allocate on a write miss) and the required location is updated. This policy is especially beneficial when frequent writes to a memory location (store instructions) occur, since there is no need to access external memory once the word is in the cache. This policy uses the memory bandwidth more efficiently than the write-through policy, in which each store writes to the external memory.
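The bandwidth argument can be made concrete with a deliberately small Python sketch of a single cache line using copy back with write allocate. It is a simplification under stated assumptions, not the authors' design: the model holds only one block, the block size of four words and the names CopyBackLine, mem_reads and mem_writes are hypothetical, and external memory is a plain list of words.

```python
WORDS_PER_BLOCK = 4  # ASSUMPTION: block size chosen for this sketch only


class CopyBackLine:
    """A single cache line with a dirty bit, copy back and write allocate."""

    def __init__(self, memory):
        self.memory = memory      # external memory, one list entry per word
        self.block_addr = None    # block currently held, if any
        self.data = None
        self.dirty = False
        self.mem_reads = 0        # block reads from external memory
        self.mem_writes = 0       # block writes to external memory

    def _fill(self, block_addr):
        # Write the old block back only if it was modified while cached.
        if self.dirty:
            base = self.block_addr * WORDS_PER_BLOCK
            self.memory[base:base + WORDS_PER_BLOCK] = self.data
            self.mem_writes += 1
        base = block_addr * WORDS_PER_BLOCK
        self.data = self.memory[base:base + WORDS_PER_BLOCK]
        self.block_addr = block_addr
        self.dirty = False
        self.mem_reads += 1

    def store(self, word_addr, value):
        block_addr, offset = divmod(word_addr, WORDS_PER_BLOCK)
        if block_addr != self.block_addr:
            self._fill(block_addr)  # write allocate on a write miss
        self.data[offset] = value
        self.dirty = True

    def load(self, word_addr):
        block_addr, offset = divmod(word_addr, WORDS_PER_BLOCK)
        if block_addr != self.block_addr:
            self._fill(block_addr)
        return self.data[offset]
```

In this model, five consecutive stores to the same word cause one block read and no external write; the external write happens once, when the dirty block is evicted. A write-through cache would have written to external memory on every store.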
Data Cache Parity: Error detection for cache tags and data is implemented using two parity bits per tag and per 4-byte data sub-block. The tag parity is generated from the tag value and the LRU, dirty and valid bits. The data parity is derived from the sub-block data. The parity bits are written simultaneously with the associated tag or sub-block and checked on each access. The two parity bits correspond to the parity of the odd and the even data (tag) bits.
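The odd/even parity scheme can be sketched in a few lines of Python. The paper does not state the parity polarity or the bit numbering, so the sketch assumes XOR (even) parity and counts bit 0 as an even bit; the function names are hypothetical.

```python
def odd_even_parity(value, width):
    """Return (parity of odd-numbered bits, parity of even-numbered bits)."""
    p_odd = 0
    p_even = 0
    for i in range(width):
        bit = (value >> i) & 1
        if i % 2:
            p_odd ^= bit   # bits 1, 3, 5, ...
        else:
            p_even ^= bit  # bits 0, 2, 4, ...
    return p_odd, p_even


def parity_ok(value, width, stored_parity):
    """Check a word read from the cache against its stored parity bits."""
    return odd_even_parity(value, width) == stored_parity
```

For a 32-bit sub-block, the parity pair would be written together with the data and recomputed on each access; a single flipped bit changes exactly one of the two parity bits and is therefore detected.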

APB Bridge
The APB bridge acts as the master for the APB slaves: four ACE, two UARTs, the interrupt registers and four timers. The APB bridge converts the AHB signals from the bus master to the corresponding signals in APB. The Memory Configuration Register specifies the number of wait states for the different memory banks and the internal RAM.
The select signal for a memory-mapped register is generated by decoding the address (haddr). Hwrite=1 indicates a register write operation. Hwrite=0 indicates a register read operation. The ACE, UART, timer, interrupt, and processor configuration registers are read or written in a single clock cycle, after which hready_apb is asserted. For ACE access, hready_apb is asserted when the already signal is asserted. The Processor Configuration Register is shown in Table 1.

AMBA Advanced Peripheral Bus
The APB Bridge is the only bus master on the AMBA APB. The APB Bridge is also a slave on the higher-level system bus (for example AHB). The bridge unit converts system bus transfers into APB transfers and performs the following functions:
- Latches the address and holds it valid throughout the transfer.
- Decodes the address and generates a peripheral select, PSELx. Only one select signal can be active during a transfer.
- Drives the data onto the APB for a write transfer.
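The three bridge functions listed above can be sketched as a behavioural Python model. This is an illustration under stated assumptions rather than the authors' apb_bridge.vhd: the class name ApbBridge and the address map passed to it are hypothetical, and timing (setup and enable phases) is not modelled.

```python
class ApbBridge:
    """Behavioural sketch: latch the address, decode PSELx, drive write data."""

    def __init__(self, address_map):
        # address_map: list of (base, size) tuples, one per APB slave.
        # ASSUMPTION: the map is supplied by the user; the paper gives none.
        self.address_map = address_map
        self.paddr = None
        self.pwrite = None
        self.pwdata = None
        self.psel = [0] * len(address_map)

    def start_transfer(self, haddr, hwrite, hwdata=None):
        # Latch the address so it stays valid for the whole transfer.
        self.paddr = haddr
        # Decode the address into one select line per peripheral.
        self.psel = [1 if base <= haddr < base + size else 0
                     for base, size in self.address_map]
        if sum(self.psel) > 1:
            raise ValueError("address map overlaps: more than one PSELx active")
        # Drive write data onto the APB only for a write transfer.
        self.pwrite = bool(hwrite)
        self.pwdata = hwdata if hwrite else None
        return self.psel
```

With a hypothetical map of three 256-byte regions starting at 0x000, 0x100 and 0x200, a write to address 0x104 activates only the second select line.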

Code Coverage Analysis
Code coverage is a verification technique used to identify which code has been executed (Figure 3). It is examined after simulation. A design may appear to be correct and still contain unknown bugs: even when all of the test benches simulate successfully, it is hardly possible to know with one hundred percent certainty that the design is functionally correct. The main objective of code coverage is to find out which code the test benches have failed to exercise in the design.
The term "test bench" refers to the stimulus used to apply a predetermined input sequence to the design and to examine its response. The test bench describes the stimulus for the DUT and is responsible for checking its outputs. Here, the test bench is written in SystemVerilog with a preset input sequence, and it may draw on external data files. Figure 4 shows the analysis window for the statement coverage verification, which is generated after simulation. A tick (✓) marks a statement in the DUT that was executed during simulation, and a cross (x) marks a statement that was not executed, so the statements that were never exercised can be identified and browsed quickly.

Figure 4: Statement coverage
FSM coverage: Because an FSM is usually coded as choices in a case statement, an unvisited state shows up as uncovered statements. FSM coverage additionally identifies which state transitions were exercised during verification. Figure 5 shows the bubble diagram for the FSM, which indicates the state transitions of the decoder section.

Branch and Toggle coverage:
A signal is considered to have fully toggled when it has experienced at least one rising edge and at least one falling edge during the simulation. The coverage windows obtained from the simulation report the branches and toggles present in the AMBA design, and the results indicate that the design behaved in a functionally correct manner.
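The toggle criterion can be stated precisely with a short Python sketch. It assumes a single-bit signal recorded as a list of 0/1 samples, one per simulation time step; the function name is hypothetical, and a simulator would also have to deal with X and Z values, which are ignored here.

```python
def fully_toggled(samples):
    """True if the sampled bit shows at least one rising and one falling edge."""
    rising = False
    falling = False
    for prev, cur in zip(samples, samples[1:]):
        if prev == 0 and cur == 1:
            rising = True
        elif prev == 1 and cur == 0:
            falling = True
    return rising and falling
```

A signal that only ever rises, such as one sampled as 0, 1, 1, is not fully toggled under this definition, which is why reset and idle phases usually have to be revisited in the stimulus to close toggle coverage.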
Transition Coverage: Transition coverage measures the occurrence of sequences of values. It can involve more than two consecutive values of the same coverage point; however, the number of possible bins grows factorially with the number of transition states. Mechanically, transition coverage is identical to coverage points: specific values are sampled at specific locations and at specific points in time, and are collected in specific bins. Tables 2 and 3 show the assertions for the Timer module and the UART module.
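The idea of transition bins can be illustrated with a small Python sketch that counts how often given value sequences appear among consecutive samples of one coverage point. The function name and the example state names are hypothetical, and the sketch does not reproduce the sampling semantics of a SystemVerilog covergroup.

```python
def transition_hits(samples, bins):
    """Count occurrences of each value sequence (bin) in consecutive samples."""
    hits = {b: 0 for b in bins}
    for b in bins:
        n = len(b)
        for i in range(len(samples) - n + 1):
            if tuple(samples[i:i + n]) == b:
                hits[b] += 1
    return hits


# Sampled values of one coverage point, for example an FSM state.
trace = ["IDLE", "INSTR", "IDLE", "DATA", "IDLE", "INSTR"]
bins = [("IDLE", "INSTR"), ("IDLE", "DATA", "IDLE"), ("DATA", "INSTR")]
report = transition_hits(trace, bins)
```

Here the bin ("DATA", "INSTR") is never hit, which is the kind of hole that transition coverage is meant to expose.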

Results and discussion
The functional verification of AMBA is carried out using the Mentor Graphics Questasim tool in code coverage mode with the SystemVerilog language. The SystemVerilog simulation is performed to verify the AMBA design by using the VIP. The functional integrity of the DUT is checked using assertions and cover groups along with the necessary test inputs. Figure 6 shows the instance coverage analysis of the AMBA peripherals. The instance analysis reports the coverage of the individual modules in the AMBA peripherals, instance by instance. This window presents coverage statistics for each instance in a flat, non-hierarchical view and contains the same code coverage statistics columns as the Files and Structure windows.
The assertion coverage of the AMBA peripherals is also analyzed using the simulation tool. Assertions and properties give a clear pass or fail indication for the VIP module, since each assertion evaluates to true or false. A total of 17 assertions are used for the analysis, and the simulation results show that each of them was satisfied throughout. Hence, the assertion coverage of the proposed VIP module is 100%.
Figures 8 and 9 show the coverage analysis report of the AMBA peripherals, which gives the coverage obtained by the proposed VIP module for each of the AMBA peripherals. For the coverage analysis, the VIP considers seven aspects for each module: statements, branches, FEC condition trees, FEC expression trees, states, transitions, and toggle bins. In total, the proposed VIP achieved a coverage value of 91.4% for the AMBA peripherals, and none of the 17 assertions provided for the analysis failed at any point in the process.