# Field Configurable System-on-Chip Device Architecture

Steve Knapp, Director of Applications, and Danesh Tavana, VP of Engineering

Triscend Corporation, 301 N. Whisman Road, Mountain View, CA 94043

#### **Abstract**

Time-to-market pressures, increasing system complexity, and smaller process geometries are creating a market vacuum that will increasingly be addressed by an important emerging category of devices: the Configurable System-on-Chip (CSoC). These application-specific programmable parts (ASPPs) are single-chip combinations of microprocessors, memory, dedicated peripheral functions, and embedded programmable logic. They provide unprecedented time-to-market benefits and field customization for the electronic systems of the upcoming decade. Integrating microprocessors, memory, peripherals, and programmable logic is made possible by a new bus architecture called the Configurable System Interconnect (CSI) bus. The CSI bus was specifically designed to facilitate re-use, guarantee timing, increase system throughput, and reduce system debug time in applications with intense time-to-market pressure and field-upgrade requirements.



Figure 1. Embedded Systems Challenge

## Introduction

Six competing requirements challenge the embedded systems designer: time-to-market, performance, cost, physical size, power consumption, and product features (Figure 1). The designer's task is to find the best compromise among these requirements in order to deliver the most effective product to the customer. For many applications, Triscend's Configurable System-on-Chip (CSoC) offers designers a more attractive balance among these factors than alternative solutions. Design re-use and field upgrade through remote software download, when combined with a CSoC device, fundamentally change the rules of embedded system design by breaking the link between time-to-market and product features.

#### **CSoC Introduction**

Embedded system designers develop products with a specific function or application in mind. In almost every application there is a central processing unit that is responsible for overall system supervision, handling of complex state machines, and number-crunching algorithms. The central processor also manages the system's memory and I/O, the peripherals that interface to various entry or display devices. In contrast to personal computers, embedded system applications are usually differentiated in both hardware and software. Unlike the PC world, with its chip sets and prevailing standards, the embedded system world is plagued by innumerable design choices and a lack of standards. Triscend's CSoC device is an ideal single-chip device for the embedded system designer who wants both hardware and software flexibility on a single extensible platform.

The CSoC's processor is a popular industry-standard core that is supported by leading third-party compiler, assembler, and debugger tool vendors. It is a familiar, proven architecture with a large body of available freeware applications. The processor's performance is boosted through pipelining in a "Turbo" mode of operation and by employing advanced CMOS process technologies.

An integrated DMA controller offloads large data transfers from the processor, freeing it for more important tasks. A glue-less interface to external byte-wide Flash, EPROM, or SRAM improves performance and eliminates the external latches that are typically required in time-multiplexed address/data bus interfaces. In applications requiring even higher performance with lower power dissipation, an on-chip SRAM ranging in density from 8K bytes to 64K bytes provides data or code storage for the processor, and data buffer space for the DMA controller.

The system comes with an integrated on-chip bus specially designed to ensure single-cycle data transfers and programmable address decoding for the peripherals embedded in the chip. This bus facilitates "soft module" re-use by abstracting the processor-specific signaling requirements away from the peripherals, and by supporting a friendly drag-and-drop software tool for developing microcontroller derivatives.

Programmable I/O (PIO) pins operate independently from the bus and are tightly coupled to an on-chip embedded programmable logic core. Flexible pin assignment for optimum PCB layout and a host of field programmable options like output drive strength, slew rate control, pull up, pull down, registered I/O, or low power operation are some of the user selectable features for each package pin.



Figure 2. Triscend's E5 Configurable System on Chip Block Diagram

Finally, there are large amounts of on-chip programmable logic with sophisticated built-in system debugging hardware that includes a JTAG interface and a breakpoint unit. The chip is supported by sophisticated software that can alter the device's operation in the field. The software is also used to debug the device using standard logic debugging or processor in-circuit emulation (ICE) techniques. Triscend Corporation's E5 device (Figure 2) is a single-chip CSoC that is the ideal embedded system platform for creating flexible, fast time-to-market applications.

Implementing the E5 as a dedicated logic chip with only a small re-configurable area makes it one to two orders of magnitude more efficient in silicon area than a similar implementation in a pure programmable logic device. Other advantages of the integration include higher performance and lower power. An important architectural feature of the E5 CSoC is the Configurable System Interconnect (CSI) bus, which allows each resource on the bus to communicate with the other resources, thus leveraging and building upon the existing resources.

#### **CSI Bus Introduction**

The configurable system interconnect (CSI) bus is designed to facilitate design re-use within the configurable system logic (CSL). The bus is distributed throughout the CSL in fixed logic. It connects to and operates with all system masters, including the processor, DMA controller, JTAG interface, and the external Memory Interface Unit (MIU). Memory-mapped and DMA slaves may be implemented to handle data transactions. User logic may obtain the services of the processor and DMA controller as proxy masters through an interrupt or a DMA request, respectively. The user is assured that the bus will reach any logic implemented within the CSL, and the maximum bus performance specification can be met regardless of the quality of the placement algorithm's results.

The primary objective in the design of this bus was to make it easy to use. The bus operates synchronously. The default transaction duration is one clock cycle. Wait states may be added when necessary. The appropriate bus signals may be configured to connect to the CSL logic as required for the user's design. Additionally, the synchronous decode of address is managed by selectors. There are many selectors distributed throughout the CSL. Each selector provides simple read and write signals to be routed to the user logic. The addresses and commands that a selector responds to may be individually configured.

### **CSI Bus Architecture**

The bus operates synchronously and supports multiple masters and slaves, DMA, and wait states. The bus is pipelined, has separate write and read data paths, uses multiplexed or logical OR networks to combine signal sources, and supports a fast default cycle with optional wait states when necessary. A round-robin arbitration scheme is implemented for masters. The slave side includes multiple decoded and qualified read and write enable signals generated by selectors. A logical bus architectural diagram and the signal flow are shown in Figure 3. There are four primary bus segments (master write, master read, slave write, and slave read), which distribute and collect the bus signals prior to the pipeline registers. The multi-source instances of all signals in each collection segment are combined via logical OR gates or multiplexed networks into a consolidated bus. This avoids the excess power consumption that contention can cause in tri-state bus structures.
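The text states only that arbitration among masters is round-robin and does not give the policy's details. The following is a minimal sketch, assuming a rotating-priority scheme in which the search for the next grant starts just after the most recently granted master. The class name, the method name, and the ordering of the masters are hypothetical; only the set of masters comes from the text.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Rotating-priority arbiter (assumed policy, not taken from the E5 documentation)."""

    def __init__(self, masters: Sequence[str]) -> None:
        if not masters:
            raise ValueError("at least one master is required")
        self.masters = list(masters)
        # Start with the last index so that the first search begins at index 0.
        self.last = len(self.masters) - 1

    def grant(self, requests: Sequence[bool]) -> Optional[str]:
        """Return the master granted this cycle, or None if nobody is requesting."""
        n = len(self.masters)
        if len(requests) != n:
            raise ValueError("one request flag per master is required")
        # Scan every master once, beginning just after the previous winner.
        for offset in range(1, n + 1):
            idx = (self.last + offset) % n
            if requests[idx]:
                self.last = idx
                return self.masters[idx]
        return None


# Hypothetical ordering of the masters named in the text.
arbiter = RoundRobinArbiter(["cpu", "dma", "jtag", "miu"])
all_requesting = [True, True, True, True]
grants = [arbiter.grant(all_requesting) for _ in range(5)]
```

With every master requesting continuously, each one is granted the bus once before any master is granted it a second time, which is the fairness property that round-robin arbitration is chosen for.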



| Signal | Description                                  |
|--------|----------------------------------------------|
| arb    | Arbitration request and early response       |
| mw_    | Master write collection segment              |
| mr_    | Master read distribution segment             |
| sw_    | Slave write distribution segment             |
| sr_    | Slave read collection segment                |
| lw_    | Non-CSL slave interface distribution segment |
| lr_    | Non-CSL slave interface collection segment   |
| fw_    | CSL interface distribution segment           |
| fr_    | CSL interface collection segment             |

Figure 3. CSI Bus architecture and Bus signal description

The data bus signals within the CSL travel in one of two directions, either from the system masters to the slave logic on a write operation (labeled as fw\_ in Figure 3), or from CSL slave logic to system masters on a read operation (labeled as fr\_ in that same figure). All bus signals flowing to the user logic are referred to as "write" signals and all bus signals from the user logic as "read" signals.

## **Bus Distribution in CSL**

The CSI bus has separate address, command, write data, and read data signals. Figure 4a shows the CSI Socket bus diagram. Within the CSL there are many copies of the synchronized bus signals, each serving as an entry or exit point.



Figure 4a. Configurable System Interconnect (CSI) bus and the socket interface to the CSL matrix

CSI bus signals do not need to travel very far through the routing resources of the CSL. All signals are synchronized throughout the CSL at bank boundaries just before entering the user networks, as shown in Figure 4b. A complete set of the shared or common physical bus signals is available in each CSL bank. A bank includes 128 logic cells and 8 selectors. All of the synchronized data, address, and control signals are logically equivalent and may be treated as aliases.



Figure 4b. Vertical and horizontal breakers separate the individual CSL banks with Configurable System Interconnect (CSI) bus resources

The CSL configuration software can usually improve both performance and routability by selecting the closest available alias. The logic block architecture and programmable routing scheme for the CSL were determined prior to the addition of the bus. The bus signals enter general user logic within the CSL through existing interconnect resources. The logic block of each configurable logic cell includes a four-input LUT and a register bit. Each pair of these cells is grouped with an associated programmable switching matrix and routing channels and then used as a fundamental building block, or logic tile. Logic tiles are grouped into 8x8 banks that are arrayed two-dimensionally to provide the required amount of Configurable System Logic embedded within a CSoC.
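The per-bank figures quoted in this section are mutually consistent: 8x8 tiles with a pair of cells each gives 128 logic cells, alongside the 8 selectors stated for each bank. The short sketch below makes that arithmetic explicit and scales it to an array of banks. The function and class names are hypothetical, and the 2x2 array is only an illustration, not the size of any particular E5 device.

```python
from dataclasses import dataclass

TILES_PER_BANK_SIDE = 8  # banks are 8x8 arrays of logic tiles (from the text)
CELLS_PER_TILE = 2       # each tile groups a pair of LUT-plus-register cells (from the text)
SELECTORS_PER_BANK = 8   # stated per-bank selector count (from the text)


@dataclass(frozen=True)
class CslResources:
    banks: int
    tiles: int
    logic_cells: int
    selectors: int


def csl_resources(bank_rows: int, bank_cols: int) -> CslResources:
    """Total CSL resources for a two-dimensional array of banks."""
    if bank_rows < 1 or bank_cols < 1:
        raise ValueError("the bank array needs at least one row and one column")
    banks = bank_rows * bank_cols
    tiles = banks * TILES_PER_BANK_SIDE ** 2
    return CslResources(
        banks=banks,
        tiles=tiles,
        logic_cells=tiles * CELLS_PER_TILE,
        selectors=banks * SELECTORS_PER_BANK,
    )


one_bank = csl_resources(1, 1)    # 64 tiles, 128 logic cells, 8 selectors
two_by_two = csl_resources(2, 2)  # illustrative array size, not a real device
```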

Every bank boundary has a complete set of bus pipeline registers. CSI bus signals to and from the banks are pipelined, allowing the number of banks to be scaled while maintaining constant bus interface timing characteristics at the user logic.

There is one selector per column of cells in the boundary above each bank. Each selector delivers decoded and pipelined read and write signals. Both the set of address bits included in its decode and the decode value are fully programmable. Bus signals are distributed to the banks by a cascade of buffers that is repeated once per bank. They are collected from the banks by a similar cascade of OR gates. A bus signal crosses the chip only once in the horizontal and vertical dimensions.
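A selector with a programmable choice of address bits and a programmable decode value behaves like a mask-and-match comparator. The sketch below models it that way. The `Selector` class, its field names, and the example addresses are assumptions made for illustration, and the model is purely combinational, whereas the text describes the real outputs as pipelined.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Selector:
    mask: int   # 1 bits mark the address bits that take part in the decode
    match: int  # value those address bits must have for the selector to respond

    def decode(self, address: int, read: bool, write: bool) -> Tuple[bool, bool]:
        """Return (read_enable, write_enable) for one bus cycle."""
        hit = (address & self.mask) == (self.match & self.mask)
        return (hit and read, hit and write)


# Hypothetical configuration: ignore the low four address bits, so the
# selector responds to the 16-byte window 0x8000 through 0x800F.
sel = Selector(mask=0xFFF0, match=0x8000)

read_hit = sel.decode(0x8004, read=True, write=False)    # inside the window
read_miss = sel.decode(0x8010, read=True, write=False)   # just past the window
write_hit = sel.decode(0x800F, read=False, write=True)   # last address in the window
```

Clearing more bits in the mask widens the address window a selector covers, while setting every bit narrows it to a single address.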



Figure 5. Signal flow in a Logic Tile

Four vertical 32-bit data busses span each bank, two for write and two for read, with four bits per tile per bus and the least significant nibble to the right. This data bus is also used to access the CSL configuration memory. The data bus signals are buffered and pipelined at bank boundaries. The read data path is implemented as a chain of OR gates within the bank, with one OR gate per tile. Since an 8032 microprocessor has an 8-bit bus, all data bytes are combined into one byte at the top edge of the CSL. The write data bits are driven onto the existing routing resources within the tile. A horizontal 32-bit read data bus also spans each bank with four bits per tile. The horizontal data path is implemented as a bi-directional multiplexed chain, with the select control derived from the programmable routing channels and the select data from the logic tile's LUT, registers, or carry chain. Within each tile, the horizontal read data may be configured to connect to the corresponding vertical read data OR chain.
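The vertical read path can be sketched as follows, as an illustration only. It assumes that tiles which are not being read drive zeros onto the OR chain, that tile column 0 is the rightmost column (the least significant nibble), and that the four byte lanes are combined into one byte by a bitwise OR; the text does not say how the bytes are combined. All function names are hypothetical.

```python
from functools import reduce
from typing import Sequence

NIBBLE_BITS = 4
TILES_PER_SIDE = 8


def column_or_chain(nibbles: Sequence[int]) -> int:
    """OR together the 4-bit read values driven by the tiles in one column."""
    return reduce(lambda acc, n: acc | (n & 0xF), nibbles, 0)


def bank_read_word(tile_nibbles: Sequence[Sequence[int]]) -> int:
    """Build the 32-bit read word; tile_nibbles[row][col], col 0 is the rightmost column."""
    word = 0
    for col in range(TILES_PER_SIDE):
        column = [tile_nibbles[row][col] for row in range(TILES_PER_SIDE)]
        word |= column_or_chain(column) << (NIBBLE_BITS * col)
    return word


def fold_to_byte(word: int) -> int:
    """Assumed combination rule: OR the four byte lanes into a single byte."""
    return reduce(lambda acc, shift: acc | ((word >> shift) & 0xFF), (0, 8, 16, 24), 0)


def empty_bank() -> list:
    """A bank in which every tile drives zero onto the read bus."""
    return [[0] * TILES_PER_SIDE for _ in range(TILES_PER_SIDE)]


# A slave occupying the two rightmost tile columns returns the byte 0x5A.
low_lane = empty_bank()
low_lane[3][0] = 0xA
low_lane[5][1] = 0x5
low_word = bank_read_word(low_lane)

# A slave placed in byte lane 2 (tile columns 4 and 5) returns 0x3C.
lane_two = empty_bank()
lane_two[0][4] = 0xC
lane_two[0][5] = 0x3
lane_two_word = bank_read_word(lane_two)
```

Under these assumptions, an 8-bit slave returns the same byte to the processor regardless of which byte lane its tiles occupy, provided that only the addressed slave drives non-zero data.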

The signal flow associated with each logic tile is illustrated in Figure 5. Aside from the data busses, the general interconnect includes 48 nets within each tile. There are four sets of eight "short" nets (32 in total), each set extending to one of the four adjacent tiles, and two sets of eight "long" nets (16 in total) that span the bank, one set vertically and one horizontally. At bank boundaries, vertical long nets may be used as bus control signal inputs or outputs, while horizontal long nets may be used as bus address lines.

## Conclusion

This paper described a Configurable System-on-Chip device with an embedded, bus-oriented programmable logic core. The on-chip bus provides an ideal framework for delivering custom applications with quick time-to-market while reinforcing good re-use disciplines.
