

## **Overview**

The T-Head XuanTie E907 is a fully synthesizable, high-end, microcontroller-class processor that is compatible with the RISC-V RV32IMA[F][D]C[P] ISA. It delivers considerable integer performance and enhanced, energy-efficient floating-point performance. Key features of the E907 include a single/double-precision FPU, a deeply optimized DSP execution unit supported by the CSI-DSP library, and fast interrupt response.

| T-Head XuanTie E907    |              |                       |
|------------------------|--------------|-----------------------|
| RV32IMA[F][D]C[P] Core |              |                       |
| CLIC / CLINT           | I-CACHE      | AXI4.0 Master         |
| T-Head Extension       | D-CACHE      | Peripheral AHB Master |
| PMP                    | FPU          | DSP                   |
| BTB                    | RAS          | BHT                   |
| Data Prefetch          | RISC-V Debug | HPM                   |

## **Features**

| Feature                            | Description |
|------------------------------------|-------------|
| Architecture                       | RV32IMA[F][D]C[P] |
| Pipeline                           | 5 stages (Integer) |
| Main Interface                     | AXI4.0 64-bit Master Port |
| Peripheral Interface               | AHB5 32-bit Master Port |
| Hybrid Branch Predictor            | BHT/BTB/RAS (Optional) |
| Instruction Cache                  | Up to 32KB (Optional) |
| Data Cache                         | Up to 32KB (Optional) |
| FPU                                | Architecture suited to double-precision floating point |
| DSP Enhancement                    | Deeply optimized DSP unit with the CSI-DSP library, compliant with RISC-V P-Extension v0.9 |
| Interrupts                         | Up to 240 interrupts + Non-maskable interrupt (NMI) |
| Hardware Performance Monitor (HPM) | HPM for performance profiling (Optional) |
| T-Head Extensions                  | T-Head MCU enhanced extensions include interrupt acceleration technology to reduce response latency and an enhanced ISA to improve instruction set performance |
| Sleep modes                        | Sleep and deep sleep modes |
| Debug                              | RISC-V Debug, various trigger settings, hardware/software breakpoints |

## **XuanTie E907 Components**

#### Processor Overview

The E907 processor adopts a mixed 16-/32-bit instruction set and implements a classic five-stage integer pipeline. It can also be configured with an FPU (single precision or single+double precision) and a DSP unit. E907 is designed for high-performance MCU/MPU applications with target frequencies from 400MHz to 1GHz.



#### Floating Point Unit (FPU)

Targeting motor and navigation applications, the E907 processor implements a powerful FPU to accelerate their algorithms. The FPU has the following features:

- Compliant with the RISC-V RV32F and RV32D extensions;
- Compliant with the IEEE 754 standard;
- A dedicated double-precision floating-point datapath when configured with RV32D, with single-precision operations reusing the same pipeline;
- 64-bit data cache read port to provide sufficient bandwidth for double-precision operations.

#### Memory Subsystem

E907 implements an optional instruction cache and an optional data cache. The master interface offers two configurations for the number of outstanding bus transactions: normal and rich. The "rich" configuration achieves higher bandwidth and accelerates memory accesses such as memory copy and memory set.

Both the instruction cache and the data cache have the following features:

- 2-way set-associative with a 32-byte cache line;
- FIFO cache replacement policy;
- Support software invalidate and clear (D-cache only) operations through extended instructions;
- Can be configured to 2KB/4KB/8KB/16KB/32KB.
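The cache parameters listed above determine how an address splits into offset, index, and tag fields. The following is a minimal sketch of that arithmetic, assuming a 32-bit address and that every bit above the index is used as tag; the function name and the tag assumption are illustrative and are not taken from the E907 documentation.

```python
LINE_BYTES = 32   # cache line size stated above
WAYS = 2          # 2-way set-associative, as stated above
ADDR_BITS = 32    # assumption: full RV32 address width

def cache_geometry(size_kb):
    """Return (sets, offset_bits, index_bits, tag_bits) for one cache size."""
    if size_kb not in (2, 4, 8, 16, 32):
        raise ValueError("size must be one of 2/4/8/16/32 KB")
    sets = size_kb * 1024 // (LINE_BYTES * WAYS)
    offset_bits = LINE_BYTES.bit_length() - 1   # log2 of the line size
    index_bits = sets.bit_length() - 1          # log2 of the set count
    tag_bits = ADDR_BITS - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits

if __name__ == "__main__":
    for kb in (2, 4, 8, 16, 32):
        sets, off, idx, tag = cache_geometry(kb)
        print(f"{kb:2d}KB: {sets} sets, offset={off}, index={idx}, tag={tag}")
```

Under these assumptions, each doubling of the cache size adds one index bit and removes one tag bit, while the offset field stays at 5 bits.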

#### DSP

The DSP execution unit is compliant with RISC-V P-Extension v0.9. It supports 8-/16-bit SIMD multiply, multiply-accumulate, and related operations. These are key operations for accelerating signal-processing and filter algorithms such as FFT, FIR, and IIR, and AI workloads such as matrix multiplication and vector multiplication.

The DSP execution unit can make full use of the 32 integer GPRs in E907, giving software enough register resources for optimization. Furthermore, E907 has optimized the micro-architecture to reduce execution latency and has adopted hybrid branch prediction to decrease the misprediction ratio. As a result, the CPI of key DSP library routines is close to 1.

In addition, the DSP execution unit has the following features:

- Optimized data prefetch mechanism;
- Appropriate GPR read and write ports for Zp64 64-bit arithmetic;
- A high-performance CSI-DSP library is supplied, fine-tuned through software and hardware co-optimization.
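As an illustration of the 8-bit SIMD multiply-accumulate style of operation described in this section, the sketch below models four signed 8x8 products taken from two packed 32-bit words and added to a 32-bit accumulator. The lane order (lane 0 in the least significant byte) and all function names are assumptions made for illustration; the model does not reproduce the encoding or exact semantics of any specific P-Extension instruction.

```python
def to_signed8(byte):
    """Interpret an 8-bit value as a two's-complement signed integer."""
    return byte - 256 if byte & 0x80 else byte

def lanes8(word):
    """Split a 32-bit word into four signed 8-bit lanes (lane 0 = lowest byte)."""
    return [to_signed8((word >> (8 * i)) & 0xFF) for i in range(4)]

def pack8(values):
    """Pack four small signed integers into one 32-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word

def simd_mac8(acc, a, b):
    """Return acc plus the sum of four signed 8x8 products, wrapped to 32 bits."""
    total = acc + sum(x * y for x, y in zip(lanes8(a), lanes8(b)))
    return total & 0xFFFFFFFF

if __name__ == "__main__":
    a = pack8([1, -2, 3, -4])
    b = pack8([5, 6, -7, 8])
    print(hex(simd_mac8(100, a, b)))
```

A dot product over a long vector then reduces to one such step per four elements, which is the pattern behind FIR filters and matrix multiplication kernels.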





#### Physical Memory Protection (PMP)

The E907 processor has an optional RISC-V PMP which allows machine and user privilege modes to access different address ranges. Only the machine mode has the authority to define the memory access permissions. If an unauthorized access is detected, an access fault exception is triggered. The PMP has the following features:

- Up to 16 regions can be configured;
- Read/write/execute memory protection;
- Minimum 128B address range.
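For reference, the RISC-V privileged specification defines a naturally aligned power-of-two (NAPOT) encoding for PMP address registers. The sketch below applies that standard encoding together with the 128B minimum stated above. Whether a given E907 configuration uses NAPOT or another address-matching mode is an assumption here, and the function names are illustrative.

```python
MIN_REGION_BYTES = 128  # minimum range stated above for the E907 PMP

def pmp_napot_addr(base, size):
    """Encode a NAPOT region as a pmpaddr value (standard RISC-V encoding)."""
    if size < MIN_REGION_BYTES:
        raise ValueError("region smaller than the 128B minimum")
    if size & (size - 1):
        raise ValueError("NAPOT size must be a power of two")
    if base % size:
        raise ValueError("NAPOT base must be aligned to the region size")
    # pmpaddr holds address bits [33:2]; the trailing ones encode the size.
    return (base >> 2) | ((size >> 3) - 1)

def pmp_napot_decode(pmpaddr):
    """Recover (base, size) from a NAPOT-encoded pmpaddr value."""
    trailing_ones = 0
    while (pmpaddr >> trailing_ones) & 1:
        trailing_ones += 1
    size = 1 << (trailing_ones + 3)
    base = (pmpaddr & ~((1 << (trailing_ones + 1)) - 1)) << 2
    return base, size

if __name__ == "__main__":
    value = pmp_napot_addr(0x20000000, 128)
    print(hex(value), [hex(v) for v in pmp_napot_decode(value)])
```

The base address `0x20000000` is a hypothetical example and does not refer to any documented E907 memory map.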

#### Core Local Interrupt Controller (CLIC)

The E907 processor implements the RISC-V standard interrupt controllers, CLIC and CLINT. The CLIC has the following features:

- Support up to 240 external interrupts;
- Up to 32 priority levels;
- Support level-triggered and positive/negative edge-triggered interrupt types;
- Support hardware vectored interrupts;
- The control registers are memory mapped.

#### Debug Components

The E907 processor implements version 0.13.2 of the RISC-V debug specification and uses standard JTAG to communicate between the host and the E907 debug unit. The debugger and probe have been heavily optimized and achieve download speeds of 800KB/s-900KB/s, 4 times faster than common solutions on the market. The debug unit supports the following features:

- Multi-level configurations to meet various system requirements;
- Support hardware/software breakpoints;
- Support a variety of trigger settings;
- Supply an independent master port to access SoC resources;
- Check and modify CPU register resources;
- Flexible support for single-step and multi-step execution.

#### Hardware Performance Monitor (HPM)

The E907 processor implements an optional RISC-V standard HPM that lets software developers profile performance. The HPM has the following features:

- Support branch prediction ratio profiling;
- Support cache miss rate profiling;
- Support profiling of executed instruction counts and CPU cycles;
- Support profiling in machine and user modes.
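The metrics listed above are ratios of raw counter values. The following is a minimal sketch of how such counters could be turned into CPI, misprediction ratio, and cache miss rate once they have been read out. The field names are hypothetical, and the sketch does not describe the E907 counter layout or event numbering, which this document does not specify.

```python
from dataclasses import dataclass

@dataclass
class HpmSample:
    """Raw counter values captured over one profiling window (hypothetical names)."""
    cycles: int
    instructions: int
    branches: int
    branch_mispredicts: int
    cache_accesses: int
    cache_misses: int

def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator counter is zero."""
    return numerator / denominator if denominator else 0.0

def summarize(sample):
    """Derive the profiling metrics named above from raw counters."""
    return {
        "cpi": ratio(sample.cycles, sample.instructions),
        "mispredict_ratio": ratio(sample.branch_mispredicts, sample.branches),
        "cache_miss_rate": ratio(sample.cache_misses, sample.cache_accesses),
    }

if __name__ == "__main__":
    sample = HpmSample(1200, 1000, 200, 10, 400, 20)
    for name, value in summarize(sample).items():
        print(f"{name}: {value:.3f}")
```

In practice the counters would be sampled before and after the code region of interest and the differences passed to `summarize`.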

#### Interfaces

The E907 has a 64-bit AMBA4 AXI master bus and a fast peripheral master bus to communicate with the external memory or peripheral IP. The internal requests can be allocated to either bus according to the address.

The fast peripheral master bus has a 32-bit data width and uses the AHB5 protocol. Transactions are sent out through the fast peripheral port directly after address calculation, bypassing the data cache pipeline. The fast peripheral bus therefore provides low-latency access to SRAM or slave IP.

#### T-Head MCU Enhanced Extensions (TME)

Targeting MCU/MPU applications, the XuanTie processor architecture extends the RISC-V specification, particularly to improve performance and interrupt response speed:

- Support fast interrupt handling with a response time of 20 CPU cycles, compared with over 50 cycles for a design that follows the standard spec;
- Support tail-chaining for both vectored and non-vectored interrupts;
- Support hardware interrupt stack swapping;
- Support NMI;
- Support Lockup;
- Support sleep and deep sleep;
- Support software reset operation;
- Support a configurable reset address through a top-level port during integration;
- 56 extended instructions, including cache maintenance, bitwise operations, load/store enhancements and interrupt acceleration.
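To put the interrupt response figures from the first bullet above into time units, the sketch below converts cycle counts to nanoseconds at the two ends of the E907 target frequency range. It is plain arithmetic on the quoted numbers; the 50-cycle baseline is treated as a lower bound because the source says "over 50 cycles", and the function name is illustrative.

```python
FAST_CYCLES = 20      # response time quoted above with the T-Head extensions
BASELINE_CYCLES = 50  # "over 50 cycles" quoted above, used as a lower bound

def cycles_to_ns(cycles, freq_mhz):
    """Convert a cycle count to nanoseconds at the given core clock in MHz."""
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    return cycles * 1000.0 / freq_mhz

if __name__ == "__main__":
    for freq_mhz in (400, 1000):
        fast = cycles_to_ns(FAST_CYCLES, freq_mhz)
        base = cycles_to_ns(BASELINE_CYCLES, freq_mhz)
        print(f"{freq_mhz} MHz: {fast:.1f} ns vs at least {base:.1f} ns")
```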



#### Software Ecosystems

- Supply a Keil-like integrated development environment (CDK) and support mainstream IDEs and debug probes such as IAR IDE, OpenOCD, Lauterbach debugger, and Segger J-Link;
- The performance-optimized compiler, assembler, linker, and binary tools are contributed to GNU and officially supported;
- Extended instructions are supported by GCC and LLVM;
- E907 is supported by QEMU;
- Deeply optimized CSI-DSP library;
- Code-size-optimized runtime library;
- Support FreeRTOS, uCOS, RT-Thread and AliOS-Things.

## **Processor Configuration Options**

The XuanTie E907 processor has configurable options that are set during integration.

| Feature               | Options                                            |
|-----------------------|----------------------------------------------------|
| FPU                   | No/Single/Single+Double                            |
| DSP                   | No/Yes                                             |
| Instruction Cache     | 2KB/4KB/8KB/16KB/32KB                              |
| Data Cache            | 2KB/4KB/8KB/16KB/32KB                              |
| Branch History Table  | Not included or 2Kb/4Kb/8Kb/16Kb                   |
| Branch Target Buffer  | Not included or 16-entry BTB                       |
| AXI Outstanding Reads | 2 (Area optimized) or 6 (Performance optimized)    |
| Interrupts            | 32/64/96/128/192/240                               |
| PMP                   | Not included or 4/8/12/16 regions                  |
| HPM                   | Not included or RISC-V standard HPM                |
| Debug Resources       | Minimum/Typical/Maximum                            |
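The option table can be captured as a small validation helper, for example to check a planned configuration before handing it to an integration flow. The sketch below is illustrative only: the class, field names, and default values are hypothetical and do not correspond to any T-Head tool or parameter file. A value of 0 stands for "not included"; it is allowed for the caches because the Features table marks them as optional.

```python
from dataclasses import dataclass, fields

# Allowed values per option, transcribed from the table above.
ALLOWED = {
    "fpu": {"none", "single", "single+double"},
    "dsp": {False, True},
    "icache_kb": {0, 2, 4, 8, 16, 32},
    "dcache_kb": {0, 2, 4, 8, 16, 32},
    "bht_kbit": {0, 2, 4, 8, 16},
    "btb_entries": {0, 16},
    "axi_outstanding_reads": {2, 6},
    "interrupts": {32, 64, 96, 128, 192, 240},
    "pmp_regions": {0, 4, 8, 12, 16},
    "hpm": {False, True},
    "debug": {"minimum", "typical", "maximum"},
}

@dataclass(frozen=True)
class E907Config:
    """Hypothetical container for the options in the table; defaults are arbitrary."""
    fpu: str = "none"
    dsp: bool = False
    icache_kb: int = 8
    dcache_kb: int = 8
    bht_kbit: int = 8
    btb_entries: int = 16
    axi_outstanding_reads: int = 2
    interrupts: int = 32
    pmp_regions: int = 0
    hpm: bool = False
    debug: str = "typical"

    def __post_init__(self):
        # Reject any value that the option table does not list.
        for field in fields(self):
            value = getattr(self, field.name)
            if value not in ALLOWED[field.name]:
                raise ValueError(f"{field.name}={value!r} is not a listed option")

if __name__ == "__main__":
    print(E907Config(fpu="single+double", dsp=True, icache_kb=32, dcache_kb=32))
```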

## PPA

| Metric                    | Value                  |
|---------------------------|------------------------|
| Dhrystone                 | 2.0 DMIPS/MHz (O2)     |
| Coremark                  | 3.85 Coremark/MHz (O3) |
| Linpackd                  | 185.2 KFLOPS/MHz       |
| Whetstone                 | 2.61 MFLOPS/MHz        |
| 8x8 MAC throughput        | 4 MAC/cycle            |
| 16x16 MAC throughput      | 2 MAC/cycle            |
| Frequency                 | 1.0GHz @ worst case    |
| Std cell area             | 132K Gates             |
| Dynamic power (Dhrystone) | 29uW/MHz               |
| Leakage power             | 1.4mW                  |

<sup>\*</sup> Process is TSMC 28nm HPCPlus, 9T, RVT&HVT. The above data is obtained after floor plan, with RC.

<sup>\*\*</sup> Configuration is RV32IMAC with 8KB ICACHE/8KB DCACHE/8Kb BHT/BTB/CLIC32.

<sup>\*\*\*</sup> Process is TSMC 28nm HPCPlus, 7T, HVT. Data only consists of logic cells.
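The per-MHz figures in the PPA table can be scaled to a chosen clock as a first-order estimate. The sketch below does this under loud assumptions: benchmark scores and dynamic power are taken to scale linearly with frequency, the dynamic figure is the Dhrystone workload value, and the footnotes indicate that the frequency and power data may come from different cell libraries, so the combined result is indicative only. The function name is illustrative.

```python
DMIPS_PER_MHZ = 2.0        # Dhrystone figure from the table
COREMARK_PER_MHZ = 3.85    # Coremark figure from the table
DYNAMIC_UW_PER_MHZ = 29.0  # dynamic power under Dhrystone, from the table
LEAKAGE_MW = 1.4           # leakage power from the table

def estimate(freq_mhz):
    """Scale the per-MHz figures linearly to a clock frequency in MHz."""
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    dynamic_mw = DYNAMIC_UW_PER_MHZ * freq_mhz / 1000.0
    return {
        "dmips": DMIPS_PER_MHZ * freq_mhz,
        "coremark": COREMARK_PER_MHZ * freq_mhz,
        "power_mw": dynamic_mw + LEAKAGE_MW,
    }

if __name__ == "__main__":
    for freq_mhz in (400, 1000):
        result = estimate(freq_mhz)
        print(freq_mhz, {key: round(value, 2) for key, value in result.items()})
```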