This page provides detailed information about the Imperas Instruction Set Simulator for the ARM armv8-a (AArch64) processor core.
This page is information about the armv8-a alias of the AArch64 variant.
Processor IP owner is ARM Holdings. More information is available from them here.
The Imperas Instruction Set Simulator (ISS) is a product of Imperas Software Ltd. It is a binary licensed by Imperas as a commercial product. It is also available in OVP packages for evaluation and demonstration.
The Imperas ISS uses the OVP Fast Processor Models and dynamically loads them as selected.The OVP Fast Processor Models are written using the OVP VMI API that provides a Virtual Machine Interface that defines the behavior of the processor.The VMI API makes a clear line between model and simulator allowing very good optimization and world class high speed performance.
The processor models are provided as a binary shared object and are also available as source (different models have different licensing conditions). This allows the download and use of the model binary or the use of the source to explore and modify the model.
The models have been run through an extensive QA and regression testing process.
If you develop your own processor models using the OVP VMI APIs then they can be used with the Imperas ISS.
Traditionally, processor models and simulators make use of one thread on the host PC. Imperas have developed a technology, called QuantumLeap, that makes use of the many host cores found in modern PC/workstations to achieve industry leading simulation performance. To find out about the Imperas parallel simulation lookup Imperas QuantumLeap. There are videos of QuantumLeap on ARM here, and MIPS here. For press information related to QuantumLeap for ARM click here or for MIPS click here. Many of the OVP Fast Processor Models have been qualified to work with QuantumLeap - this is indicated for this model below.
This ISS executes instructions of the target architecture and provides an interface for debug access. An interface to GDB is provided and this allows the connection of many industry standard debuggers that use the GDB/RSP interface. For more information watch the OVP video here.
The ISS also works with the Imperas Multicore Debugger and advanced Verification, Analysis and Profiling tools.
The ISS is downloadable (needs registration and to be logged in) in package Demo_Processors for Windows32 and for Linux32. Note that the ISS is also available for 64 bit hosts as part of the commercial products from Imperas.
This ISS uses the CPU with Model Variant name: armv8-a (AArch64)
ARM Processor Model
Usage of binary model under license governing simulator usage.
Note that for models of ARM CPUs the license includes the following terms:
Licensee is granted a non-exclusive, worldwide, non-transferable, revocable licence to:
If no source is being provided to the Licensee: use and copy only (no modifications rights are granted) the model for the sole purpose of designing, developing, analyzing, debugging, testing, verifying, validating and optimizing software which: (a) (i) is for ARM based systems; and (ii) does not incorporate the ARM Models or any part thereof; and (b) such ARM Models may not be used to emulate an ARM based system to run application software in a production or live environment.
If source code is being provided to the Licensee: use, copy and modify the model for the sole purpose of designing, developing, analyzing, debugging, testing, verifying, validating and optimizing software which: (a) (i) is for ARM based systems; and (ii) does not incorporate the ARM Models or any part thereof; and (b) such ARM Models may not be used to emulate an ARM based system to run application software in a production or live environment.
In the case of any Licensee who is either or both an academic or educational institution the purposes shall be limited to internal use.
Except to the extent that such activity is permitted by applicable law, Licensee shall not reverse engineer, decompile, or disassemble this model. If this model was provided to Licensee in Europe, Licensee shall not reverse engineer, decompile or disassemble the Model for the purposes of error correction.
The License agreement does not entitle Licensee to manufacture in silicon any product based on this model.
The License agreement does not entitle Licensee to use this model for evaluating the validity of any ARM patent.
Source of model available under separate Imperas Software License Agreement.
Instruction pipelines are not modeled in any way. All instructions are assumed to complete immediately. This means that instruction barrier instructions (e.g. ISB, CP15ISB) are treated as NOPs, with the exception of any undefined instruction behavior, which is modeled. The model does not implement speculative fetch behavior. The branch cache is not modeled.
Caches and write buffers are not modeled in any way. All loads, fetches and stores complete immediately and in order, and are fully synchronous (as if the memory was of Strongly Ordered or Device-nGnRnE type). Data barrier instructions (e.g. DSB, CP15DSB) are treated as NOPs, with the exception of any undefined instruction behavior, which is modeled. Cache manipulation instructions are implemented as NOPs, with the exception of any undefined instruction behavior, which is modeled.
Real-world timing effects are not modeled: all instructions are assumed to complete in a single cycle.
Performance Monitors are implemented as a register interface only except for the cycle counter, which is implemented assuming one instruction per cycle.
TLBs are architecturally-accurate but not device accurate. This means that all TLB maintenance and address translation operations are fully implemented but the cache is larger than in the real device.
Debug registers are implemented but non-functional (which is sufficient to allow operating systems such as Linux to boot). Debug state is not implemented.
Models have been extensively tested by Imperas. ARM Cortex-A models have been successfully used by customers to simulate SMP Linux, Ubuntu Desktop, VxWorks and ThreadX on Xilinx Zynq virtual platforms.
AArch64 is implemented at EL3, EL2, EL1 and EL0.
The following ARMv8.2 core features are implemented: ARMv8.2-FP16, SVE
Security extensions are implemented (also known as TrustZone). To make non-secure accesses visible externally, override ID_AA64MMFR0_EL1.PARange to specify the required physical bus size (32, 36, 40, 42, 44, 48 or 52 bits) and connect the processor to a bus one bit wider (33, 37, 41, 43, 45, 49 or 53 bits, respectively). The extra most-significant bit is the NS bit, indicating a non-secure access. If non-secure accesses are not required to be made visible externally, connect the processor to a bus of exactly the size implied by ID_AA64MMFR0_EL1.PARange.
VMSA EL1, EL2 and EL3 stage 1 address translation is implemented. VMSA stage 2 address translation is implemented.
LPA (large physical address extension) is implemented as standard in ARMv8.
TLB behavior is controlled by parameter ASIDCacheSize. If this parameter is 0, then an unlimited number of TLB entries will be maintained concurrently. If this parameter is non-zero, then only TLB entries for up to ASIDCacheSize different ASIDs will be maintained concurrently initially; as new ASIDs are used, TLB entries for less-recently used ASIDs are deleted, which improves model performance in some cases (especially when 16-bit ASIDs are in use). If the model detects that the TLB entry cache is too small (entry ejections are very frequent), it will increase the cache size automatically. In this variant, ASIDCacheSize is 8
Advanced SIMD and Floating-Point Features:
SIMD and VFP instructions are implemented.
The model implements trapped exceptions if FPTrap is set to 1 in MVFR0 (for AArch32) or MVFR0_EL1 (for AArch64). When floating point exception traps are taken, cumulative exception flags are not updated (in other words, cumulative flag state is always the same as prior to instruction execution, even for SIMD instructions). When multiple enabled exceptions are raised by a single floating point operation, the exception reported is the one in least-significant bit position in FPSCR (for AArch32) or FPCR (for AArch64). When multiple enabled exceptions are raised by different SIMD element computations, the exception reported is selected from the lowest-index-number SIMD operation. Contact Imperas if requirements for exception reporting differ from these.
Trapped exceptions are implemented in this variant (FPTrap=1)
SVE is implemented, with 512 bit vectors. The exact set of supported vector widths can be specified using parameter "SVEImplementedSizes". This parameter is a mask in which vector size Nx128 is supported if bit 1<<(N-1) is set in the mask. For example, a mask value of 0xf indicates that 128, 256, 384 and 512 bit vector widths are supported. As a second example, a value of 0x8b indicates that 128, 256, 512 and 1024 bit vector widths are supported. In this variant, "SVEImplementedSizes" is 0xf.
For SVE First-fault and Non-fault vector loads, the UNKNOWN value returned for faulting loads (or those disabled by the FFR) can be specified using parameter "SVEFaultUnknown". In this variant, "SVEFaultUnknown" is 0xdfdfdfdfdfdfdfdf.
When operating with a vector length less than the maximum length, the value of inaccessible vector and predicate elements is preserved.
Generic Timer is present. Use parameter "override_timerScaleFactor" to specify the counter rate as a fraction of the processor MIPS rate (e.g. 10 implies Generic Timer counters increment once every 10 processor instructions).
It is possible to enable model debug features in various categories. This can be done statically using the "override_debugMask" parameter, or dynamically using the "debugflags" command. Enabled debug features are specified using a bitmask value, as follows:
Value 0x004: enable debugging of MMU/MPU mappings.
Value 0x080: enable debugging of all system register accesses.
Value 0x100: enable debugging of all traps of system register accesses.
Value 0x200: enable verbose debugging of other miscellaneous behavior (for example, the reason why a particular instruction is undefined).
Value 0x400: enable debugging of Performance Monitor timers
Value 0x800: enable dynamic validation of TLB entries against in-memory page table contents (finds some classes of error where page table entries are updated without a subsequent flush of affected TLB entries).
All other bits in the debug bitmask are reserved and must not be set to non-zero values.
This model implements a number of non-architectural pseudo-registers and other features to facilitate integration.
Memory Transaction Query:
Two registers are intended for use within memory callback functions to provide additional information about the current memory access. Register transactPL indicates the processor execution level of the current access (0-3). Note that for load/store translate instructions (e.g. LDRT, STRT) the reported execution level will be 0, indicating an EL0 access. Register transactAT indicates the type of memory access: 0 for a normal read or write; and 1 for a physical access resulting from a page table walk.
Page Table Walk Query:
A banked set of registers provides information about the most recently completed page table walk. There are up to six banks of registers: bank 0 is for stage 1 walks, bank 1 is for stage 2 walks, and banks 2-5 are for stage 2 walks initiated by stage 1 level 0-3 entry lookups, respectively. Banks 1-5 are present only for processors with virtualization extensions. The currently active bank can be set using register PTWBankSelect. Register PTWBankValid is a bitmask indicating which banks contain valid data: for example, the value 0xb indicates that banks 0, 1 and 3 contain valid data.
Within each bank, there are registers that record addresses and values read during that page table walk. Register PTWBase records the table base address, register PTWInput contains the input address that starts a walk, register PTWOutput contains the result address and register PTWPgSize contains the page size (PTWOutput and PTWPgSize are valid only if the page table walk completes). Registers PTWAddressL0-PTWAddressL3 record the addresses of level 0 to level 3 entries read, respectively. Register PTWAddressValid is a bitmask indicating which address registers contain valid data: bits 0-3 indicate PTWAddressL0-PTWAddressL3, respectively, bit 4 indicates PTWBase, bit 5 indicates PTWInput, bit 6 indicates both PTWOutput and PTWPgSize. For example, the value 0x73 indicates that PTWBase, PTWInput, PTWOutput, PTWPgSize and PTWAddressL0-L1 are valid but PTWAddressL2-L3 are not. Register PTWAddressNS is a bitmask indicating whether an address is in non-secure memory: bits 0-3 indicate PTWAddressL0-PTWAddressL3, respectively, bit 4 indicates PTWBase, bit 6 indicates PTWOutput (PTWInput is a VA and thus has no secure/non-secure info). Registers PTWValueL0-PTWValueL3 contain page table entry values read at level 0 to level 3. Register PTWValueValid is a bitmask indicating which value registers contain valid data: bits 0-3 indicate PTWValueL0-PTWValueL3, respectively.
Artifact Page Table Walks:
Registers are also available to enable a simulation environment to initiate an artifact page table walk (for example, to determine the ultimate PA corresponding to a given VA). Register PTWI_EL1S initiates a secure EL1 table walk for a fetch. Register PTWD_EL1S initiates a secure EL1 table walk for a load or store (note that current ARM processors have unified TLBs, so these registers are synonymous). Registers PTW[ID]_EL1NS initiate walks for non-secure EL1 accesses. Registers PTW[ID]_EL2 initiate EL2 walks. Registers PTW[ID]_S2 initiate stage 2 walks. Registers PTW[ID]_EL3 initiate AArch64 EL3 walks. Finally, registers PTW[ID]_current initiate current-mode walks (useful in a memory callback context). Each walk fills the query registers described above.
MMU and Page Table Walk Events:
Two events are available that allow a simulation environment to be notified on MMU and page table walk actions. Event mmuEnable triggers when any MMU is enabled or disabled. Event pageTableWalk triggers on completion of any page table walk (including artifact walks).
Artifact Address Translations:
A simulation environment can trigger an artifact address translation operation by writing to the architectural address translation registers (e.g. ATS1CPR). The results of such translations are written to an integration support register artifactPAR, instead of the architectural PAR register. This means that such artifact writes will not perturb architectural state.
A simulation environment can cause TLB state for one or more address translation regimes in the processor to be flushed by writing to the artifact register ResetTLBs. The argument is a bitmask value, in which non-zero bits select the TLBs to be flushed, as follows:
Bit 0: EL0/EL1 stage 1 secure TLB
Bit 1: EL0/EL1 stage 1 non-secure TLB
Bit 2: EL2 stage 1 TLB
Bit 3: EL0/EL1 non-secure stage 2 TLB
Bit 4: EL3 stage 1 TLB
Halt Reason Introspection:
An artifact register HaltReason can be read to determine the reason or reasons that a processor is halted. This register is a bitfield, with the following encoding: bit 0 indicates the processor has executed a wait-for-event (WFE) instruction; bit 1 indicates the processor has executed a wait-for-interrupt (WFI) instruction; and bit 2 indicates the processor is held in reset.
System Register Access Monitor:
If parameter "enableSystemMonitorBus" is True, an artifact 32-bit bus "SystemMonitor" is enabled for each PE. Every system register read or write by that PE is then visible as a read or write on this artifact bus, and can therefore be monitored using callbacks installed in the client environment (use opBusReadMonitorAdd/opBusWriteMonitorAdd or icmAddBusReadCallback/icmAddBusWriteCallback, depending on the client API). The format of the address on the bus is as follows:
bits 31:26 - zero
bit 25 - 1 if AArch64 access, 0 if AArch32 access
bit 24 - 1 if non-secure access, 0 if secure access
bits 23:20 - CRm value
bits 19:16 - CRn value
bits 15:12 - op2 value
bits 11:8 - op1 value
bits 7:4 - op0 value (AArch64) or coprocessor number (AArch32)
bits 3:0 - zero
As an example, to view non-secure writes to writes to CNTFRQ_EL0 in AArch64 state, install a write monitor on address range 0x020e0330:0x020e0333.
System Register Implementation:
If parameter "enableSystemBus" is True, an artifact 32-bit bus "System" is enabled for each PE. Slave callbacks installed on this bus can be used to implement modified system register behavior (use opBusSlaveNew or icmMapExternalMemory, depending on the client API). The format of the address on the bus is the same as for the system monitor bus, described above.
The CPU model being used is downloadable (needs registration and to be logged in) in package arm.model for Windows32 and for Linux32. Note that the CPU model is also available for 64 bit hosts as part of the commercial products from Imperas.
OVP simulator downloadable (needs registration and to be logged in) in package OVPsim for Windows32 and for Linux32. Note that the OVP simulator is also available for 64 bit hosts as part of the commercial products from Imperas.
OVP Download page here.
OVP documentation that provides overview information on processor models is available OVP_Guide_To_Using_Processor_Models.pdf.
Full model specific documentation on the variant (armv8-a (AArch64)) being used in this ISS is available OVP_Model_Specific_Information_arm_AArch64.pdf.
For more information on the Imperas ISS see the Imperas site and on the OVP Fast Processor model see the OVPworld site.
Location: The Fast Processor Model source and object file is found in the installation VLNV tree: arm.ovpworld.org/processor/arm/1.0
Processor Endian-ness: This model can be set to either endian-ness (normally by a pin, or the ELF code).
Processor ELF Code: The ELF code for this model is: 0xb7
QuantumLeap Support: The processor model is qualified to run in a QuantumLeap enabled simulator.
Information on the armv8-a OVP Fast Processor Model can also be found on other web sites::
www.ovpworld.org has the library pages http://www.ovpworld.org/library/wikka.php?wakka=CategoryProcessor
www.imperas.com has more information on the model library
http://www.ovpworld.org: Creating Instruction Accurate Processor models using the VMI API
http://www.ovpworld.org: Function by function Reference Guide for BHM / PPM APIs.
Currently available Instruction Set Simulator (ISS) Families.