Ref: ARM System-on-Chip Architecture, 2nd Edition, by Steve Furber, Addison-Wesley
Due acknowledgement of the reference URL at:
http://thinkingeek.com/2013/01/09/arm-assembler-raspberry-pi-chapter-1/
Structure of the ARM processor
Below is the structure of the ARM1176JZF-S processor, commonly used in many microcontrollers/mobile devices.
We can see that each processor/core is made up of different components/functional units serving various purposes.
About the core/processor
The ARM1176JZF-S is a 32-bit processor/core, i.e. the computer word length is 32 bits, meaning the processor/core can handle a 32-bit instruction/data word in each clock cycle.
00101011 01101001 10110110 00110101
32-bit instruction/data
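As a rough illustration of what a 32-bit word looks like to software, the sketch below takes the bit pattern shown above and splits it into its four bytes. The name `split_word` is hypothetical, chosen only for this example.

```python
# The 32-bit pattern from the slide, written as four 8-bit groups.
WORD_BITS = "00101011 01101001 10110110 00110101"


def split_word(bits: str) -> list[int]:
    """Return the four byte values of a 32-bit pattern, most significant first."""
    groups = bits.split()
    if len(groups) != 4 or any(len(g) != 8 for g in groups):
        raise ValueError("expected four groups of 8 bits")
    return [int(g, 2) for g in groups]


# Join the groups and interpret the whole pattern as one unsigned integer.
word = int(WORD_BITS.replace(" ", ""), 2)

print(f"word  = 0x{word:08X}")                               # word  = 0x2B69B635
print("bytes =", [f"0x{b:02X}" for b in split_word(WORD_BITS)])
print("fits in 32 bits:", word < 2**32)                      # fits in 32 bits: True
```

The same 32 bits may be an instruction or a data value; only the context in which the processor reads them decides which.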
Supporting Units to the Core
Each core/processor is well supported by a number of components/units inside the processor chip for fast computation and data storage/retrieval.
The ARM1176JZF-S has:
33 general-purpose 32-bit registers (banked across processor modes, so only 16 of them, r0 to r15, are visible at any one time);
7 dedicated/specialized registers;
Arithmetic Logic Unit (ALU): the ALU performs all arithmetic and logic operations, and generates the condition codes that instructions use to set specific flags;
Vector Floating Point (VFP) coprocessor: for much faster floating-point arithmetic.
Memory Management Unit..
The processor's memory management unit (MMU) works with the cache memory system to control accesses to and from external/main memory;
The MMU also controls the translation of virtual addresses to physical addresses;
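The idea of virtual-to-physical translation can be sketched with a single-level lookup table. This is only a toy model: a real MMU uses multi-level page tables held in memory plus a translation lookaside buffer, and the 4 KB page size, the table contents and the name `translate` are all assumptions made for this example.

```python
PAGE_SHIFT = 12                 # assumed page size: 2**12 = 4096 bytes
PAGE_SIZE = 1 << PAGE_SHIFT

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {
    0x00010: 0x00400,
    0x00011: 0x00123,
}


def translate(vaddr: int) -> int:
    """Translate a 32-bit virtual address to a physical address."""
    vpn = vaddr >> PAGE_SHIFT             # upper bits select the page
    offset = vaddr & (PAGE_SIZE - 1)      # lower 12 bits are kept unchanged
    if vpn not in page_table:
        # A real MMU would raise a fault for the operating system to handle.
        raise LookupError(f"no mapping for virtual page 0x{vpn:05X}")
    return (page_table[vpn] << PAGE_SHIFT) | offset


print(hex(translate(0x00010ABC)))   # 0x400abc
print(hex(translate(0x00011004)))   # 0x123004
```

Only the page number is translated; the offset within the page passes through untouched.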
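Capacity of the main/external memory: storage size, i.e. (no. of memory addresses) × (size of each memory address/cell), e.g.
$2^{32} \times 32 \text{ bits} = 2^{10} \times 2^{10} \times 2^{10} \times 2^{2} \times 4 \times 8 \text{ bits} = 16 \text{ G bytes}$ (since $2^{10}$ = 1 K and 8 bits = 1 byte)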
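The capacity calculation can be checked with a few lines of arithmetic. The function name `capacity_bytes` is hypothetical; the second call simply shows how the result changes if each address holds one byte instead of a 32-bit cell.

```python
GIB = 2**30  # 1 G = 2**10 * 2**10 * 2**10


def capacity_bytes(address_bits: int, cell_bits: int) -> int:
    """Capacity = number of addresses * size of each cell, converted to bytes."""
    number_of_addresses = 2**address_bits
    return number_of_addresses * cell_bits // 8   # 8 bits = 1 byte


print(capacity_bytes(32, 32) // GIB, "G bytes")   # 16 G bytes (32-bit cells)
print(capacity_bytes(32, 8) // GIB, "G bytes")    # 4 G bytes (8-bit cells)
```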
The Prefetch Unit/Instruction Cache..
The prefetch unit fetches 16-bit or 32-bit instructions from the instruction cache (also called the I-cache), the Instruction Tightly Coupled Memory (TCM), or external memory, and predicts the outcome of branches in the instruction stream (to be covered later);
Modern microprocessors make extensive use of caches for fast access to and storage of data/instructions: L1/L2/L3, D-cache or I-cache. Performance ranking: Registers > Caches > Main memory (where > means faster).
Load/Store Unit (LSU)
The Load/Store Unit (LSU) manages all LOAD and STORE operations, e.g. LOAD a value from the memory address A0A1B007 to register R0;
The load/store pipeline decouples load and store operations from the other pipelines, such as those for the ALU operations.
VFPv2 Registers
ARMv6 defines a floating-point subarchitecture called the Vector Floating-point v2 (VFPv2), for which the Raspberry Pi does provide a hardware implementation;
We already know that the ARM architecture provides 16 general purpose registers r0 to r15, where some of them play special roles: r13, r14 and r15.
Despite their name, these general-purpose registers cannot be used to operate on floating-point numbers, so VFPv2 provides us with some specific registers.
These VFPv2 registers are named s0 to s31 for single-precision, and d0 to d15 for double-precision, floating-point operations.
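One detail not stated above, but part of the VFPv2 register layout, is that the two register sets share the same storage: each double-precision register d(n) occupies the pair s(2n) and s(2n+1). The sketch below shows this mapping and how a 64-bit double splits into two 32-bit halves. The helper names are hypothetical, and the little-endian packing is used only to separate the low and high words.

```python
import struct


def s_registers_for(d_index: int) -> tuple[int, int]:
    """Return the indices of the two s registers that overlap d<d_index>."""
    if not 0 <= d_index <= 15:
        raise ValueError("VFPv2 has d0 to d15 only")
    return 2 * d_index, 2 * d_index + 1


def double_to_words(value: float) -> tuple[int, int]:
    """Split an IEEE 754 double into its (low, high) 32-bit words."""
    raw = struct.pack("<d", value)           # 8 bytes, least significant first
    low, high = struct.unpack("<II", raw)    # two unsigned 32-bit words
    return low, high


low, high = double_to_words(1.0)
print(s_registers_for(0))                    # (0, 1)
print(s_registers_for(15))                   # (30, 31)
print(f"low=0x{low:08X} high=0x{high:08X}")  # low=0x00000000 high=0x3FF00000
```

A practical consequence is that writing d0 overwrites whatever single-precision values were held in s0 and s1.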
The 5-Stage Pipeline in ARM8
Basically, the ARM8 processor uses a 5-stage pipeline, with the prefetch unit occupying the first stage and the integer unit using the remaining four stages:
1. Instruction prefetch. (IF)
2. Instruction decode and register read. (ID)
3. Execute (shift and ALU). (EX)
4. Data memory access. (Mem)
5. Write back results. (WB)
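The five stages listed above overlap in time: while one instruction is being decoded, the next is already being fetched. The sketch below prints an idealized timing diagram under the assumption that every stage takes one cycle and nothing ever stalls; `pipeline_schedule` is a hypothetical name for this example.

```python
STAGES = ["IF", "ID", "EX", "Mem", "WB"]


def pipeline_schedule(n_instructions: int) -> list[list[str]]:
    """Return one row per instruction; each column is one clock cycle."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_cycles
        # Instruction i enters IF in cycle i and advances one stage per cycle.
        for s, name in enumerate(STAGES):
            row[i + s] = name
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_schedule(4)):
    print(f"I{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Four instructions finish in 8 cycles instead of the 20 that purely sequential execution would need, which is where the pipeline's throughput gain comes from.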
The Pipeline Organization in ARM8
[Figure: pipeline stages IFetch, ID, EX, Mem, WB]
The ARM8 Applications
The ARM8 above was designed as a general-purpose processor core that can readily be manufactured by ARM Limited's many licensees.
It offers significantly (two to three times) higher performance than the simpler ARM7 cores for a similar increase in silicon area, and requires the support of double-bandwidth on-chip memory if it is to realize its full potential.
One application of the ARM8 core is to build a high-performance CPU such as the ARM810.
Branch Prediction by the Prefetch Unit
The prefetch unit of the ARM8 processor is responsible for branch prediction and uses static prediction based on the branch direction (backward branches are predicted taken, whereas forward branches are predicted not taken) to attempt to guess where the instruction stream will go;
the integer unit will compute the exact stream and issue corrections to the prefetch unit where necessary.
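The direction-based rule described above fits in a single comparison. The sketch below applies it to a loop-closing backward branch; the addresses, the outcome sequence and the function names are made up for illustration.

```python
def predict_taken(branch_addr: int, target_addr: int) -> bool:
    """Static rule: backward branches predicted taken, forward not taken."""
    return target_addr < branch_addr


def count_mispredictions(branch_addr: int, target_addr: int,
                         outcomes: list[bool]) -> int:
    """Count how often the fixed guess disagrees with the real outcomes."""
    guess = predict_taken(branch_addr, target_addr)
    return sum(1 for taken in outcomes if taken != guess)


# A loop that runs 10 times: its closing branch jumps back 9 times,
# then falls through once when the loop exits.
loop_outcomes = [True] * 9 + [False]

print(predict_taken(0x1010, 0x1000))                        # True  (backward)
print(predict_taken(0x1020, 0x1040))                        # False (forward)
print(count_mispredictions(0x1010, 0x1000, loop_outcomes))  # 1
```

The single miss is the loop exit, which is the correction the integer unit would send back to the prefetch unit.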
Independent Fetch Unit
Stream of Instructions: In-order Issue to Execute
Execution Unit (Integer Unit in ARM8)
Instruction Prefetch With Branch Prediction
Instruction fetch decoupled with Execution
Often issue logic included with Fetch
Correctness Feedback on Branch Results
Prediction: Branches, Dependencies, or even Data..
Prediction has become essential to getting good performance from scalar instruction streams.
We will discuss predicting branches. However, architects are now predicting everything: data dependencies, actual data, and results of groups of instructions;
at what point does computation become a probabilistic operation + verification?
we are pretty close with control hazards already.
Why does prediction work?
underlying algorithm has regularities;
data that is being operated on has regularities;
instruction sequence has redundancies that are artifacts of the way that humans/compilers think about problems.
Prediction => compressible information streams?
Dynamic Branch Prediction
Prediction could be static (at compile time, as used in the ARM8 architecture) or dynamic (at runtime);
for our example, if we were to statically predict taken, we would only be wrong once on each pass through the loop;
Is dynamic branch prediction better than static branch prediction?
seems to be; there is still some debate on this (we will see some analysis later);
today, lots of hardware being devoted to dynamic branch predictors.
Dynamic Branch Prediction..
Solution: a 2-bit scheme where the prediction changes only after mispredicting twice;
Red: stop, not taken
Green: go, taken
Adds hysteresis to decision making process.
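One common way to realize such a 2-bit scheme is a saturating counter, sketched below next to a 1-bit "last outcome" predictor for comparison. This is an assumption about the implementation: the state diagram on the slide may use different transitions out of the weak states. The class names and the branch pattern (a loop branch taken three times, then not taken, repeated five times) are made up for this example.

```python
class OneBitPredictor:
    """Predicts whatever the branch did last time."""

    def __init__(self, taken: bool = True):
        self.last = taken

    def predict(self) -> bool:
        return self.last

    def update(self, taken: bool) -> None:
        self.last = taken


class TwoBitPredictor:
    """Saturating counter: 0, 1 predict not taken; 2, 3 predict taken."""

    def __init__(self, state: int = 3):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step towards the observed outcome, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


def mispredictions(predictor, outcomes: list[bool]) -> int:
    """Run the predictor over the outcomes and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


outcomes = [True, True, True, False] * 5

print(mispredictions(OneBitPredictor(), outcomes))   # 9
print(mispredictions(TwoBitPredictor(), outcomes))   # 5
```

The 1-bit predictor is wrong twice per pass (at the loop exit and again on re-entry), while the 2-bit counter absorbs the single not-taken outcome and is wrong only at the exit.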
END of Module4