Computer Architecture
- How hardware and software interact
- Outline of the system's functionality, design and compatibility
- New computer architectures are usually prototyped and tuned on top of an existing architecture (e.g. in a simulator) or inside an FPGA, before being built as a microprocessor
- It is basically the function of the CPU and how it controls the connected devices when a program runs
System Design of Computer Architecture - includes all the hardware components in the system, such as the data processor, direct memory access (DMA), and GPU. It also includes data paths, memory controllers, and other abstractions, e.g. memory virtualization and multiprocessing
There are 3 main parts to focus on when it comes to Computer Architecture: Control Unit, Arithmetic Logic Unit, and Registers. We will look into the Cache component later.
- Control Unit
- It basically directs the flow of data between CPU and the other devices
- Orchestrates fetching (from memory) and decoding: a binary decoder* converts coded instructions into timing & control signals that direct the other units (e.g. memory, ALU, I/O devices)
* binary decoder = n coded inputs mapped to a maximum of 2^n unique outputs
- Each instruction executed in a CPU goes through:
1) Fetch, 2) Decode, 3) Execute, and 4) Write back (store the result)
- Clock speed is the number of clock cycles per second ~ typically 3.5 GHz = 3.5 billion cycles per second; how many instructions complete per cycle depends on the design
- The 4 steps must be executed in order for a single instruction, but the CPU can work on different steps of different instructions in parallel => several instructions are in the pipeline at once
- Registers
- CPU registers are a small set of fast data-holding places that are part of the processor
- It may hold an instruction, a storage address, temporary variable values or any kind of data
- Arithmetic Logic Unit
- ALUs perform arithmetic and logic operations
- The input values are read from registers and the output values are stored back into registers
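To see how the Control Unit, Registers and ALU fit together, here is a minimal sketch of a fetch / decode / execute / write-back loop. The text instruction format ("OP dest src1 src2"), the `alu` and `run` names, and the tiny set of operations are assumptions made up for illustration; they do not correspond to any real CPU.

```python
def alu(op, a, b):
    """Toy ALU: arithmetic and logic operations on two register values."""
    if op == "ADD":
        return a + b
    if op == "SUB":
        return a - b
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    raise ValueError(f"unknown opcode: {op}")


def run(program, registers):
    """Toy control unit: steps through the program one instruction at a time."""
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        ir = program[pc]  # fetch into the instruction register
        pc += 1
        opcode, dest, src1, src2 = ir.split()  # decode
        result = alu(opcode, registers[src1], registers[src2])  # execute
        registers[dest] = result  # write back
    return registers


regs = run(["ADD R0 R1 R2", "SUB R3 R0 R1"],
           {"R0": 0, "R1": 5, "R2": 7, "R3": 0})
print(regs)
```

Here the ALU only ever sees register values, and the control loop is the only part that touches the PC and IR.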
Types of Computer Architecture
- John Von-Neumann Architecture
- A single memory unit stores both general data and instructions
- The Control Unit tracks execution with the program counter (PC), which holds the address of the next instruction, and the instruction register (IR), which holds the instruction being executed
- Von-Neumann's architecture is easier to implement in real hardware, so it's more commonly used
- Harvard Architecture
- Code/instruction data is stored in a different memory section than the general data
- Ex: the Harvard Mark I read its instructions from punched tape, separate from its data storage
A System Design Architecture is when the computer design process is applied by people and businesses, e.g. for processing marketing info and product design
- Instruction Set Architecture
- An abstract model of the computer that defines how the CPU is controlled by the software
- It is the interface between hardware and software that specifies how the CPU should get the work done
- Understanding how compilers make use of the instructions helps developers write more efficient code
- The way a particular processor implements an ISA is its Micro-architecture (= Computer Organization): how all the electronic elements are connected by data pathways to complete instructions efficiently
3 Basic Components of Microprocessor
- Address lines: carry the address of the memory block being accessed
- Data lines: data buses used for data transfer
- IC chips: the integrated circuits that do the data processing in a microchip
Parts of an Instruction
ex: ADD R0 R1 R2
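- ADD is the opcode (the operation to perform)
- R0, R1, R2 are the address fields (the operand registers)
- Some instructions also have a mode field: how to perform the operation, or how to interpret the address field to get the effective address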
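As a sketch of how the fields of `ADD R0 R1 R2` could be packed into a machine word: the 16-bit layout below (4-bit opcode followed by three 4-bit register fields) and the opcode numbers are hypothetical, chosen only to make the example small. Real ISAs define their own layouts.

```python
# Hypothetical opcode numbers, not taken from any real ISA.
OPCODES = {"ADD": 0x1, "SUB": 0x2}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(text):
    """Pack 'OP Rd Rs Rt' into a 16-bit word: [opcode | Rd | Rs | Rt]."""
    op, *regs = text.split()
    if len(regs) != 3:
        raise ValueError("expected three register fields")
    word = OPCODES[op]
    for reg in regs:
        number = int(reg[1:])  # "R2" -> 2
        if not 0 <= number < 16:
            raise ValueError(f"register out of range: {reg}")
        word = (word << 4) | number
    return word


def decode(word):
    """Unpack a 16-bit word back into its text form."""
    op = (word >> 12) & 0xF
    regs = [(word >> shift) & 0xF for shift in (8, 4, 0)]
    return " ".join([MNEMONICS[op]] + [f"R{r}" for r in regs])


word = encode("ADD R0 R1 R2")
print(hex(word), decode(word))
```

A mode field would simply be a few more bits in the word that tell the decoder how to treat an address field.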
Hazard
- a situation that prevents the next instruction from the instruction stream from executing during its clock cycle
- Structural Hazard: hardware cannot support all possible combinations of instructions simultaneously in overlapped execution
- To Avoid: Fully pipeline the functional units or duplicate the contended resource
- Data Hazard: an instruction depends on the result of an earlier instruction that is still in the pipeline
- To Avoid: Forwarding / Add stalls
- Control Hazard: caused by the pipelining of branches and other instructions that change the PC
- To Avoid: Freeze or flush the pipeline = hold or delete any instructions after the branch until the branch destination is known, OR use branch prediction
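A data hazard can be spotted by checking whether an instruction reads a register that a recent earlier instruction writes (read-after-write). The sketch below assumes the text format "OP dest src1 src2" and a hypothetical `window` of 2 instructions during which a result is not yet available; both are illustration choices, not properties of a specific pipeline.

```python
def parse(text):
    """Split 'OP dest src1 src2' into (dest, [sources])."""
    _, dest, *sources = text.split()
    return dest, sources


def raw_hazards(program, window=2):
    """Return (producer, consumer, register) triples for read-after-write
    dependencies that fall within `window` instructions of each other."""
    hazards = []
    for i, instr in enumerate(program):
        dest, _ = parse(instr)
        for j in range(i + 1, min(i + 1 + window, len(program))):
            later_dest, later_sources = parse(program[j])
            if dest in later_sources:
                hazards.append((i, j, dest))
            if later_dest == dest:
                break  # register is overwritten; later reads depend on j instead
    return hazards


program = [
    "ADD R0 R1 R2",
    "SUB R3 R0 R4",  # reads R0 right after it is written
    "AND R5 R6 R7",
    "OR R8 R3 R0",   # reads R3 two instructions after it is written
]
print(raw_hazards(program))
```

Each reported pair is a place where the hardware would need forwarding or a stall.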
RAID (Redundant Array of Independent Disks)
- In case of a disk failure, RAID (except RAID 0) can be used to prevent data loss
- refers to hard drives connected and set up in ways that improve the speed and/or the reliability of a computer's disk storage
- typically used on servers and high-performance computers
- 4 common types of RAID: 0, 1, 5, 10
- RAID 0: Stripes data across all disks with no redundancy => writes are fast, but data loss still occurs if 1 disk is destroyed
- RAID 1: Data is mirrored onto more than 1 disk => each disk holds a full copy
- RAID 5: Requires 3 or more disks => uses parity to recover from 1 disk failure at a time (RAID 6 uses double parity to handle 2 disk failures)
- RAID 10: Data is striped across several groups of disks, and each group is mirrored (RAID 1 + RAID 0) => requires at least 4 disks
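The parity idea behind RAID 5 can be sketched with XOR: the parity block is the XOR of the data blocks, so any one missing block equals the XOR of all the others. This toy version works on equal-length byte strings and ignores how real arrays rotate parity across disks; the function names are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks):
    """Recover the single missing block from all surviving blocks
    (the remaining data blocks plus the parity block)."""
    return xor_blocks(surviving_blocks)


disk0 = b"\x01\x02"
disk1 = b"\x10\x20"
parity = xor_blocks([disk0, disk1])

# Pretend disk1 failed: rebuild it from disk0 and the parity block.
print(rebuild([disk0, parity]) == disk1)
```

If two blocks are lost at once, a single parity block is not enough, which is why RAID 6 adds a second one.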
Interrupts
- Internal / Software Interrupts: caused by a special instruction and operate similarly to a branch or jump instruction (usually used to switch from user to supervisor mode), or by exceptions raised by the program itself, like divide by 0
- They are synchronous with the program; the PC is incremented and they have the highest priority
- External / Hardware Interrupts: caused by an external hardware module or failure (e.g. I/O devices); they are asynchronous with the program and the opposite of the points above applies
Interrupt Service Routine
- Handles an interrupt: the running thread is suspended while the routine runs, and it cannot be determined in advance when it will be executed
- Not like an independent thread, more like a signal handler - runs whenever there's a signal from either software or hardware
Subroutine
- Part of code within a larger program = performs a specific task and is independent of the remaining code
- When it will run can be determined in advance, because the program calls it explicitly
Hardware Methods to Establish Priority
- Daisy-chaining: connecting all devices that can request an interrupt in a serial manner - placed by priority of device
- Parallel priority: uses an interrupt register whose bits are set separately by the interrupt signal from each device; a mask register and a priority encoder then pick the request to serve
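Both schemes can be sketched in a few lines. The assumptions here: in the daisy chain the device closest to the CPU (index 0) has the highest priority, and in the parallel scheme bit 0 of the interrupt register is the highest-priority device. The function names and bit ordering are hypothetical.

```python
def daisy_chain_grant(requests):
    """requests[i] is True if device i (in chain order) wants service.
    The acknowledge signal stops at the first requesting device."""
    for position, requesting in enumerate(requests):
        if requesting:
            return position
    return None


def parallel_priority_grant(interrupt_register, mask_register):
    """Each device sets its own bit; the mask disables selected devices.
    Returns the bit number of the highest-priority pending request."""
    pending = interrupt_register & mask_register
    if pending == 0:
        return None
    lowest_set_bit = pending & -pending  # isolate the lowest set bit
    return lowest_set_bit.bit_length() - 1


print(daisy_chain_grant([False, True, True]))
print(parallel_priority_grant(0b1010, 0b1111))
print(parallel_priority_grant(0b1010, 0b1000))
```

The mask register is what lets software temporarily silence a lower-priority device without unplugging it from the scheme.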
5 Stages of DLX pipeline
- Instruction fetch, Instruction decode and register fetch, Execution, Memory access, Writeback
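To visualize how the five stages overlap, the sketch below builds an ideal schedule, assuming one stage per clock cycle and no stalls or hazards. The abbreviations IF, ID, EX, MEM, WB and the function names are labels chosen for this example.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def pipeline_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles for an ideal pipeline: fill it once, then finish one per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def sequential_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles if each instruction finishes all stages before the next starts."""
    return num_stages * num_instructions


def schedule(num_instructions, stages=STAGES):
    """One row per instruction; each row lists the stage occupied in each
    cycle ('' when the instruction is not in the pipeline)."""
    total = pipeline_cycles(num_instructions, len(stages))
    rows = []
    for i in range(num_instructions):
        row = [""] * total
        for offset, name in enumerate(stages):
            row[i + offset] = name
        rows.append(row)
    return rows


for row in schedule(3):
    print(" ".join(f"{stage:>3}" for stage in row))
print(pipeline_cycles(3), "cycles pipelined vs", sequential_cycles(3), "sequential")
```

Each printed row starts one cycle later than the one above it, which is the overlap that pipelining buys.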
Write-Through Method
- main memory is updated on every write operation => data integrity: memory is always up-to-date, but performance suffers
Write-Back Method
- only the location in cache is updated and a (dirty) flag is set; the data is saved to RAM later => saves many write cycles
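The trade-off between the two methods shows up when counting writes to main memory. The `Cache` class below is a hypothetical toy model (a dictionary standing in for cache lines, no capacity limit or eviction), just enough to compare the two policies.

```python
class Cache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.lines = {}         # address -> cached value
        self.dirty = set()      # addresses changed but not yet saved to RAM
        self.memory = {}        # stand-in for main memory
        self.memory_writes = 0  # how many times RAM was written

    def write(self, address, value):
        self.lines[address] = value
        if self.write_back:
            self.dirty.add(address)  # only set the flag; RAM is updated later
        else:
            self._write_memory(address, value)  # write-through: update RAM now

    def flush(self):
        """Save every dirty line to RAM (e.g. on eviction)."""
        for address in sorted(self.dirty):
            self._write_memory(address, self.lines[address])
        self.dirty.clear()

    def _write_memory(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1


write_through, write_back = Cache(write_back=False), Cache(write_back=True)
for value in range(10):
    write_through.write(0x40, value)
    write_back.write(0x40, value)
write_back.flush()
print(write_through.memory_writes, write_back.memory_writes)
```

Until `flush` runs, the write-back model's RAM is stale, which is the data-integrity cost mentioned above.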
Direct Mapping
- RAM stores all the data, and a copy of some of it is kept in the cache
- The memory address is divided into 2 parts: index field and tag field
- The index field selects the cache line; the tag field is stored in the cache next to the data and is compared on every access
- Each main memory block can therefore go into only 1 cache line, and the stored tag tells which block currently occupies it
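A minimal sketch of a direct-mapped lookup, assuming word addresses and one word per cache line so the address splits into just tag and index (real caches usually add a block-offset field). The class and method names are made up for the example.

```python
class DirectMappedCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # tag stored in each line
        self.hits = 0
        self.misses = 0

    def split(self, address):
        """Split an address into (tag, index)."""
        return address // self.num_lines, address % self.num_lines

    def access(self, address):
        """Return True on a hit; on a miss, load the block into its line."""
        tag, index = self.split(address)
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag  # replace whatever block was there
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=8)
# Addresses 3 and 11 share index 3, so they keep evicting each other.
print([cache.access(a) for a in (3, 3, 11, 3)])
```

The last access misses even though address 3 was used recently, because address 11 maps to the same line; this conflict is what associative mapping avoids.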
Associative Mapping
- The most flexible mapping of data from main memory to cache memory
- Any main memory block can be mapped into any line of cache => no index (cache address) is used, so the cache stores the full address with the data and compares the request against all stored tags at once
DMA (Direct Memory Access)
- Allows an input/output device to receive/send data directly to/from the main memory, w/o going through the CPU
- The transfer is carried out by the DMA controller chip, which speeds up memory operations and leaves the CPU free for other work
Horizontal Microcode
- The micro-instruction bits drive the control signals directly, w/o any intermediate decoding
- Kept in a wide control store; several discrete micro-operations are combined into 1 micro-instruction for simultaneous operations
Vertical Microcode
- The micro-instruction is narrow and encoded, so it must be decoded first; typically only 1 micro-operation is performed at each CPU cycle
CPU is busy but an urgent task arrives
- Raise an interrupt that is non-maskable (cannot be disabled) and then jump to the essential subroutine
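Data Selector = Multiplexer: selects 1 of several inputs and routes it to a single output. In dynamic memory, multiplexing lets the same address lines be used for both the row and the column address. (Converting octal code to binary code is done by an encoder, not a multiplexer.)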
EEPROM (Electrically Erasable Programmable ROM) is the type of memory whose content is erased by applying an electrical signal
For a pipeline with 'n' stages, the ideal speedup is n (throughput approaches 1 instruction per clock cycle), but a real pipeline involves overheads such as stalls that increase the cycles per instruction
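Large number of registers in CPU => connect them through a common bus system, with the ALU reading its inputs from the bus and writing its result back to it
If the internal bus connects only registers within the CPU, use an address register to select the RAM memory address and then transfer the word to or from a data register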
Superscalar machine is a CPU that implements instruction-level parallelism within a single processor - can execute more than 1 instruction per clock cycle
VLIW (Very Long Instruction Word) is a CPU architecture that takes advantage of instruction-level parallelism. It executes operations in parallel according to a fixed schedule that is determined at compile time.