Cheatography

# computer architecture Cheat Sheet by hziad

### rate of instru­ction execution

 1-GHz clock. Instru­ction statistics in a large program are as follows: Branch 20% Load 20% Store 10% Comput­ational instru­ctions 50% 90% of instru­ction fetch operations are completed in one clock cycle and 10% are completed in 4 clock cycles. On average, access to the data operands of a Load or Store, instru­ction is completed in 3 clock cyclesOn average, instru­ction fetch takes 0.9+0.1x4 = 1.3 cycles. All instru­ctions, except Load and Store, take four more cycles to complete. Load and Store instru­ctions take two additional cycles, on average. Average completion time = 1.3 + (0.2 + 0.5) x 4 + (0.2 + 0.1) x 6 = 5.9 cycles Instru­ction rate = 109 /5.9 = 169.5 million instru­ctions per second a) Access to the memory is always completed in 1 clock cycle.Execution rate = 1x109 /5 = 200 million instru­ctions per second

### chapter 5 example

 ``````Call_register R9 • Calls a subroutine whose address is in register R9: 1. Memory address <- [PC], Read memory, IR <-Memory data, PC <-[PC] <- 4 2. Decode instruction, RA <-[R9] 3. PC-Temp<-[PC], PC <-[RA] 4. RY <- [PC-Temp] 5. Register LINK <- [RY] Q1: Assume that all memory access operations are completed in one clock cycle in a processor that has a 1-GHz clock. What is the frequency of memory access operations if Load and Store instructions constitute 20 percent of the dynamic instruction count in a program? (The dynamic count is the number of instruction executions, including the effect of program loops, which may cause some instructions to be executed more than once.) Assume that all instructions are executed in 5 clock cycles There is one memory access to fetch each instruction. Then, 20 percent of the instructions have a second memory access to read or write a memory operand. On average, each instruction has 1.2 memory accesses in 5 clock cycles. Therefore, the frequency of memory accesses is (1.2/5) × 10^9 , or 240 million accesses per second.(1 Ghz= 10^9 hz) Give the sequence of actions for a Return-from-subroutine instruction in a RISC processor. Assume that the address LINK of the general-purpose register in which the subroutine return address is stored is given in the instruction field connected to address A of the register file (IR31−27). Whenever an instruction is loaded into the IR, the contents of the generalpurpose register whose address is given in bits IR31−27 are read and placed into register RA. Hence, a Return-from-subroutine instruction will cause the contents of register LINK to be read and placed in register RA. Execution proceeds as follows: 1. Memory address←[PC], Read memory, Wait for MFC, IR←Memory data, PC←[PC] + 4 2. Decode instruction, RA←[LINK] 3. PC←[RA] 4. No action 5. No action At the time the instruction Load R6, 1000(R9) is fetched, R6 and R9 contain the values 4200 and 85320, respectively. Memory location 86320 contains 75900. step 1 and 2: the values are determined by the previous instructions stept 3: RA = 85320, RB = 4200. Step 4: RA = 85320, RB = 4200, RZ = 86320, RM= 4200. Step 5: RA = 85320, RB = 4200, RZ = 86320, RM= 4200 RY = 75900.``````

### 16M × 32 memory using 1M × 4 memory chips A 16M module can be structured as 16 rows, each containing eight 1M x 4 chips. A 24-bit address is required. Address lines A19-0 should be connected to all chips. Address lines A23-20 should be connected to a 4-bit decoder to select one of the 16 rows.

### A block-­set­-as­soc­iative cache

 ``````A block-set-associative cache consists of a total of 64 blocks, divided into 4-block sets. The main memory contains 4096 blocks, each consisting of 32 words. Assuming a 32-bit byte-addressable address space, how many bits are there in each of the Tag, Set, and Word fields? Each block contains 128 bytes, thus requiring a 7-bit Word field. There are 16 sets, requiring a 4-bit Set field. The remaining 21 bits of the address constitute the tag field.``````

### problem set Mult A = 010111 and B = 110110
b) A = 110011 and B = 101100
c) A = 001111 and B = 001111

### Question about the sequential circuit

 how to implement multip­lic­ation of 2’s-co­mpl­ement n-bit numbers using the Booth algorithm, by clearly specifying inputs and outputs for the Control sequencer and any other changes needed around the adder and register A. Both the A and M registers are augmented by one bit to the left to hold a sign extension bit. The adder is changed to an n + 1-bit adder. A bit is added to the right end of the Q register to implement the Booth multiplier recoding operation. It is initially set to zero. The control logic decodes the two bits at the right end of the Q register according to the Booth algorithm. The right shift is an arithmetic right shift as indicated by the repetition of the extended sign bit at the left end of the A register.

### R2, R3, R4, R5, R6, and R7

 ``````Load R2, #A_VEC Load R3, #B_VEC Load R4, #3 And R5, R5, R0  LOOP: Load R6, (R2) Load R7, (R3) Multiply R6, R6, R7 Add R5, R5, R6 Add R2, R2, #4 Add R3, R3, #4 Subtract R4, #1 Branch>0 LOOP Load R7, # RESULT Store R5. (R7) End ORIGIN 500 A_VEC: DATAWORD 05, -20, 10 B_VEC: DATAWORD 09, 04, 07 RESULT: RESERVE 4 R2=512, R3=524, R4=0, R5=25 R6=7 R7=524``````

### pipeline provides forwarding paths The result from theALU is 130 − 12 = 118. This result is available in register RZ during cycle 6. The result of the Or instru­ction, 130, is in register RY during in cycle 6. In cycle 6, the Subtract instru­ction is in the Memory stage. The unspec­ified instru­ction following the Subtract instru­ction is generating a result in the Compute stage. In cycle 7, the result of the unspec­ified instru­ction is in register RZ, and the result of the Subtract instru­ction is in register RY.

### Chapter 6 execution time and speed up

 ``````Assume that 20% of the dynamic count of the instructions executed for a program are branch instructions. There are no pipeline stalls due to data dependencies. • Static branch prediction is used with a not-taken assumption. a) Determine the execution times for two cases: when 30 percent of the branches are taken, and when 70 percent of the branches are taken. b) Determine the speedup for one case relative to the other. Express the speedup as a percentage relative to 1 In first case, 30% of branches are taken but we assumed not-taken, so they are mispredicted (one cycle penalty 30% from 20% branch instructions): The value of δbranch_penalty = 0.20 × 0.30 ×1 = 0.06 In second case, 70% of branches are taken but we assumed not-taken, so they are mispredicted (one cycle penalty 70% from 20% branchs): The value of δbranch_penalty = 0.20 × 0.70 ×1 = 0.14 Using S = 1 + δbranch_penalty, the execution time: in first case is (1.06 × N)/R and (1.14 × N)/R for the second case. b) Because the execution time for the first case is smaller, the performance improvement as a speedup percentage is:(1.14/1.06 -1)*100=7.5%``````

### Chapter 9

 Overflow =cn ⊕ cn−1 or xn−1 yn−1( ̅sn−1) +( ̅xn−1)( ̅yn−1) sn−1 For the subtra­ction operation X−Y on 2’s-co­mpl­ement numbers X and Y • We form the 2’s-co­mpl­ement of Y and add it to X . • Set Add/Sub =0 and c0= 0 for addition. • Set Add/Sub =1 and c0 = 1 for subtra­ction 4-bit adder has four carry-out signals: c1 = G0 + P0 c0, c2 = G1 + P1G0 + P1P0 c0, c3 = G2 + P2G1 + P2P1G0 + P2P1P0 c0, c4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0 c0 A sequence of n addition cycles generates a 2n-bit product Delay = n * (adder + control delays) For n = 32, delay is approx­imately 32  14 = 448 gate delays Registers A and Q are shift registers, together, they hold partial product PPi while multiplier bit qi generates the signal Add/No­add.At the end of each cycle, C, A, and Q are shifted right one bit position to allow for growth of the partial product as the multiplier is shifted out of register Q

### • Non-Re­storing Division: Stage 1: Do the following two steps n times:
1. If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A.
2. Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Stage 2: If the sign of A is 1, add M to A.

### example ### Store R6, X(R8) IR  Memory data,
PC  [PC]  4
2. Decode instru­ction,
RA [R8], RB [R6]
3. RZ [RA]  Immediate value X,
RM [RB]
Memory data [RM],
Write memory
5. No action (on Bus B) Memory address ← [PC],
Read memory, Wait for MFC, (on Bus C) IR ← Memory data,
PC ←[PC] + 4
2. Decode instru­ction
3. The contents of R5 and R6 are sent to ALU
using buses A and B,
R5 ←[R5] + [R6],
the sum is written to R5 using bus C

### Booth algorithm ### booth mult example ### Restoring Division • Do the following three steps n times:
1. Shift A and Q left one bit position.
2. Subtract M from A, and place the answer back in A.
3. If the sign of A is 1, set q0 to 0 and add M back to A (that is, restore A); otherwise, set q0 to 1.