#### Lecture 14 ## Direct-mapped cache ## A Typical Memory Hierarchy □ Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology at the speed offered by the fastest technology #### Characteristics of the Memory Hierarchy #### **Cache Basics** - Two questions to answer (in hardware): - Q1: How do we know if a data item is in the cache? - Q2: If it is, how do we find it? - Direct mapped - Each memory block is mapped to exactly one block in the cache - lots of lower level blocks must share blocks in the cache - Address mapping (to answer Q2): (block address) modulo (# of blocks in the cache) - Have a tag associated with each cache block that contains the address information (the upper portion of the address) required to identify the block (to answer Q1) # Caching: A Simple First Example #### Cache Index Valid Tag Data | 00 | | : | | | | | | | | | | | | |----|--|---|------|----|---|----|----|----|----|----|----|----|----| | 01 | | | | | | | | | | | | | | | 10 | | | | | | | | | | | | | | | | | | <br> | ٠. | 1 | ٠. | ٠. | ٠. | ٠. | ٠. | 17 | ٠. | ٠. | Q1: Is it there? Compare the cache tag to the high order 2 memory address bits to tell if the memory block is in the cache One word blocks Two low order bits define the byte in the word (32b words) Q2: How do we find it? Use next 2 low order memory address bits — the index — to determine which cache block (i.e., modulo the number of blocks in the cache) (block address) modulo (# of blocks in the cache) # Caching: A Simple First Example (block address) modulo (# of blocks in the cache) ## MIPS Direct Mapped Cache Example One word blocks, cache size = 1K words (or 4KB) What kind of locality are we taking advantage of? # **Direct Mapped Cache** Consider the main memory word reference string Start with an empty cache - all blocks initially marked as not valid 0 1 2 3 4 3 4 15 # **Direct Mapped Cache** Consider the main memory word reference string Start with an empty cache - all blocks initially marked as not valid 0 1 2 3 4 3 4 15 **0** miss | 00 | Mem(0) | |----|--------| | | | | | | | | | 1 miss | 00 | Mem(0) | |----|--------| | 00 | Mem(1) | | | , , | | | | **2** miss | 00 | Mem(0) | |----|--------| | 00 | Mem(1) | | 00 | Mem(2) | | | | 3 miss | 00 | Mem(0) | |----|--------| | 00 | Mem(1) | | 00 | Mem(2) | | 00 | Mem(3) | 4 miss | 01 | | | |----|----|--------| | ΟŢ | 8 | Mem(0) | | | 00 | Mem(1) | | | 00 | Mem(2) | | | 00 | Mem(3) | 3 hit | 01 | Mem(4) | | |----|--------|--| | 00 | Mem(1) | | | 00 | Mem(2) | | | 00 | Mem(3) | | 4 hit | 01 | Mem(4) | |----|--------| | 00 | Mem(1) | | 00 | Mem(2) | | 00 | Mem(3) | **15** miss | | 01 | Mem(4) | |----|----|--------| | | 00 | Mem(1) | | | 00 | Mem(2) | | 11 | 90 | Mem(3) | 8 requests, 6 misses ## Multiword Block Direct Mapped Cache Four words/block, cache size = 1K words What kind of locality are we taking advantage of? # Taking Advantage of Spatial Locality • Let cache block hold more than one word Start with an empty cache - all blocks initially marked as not valid 0 1 2 3 4 3 4 15 | 4 | | |---|--| | | | | | | ## Taking Advantage of Spatial Locality • Let cache block hold more than one word Start with an empty cache - all blocks initially marked as not valid 0 1 2 3 4 3 4 15 0 miss | 00 | Mem(1) | Mem(0) | |----|--------|--------| | | | | 1 hit | 00 | Mem(1) | Mem(0) | |----|--------|--------| | | | | 2 miss | 00 | Mem(1) | Mem(0) | |----|--------|--------| | 00 | Mem(3) | Mem(2) | 3 hit | 00 | Mem(1) | Mem(0) | |----|--------|--------| | 00 | Mem(3) | Mem(2) | 4 miss | $\mathbf{O}$ | 1 | | <b>-</b> . | |--------------|----|--------|------------| | 0 | 8 | Mem(1) | Mem(0) | | | 00 | Mem(3) | Mem(2) | 3 hit | 01 | Mem(5) | Mem(4) | |----|--------|--------| | 00 | Mem(3) | Mem(2) | 4 hit | 01 | | Mem(5 | 5) | Mem( | 4) | |----|---|-------|----|------|----| | 00 | ) | Mem(3 | 3) | Mem( | 2) | **15** miss | 1 | 101 | Mem(5) | Mem(4) | | |---|-----|--------|--------|---| | _ | 00 | Mem(3) | Mem(2) | 4 | 8 requests, 4 misses #### Miss Rate vs Block Size vs Cache Size Miss rate goes up if the block size becomes a significant fraction of the cache size because the number of blocks that can be held in the same size cache is smaller (increasing capacity misses) a. One-word-wide memory organization ## Handling Cache Misses (Single Word Blocks) - Read misses (I\$ and D\$) - stall the pipeline, fetch the block from the next level in the memory hierarchy, install it in the cache and send the requested word to the processor, then let the pipeline resume - Write misses (D\$ only) - 1. stall the pipeline, fetch the block from next level in the memory hierarchy, install it in the cache (which may involve having to evict a dirty block if using a write-back cache), write the word from the processor to the cache, then let the pipeline resume or - Write allocate just write the word into the cache updating both the tag and data, no need to check for cache hit, no need to stall or - 3. No-write allocate skip the cache write (but must invalidate that cache block since it will now hold stale data) and just write the word to the write buffer (and eventually to the next memory level), no need to stall if the write buffer isn't full ### Multiword Block Considerations - Read misses (I\$ and D\$) - Processed the same as for single word blocks a miss returns the entire block from memory - Miss penalty grows as block size grows - Early restart processor resumes execution as soon as the requested word of the block is returned - Requested word first requested word is transferred from the memory to the cache (and processor) first - Nonblocking cache allows the processor to continue to access the cache while the cache is handling an earlier miss - Write misses (D\$) - If using write allocate must first fetch the block from memory and then write the word to the block (or could end up with a "garbled" block in the cache (e.g., for 4 word blocks, a new tag, one word of data from the new block, and three words of data from the old block)