Memory hierarchy
Computer memory should ideally be fast, cheap, and abundant, but no current technology offers all three at once. Computers therefore organize memory as a hierarchy. The fastest memory is the costliest per bit and is therefore very limited in size, often less than 1 KB. As we move down the memory hierarchy, the time required to access it increases, while the cost per bit decreases and the capacity grows.
Registers
Registers are the fastest memory. They are located inside the CPU and can be accessed with essentially no added latency, which also makes them the costliest memory per bit. Because of this cost, their size is very limited: for example, 32 x 32 bits (128 bytes) in a 32-bit CPU and 64 x 64 bits (512 bytes) in a 64-bit CPU, in both cases less than 1 KB. Register usage is managed by the running program itself (in practice, by the compiler or assembly programmer that produced its instructions) rather than by the hardware.
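The register file sizes quoted above can be checked with a line of arithmetic. The following is a minimal sketch; the helper name register_file_bytes is hypothetical and is introduced only for this illustration.

```python
def register_file_bytes(num_registers: int, width_bits: int) -> int:
    """Total capacity of a register file in bytes (8 bits per byte)."""
    return num_registers * width_bits // 8


# Figures taken from the text: 32 x 32 bits and 64 x 64 bits.
size_32 = register_file_bytes(32, 32)
size_64 = register_file_bytes(64, 64)

print(size_32, size_64)   # 128 512
print(size_64 < 1024)     # True: both are below 1 KB
```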
Cache
Cache memory is a small, fast memory that stores frequently accessed data so that it can be retrieved quickly. Caches can be implemented in hardware or in software; the CPU cache discussed here is a hardware component and is controlled mostly by the hardware. The cache may hold a copy of frequently used data or instructions that are stored somewhere else, or the result of some recent computation. When the requested data is found in the cache, it is called a cache hit. If the data is not found in the cache, it is called a cache miss, and the data is then looked up further down the hierarchy, ultimately in main memory. A cache hit typically costs only a few CPU cycles (around two for a fast cache), whereas a cache miss costs far more. The overall purpose of the cache is to improve performance by providing faster access.
The CPU cache is also known as CPU memory and is built from SRAM. It is often integrated into the CPU chip or placed on a separate chip that has a dedicated bus interconnect to the CPU chip.
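To make hits and misses concrete, here is a minimal sketch of a direct-mapped cache, one of several possible cache organizations. The class name DirectMappedCache, the line count, the block size, and the address trace are all assumptions chosen for illustration; a real hardware cache is considerably more complex.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each memory block maps to exactly one line."""

    def __init__(self, num_lines: int, block_size: int):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines  # tag held by each line; None = empty
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a cache hit, False on a miss (the line is then filled)."""
        block = address // self.block_size   # which memory block is addressed
        index = block % self.num_lines       # which cache line it maps to
        tag = block // self.num_lines        # identifies the block within that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag               # simulate fetching the block from main memory
        return False


# Hypothetical address trace, for illustration only.
cache = DirectMappedCache(num_lines=4, block_size=16)
for addr in [0, 4, 8, 64, 0, 16]:
    cache.access(addr)

print(cache.hits, cache.misses)  # 2 4
```

Addresses 4 and 8 hit because they fall in the block already fetched for address 0. Address 64 maps to the same line and evicts that block, so the second access to address 0 misses again.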
Levels of Cache Memory
Cache memories can be classified into different levels based on how close they are to the microprocessor. With each increasing level, the cost per bit decreases and the size of the cache memory increases, while the time needed to access it also increases.
Level 1/L1 Cache: It is also known as the primary cache and is the fastest cache. It is usually located on the CPU and therefore has zero or near-zero wait states. It is implemented in SRAM.
Level 2/L2 Cache: It is also referred to as the secondary cache. It may be located on the CPU or on a separate chip connected through a high-speed bus so that memory access remains very fast. It is also implemented in SRAM.
Level 3/L3 Cache: It is typically slower than the L1 and L2 caches, but larger. It feeds the L1 and L2 caches to improve their performance. Depending on the design, it is located on the motherboard or within the CPU module. In a system with multiple processors, each processor has its own L1 and L2 cache, but a single L3 cache is shared among all the processors.
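The trade-off between the levels can be sketched by computing an average access time: each level adds its own hit latency, and only the fraction of accesses that miss at that level pay for the next one. The latencies and miss rates below are assumed numbers chosen for illustration, not measurements of any real CPU, and the function name average_access_time is hypothetical.

```python
def average_access_time(levels, memory_latency):
    """Average access time in cycles.

    levels: list of (hit_latency_cycles, local_miss_rate), ordered from L1 outward.
    memory_latency: cycles needed to reach main memory after the last level misses.
    """
    amat = memory_latency
    # Work from the last cache level back toward L1.
    for hit_latency, miss_rate in reversed(levels):
        amat = hit_latency + miss_rate * amat
    return amat


# Assumed values for L1, L2, and L3 (illustration only).
levels = [(4, 0.10), (12, 0.50), (40, 0.50)]
amat = average_access_time(levels, memory_latency=200)

print(f"{amat:.1f} cycles with caches, 200 cycles without")
```

With these assumed numbers the average comes to roughly 12 cycles, far below the 200 cycles of going to main memory on every access, even though each level is slower than the one before it.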
Cache Coherence
In a multiprocessor system, each processor has its own L1 and L2 cache. When processors access the same shared variable, a copy of it may exist in each cache as well as in main memory. It is therefore important that updates to such variables propagate in a timely manner, and cache coherence ensures this. For example, if a shared variable 'x' is written by one processor and the change has not yet been reflected in main memory and in the other processors' caches, those processors may read the old value, which results in a bug. There are three different levels of cache coherence:
1. A write to a shared variable in a cache must be reflected instantaneously in all other copies of that variable.
2. Writes to a shared variable are observed by all processors in the same order in which they were performed.
3. Different processors may observe different orders of writes to a variable. [Non-coherent]
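The stale-read problem with the shared variable 'x' can be reproduced in a small simulation. This is a minimal sketch, assuming a simplified scheme in which a write updates main memory and invalidates the copies held by other processors. The Core and System classes and the coherent flag are hypothetical names introduced for this illustration; real coherence protocols are more involved.

```python
class Core:
    """A processor with a private cache, modeled as a dict of variable -> value."""

    def __init__(self, name):
        self.name = name
        self.cache = {}


class System:
    def __init__(self, coherent: bool):
        self.coherent = coherent
        self.memory = {"x": 0}
        self.cores = [Core("P0"), Core("P1")]

    def read(self, core, var):
        if var not in core.cache:                # cache miss: fetch from main memory
            core.cache[var] = self.memory[var]
        return core.cache[var]

    def write(self, core, var, value):
        core.cache[var] = value
        if self.coherent:
            self.memory[var] = value             # propagate the write to main memory
            for other in self.cores:
                if other is not core:
                    other.cache.pop(var, None)   # invalidate stale copies elsewhere


def run(coherent: bool) -> int:
    system = System(coherent)
    p0, p1 = system.cores
    system.read(p0, "x")          # both processors cache x = 0
    system.read(p1, "x")
    system.write(p0, "x", 42)     # P0 updates x
    return system.read(p1, "x")   # what does P1 see?


print(run(coherent=False))  # 0  (P1 reads its stale copy)
print(run(coherent=True))   # 42 (P1's copy was invalidated, so it refetches)
```

Without coherence, P1 keeps serving the old value from its own cache. With invalidation, P1's next read misses and picks up the updated value from main memory.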