Cache coherency
参考
Directory-based cache coherence - Wikipedia
Cache coherency protocols (examples) - Wikipedia
Cache Coherence - an overview | ScienceDirect Topics
Overview
In a shared memory multiprocessor system with a separate cache memory for each processor, it is possible to have many copies of shared data: one copy in the main memory and one in the local cache of each processor that requested it. When one of the copies of data is changed, the other copies must reflect that change. Cache coherence is the discipline which ensures that the changes in the values of shared operands (data) are propagated throughout the system in a timely fashion.[1]
The following are the requirements for cache coherence:[2]
Write Propagation
Changes to the data in any cache must be propagated to other copies (of that cache line) in the peer caches.
对于shareable memory allocation,如果一个core改动了它,所有的core/requestor都要看到这个改动。
Transaction Serialization
Reads/Writes to a single memory location must be seen by all processors in the same order.
对于同一个shareable memory location的读写,所有的core看到的读写顺序是一样的。写后读的时候,读到的数据都是最近一次写入的数据。
下面一段介绍write propagation:
In a multiprocessor system, consider that more than one processor has cached a copy of the memory location X. The following conditions are necessary to achieve cache coherence:[4]
- In a read made by a processor P to a location X that follows a write by the same processor P to X, with no writes to X by another processor occurring between the write and the read instructions made by P, X must always return the value written by P.
- In a read made by a processor P1 to location X that follows a write by another processor P2 to X, with no other writes to X made by any processor occurring between the two accesses and with the read and write being sufficiently separated, X must always return the value written by P2. This condition defines the concept of coherent view of memory. Propagating the writes to the shared memory location ensures that all the caches have a coherent view of the memory. If processor P1 reads the old value of X, even after the write by P2, we can say that the memory is incoherent.
Consistency是coherency的前提。解释如下:
Therefore, in order to satisfy Transaction Serialization, and hence achieve Cache Coherence, the following condition along with the previous two mentioned in this section must be met:
- Writes to the same location must be sequenced. In other words, if location X received two different values A and B, in this order, from any two processors, the processors can never read location X as B and then read it as A. The location X must be seen with values A and B in that order.[
Coherence mechanisms
The two most common mechanisms of ensuring coherency are snooping and directory-based, each having their own benefits and drawbacks. Snooping based protocols tend to be faster, if enough bandwidth is available, since all transactions are a request/response seen by all processors. The drawback is that snooping isn't scalable. Every request must be broadcast to all nodes in a system, meaning that as the system gets larger, the size of the (logical or physical) bus and the bandwidth it provides must grow. Directories, on the other hand, tend to have longer latencies (with a 3 hop request/forward/respond) but use much less bandwidth since messages are point to point and not broadcast. For this reason, many of the larger systems (>64 processors) use this type of cache coherence.
Snooping[edit] Bus snooping - Wikipedia
Main article: Bus snooping
First introduced in 1983,[7] snooping is a process where the individual caches monitor address lines for accesses to memory locations that they have cached.[4] The write-invalidate protocols and write-update protocols make use of this mechanism.
For the snooping mechanism, a snoop filter reduces the snooping traffic by maintaining a plurality of entries, each representing a cache line that may be owned by one or more nodes. When replacement of one of the entries is required, the snoop filter selects for the replacement of the entry representing the cache line or lines owned by the fewest nodes, as determined from a presence vector in each of the entries. A temporal or other type of algorithm is used to refine the selection if more than one cache line is owned by the fewest nodes.[8]
Types of snooping protocols[edit]
There are two kinds of snooping protocols depending on the way to manage a local copy of a write operation:
Write-invalidate[edit]
When a processor writes on a shared cache block, all the shared copies in the other caches are invalidated through bus snooping. This method ensures that only one copy of a datum can be exclusively read and written by a processor. All the other copies in other caches are invalidated. This is the most commonly used snooping protocol. MSI, MESI, MOSI, MOESI, and MESIF protocols belong to this category.
Write-update[edit]
When a processor writes on a shared cache block, all the shared copies of the other caches are updated through bus snooping. This method broadcasts a write data to all caches throughout a bus. It incurs larger bus traffic than write-invalidate protocol. That is why this method is uncommon. Dragon and firefly protocols belong to this category.[3]
Directory-based[edit]
Main article: Directory-based cache coherence
In a directory-based system, the data being shared is placed in a common directory that maintains the coherence between caches. The directory acts as a filter through which the processor must ask permission to load an entry from the primary memory to its cache. When an entry is changed, the directory either updates or invalidates the other caches with that entry.
A snoop filter determines whether a snooper needs to check its cache tag or not. A snoop filter is a directory-based structure and monitors all coherent traffic in order to keep track of the coherency states of cache blocks. It means that the snoop filter knows the caches that have a copy of a cache block. Thus it can prevent the caches that do not have the copy of a cache block from making the unnecessary snooping. There are three types of filters depending on the location of the snoop filters. One is a source filter that is located at a cache side and performs filtering before coherence traffic reaches the shared bus. Another is a destination filter that is located at receiver caches and prevents unnecessary cache-tag look-ups at the receiver core, but this type of filtering fails to prevent the initial coherence message from the source. Lastly, in-network filters prune coherence traffic dynamically inside the shared bus.[5] The snoop filter is also categorized as inclusive and exclusive. The inclusive snoop filter keeps track of the presence of cache blocks in caches. However, the exclusive snoop filter monitors the absence of cache blocks in caches. In other words, a hit in the inclusive snoop filter means that the corresponding cache block is held by caches. On the other hand, a hit in the exclusive snoop filter means that no cache has the requested cache block
总结一下
cache coherency 就是所有core看到的数据是一致的。它有两个方面:1)一个core写入的数据要被其他的core看到;2)所有core看到的读写顺序都是一样的。
实现机制有两种:1)snoop;2)directory based(snoop filter)。其实都是snoop,只不过后者对snoop进行了过滤。
snoop的类型有两种:1)write invalid;2)write update。前者比较简单,用的多。