In the ARM1176JZ-S processor, each cache is implemented as a four-way set-associative cache of configurable size. The caches are virtually indexed and physically tagged. You can configure the cache sizes in the range 4KB to 64KB. Both the Instruction Cache and the Data Cache can provide two words per cycle for all requesting sources.
Each cache way is architecturally limited to 16KB because of the constraints of the virtually indexed, physically tagged implementation. The number of cache ways is fixed at four, but the way size can vary from 1KB to 16KB in powers of 2, which gives the total cache sizes of 4KB to 64KB. The line length is not configurable and is fixed at eight words (32 bytes) per line.
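The geometry that follows from these parameters can be worked out mechanically. The Python sketch below is illustrative only: the function names `cache_geometry` and `split_address` are hypothetical, and it assumes only what the text states (four ways, 32-byte lines, way sizes from 1KB to 16KB in powers of 2). It derives the number of sets and splits a virtual address into its set index and position within the line; the physical tag is not modelled.

```python
# Hypothetical sketch: derive cache geometry from the total cache size.
# Assumed from the text: 4 ways, 8 words (32 bytes) per line,
# way size a power of two from 1KB to 16KB.

WAYS = 4
LINE_BYTES = 8 * 4  # eight 32-bit words per line


def cache_geometry(total_bytes):
    """Return way size, number of sets and address-field widths."""
    way_bytes = total_bytes // WAYS
    is_pow2 = way_bytes > 0 and (way_bytes & (way_bytes - 1)) == 0
    if total_bytes % WAYS or not is_pow2 or not 1024 <= way_bytes <= 16 * 1024:
        raise ValueError("way size must be a power of two from 1KB to 16KB")
    sets = way_bytes // LINE_BYTES
    return {
        "way_bytes": way_bytes,
        "sets": sets,
        "offset_bits": LINE_BYTES.bit_length() - 1,  # log2(32) = 5
        "index_bits": sets.bit_length() - 1,         # log2(number of sets)
    }


def split_address(va, total_bytes):
    """Split a virtual address into (set index, word in line, byte in word)."""
    g = cache_geometry(total_bytes)
    byte_in_line = va & (LINE_BYTES - 1)
    index = (va >> g["offset_bits"]) & (g["sets"] - 1)
    return index, byte_in_line >> 2, byte_in_line & 3
```

For a 64KB cache each way is 16KB with 512 sets, so the set index comes from virtual address bits [13:5]; for example, `split_address(0x12345678, 64 * 1024)` gives set 179, word 6, byte 0.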
Write operations must occur after the Tag RAM reads and the associated address comparisons are complete. The cache therefore includes a three-entry Write Buffer that holds the written words until they can be written to the cache. A single store operation can write one or two words. The addresses of these outstanding writes provide an additional input to the Tag RAM comparison for reads, so that a read to an address with a pending write can be detected.
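A small behavioural model helps show how the Write Buffer holds stores and how its addresses feed the read comparison. The sketch below is an assumption-laden illustration, not the hardware design: the class `WriteBuffer` and its methods are hypothetical, entries drain oldest first, addresses are word addresses, and the data RAM is a plain dictionary keyed by `(way, word address)`. When a read matches a pending write, this sketch returns the newest pending data; a real design might instead stall the read until the entry drains, and the text does not say which.

```python
from collections import deque


class WriteBuffer:
    """Hypothetical model of a three-entry cache write buffer."""

    DEPTH = 3

    def __init__(self):
        # Each entry: (way, first word address, tuple of one or two words).
        self.entries = deque()

    def is_full(self):
        return len(self.entries) == self.DEPTH

    def push(self, way, word_addr, words):
        """Hold a store of one or two words that has already hit in `way`."""
        if not 1 <= len(words) <= 2:
            raise ValueError("a store writes one or two words")
        if self.is_full():
            raise RuntimeError("write buffer full: the store must stall")
        self.entries.append((way, word_addr, tuple(words)))

    def drain_one(self, data_ram):
        """Write the oldest entry into data_ram, a dict keyed by (way, word address)."""
        way, word_addr, words = self.entries.popleft()
        for i, word in enumerate(words):
            data_ram[(way, word_addr + i)] = word

    def match(self, word_addr):
        """Extra comparison input for reads: newest pending data, or None."""
        for _way, base, words in reversed(self.entries):
            if base <= word_addr < base + len(words):
                return words[word_addr - base]
        return None
```

Searching from the newest entry backwards means that when two pending stores cover the same word, the read sees the later one.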
To avoid a critical path from the Tag RAM comparison to the enable signals for the data RAMs, there is a minimum of one cycle of latency between determining a hit to a particular way and starting to write to the data RAM of that way. To sustain back-to-back writes, the Data Cache Write Buffer must therefore hold three entries. Accesses that read the dirty bits must also check the Data Cache Write Buffer for pending writes that will set dirty bits when they complete.
The dirty bits for the Data Cache are updated when the Data Cache Write Buffer data is written to the RAM. This requires the dirty bits to be held in a separate storage array, because the Tag arrays are not accessed during the data RAM writes and so cannot be written at that point. Keeping the dirty bits separate does, however, permit them to be implemented as a small RAM.
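The interaction between the separate dirty-bit array and pending writes can be sketched as follows. This is a hypothetical model: the class `DirtyTracker` is invented for illustration, it assumes one dirty bit per line indexed by set and way, and it tracks only which line each pending write targets. The stored bit is set only when a write drains, so a dirty-bit read also has to consult the pending writes.

```python
from collections import deque


class DirtyTracker:
    """Hypothetical model: dirty bits stored apart from the Tag RAM."""

    def __init__(self, sets, ways=4):
        # Small dirty-bit RAM: one bit per line, indexed [set][way].
        self.dirty = [[False] * ways for _ in range(sets)]
        # Store hits that have not yet reached the data RAM.
        self.pending = deque()

    def buffer_write(self, set_index, way):
        """A store hit is queued; no RAM is written yet."""
        self.pending.append((set_index, way))

    def drain_one(self):
        """The oldest pending write reaches the data RAM. Only the dirty
        array is updated alongside it; the Tag RAM is not accessed."""
        set_index, way = self.pending.popleft()
        self.dirty[set_index][way] = True

    def is_dirty(self, set_index, way):
        """Dirty-bit read: the stored bit, or a pending write to the same line."""
        if self.dirty[set_index][way]:
            return True
        return (set_index, way) in self.pending
```

Without the check against `pending`, a line written moments ago would wrongly appear clean, for example to a Write-Back decision, until its buffered write drained.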
The other main operations that the cache performs are cache line refills and Write-Backs. Each of these targets a particular cache way, which the victim selection logic determines at the point where the cache miss is detected.
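To illustrate that the victim way is fixed once, at miss detection, and then reused by both the Write-Back and the refill, here is a minimal sketch. The text does not state which replacement policy the victim selection logic uses; a simple round-robin counter is assumed purely for illustration, and `MissRequest` and `VictimSelector` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MissRequest:
    set_index: int
    way: int               # chosen once, when the miss is detected
    needs_writeback: bool  # the victim line is valid and dirty


class VictimSelector:
    """Hypothetical round-robin victim selection for a four-way cache."""

    def __init__(self, ways=4):
        self.ways = ways
        self.next_way = 0

    def on_miss(self, set_index, valid, dirty):
        """Pick the victim way; `valid` and `dirty` are per-way flags for the set."""
        way = self.next_way
        self.next_way = (way + 1) % self.ways
        return MissRequest(set_index, way, valid[way] and dirty[way])
```

Both the Write-Back of the old line (when `needs_writeback` is true) and the refill of the new line read `request.way`, so the choice is never revisited while the miss is being serviced.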
To reduce overall power consumption, the cache exploits the sequential nature of many cache operations, especially on the instruction side, to reduce the number of full cache reads. When a cache read is sequential to the previous cache read and falls within the same cache line, only the data RAM set that was previously read is accessed. The Tag RAM is not accessed at all during this sequential operation.
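The saving can be estimated by counting RAM accesses. The sketch below is a hypothetical model: `SequentialReadModel` is an invented name, the requester is assumed to flag sequential reads, the caller supplies the way a full lookup would hit instead of the model performing the tag comparison, and a full read is assumed to enable all four Tag RAMs and all four data RAMs in parallel.

```python
LINE_BYTES = 32  # eight words per line


class SequentialReadModel:
    """Hypothetical access counter for sequential-read power saving."""

    def __init__(self, ways=4):
        self.ways = ways
        self.last_addr = None
        self.last_way = None
        self.tag_reads = 0       # Tag RAM enables
        self.data_ram_reads = 0  # data RAM enables, one per way accessed

    def read(self, addr, hit_way, sequential):
        """Return the way that supplies the data for this read."""
        same_line = (self.last_addr is not None
                     and addr // LINE_BYTES == self.last_addr // LINE_BYTES)
        if sequential and same_line:
            # Sequential read in the same line: no Tag RAM access, and only
            # the data RAM of the previously read way is enabled.
            self.data_ram_reads += 1
            way = self.last_way
        else:
            # Full read: every Tag RAM and every data RAM is enabled.
            self.tag_reads += self.ways
            self.data_ram_reads += self.ways
            way = hit_way
        self.last_addr, self.last_way = addr, way
        return way
```

For instruction fetch that reads one 32-byte line two words at a time (four reads, the first non-sequential), this model counts 4 Tag RAM and 7 data RAM accesses, against 16 of each if every read were a full read.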