"... seems to be no inherent limitation in using a ... flash cache ..."
Not sure I agree with that statement.
For one thing, a cache is, by definition, a staging area; data isn't expected to hang around in it very long. Once a particular chunk of data is no longer considered "useful" (by whatever criteria the array controller uses to make that determination), it's supposed to be flushed to disk. That's what lets you keep the cache -- which is expensive -- small relative to the overall persistent storage capacity.
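That flush-on-eviction behavior can be sketched in a few lines (a toy write-back LRU cache; the block names and capacity are made up for illustration):

```python
from collections import OrderedDict

class WriteBackCache:
    """Toy write-back cache: a dirty block is written to backing
    disk only when it's evicted (least recently used goes first)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block_id -> dirty flag
        self.flushed = []             # stand-in for "written to disk"

    def write(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)             # recently used again
        elif len(self.blocks) >= self.capacity:
            victim, dirty = self.blocks.popitem(last=False)  # evict the LRU block
            if dirty:
                self.flushed.append(victim)               # only now does it hit disk
        self.blocks[block_id] = True                      # mark dirty

cache = WriteBackCache(capacity=3)
for block in ["a", "b", "c", "d"]:
    cache.write(block)
print(cache.flushed)   # "a" fell out of use, so it got flushed to disk
```

The point of the sketch: as long as a block keeps getting touched, it stays pinned in cache and never reaches disk -- which is exactly the setup for the scenario below.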
So let's say the server(s) connected to the controller keep reading and updating a very large sequential data set. The data set could be anything, perhaps a chunk of petroleum geo-prospecting data. This data set is large enough, in fact, to fill almost the whole cache. Since the servers crunching the data keep updating the set, the chunk never gets flushed to disk; it is held in cache constantly. And since almost the entire cache is occupied by a single data set, I/O for every other data set ends up heavily constrained. That might not be a problem if the array is dedicated to a server cluster working on a single task.
However, in the HPC/supercomputing arena, one is often dealing with multiple large data sets being processed in parallel, and competing for processor time and storage resources. In combination, these data sets could dwarf the size of the cache. Your cache becomes a bottleneck, because it isn't large enough to handle the competing storage access demands.
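A toy simulation makes the contention concrete (hypothetical sizes: a 100-block LRU cache shared by a 95-block scan and a 10-block scan, so the combined working set just exceeds capacity):

```python
from collections import OrderedDict

def run(capacity, workloads, rounds):
    """Simulate an LRU cache shared by sequential-scan workloads.
    Each round, every workload scans its whole block list once.
    Returns per-workload hit rates, ignoring the cold first round."""
    cache = OrderedDict()
    hits = {name: 0 for name in workloads}
    accesses = {name: 0 for name in workloads}
    for r in range(rounds):
        for name, blocks in workloads.items():
            for b in blocks:
                if r > 0:
                    accesses[name] += 1
                if b in cache:
                    cache.move_to_end(b)          # refresh recency
                    if r > 0:
                        hits[name] += 1
                else:
                    if len(cache) >= capacity:
                        cache.popitem(last=False)  # evict the LRU block
                    cache[b] = True
    return {n: hits[n] / accesses[n] for n in workloads}

# One 95-block data set nearly fills the 100-block cache; a second,
# 10-block data set pushes the combined working set past capacity.
rates = run(100,
            {"geo":   [("geo", i) for i in range(95)],
             "other": [("other", i) for i in range(10)]},
            rounds=5)
print(rates)   # both hit rates collapse to 0.0: the scans evict each other
```

Under pure LRU, two sequential scans whose combined footprint exceeds the cache evict each other's blocks just before they're needed again, so *both* workloads end up missing on essentially every access -- the small data set doesn't just suffer, it drags the big one down too.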
Better to build the whole array out of flash, and (if necessary) use on-controller RAM for cache, methinks, than to use a flash cache with mechanicals hanging off the tail-end. Much more expensive, sure, but probably a lot less of a performance bottleneck for large sequential storage requests.