Parallelism is key
"Given that multiple read/write heads aren't viable on HDDs (unlike tapes), then the only way of fixing this is more, and smaller drives"
(I'll assume you meant multiple heads seeking different sectors simultaneously, since hard drives do have multiple heads already)
Why not have multiple heads on each side of the disk that seek independently of one another? Alternatively, mirrored drives could spin in a phase-locked loop. Either way, you would roughly halve the access time. However, I suspect aggregate performance is what people really care about.
"What this means is that if you quadruple density the achievable sequential read/write access density on a per GB basis is halved whilst the random access density is reduced by a factor of four. To read a 4TB drive from start to end is going to take over 12 hours."
I don't understand what the problem is. A hypothetical 10TB drive will always match or beat the performance of a 1TB drive: the random seek times should be identical, and the sequential read will be 10x faster.
If one needs to improve the aggregate random seek time, one could use 10 x 1TB drives instead. The sequential read will still be 10x faster, and the aggregate seek time will be 1/10th for parallel asynchronous loads.
Consider a database server under high load on large datasets, with the data striped across X disks. The db is not limited to sequential synchronous access, therefore it can issue numerous asynchronous requests in parallel. Each request has probability 1/X of landing on any given disk (depending on striping and record allocation), therefore, with enough outstanding requests, every disk can be kept busy in parallel and the gains over an individual disk are close to linear.
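To make the striping argument concrete, here is a rough sketch in Python. It is a toy model and not a benchmark: it assumes a fixed 10 ms service time per random request, uniformly random placement of requests across disks, a queue deep enough that every request is outstanding at once, and no controller or bus bottleneck. The function name and parameters are made up for illustration.

```python
import random

def simulate_throughput(num_disks, num_requests, service_ms=10.0, seed=0):
    """Return requests/second for random requests spread over striped disks.

    Assumptions (mine, for illustration): every request costs the same
    service_ms, lands on a uniformly random disk, and all requests are
    queued up front so no disk ever waits for work.
    """
    rng = random.Random(seed)
    busy_ms = [0.0] * num_disks
    for _ in range(num_requests):
        disk = rng.randrange(num_disks)   # each disk is hit with probability 1/num_disks
        busy_ms[disk] += service_ms       # disks work through their own queues in parallel
    makespan_ms = max(busy_ms)            # the batch is done when the busiest disk finishes
    return num_requests / (makespan_ms / 1000.0)

one = simulate_throughput(1, 100_000)
ten = simulate_throughput(10, 100_000)
print(f"1 disk:   {one:.0f} req/s")
print(f"10 disks: {ten:.0f} req/s ({ten / one:.1f}x)")
```

The speedup falls slightly short of 10x only because random placement leaves one disk with a somewhat longer queue than the others. With a shallow queue (few outstanding requests) the gain would be much smaller, which is why the argument depends on parallel asynchronous load.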
I welcome any rebuttals, but I'd like them to address why parallel asynchronous requests cannot scale across more spindles in the same way that they can across a cluster of computers.