DRAM has been manufactured on successively smaller process nodes. Each shrink reduces the power and the latency per megabit, but both improvements have been cancelled out by the increase in capacity. A CPU cache could be made bigger or faster, but making it bigger would cost speed and making it faster would cost capacity.
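A toy model of the power side of this argument may help. It assumes, purely for illustration, that each process generation halves the power per megabit while the cache capacity doubles. The function name `total_power` and all the numbers are hypothetical and do not describe any real part.

```python
# Toy model with hypothetical numbers: the power per Mb falls with each
# process shrink, but the array also grows, so total power can stay flat.

def total_power(power_per_mb: float, capacity_mb: float) -> float:
    """Total power of a DRAM array = power per Mb x capacity in Mb."""
    return power_per_mb * capacity_mb

# Hypothetical generations: (relative power per Mb, capacity in Mb).
# Each step halves the per-Mb power and doubles the capacity.
generations = [
    (1.00, 64),
    (0.50, 128),
    (0.25, 256),
]

totals = [total_power(p, cap) for p, cap in generations]
for (p, cap), total in zip(generations, totals):
    print(f"{cap:4d} Mb at {p:.2f}/Mb -> total {total:.1f}")
```

In this model every generation ends up at the same total, which is the sense in which the per-megabit gain is "cancelled out" by the added capacity.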
Putting the cache on a separate die allows it to be built with a process dedicated to DRAM, which is cheaper than a process that can build both DRAM and CPU logic on the same die. A separate die also means the DRAM does not have to run at the CPU die's temperature. DRAM cells leak charge faster when hot and must be refreshed more often, so keeping them cooler allows the cache to be bigger, faster or cheaper.
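To see why temperature matters, here is a rough sketch of refresh overhead: the fraction of time a DRAM bank is busy refreshing instead of serving accesses. The timing values are assumptions chosen for illustration, loosely following the commodity DDR convention of halving the refresh interval at high temperature; they are not figures for any particular on-package cache, and the names `refresh_overhead`, `T_RFC_NS` and `T_REFI_*` are hypothetical.

```python
# Fraction of time a DRAM bank is unavailable because it is refreshing.
# All timing values below are assumptions for illustration only.

def refresh_overhead(t_rfc_ns: float, t_refi_ns: float) -> float:
    """One refresh lasting t_rfc is issued every t_refi; return the busy fraction."""
    return t_rfc_ns / t_refi_ns

T_RFC_NS = 350.0                     # assumed duration of one refresh operation
T_REFI_COOL_NS = 7800.0              # assumed refresh interval on a cool die
T_REFI_HOT_NS = T_REFI_COOL_NS / 2   # assumed: interval halves on a hot die

cool = refresh_overhead(T_RFC_NS, T_REFI_COOL_NS)
hot = refresh_overhead(T_RFC_NS, T_REFI_HOT_NS)
print(f"cool die: {cool:.1%} of time refreshing")
print(f"hot die:  {hot:.1%} of time refreshing")
```

With these assumed values the overhead roughly doubles, from about 4.5% to about 9%. That time, and the power spent on refresh, is what a cooler separate die can put back into capacity, speed or cost.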