The whole problem with the von Neumann machine
... is the power required to access the memory where program and data are stored. Compared to the power budget of the actual computation, it used to be small in the previous century. No more: currently it is orders of magnitude higher than the power used for the computation itself. Additionally, the latency of getting data out of memory has improved very little over the past decades, compared to the growth in CPU computing power. Even worse, since increasing parallelism has become the only viable route to faster software, synchronization of data in memory (that is, completed memory writes and cache synchronization between cores) has become critical to computing performance. There is little that can be done while we are still saddled with inefficient DRAM.

However, FPGAs or ASICs also need to read and store data somewhere, even if the program is hard-wired. Of course, for small programs there is nothing wrong with small amounts of SRAM, but things are different if you look to deploy these devices into a wider environment, with large amounts of data flowing around. Which means they will hit the memory limit too (actually, I am pretty certain they are hitting it already). When much faster and cheaper (both in terms of money and power budget) alternatives to DRAM become commercially available, the tables might turn again.
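To make the point about cache synchronization between cores concrete, here is a minimal sketch (not from the article) of "false sharing": two threads incrementing counters that sit on the same cache line force the line to ping-pong between cores, while the same work on counters padded onto separate lines runs much faster. The thread count, iteration count and 64-byte line size are assumptions chosen purely for illustration.

```cpp
// Sketch: false sharing vs. padded counters. Build with: g++ -O2 -pthread
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

constexpr long ITERS = 50'000'000;

struct SharedLine {             // both counters likely share one cache line
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

struct PaddedLine {             // each counter aligned to its own 64-byte line
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

template <typename Counters>
double run(Counters& c) {
    auto start = std::chrono::steady_clock::now();
    // Two threads hammer independent counters; the only interaction is the
    // cache-coherence traffic on the line(s) holding them.
    std::thread t1([&] { for (long i = 0; i < ITERS; ++i) c.a.fetch_add(1, std::memory_order_relaxed); });
    std::thread t2([&] { for (long i = 0; i < ITERS; ++i) c.b.fetch_add(1, std::memory_order_relaxed); });
    t1.join();
    t2.join();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
}

int main() {
    SharedLine shared;
    PaddedLine padded;
    std::printf("same cache line:      %.2f s\n", run(shared));
    std::printf("separate cache lines: %.2f s\n", run(padded));
}
```

On typical multi-core hardware the padded version is several times faster, even though both versions perform exactly the same amount of arithmetic; the difference is entirely memory-system overhead.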
Still, it pays off (and will continue to pay off) to know both the hardware and the software side of programming, so kudos for the article.