That "12.8% improvement" has been debunked dozens of times on the web.
You can't do a "per core improvement" when looking at a fully utilized server. The performance numbers that we are quoting are for total throughput of 16 cores vs. 12 cores.
To give you an idea of how that scales, look at these examples:
AMD Opteron 4-core to 6-core - 50% more cores, 33% more throughput
Intel Xeon 4-core to 6-core - 50% more cores, 33% more throughput
AMD Opteron 12-core to 16-core - 33% more cores, 50% more throughput
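For anyone who wants to check the arithmetic, here is a rough sketch in Python that divides the throughput gain by the core-count gain for each of the comparisons above. The function name and the dictionary layout are just illustrative choices, and the inputs are the rounded percentages quoted in the list, not raw benchmark scores.

```python
def throughput_per_core_ratio(core_gain, throughput_gain):
    """New vs. old throughput per core, with every core busy.

    Both arguments are fractional gains, e.g. 0.33 for "33% more".
    """
    return (1 + throughput_gain) / (1 + core_gain)


# Rounded percentages from the list above: (more cores, more throughput)
examples = {
    "AMD Opteron 4-core to 6-core": (0.50, 0.33),
    "Intel Xeon 4-core to 6-core": (0.50, 0.33),
    "AMD Opteron 12-core to 16-core": (0.33, 0.50),
}

for name, (core_gain, throughput_gain) in examples.items():
    ratio = throughput_per_core_ratio(core_gain, throughput_gain)
    print(f"{name}: {ratio:.3f}x throughput per core at full load")
```

With these inputs the two 4-core to 6-core cases come out at roughly 0.887x and the 12-core to 16-core case at roughly 1.128x. That last ratio looks like where the 12.8% figure comes from, but it is a ratio of fully loaded throughput divided across cores, not a measurement of what one thread does on an otherwise idle chip.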
Obviously there is some architectural improvement that is boosting overall throughput. As for single-threaded performance (the 12.8% often quoted): if you run only 1 thread on a 12-core and only 1 thread on a 16-core, you will see a significantly larger performance increase than 12.8%.
When people try to make this comparison, it is like trying to work out how long it will take you to get to the office at 3 AM when all you know is that it takes 30 minutes during rush hour. I can guarantee you that with less traffic at 3 AM you will get there a lot quicker.