Much depends on the exact workload. Hyperthreading works a lot better when multiple threads work on the same data, so they can share the cache more effectively, than when (as in my most common use case) each thread works on a different chunk of some very large image (several Gpixel). I haven't found hyperthreading to contribute anything in the latter case. In some other code, where we can process data from a single image row in multiple threads, we get reasonable gains (30-40% as I recall).
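To make the two access patterns concrete, here is a rough sketch of how the work might be divided in each case. It only builds the per-thread task lists; it does not run or time anything, and the function names (`split_range`, `band_partition`, `row_partition`) are made up for illustration rather than taken from my actual code.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pieces of near-equal size."""
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for i in range(parts):
        # The first `extra` pieces get one additional element each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def band_partition(height, width, n_threads):
    """Pattern 1: each thread owns its own band of full rows (disjoint chunks).

    Returns one task list per thread; each task is (row, col_start, col_stop).
    Threads touch unrelated memory, so they compete for cache rather than share it.
    """
    return [
        [(row, 0, width) for row in range(r0, r1)]
        for r0, r1 in split_range(height, n_threads)
    ]


def row_partition(height, width, n_threads):
    """Pattern 2: every thread takes a column segment of the same row.

    Returns one task list per thread; each task is (row, col_start, col_stop).
    If the threads advance row by row together, they work on neighbouring data.
    """
    return [
        [(row, c0, c1) for row in range(height)]
        for c0, c1 in split_range(width, n_threads)
    ]
```

For example, `band_partition(4, 8, 2)` gives thread 1 rows 2 and 3 in full, while `row_partition(2, 8, 2)` gives thread 1 columns 4 to 8 of every row. The second layout only keeps the threads on the same row if they are synchronized per row, which this sketch leaves out.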