Mono
Dumb question: is this the same Mono as in the open source .Net?
It's not been a good summer for Samsung. It packed its Galaxy Note 7 smartphones with detonating batteries, sparking a global recall. And its whizzy Exynos 8890 processor, which powers the Note 7 and the Galaxy S7 and S7 Edge, is tripping up apps with seemingly bizarre crashes – from null pointers to illegal instruction …
Yes and Yes.
Xamarin bug is here - https://bugzilla.xamarin.com/show_bug.cgi?id=39859
Edit: clicky
Here, it even has to be done by the application, not the OS. The reason is that modifying a program's own code is something that's "not normally done", so CPUs are not designed to automatically invalidate instruction cache lines when the associated memory gets modified. Even if that were the case, you'd still need to ensure the newly written code is flushed from the _data_ cache through to RAM before you can safely jump to it on a multi-core system.
Really? AFAIK, self-modifying code has been deprecated since I learned to program on a PDP-8, well before Intel (and of course ARM) existed – and well before this cache business that attempts to hide lousy memory latency and bandwidth from the CPU so manufacturers can deny there's been basically little progress in RAM all these years.
I'm sorry guys, you're not smart enough to do self-modifying code right. One really smart guy might pull it off on a small project, but no team will ever succeed at that.
Heck, even overuse of globals for "spooky action at a distance" is bad practice. And this at the system level?
I'm not sure I agree. This isn't a case of somebody deciding that a store absolute + a load absolute is six bytes and eight cycles but a store absolute + a load immediate is five bytes and six cycles, so they'll do the less readable thing. It's a compiler just like any other compiler, except that compilation is just-in-time, and somebody didn't think hard enough to realise that constants you read from your processor may not actually be constant if the processors are heterogeneous.
So as to the code generation itself, this is just a compiler doing exactly what compilers have always done. It isn't self modifying. It's one actor, and it's outputting another — not modifying it, and not modifying itself.
They've just messed up the announcement of completion.
I once had to look after self-modifying code in the 1980s. It was a real lesson: every time you listed the code, it was different :)
Making changes to the code that did the modifying was fun as well, as bugs tended to eat the whole system (you learnt the value of backups when developing).
I was then tasked with writing a report generator that looked at the system and worked out where the relevant data was and created the report. It is the only time in my career that I've had to use triple indirection (used double many times) and recursion together. I used to come home with stunning headaches and next day spend an hour working out what I'd written the previous day.
After a few weeks I'd got it working and written a user interface for selecting the data you wanted and how to layout the report type if you wanted a new report. All fully documented :)
I left a few months later, and came back a few years later to find that no-one else had ever generated another report type after I left. The reason... you had to understand the data structures in the original system to build a report, and no one could be bothered to learn. Some programmers tried building static programs to build reports, but when the system modified itself they stopped working :)
I think you will find RAM is quite a bit faster than in the days of a PDP8. Did you ever use an 8S?
And self-modifying was actually quite useful in dedicated purpose low power microprocessors. In fact, with FPGAs, we can also have self-modifying hardware.
However, now that the average phone can outperform a Cray, what is worth implementing? One does have to wonder quite where sanity lies!
Personally, I am not going to buy a smart watch till it has a SCSI interface and tape backup. (At the present rate, I expect that to be before 2020).
"However, now that the average phone can outperform a Cray, what is worth implementing? One does have to wonder quite where sanity lies!"
You should note that most of the guilty programs are emulators: in particular, emulators of some pretty recent systems. The general rule of thumb is that the host system needs to be overpowered by a factor of about 100 to reliably emulate the system. You can reduce it, though, if you account for newer CPU optimizations. Emulation optimizations like DynaRec and JIT also help to reduce that ratio.
In terms of raw speed, yes, but not relative to processor core speed. Many of the old dinosaurs were equipped with RAM whose latency was short enough that the requested data would be at the processor before the next instruction even began to execute. In modern systems, you have to initiate the load, then wait 20+ cycles before the data is available to be used by the processor.
There are a number of things that don't quite make sense here. I'd guess the real villain is the lack of a 'dirty' bit for the instruction cache, to automatically invalidate cache lines when they're written by something other than the processor executing the code. This would make sense in a single-core design, but in a modern machine one processor's instructions may very well be another processor's data. Having the instruction cache manipulated manually by an application has 'workaround' or 'kludge' written all over it (or, as we all tend to find out the hard way, "An Accident Waiting To Happen").
"Having the instruction cache manipulated manually by an application has 'workaround' or 'kludge' written all over it (or, as we all tend to find out the hard way, "An Accident Waiting To Happen")."
That's why they sometimes call it the bleeding edge. The programs in question are trying to extract every last bit of performance from the CPU (because they're doing something pretty demanding like emulating a CPU and other hardware from less than ten years ago) because raw performance becomes the baseline by which everything else becomes possible for it.
The two sets of cores operate with different instruction sets. The M1 cores contain quite a few additional instructions for media applications (video and audio decoding, accelerated 3D graphics), advanced maths functions, encryption/decryption, etc. Each of those would take much, much more time and energy to execute on an A53 core. However, the complexity of the M1 core means it requires quite a bit more power to run, even when executing the same instructions. With this set-up, the M1 cores can be powered off the majority of the time, until it becomes more efficient to use the beefier cores – they become the much more efficient option when viewing DRM-protected content or playing a highly graphics-intensive game.
When Apple switched from the Motorola PowerPC 604 to the IBM PowerPC 970 (aka "G5"), a similar problem occurred. All prior PowerPC processors had used 32 Byte cache lines, and software was written with the expectation that the "DCBZ" (Data Cache Block Zero) instruction would zero 32 Bytes. The PowerPC 970 used a 128 Byte cache line, so the DCBZ instruction zeroed the 32 Bytes that were expected, then continued and zeroed the next 96 Bytes as well. Sometimes it was data, sometimes it was text, but frequently it was a mess. IBM added a mode bit that caused the DCBZ instruction to operate on 32 Bytes instead of the full cache line and made that the default setting on the parts sent to Apple.
I was taught about this in high school and it took years for me to find real cases.
The ones I found were.
The Apollo Guidance Computer. Used to extend the instruction set.
The "Blit" bit mapped terminal developed by Bell Labs as the UI for the Plan 9 OS. Used to assemble optimal instruction streams on the stack for certain graphic functions.
Both were workarounds for specific limitations of the architecture, either in terms of word length (allowing an extended instruction set) or processor speed.
It looks like squeezing the last gram of performance out of an architecture remains the reason for doing this.
But boy can it get messy.
Common practice in early-'80s game programming to wring the most out of the feeble processors; also used for a lot of the copy protection of the time, which sometimes even included undocumented op-codes (6510 in the C64) to confuse debuggers, so the code looked like garbage when examined.
Not really needed now as games are pretty much C/C++ or even standardized engines with scripting for game logic.
But if there's one place where wringing out the most performance still exists, it's something like a console emulator, and emulating a Wii or PSP counts as pretty much the pinnacle of console emulation for the time being (no one expects anything above those to be feasible anytime soon, as it was around that time that computer performance stopped climbing so rapidly).
processors
Yes, that would be exactly the sort of environment I'd expect it to get used.
I've nothing against self-modifying code in principle, just as long as people recall Knuth's point that "premature optimization is the root of most (programming) evil," and leave it as a last resort, not a first resort. This is low-level stuff and likely to break most of a compiler's attempts to optimize the code first.
Thing is, while DynaRec and JIT engines can be written in a compiled high-level language, self-modifying code usually isn't compiled but hand-assembled, as to do it right you really need to go low-level and hand-tune everything. It takes a VERY specialized language to be able to practically repeat the feat with a compiler.