What a mystery
"not written in C++ and it's not compiled with Microsoft's Visual C++ 2008".
What a mystery there is something else too.
Security researchers are appealing for help after discovering that part of the Duqu Trojan was written in an unknown programming language. Duqu is a sophisticated Trojan reckoned to have been created by the same group behind the infamous Stuxnet worm. While the finely tuned Stuxnet worm was designed to home in on specific …
Its existence till now has been a closely guarded secret. The only previous known use of the language was when Jeff Goldblum wrote a quick hack on his PowerBook and uploaded it to the alien mothership. From the little that's known, it supposedly combines the readability of Perl, the speed of JavaScript, and the intuitiveness of Haskell.
A 2D programming language. Loops are real loops! Maybe there is a befunge++ out there
Reminds me of the story that supposedly the biggest ever deployment of the Scheme language was an interpreter some poor techie embedded into his employer's toolbar / adware / malware for the express purpose of detecting rivals' malware and disabling it. There was such a constant state of flux between the different camps that a lightweight framework for distributing and executing the day's new rules apparently gave them a huge advantage.
In modern terms though, object orientated and lightweight would suggest Lua. Perhaps the byte code is obfuscated.
Igor Soumenkov says it's not Lua.
My money is on some kind of Lisp.
After all: http://www.franz.com/success/customer_apps/animation_graphics/naughtydog.lhtml
"With leading edge game systems like ours, you have to deal with complicated behaviors and real-time action. Languages like C are very poor with temporal constructs. C is just very awkward for a project like this. Lisp, on the other hand, is ideal."
Lateral thoughts: Anyone remember Thierry Breton's "Softwar" Cyberthrilling Cyberpotboiler back from the 80's?
So, you have forgotten how to read English? "These guys" have no problem reading the x86 disassembly and understanding what the code DOES. What they are wondering is what language it was originally written in and compiled from. It definitely wasn't hand-written x86 assembly.
From the looks of it, my guess would be one of the relatively less-widely used object-oriented languages. Maybe compiled Python or Forth... Compiled Perl might be worth looking at, although personally I think it's unlikely.
This sort of news does not inspire confidence in an already dubious anti-virus industry, that spends more money on market research than anti-virus research and has to call out to the masses: "Help us find out how this was written."
What I would do with actual budget figures from a major AV firm. Even without that information, if they spent more money on AV research than market research, we'd have an off-the-shelf profile-based virus product that can catch this sort of thing before it's written, instead of boxes of the same-old after-the-fact garbage with pictures of Iron Man on the front.
Realistically, given the likely provenance of these babies, if I was running the project then the first thing I'd do would be write a language specifically for them ... after all, if it's a government project then money isn't going to be a big issue. And a virus^H^H^H^H^H payload specific language would offer significant advantages.
What little I remember about Ada from when I took the class is that it was not a compiler, not an interpreter, but a translator, which spat out FORTRAN on the IBM 4361. What a joke. One Ada run took 8 minutes to complete and if more than one was running, it was more like 20 minutes.
I was going to speculate before I read the article. Then I thought, if it's really that obscure, those spooks just want to know if anyone has knowledge about it, so they can interrogate^H^H^H^H^H^H^H^H^H^H question the person about whether or not they had anything to do with writing the actual code (!)
@Destroy All Monsters: Yes there are! Once ADA runtimes emerged that actually used O/S facilities like threads instead of re-creating those things for themselves, ADA got a *lot* better. From what I vaguely remember, Green Hills ADA on VxWorks was pretty decent indeed.
I can remember the problems that a bunch of colleagues had in the very early '90s with ADA (on Vax I think). The application they'd written was too large for any of the ADA runtimes of the day to actually run. I never found out if they ever got it going...
I think you must be going back a long way. I don't know if early Ada was ever implemented as a translator to Fortran, but I'm pretty certain that by Ada 95 (when I was learning it), it had its own compiler that did not go via Fortran. I think performance between Ada 95 and Fortran was comparable. In any case, the reason you used Ada wasn't for speed but because its safety features meant your code was "provably" correct. (Just don't mention the Ariane 5 explosion).
I seriously doubt anyone has written the core of a virus in Ada. Though I would be amused to be proved wrong.
HP2100 assembler, hand-craft into 1's and 0's, then punch it in on the front panel switches.
That's how I was taught REAL programming by the great Dr Munro, Yr 2, EE at Imperial.
Time to design a USB-connected proper front panel for the Raspberry PI, then these schoolkids can really learn how a computer works.
</old-fart>
Well, it's perfectly feasible, given the obvious level of manpower and resources that were poured into putting together this very professional bit of code, but what's most likely is that they've used some pre-existing (but obscure) language tool that just isn't used very much compared to C and C++ and so the look of the code it generates isn't well known.
It's feasible but not as a tool for generating the machine executable code (which is what most programming boils down to). Any self-respecting investigator is probably just looking at the machine instructions (decoded into assembly language).
However I can see a mystery high level language being relevant if the payload is an interpreter+script I suppose. Maybe some kind of intermediate p-code a la .NET.
In that situation when you disassemble it all you'll get is the interpreter/compiler and a load of unknown junk requiring further investigation. It's an interesting idea but a lot of work for a malware author I'd have thought.
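To make the interpreter+script idea concrete, here is a minimal sketch of a toy stack machine. The opcodes (PUSH, ADD, MUL, HALT) and their byte values are made up for illustration and have nothing to do with Duqu or any real p-code format. The point is only that a disassembler would show you the `run` loop, while the actual program logic lives in a blob of bytes that looks like junk until you understand the interpreter.

```python
# Hypothetical opcodes for a toy stack machine (invented for this sketch).
PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF


def run(bytecode: bytes) -> int:
    """Interpret the toy bytecode and return the value left on the stack."""
    stack = []
    pc = 0  # program counter: index of the next byte to read
    while True:
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            # PUSH is followed by a one-byte operand.
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#x}")


# The "script": to a disassembler this is just nine bytes of data.
script = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
result = run(script)  # computes (2 + 3) * 4
print(script.hex(), "->", result)
```

Anyone analysing this would first have to reverse the interpreter to learn what each byte means, and only then start on the script itself, which is the "further investigation" part.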
It might be quite sensible for them to develop a domain specific language to write their code in. It's a fairly standard technique. It would also hint at the people behind it being reasonably clueful. Maybe a warning about what happens when you have too many bright computer science graduates around without a job.
Decades of asm error files myself, so I find this high level language stuff really confusing - does it make it easier to do divide routines and indexed addressing - stuff like that? It will probably take several people months of disassembly to figure it out - I'd guess if Kaspersky are on it at least us humans have a chance!
It seems to me that this is some pretty obscure stuff, and whether it's DOD, GRU/Spetsnaz, China, North Korea, South Korea, the UK, whoever, it is probably so compartmentalized that the very one who speaks of it gets killed along with the handful or more of people on the team and peripheral to it (a few friends, a few relatives, a line supervisor from a previous division/department...)... So, unless it is some group with a sense of humor, there probably are some deadly serious people behind it.
That said, it is possible, I imagine, that it is an exercise at seeing how trustworthy a given agency feels its workers are. The longer this work goes uncracked, the likelier the team gets elevated to the real, much more difficult project ahead.
@dssf:
You were lucky. A couple of weeks ago I suggested that a man who shot his daughter's laptop might not be acting in a reasonable, moderate way, and I got 59 down-votes!
Equally mysterious is the fact that your post about the down-vote has itself been down-voted twice. I'm more than a little worried about the votes I'll get for this reply.
Either a bot or someone who behaves like one when it comes to downvoting - I've seen a lot of that.
All I can think is that you may have gone a little too far in speculating that people who know/reveal the answer are likely to be killed.
I don't believe it is necessarily the work of any government or spook agency and I'm sure the skills required, though high, are not exclusive to such people (I bet there's enough skill amongst those who comment on this forum to achieve the same).
However, it is clear that these people are very serious and I for one would not want to speculate too far on their identity/methods. "JustaKOS" is just a pseudonym, not a guarantee of anonymity.
Let's face it, there's not exactly a shortage of grossly obscure backwater languages that could be used, and some (I imagine) even have quite good network support.
Given a lot of languages have been mapped onto virtual stack machines (including the internals of some *FORTRAN* compilers) I'd wonder if different languages leave a remnant of their high level structure in the stack activity. FORTH is known for its low level credentials. And since we're on the subject, how about PostScript?
And of course if you want to make sure no one will spot what you're working on there's always whitespace.
But for proper obscurity I'd go with Enochian.
I'm going to put my money down. I haven't heard anyone suggest D, yet. But it's comparable to C++ (better, actually), modern and appeals to people who want to be learning new things (but don't want those new things to be JavaScript or Python). It has the support you'd need to make a working module on Windows and it's freely and easily available.
I called it. It's compiled D.
I shall be posting back quotes of this when they find I'm right.
OK, since they are using the Windows language in the OS itself, it is written left to right. But Muslim countries like Iran have a language that reads right to left.
So what if the readable part is left to right for the Windows OS BUT the code for the nuclear machines was written right to left? Then the program is left to right EXCEPT for the objects, which were written right to left, and maybe the object part of the code was compiled separately first and then added to the readable part.
So then you would have to decode/read it by decompiling each separately and backwards to each other, like mirror images integrated into each other.
I likely don't have a clue, and I apologize in advance to the enlightened ones that frequent this site for having to read my drivel if that is the case. "Covering my ass so any real BOFH's here don't take action upon my lame ass for posting this"
It's never a stupid question if you don't know the answer, so there's no problem in asking. But what you suggest isn't how things work.
There are two reasons for this. A superficial one and a real one. The superficial one is that the programming languages don't have Right to Left variants. If you run the gcc compiler against some source code, it expects that source code to be the same whatever your native language. That's the superficial reason because you *could* write a parser that read things right to left if you wanted to. It wouldn't be hard. (For all I know, someone could have).
But there's a real reason why it wouldn't work like this as well. That is that regardless of how a single statement is written, the compiler is likely to render it to the same machine code. So "return 0;" and our hypothetical ";0 return" are both going to come out of the other side of a compiler as the same thing. That's the job of a compiler - to turn statements into their machine equivalent. Whether you say something in Arabic or in English, the *equivalent* in machine language is going to be the same.
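If it helps, the point can be sketched with a toy "compiler" in Python. Everything here is invented for illustration: the instruction names (PUSH, RET), the `right_to_left` switch and the one-statement grammar are assumptions, not how gcc or any real compiler works internally. It only shows that the reading direction is a front-end detail that disappears before any output is produced.

```python
# Toy "compiler": two surface syntaxes, one instruction stream.
# PUSH and RET are made-up instruction names for this sketch.

def tokenize(source: str) -> list:
    """Split a statement into tokens, treating ';' as its own token."""
    return source.replace(";", " ; ").split()


def compile_statement(source: str, right_to_left: bool = False) -> list:
    """Compile a single 'return <int>;' statement into toy instructions."""
    tokens = tokenize(source)
    if right_to_left:
        # A right-to-left front end only has to normalise the token order.
        tokens = tokens[::-1]
    if len(tokens) != 3 or tokens[0] != "return" or tokens[2] != ";":
        raise SyntaxError(f"cannot parse: {source!r}")
    return [("PUSH", int(tokens[1])), ("RET",)]


ltr = compile_statement("return 0;")
rtl = compile_statement(";0 return", right_to_left=True)
print(ltr == rtl)  # the two spellings produce identical output
```

Once both spellings have been reduced to the same instruction list, nothing downstream can tell which way round the source was written.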
(Also, I bet this isn't "Muslim". Israeli would be my bet).
Hope this helps.
If you're writing malware for a specific purpose, like crippling centrifuges, you're unlikely to be much troubled by AV software that relies on identity-based detection. Assuming you don't flash your malware around before deploying it for its deadly purpose, no-one will have seen it before - so there won't be any identities available for it (for a few days at least, maybe much longer, but long enough to hit the target).
However, your Achilles heel is heuristic malware detection. To avoid that, you want to write code that looks as "average" as possible and doesn't stand out from the crowd. So using a custom programming language that no-one else uses is decidedly odd. If the boffins at Kaspersky can design a heuristic to detect that language, then ALL the malware written with it is dead in the water from that point on.
Even if the language is a standard, but little-used one, it still makes the job of malware detection a lot easier.
- Heuristic detection is not based on language pattern but on behavior.
- For any language, there are always a lot of ways to compile the code to native instructions. That's why there are different compilers out there and not just Microsoft Visual C++. They all produce a different "signature" for a given piece of code.
More likely than making their own language, they did their own compiler and coded the thing in C, using some libraries (they could have written them) that provided object orientation for C.
It makes sense because using a close-to-hardware language gives you a closer representation of the native code you will get, thus allowing you to shape it at will and make it harder to read.
You all think much too abstract.
"So far, the researchers have worked out what the mystery code does, but are still mostly in the dark about the grammar and syntax of the programming language, they said."
WHY would anyone expect to be able to figure out the grammar and syntax of a higher-level language by looking at the compiled code -- let alone try to do so?
SEKRIT PS
SECRET EQ B0,B0 SECRET
DONE EQ B0,B0 SEKRIT
I've got to say, after reading the blog, that these Kaspersky guys are really switched on. Almost makes me regret dumping Kaspersky for MSE.
But, Soumenkov doesn't give any insight into why this component was written in another language. Everything else was written in C++, so what's different about this bit? It seems to be responsible for network comms. He knows a lot about the structure of the code, but the 2 things that stand out (to me) are that it's very object-oriented, and it's event driven. This isn't a job for a roll-your-own language - it should have been written in C++ as well, if only to make sure that it blended in.
So, looks to me like the guys who wrote this component have screwed up and left a smoking gun. They used their own in-house or research framework, and someone out there will finger them.
"Unlike the rest of Duqu, the Duqu Framework is not written in C++ and it's not compiled with Microsoft's Visual C++ 2008."
I humbly propose that any mission critical machine/device shouldn't be relying on windoze in the first place.
Lastly, I am dubious to assign some degree of sophistication to an implementation forged from a MS development base. It smacks of sub genius.
As the blog author states.
I'm no better equipped to play CSI:Malware than most of you but I'll note a few things.
The author makes no mention of *any* kind of development tool ID buried in the DLL. AFAIK this is SOP for *all* commercial tools and most open source stuff. So this *looks* like some kind of in house one site compiler. No ID string because in the event of a bug you just go down the hall and have a word.
This is not unknown in big companies. McDonnell Douglas's IT arm used an in-house language, as did Rockwell Collins avionics systems (which ran on their own stack-based *processor* to boot).
This feels like some weird joint venture between 2 teams, 1 COTS, 1 more secretive and very bespoke.
The payload itself sounds more like an OS in its own right. Message queues, callback functions etc. Is this SOP for malware? I dimly recall a fair bit of this is built into Windows (although who actually *uses* the Background Intelligent Transfer Service apart from Windows Update is a mystery to me).
Was anyone thinking about Symbian and OPL? I've no frame of reference but it just seems *excessive* somehow. And Symbian was open sourced for download and also made extensive use of callback functions for low power and fast response to events.
Back in the day a UK company wanted to be the next big thing with a parallel OS they called "Helios" (Written up by Dick Pountain in Byte). The designers had games backgrounds so it was very light weight but lots of OO support. Not an HLL but (IIRC) mostly assembler with *lots* of macros to give a (mostly) consistent handling of objects without overhead. AFAIK they moved into set top boxes.
The comment about methods being able to be called through a table but also *directly* put me in mind of them. Follow the rules for consistency but if necessary call functions directly for speed. Or maybe they are just the rough edges of an in house development effort.
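For anyone who hasn't met the pattern, here is a rough sketch of "call through a table, but also directly". The handler names, the `HANDLER_TABLE` layout and the event names are all hypothetical, invented for this example rather than taken from the blog's analysis. It just shows the two call styles living side by side in one code base.

```python
def on_connect(state, payload):
    """Hypothetical handler: record a connection event."""
    state["log"].append("connect:" + payload)


def on_data(state, payload):
    """Hypothetical handler: record a data event."""
    state["log"].append("data:" + payload)


# A per-"class" method table: slot name -> function.
HANDLER_TABLE = {"connect": on_connect, "data": on_data}


def dispatch(table, event, state, payload):
    """Indirect call: look the method up in the table, then invoke it."""
    table[event](state, payload)


state = {"log": []}
dispatch(HANDLER_TABLE, "connect", state, "host-a")  # by the rules, via the table
on_data(state, "hello")                              # shortcut, calling the function directly
print(state["log"])
```

The table route keeps things consistent and lets objects swap handlers; the direct route saves a lookup when the caller already knows which function it wants.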
This stuff about low resource use and fast response would seem to be pretty much *irrelevant* with modern PCs but highly relevant for something for embedded use. Yet it's also *tightly* coupled to the Windows API.
Some kind of big code base hosted on something else *before* being ported to Windows?
This looks to be *so* unique that tracing the environment will pretty much find the developers, but I wonder if some of the developers even *realize* how their language has been used.
My name's not Goff, but I am off.
In order to lock in customers for programmable logic controllers (PLC) used in SCADA systems, the operating code for the PLC is usually very proprietary and frequently very compactly compiled to make use of the minimal memory.
Back in the earliest days of PLC's, there was negligible memory to use, perhaps up to 16kb. One Meg would have been a luxury.
This type of programmer knew how to write code to run on ancient processors that was reliable enough to launch missiles and run nuclear plants and yet way more efficiently written than most anything today.
Who is to say that one of these guys did not come out of retirement to write this mystery section of the code?
What they did back then could be completely unrecognizable today.
So any of the .Net (except C++?) languages are out
As for it being one or more old SCADA developers most SCADA systems operate as real time control systems. Historically they have used various graphical tools to encode something called relay ladder logic. Guess how long that's been around for.
By the time SCADA systems were dealing with anything as complex as Ethernet or TCP/IP those functions would be delivered as a function call in the RLL language probably implemented as compiled C function on a mainstream microcontroller or microprocessor.
SCADA processors tended to be hard coded TTL systems, with TTL clocking up to 35MHz at a time when SoA processors like the Z80 and 8086 were hitting 4MHz. The production volumes (100s to 1000s, not millions) plus the performance still made discrete or MSI logic viable.
Porting a SCADA internal language makes zero sense, but then neither does the different language idea (unless you *already* have one lying around and you're familiar with it).
Helios update. I mentioned that it was mostly macro assembler but that was my memory of the Byte article. Checking further I've found compilers were listed as being available to support Modula 2, FORTRAN and C/C++. I've no idea which were delivered however.
So is there *anything* that can be done *easily* in language X that's a PITA in C++? That would be the only *real* justification for using it. It's the old joke: if you've basically got an AI problem, how much of LISP will you have to implement (in any language you have available) to solve it, because that's what you're going to use in the end.
Most likely it was written and compiled with the same proprietary platform that the Siemens drive software is written in, an assembler derivative. I doubt the AV vendor knows what the payload is, unless they have a lab full of Siemens drives and controllers and the closed source code they are trying to work out.
Read: proprietary compiler ... and where did the authors get that from? Reeks of .
As with Symantec, the source is the key. Do the AV vendors have the source, or are they playing dumb and have the source .....