Forget anonymity, we can remember you wholesale with machine intel, hackers warned

Anonymous programmers, from malware writers to copyright infringers and those baiting governments with censorship-foiling software, may all be unveiled using stylistic programming traits which survive into the compiled binaries – regardless of common obfuscation methods. The work, titled De-anonymizing …

  1. DropBear
    Pint

    Happy new year, and here's a pint for you guys for the reference to the original book form of "Total Recall"...!

    EDIT: ...also, I suppose these good folks just proved beyond a shadow of a doubt that programming IS indeed a form of art.

    1. JLV

      Given the subject, this brings to mind a short story by PKD (can't recall the title) where a robot investigates a murder scene and gradually narrows down the possible culprits by trawling a database of characteristics for the population against the evidence it finds at the scene. I believe it turns out that the whole evidence set was being gamed from the start.

      1. VinceH

        That rang a bell, so I just took a look at a list of PKD short stories and novellas to see if the right title would jump out at me.

        It didn't - but I did spot The Variable Man; I was thinking about that very story quite recently, but couldn't remember what it was called or who it was by!

  2. ZenCoder

    These detection methods don't scale.

    With statistical detection methods the number of false positives and false negatives increases geometrically with sample size.

    Increase the sample size to 1,000, then 10,000, and you will see it's pointless except to conjure up some grant money.

    1. Anonymous Coward

      Re: These detection methods don't scale.

      They don't scale, no. But if there are 50 coders in a company and a hacker's style matches one of them that person can expect a more thorough investigation.

      1. P. Lee

        Re: These detection methods don't scale.

        The question is, how thorough can your investigation be?

        With a dev environment and code on a tiny removable storage device, only the incompetent are going to be caught. Perhaps that is enough, though.

        This is not handwriting. I can't see why those statistical markers cannot themselves be reverse-engineered and used to obfuscate the author's code.

        1. Michael Wojcik Silver badge

          Re: These detection methods don't scale.

          I can't see why those statistical markers cannot themselves be reverse-engineered and used to obfuscate the author's code.

          Oh, they can. On a sufficiently-fast system, you could even do this in an IDE or toolchain in nearly real time: as the programmer makes changes, the system could show what fingerprints the model identifies in it, and even suggest changes that would alter the fingerprint.

          There are a few obstacles:

          - It's some effort to implement something like this. Most people are too lazy, even if they're capable, or simply don't care - it's not a prominent aspect of their threat model.

          - It's resource-intensive. Modern IDEs already soak up an idiotic amount of CPU time, I/O bandwidth, etc. Will programmers (of whatever motivation) feel like applying resources to the problem?

          - Your adversaries may be using different models. That could mean anything from sufficiently-different training data, to different feature sets, to entirely different classifiers.
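          A toy version of that IDE feedback loop might look like the Python sketch below. The two features (share of opening braces on their own line, share of comment lines) and the function names are hypothetical and deliberately crude. They read source layout, whereas the research under discussion works from compiled binaries, so this only illustrates the idea of reporting a fingerprint and pointing at the feature that differs most from a target profile.

```python
def layout_fingerprint(source: str) -> dict[str, float]:
    """Compute two crude, hypothetical style features from C-like source text."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return {"brace_own_line": 0.0, "comment_ratio": 0.0}
    # Lines that open a block: either "if (x) {" or a lone "{".
    openers = [ln for ln in lines if ln.rstrip().endswith("{")]
    own_line = sum(1 for ln in openers if ln.strip() == "{")
    comments = sum(1 for ln in lines if ln.strip().startswith("//"))
    return {
        "brace_own_line": own_line / len(openers) if openers else 0.0,
        "comment_ratio": comments / len(lines),
    }


def biggest_giveaway(fingerprint: dict[str, float], target: dict[str, float]) -> str:
    """Name the feature furthest from a target profile, i.e. the one to change first.
    Both features are ratios in [0, 1], so no rescaling is done here."""
    return max(fingerprint, key=lambda k: abs(fingerprint[k] - target.get(k, 0.0)))


if __name__ == "__main__":
    allman = "if (x)\n{\n    f();\n}\n"
    fp = layout_fingerprint(allman)
    print(fp)  # {'brace_own_line': 1.0, 'comment_ratio': 0.0}
    # Hypothetical target: someone who never comments and keeps '{' on the 'if' line.
    print(biggest_giveaway(fp, {"brace_own_line": 0.0, "comment_ratio": 0.0}))  # brace_own_line
```

          A real tool would need many more features, and, as the last point above says, there is no guarantee your adversary's model looks at the same ones.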

      2. Anonymous Coward

        Re: These detection methods don't scale.

        >They don't scale, no. But if there are 50 coders in a company and a hacker's style matches one of them that person can expect a more thorough investigation.

        Yeah but you're dealing with the hacker mentality - they'd be emulating a colleague's ever-so slightly flawed attempt to copy another colleague's style.

        1. Michael H.F. Wilkinson Silver badge

          Re: These detection methods don't scale.

          Not only do these methods not necessarily scale, they need an ever increasing ground truth of identified code for training. This is not trivial to obtain. Besides, as more and more coders are added, you have to worry about the number of degrees of freedom in coding anything, i.e. whether there are enough different coding styles to distinguish the millions of coders on this planet. Then you have to deal with code developed by teams (which is the normal situation), which will either show a mixture of styles, or predominantly show the style of the loudest mouth in the team, with a small admixture of the other members. Similarly, what happens when a new coder refactors old stuff? I know I have seriously refactored a program written by some students to adapt it to new use cases. It is still not really like my

          You could of course show that a certain style is consistent with a known sample of some hacker's work, but even then people might slowly change their coding style. Having had a look at some of my earlier efforts, I know I have changed style a great deal (thank goodness ;-)), if only by incorporating OO techniques.

          1. Michael Wojcik Silver badge

            Re: These detection methods don't scale.

            Not only do these methods not necessarily scale, they need an ever increasing ground truth of identified code for training. This is not trivial to obtain.

            No. Unsupervised learning by kernel extension with noisy input is a well-researched and broadly successful area.

            And additional input is extremely easy to obtain.

    2. StaudN

      Re: These detection methods don't scale.

      I agree that the current numbers are insufficient, but surely it's worth trying to develop such identification mechanisms? It would be a very powerful network defense tool to have a signature intercept capable of picking up code by known malware authors...

      1. SoaG

        Re: code by known malware authors

        Perhaps could be made to work the other way too. Within a secure network, regardless of credentials of the user/admin trying to run something, refuse to run any code other than by whitelisted authors.

        1. Michael Wojcik Silver badge

          Re: code by known malware authors

          Within a secure network, regardless of credentials of the user/admin trying to run something, refuse to run any code other than by whitelisted authors.

          For that application, code signing is far more reliable, simple, and scalable.
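          The "only run what's been approved" idea can be sketched in Python with a SHA-256 allowlist. Note that this is a simpler stand-in, not code signing: real code signing has the publisher sign binaries with a private key and the network verify them against a trusted public key, which Python's standard library does not provide. The function names here are hypothetical.

```python
import hashlib


def digest(binary: bytes) -> str:
    """SHA-256 digest of a program image, as a hex string."""
    return hashlib.sha256(binary).hexdigest()


def may_run(binary: bytes, allowlist: set[str]) -> bool:
    """Refuse anything whose exact bytes have not been approved in advance,
    regardless of the credentials of whoever is trying to run it."""
    return digest(binary) in allowlist


if __name__ == "__main__":
    approved = {digest(b"trusted tool v1")}
    print(may_run(b"trusted tool v1", approved))   # True: approved build
    print(may_run(b"trusted tool v1!", approved))  # False: one byte changed
```

          Every rebuild changes the digest and needs a fresh allowlist entry; signing with a publisher key avoids that, which is a large part of why it scales better.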

    3. Hargrove

      Re: These detection methods don't scale.

      The best and most accessible discussion of the problem of data classification is in a couple of papers by Tom Fawcett. These deal with something called ROC curves. ROC originally stood for "receiver operating characteristic", referring to the ability of a receiver to classify targets in noise. An analogous phenomenon occurs in pattern matching in digital data, where the term "relative operating characteristic" is used. The following link is a good starting point.

      http://www.hpl.hp.com/techreports/2003/HPL-2003-4.pdf

      The problem boils down to one of true detection and false alarm rates. You can have an arbitrarily high true detection rate if you can live with an arbitrarily high rate of false alarms. You can reduce the number of false alarms to an arbitrarily low level, but only at the cost of missing an arbitrarily large percentage of true targets.
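      The trade-off can be made concrete with a threshold sweep. This Python sketch uses invented toy scores and labels (1 marks a true target); a sample is flagged when its score reaches the threshold. Each (false alarm rate, detection rate) pair it produces is one point on an ROC curve.

```python
def roc_points(scores, labels, thresholds):
    """For each threshold, return (threshold, detection rate, false alarm rate).
    A sample is flagged when its score >= threshold; labels use 1 for a true target."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(flagged)           # true targets caught
        fp = len(flagged) - tp      # false alarms
        points.append((t, tp / positives, fp / negatives))
    return points


# Invented toy data: classifier scores and whether each sample really is a target.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

if __name__ == "__main__":
    for t, tpr, fpr in roc_points(scores, labels, [1.0, 0.75, 0.5, 0.0]):
        print(f"threshold {t:.2f}: detection {tpr:.2f}, false alarms {fpr:.2f}")
```

      Dropping the threshold from 1.0 to 0.0 takes detection from 0 to 1, and false alarms rise from 0 to 1 along with it; no threshold here gets full detection with zero false alarms.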

      The phrase "No such thing as a free lunch" is occasionally used in the literature to describe this, and ZenCoder's comment is right on the mark.

      1. Michael Wojcik Silver badge

        Re: These detection methods don't scale.

        The problem boils down to one of true detection and false alarm rates

        I'm amazed that so many Reg readers have time to post comments like this, given the hours y'all must devote to explaining the finer principles of egg-sucking to your grandmothers.

        No one credible does machine-learning research without being well-versed in basic concepts like precision and recall rates. That's, like, week 2 of your Introduction to Machine Learning class.
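        In case the terms are unfamiliar: precision is the share of flagged items that really were targets, and recall is the share of real targets that got flagged, i.e. the true detection rate mentioned in the post being replied to. A minimal Python sketch with made-up counts:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """tp: true positives, fp: false positives (false alarms), fn: false negatives (misses).
    Returns (precision, recall); an empty denominator yields 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical run: 6 correct attributions, 2 false alarms, 2 missed.
    print(precision_recall(6, 2, 2))  # (0.75, 0.75)
```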

        Admittedly a more-sophisticated understanding of statistics is not universal among ML researchers and implementers; folks like Vincent Granville bang on about that incessantly, and they have a point. But that doesn't mean their work is automatically useless, as is trivially demonstrated by the fact that it's very often put to use. Google Translate, say, may be rubbish compared to human translators, but that hasn't stopped people from using it.

        Even a 66% success rate is useful in some applications.

  3. Robin Bradshaw

    Ctrl+C, Ctrl+V

    stackoverflow is going to end up getting blamed for everything :)

    1. Anonymous Coward

      Re: Ctrl+C, Ctrl+V

      I'm amazed at where my code manages to turn up after posting to stackoverflow.

      Worrying too, as many are minimal 'batteries not included' examples, or 'making this good is left as an exercise for the reader'.

  4. a_yank_lurker

    Somewhat of a yawn

    Personal coding styles will vary much like personal writing styles vary. However, with coding there are conventions, set both by corporations and by the language, that make for more understandable code. Also, I would like to see a comparison with writing styles.

  5. David Roberts
    Holmes

    Hmmm.....

    ...that one started life as a Fortran programmer.....the COBOL is strong in this one......

    1. Ole Juul

      Re: Hmmm.....

      He codes with an accent.

    2. Chemist

      Re: Hmmm.....

      "...that one started life as a Fortran programmer..."

      May the FORTH be with you

      1. Madeye

        Re: Hmmm.....

        FORTH would be a terrible language to hack in

        The language is fundamentally mutable and relatively obscure which means each author would most likely leave a clearly identifiable fingerprint.

        1. Pirate Dave Silver badge
          Pirate

          Re: Hmmm.....

          "The language is fundamentally mutable and relatively obscure which means each author would most likely leave a clearly identifiable fingerprint."

          Or they could just question all 12 of the FORTH programmers in the world...

  6. This post has been deleted by its author

  7. Winkypop Silver badge
    Joke

    May contain traces of pepperoni

    They can detect pizza preferences from coder styles?

  8. Anonymous Coward

    Lesson Learned

    If you walk the Dark Side, don't contribute to open source. Don't leave digital fingerprints.

    Paper worthy of Boffinhood, especially as it does discuss the limitations of their method.

    1. Sureo

      Re: Lesson Learned

      Have they tried it on stuxnet?

      1. Michael Wojcik Silver badge

        Re: Lesson Learned

        Have they tried it on stuxnet?

        They'd need something to compare it to.

        This isn't a magical oracle that maps object code to arbitrary authors. It's a classifier. It tells you what part of its training corpus a candidate most closely matches.
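        To make the "it's a classifier, not an oracle" point concrete, here is a nearest-centroid classifier in Python. It is a swapped-in, deliberately simple technique (not the classifier used in the paper), and the two-number feature vectors and author names are hypothetical. Whatever sample you feed it, the answer is always one of the authors in its training corpus.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of one author's feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train(corpus: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """corpus maps each known author to feature vectors from code they wrote."""
    return {author: centroid(vectors) for author, vectors in corpus.items()}


def classify(model: dict[str, list[float]], sample: list[float]) -> str:
    """Return the known author whose centroid is closest (Euclidean distance).
    It can only ever answer with an author it was trained on."""
    return min(model, key=lambda author: math.dist(model[author], sample))


if __name__ == "__main__":
    # Hypothetical training corpus with made-up feature vectors.
    model = train({
        "alice": [[0.0, 1.0], [0.2, 0.8]],
        "bob": [[1.0, 0.0], [0.8, 0.2]],
    })
    print(classify(model, [0.15, 0.85]))  # alice
    print(classify(model, [0.9, 0.1]))    # bob
```

        Stuxnet's authors would come out as "alice" or "bob" too; without their code in the training set, the answer means nothing.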

  9. This post has been deleted by its author

    1. Mike Bell

      Thank God I don't have to support your code! Indentations show in a very clear way how blocks of code relate to each other.

      1. This post has been deleted by its author

        1. cambsukguy

          I will stick to plenty of comments and suitable indenting I think, they don't seem to be mutually exclusive.

        2. Anonymous Coward

          If you see behaviour that you think is a bug, you really don't want to put your whole trust in the comments. After all, the comments were likely written by the person who wrote the code (and the bug, if it exists).

          But then again, both the code and the comments may well have been updated by "freestylers".

          1. Doctor Syntax Silver badge

            "After all, the comments were likely written by the person who wrote the code"

            True, but they may not say the same thing.

        3. Jeffrey Nonken

          1970s coder here

          I too detest uncommented code. And I too come from a FORTRAN background, ultimately. But I not only comment the crap out of my code, I also indent and structure the crap out of it. Maybe I'm just not as smart as you; I want to be able to read it in six months, and I can't instantaneously understand unfamiliar code at a glance. I find it much more readable if it's nicely structured.

          I feel that you're just trading one type of laziness for another. Making your code readable isn't just for you; it's also for other programmers. And that's part of what I am trying to do. Seems like you're only doing it for yourself.

          And if that's true of your coding style, I have to wonder about your comments.

          MHO. YMMV.

          Fortunately for the rest of us, there is Artistic Style. http://astyle.sourceforge.net/

          1. Jeffrey Nonken

            Re: 1970s coder here

            Oh yeah. I program in assembly regularly. I don't indent, it doesn't feel natural to the language, but I use plenty of other visual breaks and clues, and I meticulously align the comments such that they're easily readable.

            But 'C' and other structured languages, I indent. And use Artistic Style to clean up if I have to shift things around enough that they get crazy.

        4. nijam Silver badge

          Neither indentation nor comments are necessarily as useful as is widely believed - for example, most languages have pretty-printing available, after all.

          In fact, the best you can hope for is that the comment/indentation is not inconsistent with what the code actually 'means' to the computer.

      2. Primus Secundus Tertius

        use a pretty printer

        Indenting is only useful if it shows what the computer thinks, rather than what the programmer thinks.

        Example: a bug in a Coral66 program, where the preceding 'comment' lacked a terminating semicolon. So the program statement got absorbed into the comment and was therefore absent from the binary.
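        A toy model of that failure, assuming ALGOL-style comments as the post describes them: the word COMMENT starts a comment that runs up to the next semicolon. The Python below is a simplified, hypothetical stripper written for illustration, not a Coral 66 front end; it splits naively on every ';'.

```python
def surviving_statements(source: str) -> list[str]:
    """Split on ';' and drop any chunk whose first word is COMMENT, mimicking
    'COMMENT text;' comments (whitespace inside kept chunks is normalised).
    Everything up to the next ';' belongs to the comment, so a missing ';'
    swallows the following statement."""
    statements = []
    for chunk in source.split(";"):
        words = chunk.split()
        if not words:
            continue  # empty chunk, e.g. after the final ';'
        if words[0].upper() == "COMMENT":
            continue  # comment text, never reaches the binary
        statements.append(" ".join(words))
    return statements


good = "COMMENT add the offset;\nx := x + offset;\ny := x;"
bad = "COMMENT add the offset\nx := x + offset;\ny := x;"  # ';' forgotten

if __name__ == "__main__":
    print(surviving_statements(good))  # ['x := x + offset', 'y := x']
    print(surviving_statements(bad))   # ['y := x']
```

        Indented or not, the listing of the bad version still shows the assignment on its own line; only the compiler's view reveals that it has gone.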

    2. Steven Roper

      I do indent my code, but I hate the convention that has the opening curly brace on the same line as the conditional that spawns it, such as:

      if (condition == test) {
          doCondition();
          // etc.
      }

      I always indent my code with the opening and closing braces lined up and on their own lines. It makes code blocks easier to spot as well as spacing everything out for easier legibility, like this:

      if (condition == test)
      {
          doCondition();
          // etc.
      }

      This way bracket highlighting at the cursor makes both braces instantly spottable at the left, rather than having to hunt across lines of code to find the opening curly brace!

      Of course this also wouldn't survive compilation, but anyone seeing my source would peg me as its author since I swear I'm the only programmer I know who insists on arranging my braces this way!

      1. Mike Bell

        Quite right, too. Mostly appears in printed material, where a modest amount of paper can be saved.

      2. Crazy Operations Guy

        " I swear I'm the only programmer I know who insists on arranging my braces this way!"

        Not the only one: it's how K&R and the Unix source (and its derivatives) lay out function bodies, and it's common enough to have a name, Allman style.

      3. Anonymous Coward

        If you swear that you are the only programmer you know who insists on arranging your braces that way, you either haven't programmed very much, or you don't know very many programmers.

        Or you just like to swear.

        1. Steven Roper

          "If you swear that you are the only programmer you know who insists on arranging your braces that way, you either haven't programmed very much, or you don't know very many programmers."

          Your second guess is the correct one. I've been programming since 1983, when I started, first with CBM BASIC and then 6510 assembler on the Commodore 64, and went on from there. But it's not a particularly sociable lifestyle, and I'm not a particularly sociable man, so I only know a dozen or so programmers.

          But whenever I see code on the internet, whether it's stackoverflow, git or SF, it nearly always has opening curly braces following the conditional rather than on the next line. So I'm sure I can be forgiven for thinking I'm alone in this convention!

      4. alain williams Silver badge

        Code layout

        About 30 years ago at a LUUG (London Unix User Group) meeting in a pub DT asked how an if/else should be formatted. There were 14 of us and 13 different answers; we were all prepared to defend our own style as being the best - all their arguments were wrong since it was obvious that my own style was the only good one.

        Many preferences seem to depend on which languages you cut your programming teeth on and how those were laid out.

        As regards your examples:

        * the opening '{' should be on the line with the 'if': the '}' ends the 'if', whereas the '{' is less important and just makes the 'if' body multi-statement.

        * there should not be a space after the 'if'. Why? In my case, because SNOBOL did not allow it.

      5. Jeffrey Nonken

        I strongly mislike K&R style braces placement. I would find your style quite acceptable.

      6. arctic_haze

        Braces in the same column

        You could be mistaken for me, and vice versa. I believe this is the old school.

        However, I would support tracking down and isolating from society all the developers who started with BASIC.

      7. Michael Wojcik Silver badge

        I hate the convention that...

        Oh goody, let's have a style religious war in a Reg forum. It's been a while.

        Personally, I hate it when people use an integral number of spaces for indentation. My indents are always a multiple of π.

    3. Destroy All Monsters Silver badge

      > Overall, my code is like nobody else's.

      Code is for reading by other programmers.

      Your work is utterly useless and if you have managers, they should remove themselves from the gene pool.

      Probably needs a special Developer Darwin Award.

    4. keithpeter Silver badge
      Pint

      Arthur Whitney

      "Overall, my code is like nobody else's."

      http://www.jsoftware.com/jwiki/Essays/Incunabulum

      Try that for a C style. Whitney seems to be doing OK with it.

      http://queue.acm.org/detail.cfm?id=1531242

    5. swm

      Try understanding LISP code without indentation: we used to write LISP in the 1970s without an indenting editor, so you needed to have some sanity in your style.

  10. Ken Moorhouse Silver badge

    Programming style evolves over time

    I could draw (paint?) analogies between programming and painting, for example. Art historians can identify painters by the characteristics of a painting, including how a painter's style evolves over time. There are problems with attribution, though: students of a painter adopt characteristics similar to their tutor's, and students help the master with the incidentals of a work of art (e.g., Gainsborough getting his assistants to do the landscape background while he concentrated on the portrait). Then there are new techniques, new types of paint and canvas (Hockney moving from conventional canvas to photographic collage and then to tablet being a good example), which necessitate a change in style, analogous to a new or updated programming language installed on a different PC or with a different target platform.

    As mentioned earlier, Stack Overflow copy and paste is an example of how things change in the programmer's world: you get a homogeneously constructed piece of code, sporadically interspersed with anachronistic styles wherever sites such as Stack Overflow have been dipped into for inspiration. Then, in future works by the same programmer, those code snippets become bedded in to the coder's customary style.

  11. Version 1.0 Silver badge

    Hungry for results?

    "we can de-anonymize them from optimized executable binaries with 64 per cent accuracy."

    That's slightly better than I can do if I flip a coin. Let's look at this from a different angle: there are 10 hamburgers in front of you, and 3 or 4 of them have botulism... are you hungry?

    1. Anonymous Coward

      Re: Hungry for results?

      If I have a suspect pool of 20 or 30 programmers then identifying the author with 64% accuracy would be very useful - in a sinister way. I'm sure you could use the technique to rank the authors in order from most likely to least and then investigate further from the top. A lot more efficient than simply investigating everyone.
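      A back-of-the-envelope version of the "rank them and start from the top" argument, in Python. It assumes the author really is in the pool, that the classifier's top pick is right with the stated probability, and (a simplification, not something the paper reports) that when the top pick is wrong the author is equally likely to sit anywhere in the rest of the ranking. The function name is hypothetical.

```python
def expected_interviews(pool_size: int, top1_accuracy: float) -> tuple[float, float]:
    """Expected number of suspects interviewed before reaching the author,
    as (random order, classifier-ranked order).

    Illustrative assumptions: the author is in the pool, and when the
    top-ranked suspect is wrong the author is equally likely to be at any
    of positions 2..pool_size."""
    if pool_size < 2:
        raise ValueError("need at least two suspects")
    random_order = (pool_size + 1) / 2   # mean of positions 1..n
    rest = (2 + pool_size) / 2           # mean of positions 2..n
    ranked = top1_accuracy * 1 + (1 - top1_accuracy) * rest
    return random_order, ranked


if __name__ == "__main__":
    r, k = expected_interviews(20, 0.64)
    print(f"random order: {r:.1f}, ranked: {k:.1f}")  # random order: 10.5, ranked: 4.6
```

      As a sanity check, plugging in chance-level accuracy (1/20) gives 10.5 for both orders: a ranking no better than guessing saves nothing.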

      As the presenter points out in the video, governments have used similar techniques to identify and prosecute programmers that contributed to "illegal" websites.

      1. fajensen
        Big Brother

        Re: Hungry for results?

        A lot more efficient than simply investigating everyone.

        Yess - and with Machine Learning it is pretty damn hard to work out how the machine actually reached its conclusions (it's a research subject), which makes it all the easier to fudge the results to narc out "the right people" and get away with it too. Especially if we are not exactly talking legal proceedings but are more in the territory of "no-fly lists" and "signature strikes"*.

        Remember, if a computer says something is true, it is!

        *) Or maybe not, plod is as dumb as a sack of broken hammers when IT is involved.

    2. Hargrove

      Re: Hungry for results?

      That's slightly better than I can do if I flip a coin

      Splendid comment. It is obvious and only common sense, but common sense is in vanishingly short supply these days.

    3. SoaG

      Re: Hungry for results?

      You've got a 100-sided coin that lands the same way 64% of the time?

      1. Version 1.0 Silver badge

        Re: Hungry for results?

        "You've got a 100-sided coin that lands the same way 64% of the time?"

        OK - let's put it another way - of the 100 people investigated, and charged with writing the infringing application, 34 of them will be completely innocent and the chances are not good that 32 of the others had anything to do with the application either.

        1. Michael Wojcik Silver badge

          Re: Hungry for results?

          OK - let's put it another way - of the 100 people investigated, and charged with writing the infringing application, 34 of them will be completely innocent and the chances are not good that 32 of the others had anything to do with the application either.

          I cannot for the life of me figure out what scenario you are describing, but it doesn't appear to be at all related to anything described in the paper.

          First, they're talking about single authorship, so of the hypothetical "100 people investigated" (by, apparently, the world's least-competent police force), only zero or one would be guilty, and at least 99 would be innocent.

          Second, let's assume the 0.64 accuracy rate does extend to some pool of 100 candidates that the model has been trained on, and the single guilty party is among them. The classifier is presented with input and indicates candidate A is the closest match. Disregarding all other factors, for some reason, the investigators interview candidate A. There's a 0.64 chance they have the guilty party, and a 0.36 chance they don't. So what? It's a place to start. Picking a starting interviewee at random has only a 0.01 chance of being correct, so they've improved their odds significantly.

          Third, the hypothetical suggestion that someone might make stupid decisions based on weak evidence doesn't negate the importance of that evidence. A Perfect Bayesian Reasoner already knew it was weak, and treated it as such. Any other process for accounting for that evidence is inferior, but that's not the fault of the evidence. Nor does that suggestion vacate the importance of the mechanism used to extract that evidence, or of the research that led to the mechanism.

          We see in posts like yours a typical Reg commentator fallacy: if there's any objection that can be raised to research, then that research is useless. It's tiresome, sophomoric anti-intellectualism.

    4. Michael Wojcik Silver badge

      Re: Hungry for results?

      That's slightly better than I can do if I flip a coin

      It's significantly better, and that's only with two alternatives. If they're identifying among a pool of 3 candidates with 64% accuracy, then they're almost twice as accurate as your coin. And so on.

      And for this paper, their pool of candidates was 20 programmers.
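      The arithmetic behind "almost twice as accurate" is a ratio against chance: guessing uniformly among n candidates is right with probability 1/n, so accuracy divided by 1/n says how many times better than guessing a classifier is. A quick Python check using the 64% figure (the function name is illustrative):

```python
def lift_over_chance(accuracy: float, pool_size: int) -> float:
    """Accuracy divided by the chance rate 1/pool_size."""
    return accuracy * pool_size


if __name__ == "__main__":
    for n in (2, 3, 20):
        print(n, round(lift_over_chance(0.64, n), 2))
```

      This prints 1.28, 1.92 and 12.8 for pools of 2, 3 and 20 candidates respectively.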

      But thanks for playing.

  12. Anonymous Coward

    indents/no indents

    I'd much rather have well-commented code than indents, and my editor of choice can soon indent code for me if I need it to. Personally I use both, but then from a very early age I had to support someone else's code. They never put a single comment in their code, never mind followed change control procedures (which were very minimal), and did very limited testing, but management thought the world of them. It was also the age of short variable names, which didn't help. Overall I was very lucky in the environment I worked in then, as it taught me many lessons which I used throughout my career to improve procedures, fault-solve and debug. My comments are for me as much as anyone else, as I don't expect to instantly remember n years later why I did something in a particular way, which could often be due to a bug in the compiler or OS at that time.

  13. Whitter
    Joke

    Bug count

    No bugs? My code.

    What do you mean there's no bugless code?

    Must have got lost in the process loop... erm...

    1. Sir Runcible Spoon
      Joke

      Re: Bug count

      10 PRINT "hello world"

      20 GOTO hell

  14. Anonymous Coward

    Awesome

    So all hackers work alone on their code then?

    I'm glad they've worked out that.

    Nobody copies and pastes from Stack Overflow or reuses open source code, then?

    Right guys, let's go home, sounds like they've got this nailed!

    1. Hargrove

      Re: Awesome

      Related to this comment, nobody uses hacking tools that do the coding for you either?

      If this article represents the state of the art 'mongst the white hats, the black hats have it made.

      1. fajensen

        Re: Awesome

        The black hats have a trillion dollar black budget courtesy of the tax payers - they are already made!

    2. Michael Wojcik Silver badge

      Re: Awesome

      You know what you might enjoy? Learning to comprehend what you read. After that, I'd suggest taking up critical thinking. It'd be a stretch, but who knows - you might surprise us.

  15. amanfromMars 1 Silver badge

    Coming this year to a SCADA Operating System using/abusing you. More FUD Scaremongering.

    Happy New Year, One and All. And does not the tale we comment on here not advise us that all systems are vulnerable, and both practically and virtually indefensible and therefore always susceptible to disruptive exploitation which in extremis can be command takeover and makeover controlling?

    And there is nothing really effective to be done to halt the progress?

    Methinks, we all know that it does. And that makes for interesting future space place programming. :-)

  16. ammabamma
    Devil

    Wrong lesson learned

    So what Aylin is saying is that when I write my nefarious program of dastardliness, I should run it through a source filter first to emulate someone else's coding idiosyncrasies (like 1980s_coder's lack of indentation) or, less maliciously, run it through a source minifier?

    Hmmm...

    1. FatGerman

      Re: Wrong lesson learned

      Er... indentation and 'minifying' won't affect the compiled code one iota. To make your code look like somebody else's code you need to *think* like they do. (Insert reference to bad Clint Eastwood movie 'Firefox' here).
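      The point that layout never reaches the compiled output is easy to demonstrate with Python's standard ast module, standing in here for a C compiler's parser: two sources that differ only in line breaks, spacing and comments parse to the same syntax tree, so nothing downstream of the parser can tell them apart (debug information such as line numbers aside). The function and variable names are illustrative.

```python
import ast


def same_program(src_a: str, src_b: str) -> bool:
    """True when two sources parse to identical syntax trees.
    ast.dump omits line/column attributes by default, and comments never
    reach the tree, so layout-only differences vanish."""
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


tidy = "total = (price +\n         tax)\n"
terse = "total=(price+tax)  # sum it\n"

if __name__ == "__main__":
    print(same_program(tidy, terse))                    # True
    print(same_program(tidy, "total = price - tax\n"))  # False
```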

      When it comes to pasting stuff from StackOverflow (otherwise known as "I'm an incompetent freelancer, please do my job for me"), I doubt many malicious coders would go there. Cracking the problem is all the fun for them, and they tend to work alone and very idiosyncratically. OTOH they're the people most likely to find a way to anonymise themselves against this type of analysis. Probably won't be long before we see 'GACC' (Gnu Anonymising C Compiler) appear....

    2. Stjalodbaer

      Re: Wrong lesson learned

      Always use a ressembler.

  17. Robert Grant

    Another hack by the Google Closure Compiler!

  18. anonymous boring coward Silver badge

    This might have had some value before the research was publicised. Not so much now.

    1. amanfromMars 1 Silver badge

      The Big IntraNetworking Thing in the Internet of Things is an AI XSSXXXXOSkeletal Thing

      Hi, anonymous boring coward,

      Research being publicised and published is what terrifies systems in operations for command and control, and why captive mainstream media outlets are so terribly entertaining rather than surprisingly educational.

      However, an ignorant world is an increasingly dangerous place though, and especially so for the likes of that and those responsible for grand deceits and failing virtual reality programs?! ...... New American Century Projects ..... because such a state of ignorant ruling affairs is not natural or acceptable to wiser beings minded to change things remotely and relatively anonymously

      1. amanfromMars 1 Silver badge

        Re: The Big IntraNetworking Thing in the Internet of Things is an AI XSSXXXXOSkeletal Thing

        And who and/or what be the postmodern, latter day Hitlerian Saints and Immaculate Sinners in those Versions with Vision and Provisions for New World Order Programming ......... Mass Premeditated and Premoderated and Mediated Mind Command and Control? Any concrete ideas or wild crazy guesses?

        IT and they haven’t gone away, you know, ….. such as would be with AI, Immaculately Resources Assets of Universal Virtual Force, although certainly quite different from what one may have presumed to be leading from before.

  19. Anonymous Coward

    triviall countermeasues

    just adjust yur style two throwoff the analisys' in those cases when ur writing malware

    simplest thing in teh world.

    1. amanfromMars 1 Silver badge

      Re: triviall countermeasues

      just adjust yur style two throwoff the analisys' in those cases when ur writing malware

      simplest thing in teh world. ... Anonymous Coward

      Although, of course, in not such an alarmingly different manner as that, AC, if one is destined to be really effective and remain continually highly disruptive, buried deep and delving within deserving systems and/or failed exclusive executive order administrations.

      The crack magic trick is, is it not, to be practically invisible and virtually omnipotent/anonymous and almighty, and that has one appearing to be most meek and unrecognisable in plain text sight. Then can there be heavenly fireworks with immaculate displays of alternative explosive worth.

      Such does make one though, in the eyes, hearts and minds of those in the know and in the need to know, both extremely valuable and marvellously dangerous. It is not a pleasant place or comfortable space for anyone or everyone.

    2. Ken Moorhouse Silver badge

      Re: triviall countermeasues

      >just adjust yur style two throwoff the analisys' in those cases when ur writing malware

      simplest thing in teh world.<

      Which reminds me: think of a program as an iceberg. The majority of it lies beneath the visible surface as far as those who interact with it (the average user of that app) are concerned, but what is on the surface can sometimes give good clues as to what lies beneath. If the person I have quoted above (sorry to pick on you m8, but you are AC anyway so unidentifiable, and I have a feeling you've adjusted your style to demonstrate your point; you're really William Shakespeare, aren't you?) were a malware writer, they would need to pay attention to detail. If they were hacking a banking app, I don't think people would be inclined to believe your request to "Clik hear 2 verfy who u r". With spam emails it is sometimes possible to identify, from sentence construction as well as the occasional typo, not just that the message is a scam but the nationality of the scammer.

      There was a phase when malware was put through something like UPX to obfuscate its contents, but anyone trying to work out the legitimacy of such executables on their PCs could use a hex editor to look at the headers (is Microsoft using UPX now? I don't think so ((presses delete key))). I think anti-malware software reaches a similar conclusion.

    3. Tail Up
      Paris Hilton

      Re: triviall countermeasues

      Then you'd easily add just your style to overshakespeare those of the ProgrammingWord@Command, says you, AC?

      Paris, because boobz.

  20. Anonymous Coward
    Anonymous Coward

    Completely useless research.

    Here's some bashing, because this really deserves it. Something like this could only be dreamt up and started by those who don't understand programming.

    1) A 64% chance of de-anonymising a small, hand-picked sample of 100 programmers, presumably with wildly distinctive ways of programming, is utterly useless. Given how many programmers there are in the world, I very much expect the accuracy to drop off a cliff past a certain point.

    2) Programmers' coding styles evolve: they evolve as programmers get better at it, they evolve when hardware changes, and they evolve depending on how much alcohol they've had.

    3) Their accuracy is what it is right now, but I presume it changes drastically depending on which compiler is used. As compilers get even better at optimising, their accuracy will drop.

    4) Sure, there may still be "traits", like one programmer preferring one data structure or control structure over another, but let's not forget how many programmers and libraries can go into a single program. It'd be completely pointless trying to attribute a compiled binary that's 80% open-source library code, and I expect the accuracy would drop even further.

    5) None of this helps the authorities catch or identify those responsible. The sophisticated ones will learn to mimic, just as they are now as likely as not to leave "Chinese/Russian/English code comment" leftovers or to appear to originate from a "North Korean IP". The sly ones _NEVER_ make it obvious it's them.

    Common sense and logic can tell you all this without pouring however many resources into researching it.

    Half or more of the stuff about "cutting-edge" computer security threats is snake oil, served up either to win more funding through fear or for the political purpose of passing liberty-eroding legislation.

  21. Sean H

    Source Formatting Opinions

    I was interested to see this raise so much attention. I'd thought the pretty-printer tools meant you could code how you liked and then format it how your organisation, or team leader, or girlfriend's dad, would find acceptable. Me, I like the vertical alignment of {}, but I'm old enough now to realise that's just me; I can't instantly see the opening brace that goes with a particular closing one unless it's directly above. But modern editors solve that: highlight one brace and it highlights the other, however aligned. And if I have to work on something for long I can always pretty-print it "my" way to make it easier, and - theoretically - re-pretty-print it with a different set of preferences afterwards.

    I'm left with only one major gripe, and that's Python. Indentation as part of the language? Well, I thought that was a bad idea in makefiles, and I see no excuse for it anywhere. Mr Python wanted to impose his own indentation preference and didn't like all that fiddly punctuation noise; well, IMHO that's a crappy set of requirements for a language. I'm disappointed to see it hasn't faded into the obscurity it deserved.

    While I'm here, I have a minor aversion to anything "optional". Semicolons in scripting languages, that sort of thing. To me there should be just one correct way to write the syntax, not a lot of woolly alternatives that produce the same compiled code. Names excepted, of course. I used to like Java til it got over-bloated (around Java 2 or so), I liked Pascal and Modula once-upon-a-time, and now I like Erlang, which has the nit-pickiest compiler I ever met, but once you know the syntax it's trivial, and there's never any doubt about whether you need a punctuation character or can get away without it.

    I like to keep the entire language spec in my head. Good luck doing that with C++

    1. nijam Silver badge

      Re: Source Formatting Opinions

      Hooray, another person who thinks Python is a half-century leap backwards! (Although not *quite* as bad as makefiles, where it mattered whether the indentation was done using TABs or SPACEs.)

  22. Zippy's Sausage Factory

    I reckon

    The most use this will get is going to be in lawsuits - proving who did/didn't write some code, and therefore who does/doesn't own it.

  23. sisk

    I would imagine that their accuracy rate would drop rather radically as the pool of programmers they're trying to identify increases. After all there are only so many ways to implement a given function.

    1. amanfromMars 1 Silver badge

      IPO ProjectUS re Global Operating Devices

      :-) Quite so, SISk. AIMagiCQ roads are Absolutely Fabulous Fabless Advanced IntelAIgent Route and AIRoutes to Perfect Enough Virtual Reality Root in All Manner of Master Spider Webs ....... Phormer Networks with Exclusive Orderly Executive Administration Rights and Ab Fab Fabless Permissions.

      For All Manner of Virtualisations in Future Presentations ........ Expanding Time Lines .... MagiCQ Trails in Immaculate Tales?:-)
