Re: These detection methods don't scale.
Not only do these methods not necessarily scale, they also need an ever-increasing ground truth of identified code for training, which is not trivial to obtain. And as more and more coders are added, you have to worry about the number of degrees of freedom in coding anything, i.e. are there enough different coding styles to distinguish the millions of coders on this planet? Then you have to deal with code developed by teams (which is the normal situation), which will either show a mixture of styles or predominantly show the style of the loudest mouth in the team, with a small admixture from the other members. Similarly, what happens when a new coder refactors old stuff? I know I have seriously refactored a program written by some students to adapt it to new use cases. It is still not really like my
You could, of course, show that a certain style is consistent with a known sample of some hacker's work, but even then people slowly change their coding style, so an older sample may no longer match. Having had a look at some of my earlier efforts, I know I have changed style a great deal (thank goodness ;-)), if only by incorporating OO techniques
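
To put a rough number on the degrees-of-freedom question above, here is a back-of-the-envelope sketch. It assumes an idealized case that real code almost certainly does not satisfy: every style trait is a clean yes/no choice, independent of the others, and stable over time. The function name and the figure of 10 million coders are made up for illustration, not taken from any detection method.

```python
def min_binary_features(num_coders: int) -> int:
    """Smallest k such that 2**k >= num_coders.

    Idealized lower bound: assumes each style trait is an independent,
    perfectly stable yes/no choice (a hypothetical simplification).
    """
    if num_coders < 1:
        raise ValueError("num_coders must be at least 1")
    # (n - 1).bit_length() is an exact integer form of ceil(log2(n)).
    return (num_coders - 1).bit_length()


if __name__ == "__main__":
    # 10 million is an assumed, illustrative population size.
    print(min_binary_features(10_000_000))
```

On paper the bound is small, but it only holds if the traits really are independent and stable. Team code, refactoring and style drift all eat into that, so the practical number of usable traits would have to be much larger.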