Surprise! Copying crummy code from Stack Overflow leads to vulnerable GitHub jobs

Among those learning how to program, and some more experienced software developers, it's common practice to copy and paste code snippets from Stack Overflow, a Q&A forum for asking about coding problems. There's even a faux O'Reilly-styled book of sorts, "Copying and Pasting from Stack Overflow," to highlight the practice, …

  1. sbt Silver badge
    Holmes

    Free advice is worth exactly what you pay for it.

    If you're trying to answer a question clearly and simply about an algorithm, pattern or data structure, coding best practices such as input validation can cloud the issue. It's an education site and it's on the users if they're taking example code snippets unvarnished into serious code bases and it's on employers and clients for hiring folks unqualified to write code properly and who can't use SO and similar resources responsibly.

    1. Paul Crawford Silver badge

      Re: Free advice is worth exactly what you pay for it.

      I was going to say the same: if you asked for, say, best python trick for doing XYZ then you (probably) get that answered, but not any pre/post steps needed to make sure you only ever get what you wanted done in the cruel world outside.

    2. macjules Silver badge
      Holmes

      Re: Free advice is worth exactly what you pay for it.

      Agree with the sentiment BUT here's the rub: a lot of open source software frameworks have limited documentation or explanations that are strangulated by the inability of technical writers or developers to clearly express themselves. This is where SO and its many spinoffs come in to help. In answering someone's question as to how to do x or y there has to be a presumption that the original poster or the copy/paster is aware of standards and au fait with coding.

      I would make a point with SO to never completely help someone how to solve a problem from scratch but refer them to the framework's documentation with a caveat or explanation on the bits that are impossible to comprehend.

      1. sbt Silver badge
        Alert

        Refer them to the framework's documentation

        Certainly a better idea but difficult given the moderation policy on SO and sister sites about self-contained answers rather than off-site links. I can see why they want such an approach; it increases the value of the site, reduces the SEO juice shared with other sources and reduces the risk of link rot. This policy creates a bias towards short direct answers lacking in context, making a naked code snippet more likely to get upvoted.

        1. James Anderson Silver badge

          Re: Refer them to the framework's documentation

          The policy is due to the ephemeral nature of links. The answer could come up in a search 10 years later whereas the link may have only been valid for 10 weeks.

          1. sbt Silver badge
            Holmes

            Ephemeral links

            I get why SO and co have this policy, which I characterised as link rot. To be clear, I'm not saying it's a bad policy or SO should change it. I'm just saying it contributes to the bias away from links to in-depth sources or more complete or in use production code samples.

            Now that so much code is browsable on the Web thanks to Github and co, there must be plenty of good working code examples to refer to.

          2. baud Bronze badge

            Re: Refer them to the framework's documentation

            The few times I answered with links to the docs, I also added the relevant part of the documentation page, as long as it was short enough, to avoid the issue of link rot (even if the answers I write rarely have any use except for the person who's asking).

    3. kjw

      Re: Free advice is worth exactly what you pay for it.

      Your title appears to state the answers on stack overflow are worth nothing? Your title does not seem related to the content of your post.

  2. Claptrap314 Silver badge
    Paris Hilton

    Let me see if I understand...

    A dev who does not understand the system he is working on, and either cannot find the answer in documentation or does not even look at it. He posts a question to SO. We are to presume that this individual is qualified to check the answers for vulnerabilities?

    1. Dan 55 Silver badge

      Re: Let me see if I understand...

      In theory upvoting and downvoting do that, but of course it doesn't work if answers that were right (or thought right) in 2012 and are now known not to be right hog the top spot forever; then it's going to be useless for detecting vulnerabilities.

      Maybe upvotes should disappear after a year or two so there's a continual incentive to upvote better answers and poor answers will eventually drop down.

      Who am I kidding, people will just copy and paste the first answer and ding the upvote button.
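
The expiring-upvote idea above could be sketched roughly as follows. This is only an illustration of the proposal, not how Stack Overflow ranks answers; the function name `recent_score`, the two-year window, and the sample votes are all made up for the example.

```python
from datetime import datetime, timedelta

def recent_score(votes, now, max_age=timedelta(days=730)):
    """Sum only the votes cast within max_age of `now`.

    votes: iterable of (cast_at, value) pairs, where value is +1 or -1.
    The two-year default is an assumption taken from the comment's
    "a year or two", not a real site setting.
    """
    return sum(value for cast_at, value in votes if now - cast_at <= max_age)

# Hypothetical answer: three upvotes from 2012, one from early 2019.
now = datetime(2019, 10, 1)
old_answer_votes = [(datetime(2012, 5, 1), +1)] * 3 + [(datetime(2019, 1, 1), +1)]

lifetime_score = sum(value for _, value in old_answer_votes)
decayed_score = recent_score(old_answer_votes, now)
print(lifetime_score, decayed_score)
```

Under this scheme an answer keeps its position only while people keep voting for it, which is the incentive the comment asks for. It does nothing about readers who upvote whatever they pasted first.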

      1. Anonymous Coward
        Anonymous Coward

        Re: Let me see if I understand...

        Actually, the upvote/downvote mechanism often does not work, because many people look for the simplest, quickest solution, not the correct one. I've seen simpler but not safe answers get many votes, while the correct, more complex one is buried below with far fewer.

        Without reviews from truly skilled developers, votes from people who have not that skill are useless.

      2. JohnFen Silver badge

        Re: Let me see if I understand...

        "In theory upvoting and downvoting do that"

        I would expect up/downvoting to provide an indication of the correctness of the answer, not an indication that the code presented is ready to use as is. In fact, it should be obvious to all devs that such code is not ready to use as is, since it's pared down to make it easy to follow the actual solution and does not include all of the scaffolding that is necessary in production code.

      3. pavel.petrman Bronze badge

        Re: people will just copy and paste the first answer

        DuckDuckGo will even show you the most upvoted or accepted answer (not sure which) in a prominent place on their own results page if your search phrase looks similar to a question on SO.

  3. Kevin McMurtrie Silver badge

    Transport Layer something

    Don't forget all the entries to disable SSH fingerprints and TLS host validation. Those hacks are so popular that instructions to fix the problem correctly are nowhere to be found.
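
For the TLS half of that complaint, the usual alternative to switching validation off is to leave it on and tell the client which CA to trust. A minimal sketch using Python's standard `ssl` module follows; the helper names and the idea of a private CA bundle file are assumptions for illustration, not anything from the article.

```python
import ssl

def make_verified_context() -> ssl.SSLContext:
    """Client-side TLS context with certificate and hostname checks on."""
    ctx = ssl.create_default_context()  # trusts the system CA store
    # Both settings are already the defaults for this factory; they are
    # stated explicitly so a reviewer notices if someone flips them.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx

def make_private_ca_context(ca_file: str) -> ssl.SSLContext:
    """Trust an in-house CA instead of disabling verification.

    ca_file is a hypothetical path to a PEM bundle you control.
    """
    return ssl.create_default_context(cafile=ca_file)

ctx = make_verified_context()
print(ctx.check_hostname, ctx.verify_mode == ssl.CERT_REQUIRED)
```

The snippets the comment complains about typically set `check_hostname = False` and `verify_mode = ssl.CERT_NONE`, which makes the error go away by accepting any certificate from anyone.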

  4. devTrail Bronze badge

    Dumb community

    I tried a few times to post questions in terms of algorithms instead of code and I got either no answer or trolls complaining that I was asking them to do the work for me without even understanding that I was not asking for some code. It even happened that an admin closed a question for being too high level. The trouble with stackoverflow is that the entire community has been dumbed down to the role of copy and paste bots without any deeper reasoning or willingness to understand what they are doing.

    1. Irongut

      Re: Dumb community

      It is completely useless for asking questions beyond "how do I do this basic task in #LanguageOfTheWeek?" I was an early user of SO but often questions I asked were either unanswered or closed for spurious reasons and any answers were of dubious quality. I don't even remember the username for that account now and click SO links last when Googling a development question.

      1. devTrail Bronze badge

        Re: Dumb community

        Google search results are not much better. In the first few pages you only find links to a lot of blogs and just a handful of them are decent. Most of the blogs are just another collection of copy and paste code. Coding on the internet has just become like YouTube or Facebook: a place crowded by wannabes who are ready to pretend or simulate anything to get people's attention.

    2. myhandler

      Re: Dumb community

      Traditional forums were so much better in that you'd know which posters were expert. SO is a lottery. The fact that many posters want upticks is madness.

      Context and discussion are critical - but they're verboten on SO.

    3. Mike 137 Bronze badge

      Re: Dumb community

      "the entire community has been dumbed down to the role of copy and paste bots"

      and these copy & paste bots are creating the next release of the software you rely on for your business or your online security.

      As I've mentioned a few times before both here and elsewhere over the last few decades ;-) anyone who tries to solve a computing problem by immediately launching into coding has completely failed before they started. Engineering consists of [1] first defining the problem, [2] then defining an approach to solving it, [3] then creating a solution using that approach, [4] then testing it to make sure it not only works but is robust and safe and [5] finally packaging it in a convenient presentation. Ever since "Visual <whatever_language>" step 5 has come first, and since pseudo-agile took off, only steps 5 and 3 (in that order) seem to be conducted at all.

      1. Jimmy2Cows Silver badge

        Re: Dumb community

        Agile doesn't mean you don't plan ahead. You just don't try to plan every minute detail up front, frequently evaluate progress against milestones, and be prepared to change plans quickly if something isn't working.

        Overall understanding of what you're trying to achieve is still essential, along with the management skills to stay on track, and the tools/ability to recognise problems early before they blow up in your face.

        But, not every project is suited to agile, and that's where the problems appear. The main issue with agile is buzzword-bullshit-bingo PHBs thinking it's the next wonderful and will magically make every project better in every way, while completely failing to grasp its true meaning and intent.

        1. JohnFen Silver badge

          Re: Dumb community

          "You just don't try to plan every minute detail up front, frequently evaluate progress against milestones, and be prepared to change plans quickly if something isn't working."

          So, just like pre-Agile development methodologies, then.

    4. JohnFen Silver badge

      Re: Dumb community

      SO these days seems primarily geared toward providing students answers to exam questions.

    5. Jamie92

      Re: Dumb community

      How sad it is, sometimes trying to find the answer on your own is much more time-saving and light-hearted

  5. chivo243 Silver badge

    student use

    I know of programming students grabbing stuff from Github for their assignments. Just get it from github was the phrase I heard many times...

    1. Steve Davies 3 Silver badge

      Re: student use

      Then you ask them to explain how it works and ....

      I've seen one so called expert developer copy something almost verbatim from SO. The fool didn't even bother to change the variable names. The problem was that there was a huge flaw in the code. I knew that because I'd copied it myself (when on a different project) a year earlier. I found the flaw and in the end, I used the SO code as a guide but worked the problem differently.

      The squirming and dodging the answer that followed my challenge would have made a Politician proud.

      That 'expert' didn't bother turning up for work the next day because he'd lost face in front of his colleagues.

      That's life ain't it eh!

      1. Anonymous Coward
        Anonymous Coward

        Re: student use

        Well, when I spot a dubious piece of code in a review I'm usually quite sure it will come up in a Google search, and probably from SO.

      2. chivo243 Silver badge

        Re: student use

        Back in 2004 a MS contractor tried to push a VB logon script off as his own. When the script didn't work right, during our search for answers, one of my colleagues found a very similar script on the net somewhere, with the same spelling errors. Elementary was spelled Elementry is the one I remember.

        Sad thing is, this guy knew what he was doing, he couldn't be arsed to do the work himself. I heard he's selling stuff on youtube now.

      3. kjw

        Re: student use

        Did you post to stackoverflow with your findings?

    2. macjules Silver badge
      Facepalm

      Re: student use

      I recently had a DevOps applicant send me his answer to a problem I sent as a part of his interview. Unfortunately he sent me the verbatim copy of something I pushed to GitHub about a year ago, complete with the 2 deliberate typos.

      Suffice to say he did not get the job.

      1. gnasher729 Silver badge

        Re: student use

        That devops applicant could have just said “I found this on stackoverflow, it looks like it might help to solve the problem but I haven’t tried it yet”. I _have_ actually used “I don’t know, and nobody on stackoverflow seems to know” as an answer.

        1. JohnFen Silver badge

          Re: student use

          "“I don’t know, and nobody on stackoverflow seems to know” as an answer."

          But that's a terrible answer. The combination of your knowledge and SO is nowhere near comprehensive, even if you're limiting your search to the net.

          I do sometimes bring code I found on SO to design meetings, properly credited, but only when I'm presenting a list of different approaches to a given problem. SO is never the only source for these lists.

  6. Ian 55

    Copying code from a site named after a bug..

    .. what could possibly go wrong?

    1. Tom 7 Silver badge

      Re: Copying code from a site named after a bug..

      Not a bug - an achievement.

  7. Doctor Syntax Silver badge

    There's even a faux O'Reilly-styled book of sorts, "Copying and Pasting from Stack Overflow"

    To be followed up by "Five copy and paste from Stack Overflow".

  8. Nick Kew

    Chicken or Egg

    Which came first?

    Were stackoverflow snippets copied to or from code that happens to be on github? I expect both, but in what proportions?

    1. gias

      Re: Chicken or Egg

      We only checked whether the code shared on Stack Overflow was copied into the GitHub repositories. The bidirectional copy-paste that you suggested is an interesting proposition, though!

      1. James R Grinter

        Re: Chicken or Egg

        Particularly as there are a lot of SERPs out there throwing up code samples from existing open source projects (and, sometimes their unit tests are the only place to find an API example.)

  9. Pascal Monett Silver badge
    FAIL

    "the researchers developed a Chrome extension"

    So the researchers are all about security, and then they use Chrome ? The browser that tells everything to Google ?

    Does not compute.

    1. gotes

      Re: "the researchers developed a Chrome extension"

      Arguably a privacy risk rather than a security risk.

      1. JohnFen Silver badge

        Re: "the researchers developed a Chrome extension"

        Privacy risks are security risks --- just a particular subset of them.

    2. gias

      Re: "the researchers developed a Chrome extension"

      If the chrome extension is well-received, extensions for other browsers can be developed.

  10. a_yank_lurker Silver badge

    Misuse

    Stack Overflow is not a bad resource when used correctly. However, copy/pasting an answer from a question is never the best use. The best use is to see what kind of code works for a specific problem. The problem needs to be read carefully because I have found often they are similar to my question but different enough that the answers are not quite what I need. But they can point me in the right direction, particularly when they point out a function in the standard library I was unaware of that might do the trick. Also, pay attention to the dates of the post as that might tell you if the answer is applicable to the language version you are working with.

    To me the problem is misuse of a resource, not the resource itself. Remember code snippets are intended to be examples of how to do something, not what you should do in production code.

    1. Baldrickk Silver badge

      Re: Misuse

      I'm a fan of the language lawyer style questions myself - the ones about the gotchas, what results in undefined behaviour and how to avoid it.

      Things in the vein of the most popular question/answer, that one about branch prediction fails.

      This is where I find some value in the site, for more than procrastination at least.

    2. Jason Bloomberg Silver badge
      Thumb Up

      Re: Misuse

      they can point me in the right direction particularly when they point out a function in the standard library I was unaware of that might do the trick.

      Exactly that. It's a hint or pointer to what should be used or what direction to look in for those who have never walked the path.

      But it's up to the reader to understand it's just the start of the journey, not necessarily the arrival at a destination.

    3. david 12 Silver badge

      Re: Misuse

      The depth of knowledge on Stack Overflow is pathetic. I'm in the top 15%, and that's by virtue of randomly quoting from the obvious documentation as an occasional recreational activity.

      Questions where I have an actual technical question are unanswered, and that's because the only people posting there are beginners, or, like me, lazy recreational users. And questions that can't be answered with a restatement of a well-known resource are routinely closed, because a question that the administrators don't know the answer to must obviously be a bad question.

  11. DerekCurrie Bronze badge
    Angel

    If this is a 'Surprise!' to any coders, they're still living in the 20th Century

    This core FAILure of Object-Oriented coding has been blatant and well known for decades. O-O turned into Uh Oh!

    If you pull code you didn't write from anywhere and stuff it in with your own, expect problems. That's the lesson of Object-Oriented coding. And of course, there is the usual litany of why this is the case:

    1) Inadequate Coding Tools and languages.

    2) Code-By-Committee, which is the default these days for mammoth projects, with consequential incoherence.

    3) Code complexity beyond the comprehension of any single human.

    4) Poor or no code documentation. This is commonly for the sake of the usual short term thinking of profit over code quality.

    5) Lack of adequate code QA.

    6) Lack of vetting of incoming foreign or internal code objects. IOW blind faith.

    7) . . . Add your own . . .

    1. Benson's Cycle

      Re: If this is a 'Surprise!' to any coders, they're still living in the 20th Century

      All of these problems were being experienced by the 1960s and in languages like COBOL. Where do you think the phrase "garbage in, garbage out" originated?

      1,2, 3, 4, 5 and 6 all applied to a project I worked on in 1982. In Coral-66.

      For 6, what makes you think that back in those days we were able to vet the code of things like floating point libraries, let alone scientific ones?

      This is all nothing to do with OO and everything to do with programmers trying to cut corners, managers trying to cut cost and, unfortunately, a lot of Dunning-Kruger infested narcissists out there who will offer their half baked solutions to problems to people trying to do jobs which are beyond them because the bosses hired a new grad instead of someone with ten years of experience dealing with shit and bullets.

    2. Loyal Commenter Silver badge

      Re: If this is a 'Surprise!' to any coders, they're still living in the 20th Century

      Did you copy-and-paste your comment from SO? Because it reads like some of the "answers" on there...

  12. Winkypop Silver badge
    Alert

    A little knowledge

    Is a dangerous thing

    1. trindflo

      Re: A little knowledge

      And there is nothing so frightening as ignorance in action.

  13. Paddy
    Coffee/keyboard

    Well Duh!

    | "Basically, what we tried to show is that using Stack Overflow without reviewing it carefully can lead to potential vulnerabilities inside applications,"

    It seems the rate of misuse of bad code was low. A more positive headline of something like "Most Github projects avoid using SO code with known vulnerabilities" seems to be less desirable.

    I have both answered questions and got questions answered on SO. The worst problem is those thankless takers who don't even bother to acknowledge any answer, they just disappear leaving readers/helpers with no idea if any of the solutions were appropriate.

    I try and write good questions - sometimes it's easy as in when I had a short Python function and asked if a numpy guru could make it faster for me. I got four answers from one guy and one from another, so slotted in my own data and posted timings and my thoughts on how their examples might fit my use case, as well as selecting an answer to close the SO question. I tried to give back something to those who took their time to answer me; in a way that I had found useful in the past.

    Open source doesn't work when too many take.

  14. ZenCoder
    Boffin

    It is wise to learn from the examples of others, but LEARN is the key word here.

    I study examples, learn from them, and then use what I learned in writing my own code. I believe that in many cases to do otherwise is both lazy and dangerous.

  15. Blackjack

    Or in other words... beware of copy pasta!

    It is the worst kind of pasta!

  16. gnasher729 Silver badge

    “Using code from stackoverflow without carefully reviewing it is bad”. No, that’s not the problem. Using code from stackoverflow is bad.

    Stackoverflow is a brilliant tool if you don’t know how to attack a problem at all, telling you usually five ways that don’t work and one or two that work, if you’re lucky. At that point that should be it. You now have enough information to write the code yourself. Just remember that whatever you do it’s _your_ code and _your_ full responsibility now.

  17. BobBob
    Stop

    Copyright

    It’s worth noting that the examples are subject to copyright of the original author and shouldn’t be copied and pasted into your code without checking for permission/license first...

    1. Zippy´s Sausage Factory

      Re: Copyright

      I basically stopped once I found out that they have a complete database dump of the site available for download. That made me exceptionally uncomfortable and I stopped contributing as much as possible.

  18. fredesmite Bronze badge

    We never had these problems

    back in the days of punch cards and paper tape!

  19. Pete 2

    one in a thousand

    > they looked at more than 72,000 C++ code snippets in 1,325 Stack Overflow posts and found 69 vulnerable snippets

    You are whinging about 70 errors in 70 thousand samples?

    If Microsoft managed such a low rate, Windows would have had the invulnerability of Fort Knox in 1989 (even before SO existed).

    Hell, if any professional programmer made so few mistakes, software development would almost be considered a respectable way to earn a living.

    1. sabroni Silver badge

      Re: You are whinging about 70 errors in 70 thousand samples?

      >those 69 vulnerable snippets show up in 2,589 GitHub projects

      That's what they're whinging about. The number of times those vulnerabilities were copied.
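
The two figures quoted in this exchange can be put side by side with a little arithmetic. A rough sketch, treating "more than 72,000" as exactly 72,000 (so the real rate is slightly lower than shown); the variable names are just for the example.

```python
snippets_reviewed = 72_000    # quoted as "more than 72,000"
vulnerable_snippets = 69
affected_projects = 2_589

# Share of reviewed snippets that were found vulnerable.
vulnerable_rate = vulnerable_snippets / snippets_reviewed

# Average number of GitHub projects each vulnerable snippet turned up in.
projects_per_vulnerable_snippet = affected_projects / vulnerable_snippets

print(f"vulnerable snippets: {vulnerable_rate:.3%} of those reviewed")
print(f"projects per vulnerable snippet: {projects_per_vulnerable_snippet:.1f}")
```

So both comments are right about different numbers: the error rate is roughly one in a thousand, but each bad snippet was copied into dozens of projects on average.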

  20. Loyal Commenter Silver badge

    The root of this issue is how people are using SO

    Rather than "help me understand how X works, so I can write correct code", which is arguably the right way to use the site, many people take the "I don't understand X, so give me some code to do X without me needing to know how or what it does" approach instead.

    It's understandable, to a degree, when people are told by their managers, "just make it work, I don't care how." You don't get paid any more for doing things properly. It's a cultural problem, not always one of laziness on the part of programmers.

    1. JohnFen Silver badge

      Re: The root of this issue is how people are using SO

      "when people are told by their managers, "just make it work, I don't care how.""

      You know, I've been a developer for a very long time, and I'm pretty sure that I have never once had a manager tell me anything like that. If they did, I'd take it as a loud message that I shouldn't be working there. There are lots of companies who don't engage in that degree of unprofessionalism.

      1. Loyal Commenter Silver badge

        Re: The root of this issue is how people are using SO

        There are lots of companies who don't engage in that degree of unprofessionalism.

        Sadly, there are also lots that do, and lots of software pumped out by them. First to market and just-about-working beats second to market and well constructed.

  21. antman

    Newsgroups

    I know they're not as widely used these days but if you can find the right newsgroup (that's NNTP not HTTP) you can get good advice (which may include pointers to better articles in SO and other places) from experienced programmers. Those of us who've been in the game a long time tend to prefer newsgroups over web forums.

  22. tiggity Silver badge

    Surprised

    It is so few.

    If you are using a code snippet to illustrate something you will just type in the bare minimum, without any "sanity checks" as you expect user to make sure whatever they produce is robust.

    I would expect code with nasty buffer overflows etc. would be very common as people are not typing in "production code" but "how to" coding hints.

    So (depending on language) I would not expect to see try / catch, using, free, unsafe, assert etc. in SO snippets as error trapping code makes proof of concept code almost unreadable except in the simplest cases.
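
The readability point can be seen even in a trivial case. The sketch below is a made-up example (an averaging helper, in Python rather than the C++ the study looked at) showing the same logic once as an answer-style snippet and once with the checks a caller in production would probably want.

```python
def average_bare(values):
    """Answer-style version: the idea and nothing else."""
    return sum(values) / len(values)  # ZeroDivisionError on empty input

def average_checked(values):
    """Same logic with input checks; which checks are appropriate
    depends on the caller, so these are only examples."""
    values = list(values)
    if not values:
        raise ValueError("cannot average an empty sequence")
    for v in values:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"not a number: {v!r}")
    return sum(values) / len(values)

print(average_bare([1, 2, 3]), average_checked([1, 2, 3]))
```

One line of logic has picked up half a dozen lines of guarding, which is why such checks tend to be left out of illustrative answers and why pasting the bare version straight into a code base is risky.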

  23. Mark White

    One answer is never enough

    If I can only find one correct answer on Stack Overflow, I try a different question. If my investigation only has one answer then I haven't tried hard enough, any problem should have multiple solutions, especially transferring logic to a programming language.

  24. holmegm Bronze badge

    If the problem is that you don't understand the code, then *that's* the problem, not that you got it from SO.

    It's not as though copying from your first year textbook is likely to be any more secure, out of the box.

  25. Loyal Commenter Silver badge

    Obligatory...

    I can't really let an article about Stack Overflow go past without linking this...

    Jon Skeet Facts

  26. JohnFen Silver badge

    You think?

    Copy-pasting code from SO (or any other site) might have a downside? What a shocker.

    That so many devs do this is alarming, and doesn't speak well of the current state of the software industry.

  27. ocelot

    Quantum Programming With Stack Overflow

    I have sometimes used SO for "quantum programming" where people post bits of wrong code asking a question .. Filling in all the known wrong solution space ..

    Eventually I find the solution by specifically avoiding all that code....

  28. heyrick Silver badge

    But that's not a good assumption to make

    I don't see why not. The replying person is usually answering a specific question like how to do a quick and dirty translation from CMYK to RGB, or how to bash 8859/1 into UTF-8. Your answer ought to be short and to the point.

    It is then up to YOU (and you alone) to apply this into whatever it is you're doing, complete with appropriate checks and validations. It is nobody's fault but your own if you blindly copy stuff from The Internet (anywhere, not just SO) and paste it in and think "job done".
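
Taking the 8859/1 to UTF-8 case as an example, the short answer and the version with checks might look something like this. It is a sketch under assumptions: the function names are invented, and treating bytes 0x80 to 0x9F as suspicious is one possible policy (in ISO 8859-1 that range holds control characters, and its presence often means the data is really Windows-1252).

```python
def latin1_to_utf8_quick(data: bytes) -> bytes:
    """The short, to-the-point answer."""
    return data.decode("latin-1").encode("utf-8")

def latin1_to_utf8_checked(data: bytes) -> bytes:
    """The same conversion with the caller's own checks added."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("expected bytes")
    # The latin-1 codec accepts every byte value, so decoding never fails;
    # any sanity check on the input has to be done by hand.
    suspicious = [b for b in data if 0x80 <= b <= 0x9F]
    if suspicious:
        raise ValueError("C1 control bytes found; input may be Windows-1252")
    return bytes(data).decode("latin-1").encode("utf-8")

print(latin1_to_utf8_quick(b"caf\xe9"))
```

The quick version is a perfectly good answer to the question as asked. Whether the input really is 8859/1 is something only the person pasting it can know.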

Biting the hand that feeds IT © 1998–2019