Pretend Python packages prey on poor typing

The Slovakian National Security Authority on Thursday warned that PyPI, the repository for Python software packages, has been hosting malicious software libraries. The group's cybersecurity division, SK-CSIRT, identified 10 fake libraries designed to dupe developers through typosquatting. The names of the malicious libraries …

  1. Andraž 'ruskie' Levstik

    That would be Slovakian - .sk - Slovenian would be .si.

  2. Michael Hoffmann Silver badge
    Unhappy

    There but for the grace of god...

    Ugh, I mistype a fair bit before my first or after my fourth mug of caffeine. I presume these are just the ones they fund (sic!)?

    Unless the spoofers are so thorough they also post docs, a more reliable method would be to grab a full list from readthedocs or such and correlate it? What fun!

  3. thames
    Boffin

    The Real Problem is a Bit More Complicated

    The Pypi repository has supported author GPG package signatures for years. Pip however doesn't automatically check them. You can supposedly do this via Debian uscan, but I haven't tried that so I can't say how easy that is to do.

    Signatures however don't solve the problem being described here. If the package was signed by the author, all that tells you is that the package hasn't been altered on the server after it was uploaded by the author. If you can't trust the author, the signature doesn't help you at all. You get exactly the same issue with Github, Cpan, or any other public repository of source code or binaries.

    The real problem is that adding signature checking does nothing to address the actual root cause. The actual problem is verifying that what you told the package manager is what you thought you really meant. If you meant to install package 'x' but told Pip to install package 'y', then Pip can only go by what you said. In a repo of developer tools and libraries, there are legitimate reasons for having some very dangerous tools in the toolbox.

    The real solution for most people is to install from their Linux distro's repo, rather than directly from Pypi (or any open repo). Thinking about these sorts of problems is exactly what the people at distros are there for. They go out and get the packages they think should be in their distro repos, rather than letting just anybody upload whatever they want. Stuff gets added to the repos when there is genuine demand for it. If you are running Debian and want Django, get it from the Debian distro and you will get a version that is vetted, has automatic security notifications and updates, and you will get a version which has been integrated and tested to work with the rest of the system.

    If you really need to get something directly from Pypi (because for example it's a more obscure package which is not in Debian's repos), then check out what you're asking for rather than just browsing around and picking a name that looks close. You can download a package and install it later from a local copy if you're really worried about poor typing skills.

    1. Anonymous Coward
      Boffin

      Re: The Real Problem is a Bit More Complicated

      Installing packages only from your distro's repo is all very well so long as you like to run very old versions of a very small selection of packages. That's great if you don't actually use much of the surface area of the system, I suppose.

      The actual solution is not this: I'm not sure what it is, but not this.

      1. AVee

        Re: The Real Problem is a Bit More Complicated

        I guess the combination of both is the actual solution. What a distribution provides is a curated feed of packages. Right now it is the only curated feed we have for Python packages, so at the moment it is the only solution. Now I've tried doing exactly that: only using the Debian-provided packages for a Python application. I failed. The solution seems obvious: we need a feed of Python packages which is properly curated. To me it doesn't make sense to use the feed of an OS; they have other priorities. But it does need to be managed the same way, the Debian model works and could well be applied to a Python specific repository.

        1. Charlie Clark Silver badge

          Re: The Real Problem is a Bit More Complicated

          But it does need to be managed the same way, the Debian model works and could well be applied to a Python specific repository.

          You can do this already with devpi.

    2. FatGerman

      Re: The Real Problem is a Bit More Complicated

      ..and using a Linux distro's packages doesn't help those of us not using a Linux distro (eg OSX).

      Even on a decent Linux distro the real problem is that pip is braindead and often overwrites system packages with incompatible versions of its own devising.

      As I recall, ppm was rather good.

      1. Charlie Clark Silver badge

        Re: The Real Problem is a Bit More Complicated

        Even on a decent Linux distro the real problem is that pip is braindead and often overwrites system packages with incompatible versions of its own devising.

        Cart before the horse: it's the distros that mess with Python's packaging and then expose it to users. Set up correctly, pip will install to the user's home directory and they'll never have to fuck around with apt-*, which should only be used for system stuff. Separation of user and system packages? We've heard those BSD guys do that but we can't be bothered…

    3. Charlie Clark Silver badge

      Re: The Real Problem is a Bit More Complicated

      The real solution for most people is to install from their Linux distro's repo

      I don't agree with this at all. Distro packagers often get it wrong and Debian abuses Python's packaging so much that it causes problems for other libraries. I have had to deal with this a couple of times in the past and it is infuriating! Furthermore, user project installs should always be separate from system ones.

      But your main point stands: typos are an established attack vector.

  4. Morten Bjoernsvik

    pypi should have a better weighting

    Visiting pypi I always find lots of modules doing the same thing. You have a weight that goes up to 15? But it does not say much about documentation, ease of use, how recently a module was updated, or whether it is used in other modules. I went through 5 modules concerning active-directory with weight more than 7 before I found one I got working. Even requests, the most popular pypi module, only has weight 8. This is way better in perl on cpan.org, where you get weight based on user feedback. I was never in doubt when adding a sub quality module.

    On nodejs we have meteorjs, which has its own wrapper around npm so you only download verified versions of npm packages that work with meteorjs. It assures they work on both client and server. Helped me a lot, especially upgrading old single page apps.

    1. Charlie Clark Silver badge

      Re: pypi should have a better weighting

      And who do you think is going to do any of this? Not that I think anyone really gives a fig about the weighting on PyPI. I think it was added at the start when people thought these mattered. I suspect that nowadays most PyPI packages are installed automatically as dependencies and by CI systems.

      It would be far better if some kind of static code analysis could be run on every uploaded package but, again, who's going to set this up and pay for it?

  5. Anonymous Coward
    Anonymous Coward

    Re: What about auto-updates?

    I think the command needs a space at the end.

    pip list --format=legacy | egrep '^(acqusition|apidev-coop|bzip|crypt|django-server|pwd|setup-tools|telnet|urlib3|urllib) '
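For anyone who would rather run the same check from Python than from a shell pipeline, here is a minimal sketch. It assumes Python 3.8 or later (for `importlib.metadata`) and uses only the ten names from the egrep pattern above. The helper names `normalize`, `find_suspects` and `installed_distribution_names` are made up for this illustration. Names are normalized so that `setup_tools` and `setup-tools` compare equal, which is an extra step the shell one-liner does not perform.

```python
import re
from importlib import metadata

# The ten names from the egrep pattern in the comment above.
SUSPECT_NAMES = {
    "acqusition", "apidev-coop", "bzip", "crypt", "django-server",
    "pwd", "setup-tools", "telnet", "urlib3", "urllib",
}


def normalize(name):
    """Lower-case a name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_suspects(installed_names, suspects=SUSPECT_NAMES):
    """Return, sorted, the installed names that exactly match a suspect."""
    normalized = {normalize(n) for n in installed_names if n}
    return sorted(normalized & suspects)


def installed_distribution_names():
    """List the names of distributions installed in this environment."""
    return [dist.metadata["Name"] for dist in metadata.distributions()]


if __name__ == "__main__":
    for name in find_suspects(installed_distribution_names()):
        print("suspect package installed:", name)
```

Because the match is exact after normalization, legitimate packages such as `urllib3` are not flagged, which is the same job the trailing space does in the egrep pattern.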

  6. Suricou Raven

    This should be easy to detect.

    Just calculate the Levenshtein distance between each pair of package names. Any pair with distance <=2 should be taken as a sign of suspicion and sent to the site admin team for manual review. Repeat for each new submission.
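A rough sketch of that idea, for illustration only: the function names `levenshtein` and `suspicious_pairs` and the sample name list are assumptions made here, not anything PyPI actually runs. The distance is computed with the standard two-row dynamic programming method, and every pair of names at distance 2 or less is reported for review.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suspicious_pairs(names, max_distance=2):
    """Return all pairs of distinct names within max_distance edits."""
    unique = sorted(set(names))
    return [(x, y) for x, y in combinations(unique, 2)
            if levenshtein(x, y) <= max_distance]


if __name__ == "__main__":
    # Hypothetical sample: two well-known packages plus lookalikes.
    sample = ["urllib3", "urlib3", "setuptools", "setup-tools", "requests"]
    for pair in suspicious_pairs(sample):
        print(pair)
```

Comparing every pair is quadratic in the number of names, so for the "repeat for each new submission" step it would be cheaper to compare only the new name against the existing ones. Very short names will also sit within two edits of many unrelated names, so the threshold probably needs to scale with name length.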

    1. Anonymous Coward
      Anonymous Coward

      Re: This should be easy to detect.

      That's a good trick. In fact the people who manage the index should be doing this themselves (conveniently, there's a python package...)

    2. Peter X

      Re: This should be easy to detect.

      Cool!

      So, Levensthien Distance you say? ;)

    3. veti Silver badge

      Re: This should be easy to detect.

      That would only work on half of the examples listed in this article.

      Which is better than nothing, sure, but it still leaves a lot of attack space.

  7. Anonymous Coward
    Anonymous Coward

    The number of packages for a language...

    ... is inversely proportional to the average programming ability of the people using them.

    There's a package for a telnet server? Seriously? Just how hard is it to open a listening socket, accept some connections, parse a simple inline protocol like telnet and do some tty handling? Answer: Not very.

    Half the problem with bloat and exploits in code is down to endless use of external packages by lego brick "programmers" who are incapable of writing even moderately complex code themselves. Javascript is the worst for this but python is coming up fast behind.

    1. Anonymous Coward
      Mushroom

      Re: The number of packages for a language...

      Because every scientist who wants to use Python to process their data should understand telnet, right: it's not enough to understand all the maths they need to actually do science, they also must learn all the tedious bureaucracy of every networking protocol they need to talk? Or perhaps the language they use should be able to isolate them from all that crap, the way it isolates them from understanding the fine details of various numerical algorithms. Because, you know, it's not 1956 any more.

      1. Anonymous Coward
        Anonymous Coward

        Re: The number of packages for a language...

        " right: it's not enough to understand all the maths they need to actually do science, they also must learn all the tedious bureaucracy of every networking protocol they need to talk?"

        No, it isn't and yes. If you're going to be writing networking code you should damn well have a clue what's going on, otherwise it's exploit city.

        "Or perhaps the language they use should be able to isolate them from all that crap, the way it isolates them from understanding the fine details of various numerical algorithms"

        If they need that much hand holding then perhaps it's better just to get someone in IT to write the code for them.

        "Because, you know, it's not 1956 any more."

        You mean back in the day when coders actually had a fucking clue? Yeah, you're right there.

    2. veti Silver badge

      Re: The number of packages for a language...

      Seriously, someone is still making this argument?

      Clue: if you're writing everything yourself from scratch, then clearly your time isn't worth much.

      1. Anonymous Coward
        Anonymous Coward

        Re: The number of packages for a language...

        "Clue: if you're writing everything yourself from scratch, then clearly your time isn't worth much."

        So he thinks one should never rewrite code from scratch? What, like Linux being a rewrite of unix and GNU rewriting all the unix utils? Yeah, I wonder how that turned out....

        I'm afraid Joel doesn't always get it right. Sometimes re-writes ARE the best solution and sometimes writing code yourself instead of using libanother from some unknown author who'll probably drop support when he gets bored IS the best approach.

        1. Charlie Clark Silver badge

          Re: The number of packages for a language...

          So he thinks one should never rewrite code from scratch?

          Stop being disingenuous: he said rewrite everything from scratch. Reuse or rewrite is casting build or buy in a different light and should be assessed for each project. This notwithstanding: PyPI is full of fairly pointless libraries. Rinse or repeat for all other public repositories.

          The success of Linux was as much a happy accident as anything else. If BSD Unix hadn't been mired in a legal case with AT&T for years, Linux would probably have remained an unremarkable Minix clone.

          1. Anonymous Coward
            Anonymous Coward

            Re: The number of packages for a language...

            "The success of Linux was as much a happy accident as anything else. "

            Ditto DOS and Windows. The point is that Linux is a rewrite and it's now the most used version of Unix in the world.

            1. Charlie Clark Silver badge

              Re: The number of packages for a language...

              The point is that Linux is a rewrite and it's now the most used version of Unix in the world.

              Maybe (could get into arguments as to whether Android is Linux, etc) but what does that have to do with your argument?

              A better example would be something like LibreSSL or BoringSSL vs OpenSSL. Forks, ports and rewrites all have their place but how often do you want to rewrite a sorting algorithm?

    3. JSTY
      Flame

      Re: The number of packages for a language...

      > python is coming up fast behind

      Oh it's not even close. I present for your amusement http://www.modulecounts.com/ - Javascript modules are the digital equivalent of grey goo it seems.

      Burn it. Burn it all with fire.

    4. Doctor Huh?

      Re: The number of packages for a language...

      "There's a package for a telnet server? Seriously? Just how hard is it to open a listening socket, accept some connections, parse a simple inline protocol like telnet and do some tty handling? Answer: Not very."

      That depends on how you do it. A typical, half-assed, 2-minute typing session that works passably on the happy path and blows up in the face of the tiniest deviation from that path is easy. Bullet-proof code that handles error cases with grace and as automatically as possible while making available accurate and helpful status information is more difficult and takes enough time that encapsulating it in a library is worthwhile. Now, of course, if half-assed crap code is encapsulated in a library and widely distributed, nobody benefits and the entire programming language ecosystem is harmed, but that situation is usually rectified with some alacrity.

      The great programmer Tolstoy opined that happy paths are all easy and all the same, but every error path is erroneous in its own way. Or something like that.

      1. Anonymous Coward
        Anonymous Coward

        Re: The number of packages for a language...

        > The great programmer Tolstoy

        Must upvote that. :-)

  8. Anonymous Coward
    Anonymous Coward

    At least it's not Java.

  9. pompurin

    If they're to use non memorable names for packages then I'm not really surprised.

    They should be pointing some of these 'mistyped packages' to the correct ones in most cases, like meta packages in ubuntu/debian.

    Why on earth are 'bzip, crypt, pwd, telnet and urllib' not reserved or meta packages? At the very least something on the lines of 'you typed telnet, did you mean telnetsrvlib?' based on package popularity.
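A "did you mean" prompt of that kind could be sketched with the standard library's `difflib`. This is only an illustration: the function `did_you_mean` and the tiny list of known packages are invented here, and the ranking is by string similarity alone, since the sketch has no download statistics to weight suggestions by popularity.

```python
import difflib


def did_you_mean(requested, known_packages, limit=3, cutoff=0.6):
    """Suggest known package names that resemble the requested name.

    Returns an empty list when the requested name is itself known.
    """
    if requested in known_packages:
        return []
    return difflib.get_close_matches(
        requested, known_packages, n=limit, cutoff=cutoff
    )


if __name__ == "__main__":
    # Hypothetical index contents, for illustration only.
    known = ["telnetsrvlib", "requests", "django"]
    for suggestion in did_you_mean("telnet", known):
        print("you typed telnet, did you mean %s?" % suggestion)
```

A real index would need to decide what happens when the mistyped name is already registered by someone else, which is exactly the case the reserved-name idea is meant to prevent.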

    1. Charlie Clark Silver badge

      You're comparing apples and moon rocks. PyPI is where new stuff gets published, distros choose what they want to offer.
