Security researchers have uncovered critical flaws in open-source software that implements the Extensible Markup Language in a staggering array of applications used by banks, e-commerce websites, and consumers. The bugs uncovered by researchers at Finland-based Codenomicon were contained in virtually every open-source XML …
XML is flawed in that there's no standard for storing untrusted data. CDATA is trivial to break out of. Escaping '<', '>', and '&' is common but wrong. Even with proper escaping, most parsers can't handle binary data. Base 64 is safe but it's huge. Then there's the character encoding issue. The encoding is stated in the encoded file itself! The file has to be read once in ASCII then re-read with the correct character set. In the rush to meet deadlines and avoid compatibility problems, most coders ignore the issues.
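For anyone wanting to see the two options side by side, here is a minimal Python sketch using only the standard library. The helper names (wrap_text, wrap_binary) and the element names are made up for illustration. It only shows that escaped text and Base64 payloads survive a round trip; it does not cover characters that XML 1.0 forbids outright, such as NUL, in text content.

```python
import base64
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_text(untrusted: str) -> str:
    """Embed untrusted text as element content, escaping &, < and >."""
    return "<note>" + escape(untrusted) + "</note>"


def wrap_binary(blob: bytes) -> str:
    """Embed arbitrary bytes as Base64 (about one third larger than the input)."""
    return "<blob>" + base64.b64encode(blob).decode("ascii") + "</blob>"


# A string that would break out of a CDATA section if pasted in verbatim.
hostile = "]]><script>alert(1)</script> & more"
text_doc = wrap_text(hostile)
blob_doc = wrap_binary(b"\x00\x01\xff")

# Parse both documents again and recover the original payloads.
round_trip_text = ET.fromstring(text_doc).text
round_trip_blob = base64.b64decode(ET.fromstring(blob_doc).text)
```

Raw bytes like the ones in blob_doc cannot be carried as escaped text at all, which is why Base64 ends up being the fallback despite the size penalty.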
I love the smell of bullshit in the morning ...
Surely *any* text-based file that is being read by an application makes it just as vulnerable to the same type of issue? I'm not a big fan of XML as a storage format but it's no more susceptible to this type of attack than any other text-based storage such as INI, CSV, JSON or whatever. Or am I missing something?
I can see how C apps may be more susceptible to buffer overflows generally (for a wide variety of reasons), but can't imagine, say, .NET apps using standard XML parsing libs being much of a problem. Nor Python or Perl, come to that ... it's up to the programmer to prevent buffer overflows, not the format of the file it happens to be reading.
It's not Python but Expat that is vulnerable
Expat is a popular XML parsing C library that could have buffer overflow errors in it.
The original article is here: http://www.codenomicon.com/labs/xml/
It's longer, but it's just as vague. Apparently this is so as not to disclose any information that could be used by hackers, but it also makes the article pretty useless to anyone else.
I find it quite hard to believe that major open source XML libraries should be susceptible to simple overflow attacks. That they may pass on garbled information rather than returning an error in case of a bad XML file could probably happen. Thus theoretically an otherwise safe application could be compromised because of the parser returning data that it should not be able to return. But realistically, in most cases you would have to perform some sort of data validation beyond what the parser does, that should set potentially dangerous data straight.
The closest I can come to a conclusion on this matter: there are probably loads of vulnerable applications out there that can be attacked through an XML file, but I doubt that the parsers have a lot to do with it, even if they are flawed.
So let me get this straight,
"we tested out some XML frameworks and some of them broke". Good, this is nice to know. Now tell me which ones so I can see if I have a problem. Not telling? The CERT advisory has a very short list but if that is the full extent of what they found then it's not much. @Fazal Majid says that expat has a problem - OK, that's interesting to me.
"broke things might run other people's code". True. Do any of these top pieces of software break like that or is this just a statement of general principle? I agree with the principle but not all broken software breaks in the same way.
"here is a list of XML parsing software - we haven't tested most of it but it may all be broken". Or not. I'm having a little trouble with this logic. I want a list of what these guys have tested, not a Wikipedia entry on XML.
"We have a piece of software that everyone should be using to test their libraries". OK, now I understand what this article is all about - it's an advertisement.
In reality most XML parsing software is regularly tested with broken XML. I do it all the time without even trying. A typo here, a misplaced character there, some broken encoding, whatever. And what happens? I get a message telling me that my XML is broken. Just like it should. Now, if the application using the library is too stupid to realise that something is broken and chugs on regardless then bad things might happen, or if the application lets the library stop the program (very unusual in my experience) then we might have a denial of service attack against the application.
Many applications using XML do so with XML that is completely under control of the software or the local user so there isn't likely to be any direct threat. It's only the applications that process XML from untrusted sources that are at risk.
Maybe not everyone is doomed after all.
ASN.1 is not really a network standard but rather...
...a specification of a notation used to specify the data structures in a protocol. For example, it is used to specify the structure of the X.509 public-key digital certificate. The errors referenced in the article were in libraries containing the routines used to extract data from ASN.1 structures.
...I seem to have a storm in my teacup.
The FUDmeisters are at work.
Would have been more useful to have pointed out where the bounds checking was missing so it could be fixed.
Not too good validation
As far as I know, this isn't a valid xml document, but it gets parsed ok.
<?xml version='1.0'?><protocol v='1.0' id='1'><greetingrequest />as></protocol>
I think the '>' should be escaped.
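For what it's worth, the XML 1.0 spec only requires '>' to be escaped in content when it appears as part of the string "]]>", so the sample above is well-formed and a conforming parser is right to accept it. A quick check with Python's standard ElementTree (which sits on top of expat):

```python
import xml.etree.ElementTree as ET

doc = "<?xml version='1.0'?><protocol v='1.0' id='1'><greetingrequest />as></protocol>"

# fromstring raises ET.ParseError if the document is not well-formed.
root = ET.fromstring(doc)

# Text that follows the empty <greetingrequest /> element: "as>"
tail = root[0].tail
```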
""We have a piece of software that everyone should be using to test their libraries". OK, now I understand what this article is all about - its an advertisement."
"Many applications using XML do so with XML that is completely under control of the software or the local user so there isn't likely to be any direct threat. Its only the applications that process XML from untrusted sources that are at risk."
Thanks Brian :-) That's exactly what I was thinking. The report just looks like it was created by their "fuzzy" scaremongering software to me.
My clue to the fact it was an advert was the fact that the company name was mentioned twice in quick succession at the beginning of the article.
Tell us which libraries are affected, ffs..
I agree with Brian Scott - the list of libraries listed is very limited. Does it include the MSXml controls?
Having said that, I wrote a server with embedded libexpat. That'll need recompiling. joy.
Broken XML != malicious XML
I have no idea what the issue here is, but people expressing surprise that a library could be broken should enter the mind of an attacker. Let's say an app allocated a 1024-byte buffer to hold the tag name. The attacker might like to see what happens if they entered a tag name of 1023, 1024, 1025 or more bytes. C strings are null-terminated, so a 1024-byte buffer can only hold a 1023-byte string plus terminator. It would be easy for a programmer to screw up a length check and cause an overflow. If they can get the app to crash then they've found a bug that may well be exploitable. Likewise they might try tags containing binary data, or UTF-16 or null characters, or deeply nested data, or long entity names, or entities containing certain data. Anything that might break internal buffers used to hold state information.
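A rough sketch of that boundary probing, in Python against the standard library parser. The 1024-byte figure is the hypothetical buffer size from the example above, not a known limit in any real parser, and the function names are invented for illustration. A real fuzzer would run each case in a separate process so that a crash in native code is caught rather than taking the test harness down with it.

```python
import xml.etree.ElementTree as ET

BUFFER_SIZE = 1024  # hypothetical buffer size taken from the example above


def boundary_lengths(size: int) -> list[int]:
    """Lengths just below, at, and just above the suspected limit, plus a large one."""
    return [size - 1, size, size + 1, size * 64]


def probe(length: int) -> bool:
    """Feed the parser a document whose only tag name is `length` characters long.

    Returns True if the name came back intact, False if the parser rejected it.
    Either result is acceptable; the thing to look for is a crash or a hang.
    """
    name = "a" * length
    doc = f"<{name}/>"
    try:
        return ET.fromstring(doc).tag == name
    except ET.ParseError:
        return False


results = {n: probe(n) for n in boundary_lengths(BUFFER_SIZE)}
```

The same loop can be pointed at attribute names, entity names, or nesting depth by changing how doc is built.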
Open source offers some protection against this since the code has been scrutinised by a lot more people than closed source. But at the same time it isn't immune from bugs or exploits. In the case of an XML parser, it may well be that a disproportionate number of commercial and non-commercial apps rely on expat which means they're all vulnerable to the same issue. I am surprised to see Java listed as exploitable - perhaps they also use expat or some other native library to speed up parsing too.
This is why we run services as "nobody" - you do not trust the application to do any more than its job, and therefore, no additional privilege is given to that application. This is common sense, and it has been done since ages past, because sysadmins always know they can never trust application developers. Zones (Solaris 10) and jails (BSD) afford extra security - and are, thankfully, common practice among serious sysadmins.
To be honest, XML is not really the biggest issue out there - if your firm has a process that allows someone to sneak an infected XML file on to your Apache server, you have bigger problems than an unpatched httpd. Try looking at your configuration management process, and then have a look at security. If you are running a bank, you should consider nothing less than BoKS - it routinely logs all keystrokes. If a sysadmin knows their security officer will be looking down a long microscope at everything they type, I guarantee you, they will watch what they are doing.
So is everything based on libxml broken?
Just through computer programming 101, right?
"In the rush to meet deadlines and avoid compatibility problems, most coders ignore the issues."
Yeah, see, that's why we have libraries. If you use libraries instead of hand coded XML parsers, all these problems go away.
Depends what parser you use! When I'm writing Perl I use XML::LibXML... a module which is nothing but a set of binder functions to a pre-compiled library written in C. Therefore, Perl/Python apps will indeed be vulnerable if you use libxml/libxml2.
Who knows what the M$ position is?
Looks like it is in DTD handling
In the Xerces patch off the CERT alert
1. it's in the C++ Xerces implementation, not the java one
2. it's in the DTD processing, where you can send it into a loop, hence a denial of service.
In the Apache SOAP engines, DTD processing is switched off before parsing begins; this is because SOAP demands it. You can turn a fair few of the XML features off (schema resolution, entity expansion), all of which are worth doing before you start parsing untrusted content.
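The same idea in Python's standard SAX interface, as a minimal sketch: switch off external entity loading before handing the parser anything untrusted. TagCollector and parse_untrusted are made-up names for illustration. Recent Python versions already default external general entities to off, so the calls mostly make the intent explicit, and this does not stop internal entity expansion.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class TagCollector(ContentHandler):
    """Records element names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def parse_untrusted(data: bytes) -> list:
    parser = xml.sax.make_parser()
    # Do not fetch external general or parameter entities.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    handler = TagCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.tags
```

Features have to be set before parsing starts; the SAX reader refuses to change them mid-parse.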
And the flaw is...?
> It's impossible to know now if the flaws uncovered in XML will be as far reaching as all that.
Yes it is impossible to know, BECAUSE YOU HAVEN'T TOLD US WHAT THE FLAW IS!
already fixed in Java
See also the bug fix in the latest Sun Java update:
6845701 jaxp parse Xerces2 Java XML library infinite loop with malformed XML input
You could reverse engineer the vulnerability from diffing the jdk source
Don't parse DTDs
Having looked at the Apache patch, the issue seems to be in DTD parsing.
My suggestion: don't parse an untrusted DTD. (Why would you ever want to parse an untrusted DTD? Surely the point is that you load your own trusted DTD and use that to validate the untrusted XML input.)
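One way to enforce that with Python's expat binding, as a sketch only: raise as soon as the parser reports the start of a DOCTYPE, so the DTD is never processed. DTDForbidden and parse_without_dtd are hypothetical names, not part of any library.

```python
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted input carries its own DOCTYPE."""


def parse_without_dtd(data: bytes) -> list:
    """Return element names in order, refusing any document that declares a DTD."""
    tags = []

    def on_doctype(name, sysid, pubid, has_internal_subset):
        # Called at the start of the DOCTYPE declaration; abort the parse here.
        raise DTDForbidden(f"DOCTYPE {name!r} not allowed in untrusted input")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: tags.append(name)
    parser.Parse(data, True)
    return tags
```

Validation against your own trusted DTD or schema would then be a separate step run on input that has already passed this check.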
Just enable the NoScript extension.
hey, wait a minute...
1. I'm responsible for a number of XML applications, most of them open source.
2. I'm responsible in some way or another to two of the developers mentioned in your article: namely Sun and Apache.
So I need to know what you're talking about, right? How does an advisory about Sun or Apache reach me through El Reg without having come on a security@ list?
OK, these are both big orgs, with lots of different XML applications. Must be none of those I work on or with are affected, right? But your article says "most" open source XML apps (echoes of Eggwina there), and the C libs are the worst affected. Yep, I use mostly C libs, and they're open source.
So I follow your link. Right, neither of the most popular C libs (libxml2 and expat) are listed as affected, unless using expat with python (tick, nope). Good, that's all-but-two of my apps in the clear, and one of the two is documented as long-since-abandoned-don't-use. What about the final app, which uses Xerces-C++?
The report you link to is still way too vague to be useful. And just to cap it, the two CVE links at the end both lead to Not Found errors from NIST.
Useless FUD? Or what?
"This is why we run services as "nobody" - you do not trust the application to do any more than its job, and therefore, no additional privilege is given to that application."
Yet, if an application is compromised, it will be able to compromise all your other services running as 'nobody'.
Use a separate id for each service, where possible.
an obviously biased article asserting a lot of FUD against Open Source software.... or 'freetards choice' as the Register seems to angle such options.
At least they could test the Open Source stuff properly, as they have access to the source for their fuzzer. Care to predict how many proprietary software apps and libraries are just as (if not more) vulnerable? I'd choose the Open Source stuff any day - at least fixes tend to be quite rapid rather than waiting a month or more (eh Cisco?) until some special patch day - or until a known exploit is out in the wild :-(
The problem is XML
XML is such a simple concept. Provide a language that permits you to write structured text (text with tags etc. to give hierarchy and structure to it). Unfortunately, if you go and read the XML standard you discover that the language is an absolute nightmare. It is packed full of special cases and idiosyncrasies such that you need a mega-complex parser to handle it all. Just the section on auto-detecting which character set is in use is a nightmare. I've written a full XML parser so I'm well aware of how complex the language spec is and how easy it is to screw up when you implement the parser. Personally, I would love to drop the disaster that is XML in favour of something similar but much simpler. Get rid of a lot of the unnecessary fluff like DTDs and Entities. Have simple mark-up and escape characters. Also have an explicit way of figuring out what character set you use (probably with a fixed length magic string at the top of the file). If you want the complex stuff like DTDs and the like, write the specifications as add-ons, rather than as core language. And insist that the add-on will be a well formed version of the original language.
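To give a flavour of the character set detection, here is a cut-down Python sketch loosely following the approach in the spec's non-normative appendix: look for a byte order mark, then for the byte pattern of "<?" in UTF-16, then read the encoding name out of the XML declaration as ASCII. sniff_encoding is an invented name, and the sketch deliberately ignores UTF-32, EBCDIC and anything declared by an outer transport such as HTTP, so it is nowhere near complete.

```python
import codecs
import re

_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches the encoding pseudo-attribute of an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes."""
    # 1. A byte order mark settles it.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. No BOM: "<?" encoded as UTF-16 has a recognisable byte pattern.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # 3. Otherwise read the declaration as ASCII and trust its encoding name.
    match = _DECL.match(data[:256])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. With no BOM and no declared encoding, XML defaults to UTF-8.
    return "utf-8"
```

Even this reduced version needs four separate steps before a single character of real content can be decoded.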