MWN?
I thought I'd stumbled upon Mac Weekly News. It's like LWN but with more colour.
Nice article - proper nerdy.
There has been much sniggering into sleeves after wags found they could upset iOS 6 iPhones and iPads, and Macs running OS X 10.8, by sending a simple rogue text message or email. A bug is triggered when the CoreText component in vulnerable Apple operating systems tries to render on screen a particular sequence of Unicode …
> Nice article - proper nerdy.
The Williams[*] is back! (Back to writing articles at least, not sure if he ever went away completely). This is excellent news.
By the way, I had no idea Mr. Williams was so technically competent, in addition to his pathologically thorough journalism.
[*] The one and only. One of very few who deserve to be called a Journalist, feared all the way from BT offices to Ofcom corridors.
Chris Williams, the author of this article, works on The Register's sub desk.
Chris Williams, who wrote all the Phorm articles, went to work for The Telegraph, and now has the byline "Christopher Williams".
To avoid confusion between the two we changed all bylined articles on The Register by Chris Williams the First to "Christopher Williams".
Agreed. Really good to see an article like this on el Reg.
Unicode is notoriously difficult to get right, so I have sympathy for the Apple developer w̻̔̽ͯ̄͒́̎ͅh̻̰̭̗̣̪̩͗̎ͯͣ͆̓o̬̱͚̟̹͉ͦͥ̔̈́̓ͨ͋ ͤͤg̭̩̲̍͐ͣ̈́̆͗ͅͅǫ̐ͥͬͣ̀̿̂t͚̤̙̠̫̐̌̾̉̽ ̫̳̫̈̅̍͗̑ṱ̴͎̲͇̯͉̖̊ͤ̈͐ͬḧ̤̳̭̠͉̱͌ͬ͞i̜̺͓̞̳̓̉̓ş͔̩̲͙̤̺ͬ̆̉̂ ̲̭̍̑̉̉̄̆ͫ͞wͬr̛͖̭͎͉̪ͬ͂ͩͥ̚o̢̰͉͙͇͖ṅ̌g҉̫͕̺
I'm not a Mac owner or user so I don't have access to a Mac and I have no familiarity with coding/decoding tools but that debugger looks quite nice. In my time, oh so many years ago, I used Softice, which undoubtedly some of you know.
That's the first time that I have seen any reverse engineering for quite some years and it brings back some nice memories. All-nighters trying to determine where program logic could be "modified" ever so slightly for a variety of reasons ;-) ;-) ;-)
That's the first time that I have seen any reverse engineering for quite some years
Plenty of this sort of reverse engineering (typically of similar crashes) appears in messages posted to software-security lists like BUGTRAQ and VULN-DEV. Prominent examples include Tavis Ormandy's 2010 explanation of the Windows #GP Trap Handler bug, and the flurry of Java vulnerabilities documented by Security Explorations / Adam Gowdiak last year.
Incidentally, in the article Chris writes "it's mighty hard to leverage an end-of-array read fault into something more serious". Rather overstated, I'm afraid; exploiting integer over/underflow and OOB access is of course a time-honored practice,[1] and memory-access violations (SIGSEGVs on *ix systems, exception 0xc0000005 on Windows, etc) in particular have some well-known exploit vectors. Typically those require a second vulnerability to be anything more than a DoS - for example, a vulnerability that lets the attacker alter trap handling for the process - but "mighty hard" is not warranted.
[1] That's why it's Sin #3 in The 19 Deadly Sins of Software Security, which should be mandatory reading for anyone who writes code.
Seems to me that CTRunGetGlyphCount is returning -1 to indicate an error and the bug is simply that this error condition is not caught and handled. (Probably because these days programmers use Exceptions too much and have forgotten that return values can also be used to indicate problems.)
I thought that, too, but Apple's definition of CTRunGetGlyphCount() is clear: "The number of glyphs that the run contains, or if there are no glyphs in this run, a value of 0."
It's not impossible that Cupertino's manual doesn't match operation, of course, but the code for CTRunGetGlyphCount() is pretty simple: it returns 0 or a pre-calculated count for the run. It's probably within the glyph run creation.
C.
Right, but the "-1 = error" logic seems to hold: probably a miscommunication between programming teams. One team expects a string-handling function to always be successful (hey, after all, how hard is it to parse a string?), and the other team, with better Unicode-fu, knows that an error condition *can* easily be reached with invalid strings.
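For anyone who wants to see why an uncaught -1 matters, here is a minimal sketch. Everything in it is hypothetical: run_glyph_count() is a made-up stand-in, not Apple's CTRunGetGlyphCount(), and the -1 error return is only the assumption being debated above. It shows a careless caller turning the sentinel into an enormous index, next to a caller that checks first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a glyph-count call. This is NOT Apple's
 * CTRunGetGlyphCount(); the -1 error return is assumed for illustration. */
static long run_glyph_count(const char *run)
{
    if (run == NULL)
        return -1;              /* assumed "malformed run" signal */
    long n = 0;
    while (run[n] != '\0')
        n++;
    return n;
}

/* Careless caller: uses the count as an index without checking it. */
static size_t last_index_unchecked(const char *run)
{
    long count = run_glyph_count(run);
    return (size_t)(count - 1); /* -1 - 1 = -2 wraps to SIZE_MAX - 1 */
}

/* Careful caller: rejects error and empty runs before indexing.
 * Returns 1 and stores the index on success, 0 otherwise. */
static int last_index_checked(const char *run, size_t *out_index)
{
    assert(out_index != NULL);  /* caller bug, not a runtime condition */
    long count = run_glyph_count(run);
    if (count <= 0)
        return 0;
    *out_index = (size_t)(count - 1);
    return 1;
}
```

Indexing an array with the unchecked result is exactly the sort of end-of-array read the article describes, whatever the real cause inside CoreText turns out to be.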
> ... The question remains: Why are they using signed integers to represent a quantity that can never be <0 by definition ?
It has probably been implemented by someone used to returning negative values from a function to indicate errors, although that does not appear to be what is happening here. Perhaps the type is used elsewhere where this is done.
Personally, if I see this in code that I'm reviewing, I insist it be changed. It is an abomination. Just like 'char' being signed by default in most C compilers, it makes no logical sense and is a hanger on from less enlightened times.
I think you are looking at this from the wrong starting point. It's poor practice to mix data and control / status in-band within the same variable, i.e. a function that normally returns the result of a computation should not return an invalid value to indicate failure. Better to call that function with a pointer to the output, then use an enumerated status type to return success / fail. One of the fundamental things about programming is being fussy about data typing and, for embedded work at least, you nearly always need to know and track the size and type of variables.
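A minimal sketch of that pattern in C, for illustration only. The names count_status and count_chars() are invented for this example and do not come from any real API: the status travels in the return value, the result travels through the pointer, and no value of the result is reserved to mean "failed".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status type: success and failure are never encoded
 * in the computed value itself. */
typedef enum { COUNT_OK, COUNT_BAD_INPUT } count_status;

/* Hypothetical helper: writes the length of `text` to *out_count.
 * On failure *out_count is left untouched. */
static count_status count_chars(const char *text, size_t *out_count)
{
    assert(out_count != NULL);  /* caller bug, not a runtime condition */
    if (text == NULL)
        return COUNT_BAD_INPUT;

    size_t n = 0;
    while (text[n] != '\0')
        n++;
    *out_count = n;
    return COUNT_OK;
}
```

A caller has to declare a size_t for the result and can switch on the status, so the full unsigned range stays available for real counts.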
What was the title of that old s/w engineering book ?
Algorithms + data structures = programs :-).
Fwir, a bit anal in its approach, but can't argue with the title....
Chris
"There is a perfectly valid case for returning -1 for an error in a function that should always return zero or positive from normal processing, rather than throwing exceptions, but that valid case assumes that you are using a language which does not have unsigned numbers."
Given the documentation, the conventions followed by the sibling functions and the nature of the bug that case definitely does not apply here. The code is broken in this case.
That said occasionally I make use of returning magic numbers myself, but I prefer to throw an exception if possible. Sometimes I am forced to revert to the old C style where you return an OK/FAIL code and write the results via a pointer argument. Mixing exception conditions with results is asking for trouble in my experience.
The docs say: it returns "The number of glyphs that the run contains, or if there are no glyphs in this run, a value of 0."
I expect it's something along the lines of:
charcount=0
here=0
repeat
if char[here]=DELETE then charcount=charcount-1
and then it's asked for the length of the string "DELETE". I remember in my Software Engineering uni course 25 years ago several people getting caught by this.
You can start by flooding the sender of the offending text with clean texts, this will push the naughty text off the screen. I guess that you will have problems if you try to go through the text history though.
You could try asking Apple but they (still) haven't publicly acknowledged the bug, and thus aren't offering public advice.
Unsigned integers are best avoided in C (and C-derived languages like C++ and Objective-C) because they are contagious. For example, (1u - 2) is not -1 as you might expect, but some huge number.
It's also useful to have -1 available to represent "no such number"; for example, the length of a file that doesn't exist. Using 0 would be wrong because it's a legitimate value. More generally, having invalid representations adds redundancy which can help error checking.
"For example, (1u - 2) is not -1 as you might expect, but some huge number."
True, but that's a gap in the language and the coder should be aware of it.
"It's also useful to have -1 available to represent "no such number"; for example, the length of a file that doesn't exist."
The length of a file that does not exist should never be a concept you deal with. The program should have already done an exists(filename) call before even thinking about asking for the length of the file (not to mention stat or something). That sort of code is like something I would have written in BBC Basic 30 years ago. We should have moved on in our coding styles, even if still using languages like C which allow it.
Well, it might be. Consider the following:
unsigned int ui = 1U - 2;
signed int si = 1U - 2;
printf( "Unsigned : %u %u\r\n", ui, si );
printf( " Signed : %d %d\r\n", ui, si );
This gives:
Unsigned : 4294967295 4294967295
Signed : -1 -1
Which shows that the context is important, not just the operation.
It's all to do with the (complex) integral promotion rules within the language. Another trap for the unwary is that:
a + b + c
may not give the same result as
c + b + a
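A small sketch of one way that can happen, assuming a platform with 32-bit int and 64-bit long long (the static_assert stops the build elsewhere). The function names and the choice of types are mine, picked to trigger the conversion rules. Addition groups left to right, so the type of the first pair decides whether a negative value gets converted to unsigned before the wider operand arrives.

```c
#include <assert.h>

/* The sketch assumes these widths; other platforms may behave differently. */
static_assert(sizeof(int) == 4 && sizeof(long long) == 8,
              "sketch assumes 32-bit int and 64-bit long long");

/* (a + b) is computed in long long, so nothing wraps. */
static long long sum_abc(long long a, unsigned int b, int c)
{
    return a + b + c;
}

/* (c + b) is computed in unsigned int, so a negative c is converted
 * modulo 2^32 before the long long operand is added. */
static long long sum_cba(long long a, unsigned int b, int c)
{
    return c + b + a;
}
```

With a = -1, b = 1 and c = -2, sum_abc() gives -2 while sum_cba() gives 4294967294, for the same three values.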
"It's also useful to have -1 available to represent 'no such number'; for example, the length of a file that doesn't exist. Use 0 would be wrong because it's a legitimate value. More generally, having invalid representations adds redundancy which can help error checking."
Wouldn't this be a good case to use the humble "null" value rather than resorting to signed integers?
Wouldn't this be a good case to use the humble "null" value rather than resorting to signed integers?
There is no "null" value at this low level of coding. What is treated at higher levels as null is generally represented at this level by a separate bit in a bitmap or a value outside the acceptable range.
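To make those two representations concrete, a hedged sketch in C. NOT_FOUND, opt_index and both find functions are invented for illustration: the first returns an out-of-range value, the second carries a separate flag alongside the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Out-of-range sentinel: no buffer can have a valid index this large. */
#define NOT_FOUND ((size_t)-1)

/* Separate "is there a value" bit carried alongside the value. */
typedef struct {
    bool   has_value;
    size_t value;
} opt_index;

/* Sentinel style: returns the index of `wanted`, or NOT_FOUND. */
static size_t find_sentinel(const char *buf, size_t len, char wanted)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == wanted)
            return i;
    }
    return NOT_FOUND;
}

/* Flag style: same search, but "nothing" is reported by the flag. */
static opt_index find_flagged(const char *buf, size_t len, char wanted)
{
    size_t i = find_sentinel(buf, len, wanted);
    opt_index result = { i != NOT_FOUND, i != NOT_FOUND ? i : 0 };
    return result;
}
```

Either way the "null" has to be checked by the caller; the flag version just makes it harder to use the value by accident.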
"Tsk! Baby out with the bathwater again, youth of today, general lack of wherewithal, three world wars, shrapnel in the head, wouldn't've happened in my day, flogging too good for them etc etc."
E by gum, we used to have to lick out pond etc ad nauseam :-)
Nothing wrong with the youth of today. They are just as rebellious and arrogantly (un)sure of themselves as we all were at that age :-).
The problem is that programming is now seen as easy, when it never was, and still isn't, easy to design and build robust systems. While any fool can write a two-page utility that works, designing systems takes knowledge of data structures, algorithmics, hardware capabilities and limitations to get the best results. I wonder how many web programmers that you meet these days have any clue about what goes on under the hood, in the hardware, or have ever read a book on data structures or operating system principles?...
Chris
Don't use one parameter to mean two (or more) different kinds of things.
That's one of my golden rules. I'm also a firm believer in syntactic salt. Combining two things into one is 'syntactic sugar' - it's more convenient for the developer. I actually avoid returning errors when I can. Most languages allow you to ignore returned values. I can't force developers to use an error code returned by reference, but I can bloody well force them to declare a variable to store it so at least they can't claim they didn't notice it :)
"The compiler should really kick up a fuss for implicit casts from signed to unsigned. If there is an explicit cast then the programmer is due a spanking."
The compiler only knows what it's told. If data is coming in from outside then it is depending on the coder to correctly define what the format is.