Google may be "99.9 per cent" certain that it will leave China, but Twitter will instead move into the Middle Kingdom. Eventually. Or so Twitter creator Jack Dorsey told a New York gathering sponsored by news site ReadWriteWeb. According to a report in Tuesday's New York Times, Dorsey was asked by Chinese artist and activist Ai …
"The web may support the most universally popular of these - Unicode - but as the NYT rightly points out, few cell-phone SMS systems do."
Really? SMS messages can use the 16-bit UCS-2 encoding (the precursor of UTF-16), which limits a message to 70 characters. Sending and receiving Chinese SMS messages depends on handset support, but, strangely enough, phones sold in China do have Chinese support, and have done for a long time. I've got an ancient Nokia 6150 hooked up to a PC via gnokii that has been happily sending Chinese SMS messages for years.
Even the 70 character limit is no problem, as words are only one, or sometimes two, characters, and no spaces are required.
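The 160 versus 70 figures fall straight out of the payload size. Below is a minimal sketch, assuming the standard 140-octet user-data payload of a single (non-concatenated) SMS. The helper names `max_chars` and `fits_in_one_sms_16bit` are hypothetical and are not taken from gnokii or any SMS library. The check also uses Python's UTF-16 codec as a stand-in for UCS-2. Characters outside the Basic Multilingual Plane become 4-byte surrogate pairs in that codec, and real UCS-2 cannot carry them at all.

```python
# Sketch: capacity of a single (non-concatenated) SMS.
# Assumption: 140-octet user-data payload, as specified for GSM SMS.
SMS_PAYLOAD_OCTETS = 140


def max_chars(bits_per_char: int) -> int:
    """Number of fixed-width characters that fit in one SMS payload."""
    return SMS_PAYLOAD_OCTETS * 8 // bits_per_char


def fits_in_one_sms_16bit(text: str) -> bool:
    """True if text fits one 16-bit-encoded SMS.

    Hypothetical helper: uses UTF-16 (big-endian, no BOM) as a stand-in
    for UCS-2, so a non-BMP character would count as 4 bytes here.
    """
    return len(text.encode("utf-16-be")) <= SMS_PAYLOAD_OCTETS


print(max_chars(7))    # 160 (7-bit default alphabet)
print(max_chars(16))   # 70  (16-bit encoding, e.g. for Chinese)
print(fits_in_one_sms_16bit("中" * 70))  # True
print(fits_in_one_sms_16bit("中" * 71))  # False
```

With one- or two-character words and no spaces, 70 hanzi carry far more than 70 English letters would.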
Of course Unicode is supported everywhere in China
Chinese phones are mandated by law to support GB18030, which is a Unicode encoding.
What is more of a problem is that SMS messages in China are limited to 1000 characters, not 160.
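The GB18030 point is easy to check with Python's built-in `gb18030` codec. Below is a minimal sketch, assuming a Python 3.9+ interpreter. `gb18030_widths` is a hypothetical helper, and the sample string is arbitrary: an ASCII letter, a common hanzi, and a rare CJK Extension B character. GB18030 is variable-width (1, 2 or 4 bytes per character) and covers every Unicode code point, so a round trip loses nothing.

```python
# Sketch: GB18030 can represent any Unicode code point,
# using 1, 2 or 4 bytes per character.
def gb18030_widths(text: str) -> list[int]:
    """Hypothetical helper: bytes used by each character in GB18030."""
    return [len(ch.encode("gb18030")) for ch in text]


sample = "A中\U00020000"  # ASCII letter, common hanzi, CJK Extension B hanzi
print(gb18030_widths(sample))  # [1, 2, 4]

# Round trip: converting Unicode text to GB18030 and back is lossless.
assert sample.encode("gb18030").decode("gb18030") == sample
```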
Richer messages? Don't make me laugh
The whole power of twitter is that it's restrictive. 140 characters gives about 25 words, a bite-sized, scannable nugget of information.
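The "about 25 words" figure is simple division. Below is a rough sketch, assuming (this is not from the comment) an average English word of about 4.7 letters plus one space, i.e. roughly 5.7 characters per word. For comparison it also assumes a hypothetical average of 1.5 characters per Chinese word, based on the earlier remark that words are one or two characters with no spaces. Both averages are illustrative guesses, not measurements.

```python
# Sketch: rough words-per-message estimate.
# Assumed averages (illustrative only):
#   English: ~4.7 letters + 1 space  -> ~5.7 characters per word
#   Chinese: 1-2 characters, no spaces -> ~1.5 characters per word
def approx_words(char_limit: int, chars_per_word: float) -> int:
    """Estimate how many words fit in a character limit."""
    return round(char_limit / chars_per_word)


print(approx_words(140, 5.7))  # 25 (English)
print(approx_words(140, 1.5))  # 93 (Chinese, under the assumption above)
```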
Twitter in Chinese would only really work if it was restricted in a similar way. And to be honest it would need to be, unless Twitter expanded the database - due to the richness of Mandarin, a Chinese 'character' (really, word) takes a lot more than a byte to store.
Not to mention - do you support Mandarin AND Cantonese, do you support simplified or traditional scripts... etc.
Anyway, regardless of language, expanding the size of the message does not add to its 'richness'. There's a whole unread blogosphere out there to prove this :)
"due to the richness of Mandarin, a Chinese 'character' (really, word) takes a lot more than a byte to store" - In Unicode's 16-bit form (UTF-16), and also in Big5 and GB2312, it takes 2 bytes. OK, that's a 100% increase, but is that "a lot"?
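The per-character cost in these encodings can be measured directly. Below is a minimal sketch using Python's built-in codecs. The sample text "中文" is arbitrary, and both of its characters exist in GB2312 and Big5. UTF-8, the usual encoding on the web, is included for comparison: it needs 3 bytes for most Chinese characters. `utf-16-le` is used instead of plain `utf-16` so that the 2-byte byte-order mark is not counted.

```python
# Sketch: bytes per Chinese character in a few common encodings.
text = "中文"  # both characters exist in GB2312 and Big5

# "utf-16-le" avoids the 2-byte BOM that plain "utf-16" prepends.
codecs_to_check = ("utf-16-le", "gb2312", "big5", "utf-8")

bytes_per_char = {
    codec: len(text.encode(codec)) // len(text) for codec in codecs_to_check
}
print(bytes_per_char)
# {'utf-16-le': 2, 'gb2312': 2, 'big5': 2, 'utf-8': 3}
```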
If you're supporting Unicode, then choosing between Mandarin and Cantonese (or other dialects, such as Szechuanese, Shanghainese etc.) is unnecessary - the problem was fixed hundreds of years ago when the writing was standardised (just make sure you have the HKSCS extensions for the Cantonese swear words). Supporting both traditional and simplified is just a matter of using the right fonts.
Anyway, Twitter already accepts Chinese characters in tweets; the only issues appear to be translation of the user interface, whether some servers are located in China, and how it is marketed.