Facebook gains power to Like any word ever written

Facebook has signed up as a full member of the Unicode Consortium, the body that maintains the universal character encoding standard for written characters and text. Why should we care? Because Facebook is just the eleventh full member of the organisation and now has voting rights alongside the likes of Google, Apple, Oracle, SAP, Microsoft …

  1. This post has been deleted by its author

  2. Anonymous Coward

    Unicode (not to be confused with Unabomber) is awesome. The fact that Facebook is now on board only makes me wish we'd had it when I was at school 40 years ago. Imagine writing your homework in Cherokee using the 16-bit or 32-bit variants and telling your teacher it was big-endian. What a laaaaarf we would have had.
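For anyone who missed out on the laaaaarf: the "16-bit or 32-bit variants" are the UTF-16 and UTF-32 encodings, and big-endian means the most significant byte of each code unit comes first. A minimal Python sketch, using three letters from the Cherokee block (U+13A0 to U+13FF) picked purely for illustration and written as escapes so the example does not depend on font support:

```python
import unicodedata

# Three Cherokee syllabary letters, given as escapes.
text = "\u13E3\u13B3\u13A9"

for ch in text:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The explicit -be / -le codecs emit no byte order mark (BOM).
utf16_be = text.encode("utf-16-be")  # 2 bytes per BMP character, high byte first
utf16_le = text.encode("utf-16-le")  # same 16-bit code units, low byte first
utf32_be = text.encode("utf-32-be")  # 4 bytes per character, high byte first

print(utf16_be.hex(" "))  # 13 e3 13 b3 13 a9
print(utf16_le.hex(" "))  # e3 13 b3 13 a9 13
print(utf32_be.hex(" "))

# Reading big-endian bytes as little-endian does not raise an error here;
# it silently yields three unrelated characters.
print(utf16_be.decode("utf-16-le") == text)
```

That silent mis-decoding is the practical reason byte order marks and explicit encoding labels exist.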

    1. Forget It

      Come on, Unicode, you can dot it!

      In Chinese, emphasis in body text is supposed to be indicated by using an "emphasis mark", which is a dot placed under each character to be emphasized. This is still taught in schools, but in practice it is not usually done, probably due to the difficulty of doing this using most computer software.

      src: http://en.wikipedia.org/wiki/Emphasis_%28typography%29#Punctuation_marks
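In rich text this kind of emphasis is usually handled as styling; CSS, for example, has a `text-emphasis` property for it. As a rough plain-text approximation only, one could append U+0323 COMBINING DOT BELOW after each character. The helper name `dot_emphasis` is hypothetical, and whether the dot renders sensibly under a Chinese character depends entirely on the font and text renderer:

```python
import unicodedata

COMBINING_DOT_BELOW = "\u0323"

def dot_emphasis(text: str) -> str:
    """Hypothetical helper: attach a combining dot below every character.

    A plain-text approximation, not the typographic emphasis mark itself;
    placement under CJK characters varies by font.
    """
    return "".join(ch + COMBINING_DOT_BELOW for ch in text)

plain = "\u91CD\u8981"  # 重要 ("important")
emphasized = dot_emphasis(plain)
print(emphasized)
print([unicodedata.name(c) for c in emphasized])
```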

  3. Chairo

    Asian languages and Unicode

    The implementation of Chinese characters is a real problem in Unicode. You have the simplified characters used in Mainland China, the traditional ones used in Taiwan, and a slightly different subset of the traditional ones used in Japan, plus roughly 100 further characters for Japan's two other scripts (the hiragana and katakana syllabaries). All in all, we are talking about several thousand characters in each set. It seems 16-bit Unicode is already at its limit there. Using a Chinese smartphone with Japanese web pages gives mixed results: some of the Japanese-style characters seem to be replaced with slightly different Chinese ones.

    I suppose the Chinese have the same trouble the other way around. I wonder whether this has been sorted out in newer versions of Unicode.
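Some context, sketched in Python: simplified and traditional forms that differ in shape get separate code points, while characters judged to be the same across Chinese and Japanese share a single code point (Han unification), leaving the glyph to the font. That shared-code-point design is a likely explanation for the "slightly different Chinese" glyphs on a Chinese phone; tagging text with its language (for example an HTML `lang` attribute) is the common remedy. The example also shows that the CJK repertoire does not fit in 16 bits:

```python
import unicodedata

# Simplified and traditional forms of "Han" have distinct code points.
simplified, traditional = "\u6C49", "\u6F22"  # 汉, 漢
for ch in (simplified, traditional):
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))

# A unified ideograph outside the 16-bit Basic Multilingual Plane
# (CJK Unified Ideographs Extension B starts at U+20000).
ext_b = "\U00020000"
print(f"U+{ord(ext_b):05X}", unicodedata.name(ext_b))

# UTF-16 needs a surrogate pair, i.e. two 16-bit code units, for it.
print(len(ext_b.encode("utf-16-le")) // 2, "UTF-16 code units")
```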

    1. nijam Silver badge

      Re: Asian languages and Unicode

      Unicode is not 16-bit.

      You're thinking, perhaps, of the UTF-16 encoding, which is used in some Microsoft systems but not, AFAIK, elsewhere, because it isn't a particularly good way of encoding Unicode. Most network systems, as well as Linux, use UTF-8, which is a more natural encoding scheme for an essentially unbounded set of symbols.
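To make the UTF-8 versus UTF-16 trade-off concrete, here is a short Python sketch: a hand-rolled version of the surrogate-pair rule UTF-16 uses for code points above U+FFFF, checked against Python's built-in codec, plus a comparison of encoded sizes. The function name `to_surrogates` is only for illustration:

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000            # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return high, low

emoji = "\U0001F600"
high, low = to_surrogates(ord(emoji))
print(f"{high:04X} {low:04X}")  # D83D DE00

# Encoded sizes in bytes (explicit -le codec, so no BOM is counted).
for sample in ("hello", emoji):
    print(repr(sample), len(sample.encode("utf-8")), len(sample.encode("utf-16-le")))
```

For ASCII-heavy text such as protocol headers and markup, UTF-8 is half the size of UTF-16, and outside the BMP both need four bytes per character.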

      1. Vincent Ballard
        Coat

        Various nit-picks

        It's not essentially unbounded either: the code space stops at U+10FFFF, which is a little over 20 bits. Nor is UTF-8 unbounded: in principle it can encode 36-bit values (although it has never been specified for more than 31 bits), but beyond that you would need major design changes. UTF-16 is a good way of encoding certain strings, in particular ones that mainly use BMP characters above U+07FF, which take three bytes in UTF-8 but only two in UTF-16 (e.g. a lot of the supported Asian languages).
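The 31-bit figure comes from the original UTF-8 definition (RFC 2279), which allowed sequences of up to six bytes before RFC 3629 capped UTF-8 at U+10FFFF. A minimal Python sketch of that older scheme, under a hypothetical function name, checked against the built-in codec for code points the current standard still allows; it also checks the size of the code space and compares sizes for a short CJK string:

```python
# Each entry: (exclusive upper bound, total bytes, lead-byte marker).
# Continuation bytes are 10xxxxxx and carry 6 payload bits each.
_LEGACY_UTF8_FORMS = [
    (0x800, 2, 0xC0),       # 110xxxxx: 11 bits
    (0x10000, 3, 0xE0),     # 1110xxxx: 16 bits
    (0x200000, 4, 0xF0),    # 11110xxx: 21 bits
    (0x4000000, 5, 0xF8),   # 111110xx: 26 bits
    (0x80000000, 6, 0xFC),  # 1111110x: 31 bits
]

def legacy_utf8_encode(cp: int) -> bytes:
    """Encode with the pre-RFC 3629, up-to-31-bit UTF-8 layout.

    No surrogate or U+10FFFF checks; this only illustrates the bit layout.
    """
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:
        return bytes([cp])  # 0xxxxxxx: plain ASCII
    for limit, nbytes, lead in _LEGACY_UTF8_FORMS:
        if cp < limit:
            tail = []
            for _ in range(nbytes - 1):
                tail.append(0x80 | (cp & 0x3F))  # low 6 bits into a continuation byte
                cp >>= 6
            return bytes([lead | cp] + tail[::-1])  # remaining high bits go in the lead byte
    raise ValueError("needs more than 31 bits")

# Agrees with the modern codec inside today's code space...
for cp in (0x41, 0x20AC, 0x6F22, 0x10FFFF):
    assert legacy_utf8_encode(cp) == chr(cp).encode("utf-8")

# ...and keeps going past it, up to six bytes for a 31-bit value.
print(legacy_utf8_encode(0x7FFFFFFF).hex(" "))  # fd bf bf bf bf bf

# The code space itself ends at U+10FFFF, which takes 21 bits to write.
print(hex(0x10FFFF), (0x10FFFF).bit_length())

# For CJK text, UTF-16 is smaller: 2 bytes per character against 3 in UTF-8.
kanji = "\u6F22\u5B57"  # 漢字
print(len(kanji.encode("utf-8")), len(kanji.encode("utf-16-le")))
```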

  4. phil dude
    Headmaster

    This is important...

    ...as it means translation can work properly.

    You might still not like what is said, but at least you can read it!

    P.
