ICANN meets in Seoul, South Korea this week, and top of the agenda are proposals to allow web addresses in non-Latin script, opening the way for Japanese, Arabic and other web addresses. The week-long meeting is expected to approve an initial limited use of "International Domain Names" before the end of the year. Rod Beckstrom …
Oh frabjous joy
This is going to be great... for scammers. IDN is based on Unicode, and Unicode, that one-million-and-a-bit collection of characters and other typographic artifacts, not all of them defined (yet), is not really suited to _unambiguous_ representation. You may have an A now, quite distinct from the other allowable signs (though 1 and l and I and 0 and O regularly pose problems already), then you'll have varieties of A with or without accent, or just funny looking. As to accents, you have characters-with-accent (one code point) or you can have the same as character (one code point) and accent (another code point). Or maybe you try character-with-accent (one code point) with that same accent (another code point). Or you pick another character that looks an awful lot like this one, but is from another language. And then you re-start the accenting game again. Or you mix in zero-width spaces, or whatever else you can come up with. Only a million-and-a-few choices. Did I say not really suited to unambiguous representation? I meant really not suited to unambiguous representation of domain names.
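To make the accent point concrete, here is a minimal Python sketch using the standard `unicodedata` module. It assumes NFC normalization as the comparison step, and the helper name `same_after_nfc` is made up for illustration. It shows that the one-code-point and two-code-point spellings of the same accented letter can be reconciled, while a lookalike from another script cannot.

```python
import unicodedata

# Two spellings of the same visible character "ä".
composed = "\u00e4"      # single code point: LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # two code points: "a" + COMBINING DIAERESIS

def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# The raw strings differ, even though they render alike.
print(composed == decomposed)                # False
# Normalization reconciles the two accent encodings.
print(same_after_nfc(composed, decomposed))  # True

# A lookalike from another script is NOT reconciled by normalization:
# CYRILLIC SMALL LETTER A looks like Latin "a" but stays a different letter.
cyrillic_a = "\u0430"
print(same_after_nfc("a", cyrillic_a))       # False
```

Normalization only handles the "same letter, different encoding" case; the "different letter, same look" case needs separate rules.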
Now, the ICANN bois know this, so there are Rules what is allowable and what is not. But the problem is that they have a big space of possible abuse and the rules punch out a few attack vectors. Did they really think of all possible attack vectors? I think not.
Thus, frabjous joy ahead.
I have some Japanese-language domain names in the .jp TLD. I've had them for years; they use a system called Punycode to encode the characters.
Why do I get the feeling that this won't be used to represent any of:
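For anyone who hasn't seen that encoding in action, here is a rough sketch using Python's built-in `idna` codec. That codec follows the older IDNA 2003 rules, so take it as an illustration rather than what any particular registry runs. The helper names `to_ascii` and `to_unicode` and the `.example` domain are made up for the demonstration.

```python
def to_ascii(domain: str) -> str:
    """Encode each label of a domain into its ASCII ("xn--...") form."""
    return domain.encode("idna").decode("ascii")

def to_unicode(domain: str) -> str:
    """Decode an ASCII-encoded domain back into its Unicode form."""
    return domain.encode("ascii").decode("idna")

# A well-known test case: the non-ASCII label gets an "xn--" prefix,
# while the plain ASCII label is left alone.
print(to_ascii("bücher.example"))   # xn--bcher-kva.example

# A Japanese label round-trips the same way.
encoded = to_ascii("日本語.jp")
print(encoded)
print(to_unicode(encoded))
```

The DNS itself only ever sees the ASCII form; turning it back into the native script is up to the browser or mail client.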
1. having or showing a merry, lively mood: gay spirits; gay music.
2. bright or showy: gay colors; gay ornaments.
3. given to or abounding in social or other pleasures: a gay social season.
I'm going to register microsöft.com asap.
They could impose rules that only Latin letters may be used for specific TLDs (.com, .co.uk), with all the dodgy characters restricted to the country's own TLD.
... there goes my url regex
The letter "A"
U+15C5, U+15CB (Canadian syllabics)
U+1D00 (Latin small capital)
U+1D2C (Latin superscript)
U+FF21 (Latin full-width)
And this list is probably incomplete, and certainly doesn't include the innumerable accented versions of A.
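A quick way to poke at such lookalikes is Python's standard `unicodedata` module. The sketch below takes four of the code points listed above and checks whether NFKC compatibility normalization folds each one back to a plain ASCII "A". Using NFKC as the test is my assumption here, not something taken from the ICANN proposal, and the helper name `folds_to_ascii_a` is made up.

```python
import unicodedata

# Four of the "A" lookalikes from the list above.
candidates = ["\u15c5", "\u15cb", "\u1d00", "\uff21"]

def folds_to_ascii_a(ch: str) -> bool:
    """Hypothetical helper: True if NFKC normalization turns ch into ASCII 'A'."""
    return unicodedata.normalize("NFKC", ch) == "A"

for ch in candidates:
    # Print the code point, its official Unicode name, and the folding result.
    name = unicodedata.name(ch, "<unnamed>")
    print(f"U+{ord(ch):04X}  {name}  folds to 'A': {folds_to_ascii_a(ch)}")
```

Only the full-width form (U+FF21) folds to "A". The small capital and the Canadian syllabics characters survive normalization unchanged, so a filter built on normalization alone would not catch them.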
Unicode is a mess, frankly. It's a noble effort, but its foundations lack logical consistency: it includes typographical (stylistic) variants (the last three items in this list) as well as authentic letters. If the ICANN proposal allows any Unicode character, we are in trouble. If, on the other hand, it specifies that only certain characters may be used (in other words, forbids the stylistic variants), we're still in trouble because the coding will be horrendously complex and prone to errors.
In addition, the mechanism of falling back to another font if the one chosen lacks a given character (an effect readily seen in the Character Map applet under Ubuntu Linux: right-click a few obscure characters and they're quite likely NOT from the font chosen) means it's difficult to prevent a lookalike character from being displayed.
I'm keeping my fingers crossed that the gurus who have worked out the internationalization of domain names are six or seven jumps ahead of my small mind and have figured out effective solutions to such issues.
Nice way of making the mega corporations fork out for hundreds of permutations of domain names they'll never need.
Also, can I set up my own TLD please? They seem to be giving them away
@Piers re. .gay
Does this make you feel .sad ?
They can't impose anything within .co.uk, as ICANN do not make the rules within a country's registry.
As for the other reader's "Hardly New" comment, you are correct that this is not new at the second level and below, but IDNs are very new at the TLD level (and hence in the root zone).
Well this should be good
I can't wait to see how many new and unique website URLs we will end up with. Joy.
Joy of Joys
I doubt the .cn TLD will use traditional characters... .hk or .tw, more probably :-)
can we have .hetero, .lesbo and .bi TLDs too please