And on this great day...
Yay, let the phishing commence!
ICANN has voted, as predicted, in favour of non-Latin web addresses. The move means that from 16 November countries can apply for extensions in their own languages. If approved, local domain registrars can then start accepting registrations. Peter Dengate Thrush, ICANN's chairman, said: "Right now Internet address endings are …
...will now have double the number of domains to squat on.
So now you can be directed to sites (esp. porn and spammers) that most blockers can't handle. How many Internet users can read:
Russian + all other Cyrillic-based languages
Chinese or any dialect?
Great move, ICANN.
I'll bet this also exposes some bugs in DNS etc. How much of the code is capable of handling UTF-8, let alone UTF-16?
We should have got IPv6 deployed first.
Every phisher in the world will now be preparing to register addresses with the following suffixes:
Try copy/pasting them on to the end of http://www.google and seeing what happens.
... what if there is a website in China that I need to get to, but it's in Big5? Does that mean I will need to map a layout, or get an onscreen package to key in the characters?
Might be enriching for the non-latin countries, and I'm not against it. Just wondering is all.
I seem to remember this was demanded in 1997 and is only now being done after Putin demanded Cyrillic name spaces in 2008 and threatened to do it anyway, and China was going to follow suit.
So yeh! Go ICANN! Finally, the full alphabet in domain names, more than a decade after it was in the web pages! Only a decade delay! Well done!
Wouldn't it be easier if everyone just spoke English?
If you can't type the address, there's always your bookmarks/favourites file....
@Steve Davies 3
The next version of your spam blockers will deal with this. Even better, any moment now there will be a Firefox plugin that deactivates links to all non-Latin URLs because you're a monolingual bigot who wouldn't understand what was on the page anyway!!!!
If only there were some way of either searching for web-sites, or storing a reference to a page in such a way that you could use a clever bit of software to navigate to that reference without having to type it in. Is someone working on something like this? Hopefully someday my vision will be realised.
Luckily many people run that nasty "Bind" piece of work chrooted now. Get ready for a long sequence of buffer overflow bugs.
Reverse the situation.
Chinese, Arabic, Russian and Japanese people have managed to get to "normal" addresses despite not having the keys on their keyboards, but I guess they are less stubborn than English speakers.
And as for phishing, what's going to stand out more: an address in Chinese or one in the US version of English?
Oh dear, that is a worry.
Domain names are so all-pervasive around the internet that I can't help feeling this will cause all sorts of problems.
What about email sent from/to these new domains? Have all the systems in the mail delivery chain (SMTP/POP3/IMAP servers, local mail clients) been updated to handle this? And as another correspondent pointed out above, the phishing opportunities could be rife.
Agreed, but a simple way to overcome this in browsers (at least in the URL bar) is to colour-code each character according to its character set.
Also, it can't be that hard for a browser to throw up a warning to users when an address contains multiple character sets.
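For what it's worth, the mixed-character-set warning can be sketched in a few lines of Python. This is only a rough illustration, not what any browser actually does: the helper names `scripts_in_label` and `is_mixed_script` are made up, and the script is guessed from the first word of each character's Unicode name, which is a crude stand-in for proper Unicode script data.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Return rough script names (e.g. 'LATIN', 'CYRILLIC') for the letters in a label."""
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue  # ignore digits, hyphens and dots
        name = unicodedata.name(ch, "")
        if name:
            # Assumption: the first word of the Unicode name approximates the script.
            scripts.add(name.split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True when the letters of a label come from more than one script."""
    return len(scripts_in_label(label)) > 1


lookalike = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A, not Latin 'a'
print(sorted(scripts_in_label("paypal")))   # ['LATIN']
print(sorted(scripts_in_label(lookalike)))  # ['CYRILLIC', 'LATIN']
print(is_mixed_script(lookalike))           # True
```

A label written entirely in one non-Latin script would pass this check, so it only catches the "one odd letter in an otherwise Latin name" trick.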
The only real problem is email. Lots of people, especially in business, labour under the false belief that email delivery is guaranteed, when it never has been. Getting users to type Latin characters correctly into email addresses can be fun, let alone our foreign counterparts in big corps, adding the fun of non-Latin characters to domains!
I work on Oracle, we have full UTF support, I'm not bothered! Pity the network and email admins!
..I think you are missing my point, but it must be me not explaining my point very well.
Sarcasm aside (I'm looking at you @AC 10:55, you coward), favourites would work on the return visit, but searching would be a bit difficult when the address is www.英國臺灣同.ch.
The idea is that any registrar who lets those pass will lose their accreditation. ICANN have basically spent 10 years devising a technically workable definition of "likely to mislead". We are now about to discover whether they succeeded or not. It might be a good moment to discover what your browser does with such URLs.
Remember that the TLD still needs to be approved... much in the same way as we don't have issues with .c0m or .cºm domains now, we shouldn't have issues with .cסm with this change.
That'll be 'cos they have the Latin chars on their keyboards and in the base character set for acronyms, anglicisations, holding conversations internationally and such, including (surprise) typing in URLs.
The content providers'll be pleased. De facto balkanisation of the internet, just what they wanted for Christmas and I wouldn't mind seeing where the heavy side of the lobby cash for this one came from.
The phishing argument's a valid one though, given two URLs of "line of twelve squares with three dots in it" and "line of twelve squares with three dots in it", which one's the phishing site? Not a problem for an automated antiphishing tool looking at the bytes behind this, but it's a bitch for eyeballs. I suspect most will just avoid URLs with embedded local charset nonprintables. Oh, there's that balkanisation again.
Allowing non-Latin characters in things like IT protocols is a huge mistake. Unfortunately I'm the only one who thinks like that.
IT protocols were a space where everyone enjoyed using a single language.
I'm all for cultural richness and all the politically correct crap, but in the IT space it is just asking for trouble. These changes were asked for by politicians, people who usually have no f***ing clue.
And before I get burned, languages appear or disappear because they are convenient for people, once a language is not convenient anymore it disappears.
SPAM SPAM SPAM SPAM SPAM .....
This is great for further democratising the web.
People who are complaining don't realise that the majority of the planet do not understand latin characters. Not only is 'google.com' meaningless, it is illegible.
Up to now, only the educated elite were able to get access. Now every literate person should be able to get online. Regardless of whether they speak arabic, chinese, russian, afghan, etc.
It'll take a little while to get used to seeing these 'weird' domain names. Get used to it!
What I don't understand is why it's just the TLD that's changing, why not everything after www.?
almost everything posted here so far is from the clueless and ignorant.
the DNS protocol has always been 8-bit clean. UTF isn't involved in internationalised domain names either. they start with unicode which then gets normalised and encoded as ascii strings. so テスト(test in Japanese) gets translated to "xn--zckzah" as an IDN. a web browser or mail program can in theory then take a domain name like fubar.xn--zckzah and display it in the appropriate script for a Japanese user.
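The encoding step described above can be tried with Python's standard library. This is a minimal sketch, assuming the built-in `idna` codec is close enough for illustration (it follows the older IDNA 2003 rules); the variable names are made up, and the katakana is written with escapes so the exact code points are unambiguous.

```python
# "test" in Japanese katakana: U+30C6 U+30B9 U+30C8
japanese_test = "\u30c6\u30b9\u30c8"

# Unicode label -> ASCII form that actually goes into DNS
ascii_form = japanese_test.encode("idna")
print(ascii_form)  # b'xn--zckzah'

# ASCII form -> Unicode, as a browser or mail program might do for display
display = b"fubar.xn--zckzah".decode("idna")
print(display == "fubar." + japanese_test)  # True
```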
buffer overflows don't come into it either. the length of an IDN still has to fit into the DNS protocol limits. besides, there have been no buffer overflows in BIND9. (look at the vulnerability matrix on the BIND web site.) the software's written in a way that makes these almost impossible.
next, ICANN is planning to introduce IDN versions of country codes. other top-level domains might come later. the first IDN domains will be for the likes of china and korea. these will probably map everything under .cn and .kr respectively to their xn--* IDN equivalents.
And I thought my new 100,002 key keyboard would never catch on!
It will be quite hard for me to type an address like www.arussiansite.рф
One may say that not many people read sites in foreign languages, but we have good translators nowadays.
Anyone remember the http://www.pаypal.com/ incident? (watch out, looks like pay pal, but it's not!)
That's why I think it's not a good idea...
the reason this is just the TLD changing is because that's all ICANN are responsible for - the TLD operators can define their own policies on allowing it for second level domains (and several countries have already done so - either through choice or lack of action, others have explicitly blocked them though) and of course below that, subdomains you can do yourself in your own DNS server and it will just work (with compatible clients of course, like all such domains)
@the suggestions of things breaking - the DNS infrastructure isn't being changed at all, it's still staying ASCII - non-ASCII domains are encoded into a subset of ASCII characters (using punycode encoding) which can then be used in hostnames normally - the domain name "ôô.com" typed into your browser actually causes a DNS lookup to be performed for "xn--ldaa.com" (try it in firefox and it actually shows the ASCII version of the name... although it's an awful spammy landing page) similarly with the new TLDs ôô.ôô.ôô would result in a DNS lookup for "xn--ldaa.xn--ldaa.xn--ldaa"
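The label-by-label conversion can be sketched by hand in Python using its `punycode` codec. The function names `to_ascii_label` and `to_ascii_hostname` are made up for illustration, and this deliberately skips the normalisation and validity checks a real IDN implementation performs before encoding, so treat it as a toy rather than a resolver.

```python
def to_ascii_label(label: str) -> str:
    """Encode one DNS label; plain ASCII labels pass through untouched."""
    if label.isascii():
        return label
    # Assumption: the label is already normalised; real IDNA does more work here.
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_hostname(name: str) -> str:
    """Convert each dot-separated label of a hostname independently."""
    return ".".join(to_ascii_label(label) for label in name.lower().split("."))


o_circ = "\u00f4"  # LATIN SMALL LETTER O WITH CIRCUMFLEX (precomposed form)
print(to_ascii_hostname(o_circ * 2 + ".com"))                  # xn--ldaa.com
print(to_ascii_hostname(".".join([o_circ * 2] * 3)))           # xn--ldaa.xn--ldaa.xn--ldaa
```

Because each label is converted on its own, an internationalised TLD needs nothing special from the DNS servers: they only ever see the ASCII strings.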
Upper case and lower case???
Chris 12 writes "What I don't understand is why it's just the TLD that's changing, why not everything after www.?"
Don't you know how the interweb works? It's up to each TLD operator to decide how and when to introduce IDN. This is harder than it first appears. There are all sorts of technical, policy and legal issues to work out. eg Does the IDN version of fuckwit.cn in Chinese go to the current holder of that name in its ascii form? If yes, do they pay for that second registration? Does the ascii version of a name automatically block registration of its IDN equivalents? What if the punycode for some IDN string is already registered? How are trademarks going to be handled? Would Coca-Cola be able to register the Japanese IDN for Pepsi?
Once IDN is in the root, the TLDs who become internationalised (like .cn) will be better placed to introduce IDN registrations inside their TLD.
Paris icon 'cos she's good at communicating in many tongues.
Actually, I tried that a little while ago. At least for the Ṻᶇⓘḉȱᵭḝ ɼₑᵹᶖǭⁿꜱ I tried (and every character in those two words comes from a different one), all the .com extensions appear to map to .com in ASCII. So at least *something* is converting isomorphs to ASCII.
Whether this is a feature of my Linux system or something in DNS I don't know; I was unable to try it on Windows because Windows has terrible Unicode support, particularly for anything above U+FFFF.
As long as tinyurl can handle them, what's the problem?
My how politically correct. Hope it doesn't break anything important.
now foreign people can have their own domain names and I don't like it. waaaa, waaaa.
god some people will bitch about anything.
This is a great thing.
Since I don't know anyone who'd be contacting my domain(s) using non-Latin characters, it makes them easier to block!
From an international perspective, the domains will stick to the Latin character set.
From an 'in country' perspective, where the language isn't based on the Latin-1 char set, it makes for easier adoption of the internet.
So it's really a good thing. Maybe then they'll understand the spam and phishing problems.
Can you see it now? 419ers sending their cons in Russian?
Phishing attacks against Russian and Chinese banks?
Maybe then their governments will crack down on organized crime operating from their countries. ;-)
I demand Jedi web addresses too!
No one controls the net, it is all done on agreement. If they want they can even just change the routing tables at the borders; perhaps some IP number is considered lucky by some culture, then they are free to nab it. Anyone can run a router; whether or not other routers wish to route through it is up to them.
If you want to run root DNS servers then grow a pair and just do it, I am sure any country could pony up the cash for the hardware. In fact it would be better if they did, and then you could select which root servers to bother querying.
Actually I am going to take this moment to also wonder if the yanks should not be forced to create another language as their mother tongue; that would also improve the net quite a lot, Tower of Babel and all that.
Just blocking all traffic with non-Latin characters in the address will fix the spam problem.
If somebody complains, tell them to get a real address or bugger off.
This will most likely happen anyway due to the amount of spam and phishing, and nobody will bother using it because everybody will block it.
Then all the l33t speakers can fill the space up with boards for swapping scripts and so on.
in reality i think this is just a pointless PC stunt.
"People who are complaining don't realise that the majority of the planet do not understand latin characters. Not only is 'google.com' meaningless, it is illegible."
Not true. The majority of the planet speaks English if you take into account that English is virtually the universal first choice of second language (and hence latin alphabet). There are more English speakers in India and China than in USA, UK, Canada, Australia (etc) combined. I've seen road signs in Russia and Japan (and wales) with Latinised versions of the local wording and many Russians will write "russian" on a latin keyboard using widely understood latinisations.
To find those for whom English and latin script are totally alien you need to go to remote rural areas, subsistence farmers, groups for whom getting enough to eat is higher priority than getting a localised domain name (or even electricity ).
English is the international language of business, air and sea navigation and the educated members of society.
And IT and the internet. I recently tripped over trying to convert a website to Russian. My text editor handled the script OK but it failed on global (all files) replace of "home" with ГЛАВНАЯ. Then I hit the problem of PHP handling of Cyrillic.
It's time we in UK stopped beating ourselves up about how terrible it is that when a foreign businessman comes here he speaks English but when a Brit businessman goes abroad he needs a translator so we need to improve language teaching in schools. Wrong.
In non-English speaking countries the choice of second language is obvious, that which will be of most use: English.
What's of most use to us? On the basis of global usage as a first language the contenders are Chinese and Spanish. But no, our real language skill is that we can understand foreigners' slaughtered English - I can understand the Indian call center (usually).
Sometimes I can even understand a few words of Glaswegian (usually of the 4 letter variety).
On the other hand when I've attempted to speak a foreign language the natives seem unable to understand - because I said "le table" which is of course complete nonsense, it should be "la table" (or is it the other way round?) - the French seem unable to let that pass, at best they will correct me (whereas a Brit would rarely correct their "ze table" to "the table").
Language is how we communicate. When I was 8 I was the only one in my school who'd been outside UK, few Brits had much need to know a foreign language. Now I can sit here and communicate instantly around the globe, and for that to be effective we need a common language. Minority Languages are dying out for good reason, keep them on life-support as museum pieces if you like but don't expect anyone else to make allowances for them.
BTW - My proposal is that the EC should adopt an official language and put an end to the "tower of babel", the waste of translating all EC communication into 27(?) languages. I wonder which language they would agree on...
OK, but what about Braille?
Isn't that a representation of latin text in Braille form?