Edge
Shows the real address in Edge on W10.
Click this link (don't fret, nothing malicious). Chances are your browser displays "apple.com" in the address bar. What about this one? Goes to "epic.com," right? Wrong. They are in fact carefully crafted but entirely legitimate domains in non-English languages that are designed to look exactly the same as common English words …
Looking at them on this Linux Mint box, my RSS reader shows them as https://xn--80ak6aa92e.com and https://www.xn--e1awd7f.com/ respectively.
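For anyone who wants to see what is hiding behind those xn-- labels, here is a minimal sketch using Python's built-in punycode codec. The helper name decode_label is made up for illustration; it only strips the xn-- prefix and decodes the rest, and does none of the validation a real IDNA library would do.

```python
import unicodedata

def decode_label(label: str) -> str:
    """Decode one DNS label; labels without the xn-- prefix pass through."""
    if label.lower().startswith("xn--"):
        # Python's standard "punycode" codec implements RFC 3492 decoding.
        return label[4:].encode("ascii").decode("punycode")
    return label

for host in ("xn--80ak6aa92e.com", "www.xn--e1awd7f.com"):
    decoded = ".".join(decode_label(part) for part in host.split("."))
    print(host, "->", decoded)
    # List every non-ASCII character so the look-alikes are unmistakable.
    for ch in decoded:
        if ord(ch) > 127:
            print("   ", f"U+{ord(ch):04X}", unicodedata.name(ch, "?"))
```

Both labels decode to purely Cyrillic strings, which is why a browser that renders the Unicode form shows something that reads like the Latin names.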
My browser (Firefox on this box) shows the first one as the article describes, except for one significant difference: The 'l' looks like a capital I - presumably a side effect of the font in use here, but the important point is that for me it stands out a mile.
The second one, however, does just look like epic.com.
That decision was criminal in its stupidity. Example: НSВС.com - that is Russian N, S, Russian V, Russian S, .com.
You can create a mixed-script homograph for nearly anything and it will be virtually indistinguishable from the real thing. Now throw in a certificate and voila - phishing, here it comes.
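A quick way to see the difference the eye misses is to print the Unicode name of each character. This is only a sketch; describe is a hypothetical helper, and the look-alike string is written with escapes following the description in the post (Cyrillic En, Latin S, Cyrillic Ve, Cyrillic Es).

```python
import unicodedata

def describe(label: str):
    """Return (character, Unicode name) pairs for a label."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in label]

lookalike = "\u041dS\u0412\u0421"   # mixed Cyrillic/Latin spelling from the post
genuine = "HSBC"                    # plain ASCII

# The strings differ even though many fonts render them alike.
print(lookalike == genuine)         # False

for ch, name in describe(lookalike):
    print(f"U+{ord(ch):04X} {name}")
```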
This isn't a fix, it is a workaround. You fix the problem of being misled by malicious IDNs, but you have a new problem: you cannot see any IDNs.
It's like someone complaining that their editor doesn't work in Arabic, and being told that the fix is to write in English.
For most English speakers, not seeing IDNs is likely not much of an issue.
Maybe a compromise would be that with punycode 'true' it shows the punycode domain name in the address bar to avoid (English-speaking) people getting fooled, but shows the proper name when you hover over it if you were, e.g., visiting a Russian site.
Obviously another solution will need to be found for them, but English speakers are likely to be the target of the vast majority of hijacking attempts that use punycode domains masquerading as real ones.
Is a solution a bad one if it only fixes the majority of the problem, rather than 100% of it?
"Obviously another solution will need to be found for them, but English speakers are likely to be the target of the vast majority of hijacking attempts that use punycode domains masquerading as real ones."
No, you are only thinking of the problems that an anglophone will encounter from homographic IDN attacks; it is still a form of colonialism.
You haven't considered that, due to our earlier anglophone-only internet, most of those non-English speakers will actually be using a lot of domains that have English domain names, for instance paypal, google, mpay and so on. A workaround that "works" for anglophones, but still allows the remaining 84% of the world to be pwned, is not a valid solution.
For instance, a user in India almost certainly would want punycode on for local websites, but they still won't want to go to xn--mesa-g6d.in thinking it is mpesa.in.
If you can't apply the workaround, you'll need to check the certificate for sites you really care about (a Let's Encrypt cert is a red herring). It sucks that not only does the URL bar get spoofed, but NoScript also sees no harm, so a drive-by is that much more likely to happen (if you apply permanent exceptions to domains you trust).
But this is still a perfectly valid and complete "fix" for that person if that person only actually wants/needs to write in English.
Of course, the "fix" for the person who never needs to visit IDN domains is an "it's broken" for someone who does. Isn't it ?
Which is the real problem, no ?
But your text editor analogy falls somewhat short. A text editor that does not support Arabic cannot be used to send a document to someone that looks like English but is in fact Arabic.
"I sent the infidel the instructions for assembling a bomb, and they thought it was a shopping list because I used Arabic that made it look like a list of English words for grocery goods. How surprised will they be when they go out to buy milk and eggs and instead blow up the supermarket ?!"
:)
Sorry, this might seem a little simple, but since we know which characters look like which in other languages, when someone applies for a domain like raural.com that becomes paypal.com, why not simply flag it as unavailable, just as if someone owned the domain already? Surely it wouldn't take much longer for a script to check whether the domain you wish to buy runs through the Unicode look-alikes, checking all possibilities before returning the results with a big fat "computer says no" when you're trying to spoof a domain.
Yeah, a few people may end up not being able to get the domain they wish, but let's face it, most people buying a domain face that problem these days anyway, as someone's beaten them to all the good names.
Or am I oversimplifying things? I could quite easily be, I'm rather the idiot..
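The check proposed here could be sketched roughly as follows. Everything in it is an assumption for illustration: the LOOKALIKES table is a tiny hand-picked sample (a real registry would need something much larger, along the lines of the Unicode confusables data), the function names are made up, and the Cyrillic spelling of the paypal look-alike is my guess at what the post meant.

```python
# Hand-picked sample of Cyrillic letters that resemble Latin ones.
LOOKALIKES = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0443": "y",  # Cyrillic u
    "\u0445": "x",  # Cyrillic ha
}

def skeleton(label: str) -> str:
    """Fold known look-alikes onto one representative character."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label.lower())

def is_available(candidate: str, registered: set) -> bool:
    """Refuse a name whose skeleton collides with an existing registration."""
    taken = {skeleton(name) for name in registered}
    return skeleton(candidate) not in taken

registered = {"paypal"}
spoof = "\u0440\u0430\u0443\u0440\u0430l"   # Cyrillic er-a-u-er-a plus Latin l

print(is_available(spoof, registered))      # False: "computer says no"
print(is_available("example", registered))  # True
```

Note that this treats look-alike spellings within one namespace as the same name; it says nothing about who should win when both parties are legitimate.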
you are. under your proposal, a hypothetical corporation peddling nuclear reactor fuel (mox.com) would be able to lock out an equally hypothetical innocent group of russian lichen-fanciers (мох.ru). The existence of a company website opal.com should not stop a hypothetical local nightclub in the middle of siberia from calling itself ора1.ru, after a little local river. ideally, these hypothetical russian entities should also be able to register their names in the .com or .org namespaces - saying otherwise would strongly imply that some animals are more equal than others.
most IDN are used entirely innocently, and are a great help online to those of us who do not speak english, or at least another language based on the latin alphabet, fluently. making them second-class does not help anybody.
Currently mox.com and mox.ru can both exist, even if owned by different entities. That's the whole point of having different namespaces. Given that, мох.ru should be allowed whether or not mox.com exists, so long as mox.ru doesn't exist.
If both spellings want the same namespace, as in мох.com and mox.com, then it should be handled as if the spellings were the same. First-come, first-served, or whatever the rule is. That isn't making IDN second class. It is treating them the same as everything else.
With the launch of IDN equivalent TLD's for CNO along with the newGTLD's, ICANN had an ideal opportunity to fix this problem for good. Instead they made it worse.
What should have happened: Complete banning of mixing scripts between levels. All IDN's in CNO should have been moved over to their equivalent IDN newGTLD (e.g. Cyrillic .com's should have been grandfathered over to .ком, etc.) and the system returned to only ASCII registrations allowed in the plain old ASCII CNO TLD's.
Instead, ICANN sat on its hands and even let mixed scripts proliferate into the ASCII new GTLD's! So now you can register Chinese scripts in .xyz. How useful.
SSAC were asleep at the wheel.
But don't get me started.
In fact it's become such a huge mess that Verisign, having successfully applied for 12 transliterations of .com and .net, have only launched two of them - .コム for Japan and .닷컴/.닷넷 for Korea - and that was over a year ago. They have abandoned launching the rest. That would make for an interesting article in itself: why would a powerhouse like Verisign not be able to handle launching the lot of them at the same time, given they're for completely different markets?
"different but look almost identical"
A letter is just a symbol with a certain shape - if two letters look identical, they are identical. It doesn't matter if different languages use that shape in different ways to represent different sounds, the only thing a computer needs to do is display the shape when told to do so; there's absolutely no reason to come up with multiple codes to represent the same shape just because that shape is used in different alphabets.
And before objections that the letters aren't quite identical and the minor differences justify the different codes, that sort of minor change is a function of font. The difference between a Times New Roman "P", a Comic Sans "P" and a Wingdings "P" is far greater than the difference between an English and a Russian "P". If you want Cyrillic-looking letters you choose a Cyrillic font, if you want Latin letters you choose a Latin font. Defining multiple codes for effectively identical letters really doesn't help matters.
So true - the issue is how the user responds to the symbol displayed and not what the computers are using internally to represent it.
There is a necessary trade-off between making things easy or friendly for the less IT-literate (i.e. most non-IT) people, and giving those same people a risk-proportionate way of avoiding ne'er do wells. The risk is browser makers/writers putting in things like that Firefox IDN punycode default to simultaneously shield users from the details while opening an avenue for said users to be misdirected by the ne'er do wells.
A typical UK or US English user is unlikely to need URLs that include Cyrillic or other variants of their normal symbols. Same for typical French or Arabic or other users - that should apply en masse per locale/region and doesn't seem to be a particularly insurmountable technical problem.
I'll tell you what. Poke both your eyes out so you're dependent upon a screen reader. Then see if it makes any difference which alphabet is used.
Hint: symbols that look the same may represent different phonemes in other languages.
If you still can't figure it out, well you're now blind so you won't be posting any more stupid comments. At least not until you've got the hang of that screen reader.
"A letter is just a symbol with a certain shape - if two letters look identical, they are identical"
OK, then answer this simple question: does "C" come before "P"? What about "С" and "Р"?
(a hint: The first pair is in a Latin script; the second is in a Cyrillic script).
Lexicographical sorting is pretty fundamental for many uses of computers; having a common namespace for most of the world's living languages (e.g. Unicode) makes it much more manageable. An alternative would be to attach a code page to every snippet of text. This is possible, and has been done before - but if you have ever had to deal with a code page-based representation of a multilingual text, you will likely find the Unicode solution much more pleasant to deal with.
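The sorting question can be tried directly. This small sketch compares by code point, which is what Python's default string ordering does; proper locale-aware collation is more involved, but for these two pairs the outcome matches the respective alphabets.

```python
latin_c, latin_p = "C", "P"
cyr_es, cyr_er = "\u0421", "\u0420"   # Cyrillic Es (looks like C), Er (looks like P)

# Latin: C comes before P.
print(latin_c < latin_p)              # True

# Cyrillic: the letter shaped like C comes AFTER the letter shaped like P.
print(cyr_es < cyr_er)                # False

print(sorted([latin_p, latin_c]))     # ['C', 'P']
print([f"U+{ord(ch):04X}" for ch in sorted([cyr_es, cyr_er])])  # ['U+0420', 'U+0421']
```

If the two shapes shared a single code, one of the two alphabets would necessarily sort wrongly.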
If I have my language set to English, I should not display domains using Cyrillic characters. If I have my language set to Russian, I should.
What if I have several windows (or even tabs) in several languages? For example, I have no trouble reading English, German, French, and Russian - and I would frequently have tabs open in at least two of those at the same time. Why should I not be allowed to conveniently use the languages I do understand?
If you routinely browse in multiple languages, then you're sufficiently unusual that it's not unreasonable to expect you to be the one who has to do something different.
Like, maintain a separate browser window for each language. To me that doesn't sound too big an imposition. Note that you could still read Russian pages in your English-language browser window, except for the Cyrillic URLs. If you want to read those, you'd have to switch the native language in your current session.
"If you routinely browse in multiple languages, then you're sufficiently unusual that it's not unreasonable to expect you to be the one who has to do something different.
Like, maintain a separate browser window for each language. To me that doesn't sound too big an imposition."
That's only because you haven't tried it in real life.
I just got an e-mail with an update to an order I placed before Easter. The e-mail is in Danish. When I click the link, should I choose the English or the Danish browser? (Turns out, the website is in English, despite the Danish e-mail).
http://www.logitech.com - better use my English browser. Turns out, Logitech detects my location and redirects me to http://www.logitech.com/da-dk, so now I have to switch to my Danish browser?
Visiting Lenovo's Support site, I end up on http://support.lenovo.com/dk/en (DK for Denmark but EN for English, since the language on the support site is English). So English or Danish browser?
And yes, we have domains with special Danish letters: Æ, Ø and Å.
If you routinely browse in multiple languages, then you're sufficiently unusual ...
Unusual? I don't think so.
If anything, I speak fewer languages than nearly every person around me; I am handicapped by growing up in a country large enough and chauvinistic enough to insist on not giving its children a meaningful second- and third-language education (or even casual foreign-language exposure) until they reach university. As a result, I had to pick up my foreign languages as an adult - which unfortunately means that I will never entirely get rid of my accent, even when I am otherwise as proficient as a native speaker would be.
In almost every European country, it is completely normal and in fact expected that a moderately well-educated person will speak multiple languages. It is a uni-lingual person who is an aberration.
'The limitations of this approach became apparent very soon after people in other countries started using the domain name system and there was no way to represent their language'
The Internet should be in (British) English only; it will do Johnny Foreigner no end of good to learn a proper language.
You're never going to be able to save the unwise from themselves. But there are some things you can do.
Your browser already knows what languages you speak (because you can tell it). So:
If you have a domain that uses glyphs from a language you do not speak, it could appear differently (color, font, or accompanying icon).
If you have a domain that uses glyphs from a mixture of different languages, it could appear differently (different color again, different font again, another accompanying icon).
In neither case do you actually need to break punycode.
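A rough sketch of that idea, under loud assumptions: Python's standard library has no Unicode Script property, so scripts_in guesses the script from the first word of each letter's Unicode name, and display_hint with its three return values is a made-up stand-in for whatever colour, font or icon a browser might pick.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Crude script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            for ch in label if ch.isalpha()}

def display_hint(label: str, user_scripts: set) -> str:
    """Classify a label as 'normal', 'unfamiliar' or 'mixed' for this user."""
    found = scripts_in(label)
    if len(found) > 1:
        return "mixed"        # glyphs from more than one script
    if found and not found <= user_scripts:
        return "unfamiliar"   # a single script the user has not listed
    return "normal"

all_cyrillic = "\u0430\u0440\u0440\u04cf\u0435"   # decoded form of xn--80ak6aa92e
mixed = "\u041dS\u0412\u0421"                     # Cyrillic and Latin together

print(display_hint("apple", {"LATIN"}))                   # normal
print(display_hint(all_cyrillic, {"LATIN"}))              # unfamiliar
print(display_hint(all_cyrillic, {"LATIN", "CYRILLIC"}))  # normal
print(display_hint(mixed, {"LATIN", "CYRILLIC"}))         # mixed
```

The third line shows the limit of the scheme: a reader who has enabled Cyrillic gets no warning for an all-Cyrillic look-alike, so this reduces the problem rather than removing it.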
You mean "scripts" rather than "languages" but, yes, I suspect that this is how the issue will be resolved.
I believe there is some opposition to this on the grounds that several thousand people have legitimate registrations that would be classified as "mixed" under your rules and would therefore be penalised despite doing nothing wrong. Yeah, sad, but sometimes a few shits spoil things for everyone else and I think this is one of those times.
It's interesting that Chrome 57 is broken. I found a note on the Chromium project that Chrome 51+ should display punycode (rather than the IDN characters) if Latin is mixed with either Cyrillic or Greek (or Cyrillic and Greek are mixed). Apparently this isn't actually happening (never actually implemented? bug?).
It's also rather disappointing that Chrome has no way to turn off the display of non-English characters in URLs (it's not a fix but it would be far safer for me since I very rarely go to any sites with non-English URLs).
Mac OS X 10.6.8 running SeaMonkey 2.38 shows the real address, https://www.xn--80ak6aa92e.com/, but the second link gives the fake epic.com address.
Hmmm. Interesting. This is sort of why I only click links that are in certain email messages - like The Register and a few other sites I am subbed to. None however are tied to anything that deals with putting in any financial info.
Any site I use that is linked to my credit card or banking, is entered manually.
End of story.
The real problem with IDNs is the lack of so-called universal acceptance - the ability to use an IDN seamlessly in any context that a traditional domain name would be used, such as email, web browsers, user account identifiers, URLs in content, certificates etc. The view that IDNs present a security threat risks inhibiting the essential work that is required to fix universal acceptance.
More than half the world's population does not currently have access to the Internet. Many of those people are not familiar with the Latin alphabet, which forms the basis of traditional domain names - as your piece points out. Language is an access issue: English is still the language of more than 50% of web content. A person who cannot speak English is statistically less likely to be online or to own a computer. Research shows that IDNs are more likely to be associated with linguistically diverse content than traditional domain names (http://idnworldreport.eu/ch...). Therefore IDNs have the potential to enhance linguistic diversity in the online environment.
Homograph attacks described in your article have been understood for at least a decade. However, while uptake of IDNs remains so low (only 2% of the world's domains are IDNs), the threat remains largely theoretical. In contrast, phishing attacks and other security issues relating to traditional domain names are a daily occurrence. Google's 'solution' of displaying Punycode is not a solution at all. It erodes the user experience of IDN and does not eliminate risk. When a user sees Punycode in a browser bar they have no visual clues to tell them that they are where they expected to be, and as a result could easily be side-tracked into a malicious environment.
We all need to be mindful of security in the online environment and many challenges exist. At the same time, the DNS industry should not let the fear of the unknown stand in the way of possible, positive developments. For example, it is disappointing that while Google was able to release its 'fix' within a matter of days, its progress on implementing support for internationalised email addresses is taking years. The technical community needs to work harder at enhancing universal acceptance of IDNs, so that every person can enjoy the benefits of the online environment - no matter what language they read or speak.