Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
Why do we want so much to control others when we won't control ourselves?

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.
コンピュータの記憶というものはただ改良した紙ですし、CPU 何て特長ある筆に特殊の消しゴムがついたものにすぎないし、ネットワークそのものは裏庭の塀が少し拡大されたものぐらいです。

(original post/元の投稿 -- defining computers site/コンピュータを定義しようのサイト)

Friday, April 21, 2017

The Problem with Full Unicode Domain Names -- apple.com vs. appІe.com

Well, this one is taking longer to boil over than I expected. I've been watching for the storm for over fifteen years, and convoluted fixes on fixes have dodged the bullet this long.

[JMR201704221101: addendum]

I should note that the primary danger comes from clicking links given you by untrusted sources. The best solution here is not to do that. Abstain. Don't click on the links.

Copy them out, look at them in a text editor using a technical font that shows differences between I, 1, and l, and between 0 and O, etc.

Plug the URL into the search field of a web search engine -- Not into the URL bar of your browser, that takes you straight there. Let the search engine tell you what it knows about the site before you go there.

Then type in the domain name part by hand. If you have the URL

the domain name part is
(There's more that can be said, but I don't want to confuse you about controlling domains, so just type the whole domain name.)

If that's too much trouble, maybe you didn't want to go there anyway. But at least click on something the search engine shows you instead of the link in the e-mail.

[JMR201704221101: end addendum]

The problem:

Depending on your default fonts, you may be able to see a difference between the following two domain names:

apple.com vs. appІe.com
It's similar to the problem with
apple.com vs. appIe.com
but with a twist. The first one uses a Cyrillic (as in Russian) character to potentially cause confusion, where the second one keeps the trickiness all in the Latin (as in English) alphabet.

Let's look at both of those again, and I'll try to specify a font where there will be problems. First, we'll try the Ariel font (if it's on your computer):
apple.com vs. appІe.com
(Latin little "l" -- Cyrillic capital "І")
and next the Courier font (if it's on your computer):
apple.com vs. appІe.com
(Latin little "l" -- Cyrillic capital "І")
And we'll look at the Latin-only domain names, first in Arial:
apple.com vs. appIe.com
(Latin little "l" -- Latin capital "I")
and then in Courier:
apple.com vs. appIe.com
(Latin little "l" -- Latin capital "I")

Do you see what's happening?

Someone could grab the domain with the visual spoof and trick you into giving them your Apple login and password and maybe even your credit card number.

When domain names were all lower case Latin, we had fewer problems. In other words,
was properly spelled
and the browser would display it in the latter form.

Well, there was still the problem with
substituting the number "1" for the little "l". But the registrars tended to try to help by refusing to register confusing domain names. And browsers were careful to use fonts that would show the differences in the URL bar.

Some time ago, pretty much all Unicode language scripts became allowed in domain names. This was strongly pushed by China, where they did not want {sarcasm-alert} to have all their loyal subjects surfing the Internet in Latin. That would let everyone see how superior English is, and that would never do.{end sarcasm-alert.}

(I shouldn't be sarcastic. They do need Chinese URLs. Otherwise, there would be too many companies competing for bai.com and ma.com.)

Apparently, non-Latin scripts seem to be even allowed to use capitals. Or, at least, unscrupulous or careless registrars seem to be allowing them in some cases. I'm not sure why.

(Here's the RFC. What am I missing?)

If the Cyrillic visual spoof I am using as an example were coerced to lower case in the URL bar, here's what it would look like in the Ariel font:
apple.com vs. appіe.com
(Latin little "l" -- Cyrillic lower case "і"
That would solve a lot of problems.

If you are worried about this, one thing that can help if you are using Firefox, type
in the URL bar. (That's where URLs like
show up, and you can type them in by hand to go there.)

You'll get a warning that tells you that the Mozilla Foundation is not going to take the blame if you use non-default settings. (They won't anyway, but don't check the box that says you don't want to be warned. And remember that you have done this.)

Use the search bar to search for
and you'll find
Double-click the "false" and it will turn to "true". And then URLs like
will be displayed in the status bar as URLs like
Now, that's ugly, don't you think? Anyway, you won't be mistaking it for
(This is called punycode. Hmm. Actually, the Japanese page on punycode shows what's happening a little better than the English page.)

Then again, you will be wondering what that URL means. So I don't really know if I want to recommend it.

If I were a Mozilla developer involved with this, I would take a clue from what I've done above and do it like this:
www.apple.com (all Latin)
www.appІe.com (Cyrillic "І")
In other words, all the characters in URLs from languages other than the browser's default language would be displayed with colored backgrounds to make them stand out. And I might even add a warning bubble or something that said,
Warning! Mixed language URL contains Cyrillic "І"!
floating over the URL. This approach would mitigate a lot, including
  • Іds.org (Cyrillic)
  • аррӏе.com (Cyrillic)
  • perl.org (zenkaku, or full-width)
and so forth.

(I thought this was in the RFCs, but I'm not seeing it. Maybe I'm remembering my own thoughts on how to mitigate this particular semantic attack.)

I have advocated improving Unicode by reconstructing the encoding and including an international character set where such visual doublings could be eliminated. And separating Chinese and Japanese language encodings, and the three different Chinese encodings from each other, as well.

Nobody seems to like the idea.

It's a lot of work.

I'd be willing to do it relatively cheap! (Relatively.)

No comments:

Post a Comment