Commentary by Mark Wahl, CISA
Organizing principles for identity systems:
Issues with internationalizing domain names (20070729)
In the OpenID authentication 1.1 protocol, an end user provides their identifier URL (in either the http or https scheme) to a relying party web site they are visiting, by typing their identifier into a field of the relying party site's web form. The OpenID authentication 2.0 protocol is similar, but currently also allows the end user's identifier to be an XRI.
An HTTP or HTTPS URL is typically expressed with the following components:
| Component | Example |
|---|---|
| the scheme name | http or https |
| a host id | example.com |
| an optional port number | :8080 |
| an optional path | /x.cgi |
| an optional query | ?foo=bar |
| an optional fragment | #section2 |
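For reference, Python's standard `urllib.parse.urlsplit` separates a URL into these components. The URL below just combines the example values from the table and is illustrative only.

```python
from urllib.parse import urlsplit

# Split an illustrative URL built from the example values in the table.
parts = urlsplit("http://example.com:8080/x.cgi?foo=bar#section2")

print(parts.scheme)    # http
print(parts.hostname)  # example.com
print(parts.port)      # 8080 (an int)
print(parts.path)      # /x.cgi
print(parts.query)     # foo=bar
print(parts.fragment)  # section2
```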
Currently, the two most common representations of the OpenID URL of a user who has a userid at an identity provider organization are:
- the host id is a domain name that holds a domain component for the userid followed by the domain name for an identity provider organization; the path, query and fragment components are absent;
http://joebloggs.example.com
- the host id contains the domain name for an identity provider organization; the path contains a userid at that identity provider; the query and fragment components are absent;
http://openid.example.com/joebloggs
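Here is a minimal sketch of how software might recover the userid from each of the two forms. The helper names and the provider parameters are hypothetical, and relying parties do not necessarily parse identifiers this way. Note that `urlsplit` returns the host name lowercased but leaves the path exactly as typed.

```python
from typing import Optional
from urllib.parse import urlsplit


def userid_from_host(url: str, provider_domain: str) -> Optional[str]:
    """First form (hypothetical helper): http://joebloggs.example.com -> 'joebloggs'."""
    parts = urlsplit(url)
    if parts.path not in ("", "/") or parts.query or parts.fragment:
        return None  # this form has no path, query or fragment
    host = parts.hostname or ""  # urllib returns the host name lowercased
    suffix = "." + provider_domain.lower()
    if not host.endswith(suffix):
        return None
    label = host[: -len(suffix)]
    return label if label and "." not in label else None


def userid_from_path(url: str, provider_host: str) -> Optional[str]:
    """Second form (hypothetical helper): http://openid.example.com/joebloggs -> 'joebloggs'."""
    parts = urlsplit(url)
    if (parts.hostname or "") != provider_host.lower() or parts.query or parts.fragment:
        return None
    path = parts.path  # the path keeps its original case
    userid = path[1:] if path.startswith("/") else ""
    return userid if userid and "/" not in userid else None


print(userid_from_host("http://JoeBloggs.example.com", "example.com"))                # joebloggs
print(userid_from_path("http://openid.example.com/JoeBloggs", "openid.example.com"))  # JoeBloggs
```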
There are some significant differences between representing a userid as a domain name component and representing it in a path, including:
| domain name component | HTTP URI path |
|---|---|
| case-insensitive | case-sensitive |
| each label limited to 63 octets and the whole name to 255 octets (RFC 1034, RFC 1123) | length not limited by HTTP |
| either an ASCII string of letters, digits and hyphens [a-z0-9-] (RFC 1034 section 3.5), or an internationalized domain name label, which in a URI is UTF-8 encoded with its octets percent-encoded and is converted to its IDNA ASCII form before DNS lookup | must begin with a /; the strings /./ and /../ have special significance; some characters must be percent-encoded |
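The character and length rules in the left-hand column can be checked with a short regular expression. This is a sketch assuming the RFC 1034 section 3.5 preferred syntax as relaxed by RFC 1123 (which allows a label to start with a digit). It then illustrates the right-hand column's special dot segments with `urllib.parse.urljoin`, which removes them during reference resolution as RFC 3986 describes. The function name `is_ldh_label` is hypothetical.

```python
import re
from urllib.parse import urljoin

# Letters, digits and hyphens; a label must not start or end with a hyphen
# (RFC 1034 section 3.5, with RFC 1123 allowing a leading digit).
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """True if label is a plain ASCII DNS label of at most 63 octets."""
    return len(label) <= 63 and LDH_LABEL.fullmatch(label) is not None


print(is_ldh_label("joebloggs"))      # True
print(is_ldh_label("joe_bloggs"))     # False: underscore is not allowed
print(is_ldh_label("bücher"))         # False: needs IDNA conversion first
print(is_ldh_label("xn--bcher-kva"))  # True: the IDNA ASCII form of "bücher"

# In a path, "." and ".." segments are removed when a reference is resolved,
# so this resolves to the same resource as http://openid.example.com/joebloggs
print(urljoin("http://openid.example.com/", "/x/../joebloggs"))
```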
IDNA (RFC 3490), a proposed standard, defines an internationalized domain name as one whose components can be internationalized labels, which contain encoded Unicode characters from outside the ASCII range. Of the Unicode 3.2 repertoire, only a few classes of characters cannot be used in a label: the Unicode "dots" U+3002 (ideographic full stop), U+FF0E (fullwidth full stop) and U+FF61 (halfwidth ideographic full stop), which IDNA treats as label separators equivalent to ".", together with space characters, control characters, private use characters, non-character code points, surrogate characters, characters inappropriate for plain text or canonical representation, characters that change display properties, and language tag characters. IDNA converts each label that contains non-ASCII characters using the "nameprep" (RFC 3491) profile of "stringprep" (RFC 3454), which among other steps maps upper case characters to lower case and applies Unicode NFKC normalization.
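Python's standard library includes an `idna` codec that implements the RFC 3490 ToASCII operation with nameprep (the IDNA 2003 rules described here, not the later IDNA 2008 revision). The minimal sketch below assumes that codec and shows three of the points above: the ideographic full stop acts as a label separator, an upper case non-ASCII label is case-folded before Punycode encoding, and a private use character is rejected. The wrapper name `idna_to_ascii` is hypothetical.

```python
def idna_to_ascii(name: str) -> str:
    """Convert a domain name to its ASCII (ACE) form with the stdlib IDNA 2003 codec."""
    # The codec splits on U+002E, U+3002, U+FF0E and U+FF61; each label that
    # contains non-ASCII characters goes through nameprep (case folding, NFKC,
    # prohibited-character checks), then Punycode, and gets the "xn--" prefix.
    return name.encode("idna").decode("ascii")


print(idna_to_ascii("Bücher.example"))       # xn--bcher-kva.example
print(idna_to_ascii("bücher\u3002example"))  # xn--bcher-kva.example
try:
    idna_to_ascii("joe\ue000.example")       # U+E000 is a private use character
except UnicodeError as err:
    print("rejected:", err)
```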
The internet draft "Proposed Issues and Changes for IDNA - An overview" by John Klensin of July 2007 discusses some of the issues that have been found with the model.
One observation in that draft is that
"Historically, many, perhaps most, of the 'names' in the DNS have just been mnemonics to identify some particular concept, object, or organization. They are typically derived from, or rooted in, some language because most people think in language-based ways. But, because they are mnemonics, they need not obey the orthographic conventions of any language: it is not a requirement that it be possible for them to be 'words'."
Another consideration is the display order of the components of a domain name (left-to-right vs right-to-left), which may differ from the order in which the components are transmitted.
"Questions remain about protocol constraints implying that the overall direction of these strings will always be left-to-right (or right-to-left) for an IRI or email address, or if they even should conform to such rules. These questions also have several possible answers. Should a domain name abc.def, in which both labels are represented in scripts that are written right-to-left, be displayed as fed.cba or cba.fed? An IRI for clear text web access would, in network order, begin with 'http://' and the characters will appear as 'http://abc.def' -- but what does this suggest about the display order? When entering a URI to many browsers, it may be possible to provide only the domain name and leave the 'http://' to be filled in by default, assuming no tail (an approach that does not work for other protocols). The natural display order for the typed domain name on a right-to-left system is fed.cba. Does this change if a protocol identifier, tail, and the corresponding delimiters are specified?"
An important issue with international domain names in OpenID web browser interactions is that the user does not type their OpenID identifier URL into the web browser's 'address bar', where URLs are normally entered, but into a form field of a web page.
- Any assistance the user's web browser may provide for typing in an international domain name in the address bar doesn't apply to form fields.
- There is no HTML 4 form INPUT attribute to indicate to the web browser that the value of a field should be a URI or URL, so the web browser cannot provide any assistance in entering an international domain name properly. The OpenID 2.0 authentication document states that the form field's "name" attribute SHOULD have the value "openid_identifier", but this is not generalizable: other services which use URIs in form fields cannot re-use openid_identifier without confusing OpenID-aware applications.
- Some elements of international domain name processing are subject to 'local policy'. In this context, however, the 'local' party is the software running the relying party, not the end user's web browser. As the user has not yet logged in to the relying party, the relying party does not know the user's locale, and so cannot perform the locale-specific aspects of domain name processing on the user's supplied OpenID.
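Since this processing happens at the relying party, here is a sketch of what it might do with the raw form value. It assumes the URL branch of the OpenID 2.0 input normalization (prefix "http://" when no scheme is given, drop any fragment) and converts the host with Python's IDNA 2003 codec. XRI detection, redirects and the remaining RFC 3986 normalization are omitted, and `normalize_openid_input` is a hypothetical name. Nothing in it can take the user's locale into account, which is the limitation described in the last point.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_openid_input(text: str) -> str:
    """Sketch (hypothetical): turn a typed OpenID URL into an ASCII http(s) URL; XRIs not handled."""
    s = text.strip()
    if not s.lower().startswith(("http://", "https://")):
        s = "http://" + s  # no scheme given: assume http
    parts = urlsplit(s)
    if not parts.hostname:
        raise ValueError(f"no host in identifier: {text!r}")
    # hostname is lowercased by urllib; the idna codec then applies nameprep and
    # Punycode to non-ASCII labels. No user locale is available at this point.
    host = parts.hostname.encode("idna").decode("ascii")
    # Any user:password@ part is discarded in this sketch.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Drop the fragment; an empty path becomes "/".
    return urlunsplit((parts.scheme.lower(), netloc, parts.path or "/", parts.query, ""))


print(normalize_openid_input(" Bücher.example/JoeBloggs#top "))
# http://xn--bcher-kva.example/JoeBloggs
```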
Addressing these limitations would require changes such as:

```html
<form>
<!-- type="uri" is the proposed new input type; HTML 4 defines no such type -->
<label for="openid_identifier">OpenID</label>
<input type="uri" id="openid_identifier" name="openid_identifier" title="OpenID">
<!-- lets the user state a locale before logging in -->
<label for="openid_locale">Your language</label>
<select id="openid_locale" name="openid_locale" title="Your language" size="1">
<option value="en">English</option>
<option value="fr">French</option>
...
</select>
</form>
```