This document defines a mechanism for transferring language tags associated with UTF-8 string values in OpenID protocols, in particular representing languages of attribute values in the OpenID Attribute Exchange protocol.
1. Introduction
2. Language Tag
3. OpenID AX Attribute Value Transfer Encoding
3.1. Example
4. Use with other OpenID specifications
5. Sample Implementation
6. Security Considerations
7. References
7.1. Normative References
7.2. Informative References
Appendix A. Copyright
Author's Address
1. Introduction
It is often desirable to be able to indicate the (human) language associated with protocol elements exchanged in an identity system. Language tags are especially useful when they can be associated with specific values that are part of a set, in order to provide the receiver with a choice of values depending on the language of the user; in particular, language tags can be associated with values of an attribute. For example, LDAP implementations use the mechanism described in RFC 3866 [RFC3866] to transfer language tags with values in the LDAP protocol. Protocols based on XML can encode language tags using the xml:lang attribute, e.g.:
<attribute name="SecretQuestion">
  <value xml:lang="en-GB">What colour is your hair?</value>
  <value xml:lang="en-US">What color is your hair?</value>
</attribute>
As OpenID is neither an XML-based protocol nor uses LDAP attribute types, a new mechanism is needed to associate language tags with values in OpenID protocols.
This document defines a mechanism by which a party in an identity system using the OpenID protocols can associate a language tag with a string. The input to the mechanism is a language tag and a string value. The output from the mechanism is a UTF-8 [RFC3629] encoding of a combination of the language tag and the value.
The initial use for this mechanism is in associating language tags with string-valued attribute values in the OpenID Attribute Exchange protocol [OpenID.attribute-1.0].
This document does not specify mechanisms for
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
2. Language Tag
A language tag is a string of characters that represents the name of a language.
Language tags are described in RFC 3066 (BCP 47) [RFC3066]. Language tags are written using the characters LATIN SMALL LETTER a-z, DIGIT 0-9, and HYPHEN, e.g. fr, en-us or i-klingon, and are two or more characters long.
The encoding of a language tag uses Plane 14 characters as defined in RFC 2482 [RFC2482]. The characters described in RFC 2482 were subsequently integrated into Unicode and ISO 10646. The language tag is encoded as a string of two or more Plane 14 characters taken from the set {U+E002D,U+E0030,U+E0031,...,U+E0039,U+E0061,U+E0062,...,U+E007A}. There is one Plane 14 character for the HYPHEN character, one for each DIGIT character and one for each LATIN SMALL LETTER character. The code point of a Plane 14 character is 0xE0000 plus the code point of the corresponding HYPHEN, DIGIT or LATIN SMALL LETTER character.
The Unicode plane 14 characters used in this encoding are:
U+E0001 LANGUAGE TAG
U+E002D TAG HYPHEN-MINUS
U+E0030 TAG DIGIT ZERO
...
U+E0039 TAG DIGIT NINE
U+E0061 TAG LATIN SMALL LETTER A
...
U+E007A TAG LATIN SMALL LETTER Z
U+E007F CANCEL TAG
For example, a language tag of "EN" would be converted to lower case Latin characters ("en"), and then 0xE0000 would be added to the code point of each character, to form the two characters
U+E0065 # TAG LATIN SMALL LETTER E
U+E006E # TAG LATIN SMALL LETTER N
The bytes of the UTF-8 encoding of this two-character language tag are
F3 A0 81 A5 # TAG LATIN SMALL LETTER E
F3 A0 81 AE # TAG LATIN SMALL LETTER N
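The mapping described in this section can be sketched in Java as follows. This is a minimal illustration, not part of the specification: the class name Plane14Tag and the methods toTagCharacters and toHex are hypothetical names chosen for this example, and rejecting characters outside a-z, 0-9 and hyphen is an assumption about how an implementation might handle invalid input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class Plane14Tag {
    /** Offset added to each ASCII tag character to reach Plane 14. */
    static final int TAG_BASE = 0xE0000;

    /** Maps an ASCII language tag such as "EN" or "en-us" to Plane 14 tag characters. */
    static String toTagCharacters(String languageTag) {
        // Lower-case with a fixed locale so the result does not depend on the default locale.
        String lower = languageTag.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            boolean allowed = c == '-' || (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z');
            if (!allowed) {
                // Assumption: invalid characters are rejected rather than skipped.
                throw new IllegalArgumentException("not a language tag character: " + c);
            }
            sb.appendCodePoint(TAG_BASE + c);
        }
        return sb.toString();
    }

    /** Returns the UTF-8 encoding of the Plane 14 form of the language tag. */
    static byte[] toUtf8(String languageTag) {
        return toTagCharacters(languageTag).getBytes(StandardCharsets.UTF_8);
    }

    /** Formats bytes as space-separated upper-case hex, e.g. "F3 A0 81 A5". */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }
}
```

Calling Plane14Tag.toHex(Plane14Tag.toUtf8("EN")) reproduces the eight bytes listed above for the tag "en".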
3. OpenID AX Attribute Value Transfer Encoding
It is assumed that most values of attributes transferred using OpenID AX will not have language tags, but that a few attributes will have values with language tags. It is assumed that the syntax of these values is a string in the Unicode/ISO 10646 character set.
A value without a language tag is transferred as described in [OpenID.attribute-1.0].
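A combination of an attribute value with a language tag is transferred in OpenID AX with the value parameter consisting of the concatenation of the following bytes:
- the UTF-8 encoding of the LANGUAGE TAG character (U+E0001),
- the UTF-8 encoding of the language tag, written in Plane 14 characters as described in Section 2,
- the UTF-8 encoding of the value, and
- the UTF-8 encoding of the CANCEL TAG character (U+E007F).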
A CANCEL TAG character is explicitly included after the value in order to prevent implementations that are not language-tag aware from inadvertently concatenating language-tagged values with other strings that are not in that language.
Implementations of the OpenID Attribute Exchange protocol which accept Store requests MUST allow the values being stored to have associated language tags, when permitted by the attribute definition.
3.1. Example
Suppose a user had an attribute representing their "film préféré" (favorite movie), which has four values:
- 2001 (no language)
- Amélie (French)
- Delicatessen (English)
- M (no language)
A fetch would return the following bytes for this attribute:
openid.ax.type.fav_movie=http://example.fr/schema#FilmPr??f??r??
openid.ax.count.fav_movie=4
openid.ax.value.fav_movie.1=2001
openid.ax.value.fav_movie.2=????????????Am??lie????
openid.ax.value.fav_movie.3=????????????Delicatessen????
openid.ax.value.fav_movie.4=M
(In the preceding figure, a question mark indicates a byte for which there is not a printing ASCII character.)
The value of openid.ax.value.fav_movie.2 ("Amélie" in French), the UTF-8 encoding of ten Unicode characters, is the following 23 bytes:
f3 a0 80 81 # LANGUAGE TAG
f3 a0 81 a6 # TAG LATIN SMALL LETTER F
f3 a0 81 b2 # TAG LATIN SMALL LETTER R
41 # A
6d # m
c3 a9 # eacute
6c # l
69 # i
65 # e
f3 a0 81 bf # CANCEL TAG
The value of openid.ax.value.fav_movie.3 ("Delicatessen" in English) is the following bytes:
f3 a0 80 81 # LANGUAGE TAG
f3 a0 81 a5 # TAG LATIN SMALL LETTER E
f3 a0 81 ae # TAG LATIN SMALL LETTER N
44 65 6c 69 63 61 74 65 73 73 65 6e
f3 a0 81 bf # CANCEL TAG
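A receiver has to reverse this encoding to recover the language tag and the value. The following Java sketch shows one way this might be done. It is an illustration under stated assumptions, not part of the specification: the class name TaggedValueParser and its parse method are hypothetical, a value that does not begin with LANGUAGE TAG is assumed to be untagged, and a missing trailing CANCEL TAG is tolerated.

```java
import java.nio.charset.StandardCharsets;

final class TaggedValueParser {
    static final int TAG_BASE = 0xE0000;
    static final int LANGUAGE_TAG = 0xE0001;
    static final int CANCEL_TAG = 0xE007F;

    /** True for TAG HYPHEN-MINUS, TAG DIGIT ZERO..NINE and TAG LATIN SMALL LETTER A..Z. */
    static boolean isTagCharacter(int cp) {
        int c = cp - TAG_BASE;
        return c == '-' || (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z');
    }

    /**
     * Splits a UTF-8 encoded AX value into {languageTag, value}.
     * languageTag is null when the value carries no language tag.
     */
    static String[] parse(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        if (s.isEmpty() || s.codePointAt(0) != LANGUAGE_TAG) {
            return new String[] { null, s };
        }
        // Plane 14 characters occupy two UTF-16 code units each.
        int i = Character.charCount(LANGUAGE_TAG);
        StringBuilder tag = new StringBuilder();
        while (i < s.length() && isTagCharacter(s.codePointAt(i))) {
            int cp = s.codePointAt(i);
            tag.append((char) (cp - TAG_BASE));
            i += Character.charCount(cp);
        }
        String value = s.substring(i);
        String cancel = new String(Character.toChars(CANCEL_TAG));
        if (value.endsWith(cancel)) {
            value = value.substring(0, value.length() - cancel.length());
        }
        return new String[] { tag.toString(), value };
    }
}
```

Applied to the 23 bytes of openid.ax.value.fav_movie.2, parse yields the tag "fr" and the value "Amélie"; applied to the bytes of "2001", it yields a null tag and the unchanged value.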
4. Use with other OpenID specifications
This encoding mechanism is currently not defined to operate with values transferred in the OpenID Simple Registration Extension.
5. Sample Implementation
The following Java class attaches a language tag to a value. It is implemented using UTF-16 surrogate characters: each Plane 14 character U+E00xx is represented in a Java String as the high surrogate 0xDB40 followed by the low surrogate 0xDC00 + 0xxx.
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class LanguageTag {
    // UTF-16 surrogate pair for U+E0000: high surrogate, then low surrogate.
    public static final char SUPP_CHAR_0 = 0xDB40;
    public static final char SUPP_CHAR_1 = 0xDC00;
    public static final char TAG_MAX = 0x7F;
    public static final char TAG_LANG = 0x01;   // low bits of U+E0001 LANGUAGE TAG
    public static final char TAG_CANCEL = 0x7F; // low bits of U+E007F CANCEL TAG

    /**
     * Returns the UTF-8 encoding of the language-tagged value.
     * @param tagInAscii the ASCII letters of the language tag, e.g. "fr"
     * @param rest the value being tagged
     */
    public static byte[] getBytes(String tagInAscii, String rest) {
        String s = add(tagInAscii, rest);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Returns a new String consisting of the language tag wrapping the string rest.
     * @param tagInAscii the ASCII letters of the language tag, e.g. "fr"
     * @param rest the value being tagged
     */
    public static String add(String tagInAscii, String rest) {
        if (tagInAscii == null || tagInAscii.length() == 0) return rest;
        // Lower-case with a fixed locale so ASCII letters always map to a-z.
        String tl = tagInAscii.toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder();
        addLeadingTag(sb, tl);
        sb.append(rest);
        addTrailingTag(sb);
        return sb.toString();
    }

    // Appends LANGUAGE TAG followed by the Plane 14 form of each tag character.
    private static void addLeadingTag(StringBuilder sb, String tagInAscii) {
        char c0 = SUPP_CHAR_0;
        char c1 = SUPP_CHAR_1 + TAG_LANG;
        sb.append(c0);
        sb.append(c1);
        int tl = tagInAscii.length();
        for (int i = 0; i < tl; i++) {
            char cx = tagInAscii.charAt(i);
            // Skip control characters, space, and anything outside printable ASCII.
            if (cx <= 0x20 || cx >= 0x7F) continue;
            c1 = (char) (SUPP_CHAR_1 + cx);
            sb.append(c0);
            sb.append(c1);
        }
    }

    // Appends CANCEL TAG.
    private static void addTrailingTag(StringBuilder sb) {
        char c0 = SUPP_CHAR_0;
        char c1 = SUPP_CHAR_1 + TAG_CANCEL;
        sb.append(c0);
        sb.append(c1);
    }
}
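The surrogate constants used in the sample (0xDB40 for SUPP_CHAR_0 and 0xDC00 for SUPP_CHAR_1) can be checked against the Java standard library. The sketch below is only a verification aid; the class name SurrogateCheck and the method tagCodePoint are hypothetical names introduced for this example.

```java
final class SurrogateCheck {
    /**
     * Combines the fixed high surrogate 0xDB40 with the low surrogate
     * 0xDC00 + low7 and returns the resulting Unicode code point.
     */
    static int tagCodePoint(int low7) {
        char high = 0xDB40;
        char low = (char) (0xDC00 + low7);
        return Character.toCodePoint(high, low);
    }
}
```

For any low7 in the range 0x00 to 0x7F the result is 0xE0000 + low7, so tagCodePoint(0x01) is U+E0001 LANGUAGE TAG, tagCodePoint('f') is U+E0066 TAG LATIN SMALL LETTER F, and tagCodePoint(0x7F) is U+E007F CANCEL TAG.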
6. Security Considerations
The language tag representation mechanism used in this document is not known to raise any additional security concerns beyond those discussed in RFC 3066 [RFC3066].
7. References
7.1. Normative References
| [OpenID.attribute-1.0] | Hardt, D. and J. Bufu, “OpenID Attribute Exchange 1.0 - Draft 05,” April 2007. |
| [RFC2119] | Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997. |
| [RFC2482] | Whistler, K. and G. Adams, “Language Tagging in Unicode Plain Text,” RFC 2482. |
| [RFC3066] | Alvestrand, H., “Tags for the Identification of Languages,” RFC 3066, BCP 47. |
| [RFC3629] | Yergeau, F., “UTF-8, a transformation format of ISO 10646,” STD 63, RFC 3629, November 2003. |
7.2. Informative References
| [RFC3866] | Zeilenga, K., “Language Tags and Ranges in the Lightweight Directory Access Protocol (LDAP),” RFC 3866. |
Appendix A. Copyright
Copyright (C) Informed Control Inc. (2007). This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Author's Address
Mark Wahl
Informed Control Inc.
PO Box 90626
Austin, TX 78709
US
Email: mark.wahl@informed-control.com