Web Design by
Kristen Lanum

Commentary by Mark Wahl

Organizing principles for systems:
decentralized l10n (2005/2/3)

The abbreviation l10n means localization, and one of the definitions of localization is "conversion to be suitable for use in a location". According to Wikipedia, one distinction between software localization and software internationalization is that localization implies "the addition of specific features for use in a specific locale". Many of the typical tasks of software localization deal with the contents of the user interface, particularly the translation of messages that appear as prompts or dialogs into additional languages. Further changes might affect the handling of input or imported data, in order to support additional formats. One example of this might be allowing the user to select a different date format, such as Japanese Era or Ethiopian calendar year.

One way of envisioning localization in an Identity Management system is merely software localization of the components of the system with a user interface. This approach would focus on the translation of stock phrases, such as having ldap_err2string() say something appropriate for LDAP_IS_LEAF depending on the representation of a directory structure in the user's culture: the entry is a leaf, the entry is a finger, etc. However, another way of considering the localization of Identity Management is in the schema.
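The "stock phrase" style of localization can be sketched as a message catalog keyed by language. This is only an illustration, not the behavior of any real LDAP library: the names MESSAGES and localized_err2string are hypothetical, the numeric value of LDAP_IS_LEAF is assumed to be 0x23 as in common C LDAP headers, and the German phrase is an illustrative translation.

```python
# Hypothetical message catalog; not part of any real LDAP library.
LDAP_IS_LEAF = 0x23  # assumed value, as found in common C LDAP headers

MESSAGES = {
    "en": {LDAP_IS_LEAF: "The entry is a leaf"},
    "de": {LDAP_IS_LEAF: "Der Eintrag ist ein Blatt"},  # illustrative translation
}


def localized_err2string(code: int, locale: str, default_language: str = "en") -> str:
    """Return a message for an LDAP result code in the language of the locale.

    Falls back to the default language, then to a generic message.
    """
    # Accept both "de-DE" and "de_DE" and keep only the language part.
    language = locale.replace("_", "-").split("-")[0].lower()
    for candidate in (language, default_language):
        message = MESSAGES.get(candidate, {}).get(code)
        if message is not None:
            return message
    return f"Unknown error {code}"


print(localized_err2string(LDAP_IS_LEAF, "de_DE"))
print(localized_err2string(LDAP_IS_LEAF, "ja-JP"))  # no Japanese catalog: falls back to English
```

Note that this approach only changes the wording of messages; it leaves the schema, and the assumptions built into it, untouched.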

Many of the data objects in a typical identity management system that correspond to real-world entities represent people. The format of each of these objects is governed by a white pages schema, a data model which describes the types of attributes contained in the object and the types of relations to other objects that are modeled in the system (e.g. a manager attribute).

Some components of an identity management deployment are schema-neutral. For example, a command-line tool which adds a series of entries from an LDIF file to a directory over LDAP does not need to understand the formats of the objects which are being added. Other components, such as a provisioning system or a directory browser, need to understand the schema in order to have a useful interface, since the raw values (e.g. the dc=example,dc=com naming system for domains) coming out of the Identity Management service may not be intelligible to the target end user.
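The difference between the two kinds of component can be sketched as follows. The first function treats LDIF records as opaque name and value pairs, while the second needs to know what a dc component means in order to show something readable. Both function names are hypothetical, and the parsing is deliberately simplified: it assumes no base64 (::) values, no folded continuation lines, and no escaped commas in distinguished names.

```python
def parse_ldif_records(text: str) -> list:
    """Schema-neutral: split simple LDIF text into records of (attribute, value) pairs.

    No attribute name is interpreted. Comments are skipped. Base64 values
    and folded lines are NOT handled in this sketch.
    """
    records, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends the current record
            if current:
                records.append(current)
                current = []
            continue
        if line.startswith("#"):      # LDIF comment line
            continue
        name, _, value = line.partition(":")
        current.append((name.strip(), value.strip()))
    if current:
        records.append(current)
    return records


def dn_to_domain(dn: str) -> str:
    """Schema-aware display helper: render the dc= components of a DN as a domain name."""
    parts = [rdn.strip() for rdn in dn.split(",")]
    labels = [part[3:] for part in parts if part.lower().startswith("dc=")]
    return ".".join(labels)


ldif = """dn: uid=alice,dc=example,dc=com
uid: alice
cn: Alice

dn: uid=bob,dc=example,dc=com
uid: bob
"""

for record in parse_ldif_records(ldif):
    attribute, dn = record[0]
    print(dn, "->", dn_to_domain(dn))
```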

Each Identity Management system contains some fundamental assumptions about the real world. If these assumptions are violated, the software might not work properly or might not meet its goals.

For example, some of the fundamental assumptions for representing people in a particular Identity Management system might be:

Other assumptions about the system are governed by the schema, and these assumptions lead to restrictions in the kind of data which can be stored in the system. In white pages schemas such as RFC 2256 or inetOrgPerson, the assumption is made that every person has a surname. As we saw in the origin of LDAP personal naming attributes, this assumption is derived from the X.400 assumption that an electronic mail message sent to an individual will include the surname of the individual in the message address. Further restrictions affecting many current identity management systems are described in the question on multitude of identity formats. Identity data which is to be represented but does not meet the restrictions of the schema must be shoehorned in, leading to poorer data quality and a degraded user experience. As this data is often a source of pride to the individual, such as their name, address, connections to other real-world entities, birthdate, or other attributes, seeing it mangled by a service is a particular frustration, and it is especially frustrating when the system expects the user to mangle the data themselves prior to entry ("Your name is already taken. Pick another.").
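The surname restriction can be made concrete with a small validation sketch. The person object class in RFC 2256 lists sn and cn as mandatory attributes; everything else here (the REQUIRED table, the missing_required function, and the sample entries) is hypothetical and only illustrates how a schema check forces data to be shoehorned in.

```python
# Mandatory attributes of the RFC 2256 "person" object class: MUST (sn $ cn).
REQUIRED = {"person": {"sn", "cn"}}


def missing_required(entry: dict, object_class: str) -> list:
    """Return the mandatory attributes of object_class that have no value in entry."""
    present = {name.lower() for name, values in entry.items() if values}
    return sorted(attr for attr in REQUIRED[object_class] if attr not in present)


# A person known by a single name has nothing natural to put in sn.
mononym = {"cn": ["Sukarno"]}
print(missing_required(mononym, "person"))      # the schema rejects this entry

# The usual workaround is to repeat or invent a value, which degrades data quality.
shoehorned = {"cn": ["Sukarno"], "sn": ["Sukarno"]}
print(missing_required(shoehorned, "person"))   # accepted, but sn is not really a surname
```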

For an Identity Management system to be localized to a particular locale, it must be possible to remove restrictions that are inappropriate to the locale and are not fundamental to the system.

Furthermore, a desirable property of a general-purpose Identity Management system that supports applications developed for a particular locale is that it has an extensible schema which can incorporate additional relations, attributes and object types which are appropriate and useful in that locale. Not all systems will have this property; a special-purpose Identity Management system may be designed for use only by a predetermined set of applications, and provide other benefits (e.g. simpler installation).

The choice of locales that a commercial software product or service supports is typically governed by economic factors:

This happens to correspond to the ISO, POSIX and IETF structures for names of locales, such as en-US or en_US for English as spoken in US.

In the open source environment, volunteers may contribute localizations of popular projects, often to provide support for locales which are not well addressed by commercial products or are of particular interest to the volunteer community, such as Klingon or Egyptian hieroglyphs.

In general, there are locales, like Klingon, which correspond to artificial environments and not to a predetermined geographical 'location'. The names of these locales do not fit into the naming structure, as they do not have an official (ISO registered) language or country, and they are usually forced to register using an escape convention, such as x-, or to hide in an infrequently used standard locale code.
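The naming structure and its escape convention can be sketched with a small parser. This is a deliberately simplified illustration, not an implementation of the ISO, POSIX or IETF rules: it only recognizes a language, an optional region, and the x- private-use prefix, and it ignores scripts and variants. The names LocaleName and parse_locale_name are hypothetical.

```python
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    region: Optional[str]
    private_use: bool


def parse_locale_name(name: str) -> LocaleName:
    """Parse names such as en-US, en_US or x-klingon into their parts."""
    # Drop a POSIX-style codeset or modifier, e.g. en_US.UTF-8 or de_DE@euro.
    base = name.split(".")[0].split("@")[0]
    parts = base.replace("_", "-").split("-")
    if parts[0].lower() == "x":
        # Escape convention: everything after x- is privately defined.
        return LocaleName("-".join(parts[1:]).lower(), None, True)
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 and parts[1] else None
    return LocaleName(language, region, False)


print(parse_locale_name("en-US"))
print(parse_locale_name("en_US"))
print(parse_locale_name("x-klingon"))
```

A locale defined by its participants rather than by a language and country only fits this structure through the private-use branch.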

Expanding this concept, one can envisage locales which correspond to very small enclaves of use, perhaps one individual or a particular set of individuals. These locales are defined by their participant users rather than by a geography. They do not need to imply a different and private written language; the locale could use English, Klingon, or other languages or combinations of languages. Instead, the locale defines a particular set of operating conventions for software in this locale, driven by the requirements of the participants.

For Identity Management, this implies that the schema of the deployment, as well as aspects of the system which depend upon that schema, such as the provisioning user interface, should be determined by the choices made for that locale, which might be arbitrary and change over time. If a community decides to have, for example, a favoriteDrink attribute, and defines the data management expectations for this attribute (it is a user-supplied string that can be displayed to any user of the system), there is no reason to suggest that this attribute will be in conflict with other requirements of an extensible or general-purpose identity system, so it should be possible to implement.
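A community-defined attribute of this kind can be sketched as a small schema registry from which a schema-dependent component, such as a provisioning form, derives its fields. All class, field and community names here are hypothetical; the sketch only assumes that a definition records the syntax of the attribute, who supplies it, and who may read it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AttributeDefinition:
    name: str
    syntax: str        # e.g. "string"
    supplied_by: str   # who may set the value, e.g. "user"
    readable_by: str   # who may see the value, e.g. "anyone"


@dataclass
class CommunitySchema:
    community: str
    attributes: dict = field(default_factory=dict)

    def define(self, definition: AttributeDefinition) -> None:
        """Add an attribute chosen by the community; names are case-insensitive."""
        key = definition.name.lower()
        if key in self.attributes:
            raise ValueError(f"{definition.name} is already defined")
        self.attributes[key] = definition

    def form_fields(self) -> list:
        """Fields a provisioning interface would show to the user for editing."""
        return [d.name for d in self.attributes.values() if d.supplied_by == "user"]


schema = CommunitySchema("example-community")
schema.define(AttributeDefinition("favoriteDrink", "string", "user", "anyone"))
print(schema.form_fields())
```

Because the form is derived from the registry, a change made by the community to its schema reaches the user interface without a vendor having to hard code it.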

As the set of potential locales in this expanded concept is quite large, there is no possibility that a single vendor could attempt to hard code or upon demand implement every possible locale. Nor would there be an external volunteer community that would be willing to take up the challenge and implement the locale. Furthermore, attempting to even coordinate this system at a single point would be taxing to that service provider, as it would require a system that could scale to the size of (for example) the Yahoo! Groups environment.

For this concept to be realized, therefore, the localization process seems to need to be decentralized: the users of a system can modify their system to meet their needs. The approach sounds simple in principle, although going by implementation experience, it is rarely fully realized.

Some of the barriers to decentralized localization of existing Identity Management deployments include: