Commentary by Mark Wahl

Organizing principles for systems:
Some naming attribute criteria (2005/2/4)

Any universal identity system (one not limited to a specific locale or situation) that attempts to represent individual people, and that lets users search for people in that system, has to choose a model for representing those people. Typically this model needs to incorporate a person's name or names, as names are what is most often used to refer to a person and to search for them.

Some of the evaluation criteria discussed earlier regarding representation of names are:

1. Does the data format's syntax allow an individual's name and other attributes to be expressed correctly?

If the name can't even be written correctly (it is truncated, misformatted, or its characters cannot be encoded), then the resulting name may not match searches properly, and will not display properly.
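To make the truncation failure concrete, here is a minimal Python sketch. It assumes a hypothetical legacy field that stores at most 10 bytes of UTF-8; the field width and function names are illustrative, not taken from any real system. Cutting a multi-byte character in half leaves a replacement character behind, and an exact-match search for the real name then fails.

```python
# Hypothetical legacy field width in bytes (an assumption for illustration).
FIELD_WIDTH_BYTES = 10


def store_in_fixed_field(name: str, width: int = FIELD_WIDTH_BYTES) -> str:
    """Truncate the UTF-8 encoding of `name` to `width` bytes, as a system
    with a fixed-size byte field might, then decode whatever was kept."""
    raw = name.encode("utf-8")[:width]
    # If the cut falls inside a multi-byte character, the leftover bytes
    # cannot be decoded; errors="replace" turns them into U+FFFD.
    return raw.decode("utf-8", errors="replace")


def matches(stored: str, query: str) -> bool:
    """Exact-match comparison, the simplest search a directory might do."""
    return stored == query


if __name__ == "__main__":
    for name in ["Bob Smith", "山田太郎"]:
        stored = store_in_fixed_field(name)
        print(repr(stored), matches(stored, name))
```

The ASCII name fits in 9 bytes and still matches. The four-character Japanese name needs 12 bytes, so its last character is damaged on storage and the lookup for the correct name no longer succeeds.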

2. If names are encoded in a structured format, is the structure based on the locale of the individual?

For example, some systems may offer a choice between only two fixed name layouts, both of which assume a surname. Individuals whose locales do not use surnames as a typical naming attribute will not be able to have their names displayed properly in such a system.

3. If multiple structures are possible, are the choices of encoding, layout or ordering of the name under the control of the individual?

Name formatting is, in some locales, an individual choice that signals the individual's attitude toward different cultures and their name formatting conventions. Enforcing a single encoding may misrepresent the individual's views, and may make the individual impossible to locate in the system. Similarly, an individual who has changed their name might expect their new name to become "primary", and their older names to become only cross-references or to be deleted.
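One way to approach criteria #2 and #3 is to store a name as the individual supplies it, with optional part labels and a display order the individual controls, instead of fixed given-name and surname slots. The Python sketch below is hypothetical: the class, its field names and the label values are assumptions made for illustration. It also keeps former names only as search cross-references, in the spirit of the name-change case described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonName:
    """A name stored as the individual supplies it (hypothetical model)."""

    # Name components in the order the individual entered them.
    parts: List[str]
    # Optional per-part labels such as "given" or "family"; left empty for
    # people whose locale does not use that distinction.
    labels: List[str] = field(default_factory=list)
    # Display order chosen by the individual, as indices into `parts`.
    # None means "display in the order entered".
    display_order: Optional[List[int]] = None
    separator: str = " "
    # Earlier names, kept only as cross-references for searching.
    former_names: List[str] = field(default_factory=list)

    def display(self) -> str:
        order = self.display_order if self.display_order is not None else range(len(self.parts))
        return self.separator.join(self.parts[i] for i in order)

    def matches(self, query: str) -> bool:
        """Case-insensitive match against the current and former names."""
        q = query.casefold()
        if q == self.display().casefold():
            return True
        return any(q == old.casefold() for old in self.former_names)


# Hypothetical examples: the same components in each person's preferred order,
# a single-component name with no surname slot, and a renamed person.
family_first = PersonName(["Yamada", "Taro"], labels=["family", "given"])
given_first = PersonName(["Yamada", "Taro"], labels=["family", "given"], display_order=[1, 0])
single = PersonName(["Ayu"])
renamed = PersonName(["Jane", "Smith"], former_names=["Jane Doe"])
```

The design choice here is that structure is optional metadata rather than a requirement: a name with one component is just as valid as one with labeled given and family parts, and the stored order never forces a display order on the person.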

4. Does the system allow the individual to specify different names and other attributes that are to be used in different contexts or situations?

In many cultures, people use different names (formal names, nicknames, familial names) in different situations. They may also be multicultural, and express their identity differently to each culture. A system that asks a user to supply "their name" in one context may cause that user embarrassment if that name is used in other contexts, or the name may simply not be a useful search identifier.
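A minimal Python sketch of criterion #4, assuming a hypothetical per-person store in which each name is keyed by a context label ("formal", "family", "work"); the labels and class are illustrative only. When no name exists for a context, it falls back to a default context the individual picked, rather than reusing a name meant for some other setting.

```python
from typing import Dict, Optional


class ContextualNames:
    """Hypothetical per-person store of names keyed by context label."""

    def __init__(self, default_context: str) -> None:
        # The context whose name is shown when no better match exists;
        # chosen by the individual, not by the system.
        self.default_context = default_context
        self._names: Dict[str, str] = {}

    def set_name(self, context: str, name: str) -> None:
        self._names[context] = name

    def name_for(self, context: str) -> Optional[str]:
        """Name to show in `context`, falling back to the default context
        rather than leaking a name intended for another setting."""
        if context in self._names:
            return self._names[context]
        return self._names.get(self.default_context)


# Hypothetical person with three context-specific names.
bob = ContextualNames(default_context="formal")
bob.set_name("formal", "Robert Smith")
bob.set_name("family", "Bobby")
bob.set_name("work", "Bob Smith")
```

Here `bob.name_for("family")` yields the nickname, while an unknown context such as a conference falls back to the formal name. Which names a given searcher may see is a separate access-control question that this sketch does not address.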

Some of the approaches to solving this problem have included:

A. restrict it to system-defined, controlled or opaque identifiers, such as telephoneNumber, userid or publicKey.
   Advantage: searching is efficient to implement and data may be verifiable against other systems
   Disadvantage: not particularly friendly

This approach sidesteps criteria #1 to #3, but criterion #4 may still apply.
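The trade-off of approach A can be sketched in a few lines of Python, assuming a hypothetical index keyed by telephone number. Lookup is a single exact-match dictionary access, which is why it is cheap, but a query by name simply finds nothing. The normalization shown (keep a leading "+" and the digits) is a deliberate simplification, not a full phone-number normalization scheme; `PhoneIndex` and the entry ID are hypothetical.

```python
import re
from typing import Dict, Optional


def normalize_phone(number: str) -> str:
    """Reduce a phone number to an optional '+' plus digits, so that
    differently punctuated forms share one key. A simplification: real
    normalization (for example, handling national prefixes) is more involved."""
    number = number.strip()
    prefix = "+" if number.startswith("+") else ""
    return prefix + re.sub(r"\D", "", number)


class PhoneIndex:
    """Hypothetical exact-match index from a system-controlled identifier
    (a telephone number) to an entry ID."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def add(self, phone: str, entry_id: str) -> None:
        self._entries[normalize_phone(phone)] = entry_id

    def lookup(self, phone: str) -> Optional[str]:
        # One dictionary lookup: efficient and unambiguous, but only if the
        # searcher already knows the identifier.
        return self._entries.get(normalize_phone(phone))


index = PhoneIndex()
index.add("+1 (555) 010-0000", "uid=jdoe")
```

A lookup with "+1 555 010 0000" finds the entry, while a lookup with "Jane Doe" returns None, which is the "not particularly friendly" disadvantage in miniature.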

B. make the person's entry a freeform blob of text, and perform text matching on it. (the Google-it approach)
   Advantage: text searching is easy to implement, no need to worry about schemas or extensibility
   Disadvantage: May be difficult for a user to distinguish someone talking about Bob from someone who is Bob.
   Disadvantage: Difficult to manage information that is structured (see: any article from 1990s mentioning "semantic web").

Such systems may be able to meet criterion #1, and criteria #2 and #3 do not apply to them, but they tend not to be able to implement criterion #4.
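The "talking about Bob versus being Bob" problem shows up even in the simplest freeform search. In this minimal Python sketch (the page IDs and text are hypothetical), a case-insensitive substring match over text blobs returns both a page that describes Bob and a page that merely mentions him, with nothing in the data to tell them apart.

```python
from typing import Dict, List


def text_search(blobs: Dict[str, str], query: str) -> List[str]:
    """Return the IDs of all blobs that contain `query`, ignoring case."""
    q = query.casefold()
    return sorted(doc_id for doc_id, text in blobs.items() if q in text.casefold())


# Two hypothetical blobs: one describes Bob, the other merely mentions him.
pages = {
    "page-1": "Bob Jones, network engineer, Springfield office.",
    "page-2": "Meeting notes: Alice asked Bob Jones to review the design.",
}
```

Searching `pages` for "bob jones" returns both IDs; distinguishing them would require exactly the structure this approach chooses not to have.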

C. allow each person to choose their own data format (the XML as pixie dust approach)
   Advantage: still highly extensible and allows for better structure management than just plain text
   Disadvantage: if you don't know who you're looking for, you won't express the query in the right format (see: UDDI)

If the identity system doesn't understand the data format, it won't be able to meet criterion #4.
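A small Python sketch of why approach C makes queries fragile, assuming two hypothetical entries whose owners chose different XML formats for the same kind of information. A query phrased against one format's element names silently misses the other entry. It uses the standard library's `xml.etree.ElementTree`, whose path syntax is a limited XPath subset; the element names and `find_entries` helper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Two hypothetical entries, each in a format its owner chose.
entries: Dict[str, str] = {
    "entry-1": "<person><fullName>Jane Doe</fullName></person>",
    "entry-2": (
        "<contact><name><given>Jane</given>"
        "<family>Doe</family></name></contact>"
    ),
}


def find_entries(entries: Dict[str, str], path: str, value: str) -> List[str]:
    """Return IDs of entries having an element at `path` whose text equals
    `value`. `path` uses ElementTree's limited XPath subset."""
    hits = []
    for entry_id, xml_text in entries.items():
        root = ET.fromstring(xml_text)
        if any((el.text or "").strip() == value for el in root.iterfind(path)):
            hits.append(entry_id)
    return hits
```

`find_entries(entries, ".//fullName", "Jane Doe")` finds only `entry-1`, even though `entry-2` also records the name Jane Doe; the searcher has to guess the right format before the query can succeed.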

D. the traditional entity-relationship models, with a small set of predefined schemas
   Advantage: people who are logically 'nearby' are likely to share similar schema
   Disadvantage: pick the wrong schema and you may not find who you're looking for,
   although attributes which are well known show up in most of the schemas.

In theory, systems built on this approach could meet all four criteria, although in practice they usually run up against limitations in #2, #3 and #4.

For example, most LDAP directory servers support UTF-8, which can encode the names of people in many cultures. (There are always exceptions: some cultures' writing systems were left out of early Unicode versions because there was not yet consensus on how to encode their character sets, and there are also cases such as the infamous Artist Formerly Known as Prince, whose chosen name was an unpronounceable symbol.) The common schemas, such as person, inetOrgPerson, or User, however, do not meet criterion #2. Layout and ordering are often hardcoded in the client and server, so criterion #3 is not met. Finally, while criterion #4 could be met using access control techniques, I haven't seen this widely implemented.
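One concrete way the common schemas miss criterion #2: the LDAP person object class (RFC 4519) lists sn (surname) and cn (common name) as required attributes, and inetOrgPerson inherits them through organizationalPerson. The Python sketch below is a deliberately simplified, hypothetical schema check, not how any directory server implements schema checking; it encodes only those two required attributes.

```python
from typing import Dict, List, Set

# Required ("MUST") attributes only. Optional attributes and the
# inetOrgPerson -> organizationalPerson -> person inheritance chain are omitted.
REQUIRED_ATTRIBUTES: Dict[str, Set[str]] = {
    "person": {"cn", "sn"},
}


def missing_required(object_class: str, entry: Dict[str, List[str]]) -> Set[str]:
    """Return the required attributes of `object_class` absent from `entry`."""
    return REQUIRED_ATTRIBUTES[object_class] - set(entry)


# Hypothetical entry for someone whose name has no surname component.
single_name_entry = {"cn": ["Ayu"]}
```

For `single_name_entry` the check reports that sn is missing, so loading this person as a person entry would require supplying some placeholder surname, such as repeating the single name, which misrepresents it.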