Commentary by Mark Wahl
Organizing principles for identity systems:
Schema ontologies: some considerations (2006/6/8)
In "Multiple ontologies for identity data", Johannes Ernst writes:
I will write about the results of the comparison that we did at NetMesh in a bit, but let me just observe that based on what we've found so far, as an industry [we] are fairly far away from a shared ontology for identity data, across the various standards and product development efforts, that actually meets the requirements as we see them.
I missed which requirements Mr. Ernst was referring to in his blog post (perhaps they were described in an earlier post?). Clearly there are several levels of interpretation that software /could/ implement, including:
- most basic: "blob" transfer, e.g. the values of <baz> such as "Ernst" are in UTF-8
- string syntax handling, e.g. values of <baz> such as "Ernst" are alphanumeric characters and are case and whitespace insensitive
- type conversions with or without value manipulation: e.g. "<baz>" values can be used as an LDAP "sn" attribute
- semantic understanding, e.g. <baz> is an attribute of an individual person that corresponds to the "surname" concept
(see Schema Ontologies for more information).
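To make the four levels concrete, here is a minimal Python sketch. The element name <baz> and the value "Ernst" come from the list above; the tables `ATTRIBUTE_MAP` and `CONCEPTS` and all function names are hypothetical, and the level 2 rule is a simplification (Unicode NFKC normalization, case folding, and collapsing of whitespace runs) rather than any particular LDAP matching rule.

```python
import unicodedata


# Level 1: "blob" transfer. The receiver knows only that the bytes are UTF-8.
def level1_decode(raw: bytes) -> str:
    return raw.decode("utf-8")


# Level 2: string syntax handling. Values compare equal ignoring case and
# ignoring leading, trailing and repeated whitespace (a simplified rule).
def level2_normalize(value: str) -> str:
    value = unicodedata.normalize("NFKC", value)
    return " ".join(value.split()).casefold()


def level2_equal(a: str, b: str) -> bool:
    return level2_normalize(a) == level2_normalize(b)


# Level 3: type conversion. Hypothetical table: <baz> values may be used
# as the LDAP "sn" attribute.
ATTRIBUTE_MAP = {"baz": "sn"}


def level3_convert(element: str, value: str) -> tuple:
    return ATTRIBUTE_MAP[element], value


# Level 4: semantic understanding. Hypothetical registry: <baz> is an
# attribute of an individual person corresponding to the "surname" concept.
CONCEPTS = {"baz": {"applies_to": "individual person", "concept": "surname"}}


def level4_meaning(element: str) -> str:
    info = CONCEPTS[element]
    return f"{info['concept']} of an {info['applies_to']}"


value = level1_decode("Ernst".encode("utf-8"))
print(level2_equal(value, "  ERNST "))  # True
print(level3_convert("baz", value))     # ('sn', 'Ernst')
print(level4_meaning("baz"))            # surname of an individual person
```

Each level assumes everything below it: a receiver that only implements level 1 can store and return "Ernst" but cannot tell that it is the same value as "  ERNST ", let alone that it is a surname.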
We looked at this problem in the late 1990s in the SP-DNA working group, trying to find ways to make LDAP applications independent of any given directory schema. Some deployments might use "cn" where others use "uid"; some might place all people in a single level of the directory tree, while others might split employees up to follow an organizational chart. The approach we took was to add a layer of indirection: UML models that represented the concepts of the problem space (e.g. "organization" CONTAINS "employee"; "employee" SUBCLASS-OF "person", etc.), plus a configuration descriptor for each deployment that showed the mapping from the common model to the schema used in that deployment.
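A rough Python sketch of that indirection layer might look like the following. The model relationships are the ones named above; the `Deployment` descriptor and its fields, the model attribute name `person.identifier`, and the two sample deployments are hypothetical illustrations, not the SP-DNA descriptor format.

```python
from dataclasses import dataclass

# Common model (hypothetical encoding of the relationships named in the text).
MODEL = {
    "organization": {"CONTAINS": "employee"},
    "employee": {"SUBCLASS-OF": "person"},
}


def is_a(concept, ancestor: str) -> bool:
    """Follow SUBCLASS-OF links in the common model."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = MODEL.get(concept, {}).get("SUBCLASS-OF")
    return False


@dataclass
class Deployment:
    """Hypothetical per-deployment configuration descriptor."""
    name: str
    attribute_map: dict   # model attribute -> directory attribute
    employee_base: str    # DN under which employee entries live
    flat: bool            # True if all people sit directly under employee_base


def escape_filter_value(value: str) -> str:
    # Escape the characters that are special in LDAP filters (RFC 4515).
    return "".join("\\%02x" % ord(c) if c in "*()\\\x00" else c for c in value)


def build_search(dep: Deployment, model_attr: str, value: str) -> tuple:
    """Translate a model-level lookup into (base, scope, filter) for one deployment."""
    ldap_attr = dep.attribute_map[model_attr]
    scope = "one" if dep.flat else "sub"
    return dep.employee_base, scope, f"({ldap_attr}={escape_filter_value(value)})"


acme = Deployment("acme", {"person.identifier": "uid"}, "ou=People,dc=acme,dc=example", True)
globex = Deployment("globex", {"person.identifier": "cn"}, "o=Globex,c=US", False)

print(is_a("employee", "person"))
print(build_search(acme, "person.identifier", "jdoe"))
print(build_search(globex, "person.identifier", "jdoe"))
```

The application code only ever names `person.identifier`; whether that becomes "uid" or "cn", and whether a one-level or subtree search is needed, is decided entirely by the descriptor.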
Last year I revisited this issue in 2005/6/17 part 1, 2005/6/17 part 2 and 2005/7/14 to see what could be done to
- (1) support mapping of schema between different identity services (SAML, LDAP, InfoCard, etc. each have their own idea of attributes for "personal" information)
- (2) identify a minimal interoperable subset that is common across identity systems
- (3) identify extensibility guidelines that would minimize the risk of lock-in to an inflexible schema
Briefly:
- (1) - (tracked separately)
- (2) - two problems. First, there is not yet a generally agreed-upon ontology framework among ontology researchers: everything from Cyc to the ISO to dozens of industry bodies has its own upper-level ontologies and languages, with no de facto standard. Second, the minimal subset across existing identity systems is SO minimal that it is not really worth standardizing, since too many systems have assumptions that limit their interoperability in this regard (e.g. basic issues such as restrictions on the form of a name, or the uniqueness constraints (across all time, across multiple systems) of a "unique identifier" attribute, let alone whether they have a concept such as "individual person" at all)
- (3) - IMHO the most important and currently overlooked issue. A deployment that chooses to add attributes or subclass an object in most identity systems is in a very difficult situation today: changing the deployment to actually support a new attribute ranges from difficult to impossible (where the schema is hard coded). See, for instance, the post of 2005/2/3 and the history of X.500 schema.
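The second problem can be illustrated by computing an interoperable subset mechanically. In this Python sketch the three systems and their attribute declarations are invented for illustration (they are not the actual SAML, LDAP or InfoCard schemas); a concept is kept only if every system declares it and all systems agree on its uniqueness semantics, and length limits are narrowed to the strictest one.

```python
# Hypothetical attribute declarations: concept -> constraints. These are
# invented for illustration and do not describe any real identity system.
SYSTEMS = {
    "system_a": {
        "surname": {"max_len": 64},
        "identifier": {"unique": "forever"},
        "email": {"max_len": 256},
    },
    "system_b": {
        "surname": {"max_len": 40},
        "identifier": {"unique": "per-system"},
    },
    "system_c": {
        "full_name": {"max_len": 128},
        "identifier": {"unique": "forever"},
    },
}


def minimal_subset(systems: dict) -> dict:
    """Concepts declared by every system with agreeing uniqueness semantics.

    Length limits are narrowed to the strictest one. Requires at least one system.
    """
    common = set.intersection(*(set(decls) for decls in systems.values()))
    result = {}
    for concept in sorted(common):
        decls = [s[concept] for s in systems.values()]
        uniqueness = {d.get("unique") for d in decls}
        if len(uniqueness) > 1:
            continue  # conflicting uniqueness constraints: not interoperable
        merged = {}
        lengths = [d["max_len"] for d in decls if "max_len" in d]
        if lengths:
            merged["max_len"] = min(lengths)
        unique = uniqueness.pop()
        if unique is not None:
            merged["unique"] = unique
        result[concept] = merged
    return result


print(minimal_subset(SYSTEMS))  # {}
```

With these sample declarations the result is empty: the only concept every system declares, `identifier`, is unique forever in two systems but only per system in the third, so even the "unique identifier" cannot be shared.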
For a practical example, consider the Wikipedia entry on Patronymic:
"A Russian will almost never formally address a person named Mikhail as just 'Mikhail', but rather as 'Mikhail' plus his patronymic (for instance, 'Mikhail Nikolayevich' or 'Mikhail Sergeyevich' etc). However, on informal occasions when a person is using the diminutive of a name, such as Misha for Mikhail, the patronymic is hardly ever used. "
Presumably deployments in Icelandic or Russian cultures would want "Patronymic" to be an attribute distinct from given name and surname. Merely adding it to a directory or database server's schema is insufficient, however, as this action has no impact on provisioning, address book or other software where the semantics (not just the syntax) of the attribute are important.
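The following Python sketch illustrates the point. The `patronymic` attribute, the concept names, the sample entry and both address-formatting functions are hypothetical, and the formatting rules are simplified from the Wikipedia passage quoted above. A consumer hard coded to `givenName` and `sn` is unaffected by the schema change; the concept-aware consumer can use the new attribute only because it carries a rule about what a patronymic means, while the attribute name itself comes from a deployment mapping.

```python
# Directory entry after an administrator has added a hypothetical
# "patronymic" attribute to the schema.
entry = {"givenName": "Mikhail", "sn": "Ivanov", "patronymic": "Sergeyevich"}


def hardcoded_formal_address(e: dict) -> str:
    # Written against a fixed schema: it never looks at "patronymic".
    return f"{e['givenName']} {e['sn']}"


# Deployment mapping from concepts to attribute names (hypothetical).
CONCEPT_MAP = {"given_name": "givenName", "surname": "sn", "patronymic": "patronymic"}


def formal_address(e: dict, concept_map: dict, culture: str) -> str:
    """Concept-aware formatter (simplified rules).

    The rule "Russian formal address = given name + patronymic" is knowledge
    built into the software; only the attribute names come from the mapping.
    """
    given = e[concept_map["given_name"]]
    patronymic_attr = concept_map.get("patronymic")
    if culture == "ru" and patronymic_attr and patronymic_attr in e:
        return f"{given} {e[patronymic_attr]}"
    return f"{given} {e[concept_map['surname']]}"


print(hardcoded_formal_address(entry))           # Mikhail Ivanov
print(formal_address(entry, CONCEPT_MAP, "ru"))  # Mikhail Sergeyevich
```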