Commentary by Mark Wahl

Organizing principles for systems:
Reverse engineering of schema (2005/6/17)

Oftentimes in developing and deploying Identity Management systems, one is faced with a vast collection of existing schema. The IANA LDAP parameters registry, for instance, today lists over 500 schema elements drawn from just a few RFCs and Internet-Drafts, yet most specifications, both standards-track and proprietary, are not included in this registry. Furthermore, many large organizations have developed their own schema extensions, some with anywhere from several dozen to a hundred attributes.

There are numerous valid ways to build a taxonomy of schema that can help in managing this growth. In this post I'll begin discussing one of them, based on ontologies (systems of knowledge representation).

There are three ways I can see to approach defining an ontology for the schema in a directory service such as LDAP:

The first is to define an ontology representing LDAP's major protocol and data model concepts as classes (e.g. Ldap-StructuralObjectClass), with specific schema elements (e.g. person) as instances of those classes. This is straightforward and can be done mechanically from a reading of just RFC 2251 (the protocol) and RFC 2252 (the subschema and syntax representation), but it doesn't seem to offer much benefit for describing and manipulating these instances beyond what LDAP itself already provides.
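To make "done mechanically" concrete, here is a minimal sketch, assuming objectClass definitions in the RFC 2252 description format, that files each object class as an instance of a meta-class. Only Ldap-StructuralObjectClass comes from the text above; Ldap-AuxiliaryObjectClass and Ldap-AbstractObjectClass are hypothetical names extrapolated from it, and the tokenizer is a simplification, not a full RFC 2252 parser.

```python
import re

# Meta-classes for LDAP's own data-model concepts. Only the structural name
# comes from the post; the other two are hypothetical extrapolations.
KIND_TO_METACLASS = {
    "STRUCTURAL": "Ldap-StructuralObjectClass",
    "AUXILIARY": "Ldap-AuxiliaryObjectClass",
    "ABSTRACT": "Ldap-AbstractObjectClass",
}

def tokenize(description):
    """Split a description into parens, quoted strings, and bare words (simplified)."""
    return re.findall(r"\(|\)|'[^']*'|[^\s()']+", description)

def classify_object_class(description):
    """Return (name, metaclass) for one objectClass description."""
    tokens = tokenize(description)
    name = None
    kind = "STRUCTURAL"  # RFC 2252: the kind defaults to structural
    for i, tok in enumerate(tokens):
        if tok == "NAME":
            nxt = tokens[i + 1]
            if nxt == "(":  # several names given: keep the first one
                nxt = tokens[i + 2]
            name = nxt.strip("'")
        elif tok in KIND_TO_METACLASS:
            kind = tok
    return name, KIND_TO_METACLASS[kind]

def build_instances(descriptions):
    """Group objectClass names as instances under their meta-class."""
    ontology = {}
    for d in descriptions:
        name, meta = classify_object_class(d)
        ontology.setdefault(meta, []).append(name)
    return ontology

TOP = "( 2.5.6.0 NAME 'top' ABSTRACT MUST objectClass )"
PERSON = ("( 2.5.6.6 NAME 'person' SUP top STRUCTURAL MUST ( sn $ cn ) "
          "MAY ( userPassword $ telephoneNumber $ seeAlso $ description ) )")

print(build_instances([TOP, PERSON]))
# {'Ldap-AbstractObjectClass': ['top'], 'Ldap-StructuralObjectClass': ['person']}
```

The result is accurate but says little more than the subschema entry already did, which is the limitation noted above.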

A second approach is also bottom-up: define an ontology with a class for each object class in LDAP (e.g. Ldap-OC-Person, Ldap-OC-OrganizationalPerson, etc.). Again, this allows a nearly mechanical transformation of a discovered schema. But, as we saw in my previous post, the attribute names imply some relations (homePostalAddress and homeTelephoneNumber, manager and secretary) that this kind of ontology would not capture.
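A minimal sketch of this second transformation, assuming the object classes have already been parsed into dictionaries: each object class becomes one ontology class, SUP becomes subclass-of, and MUST/MAY attributes become properties. The attribute lists are abridged, and exampleStaffPerson is a hypothetical class used only to carry the four attributes mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class OntologyClass:
    name: str                  # e.g. "Ldap-OC-Person"
    superclass: Optional[str]  # taken from the objectClass SUP field
    properties: List[str] = field(default_factory=list)  # MUST + MAY attributes

def ontology_name(oc_name: str) -> str:
    """Map 'organizationalPerson' to 'Ldap-OC-OrganizationalPerson'."""
    return "Ldap-OC-" + oc_name[0].upper() + oc_name[1:]

def transform(schema):
    """One ontology class per objectClass; 'top' is treated as the root."""
    classes = {}
    for oc in schema:
        sup = oc["sup"]
        classes[oc["name"]] = OntologyClass(
            name=ontology_name(oc["name"]),
            superclass=ontology_name(sup) if sup != "top" else None,
            properties=oc["must"] + oc["may"],
        )
    return classes

# Already-parsed objectClasses; attribute lists are abridged, and
# 'exampleStaffPerson' is hypothetical.
SCHEMA = [
    {"name": "person", "sup": "top", "must": ["sn", "cn"],
     "may": ["userPassword", "telephoneNumber", "seeAlso", "description"]},
    {"name": "organizationalPerson", "sup": "person", "must": [],
     "may": ["title", "telephoneNumber", "postalAddress", "ou"]},
    {"name": "exampleStaffPerson", "sup": "organizationalPerson", "must": [],
     "may": ["homePostalAddress", "homeTelephoneNumber", "manager", "secretary"]},
]

classes = transform(SCHEMA)
staff = classes["exampleStaffPerson"]
print(staff.name, "subClassOf", staff.superclass)
# Ldap-OC-ExampleStaffPerson subClassOf Ldap-OC-OrganizationalPerson
print(staff.properties)
# ['homePostalAddress', 'homeTelephoneNumber', 'manager', 'secretary']
```

Each attribute ends up as an independent property name: nothing in the result records that the two home attributes describe the same residence, or that manager and secretary point at other people.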

A third approach is to head in the other direction: assume there are real-world objects with properties, and that these are being represented, through a schema darkly, as entries and attribute values in the directory.

Naturally, this approach is less amenable to automatic processing of schemas: the semantics are hidden in terse abbreviations and acronyms, and in documents that lie outside the online schema itself.

However, the benefit of this approach is that, if done right, it can enable subsequent schema definitions to build on the formerly implicit definitions that emerge from the analysis of existing schemas.
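As a rough illustration of what the manual part of this third approach produces, here is a sketch in which a hand-written table interprets each attribute as a property of some real-world thing. The concept names (Person/home, Person/workplace, managedBy, assistedBy) are hypothetical and represent just one possible reading; the point is that once such a table exists, the implicit groupings become queryable.

```python
from collections import defaultdict

# Hypothetical, hand-written interpretation: for each LDAP attribute, the
# real-world thing it describes and the property it holds. This table is
# the part that cannot be derived mechanically from the online schema.
INTERPRETATION = {
    "postalAddress":       ("Person/workplace", "postalAddress"),
    "homePostalAddress":   ("Person/home", "postalAddress"),
    "homeTelephoneNumber": ("Person/home", "telephoneNumber"),
    "manager":             ("Person", "managedBy (a Person)"),
    "secretary":           ("Person", "assistedBy (a Person)"),
}

def by_subject(interpretation):
    """Group attributes by the real-world thing they describe."""
    groups = defaultdict(list)
    for attr, (subject, _prop) in interpretation.items():
        groups[subject].append(attr)
    return dict(groups)

def by_property(interpretation):
    """Find attributes holding the same kind of value on different things."""
    groups = defaultdict(list)
    for attr, (_subject, prop) in interpretation.items():
        groups[prop].append(attr)
    return {p: attrs for p, attrs in groups.items() if len(attrs) > 1}

print(by_subject(INTERPRETATION)["Person/home"])
# ['homePostalAddress', 'homeTelephoneNumber']
print(by_property(INTERPRETATION))
# {'postalAddress': ['postalAddress', 'homePostalAddress']}
```

A new schema could then reuse the "home" concept and the postalAddress property explicitly, rather than re-encoding them in yet another attribute name.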

So in the next post I'll present some examples of schema analysis using this third approach, to prepare for the next set of questions: