Commentary by Mark Wahl
Organizing principles for identity systems:
Client implications of Kim's fifth law (2005/2/1)
Jamie Lewis wrote:
Speaking of the fifth law, like Scott Lemon, I was a bit surprised when there wasn't more of a collective 'hallelujah' in response to its posting.
From a purely technology perspective, processing systems that follow this
law are feasible, but what worries me with #5 is both (1) the human
interface side and (2) the automated side when implementing an interface
onto a unifying system that's 'enabling interworking'. IMHO, this problem
is really only one level of indirection away from some of the unsolved
difficulties that we faced with public key infrastructure, and there've been
several attempts since in nearby problem spaces that were unsuccessful (i.e.
not-universal) due to platform limitations. As there are no systems today that
are widely deployed and meet Kim's rules, I'm hoping that future systems that
are built according to the rules will also find ways to avoid platform-
specific limitations and the unsolved problems that dogged existing systems.
From a specification perspective, there needs to be enough guidance on what
makes a system universal and usable that an identity system that merely has
code to make it identity-neutral (e.g. just an XML spec that has a blob field
of 'put identity here') cannot claim to be a viable universal identity
system.
When public key certificates based on X.509 started being piloted on
the Internet, there were difficulties not only in managing issuing and
validating the certificates (with the top-down organizational model,
difficulties with revocation and status checking etc), but also in
advising users what a certificate _means_. In the early days of
Netscape Communicator's browser, every time the user visited a new
secured site, a wizard would pop up to guide the user through deciding
if they wanted to accept the certificate or not. Various groups
started defining extensions to the certificate structure to place into
it policy and guidance statements about the certificate. Some of the
extensions were machine readable, but many were human readable, though
only barely: 5 paragraphs of inscrutable legalese is hard enough for
the average Internet user, and useless to a non-English speaker. In the
end, the browser gives up as the average user doesn't care about whether
a web site is protected at Verisign level 73 or Verisign level 77.
I saw a similar problem when JINI (a Java API for dynamically
reconfigurable services) came out. Like a directory entry, a JINI object
can have a set of attributes, and they have types and values. The
problem arises of how to describe these to the user when the user attempts
to configure an object of a kind that the user had never seen before.
(I remember one suggestion was to have the attributes object simply be a
subclass of a Java windowing system data type, so that when accessed, it
could pop up its own little configuration UI. Of course, when a
command line or a server program wanted to view or manipulate this
configuration, this would not be possible: the configuration would
suddenly become opaque.) As another example, the Universal Plug and Play
(UPnP) spec has devices announcing themselves with the URL of the
manufacturer and a set of 'control panel' icons. This tends to assume the
computer to which the device is attached has good network connectivity and
can display icons in a variety of graphical formats: a modern workstation
meets these assumptions; a phone, PDA or embedded device might not.
Let's take the problem of assurance in identity information. For example,
when someone receives an email with a vCard, the amount of assurance which
they can place in the vCard is really no more than what they can place in
the email system that delivered the vCard and their opinion of the sender.
Overall, fairly low assurance: a jokester can put "president@whitehouse.gov"
in their vCard as their email address if they want to. That is not to say
low assurance is always bad. But if there's no email security, someone
can spoof an email that appears to be from someone else and include a vCard
that might override and replace a legitimate vCard, or similarly
impersonate someone in a social networking application ('friend phishing').
PKI tried to provide a means of assurance, but this tended to work only in
closed communities: a client application would need to be hardcoded with
the root certificates AND be able to distinguish between the different
root certificates from each issuer: which ones implied high assurance and which
ones low assurance. Such an application, however, wouldn't be able to provide
any guidance to the user on a certificate from another issuer if there's no
chain of trust to one of the root certificates it knew. At most it could say
that the certificate was 'valid', but this validity would not guarantee
that the identity enclosed in the certificate is useful for or comparable
with anything outside of the certificate:
a certificate authority might be operating with a completely different
naming scheme, and a rogue certificate authority could put any identity
into a certificate it wishes, including one that isn't appropriate for
that CA to use. The PKI lacked a standardized vocabulary and mechanisms to
provide a means of using the identity from a certificate in situations
when there wasn't an assumed structure of how all the issuing authorities
are organized, such as a top-down hierarchy.
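The closed-community behaviour described above can be sketched roughly as
follows. This is only an illustration, not real X.509 processing: the Cert
record, the TRUSTED_ROOTS table and the assurance labels are all hypothetical,
and signature, revocation and validity checks are left out entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cert:
    """Hypothetical, heavily simplified certificate (no signature, no dates)."""
    subject: str
    issuer: str


# Hypothetical hardcoded table: root name -> assurance the client assigns to it.
TRUSTED_ROOTS = {"Example High Root": "high", "Example Low Root": "low"}


def assurance_for(chain: List[Cert]) -> Optional[str]:
    """Return the assurance label of the known root the chain ends at.

    chain[0] is the end-entity certificate and each certificate must be
    issued by the subject of the next one. Returns None when the chain is
    empty, broken, or ends at a root the client was not built with.
    """
    if not chain:
        return None
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return None  # broken chain
    root = chain[-1]
    if root.issuer != root.subject:
        return None  # last certificate is not self-issued, so not a root
    return TRUSTED_ROOTS.get(root.subject)  # None for an unknown issuer
```

A chain ending at an unknown root yields None here, which is exactly the case
where the client has nothing useful to tell the user.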
In the design of a client application that wishes to participate in a
universal system, there are several problems, even for a comparatively simple
example of determining the assurance of an identity statement that comes in
attached to an email. First, what is the format of the identity information?
There are four likely possibilities. One possibility is that it is in a format
known to and implemented by the client application. The second is that the
format is not directly implemented by the client, but the client is able to
extract some information from it, e.g. it is an XML document on which a
skilled person can do 'view source'. The third possibility is that the client
cannot extract any information from it because the format is something
completely different from any existing form of identity, e.g. it comes from
a culture which does not use the same attributes for naming, or it identifies
something other than an individual person. The fourth is that the identity
refuses to disclose itself to a client that can't fully understand it.
(For a further description of #4 see my comments to Craig Burton that are
on his blog for Jan 23).
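The four possibilities can be written down as a small classification step
that a client might run before deciding how to present an attachment. This is
a sketch under assumptions: the Handling names, the two format tables and the
discloses_to_unknown_clients flag are invented for illustration, and a real
client would need far more than a media type to decide.

```python
from enum import Enum


class Handling(Enum):
    UNDERSTOOD = 1  # possibility 1: format implemented by the client
    PARTIAL = 2     # possibility 2: some information can be extracted
    OPAQUE = 3      # possibility 3: nothing can be extracted
    WITHHELD = 4    # possibility 4: identity refuses to disclose itself


# Hypothetical tables of what this particular client can handle.
KNOWN_FORMATS = {"text/vcard"}
INSPECTABLE_FORMATS = {"application/xml", "text/xml"}


def classify(media_type: str, discloses_to_unknown_clients: bool = True) -> Handling:
    """Map an incoming identity statement onto one of the four possibilities."""
    if media_type in KNOWN_FORMATS:
        return Handling.UNDERSTOOD
    if not discloses_to_unknown_clients:
        # The client cannot fully understand it and the identity will not
        # reveal itself to such a client.
        return Handling.WITHHELD
    if media_type in INSPECTABLE_FORMATS:
        return Handling.PARTIAL
    return Handling.OPAQUE
```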
If there's a common format, possibility #1, then implementing how to
present this interworking is straightforward though nontrivial. But when
it's the other three possibilities, then determining how to present it to
the user in such a way as to prevent the user from being confused between
two or more of these identity statements is the same kind of challenge that
confronted PKI. If the client application is well-connected to the Internet,
it might, for example, try searching around the Internet for a 'driver'
for this form of identity, similar to how a Microsoft Windows client
might search for a driver for a USB device. Of course, this would
require that there be an assurance system for establishing trusted
drivers, which then tends to limit the ability of the overall system to
expand to handle new forms of identity and new statements about identity
in a platform-independent way.
But why are there a multitude of identity formats?
First, IT systems have traditionally enforced an artificial system for
naming, with assumptions like:
- individuals only have 3-4 naming attributes
    "A fully evolved nomenclature consists of (in this order) laqab,
    kunya, ism, patronymic (with or without further nasab), nisba(s)..."
    from Arabic Nomenclature: a summary guide for beginners,
    http://www.lib.umich.edu/area/Near.East/BeestonNomen.pdf
- everyone has a given name and surname
    Not valid for cultures which do not make use of a surname or
    family name, e.g. Icelandic or Malay.
- everyone has a single full name that is what everyone else refers to them as
    Even in the US, Dr. Robert Smith would expect to be called Dr. Smith
    by a patient and Bob by a family member.
- everyone in a culture has the same 'display order' for their name
    "I've found that name order in Chinese persons is a marginally
    reliable indicator of attitudes towards the West.", quoted from
    http://www.crookedtimber.org/archives/001435.html
    Even Amazon, for example, gets confused and implies a book is
    written by BOTH Shiga Naoya and Naoya Shiga:
    http://www.amazon.com/exec/obidos/tg/detail/-/0231121571/
A universal system needs to be able to incorporate names for individuals
in formats that are not subject to the above limitations.
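One way a name record could avoid these assumptions is sketched below. It is
only an illustration: the PersonName type, its field names and the component
labels are hypothetical, not taken from any existing identity format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PersonName:
    # Ordered (kind, value) pairs. Neither the set of kinds nor their number
    # is fixed, and the order is the one the bearer of the name uses.
    components: List[Tuple[str, str]]
    # Optional forms of address keyed by who is speaking (hypothetical keys).
    forms_of_address: Dict[str, str] = field(default_factory=dict)

    def display(self, context: str = "") -> str:
        """Return the form used in this context, else all components in order."""
        if context in self.forms_of_address:
            return self.forms_of_address[context]
        return " ".join(value for _, value in self.components)


smith = PersonName(
    components=[("title", "Dr."), ("given", "Robert"), ("surname", "Smith")],
    forms_of_address={"patient": "Dr. Smith", "family": "Bob"},
)
shiga = PersonName(components=[("family", "Shiga"), ("given", "Naoya")])
```

Here smith.display("patient") gives "Dr. Smith" and smith.display("family")
gives "Bob", while shiga.display() keeps the family-name-first order rather
than guessing at a 'given surname' layout.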
Next, most human naming systems typically use only a few basic relationship
qualifiers in names, such as son-of, descendant-of, father-of. IT systems
have encouraged other forms of qualifiers, e.g.
- UUCP bang paths: {seismo, ut-sally, ihnp4}!rice!beta!gamma!me
- PGP key signing paths, e.g.
http://www.chaosreigns.com/code/sigtrace/darxus.famous.jpg
- social networks, "You and Dean are two degrees apart and share a mutual
connection"
and this set will continue to grow.
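To make the first of these concrete: a bang path names someone by the route
to them, and the braces list alternative well-known entry hosts. A small
sketch of expanding such a path into its routes follows; expand_bang_path is
a made-up helper for illustration, and it handles only a brace group in the
first position, as in the example above.

```python
from typing import List


def expand_bang_path(path: str) -> List[List[str]]:
    """Expand a UUCP bang path into one hop list per alternative entry host."""
    hops = path.split("!")
    first = hops[0].strip()
    if first.startswith("{") and first.endswith("}"):
        # "{a, b, c}" means any one of these hosts can start the route.
        entries = [host.strip() for host in first[1:-1].split(",")]
    else:
        entries = [first]
    return [[entry] + hops[1:] for entry in entries]


routes = expand_bang_path("{seismo, ut-sally, ihnp4}!rice!beta!gamma!me")
```

The result is three routes, each ending in rice, beta, gamma, me, so the same
person has three equally valid 'names' depending on where the sender starts.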
Furthermore, one can certainly have identifiers for people other than their
name, e.g. what role an actor played in a movie. I can recognize people I've
seen before when I see them next in person or in a photograph even if I do not
remember their names, and no doubt software will soon be able to perform
the same functions, matching people who appear frequently together in
digital camera pictures as connected in some way.
Finally, what will be the 'trickle-down' from projects such as SRD's Non-Obvious
Relationship Awareness? The identity system will be able to detect and describe
relationships between individuals that are not immediately obvious to the
individuals themselves. While there are certainly privacy and 'big brother'
implications, the same kinds of algorithms can be run in a more distributed
fashion: perhaps a user can run the algorithms on their own inbox and discover
connections that are useful to them. For example, it would be fairly
straightforward to write a program that, when configured with the identities
of a few people who work in Identity Management, deduces certain relationships
based on the data in LinkedIn, for example that "Don Bowen" is a hub. When
describing identities, new attributes can be defined based on these
connections: a "Six degrees of Don Bowen" game, say, would describe how many
links you are away from Don Bowen. Now what is the implication of these
kinds of attributes for machine processing?
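A derived attribute of this kind, the number of links between two people, can
be computed with a plain breadth-first search. The sketch below assumes the
connections are already available as an in-memory dictionary; it does not use
any LinkedIn interface, and every name in the sample data other than Don Bowen
is invented.

```python
from collections import deque
from typing import Dict, Optional, Set


def degrees_apart(graph: Dict[str, Set[str]], start: str, target: str) -> Optional[int]:
    """Return the fewest links between start and target, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for other in graph.get(person, ()):
            if other == target:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


# Invented sample data: each person maps to the set of their direct connections.
connections = {
    "alice": {"bob"},
    "bob": {"alice", "Don Bowen"},
    "Don Bowen": {"bob", "carol"},
    "carol": {"Don Bowen"},
    "erin": set(),
}
```

With this data, degrees_apart(connections, "alice", "Don Bowen") is 2, and an
unconnected person such as "erin" yields None. Note that the value is only
meaningful relative to whichever graph the program happened to see.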