Commentary by Mark Wahl

Organizing principles for identity systems:
Client implications of Kim's fifth law (2005/2/1)

Jamie Lewis wrote:

Speaking of the fifth law, like Scott Lemon, I was a bit surprised when there wasn't more of a collective 'hallelujah' in response to its posting.

From a purely technical perspective, processing systems that follow this
law are feasible, but what worries me with #5 is both (1) the human 
interface side and (2) the automated side when implementing an interface 
onto a unifying system that's 'enabling interworking'.  IMHO, this problem
is really only one level of indirection away from some of the unsolved 
difficulties that we faced with public key infrastructure, and there have been 
several attempts since in nearby problem spaces that were unsuccessful (i.e.
not universal) due to platform limitations.  As there are no systems today that
are widely deployed and meet Kim's rules, I'm hoping that future systems that 
are built according to the rules will also find ways to avoid platform-
specific limitations and the unsolved problems that dogged existing systems.  
From a specification perspective, there needs to be enough guidance on 
what makes a system universal and usable that an identity system which is 
merely coded to be identity-neutral (e.g. just an XML spec that has a blob 
field of 'put identity here') cannot claim to be a viable universal 
identity system.

When public key certificates based on X.509 started being piloted on 
the Internet, there were difficulties not only in managing issuing and 
validating the certificates (with the top-down organizational model, 
difficulties with revocation and status checking etc), but also in 
advising users what a certificate _means_.  In the early days of 
Netscape Communicator's browser, every time the user visited a new 
secured site, a wizard would pop up to guide the user through deciding
if they wanted to accept the certificate or not.  Various groups 
started defining extensions to the certificate structure to place into 
it policy and guidance statements about the certificate.  Some of the
extensions were machine readable, but many were human readable, though 
only barely: 5 paragraphs of inscrutable legalese is hard enough for 
the average Internet user, and useless to a non-English speaker.  In the
end, the browser gives up as the average user doesn't care about whether
a web site is protected at Verisign level 73 or Verisign level 77.

I saw a similar problem when JINI (a Java API for dynamically 
reconfigurable services) came out.  Like a directory entry, a JINI object 
can have a set of attributes, and they have types and values.  The 
problem arises of how to describe these to the user when the user attempts
to configure an object of a kind that the user had never seen before.
(I remember one suggestion was to have the attributes object simply be a 
subclass of a Java windowing system data type, so that when accessed, it
could pop up its own little configuration UI.  Of course, when a 
command line or a server program wanted to view or manipulate this 
configuration, this would not be possible: the configuration would 
suddenly become opaque.) As another example, the Universal Plug and Play 
(UPnP) spec has devices announcing themselves with the URL of the 
manufacturer and a set of 'control panel' icons.  This tends to assume the 
computer to which the device is attached has good network connectivity and 
can display icons in a variety of graphical formats: a modern workstation 
meets these assumptions; a phone, PDA or embedded device might not. 

Let's take the problem of assurance in identity information. For example, 
when someone receives an email with a vCard, the amount of assurance which
they can place in the vCard is really no more than what they can place in 
the email system that delivered the vCard and their opinion of the sender.  
Overall, fairly low assurance: a jokester can put "president@whitehouse.gov" 
in their vCard as their email address if they want to.  This is not to say 
low assurance is always bad, but if there's no email security, someone
can spoof an email that appears to be from someone else and include a vCard
that might override and replace a legitimate vCard, or similarly 
impersonate someone in a social networking application ('friend phishing').

PKI tried to provide a means of assurance, but this tended to work only in 
closed communities: a client application would need to be hardcoded with
the root certificates AND be able to distinguish between the different
root certificates from each issuer: which ones implied high assurance and which 
ones low assurance.  Such an application, however, wouldn't be able to provide 
any guidance to the user on a certificate from another issuer if there's no
chain of trust to one of the root certificates it knew.  At most it could say 
that the certificate was 'valid', but this validity would not provide any 
guarantee that the identity enclosed in the certificate is useful for 
or comparable with anything outside of the certificate: 
a certificate authority might be operating with a completely different 
naming scheme, and a rogue certificate authority could put any identity 
it wishes into a certificate, including one that isn't appropriate for
that CA to use.  The PKI lacked a standardized vocabulary and mechanisms to 
provide a means of using the identity from a certificate in situations 
when there wasn't an assumed structure of how all the issuing authorities 
are organized, such as a top-down hierarchy. 
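
As a rough illustration of that closed-community limitation, here is a 
minimal Python sketch.  The root names, the assurance labels and the 
function assurance_for are all made up for illustration, and signature 
checking is left out entirely; the only point is that a client built 
around a hardcoded table has nothing to say about an issuer outside it.

```python
# Hypothetical hardcoded trust store: root name -> assurance the client assumes.
ROOT_ASSURANCE = {
    "Example Root High": "high",
    "Example Root Low": "low",
}


def assurance_for(chain):
    """Return the assumed assurance for a certificate chain.

    `chain` lists issuer names, from the certificate's direct issuer
    upward. The result is the assurance of the first root the client
    was built with, or None when no issuer in the chain is known.
    Signature and revocation checks are deliberately omitted.
    """
    for issuer in chain:
        if issuer in ROOT_ASSURANCE:
            return ROOT_ASSURANCE[issuer]
    return None


known = assurance_for(["Example Intermediate", "Example Root High"])
unknown = assurance_for(["Some Other CA"])
```

Here `unknown` is None: the certificate may well be 'valid', but the 
client has no basis for telling the user how much to rely on it.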

In the design of a client application that wishes to participate in a 
universal system, there are several problems, even for a comparatively simple 
example of determining the assurance of an identity statement that comes in
attached to an email.  First, what is the format of the identity information?
There are four likely possibilities.  One possibility is that it is in a format
known to and implemented by the client application.  The second is that the 
format is not directly implemented by the client, but the client is able to
extract some information from it, e.g. it is an XML document on which a
skilled person can do 'view source'.  The third possibility is that the client
cannot extract any information from it because the format is something
completely different from any existing form of identity, e.g. it comes from
a culture which does not use the same attributes for naming, or it identifies
something other than an individual person. The fourth is that the identity 
refuses to disclose itself to a client that can't fully understand it.  
(For a further description of #4 see my comments to Craig Burton that are
on his blog for Jan 23).   
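
A minimal Python sketch of how a client might sort an attachment into these 
four cases follows.  The names Handling, KNOWN_MEDIA_TYPES and classify, 
and the requires_full_understanding flag, are my own assumptions for 
illustration and not part of any specification; "parses as XML" is used 
only as a crude stand-in for "some information can be extracted".

```python
from enum import Enum
import xml.etree.ElementTree as ET


class Handling(Enum):
    KNOWN_FORMAT = 1        # the client implements the format
    PARTIALLY_READABLE = 2  # some information can be extracted
    OPAQUE = 3              # nothing can be extracted
    WITHHELD = 4            # the identity declines to disclose itself


# Hypothetical set of formats this particular client implements.
KNOWN_MEDIA_TYPES = {"text/vcard"}


def classify(media_type, payload, requires_full_understanding=False):
    """Decide which of the four cases applies to an attached statement."""
    if media_type in KNOWN_MEDIA_TYPES:
        return Handling.KNOWN_FORMAT
    if requires_full_understanding:
        # Assumed flag: the statement refuses partial interpretation.
        return Handling.WITHHELD
    try:
        ET.fromstring(payload)  # well-formed XML can at least be inspected
        return Handling.PARTIALLY_READABLE
    except ET.ParseError:
        return Handling.OPAQUE
```

Only the first case tells the client how to present the statement; the 
other three results still leave open what to show the user.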

If there's a common format, possibility #1, then implementing how to 
present this interworking is straightforward though nontrivial.  But when 
it's the other three possibilities, then determining how to present it to 
the user in such a way as to prevent the user from being confused between 
two or more of these identity statements is the same kind of challenge that
confronted PKI.  If the client application is well-connected to the Internet, 
it might, for example, try searching around the Internet for a 'driver' 
for this form of identity, similar to how a Microsoft Windows client
might search for a driver for a USB device.  Of course, this would 
require that there be an assurance system for establishing trusted 
drivers, which then tends to limit the ability of the overall system to 
expand to handle new forms of identity and new statements about identity
in a platform-independent way.  

But why is there a multitude of identity formats?  

First, IT systems have traditionally enforced an artificial system for
naming, with assumptions like:

 - individuals only have 3-4 naming attributes

	"A fully evolved nomenclature consists of (in this order) laqab, 
	kunya, ism, patronymic (with or without further nasab), nisba(s)..."
	from Arabic Nomenclature: a summary guide for beginners, 	
	http://www.lib.umich.edu/area/Near.East/BeestonNomen.pdf

 - everyone has a given name and surname

	Not valid for cultures which do not make use of a surname or 
	family name, e.g. Icelandic or Malay.

 - everyone has a single full name that is what everyone else refers to them as

	Even in the US, Dr. Robert Smith would expect to be called Dr. Smith 
	by a patient and Bob by a family member.

 - everyone in a culture has the same 'display order' for their name

 	"I've found that name order in Chinese persons is a marginally 
	reliable indicator of attitudes towards the West.", quoted from
    	http://www.crookedtimber.org/archives/001435.html

	Even Amazon, for example, gets confused and implies a book is 
	written by BOTH Shiga Naoya and Naoya Shiga:  
	http://www.amazon.com/exec/obidos/tg/detail/-/0231121571/

A universal system needs to be able to incorporate names for individuals
in formats that are not subject to the above limitations.
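
One way to picture a name record that avoids these assumptions is the 
Python sketch below.  The class names, the "kind" labels and the audience 
keys are hypothetical choices of mine; the two example people are the ones 
mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class NameComponent:
    kind: str   # open-ended label, e.g. "given", "family", "kunya", "nisba"
    value: str


@dataclass
class PersonName:
    # Any number of components, stored in the bearer's own display order.
    components: list
    # Audience -> form of address; no single "full name" is assumed.
    forms_of_address: dict = field(default_factory=dict)

    def display(self, audience=None):
        """Return the form for this audience, else the components in order."""
        if audience in self.forms_of_address:
            return self.forms_of_address[audience]
        return " ".join(c.value for c in self.components)


smith = PersonName(
    components=[NameComponent("given", "Robert"), NameComponent("family", "Smith")],
    forms_of_address={"patient": "Dr. Smith", "family member": "Bob"},
)
shiga = PersonName(
    components=[NameComponent("family", "Shiga"), NameComponent("given", "Naoya")],
)
```

With this layout smith.display("patient") gives "Dr. Smith" and 
shiga.display() gives "Shiga Naoya", without the record ever requiring a 
given-name/surname pair or a fixed order.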

Next, human naming systems typically use only a few basic relationship
qualifiers in names, such as son-of, descendant-of, father-of.  IT systems
have encouraged other forms of qualifiers, e.g. 

 - UUCP bang paths:  {seismo, ut-sally, ihnp4}!rice!beta!gamma!me
 - PGP key signing paths, e.g. 
   http://www.chaosreigns.com/code/sigtrace/darxus.famous.jpg
 - social networks, "You and Dean are two degrees apart and share a mutual 
   connection"

and this set will continue to grow.

Furthermore, one can certainly have identifiers for people other than their 
name, e.g. what role an actor played in a movie.  I can recognize people I've 
seen before when I see them next in person or in a photograph even if I do not 
remember their names, and no doubt software will soon be able to perform
the same functions, matching people who appear frequently together in 
digital camera pictures as connected in some way.  

Finally, what will be the 'trickle-down' from projects such as SRD's Non-Obvious 
Relationship Awareness?  The identity system will be able to detect and describe
relationships between individuals that are not immediately obvious to the 
individuals themselves.  While there are certainly privacy and 'big brother'
implications, the same kinds of algorithms can be run in a more distributed 
fashion: perhaps a user can run the algorithms on their own inbox and discover
connections that are useful to them.  For example, it would be fairly 
straightforward to write a program that, when configured with the identities 
of a few people who work in Identity Management, deduces certain relationships
based on the data in LinkedIn: for example, that "Don Bowen" is a hub.  When
describing identities, new attributes can be defined based on these 
connections: a "Six degrees of Don Bowen" game, for instance, would describe 
how many links you are away from Don Bowen.  Now what is the implication of 
these kinds of attributes for machine processing?
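
To make the "Six degrees of Don Bowen" attribute concrete, here is a short 
Python sketch that computes it with a breadth-first search.  The 
connection data is invented (placeholder names "A" to "D"), and the 
function name degrees_from is my own; nothing here reflects real LinkedIn 
data or any LinkedIn API.

```python
from collections import deque


def degrees_from(graph, hub):
    """Breadth-first search: number of links from `hub` to each reachable person."""
    distance = {hub: 0}
    queue = deque([hub])
    while queue:
        person = queue.popleft()
        for contact in graph.get(person, ()):
            if contact not in distance:
                distance[contact] = distance[person] + 1
                queue.append(contact)
    return distance


# Made-up connection lists; only the hub's name comes from the text above.
connections = {
    "Don Bowen": ["A", "B"],
    "A": ["Don Bowen", "C"],
    "B": ["Don Bowen"],
    "C": ["A"],
    "D": [],
}
bowen_number = degrees_from(connections, "Don Bowen")
```

In this toy data "C" ends up two links away and "D" has no value at all, 
which is itself something a consumer of such a derived attribute would 
have to handle.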
