Commentary by Mark Wahl, CISA
Organizing principles for identity systems:
Data Sharing and fault tolerance (20070909)
One topic that has not been widely discussed in the context of the DataSharingSummit (wiki) is how data sharing could help give users fault tolerance for the social networking services they rely upon. This problem is worth addressing, as currently a single hosting center outage can shut down multiple independently-operated social network services. In one such outage, an OpenID identity provider (OP) also went down, so the users of that OP could no longer use their OpenIDs to log into services elsewhere that were still online.
In a fault tolerant distributed system, the system as a whole continues to operate, perhaps in a degraded mode, even when one or more of the components of the system have failed. Some of the failure modes might include:
- A temporary outage of one or more services as a backhoe takes out the power or network connectivity to the hosting center.
- A component service disappears, never to return, and any data maintained in there is lost. For example, the Walmart Hub social networking site went away only a few months after it was launched.
- A component service experiences a Byzantine failure and issues erroneous data. As Pamela Dingle discussed in her post "Mystery Solved; Questions Abound", for a few hours in July 2007 the wordpress.com staff installed software that mixed up RSS feeds for some unknown number of blog accounts, resulting in content from one person's blog being published under the name of someone else.
Some of the techniques worth considering would include:
- Ensure that relying parties (RPs) allow their users to associate multiple independent identities with their 'accounts' at the relying party. Just as a person in the real world might carry a fallback credit card or ATM card from a different issuing bank than their primary card's, in case their primary bank blocks their account, a backup identity would permit a user to continue to access their RP even when their primary identity provider (IdP) is unavailable.
- For portal sites which primarily aggregate a user's web data held by sites not affiliated with the portal, permit a page description to be exported to and held by the user on their local devices, so that the user can easily import their page description into a different portal should that become necessary.
- Many of the deployment models today assume that the user must trust their IdPs (or OPs) and RPs, and will 'just switch' to a better party should the IdP or RP misbehave. Unfortunately, these assumptions are not viable in the real world. An evil site will not advertise that it is evil. A well-intentioned site might occasionally experience errors or attacks that cause it to behave badly. A site might decide to change its policies while the user still has a large volume of data maintained there. In particular, when a site impersonates one of its users, today this is indistinguishable from the user's own behavior, and such activity can wreck a social network. Is there a way of recovering trust in a user after a service has impersonated that user?
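The first two techniques above can be illustrated with a minimal sketch. Everything in it is a hypothetical illustration, not the API of any existing RP or portal: the `Account` model, the `usable_identity`, `export_page` and `import_page` helpers, the example.com-style URLs, and the choice of JSON as the export format are all assumptions. It shows an RP account linked to two identities from different providers, with login falling back to the second identity when the first provider is down, and a portal page description that round-trips through a locally held export.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Account:
    """Hypothetical RP-side account linked to several independent identities."""
    name: str
    # Identifiers in preference order, e.g. OpenID URLs from different providers.
    identities: List[str] = field(default_factory=list)


def provider_of(identity: str) -> str:
    """Return the host part of an identity URL, used here as the provider name."""
    return identity.split("//", 1)[-1].split("/", 1)[0]


def usable_identity(account: Account, providers_down: Set[str]) -> Optional[str]:
    """Pick the first linked identity whose provider is not known to be down."""
    for identity in account.identities:
        if provider_of(identity) not in providers_down:
            return identity
    return None  # every linked provider is down, so the user cannot log in


def export_page(description: dict) -> str:
    """Serialize a portal page description so the user can keep a local copy."""
    return json.dumps(description, indent=2, sort_keys=True)


def import_page(text: str) -> dict:
    """Load a previously exported page description, e.g. at a different portal."""
    return json.loads(text)


# Assumed example data: two identities issued by unrelated providers.
alice = Account("alice", ["https://op-one.example/alice",
                          "https://op-two.example/alice"])
print(usable_identity(alice, providers_down={"op-one.example"}))

page = {"owner": "alice", "feeds": ["https://blog.example/alice/rss"]}
print(import_page(export_page(page)) == page)
```

The backup identity only helps if the two providers fail independently, so in practice the user would want them hosted by different operators in different hosting centers. The sketch does not address the third technique: neither a fallback identity nor a local export helps a user recover trust after a service has impersonated them.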