May 11, 2000
This article was originally published in Web Review in two parts on April 7 and April 14, 2000. This is an updated version.
Nearly three years have passed since the kick-off meeting of the P3P (Platform for Privacy Preferences) working group in June 1997. If the World Wide Web Consortium is going to help us preserve privacy, it would be nice to get the protocol finalized and see some browser implementations before e-commerce sites have a complete record of all the books we read, the music we listen to, and the medicines we take. (The final comment period on the protocol ended on April 30, 2000, but the committee continues to post new drafts; this article is based on a draft dated May 10, 2000.)
But P3P may also be fading in relevance. The kick-off meeting was announced just in time for a 1997 workshop by the Federal Trade Commission on privacy, and the promise of P3P may have helped persuade the commission to endorse industry “self-regulation.” Few people remember a related initiative (the Open Profiling Standard) that was rolled out at the same time by the high-flying Netscape corporation. OPS quickly shriveled up and retreated into obscurity. Given growing public suspicion of self-regulation in the privacy sphere, P3P may well follow it.
The public has become frustrated with the failures of self-regulation, particularly the inability of TRUSTe to discover or stop several alarming practices (RealNetworks’ tracking of music downloads, the ID embedded in Microsoft Office documents, and America Online’s reuse of member data). In December 1999, the Electronic Privacy Information Center (EPIC) released one of its annual reviews of popular Web sites and found their privacy statements to be superficial and inadequate.
In the current atmosphere, calls for legislation in the U.S. are growing despite the frantic efforts of direct marketers. Meanwhile, after two years of negotiations, the United States government concluded a deal last month with the European Union requiring U.S. companies to limit the use and sharing of data strictly whenever they operate in Europe.
One readily sympathizes with the difficulties faced by those who try to formalize policies enough to turn them into a networking protocol. The P3P working group experienced tremendous pressure, caught between those who want a policy friendly to businesses and those like the European Union who want unambiguous pronouncements of protection for data. The problems faced by P3P can teach us a lot of basic lessons about the relationship between computers and society.
February 2000: the entire world watches the Web reel from a cynical exploitation of trust. It starts when a packet probes an innocent user’s computer to see whether some telnet daemon or Microsoft Office program is accepting requests. Suddenly and unknowingly, that user is running a program that sends a thousand megabits of data a second to Yahoo!’s servers. Multiply this intrusion by a few thousand, and one gets the frightening denial-of-service attacks that briefly shut down the Web’s most popular sites.
For years, therefore, Internet security folks have followed the principle: don’t trust whoever is behind the packets. A protocol means nothing except what is passed back and forth using the protocol. P3P runs against this well-established principle.
That’s a lot of promises. The history of privacy has shown that they are routinely broken. Strong assurance for privacy protection requires enforcement through laws and regulations—precisely what businesses are trying to stave off.
Attempts to make or carry out policy through technology usually come to grief. This was enunciated as far back as 1976 in Joseph Weizenbaum’s classic book Computer Power and Human Reason, and it’s part of the mission statement of Computer Professionals for Social Responsibility. Nevertheless, large commercial sites latched on to P3P as backing for their argument that technology could protect privacy and that governments need not act.
The P3P team, if you read its work carefully, rejects such sunny assumptions. A widely publicized 1998 introduction to P3P says:
P3P is not a silver bullet; it is complemented by other technologies as well as regulatory and self-regulatory approaches to privacy.
and from the specification itself comes this exclusion:
…[P3P] does not provide a technical mechanism for making sure sites act according to their policies…P3P is complementary to laws and self-regulatory programs that can provide enforcement mechanisms.
However, the energy driving P3P from outside the W3C came from those companies that hoped to continue evading such “enforcement mechanisms.” As Internet users wise up, the appeal of P3P for these companies will wane.
One reason the P3P specification is taking so long is that it’s just trying to be too many things to too many people. This is the likely fate of any such “social protocol,” as P3P was termed in a paper by Reagle and Cranor.
Everybody has had the experience of making a reservation or filing a complaint or dealing in some other fashion with a computer that couldn’t stretch to cover the reality of the situation. Studies of the adoption of computing technology repeatedly turn up problems with organizations that try to formalize work processes (especially if computer designers believe the managers’ descriptions of those processes). So how could the P3P working group believe it could handle the infinite flexibility of trust relationships between commercial organizations and their clients?
The P3P vocabulary defines all the values of all the fields the protocol should offer. The protocol can always be extended, because—after the committee tried various awkward formats—the development of the widespread XML standard made it easy for them to settle on a fairly simple text format that is easy to add to. But for the purposes of this article, it's reasonable to discuss the current state of the protocol, because any extension would have to be worked out through a similar negotiation and standardization process.
Let’s take a look for the purpose of illustration at one element, the RECIPIENT. This indicates who will be given access to user data. The most restrictive value P3P gives is “Ourselves and/or our agents.” People who have followed the recent debate over a financial services law in the U.S. know that this value is already too broad. One of the main reasons financial institutions want to merge is so one division can get access to customer data in another division. Data can then travel from a mutual funds administrator to a bank to a life insurance company without ever leaving “Ourselves and/or our agents.”
The RECIPIENT element is less important than the PURPOSE, which allows the user to see whether data is going to be put to potentially unwanted uses such as user profiling or future product promotion. While this element summarizes how different organizations use data, its eight possible settings remain quite general. Some users are willing to share their data with outside organizations in ways that no formal protocol can determine. You may not mind if your liquor-purchasing habits are shared with a liquor company, because you might want to get a catalog from a small company you’ve never heard of. But you would indeed mind if the same data is shared with the lawyer for your opponent during a lawsuit when he’s trying to prove you’re reckless and negligent. (My example is inspired, of course, by a real-life court case.)
Similarly, in a system useful for real-life situations, one often wants to specify a time limit (such as two months) or trigger event (such as “when I close my account with you”) after which data has to be discarded. In the most recent version of the standard, the P3P working group added a RETENTION element that provides room for negotiation, though its five possible values again are rather general. The “date” type, available to record personal information like the user's birth date, has not been put to use here.
P3P is fine-tuned for sending client data to an e-commerce site that can use it for further marketing: the specification even includes such fields as job title and mobile phone number. By contrast, it has no provision for the server to send data about the Web site’s owner to the client. Before trusting a Web site with data, a client might well want to know whether it is run by a for-profit or nonprofit organization, and whether some company is funding a nonprofit site. (Thanks to CPSR member Karen Coyle, author of a Privacy and P3P FAQ, for this point.) The final version of the P3P specification just began to address this need through an element called ACCESS, which indicates whether a Web site offers contact or “other identifiable” information. A European project is also looking at ways to provide the information through a database.
Originally, P3P was expected to come with a related protocol that transferred data after the two parties agreed on how it would be used. You would click on a browser box saying, “Send my address,” and the browser would send the server your address, which you would have previously loaded into a convenient location to be found by the browser. The assumption is that the server has already negotiated an agreement covering whether the address may be shared with other organizations, along with whatever other restrictions you called for. As we have seen already in this article, violating the agreement is easier than obeying it.
Privacy advocates saw plenty of dangers in the automation of data transfer. First, it’s hard for users to tell just what they’re agreeing to—a dialog box might say something vague like “Send the data required to complete the purchase,” and the user might have no idea that data about his bank account (to pick an example) might be sent.
Furthermore, automation would require users to enter lots of personal data and store it in a location chosen by the developers of their browser. Security experts know that the greatest gift developers can hand to intruders is to provide a single location where sensitive data can be found on a large number of computers. Scant time would pass before newsgroups would be circulating scripts that penetrated moderately insecure systems and retrieved a Social Security number from Navigator or IE. It may be possible to ameliorate the problem through encryption or by designing computer systems to wall off certain areas of the filesystem and memory.
It was thus something of a relief when the P3P working group announced that it was removing data transfer from P3P. They were driven less by the concerns I just mentioned than by implementation difficulties and perhaps by reports that commercial sites didn’t want the new mechanism; most had developed CGI scripts or other means to collect data and wanted to stick to them.
But it’s a situation of damned-if-you-do and damned-if-you-don’t. Separating data transfer from the privacy agreement has further weakened the promise of P3P. It will be harder for the client to prove that he or she placed restrictions on the data; the server administrators can say, “You made the agreement about a different set of data.” The P3P working group is trying to salvage the strength of the privacy agreement by encouraging all servers to define their forms with the same terms as the P3P vocabulary uses for data. The association, however, is backed only by a promise, or perhaps in the future by the use of electronic signatures.
A paper by two members of the P3P team, Ackerman and Cranor, admits that the protocol offers so many variables along so many dimensions that few users are going to take the time to figure it out. We know that most computer users never even change the background on their display; those that do customize their environments usually require an immediate, visible response such as a color change. Entering a dozen values with interacting behaviors whose effects will never be visible is for most users a task comparable to attaining the seventh station of enlightenment. Each attempt to make P3P more relevant and adaptable (such as allowing multiple versions of the specification) has also put new burdens of understanding on the user.
The particular solution suggested in the paper consists of intelligent agents called “critics” that would run on the user’s computer, monitor what he or she is doing, and issue warnings when they notice an activity that puts the user’s privacy at risk. Another suggestion from the P3P team is that the protocol might be built into agents that crawl the Web for products of interest to users. Thus, if you ask a search engine to find bicycles matching certain criteria, you might also ask it to exclude sites with inadequate privacy policies.
Aside from such speculative research projects, the most promising use of P3P is what the Reagle and Cranor paper calls “a recommended setting in the form of a ‘canned’ configuration file.” Well-known organizations might provide templates that users can choose from; thus, a very concerned user could install a restrictive template from EPIC while a shop-till-you-drop surfer might install one from the Direct Marketing Association. That still puts the burden on the individual to find and download the template. How will P3P satisfy laws in the European Union, where there are no privacy “preferences,” but only a standard that applies to everybody? When people download their browsers, will Netscape and Microsoft ask them what country they live in and set the default policy to respect EU laws?
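The canned-template idea can be pictured with a small Python sketch. It is an illustration only: the value names below (“profiling,” “partners,” “indefinite,” and so on) are hypothetical stand-ins rather than the actual P3P vocabulary, and the Statement and Template classes are invented for this example, not taken from any P3P implementation.

```python
from dataclasses import dataclass


# Hypothetical value names, chosen for illustration only; the real P3P
# vocabulary defines its own values for each element.
@dataclass(frozen=True)
class Statement:
    purpose: str     # e.g. "complete-transaction", "profiling", "promotion"
    recipient: str   # e.g. "ourselves", "partners", "anyone"
    retention: str   # e.g. "none", "as-needed", "indefinite"


@dataclass(frozen=True)
class Template:
    """A 'canned' preference file: the values its publisher finds acceptable."""
    name: str
    purposes: frozenset
    recipients: frozenset
    retentions: frozenset

    def objections(self, policy):
        """Return the statements in a site's policy that this template rejects."""
        return [
            s for s in policy
            if s.purpose not in self.purposes
            or s.recipient not in self.recipients
            or s.retention not in self.retentions
        ]


# Two hypothetical templates, one restrictive and one permissive.
STRICT = Template(
    "strict",
    purposes=frozenset({"complete-transaction"}),
    recipients=frozenset({"ourselves"}),
    retentions=frozenset({"none", "as-needed"}),
)
PERMISSIVE = Template(
    "permissive",
    purposes=frozenset({"complete-transaction", "profiling", "promotion"}),
    recipients=frozenset({"ourselves", "partners", "anyone"}),
    retentions=frozenset({"none", "as-needed", "indefinite"}),
)

# What a site declares about itself (again, made-up values).
site_policy = [
    Statement("complete-transaction", "ourselves", "as-needed"),
    Statement("promotion", "partners", "indefinite"),
]

for template in (STRICT, PERMISSIVE):
    rejected = template.objections(site_policy)
    verdict = "warn user" if rejected else "proceed"
    print(f"{template.name}: {verdict} ({len(rejected)} statement(s) rejected)")
```

Note what the sketch does not do: it compares labels the site chose to publish, and nothing in it can check whether the site behaves as declared.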
Nevertheless, P3P has proven attractive to some European analysts, who see it as a technical way to let American companies get away with whatever we tolerate within our borders while protecting the privacy of Europeans. In fact, I am told they feel a need for technological support for enforcing their laws because these have become so complex. While P3P may prove to be a useful way to adapt to differences between policies in different jurisdictions, such a use presupposes that each jurisdiction comes up with a clear and legally enforceable policy, a condition lacking in the United States.
It’s a common cliché that technology creates problems such as invasions of privacy. I tend to think that the problems come from the interests of the wealthy and powerful. When they’ve got their mind set on something, they’ll use the available technology to get what they want.
Let’s look, for instance, at the way browser cookies turned into privacy-invasive technology. When Netscape invented the cookie, most programmers who checked it out declared it innocuous. After all, the cookie would be sent back only to a site in the same domain as the site that set the cookie. And there was little user-identifying information the browser could send: just an IP address and some data about the browser and operating system.
The first step in the conversion of the cookie to a weapon of mass marketing came when companies such as DoubleClick found it profitable to start aggregating advertisers. When you looked at a Web page containing an ad for one of these DoubleClick partners, even if you didn’t click on the ad, the browser deposited a cookie in your cookie file, and that cookie referred not to that company’s domain but to doubleclick.com. In short, all roads led to the DoubleClick database, which could then aggregate information about the sites you visited. And DoubleClick took another giant step by folding in a huge database of customer information when it purchased Abacus Direct. That would have allowed any information you give one of DoubleClick’s partners to be cross-referenced with the information gained by Abacus through traditional channels.
DoubleClick backed down from combining the databases after a tremendous public outcry, including a lawsuit from EPIC and investigations started by the FTC and two states.
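To see why a cookie that obeys the same-domain rule can still track a user across unrelated sites, consider the following toy simulation in Python. Everything in it is hypothetical: the AdServer and Browser classes, the ads.example domain, and the page names are invented for illustration and do not describe any real ad network’s software.

```python
import itertools
from collections import defaultdict


class AdServer:
    """Toy third-party ad server; all names here are hypothetical."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.visits = defaultdict(list)  # cookie id -> pages where an ad was shown

    def serve_ad(self, cookie, referring_page):
        # On first contact the browser has no cookie for us, so issue one.
        if cookie is None:
            cookie = f"id-{next(self._ids)}"
        # Every ad request reveals which page embedded the ad.
        self.visits[cookie].append(referring_page)
        return cookie


class Browser:
    def __init__(self):
        self.cookie_jar = {}  # domain -> cookie value

    def load_page(self, page_url, ad_domain, ad_server):
        # The browser returns a cookie only to the domain that set it,
        # but the ad on every page comes from the same ad domain.
        cookie = self.cookie_jar.get(ad_domain)
        self.cookie_jar[ad_domain] = ad_server.serve_ad(cookie, page_url)


ads = AdServer()
browser = Browser()
for page in ("books.example/medical", "music.example/jazz", "pharmacy.example/order"):
    browser.load_page(page, "ads.example", ads)

print(dict(ads.visits))
```

The browser never breaks the same-domain rule, yet the ad server ends up with a single identifier tied to all three pages, because each page embeds content from the same third-party domain.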
Technology itself did create another breach in privacy, this time as a result of attempts by designers of mail user agents like Eudora and Microsoft Outlook to make it easy for a mail program to display Web pages and styled text. They opened the door to the by-now-familiar trick of embedding an email address in an image tag, so that a spammer can associate your email address with a cookie as soon as you make the mistake of clicking on the spam message. Still, the ruse is valuable only to somebody who collects data from a vast range of ads or Web sites.
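The image-tag trick can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the parameter name “e,” the ads.example URL, and both function names are made up for the example, and real mailers and servers vary in the details.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def tracking_image_tag(base_url, email):
    """Build an <img> tag whose URL carries the recipient's address.

    The query parameter name "e" is a hypothetical choice.
    """
    query = urlencode({"e": email})
    return f'<img src="{base_url}?{query}" width="1" height="1">'


def link_cookie_to_email(request_url, cookie_id, table):
    """Server side: record which address arrived alongside which cookie."""
    params = parse_qs(urlparse(request_url).query)
    email = params.get("e", [None])[0]
    if email is not None:
        table[cookie_id] = email
    return table


# The tag as it might appear in the body of an HTML mail message.
tag = tracking_image_tag("http://ads.example/pixel.gif", "reader@example.org")
print(tag)

# When the mail program fetches the image, it sends along any cookie it
# already holds for that domain; the server can now join the two.
request_url = "http://ads.example/pixel.gif?e=reader%40example.org"
table = link_cookie_to_email(request_url, "id-1", {})
print(table)
```

The join is trivial once the mail program fetches the image; the hard part, as the paragraph above notes, is having a wide enough network of ads or sites for the cookie to mean anything.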
In short, when the stakes get high enough, somebody with sufficient money will find a way to twist technology to his needs. Technological limitations have not stopped some from going so far as to try to monitor all electronic traffic around the world.
And why are people so concerned with privacy? Usually it’s because they’re afraid that someone with power will misuse data. You’re worried your boss will fire you if he finds out what Web sites you’re visiting. You lie awake at night wondering whether your insurance company will deny you a new policy when it finds out you have a genetic disorder. And so on.
Privacy involves soft issues of dignity as well—you may feel inhibited in the park about whispering in your lover’s ear if you think you can be heard by others—but the most pressing issues deal with the interaction between privacy and the abuse of power. Privacy is no different from other areas where people abuse power, whether it’s a city councilor denying a building permit to a citizen who irks her or a lumber tycoon deciding to put thousands of acres of Indonesian rain forest to the torch. In short, you can’t get far with privacy unless you’re also willing to address abuses of power.
Andy Oram, firstname.lastname@example.org, is an editor at O’Reilly Media and a member of Computer Professionals for Social Responsibility. This article represents his views only.