Splitting Books Open:
Trends in Published and Online Technical Documentation

Andy Oram
15 September 2004

This article is based on a talk I first gave at the O’Reilly Open Source Convention, July 26, 2004. Had I turned the presentation into the kind of conventional essay high-school English teachers used to make me read and write, the result would have been so long and tedious that no one would get through it. I have therefore left the presentation as a set of expanded bullet points, reflecting the talk the way I delivered it. In fact, this choice of an informal, oral style illustrates one of the points made in the article.

Abstract/Summary
Start of article
Traditional Documentation--Where is it Going?
Community Documentation
Benefits of Traditional Documentation
The Beginnings of a Merger
Improving Community Documentation
Key Points

Abstract/Summary

While technical publishers strive to adapt to new online media and formats, online efforts at self-education by computer users are becoming a form of true grass-roots documentation. This talk discusses the strengths and weaknesses of each side--traditional books and user self-education--and suggests how they may converge. It offers suggestions for improving the educational effects of mailing lists, computing project web sites, and other community documentation.

The best traditional books possess many virtues: appropriate pacing, a knowledge of the audience, meaningful technical background, and good structure. However, these traditional books take too long to write and make too many compromises in their attempt to reach a large audience and boost sales.

In turn, community education efforts (the rich environment of mailing lists, newsgroups, chat rooms, and project web sites) offer immediate answers to questions from knowledgeable peers. However, they suffer from time wasted in searching for information, results that are unreliable, and difficulties in knowing where to start.

User education can be improved by promoting active community participants to become formal contributors, incorporating professionals into community documentation, nurturing new users, pointing people to documents, and enhancing rating systems. The Safari Bookshelf is an example of professional online documentation that can enhance user efforts. Some other current limitations of the community environment for learning are the domination of English, a difficulty in respecting cultural differences and different learning styles, and gaps in documentation.

Start of article

The major stepping stones within this talk are:

Traditional published documentation: A brief look at traditional publishers such as O’Reilly Media and where we fall short.
Users educating themselves: A review of spontaneous, community documentation. The types of documentation I consider here go far beyond the works that mimic published documentation, such as the Linux Documentation Project.
Virtues of traditional documentation: We return to published books to consider where they add special value to the users’ education.
Virtues of community education: The same for community documentation.
Evidence of a merger: Each side--traditional publishers and project leaders in the community--recognizes that the other side has something to offer, and something to learn from. The two sides are therefore converging.
How to make the most of efforts at community education: The largest section of the article, with specific and (I hope) useful suggestions for people leading or championing software projects.

The article should help its readers:

Track important trends in documentation and community
Understand how users can improve their education so they make better use of technology
Eliminate much wasted time and duplicated effort

Finally, I have a hortatory goal: to recruit readers to help me create a movement to make the new efforts at documentation successful.

Traditional Documentation--Where is it Going?

My experience with books

I spent ten years writing manuals for computer companies (not something to be particularly proud of) and have since spent over eleven years editing O’Reilly books, particularly on free and open source software.

Frustrations

My work as an editor has been fulfilling but increasingly frustrating.

It’s a long slog until I finish a book; then I wait three months for its release. (And that’s a short time, so far as traditional publishers go.)

Our goals are often muddied by time pressures and conflicting requirements.

What Went Wrong With Books?

What is The Text: To understand why current published documentation is not meeting users’ needs, we must go beyond elementary observations such as timeliness. We have to look at the tradition I call The Text with a capital T. (Actually, two capital T’s.) The Text is a timeless, unalterable artifact to be approached with reverence.
Historical precedents: The original Text consisted of the five books of Moses. We have added many other works to this revered status: Homer, Shakespeare, and so forth. Students often analyze Shakespeare’s sonnets comma by comma. I have seen a literary critic discuss the impact of how Shakespeare spelled various words in the sonnets (because spelling was much less standardized in Shakespeare’s time than our own).
The Text is often corrupt: What’s odd about the concept of The Text is that many of these texts come down to us corrupt, or even in multiple versions. For instance, Shakespeare’s plays exist as Folios and as Quartos. Yet they are still considered unalterable and every detail matters.
When there is a Text in technical documentation: Only a few achievements in technical documentation have reached the status of The Text: Donald Knuth’s Art of Computer Programming, Kernighan and Ritchie’s book on C, and perhaps a few others such as Charles Petzold’s book on Windows programming. The books with words such as “Bible” in their names come nowhere near canonical status.

Requirements for Writing The Text

The work takes a long time.

You want it to be perfect, so you have to take time to write it.

The work must remain relevant for a long time.

Because you took so long to write it (and will take so long to write a revision), the work must be of interest to readers for a long time. Many books never get started because everybody involved in the project knows it will be obsolete too soon to recoup its money.

The work must sell a minimum number of copies.

The costs of editing, producing, and marketing the work lead to steep requirements for sales. Often at O’Reilly, we like a project and wish to promote it, but we can’t justify doing a book on the basis of expected sales. We like to tell the author, “We can’t afford to do your book--yet,” which pleasantly leaves open future collaboration if the project takes off. But this practice delays our entry into the market if that should prove worthwhile.

The work must provide a lot of background to accommodate multiple audiences.

This is where the problems of The Text get subtle and interesting. Because the publisher is required to sell a lot of copies, the publisher and author are tempted to pad the book with extra material that may appeal to one segment of an audience or another. This resembles politicians making speeches, where you can identify the sentence aimed at the Latino audience, the sentence aimed at soccer moms, and so forth.

Many people purchase a computer book that looks interesting based on title and back cover, only to find that there are just a couple of chapters of interest. If you feel as if a lot of the book was written for somebody else, you are probably right.

The Text, and therefore the publishing model we’ve had for centuries, is a poor match for fast-moving technical fields. But the tragedy of published books is compounded by some more mundane factors.

Most technical books are quite poor. And unfortunately, people in technical fields have come to assume that this is documentation’s natural state.

From their first encounter in elementary school, users expect unreadable texts. They get their first chemistry or math textbook and find such basic writing failures as:

Long explanations of arcane topics with no justification
Fussy, nit-picking distinctions that interrupt the flow of ideas
Terms used before they are defined

I’m convinced the frustration of trying to make sense out of these poor textbooks is the reason many talented students drop out of math and science--a real tragedy. Those who carry on learn to tolerate bad texts through high school, college, graduate school, and right up to when they get a job and crack open a computer book.

There are many reasons for documentation moving online and moving to the user community. One can easily cite costs, update speed, and so forth. But Aristotle, who defined four causes for things to happen, defined the final cause as the force that really drives change forward: what makes something have to happen. The final cause for the move to online community documentation is this: it is more successful than The Text at meeting user needs.

Community Documentation

To illustrate the range of available online documentation, I asked a few questions of my audience at this point in my talk.

Have you ever answered a query on an online forum, such as a mailing list, newsgroup, or chat?: And the accompanying question: did you think you were creating documentation? Some people say they did. At the very least, they were leaving a written record of new information.
Have you had your answer on the online forum archived?: If so, the archive makes a strong case for calling the answer documentation.
Have you searched an online forum for an answer?: This clinches the question. Someone has created useful information that you have used to solve your problem, without your knowing the person or perhaps having any relationship with the group in which he or she created the information. That’s documentation in my book. And in fact, many system administrators tell me that when they encounter a need for information, their first recourse is neither a printed book nor an official online page, but their favorite search engine.
Have you asked a question on an online forum?: And did you realize you too were contributing to documentation? I think you were. You identified missing information and mobilized a set of people to fill it.

Conclusion: documentation is in your hands.

User Education is Community Education

Community is built with one IRC chat, one newsgroup, one mailing list at a time: When you join a group and answer someone’s question, or pose questions that other people answer, the people in the group are taking care of each other. That’s what makes it a community.
User groups: Technical support groups need not be just virtual ones with an online existence. For instance, near the beginning of Linux’s spread, the hardest thing about Linux was simply getting it installed. (On some hardware, it’s still hard.) So people in major cities around the world would come together in “installfests” to help each other get Linux up and running. Of course, these groups have online components too.

In short, a community means taking care of each other, which many mailing lists, newsgroups, and chat rooms do.

Characteristics of Future Documentation

Community efforts such as mailing lists show us where technical documentation as a whole is heading.

Online: This allows it to be published and updated instantaneously.
Freely distributable: It will be available to everybody in the world with Internet access. You can safely refer someone to it without worrying whether he can obtain it in his part of the world, or whether he can afford it.
Localized and topical: By these terms, I mean that documentation can be written for a specific audience with an immediate need. For instance, instead of a single huge book about MySQL, someone may write a tutorial on MySQL for Oracle programmers interested in migrating, someone else may write a tutorial on MySQL for Web administrators interested in adding dynamic content to their sites, and so forth.

Advantages of Future Documentation

It can be done in one’s spare time by someone close to the topic and the audience: When I plan a book with an author, it’s daunting. I lay out a schedule for a year in advance and he often has to take time off from some other activity, such as teaching. In contrast, a short web page can be written by somebody with a passing interest over a weekend.
It can be as small as a question and answer: We saw this while discussing the use of mailing lists and newsgroups.
Distribution costs are minimal: No more paper, warehouses, and trucks. Just a web server and some bandwidth--and if the software is popular, a set of mirror sites to spread the bandwidth demands around.
No more enormous tomes--give people as much as they’ll read at a sitting: Why write a thousand-page book? Nobody reads a thousand pages at one sitting. Might as well give them a chunk of text they’ll read together.
No need to appeal to broad audiences: There’s no thought of padding an online document. Each person can write for the people he’s interested in helping.

So What’s Not to Like?

Lots of wasted time spent searching

While the search engines and archives offer impressive results, everyone can remember a time when they had to spend too long searching--and perhaps gave up in the end.

Can’t always trust the answer

You don’t know who posted the result most of the time, or the background of a person who put up the web page. Even if the information is formally correct, you might have trouble judging if it applies to your situation.

Timeliness is particularly important, because once postings and web pages go up they tend to stay up even as the software changes. And with the low cost of storage, nobody seems concerned with reclaiming disk space any more.

You don’t know what you don’t know (not only is the answer hidden, but the fact that it is hidden is also hidden)

This is the most subtle of the problems. Many people don’t identify what they’re doing as problematic. They never think to look for alternatives to what they’re doing, and wouldn’t know what to ask.

For all these reasons, some type of formal, traditional documentation is still needed to get you started. Each learner traces a unique path through a rich learning environment. Perhaps she begins with a mailing list. Having learned of good web sites to read, she comes back to the mailing list with better questions to ask. She may also pick up a published book or take a seminar along the way.

Benefits of Traditional Documentation

Worthy traits of traditional documentation

This section will describe in somewhat abstract fashion what makes good documentation special. The following section will bring the discussion more down to earth.

Pace

By pace I mean giving the user what she wants, when she wants it. An example of successful pacing was pointed out by a technical reviewer on one of our books who said, “Whenever I read a paragraph and had a question, I found the question was answered in the next paragraph.” Now, it might have been even better if he did not have to ask the question in the first place. But clearly, this was a well-paced document.

Pace is very hard to achieve in community documentation, because few amateurs do it naturally and even professionals need the eagle eye of the editor at key points.

Audience

Audience is related to pace, because you have to understand your readers--what they know and don’t know, what questions are on their minds, how their thought processes move--in order to hand out information in the proper order.

As with pace, it’s hard for community authors to think about the needs of their audience. What I notice is that authors tend to write for other people just like them. And this can work well--one just faces the formidable tasks of finding representatives of each audience for whom you want a document, and then motivating these representatives to learn the technology and write about it.

Background

By background I mean something more than mere theory. Many people can write thousands and thousands of words on the theory of some topic--IPv6, for instance--and just bore everybody to tears. Rather, good background yokes the theory to the immediate needs of the reader. It makes clear why certain theory has to be understood and trains the reader to apply the theoretical concepts to what he is doing.

This kind of background is the hardest element of technical documentation, and is rarely found in community efforts. I spend a major chunk of my editing efforts explaining to authors what background they need and how to integrate it with their work.

Structure

Structure is similar to pace, on a larger scale. It involves such choices as putting the basic tasks one needs to know before the more complex tasks that rest on them.

Structure is kind of shot when one goes online. One finds dozens of unrelated documents with no indication which to read first. Some documents try to solve this problem by helpfully organizing links into a reasonable order. I’ll examine some solutions later in this talk.

Luckily, we are learning to live in a less and less structured world. Open source projects depend on loose associations among trusting people. Businesses are devolving and spinning off functions. Even the military is getting less rigid. If we can tolerate less structure in life, perhaps we can tolerate less structure in our information. (But one audience member suggested that in this situation we need even more structure in our information.)

In summary, good documentation offers the big picture. We’ll see exactly what it offers in the following section.

Questions answered by good books

What range of problems does this technology solve?

The book does not say simply what a technology does. It indicates where it is useful and where it is not.

How do different parts interact and alter each other’s behavior?

This is a matter of seeing the big picture, as mentioned earlier. Many topics don’t work well when considered only in pieces.

A well-known example of a topic requiring a holistic view is security. You can fix individual parts of a system’s configuration, but unless you consider them all as a whole you’ll probably leave holes.

Another example is performance tuning. Changing individual parameters of a system in isolation is like tuning a guitar by changing each string without comparing it to the other strings; you won’t get anywhere.

What are the strengths and weaknesses of different solutions?

Like other questions, this one rises above the consideration of an individual topic and involves a comparison of several.

What am I responsible for once I adopt the technology?

This is particularly interesting, because people who take on a new technology find themselves in a role that may require unexpected tasks. For instance, suppose you get a book on putting up a web site for your organization. You may become responsible for a number of related, critical tasks, such as security and maintaining the web pages of that organization, which in turn may lead to running other software such as content management systems.

How do I lay the groundwork for flexibility and reliability as my system grows?

For instance, how do I write maintainable code? How do I design a system that can grow as my organization grows?

The Beginnings of a Merger

A Step Toward the Future: The Safari Bookshelf

The Safari Bookshelf is currently O’Reilly Media’s main venture into the new world of online and dynamic documentation. A subscription service offering books in HTML, it was launched in July 2001 and soon became profitable. A number of other publishers, listed on the web site, have joined. And there is little else like it in professional documentation. A few services offer books in electronic format, but the Safari Bookshelf offers a unique combination of several elements that make it particularly valuable:

It is in HTML, making it easy to view under many different conditions, to search text, and to copy examples from.
It focuses on computer and related technical documentation.
It includes a sophisticated search interface with fields for title, author, publisher, and so forth.
Most importantly, it offers the same high-quality, professionally written and edited text as the books from which it is drawn.

But O’Reilly Media, and in particular the developers of the Safari Bookshelf, know it is not the be-all and end-all of online documentation. Because its material is identical to the printed books, it is not as useful online as a document that is designed from square one as an online document.

The online versions certainly take advantage of the medium in some simple ways, such as turning references to other parts of the document or other documents on the Internet into links. We have gradually added enhancements over time to further exploit the online medium. For instance, annotations are now allowed--and you can let other readers see your annotations. This adds an element of community participation that we would like to increase.

There is lots of work to do on the Safari Bookshelf, and lots of potential. As we earn more money on it, we can invest more in it. Other publishers can push the process along by starting competing services or--better for everyone, in my opinion--joining the Safari Bookshelf and pushing us to speed up our development.

Potential Future Roles for Publishers

Given the pressure of online documentation, there are many indications that the publishing industry will evolve radically, and that publishers--like movie studios, music studios, newspapers, and other content providers--will have to find new business models over the next decade.

I evaluate the changes that publishers could make by dividing what we do into two parts: what happens before the book is published, and what happens after.

Pre-publication support for authors

This includes editing, layout, art, indexing, and technical review.

What may happen is a reversal of the current situation, where publishers take control of books and contract out to authors to write them. Instead authors may keep overall control and contract out to publishers for particular tasks. They may say, “I need figures” or “I need a proofread.”

Post-publication support for authors

This includes publicity and obtaining book reviews. Such tasks are increasingly performed by the user community--just look at all the online book reviews, which have a notable impact on sales--but publishers have a lot of expertise here and may well be appreciated as expert mediators.

Improving Community Documentation

In the final section of this article I suggest ways that project leaders and other members of software communities can improve the education they offer through online documentation and fora such as mailing lists. The topics are:

Urge active community participants to become formal contributors
Incorporate professionals into community documentation
Nurture new users; don’t repel them
Point people to documents, both professional and community-based
Enhance rating systems
Ancillary Failings of User Education

Urge active community participants to become formal contributors

Who shows a tendency to post a lot, with insight?: If you run a software project, or maintain a forum such as a mailing list for that project, stay on the look-out for people who post intelligent responses to questions and seem interested in writing up what they know. Ask them whether they’d like to do something more lasting and substantial. (Publishers sometimes find authors that way.)
Find out what motivates each writer: Why are people posting answers to questions or writing web pages about topics? Some may be consultants or trainers looking for clients. Others may just love the technology and want to see it more widely adopted. You can use these motivations as leverage.
Need to offer rewards for writing: Motivations like these help, but it would be good to find money as well. This raises the question of professional involvement, which I’ll discuss later.
The Wiki: Wikis are community web pages edited collectively; people can add, delete, and change whatever they want (although changes can be logged to control malicious defacing). I can’t say much about Wikis because they are new, but several impressive projects--notably, in the case of computer documentation, the Linux wiki--show that they should play a role in the process of creating community documentation. Some sites collect information through a Wiki, reject what they don’t like, and organize what they do like into something more formal. But I don’t see how Wikis can reproduce the traits of good books I discussed earlier, such as pace.

Incorporate professionals into community documentation

Editing, design, pictures, indexing, etc. are expensive: It’s worth getting these services from people who do them thirty-five hours a week or more, and have done them continuously for many years.
Currently only a few rigid ways to profit from contributions: Unfortunately, unless one writes a Text, one has trouble getting remunerated for this work.
Distributed payment systems are still just thought experiments: There are many interesting proposals for systems where people contribute micropayments into online funds and some committee (perhaps elected) disburses them to worthy projects. But these have not progressed beyond the proposal stage.
Whittle down what is needed to the point where authors can afford professional help: As mentioned earlier, authors may take control of the writing process and bring in professionals sparingly. It’s often valuable for an author to consult with an editor (and a review committee) at the very beginning of the project, to determine what sort of background is needed and how to organize a document.
Sponsorship has precedents: Computer companies are investing an impressive amount of money in open source software. If documentation projects are well organized and produce good results, they should qualify for funding too. This may seem impossibly idealistic to people who are used to seeing documentation scorned and starved, but there are precedents for it. Many companies--especially during the dot-com boom--would allow their employees to buy books related to their jobs and submit expense reports. What is this, but an indirect subsidy of the publication process?

Nurture new users; don’t repel them

The community has to take responsibility for each member’s learning: Every user has to count. Just think: the people asking questions on your mailing list may be the ones you want to hire six months from now. (When I gave the talk, one audience member said something even scarier: the people asking questions on your mailing list may be the ones who hire you six months from now.) Undoubtedly, some will be ungrateful, some will be a time sink, some will never learn--but you must give everyone a chance until you know what he or she is like.
Dump RTFM from our ammunition bag: The person asking a familiar question may actually have read the manual. He may have read the README file, and the frequently asked questions list, and lots of other stuff--but just not realized that the answers in those things pertained to the question he had.
Encourage active learning in a more positive manner: It may indeed be necessary to encourage users to read more documentation, but it can be done in a respectful and supportive way. Some users need training in how to benefit from documentation.

Point people to documents

Many useful explanations are buried in newsgroups, etc.: The problem of structure, which I mentioned earlier, has to be addressed much more formally than the community has up to now. Sometimes a knowledgeable user posts a valuable piece of information--and it gets buried in an archive. Project leaders should recognize these valuable postings, extract them, and turn them into something such as web pages with a more robust presence.
Create flexible pathways through documentation: People would be very grateful to know what to read first. As a corpus of documentation builds up, someone could contribute a lot just by writing a web page that explains what type of audience each document is for, and what order they should be read in.
Make use of professionally developed documentation: I seem to be obsessed with this topic...
Volume of documents will be overwhelming: As mentioned earlier, web pages and online postings tend to stay up forever once they are created.
A guide to the guides: Some kind of portal may ultimately be the best way to provide documentation when there are so many sources and so many tiny contributions.

One project with a massive amount of contributed documentation, the Plone content management system, is trying a strategy of creating an outline for what the developers consider the ideal online documentation. The ultimate size could be thousands of pages if printed. It should help organize existing documentation and at the same time encourage users to write more, because they will know what is needed and how it fits into the whole.

Enhance rating systems

Let readers rank documents; collate their votes

This proposal is by far the most ambitious in this talk, to the point where we have no idea how to implement it. But if it becomes feasible, it can solve many of the problems in the previous section. While many specific rating systems exist--Slashdot, online book sites--these are hard to generalize into something that works for arbitrary collections of documents.

Admittedly, reputation doesn’t work well in the online world

First, it’s hard to motivate people to rate documents. In doing so, it’s even harder to avoid offering perverse incentives. Systems that “rate the raters” just beg the question at a higher level.

But reputation doesn’t work so well in the real world either

Everyone knows the experience of going to a concert or restaurant that was highly recommended and hating it. The world comes with a great big NO WARRANTY sign. So let’s try to implement online rating systems, and use them to the extent they are useful.

Willingness to tolerate bad advice varies with the subject matter

Computers provide an almost ideal subject matter for community advice-giving, because if somebody gives you bad advice you usually experience consequences no worse than throwing away the recommended file or rebooting your computer. Compare this to the needs of one audience member during my talk, who actually searched online for advice about how to fix her car. Luckily, because she found the advice on the manufacturer’s web site, she was confident it would not be dangerous.

I have also received reports that bad advice can mess up a computer system to the point where it’s almost impossible to recover. One person who told me a story of this nature said in anger and frustration, “Writers should be held responsible for what they write.” I am not comfortable with the constraints on free speech this implies. But the issue highlights the value of ratings.

Ancillary Failings of User Education

The points in the previous sections were intrinsic, I think, to the process of community documentation. A few other failings are less inherent to the process but are important and should be noted.

English favored: While lots of online fora and documents exist in other languages, people wanting information on computer use generally still have to know English to get the latest, fullest, and best information. Many projects (such as the Linux Documentation Project, the Free Software Foundation, and the GNOME and KDE desktop projects) are working hard to translate documentation and create alternatives in many languages.
Cultural differences not respected: The previous failing was just a specific instance of a broader problem. There is an implicit culture in most online groups. People discussing computer topics tend to expect that readers will understand the concepts used, will ask questions when they need help, and will stand up for themselves when criticized. But many people have more hierarchical expectations; they may need hand-holding or want to find out who is considered the local authority. They may merely want people to be polite! They should not have to give up their cultural norms just to get information.
Different learning styles not respected: The percentage of drop-outs in the traditional K-12 educational system has dropped drastically over the past few decades, even though money is very limited in the school systems. The reason for their success in this area is that the field understands much better than in the past how people have different learning styles. In the computer field, we too can do better.
Gaps, haphazard coverage: Whenever one depends on contributions, one inevitably will end up with a glut in some topics and a deficit in others.

Key Points

Think of ourselves as a community

We are all responsible for educating each other.

Leverage what we have to offer through better organization and rating

Create portals, show people what to read and what order to read it in, and indicate which documents are the most popular.

Make use of professionals

Professionals offer many advantages, once projects find the money to pay them. Try to make use of:

editing, design, and other skills
existing high-quality documentation such as the Safari Bookshelf

Encourage new users and respect diversity

They represent our future.


This work is licensed under a Creative Commons Attribution 4.0 International License.
