From Open Source Software to Open Culture: Three Misunderstandings

January 28, 2009

The open source movement started with free software. But the astonishing popularity in the 1990s of high-profile projects such as GNU/Linux, and the sudden attention that free software garnered as the critical engine behind the new Internet-based economy and culture, detached open source from its computing origins and thrust it upon the greater society as a meme and a model for running businesses, conducting research, delivering education, and even developing government policy. A plethora of terms revolving around the open source ideal—peer production, crowdsourcing, the wisdom of crowds, prosumerism—drive the most exciting projects in these areas.

Open source is powerful and effective in all these domains. But the original practice and promise of open source software (which for the sake of this article can be considered the same as free software) is unique. The software experience cannot be ported—to use a computer programming term—whole-hog into other areas such as sharing songs or organizing public forums.

It’s worth looking at what goes into creating open source software, and what unique traits of software make the open source process work well there. The open source model plays out differently in other fields. Its power may still carry the day, but for somewhat different reasons than those that created Linux and the mighty Internet utilities.

This article attempts to clear up common misunderstandings by explaining the following points:

Open source software is fundamentally about dependability, not participation

Thanks to the thousands of programmers who have flocked to the most popular free software projects, open source has been linked inextricably in the public’s mind to peer production. Openness and participation seem to go together.

One fallout is the big disappointment felt by earnest proponents of freedom when no one comes to write on their wiki, comment on their weblog, build new harmonies on their percussion tracks, or provide an ending to their story. Programmers could console them with the news that on most free software projects, too, nobody but the original software designer ever contributes anything. While thousands of projects benefit from strangers coming along and tossing a vegetable in the soup pot, the majority don’t enjoy such participation.

You can’t even say that group participation necessarily makes free software better. Who claims they could improve on the free software produced by Donald Knuth, author of the classic The Art of Computer Programming and creator of TeX?

But the point of free software is not to get contributions from others. When it happens, of course, it’s wonderful. But the point of free software is to give users the confidence that it will always be available and that it will keep working. Dependability is the key selling point.

My point is underlined by the famous incident that reportedly led Richard M. Stallman to start the free software movement. (Of course, free software existed from the moment the first computer code was written, but programmers did not consciously label it as a movement.) It supposedly all started with a poorly operating laser printer, an annoyance that most of us would shrug off. When Stallman found he couldn’t change the printer’s software, he changed history instead.

Stallman and other proponents of free software would argue with my claim here and say that the point of free software is (what else?) freedom. Yes, freedom is a wonderful thing, but we don’t want software to be free just so we can read it for fun (although many students learn a lot that way). We want it free so that we can fix the bugs, port it to new platforms, add features we need, strengthen its security, and improve its performance. Dependability is the goal toward which freedom leads.

A large number of free software projects succeed in creating useful, high-quality software used by many people, based on the work of just a small, tight-knit team.

So free software doesn’t have to involve participation. But most other forms of open culture do. What good is crowdsourcing when only you and your best friend offer an opinion? What benefit does a Creative Commons Share Alike license convey if the audience passively views your work in the same way they do a conventional copyrighted work?

Let’s look at the other half of the equation. Does mass participation require open source? Of course not. The slew of social networks such as MySpace and Facebook that have sprung up over the past few years keep a tight hold on everything you put there. (Facebook provides an API to retrieve data, but so do many other sites that are in no way open source.) People seem entirely willing to invest hard work and creative ideas into projects that are wholly owned by somebody else.

The idea of open source was engendered by a nearly unique quality of software: most large programs have two or more instantiations, only one of which is easy to read and modify. The executable program you run on your computer is usually machine code, unreadable except to those who want to hack it. (And there are many socially valuable reasons to hack it—I should say to reverse engineer it—although the public is usually aware of hacking only when they find their systems compromised by a malicious virus.)

So the version of the program from which your executable comes is the program’s source code, from which the term open source is derived. The free software movement is dedicated to ensuring that anyone with an executable program can obtain the source. And the movement fiercely battles tricks that some programmers use to provide the source in such a way that you can’t build a new executable.

Languages that dispense with the distinction between source and executable, such as BASIC, JavaScript, and Perl, are increasingly popular. You can view the source of any web page in your browser (although some functions might be hidden in inaccessible files). For code in such languages, the notions of openness and freedom reduce to the same issues as when a book is shared: they are merely legal statements about copyright, because the source code is already exposed.

The distinction between source and output exists for other forms of content as well, but the gap between them is usually less of a cause for concern. Hardware is usually easier to disassemble and categorize than software executables. Documents distributed in PDF format are harder to change than the source formats from which they came, but all the content you need is still in plain view.

Sometimes documents need open source as much as software does. The major reason is also the same: dependability. For instance, if a government office stores critical data in a closed format such as Microsoft Excel, it might not be able to retrieve that data several years in the future because the format can be read only by an obsolete software application that doesn’t run on current computer equipment. This risk impels the movement to adopt the Open Document Format standard, which governments in many US states and countries have pursued with varying degrees of success.

Open document formats, like free software, also permit innovative tools from the user community.
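Such tools are easy to build because an ODF file is an ordinary ZIP archive whose document body lives in content.xml. The sketch below is a minimal illustration, not a full ODF parser; the function name and sample file are hypothetical:

```python
import zipfile
import xml.etree.ElementTree as ET

def extract_text(odf_path):
    # An ODF document (.odt, .ods) is a ZIP archive; the body
    # lives in content.xml, readable by any XML parser -- no
    # proprietary application required.
    with zipfile.ZipFile(odf_path) as z:
        root = ET.fromstring(z.read("content.xml"))
    # Gather every text node, whatever the namespace prefixes are.
    return "".join(root.itertext())
```

A government archivist could run a script like this decades from now, on any platform, without the application that produced the file.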

So open source has great potential for peer production. But open source software isn’t fundamentally impelled by it. Open source efforts in most other areas take it as a given. Without participation, a declaration of openness in these areas only creates a potential for reaping benefits. The potential energy of openness goes into motion when other people come.

Open source software is a compendium—not a blend—of contributions

Open source trends in culture and public policy start with the premise that many far-flung participants, many of them with only modest amounts of skill and education, can produce a better result than a tight coterie of experts. The magic behind this axiom was researched by James Surowiecki for his widely cited book The Wisdom of Crowds. The research is compelling, but the vision for participation it lays out is quite different from what happens on open source software projects.

The canonical example of crowdsourcing, which opens The Wisdom of Crowds, is a country fair where attendees guessed the weight of an ox. The actual weight was one pound away from the average of all guesses. This incident was not a fluke, but representative of a principle that has led to prediction markets in all sorts of areas.

Prediction markets merge all contributions into a single result, submerging the individuality of each contributor. There are other forms of crowdsourcing, but most adhere to the same principle: processing the input to produce a unified outcome. For instance, public policy is always a compromise among many positions, and nobody gets exactly what he wants (so long as the process is really neutral).
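At its simplest, the ox-weighing kind of merge is just averaging. The sketch below (with made-up guesses) shows how every individual contribution vanishes into the single output:

```python
def crowd_estimate(guesses):
    # The merge step: one number comes out, and no individual
    # contribution is identifiable in the result.
    return sum(guesses) / len(guesses)

guesses = [1050, 1180, 1230, 1245, 1295]  # hypothetical ox-weight guesses
estimate = crowd_estimate(guesses)        # a single blended number
```

Contrast this with a version-control archive, where every line of code is attributed to the person who wrote it.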

Free software is developed very differently: the individuality of each contributor is preserved. Yes, many people can make suggestions and fix bugs. But the nitty-gritty of coding is entrusted to individuals, and the identity of each contribution is clearly preserved in the code archive.

The Linux kernel project actually leaves the rights to each code contribution in the contributor’s hands, which has serious ramifications. If the managers of the project ever decided they wanted to switch to a different license, doing so would be completely unfeasible because they’d have to find each developer and get his assent. (They currently use the GNU GPL version 2, and are happy with it even though version 3 was released. But they’d be stuck if their decision went differently.)

So a wide range of contributors can play a role in creating free software, but the result is not a blend. And you don’t want amateurs playing around in the source code, for the reasons of dependability laid out in the previous section. Open source software projects are not like wikis, allowing everyone in. Changes are tightly controlled by a set of core developers, who use elaborate voting procedures (or fiat from the lead developer) to admit new members.

For quality control, new submissions may have to meet certain technical requirements, such as passing a barrage of tests. But these are secondary; the key to getting a submission into the repository is intense scrutiny by lead developers.

Not everyone waits for approval. It’s quite common for someone to make a copy of a free software project and insert his own changes into the copy. You can be sure, however, that he keeps just as strict control over his copy as the other developers do over the original.

To some extent, contributions blend in open source software. Different developers touch it and improve sections of code. But a developer often decides to replace a whole file or set of files—not to blend in his changes, but to start over from scratch.

This tendency may actually be one of the great hidden strengths of free software. Commercial firms and conventional software departments have great difficulty upgrading old code. It takes a huge commitment of programmer time (that is, money that departments probably didn’t put in their budgets) and risks disrupting the users. This inertia was the reason for the Y2K crisis, when thousands of firms had to rush to upgrade or replace COBOL code they had left around for thirty or forty years.

In contrast, open source projects attract volunteers who are interested in showing off their skills (as well as helping their end users) and are willing to carve out enough time to do whatever it takes. One often finds two or more developers doing entirely new projects with the goal of replacing a part of the system. One code line may ultimately win out, the developers may eventually combine their efforts, or multiple alternatives may be incorporated to give the end-users a choice.

These coding cowboys do face constraints. If other parts of the system rely on a module, for instance, to store information about the states of ongoing transactions, the new programmer still has to store all that information and serve it up through an interface. But she can completely rearrange what goes on inside the module, and perhaps fix a bug that was intractable before or speed up operation a hundred times.
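The constraint amounts to an interface contract. In this illustrative Python sketch (all names hypothetical), a from-scratch rewrite reorganizes a transaction store's internals completely while callers see exactly the same save/lookup interface:

```python
import bisect

class DictTransactionStore:
    """Original module: transaction states kept in a plain dict."""
    def __init__(self):
        self._states = {}
    def save(self, txn_id, state):
        self._states[txn_id] = state
    def lookup(self, txn_id):
        return self._states.get(txn_id)

class SortedTransactionStore:
    """The rewrite: internals rearranged into sorted parallel
    lists (say, to allow range scans), but save() and lookup()
    behave identically, so the rest of the system never notices."""
    def __init__(self):
        self._ids, self._states = [], []
    def save(self, txn_id, state):
        i = bisect.bisect_left(self._ids, txn_id)
        if i < len(self._ids) and self._ids[i] == txn_id:
            self._states[i] = state        # update existing entry
        else:
            self._ids.insert(i, txn_id)    # insert, keeping order
            self._states.insert(i, state)
    def lookup(self, txn_id):
        i = bisect.bisect_left(self._ids, txn_id)
        if i < len(self._ids) and self._ids[i] == txn_id:
            return self._states[i]
        return None
```

Either class can be dropped into the system; the interface, not the implementation, is what the rest of the code depends on.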

If open source projects develop the kind of inertia that in-house projects are known for, someone eventually gets fed up and starts an entirely new project. And new projects routinely replace old ones as users discover that the new ones outstrip the old ones in stability or performance.

One of the success factors in crowdsourcing (recognized by Surowiecki and many other authors) is a degree of separation between contributors—a somewhat counter-intuitive barrier between people working on independent solutions. The habit of free software developers to go off and code up something new helps preserve this factor.

Outside of software, many projects try to recognize individual efforts. A wiki records the authors of a document, and a government feedback site may indicate who made a suggestion. But documents tend to end up as blends, and so do government policies. That’s the strength of the open process in cultural artifacts and policy. Blending has some relevance in software, too, but the balance shifts toward a combination of discrete contributions from individuals.

Music mash-ups sometimes also maintain the individuality of the constituent parts. The ability to recognize a riff, drum beat, or lyrics from another song, along with the connotations they bring, keeps these mash-ups somewhat of a compendium as well as a blend.

Open processes won’t automatically uncover weaknesses and errors

Perhaps the most misleading concept I need to address in this article is the notion that error-checking is radically enhanced by spreading document review among a large number of people. This idea probably originated in the famous line “Given enough eyeballs, all bugs are shallow,” which Eric Raymond coined in his essay The Cathedral and the Bazaar. The essay (along with two others) was published in book form by my employer, O’Reilly Media, and also appears online.

Raymond himself offers several explanations for his observation, some of which could apply to non-programming open projects and some that probably could not. But I think the basis for his observation lies in the concepts of test coverage and of a code path, a phenomenon unique to software. To understand it, you have to think like a programmer for a minute.

Suppose you are designing a contest, and employees of your company are not allowed to participate. If an applicant signs up for the contest, your program is responsible for checking their information (which I’ll call “applicant_info”) against a list of company employees. The code is structured like this:

if ( found_in_employee_list(applicant_info) )
  {
    [Code path #1: show the applicant a rejection message]
  }
else
  {
    [Code path #2: register the applicant]
  }

In other words, if the applicant is an employee, the program executes code path #1 and does not execute code path #2. The reverse is true if the applicant is not an employee.
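For readers who want to run the sketch, here is a minimal Python rendering. The helper found_in_employee_list and the sample data are illustrative only:

```python
EMPLOYEES = {"alice@example.com", "carol@example.com"}

def found_in_employee_list(applicant_info):
    # Stand-in for a real lookup against the company directory
    return applicant_info in EMPLOYEES

def process_application(applicant_info):
    if found_in_employee_list(applicant_info):
        # Code path #1: show the applicant a rejection message
        return "sorry, employees may not enter"
    else:
        # Code path #2: register the applicant
        return "registered"
```

Each call exercises exactly one of the two paths; a tester who never supplies an employee address never runs path #1 at all.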

Sophisticated tools are available to help programmers figure out all the code paths through a program, but the number of paths grows exponentially as you add if blocks (and other control structures), so no one can cover the millions of possible paths in a typical production-ready program.
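The arithmetic behind that explosion is easy to sketch: each independent if/else doubles the count, so a modest number of branches already outruns any test suite. (The function below is an illustration of the combinatorics, not a real path-analysis tool.)

```python
def independent_path_count(n_branches):
    # Each independent if/else doubles the number of distinct
    # routes control can take through the program.
    return 2 ** n_branches

# Twenty independent branches already yield over a million paths:
paths = independent_path_count(20)  # 1,048,576
```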

Raymond’s “all bugs are shallow” claim applies when you release a program to a huge number of testers. (Proprietary code can also be released to a large public, but it rarely receives the kind of frequent and widely-tested releases that open source programs do.)

Suppose there’s a bug in code path #1. People outside the company will never invoke the bug because their applications don’t invoke code path #1. Testers may also skip that path for lack of time. But eventually, a company employee will test it and will trigger the bug.

Open source programs get a wider variety of testers, running all kinds of platforms and doing all sorts of things. So most code paths are invoked quickly and bugs are discovered. And suppose nobody happens to execute a code path? The bugs on that path won’t be discovered. But in that case, nobody cares. The bug has no effect, and can stay in the code forever without hurting anyone. Responsible programmers consider the code path to be dead weight, best removed to cut down program size. But the main thing to worry about in an unused code path is a flaw that allows a security breach—all too common in untested code—which would sit like a land mine waiting to be triggered by a malicious user.
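To see which paths a test run actually exercised, a programmer can record executed lines. Here is a minimal sketch using Python's sys.settrace hook (real projects use dedicated coverage tools; check_applicant and the line-recording scheme are illustrative):

```python
import sys

def run_with_coverage(func, *args):
    """Run func and record which of its lines actually executed."""
    executed = set()

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            executed.add(frame.f_lineno)
        return tracer  # keep tracing line events in this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, executed

def check_applicant(is_employee):
    if is_employee:
        return "rejected"    # path #1
    else:
        return "registered"  # path #2
```

A test suite that only ever passes False records a different set of executed lines than one that also passes True, exposing path #1 as untested.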

The process offers no absolute guarantee, obviously. The Domain Name System, which has multiple open source implementations, contained a design flaw from the start that no one discovered (so far as we know) until researcher Dan Kaminsky figured it out this year. And there are many other types of bugs that don’t depend on code paths: resource limitations, problems with concurrent operations that try to modify the same location in memory, poorly designed displays, and so on. But we’d eliminate a lot of bugs if test coverage in every program were absolutely complete.

The documents exchanged on open culture and policy projects don’t adhere to the same principle. And passing them around to a thousand casual readers won’t exercise them the way passing software around to a thousand casual users will exercise the software.

Open inspection is still a good idea. Somebody may find an error by chance. You can rest assured that I showed this article to a number of colleagues before publishing it. But inspection is a good idea mostly because it can elicit objections and new ideas from representatives of different populations and constituencies. This is diversity in the same sense as the groups of users that test different code paths in software, but it’s more a basis for discussion than an ironing out of errors.

It’s amazing that an idea arising in one narrow field—open source—can inspire so many variations in other areas. But we can’t blindly apply all the lessons of the free software movement to other domains. Many attributes in these domains that proponents commonly compare to open source could easily go by more traditional terms, such as transparency, public review, or (here’s a fresh and challenging idea) honesty.

And when we speak of the wisdom of crowds, we should remember that it’s central to many initiatives in business, cultural production, and public policy, but better understood in the field of software as an option that enhances the fundamental power of open source.


Andy Oram is an editor at O’Reilly Media. This article represents his views only.
