Patterns of Governance in Open Source

Steven Weber

The hardest problem facing a political community is how to increase the probability that the whole will be greater than, not less than, the sum of its parts. People join together voluntarily to solve problems because they believe that the group can do things that an individual cannot. They also believe (in some abstract sense and often implicitly) that the costs of organizing the group and holding it together will be smaller than the benefits the group gains. If you put aside for the moment the affective and emotional needs that individuals satisfy in groups and focus instead on the part of politics that is about problem solving, the bet that people make when they enter a political community is simply that "none of us is as smart as all of us"—maybe not on any particular issue or at any particular moment, but on the vast set of problems that human beings confront and try to manage over time.

It doesn't have to work out that way. Everyone has been part of a community or a company where the whole is less smart than the individuals who comprise it. Political systems often seem to suffocate under their own organizational costs—not just national governments, but smaller systems like city councils and co-op boards. And even if a community does create net benefits for at least some segment of the group, the distribution of those benefits can be so grossly unequal that most of the community members would be better off on their own. Get the balance wrong, and you can easily create situations where no one is as dumb as all of us.

These are very old problems confronting political thinkers. The rise of the Internet adds a small but significant twist by making it much easier to discover potential collaborators and pull together "affinity groups," networks of subcontractors, outsourced component makers for production systems, and the like—all of which are political communities that aim to solve some kind of problem. The core idea is joint production at a distance, the opening up of a universe of collaborative projects in which physically separated individuals contribute to the creation and refinement of a solution. Because the opportunities for creating collaborative communities have been expanded greatly by Internet technology, the boundaries and borders of existing communities are open for redefinition, and the possibility for new communities seems vast.

Which means that the stakes are high for getting it "right," or at least getting it right enough. This, I believe, is where some of the most important lessons of open source collaboration are likely to emerge. This chapter poses the question this way: if patterns of collaboration within open source communities were to become surprisingly pervasive, or pervasive in surprising places, what would this suggest about institutional design for communities of knowledge and practice in politics, outside of the realm of software or even technology per se?

Answering that question takes at least four steps. I first bound the question by limiting the argument to the class of problems most likely to be susceptible to open source-style principles. I then describe more precisely some of the theoretical issues at stake in group problem solving. The third section of the chapter lays out seven design issues that follow from the experience of what works (and does not work) within open source communities. The final section suggests some actionable implications: if we view the politics of problem solving through this kind of prism, what might or should we do differently?


The Empirical Problem Set: What Are We Aiming At?

Consider this proposition: some significant subset of social problems that communities confront are (or can be) structured as knowledge creation and/or problem-solving domains similar to the "problems" that the open source software community has found innovative ways to "solve." It would follow that the tools and governance principles of the open source software community, in some modified form, could yield new approaches to community organization and problem solving that build on but go beyond what is currently known about traditional institutions of formal government as well as the more informal notions of "civil society" and "communities of practice."

I think the proposition is defensible at least for a class of complex social problems that have three characteristics. The problems we are thinking about should be multi-dimensional in the sense that they call on several different realms of expertise. They should be large in scope, in the sense that they require some kind of division of labor to make progress. And they should be complex in their essence, not just in their implementation. I mean here problems that are substantively and inherently difficult to solve, not difficult only because of the failure of well-understood social or political processes to yield optimal outcomes. An example: if you want to build a new 100-story office building in Manhattan, you will need to pull together many different realms of expertise and organize a rather sophisticated division of labor. Some of the problems you confront will be idiosyncratic social and political issues—the metalworkers will strike because they know you really need them today, the neighbors will complain about the noise, and deliveries of certain materials will get "held up unexpectedly" somewhere until you pay a friendly fee to the person who can "fix" that problem. But even if you had the magic solution to all these issues, it would still be hard to create this building simply because it is a difficult engineering task to put together a mountain of steel, concrete, and glass that will hang together and stand up to wind, rain, gravity, and use over all the years that it will be there. It would still cost more and take longer than expected.

The analogy to software development should be obvious. Complex software is hard to build because it is multidimensional, because it demands a division of labor, and because the problems it is trying to solve are inherently hard. One of the earliest and still best analyses of complex software development projects is Frederick Brooks' The Mythical Man-Month.[1] Brooks' Law captures one of the fundamental conclusions of that assessment: "Programming work performed increases in direct proportion to the number of programmers (N), but the complexity of a project increases by the square of the number of programmers (N²)." Brooks' Law, even if it is not precisely verifiable, is a powerful statement of the software engineering manifestation of a repeatedly observed problem: it is hard to build complex systems in considerable measure because it is so hard for people to explain to each other what they are trying to do. Brooks' argument boils down to this simple but profound claim: human communication about complex, often tacit goals and objectives is imperfect. And it gets more imperfect, and at an increasing rate, as it travels among larger numbers of people. So, how do we ever get a functioning division of labor at large scale to do things like build a New York skyscraper, or write a program with a million lines of code?
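To make the arithmetic behind that quoted claim concrete, here is a small illustrative calculation (my sketch of the logic, not Brooks' own formulation): nominal capacity grows linearly with the number of programmers, while the pairwise communication channels that have to be kept consistent grow as n(n-1)/2.

  # Illustrative arithmetic only: added hands raise nominal capacity linearly,
  # while the pairwise communication channels that must stay consistent grow
  # roughly with the square of the team size.
  def channels(n: int) -> int:
      """Pairwise communication channels in a team of n people: n(n-1)/2."""
      return n * (n - 1) // 2

  print(f"{'team size':>9}  {'nominal capacity':>16}  {'channels':>8}")
  for n in (2, 5, 10, 50, 100):
      print(f"{n:>9}  {n:>16}  {channels(n):>8}")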

One way to manage this dilemma is to enclose the production process within a formal organization—for example, a proprietary software company. The ideal-type principles of organization here are command and control authority, hierarchical structure for decision making, and tight governance of principal-agent problems. Sustaining that kind of organization depends on maintaining control over the essential resources in the production process. In the software world, that means keeping source code secret. The open source community, by releasing source code, undermines the possibility of setting up the production system in the same way and energizes a quite different organizational model.

No one should bet on anything like a wholesale transfer of the organizational model(s) from the open source community to the nonsoftware world; that is too simplistic. What I think we should focus on instead is the means by which the open source community processes, collates, upgrades, corrects, distributes, and implements problem-solving information. In other words, think of open source as a particular kind of information processing algorithm. (It then makes sense to treat the related issues of intellectual property rights and organizational structures that are typically seen as core to the open source community as instrumental, not foundational.) What is foundational to transfer is the information processing "system" that is enacted in this community, and how the results of that process are incorporated into real solutions to practical problems.

It may seem quixotic to think about complex social problem solving in political communities as an information processing challenge. After all, we know that innovation in this setting traditionally is slow, constrained, inefficient, and frustrating. And we know, from the work of Max Weber and Joseph Schumpeter and extending into modern public choice theory in political science and management theory in business, some of the reasons why that is the case, in particular the organizational disincentives and cultural impediments to change that are inherent parts of bureaucratic culture and institutions.[2]

Clearly there are a lot of things going on in political communities besides poor information processing. But any experiment, even a thought experiment, has to start somewhere. The proposition here is that information processing is a significant impediment to problem solving in some important political situations—and that, if we can define a set of problem domains that fit this description, we can do something interesting by attacking the nature of the information processing problem first and treating the organizational structure and the political problem as secondary to that. In other words, design the governance institutions in ways that facilitate the information paths we think will work, rather than the other way around. This is worth experimenting with in part precisely because it is the reverse of many conventional ways of thinking, and in part because we know more about the trade-offs associated with governance institutions than we do about the information processing issues.

In sum: think of the target as a set of problem-solving practices which necessarily include an information processing algorithm and the associated institutional structures and incentives that make that algorithm function in real-world settings. These practices will tap into distributed knowledge that in some cases may be present in geographically dispersed individuals or communities; in some cases may be present in separate pieces that have not been integrated into a single, useful whole; and in some cases may be implicit in relatively undefined or tacit practices that "belong" to individuals' experiences—but are for that very reason not available for use, testing, and refinement by larger groups. Primary care medicine is a good example. My doctor in Berkeley is often solving the same problems of diagnosis and treatment that a primary care doctor in Manhattan solved yesterday, but she has to re-create complex, tacit, and multifaceted knowledge that already exists elsewhere, because there is no structure within which that knowledge can be effectively shared. The bet you need to make to stay with me in this chapter is simply that an important subset of social and political problems fits in this category and might be attacked in this way.

The Theoretical Problem: How Is Knowledge Distributed?

Contemporary literature on "communities of practice" takes off from a very similar bet.[3] This literature offers a set of relatively obvious but useful design principles that appear to contribute to success. None of these principles really is well enough specified to be operational, but they are clearly worth keeping in mind as a checklist against which any system design can be compared. Roughly, they are:

  • Design for evolution (allow the community to change).
  • Open a dialog between inside and outside perspectives (tightly insulated communities tend to corrode).
  • Allow for different and bursty levels of participation (different people will participate at different levels, and any single person will participate at different levels over time).
  • Preserve both public and private community spaces (not all community interactions are public; backchannels should be available).
  • Focus on the value that is created for the people in the community.
  • Mix the familiar and the new.
  • Facilitate the creation of a rhythm (pure burstiness and unpredictability tend to corrode commitment).

These design principles actually presuppose quite a lot about the nature of the knowledge that the community of practice is trying to generate, organize, and share. I want to parse out some of the assumptions about that knowledge and some of the different ways it may be embedded in communities to illustrate this point.

Consider again the common saying "none of us is as smart as all of us." The operative assumption is that each one of us has bits and pieces of "good" (useful) knowledge and "bad" (wrong, irrelevant, or mistaken) knowledge about a problem. If Frederick Brooks was even partially right about the social dynamics of complex reasoning (and I think he was right), the demonstrated success of the open source process cannot simply depend on getting more people or even the "right" people to contribute to a project. It depends, crucially, on how those people communicate information to each other. Put differently, depending upon how the community selects, recombines, and iteratively moves information forward over time, the collectivity will become either very smart or very stupid.

I am just saying explicitly here what Eric Raymond implied about the open source process. It is not simply that "given enough eyeballs, all bugs are shallow." It depends directly on how those eyeballs are organized. And since I am treating organization as an outcome of what kind of information processing algorithm the community needs, getting to operational design principles means understanding at least two aspects better: how knowledge is distributed in the community, and what error correction mechanisms can be applied to that knowledge. In simpler language: who knows what, and how do you fix the mistakes?
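To see how much the organization of those eyeballs matters, consider a toy simulation (entirely my own sketch, not something drawn from the open source record): the same twenty-five contributors, each right 70 percent of the time, form a very smart group when their judgments are gathered independently and aggregated by majority vote, and a group no smarter than any one member when each contributor simply copies the emerging consensus.

  # Toy illustration: identical contributors, two ways of organizing them.
  import random

  TRUTH = 1          # the "right answer" the community is trying to reach
  P_CORRECT = 0.7    # each individual's chance of judging correctly on their own
  N = 25             # number of contributors
  TRIALS = 10_000

  def private_signal() -> int:
      return TRUTH if random.random() < P_CORRECT else 1 - TRUTH

  def independent_majority() -> int:
      # Everyone reports their own judgment; the group takes a majority vote.
      votes = [private_signal() for _ in range(N)]
      return int(sum(votes) > N / 2)

  def copy_the_consensus() -> int:
      # Contributors report in sequence; each copies the current majority of
      # earlier reports (falling back to their own judgment only on a tie),
      # so an early mistake propagates through the whole group.
      reports = [private_signal()]
      for _ in range(N - 1):
          ones = sum(reports)
          zeros = len(reports) - ones
          if ones > zeros:
              reports.append(1)
          elif zeros > ones:
              reports.append(0)
          else:
              reports.append(private_signal())
      return int(sum(reports) > N / 2)

  random.seed(0)
  print("independent judgments, majority vote:",
        sum(independent_majority() == TRUTH for _ in range(TRIALS)) / TRIALS)
  print("sequential copying of the consensus: ",
        sum(copy_the_consensus() == TRUTH for _ in range(TRIALS)) / TRIALS)

Same people, same individual accuracy; what differs is only the path the information travels.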

We know from both intuition and experience that much of what a group needs to "know" to do something is in fact coded in the experiences, tacit knowledge, implicit theories, and data accessible to individuals. The problem for the group is that these individuals often don't know how, aren't incentivized, or simply haven't thought to share it with others in a mutually beneficial way. We know also that there is noise in the signal. At best, the pieces of distributed knowledge that (if they could be brought together effectively) would make up a solution to a problem are floating around in a sea of irrelevant or incorrect "knowledge."

In a changing and uncertain environment, with strategic players who sometimes have economic incentives to mislead others, and a relatively low tolerance for cascading failures that hurt human lives, the law of large numbers won't solve this problem for us. That is a complicated way of saying that we can't afford to wait for evolutionary selection. Most of evolution is wasted resources. It is extremely inefficient and slow, destroys enormous amounts of information (and protoplasm), and can't backtrack effectively. No one wants this for human systems and it's not clear that we should tolerate it. We need an engineered system.

We also know that this is a very tall hurdle to get over. Large firms commit huge resources to knowledge management, and with very few exceptions (Xerox's Eureka project is notable here) these investments underperform. These systems fail in a number of distinct ways. The most common (and probably the most frustrating) is simply that nobody uses the system, or not enough people use it to generate sufficient interest. More troubling is the failure mode in which the "wrong" people use the system—people with good intentions who happen to have bad information, or people who might be trying to game the system or intentionally insert bad information to advantage themselves over others in a manner that is either cynical or strategic, depending on how you look at it.

There are other potential failure modes, but the point is to recognize that there is no inherent ratchet-up mechanism for knowledge management. The system could deteriorate over time in several ways. People could share mistakes with each other and scale them up. People could reuse past experiences which are seen as successful in the short term or by particular individuals, but actually are failures overall from the long-term perspective of the community. You could attract the wrong "experts" into your network, or perhaps more likely use experts for the wrong purpose. And you could populate a database with garbage and produce multiplying wastes of effort and cascading failures of behavior. All of us have worked in organizations or communities that have suffered from knowledge management failures of at least one of these types.

But put the community in the background for a moment, and consider the problem from a microperspective by imagining that you are a person searching for a solution to a problem within that community. Now, how knowledge is distributed directly affects the search problem that you face. There are at least three possibilities here.

Case 1 is where you have a question, some other individual has the answer, and the problem for you is whether you can find that person and whether that person is interested in sharing with you what she knows. Case 2 is where no single person has the answer to your question; instead, pieces of the answer are known by or embedded in many people's experiences. The relevant bits of information float in a sea of irrelevant information; your problem is to separate out the bits of signal from the noise and recombine them into an answer. Case 3 is a search and discovery problem. Some of the knowledge that you need is floating around in disaggregated pieces (as in Case 2) but not all of it; you need to find and combine the pieces of what is known and then synthesize answers or add to that new knowledge from outside the community itself.

Here's where your dilemma gets deeper. You don't know at the outset whether you are facing Case 1, 2, or 3. And it matters for what kind of search algorithm you want the system to provide for you. For example, should you use a snowball method (go to the first node in the network and ask that node where to go next)? Or some kind of rational analysis rule? Or a random walk? Or maybe you should just talk to the people you trust.
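As a purely illustrative sketch of how much that choice matters in a Case 1 world, imagine a toy acquaintance network in which exactly one person holds the answer and everyone has only a rough, noisy sense of who is "closer" to the expertise. A snowball referral strategy reaches the expert in far fewer asks than random referrals (the network structure and all parameters here are invented for the example).

  # A toy acquaintance network: one person (the "expert") holds the answer,
  # and everyone else has only a noisy sense of who is closer to the expertise.
  import random

  random.seed(1)
  N = 200                # people arranged on a ring, purely for illustration
  EXPERT = 0             # the one person who actually has the answer (Case 1)

  def ring_distance(a: int, b: int) -> int:
      d = abs(a - b) % N
      return min(d, N - d)

  # Each person knows a few near neighbors plus two random long-range acquaintances.
  contacts = {}
  for i in range(N):
      near = [(i + d) % N for d in (-2, -1, 1, 2)]
      far = random.sample([j for j in range(N) if j != i and j not in near], 2)
      contacts[i] = near + far

  def guess_who_knows(person: int) -> float:
      # A rough, noisy guess about how far someone is from the expertise.
      return ring_distance(person, EXPERT) + random.gauss(0, 1.5)

  def snowball(start: int) -> int:
      """Ask someone; they refer you to whichever contact they guess knows most."""
      current, asked = start, 0
      while current != EXPERT:
          current = min(contacts[current], key=guess_who_knows)
          asked += 1
      return asked

  def random_referral(start: int) -> int:
      """Ask someone; they refer you to a random contact."""
      current, asked = start, 0
      while current != EXPERT:
          current = random.choice(contacts[current])
          asked += 1
      return asked

  starts = [random.randrange(1, N) for _ in range(300)]
  print("snowball referrals, mean people asked:", sum(snowball(s) for s in starts) / len(starts))
  print("random referrals, mean people asked:  ", sum(random_referral(s) for s in starts) / len(starts))

In Case 2 or Case 3, of course, there is no single expert to find, and the same referral strategies perform very differently, which is exactly why not knowing which case you are in is a problem.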

And now consider the dilemma from the perspective of the person trying to design the system to help you. She doesn't know whether you are an expert or a novice; or how entrepreneurial or creative you are; or what your tolerance will be for a low signal-to-noise ratio; or whether you can more easily tolerate false positives or false negatives.

The history of the open source community as it navigates some of these dilemmas, some of the time, suggests a big lesson: it's impossible to "get it right," and it's not sensible to try. What is more sensible is to parse the uncertainties more precisely so that we can design systems to be robust. More ambitious still is to design systems that can, to some degree, diagnose and adapt to those uncertainties as they interact with the community over time. A second big lesson of open source is the high value of being both explicit and transparent about the choices embedded in design principles. The next section incorporates both of these lessons into a set of seven design principles for a referee function, inspired by patterns of collaboration within open source communities, that just might make sense for a community of knowledge and practice in politics.

Design Principles for a Referee Function

Voluntarism is an important force in human affairs, and the open source software process would not work without it. But harnessing the efforts of volunteers is not enough to build a piece of software or, for that matter, anything else that is even moderately complex. As I've said elsewhere, the reason there is almost no collective poetry in the world is not that it is hard to get people to contribute words. Rather, it is that the voluntary contributions of words would not work together as a poem. They'd just be a jumble of words, the whole less than the sum of its parts.[4]

In my view, this implies that the bulk of social science research that tries to parse the motivations of open source developers, while interesting, basically aims at the wrong target. Noneconomic motivations (or at least motivations that are not narrowly defined by money in a direct sense) are a principal source of lots of human behavior, not a bizarre puzzle that requires some major theoretical innovation in social science. The harder and more interesting question is governance. Who organizes the contributions and according to what principles? Which "patches" get into the codebase and which do not? What choices are available to the people whose contributions are rejected?

The real puzzles lie in what I'll call the "referee function," the set of rules that govern how voluntary contributions work together over time.

In other words, what makes the open source process so interesting and important is not that it taps into voluntarist motivations per se, but rather, that it is evolving referee functions that channel those motivations, with considerable success, into a joint product and that it does so without relying on traditional forms of authority. No referee function is perfect, and among the variety of open source projects, we can see people experimenting with different permutations of rules. I believe I can generalize from that set of experiments to suggest seven discrete design issues that any referee system will have to grapple with. Certainly this is not a comprehensive list, and the seven principles I suggest are not sharply exclusive of each other. Each incorporates a tradeoff between different and sometimes competing values. And I am not proposing at this point where to find the "sweet spot" for any particular community or any particular problem-solving challenge; my goal is much more modest than that. The point here simply is to lay out more systematically what the relevant tradeoffs are so that experiments can explore the underlying issues that might cause groups to move or want to move the "levers" of these seven principles in one direction or another over time.

Weighting of Contributions

No problem-solving community is homogeneous (in fact, that's why it makes sense for individuals to combine forces). Not everyone is equally knowledgeable about a particular problem. Different people know different things. And they know them with different levels of accuracy or confidence. A referee system needs a means of weighting contributions that reflects these differences, so that when one piece of information conflicts with another, a more finely grained judgment can be made about how to resolve the conflict. Mass politics teaches us a great deal about bad ways to weight contributions (for example, by giving more credence to information coming from someone who is tall, or rich, or loud). One of the interesting insights from the open source process is the way in which relatively thin-bandwidth communication—such as email lists—strips out some of the social contextual factors in weighting that are ultimately dysfunctional. Tall, handsome men have a significant advantage in televised political debates, but not on an email list. Collaborative problem solving at a distance probably leans toward egalitarianism to start. But egalitarianism does not automatically resolve to meritocracy. The transparency of any weighting algorithm is both desirable and risky—desirable because it makes visible whose contributions carry weight and why; and risky because, well, for exactly the same reasons.
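One minimal way to operationalize this, offered purely as a hypothetical sketch (the names, numbers, and smoothing rule are all invented for illustration): weight each contributor by a publicly visible estimate of their track record, and resolve a conflict by the weighted balance of claims rather than by who is loudest.

  # Hypothetical sketch: weight conflicting claims by a visible track record,
  # not by social cues such as volume or status.
  from dataclasses import dataclass

  @dataclass
  class Contributor:
      name: str
      past_correct: int   # track record used to derive the weight
      past_total: int

      @property
      def weight(self) -> float:
          # Laplace-smoothed accuracy, so newcomers start near 0.5 rather than 0 or 1.
          return (self.past_correct + 1) / (self.past_total + 2)

  def resolve(claims) -> bool:
      """claims is a list of (Contributor, yes/no answer) pairs on a disputed point."""
      for contributor, answer in claims:          # transparency: show the weighting
          print(f"{contributor.name:10s} weight={contributor.weight:.2f} says {'yes' if answer else 'no'}")
      score = sum(c.weight if answer else -c.weight for c, answer in claims)
      return score > 0

  claims = [
      (Contributor("veteran", past_correct=45, past_total=50), True),
      (Contributor("newcomer", past_correct=1, past_total=2), False),
      (Contributor("loud", past_correct=3, past_total=10), False),
  ]
  print("claim accepted:", resolve(claims))

The transparency cuts both ways, exactly as noted above: everyone can see why the veteran's view prevailed, and everyone can also see exactly which track record to inflate if they want to game the weights.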

Evaluating the Contributor Versus Evaluating the Contribution

A piece of information can in principle be evaluated on its own terms, regardless of its source. But in practice it is often easier to (partially) prequalify information based on the reputation of the person who contributes it. Take this to an extreme—trusted people get a free ride and anything they say goes—and you risk creating a winner-takes-all dynamic that is open to abuse. But ignore it entirely and you give up a lot of potential efficiency—after all, there is almost certainly some relevant metadata about the quality of a piece of knowledge in both what we can know about the contribution and what we can know about the contributor. eBay strongly substitutes the reputation of the person (seller or buyer) for information about what is at stake in a particular transaction. I suspect that software patches submitted to Linux from well-known developers with excellent reputations are scrutinized somewhat less closely than patches from unknown contributors, but that's only a hypothesis or a hunch at this point. We don't really have a good measure of how large open source projects actually deal with this issue, and it would be a very useful thing to know if someone could develop a reasonable set of measurements.
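If that hunch about Linux patches is right, one could imagine a referee policy along these lines (a hypothetical sketch, not a description of how any real project reviews contributions): reputation buys lighter scrutiny, but never zero scrutiny.

  # Hypothetical policy: higher contributor reputation buys lighter, but never
  # zero, review of the contribution itself.
  def review_checks(reputation: float, base_checks: int = 5, min_checks: int = 1) -> int:
      """Number of independent checks a submission gets; reputation is in [0, 1]."""
      reputation = max(0.0, min(1.0, reputation))
      return max(min_checks, round(base_checks * (1.0 - reputation)))

  for label, rep in [("unknown contributor", 0.0), ("occasional contributor", 0.4), ("trusted regular", 0.95)]:
      print(f"{label:22s} reputation={rep:.2f} -> {review_checks(rep)} independent checks")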

Status Quo Versus Change Bias

The notion of a refereed repository, whether it is made up of software code or social rules or knowledge about how to solve particular problems, is inherently conservative. That is, once a piece of information has passed successfully through the referee function, it gains status that other information does not have. Yet we know that in much of human knowledge (individual and collective), the process of learning is in large part really a process of forgetting—in other words, leaving behind what we thought was correct, getting rid of information that had attained special status at one time. The design issue here is just how conservative a referee function should be, how protective of existing knowledge. There are at least two distinct parameters that bear on that: the nature of the community that produces the knowledge, and the nature of the environment in which that community is operating. Consider, for example, a traditional community that is culturally biased toward the status quo, perhaps because of an ingrained respect for authority. This community might benefit from a referee function that compensates with a bias toward change. If the community is living in a rapidly shifting environment, the case for a change bias is stronger still. The parameters could point in the other direction as well. Too much churn in a repository would rapidly reduce its practical usefulness, particularly in a problem environment that is relatively stable.
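The "how conservative" question can be reduced, in toy form, to a single tunable margin (again, my illustration rather than anything drawn from an existing project): an incumbent piece of refereed knowledge is replaced only when a challenger beats it by that margin, and the margin can even be set negative for a community that needs a deliberate bias toward change.

  # Toy conservatism knob: a positive margin protects the status quo, a
  # negative margin deliberately biases the repository toward change.
  def accept_replacement(incumbent_score: float, challenger_score: float, margin: float) -> bool:
      return challenger_score > incumbent_score + margin

  incumbent, challenger = 0.70, 0.74    # e.g., evidence scores from the referee process
  for margin in (0.10, 0.0, -0.10):
      verdict = accept_replacement(incumbent, challenger, margin)
      print(f"margin={margin:+.2f}: {'replace incumbent' if verdict else 'keep status quo'}")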

Timing

Separate from the issue of status quo versus change bias is the question of timing. How urgently should information be tested, refereed, and updated? The clear analogy in democratic electoral systems is to the question of how frequently to hold elections—which is obviously a separable question from whether incumbents have a significant electoral advantage. A major design consideration here follows from a sense of just how "bursty" input and contributions are likely to be. Will people contribute at a fairly regular rate, or will they tend to contribute in short, high-activity bursts followed by longer periods of quiet? We know from the open source process that contributors want to see their work incorporated in a timely fashion, but we also know that speeding up the clock makes increasing demands on the referee. This is probably one of the most difficult design tradeoffs because it is so closely tied to levels of human effort. And it's made harder by the possibility that there may be elements of reflexivity in it—that is, a more rapidly evolving system may elicit more frequent input, and vice versa.

Granularity of Knowledge

Modular design is a central part of open source software engineering. The question is where to draw the boundaries around a module. And that is almost certainly a more complicated question for social knowledge systems than it is for engineered software. No referee function can possibly be effective and efficient across every configuration of knowledge claims. And there is likely to be a significant tradeoff among the generality of information, the utility of information, and the ease and precision of evaluation. Put differently, rather general knowledge is often more difficult to evaluate, precisely because it makes broader claims about a problem, but it is also extremely useful across a range of issues and for many people if it is in fact valid. Highly granular and specific knowledge is often easier to evaluate, but it is often less immediately useful to as many people in as many different settings, precisely because it is specific and bounded in its applicability.

System Failure Mode

All systems, technical and political, will fail and should be expected to fail. In the early stages of design and experimental implementation, failures are likely to be frequent. At least some failures and probably most will present with a confusing mix of technical and social elements. How failures present themselves, to whom, and what the respective roles of systems designers, community members, and outsiders are at that moment, are critical design challenges. In Exit, Voice, and Loyalty, Albert Hirschman distinguished three categories of response to failure—you can leave for another community (exit), you can stick with it and remain loyal, or you can put in effort to reform the system (voice). One of the most striking features of the Linux experience is that this community, by empowering exit and more or less deriding loyalty, has had the effect of promoting the use of voice. It is precisely the outcome we want—a system that fails transparently in ways that incentivize voice rather than exit (which is often extremely costly in political systems) or loyalty (which is not a learning mode).

Security

How to design and implement security functions within a referee system depends sensitively on the assumptions we make about what the system needs to guard against. In other words, what level and style of opportunism or guile on the part of potential attackers or "gamers" do we believe we ought to plan for? This is simply a way of saying that no system can be made secure against all potential challenges. Security is always a tradeoff against other considerations, in particular ease of use, privacy, and openness. And security likely becomes a greater consideration as the value that the system provides rises over time. Hackers and crackers—whether benign or malicious in their intentions—are an important part of software ecologies precisely because they test the boundaries of security and force recognition of weaknesses. Can political communities be designed to tolerate (and benefit from) this kind of stress testing on a regular basis?

What Should We Do Differently?

Political ideas, like democratic experimentalism and distributed community problem solving, share some central characteristics with the open source software process.[5] In my view, the most interesting intersections lie in the configuration of referee functions—how any system decides that some "code" is better than others and should be incorporated into an interim package, how long it should stay there, how it can be removed or modified and by whom, how it ought to be configured to interface with other code, and what happens when the system breaks down. These are constitutional questions in a profound sense, in that they reflect on the constitutive elements that make up a community (even if they are not legally enshrined in something that people call a constitution).

Open source communities are tackling all of these problems, with varying degrees of self-consciousness. The patterns and practices of collaboration within open source communities are evolving rapidly, and that provides interesting experimental insights that can travel outside the software and technology worlds. The seven design principles I laid out earlier are not optimization functions. They are explications of trade-offs; understanding them is a prerequisite to smart experimentation. So, the first thing we should do differently, or more than we do at present, is to instrument some of these experiments and tighten up the feedback loops so that we learn more quickly and more precisely about what happens when you slide the levers of these seven design principles into different configurations.

The second thing we can do differently is to get precise and transparent about the overall goal of communities of knowledge and practice in politics. Open source communities provide a very good template for a broad statement of goals. I propose that when designing a system, you ensure the following:

  • The system has effective individual incentives, organizational structures, and information technology tools...
  • To pull together distributed knowledge within communities that are trying to solve practical problems...
  • By combining pieces of knowledge into something useful in a manner that...
  • Ensures that the rate of error correction exceeds the rate of error introduction as the system "learns" (a dynamic sketched in code after this list), while...
  • Maintaining the process over time in a sustainable, nonexploitable, and expandable way.
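To make the fourth point less abstract, here is a toy simulation (my sketch; the rates are invented) of the error level in a shared repository: when the referee process catches a healthy fraction of existing mistakes each round, the repository settles at a low error level; when correction lags behind introduction, the same community ends up carrying an order of magnitude more errors.

  # Toy dynamics of a shared repository: errors are introduced at a steady rate
  # and some fraction of existing errors is caught by the referee process each round.
  def repository_errors(intro_rate: float, correction_fraction: float, rounds: int = 50) -> float:
      errors = 10.0
      for _ in range(rounds):
          errors += intro_rate                      # new mistakes contributed this round
          errors -= correction_fraction * errors    # share of existing mistakes caught and removed
          errors = max(errors, 0.0)
      return errors

  print("correction outpaces introduction:", round(repository_errors(2.0, 0.30), 1))
  print("introduction outpaces correction:", round(repository_errors(2.0, 0.02), 1))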

The challenge and the opportunity here are highly general across political communities. And they are going to get more important in the future, particularly as the economics and demographics of advanced industrial countries continue to push many of the social welfare functions long provided by the public sector out of that sector. Some of these functions, of course, get moved into the private sector. And social scientists have learned a great deal in the last 20 years about the upsides and downsides of what is commonly called "privatization." We argue around the margins about how to engineer the transition, and we argue about the overall efficacy and desirability of the outcomes, but at a high level we do understand a fair amount about sensible governance principles and the tradeoffs they engender in the private sector setting.

We know much less about how to set up systems for moving some welfare provision functions into the civil society space, which is neither public sector nor private sector per se. Open source-style collaboration is, in a real sense, a form of technologically enabled civil society. And so the third thing we can do differently is to mine the experience of open source for lessons about how to create pragmatic, workable alternatives to privatization that can be implemented and can evolve within a developing civil society space.

Notes

  1. Frederick P. Brooks, The Mythical Man-Month: Essays on Software Engineering, 20th Anniversary Edition (Addison Wesley, 1995).
  2. Some classic readings are in Max Weber, Economy and Society (University of California Press, 1978); Essays in Sociology (Routledge & Kegan Paul, 1958); and Joseph Schumpeter, Capitalism, Socialism, and Democracy (Allen and Unwin, 1958).
  3. See, for example, Etienne Wenger, Communities of Practice: Learning, Meaning, and Identity (Cambridge University Press, 1999).
  4. Steven Weber, The Success of Open Source (Harvard University Press, 2004).
  5. See, for example, M.C. Dorf and Charles Sabel, "A Constitution of Democratic Experimentalism" (Columbia Law Review, 1998).