What is Open Source Software?
In this report, “open-source software” (and “OSS”) will serve as shorthand for the more inclusive term "free and open-source software." Defined very simply, we mean software that is developed in such a way that its source code is open and available online, and explicitly licensed as such. In their Guidebook, It Takes a Village: Open Source Software Sustainability, Arp & Forbes cite the Open Source Initiative’s definition—“software that can be freely accessed, shared, used, changed, and/or modified”—and argue that this “fits well with the missions of organizations dedicated to documenting, preserving, and providing access to cultural and scientific heritage.”
Various claims have been made over the past two decades about the benefits of open-source software. Some claim that OSS provides a less expensive alternative to commercial software; some claim that the quality of the end product and/or the code itself is superior to proprietary software; some hold that OSS has a better chance of thriving over the long term because it can outlive the end of its institutional host or its commercial usefulness. Some hold that in areas like science and scholarship, OSS is part of an ethical imperative to keep academic work open and in free circulation. Most of these virtues are articulated as positives—but there is also a powerful negative incentive to promote and adopt OSS: the fear of lock-in and, ultimately, dependency on a corporate vendor.
In our research, we noticed that OSS is discussed and rationalized more often in terms of its values than its actual practices, so it behooves us to dig into how projects actually practice open-source. Brian Fitzgerald in 2006 wrote of a significant shift in how open-source software projects were being considered and operated. Fitzgerald noted that the rise of successful open-source software (which he called “OSS 1.0”) was characterized by self-organized, Internet-based projects that gathered loose communities around sheer willingness to participate. Fitzgerald identified a newer mode, which he called “OSS 2.0,” characterized by “purposeful design” and institution-sponsored “vertical domains,” and much more likely to include paid developers. Fitzgerald’s distinction is relevant to our study, as most (but not all) of the projects considered here fit within his “OSS 2.0” pattern.
Joel West & Siobhán O’Mahony, in a 2008 article, “The Role of Participation Architecture in Growing Sponsored Open Source Communities,” made a further helpful distinction between openness as transparency and openness as accessibility. West & O’Mahony saw that institutionally sponsored projects often tended to limit accessibility, which they characterized as community members’ ability to make changes and participate fully in governance. Transparency, as a less radical virtue, meant that community members could merely see what design and development actions were being carried out.
All sponsors worked to achieve significant transparency in their open source communities, but sponsors varied considerably in the importance they placed on providing accessibility to external parties. This distinction provides a more nuanced understanding of the tension between openness and control.
In our landscape analysis, we saw projects as ranged along such an axis of accessibility. At one end were projects developed transparently, under an open-source license and with their code public in a GitHub repository, but which encouraged no significant contributions from outside the core team. At the other end of the spectrum were projects that put community accessibility and participation first and made a good deal of effort to encourage new contributors. Most projects fall somewhere between the two poles, but the tension between openness and control that West & O’Mahony identify is an active one for many of the projects we discuss in this report.
The tension exists naturally enough because the current landscape is shaped by a blend of individual business goals and a growing ecosystem awareness concerned with the health of the overall sector. This slow movement is at least in part related to the rise of the Open Access (OA) movement as an ecosystem-wide agenda. The idea that the publication and circulation of science and scholarship should not be controlled by profit-seeking corporations has led in recent years to a recognition that those corporations, while possibly ceding ground on OA itself, have an almost total lock on the technological infrastructure that runs scholarly communication and publishing. Geoff Bilder, Jennifer Lin, and Cameron Neylon, in a widely cited statement, put it bluntly:
Everything we have gained by opening content and data will be under threat if we allow the enclosure of scholarly infrastructures. We propose a set of principles by which Open Infrastructures to support the research community could be run and sustained.
Elsevier’s 2017 acquisition of bepress—an institutional repository system and company that was begun by faculty at the University of California, Berkeley—has proven to be a watershed moment in how many understand the scholarly communications ecosystem. Reporting on the bepress acquisition, Roger Schonfeld wrote:
In a move entirely consistent with its strategy to pivot beyond content licensing to preprints, analytics, workflow, and decision-support, Elsevier is now a major if not the foremost single player in the institutional repository landscape.
This moment gave an enormous boost to the idea of “community infrastructure.” SPARC’s executive director, Heather Joseph, wrote that the event “sent a shockwave through the library community.” There is no doubt that the fear of enclosure, in this case of infrastructure rather than the content itself, is a key motivator today.
The fear of enclosure is certainly not the only force driving open-source development. Many funding agencies require that software developed under a grant be released as OSS in order to keep the fruits of their funding from disappearing into some corporation’s vaults. There is also the hope, at least, of increased scale: a publisher or a library interested in developing a bespoke tool will find it difficult to justify the cost of development and maintenance if it will only ever be the sole user. For many, the idea of open source implies a shared deployment model that distributes, if not the cost, at least the value, across a larger community.
OJS: Modeling publishing operations and open-source sustainability
With its conceptual origins in the late 1990s, followed by a first release in 2002, the Public Knowledge Project’s Open Journal Systems (OJS) provides an early and lasting model for a community-supported open-source infrastructure project. OJS was released as a downloadable LAMP-based web application. It was adopted by a grassroots community of journal publishers and their (often institutional) supporters, one by one, until individual installations numbered in the thousands. Today, OJS is used by roughly ten thousand active journals around the world and as such represents the most widely used piece of open-source publishing software.
Sustaining OJS over so many years has been, and remains, a challenge. The PKP has looked to support itself via a blend of research and infrastructure grants, institutional subsidy, hosting and publishing-services revenues, and a large quantity of goodwill in its community. But OJS has survived (and even thrived) in an often dire-looking funding climate. It has survived because it addresses a very real need on the part of its large user base, one recognized not just by its users, but also by funding institutions, libraries, universities, and advocacy groups. OJS, having been originally conceived as a strategic intervention into the world of journal publishing, now shapes a significant portion of that world.
Less obviously, OJS has also succeeded in establishing a set of de facto standards for how a peer-reviewed journal should be run. By modeling the workflows and functions of a journal publisher in its software, OJS made explicit what was often implicit, or exchanged only in coterie groups. The result is that an entire generation of scholars has grown up with the OJS model. It now serves as a standard and a model for other projects, either as an exemplar to emulate, or as a point of departure for new design. As Chris Kelty pointed out to us, the pursuit of technical standards is also about standardizing practices; expert human labour is key to publishing.
Whether its longevity makes it the frontrunner in its class or ripe for replacement is a matter of opinion and perspective. We should not, however, underestimate OJS’s contributions to how people think about publishing—and publishing software. In a very real sense, OJS defined the space that this report now seeks to analyze.
OJS as one project among many
Despite OJS’s status as a kind of standard model against which other projects can be compared, there are many reasons why such comparison is of limited use. While the call today for community infrastructure may bring OJS to mind for many, there are many other projects that define their scope, goals, and approach differently from OJS and, for the most part, from one another. From a design point of view, OJS represents one possibility in a wide field of initiatives to create open-source publishing systems.
In such a large and varied landscape, developers and designers have carved out very distinct problem spaces and have different perspectives about which problems need to be solved and how exactly to go about solving them. Even among projects aiming to provide a full-stack journal-publishing platform, the aims and goals, and thus design decisions, vary widely enough that what we see is not so much competition over a particular niche as a proliferation of niches.
The result is a richly faceted landscape, but not one that lends itself to easy analysis. A proliferation of niches is both a boon and a curse. It is not, for instance, practical to “pick the winners” simply by looking at features and evident merits. Nor is it simple to connect the dots between manifest qualities like codebase, functionality, and governance and a project’s ultimate chances for sustainability or success, because the open-source publishing landscape is a dynamic ecosystem, where the component parts (projects, funding sources, standards, and labour) exist in relation to one another and influence one another. The landscape needs to be considered as a whole, and not just as the sum of its parts.
Defining scope
This report covers more than 50 projects, identified through a broad environmental scan conducted between July and December 2018. Those 50-plus projects rarely, if ever, neatly line up for comparison. They all define their own scope, goals, and measures of success. This presents challenges for our analysis, and in many cases limits us to cataloguing these projects as opposed to evaluating them against an objective measure or standard.
We defined the scope of our environmental scan mostly negatively—that is, in terms of what we decided would be out of scope for this analysis. Our approach excludes the following categories:
Closed-source. With few exceptions, the tools catalogued in this report have an accessible code repository (mostly on GitHub) and are released under an open-source license.
Cloud-based services. We excluded online-only publishing services that do not offer their underlying code up for adoption. There are myriad such cloud services, especially in the burgeoning ‘open science’ community. But for such data-centric projects, the ability to download and adopt or adapt the code oneself is beside the point, so we decided these projects would not be part of our analysis.
Research tools. There is a rapidly expanding genre of open-source software that supports workflow for researchers and labs. These tools are sometimes referred to as “research communications” tools. We excluded them, first, because we distinguish between research communication and publishing as traditionally defined. Second, and more pragmatically, there are simply too many of these projects; any of them might be useful in a publishing context, but for the most part they operate outside of one. We are reminded of Pluto, and the reasons why astronomers decided not to include it in the list of planets in our solar system.
Library infrastructure. We excluded digital library infrastructure such as Samvera, Islandora, and DuraSpace. These systems operate in an ecosystem of their own, and while they may in some cases underlie publishing software, their scope is outside this project.
Baling-wire DIY projects. There are innumerable ‘publishing solutions’ that involve gluing together one or more open-source tools with a handful of automation scripts (often using a conversion tool like Pandoc and leveraging GitHub as a content management system). While we applaud these efforts (and have built them ourselves!), such ad-hoc toolchains do not in themselves constitute OSS projects on the scale with which we are concerned here.
Dormant. We initially gathered but later culled a number of projects that have had active lives and communities but did not appear to be active in the past two or three years. While the code for these projects is still accessible, the lack of an active developer or a sustaining community suggests that supporters have moved on to other projects. In some cases, a project will have been explicitly superseded by another. In other cases, developers will have left a project behind to join another. The latter phenomenon hints at a possible lesson: the mere existence of open-source code without an active developer who cares about it is not worth much, in practice.
Arp, Laurie Gemmill, and Megan Forbes. “It Takes a Village: Open Source Software Sustainability,” LYRASIS, February 2018. https://doi.org/10.7916/D89G70BS
Fitzgerald, Brian. (2006). “The Transformation of Open Source Software.” MIS Quarterly, 30(3), 587. https://doi.org/10.2307/25148740
West, J., & O’Mahony, S. (2008). “The Role of Participation Architecture in Growing Sponsored Open Source Communities.” Industry and Innovation, 15(2), 145–168. https://doi.org/10.1080/13662710801970142
West & O’Mahony, “Participation Architecture,” 157.
Bilder, Geoff, Jennifer Lin, and Cameron Neylon. “Principles for Open Scholarly Infrastructures.” Science in the Open: The Online Home of Cameron Neylon (blog), February 2015. https://cameronneylon.net/blog/principles-for-open-scholarly-infrastructures/
Schonfeld, Roger C. “Elsevier Acquires Institutional Repository Provider bepress.” The Scholarly Kitchen (blog), August 2, 2017. https://scholarlykitchen.sspnet.org/2017/08/02/elsevier-acquires-bepress/
Joseph, Heather. “Securing Community-Controlled Infrastructure: SPARC’s Plan of Action.” College and Research Libraries News 79, no. 8 (August 2018). https://doi.org/10.5860/crln.79.8.426
LAMP stands for "Linux, Apache, MySQL, PHP," the stack of open-source tools that defined the first major wave of web platform software. WordPress, Drupal, and OJS were all designed around this stack.
See Maron, Nancy, “Understanding the Audience of the Public Knowledge Project’s Open Source Software.” BlueSky to BluePrint, March 2018. https://pkp.sfu.ca/findings-from-community-consultation-2018/
See also, for detail: https://pkp.sfu.ca/ojs/ojs-usage/ojs-map/
In his popular book, How I Killed Pluto and Why It Had It Coming (2010; Random House), astronomer Mike Brown tells the story of Pluto's 'demotion' from the status of a planet to a 'dwarf planet.' The discovery of Eris, and indeed, of thousands of such objects orbiting the sun, implied that either the number of planets in our solar system would grow astronomically (pun intended) or we would need a new, tighter definition of "planet." In 2006, astronomers chose the latter course.