Over the past two decades, a new breed of publishing infrastructure has emerged via open-source software. Where publishing toolchains had previously been almost entirely populated by proprietary and often bespoke software systems, we now see a proliferation of open-source projects available for adoption and integration, on a different economic and operational footing. Many such projects have been designed and developed by a single institution to suit its own particular needs, but the terms of open-source software licensing, deployment, and indeed governance mean these systems are also readily available to other institutions. At a more ambitious level, they may even form a layer of community infrastructure that rivals, or at least provides a functional alternative to, the commercial infrastructure run by a small number of for-profit entities.

That such a proliferation of open-source projects now exists is a boon, but the landscape is noisy and difficult to understand as a whole. There is no guidebook or map to this landscape, a problem the present report seeks to address. MIT Press, in its 2018 application to the Andrew W. Mellon Foundation, identified the need for a "comprehensive and critical analysis of OS publishing systems in active use" that "could prove to be durable alternatives to complex and costly proprietary services." The present report is the result of that research and analysis.

Our hope is that this report will provide the university press community and other mission-focused enterprises with both an overview of the open-source landscape and individual profiles of a good number of these projects. Our intention is to shed light on the development and deployment of open-source publishing technologies in order to aid institutions' and individuals' decision-making and project planning.

There is enormous value in the collection of open-source projects surveyed here in terms of raw functionality as well as in the ways that prototyping and the evolution of design materially change the ways in which we think about publishing and scholarly communications. At a more detailed level, this report seeks to encourage the adoption and continued development of these platforms, but also to encourage the development of the community and market environment that surrounds these efforts.

As such, while this report provides a catalogue of individual open-source publishing tools (see Part II), it also examines the ecosystem in which these tools and projects exist. If publishers are to develop or find robust, cost-beneficial alternatives to commercially available services and systems, it will not be simply because free tools exist; rather, it will depend greatly on community practices and the integration of various tools into a broader interoperable context. Community infrastructure is not just a collection of bits of technology, but a system in which these components can be mobilized to serve larger goals.

Method

Our team at the Canadian Institute for Studies in Publishing at Simon Fraser University began this landscape analysis in summer 2018 by assembling a master list of projects. We were helped by a number of existing lists of projects and initiatives that had been compiled by various colleagues (notably Terry Ehling, Kevin Hawkins, Peter Suber, Adam Hyde, the Radical OA Collective, and JROST). From this beginning we needed to filter the list. First, we selected projects that fit the scope of our project: available, documented open-source software relevant to scholarly publishing. Second, we sought to identify projects that were ‘still alive’, that is, with evidence of active development. This latter criterion is somewhat difficult to apply, because the Internet tends to flatten history: things from decades past appear alongside much more recent contributions. There is no telltale yellowing of web pages. Sifting old, dormant projects with vibrant-sounding websites from active projects that people still care about took considerable time and attention.

By mid-winter we had assembled a list of approximately 85 projects that appeared to be active. In the early months of 2019, we did a deeper dive into these projects, locating their code repositories (almost always on GitHub, with a handful using GitLab instead) and tracking down details of personnel, funding, and, especially, evidence of partners and collaborators. We travelled to conferences, asked questions over email, and conducted dozens of Skype, Zoom, and even old-fashioned telephone calls. By April, we had winnowed the list down to approximately 50 projects. Some we dropped because it became apparent the projects were in fact dormant; some because we decided they were out of scope for our project; some we realized were part of larger assemblages. We believe that the current list represents the field well. That said, this is a dynamic space, and our cataloguing is a snapshot of a moment in time. By the time you read this, some of the details will already be out of date.

The present report is in two parts. Part I provides some high-level analysis of the landscape and the projects within it. In the first section of Part I, we discuss the scope of this report, define some working terms, and set the larger context for the projects we survey. Next, we attempt to break down the field along a number of axes, providing some provisional categorization of the projects—from their goals and organizational structures to specific technological approaches. In the final section of Part I, we explore the prospects for sustainability, collaboration, and interoperability within the current landscape, and suggest some opportunities for new initiatives based on this analysis.

Part II of this report is a catalogue of the projects themselves. For each open-source project, we provide a summary description plus details on the host organization, the project's principal investigator or leadership, funders, partners (both strategic and development), date of original release, and current version. We also include some basic data drawn from the GitHub/GitLab repositories for these projects, including development language, license, and number of contributors. Our initial ambition to conduct a full "GitHub audit" proved infeasible, because most of the projects surveyed are small and vary in their project management and organizational approaches; as such, the metrics GitHub provides on activity are not useful in a comparative context.

Key themes

While the primary focus of our research, and of this report, is software and software development (functionality, code, developers, partners, and funders), the themes we have kept in mind throughout have to do with sustainability, scale, collaboration, and ecosystem integration. Through all of our research, and our investigations of dozens of projects, the question in the back of our minds has always been: who will care about these projects? Their project leads and PIs of course care, but beyond the inner circle of active agency, who else will care enough to fund, contribute to, promote, use, and ultimately further the useful life of these projects? What are the values and mechanisms that cause people, especially external stakeholders, to care enough about these projects to keep them alive, and even thriving, going forward?

There are a great many projects here. From a distance, if one squints, some of these projects seem to cover the same ground, to provide the same functionality. Looking closer, however, the overlap is less obvious; indeed, it becomes clearer that each project is designed or evolved to fit a particular niche, to solve a specifically formulated problem. The result is a complex, multi-faceted landscape that defies easy categorization, let alone the identification of "best-of-breed" applications from among several contenders.

When we were conceiving of this landscape report, we talked of its role as a gap analysis, where amid the many development initiatives, we might identify an underserved area where new development would be most valuable. But this has not proved to be an obvious outcome of this study; rather, there is a lot of functionality out there—a lot of code, a lot of thinking, and a host of very context-specific approaches to basic publishing functions.

What there isn't much of is coordination between these approaches. There isn't a good deal of interoperability between many of these projects, and there is in places definite overlap in goals (if not in specific strategies). We've noted that there aren't obvious incentives for collaboration between projects. As such, if there is a 'gap' that can be identified from the present study, it is one of coordination and integration between and among projects. The third section of Part I will go into more detail about this issue, but it is a theme worth raising here at the outset, and bearing in mind when considering the rest of this report.

Acknowledgements

This has been a complex and rewarding project. I would like to thank Terry Ehling and her colleagues at MIT Press and the Knowledge Futures Group (Amy Brand, Travis Rich, Catherine Ahearn, Heather Staines, and Gabe Stein) for their support and enthusiasm. Many people helped substantially with the environmental scan portion of this research; thanks so much to Kevin Hawkins, Peter Suber, Katie Shamash, the Radical Open Access Collective, and Gary Price. Thanks to the excellent people at so many software projects who took the time to talk and answer all our questions, to Josh Greenberg, Don Waters, and Patricia Hswe for their perspectives, and especially to Alex Garnett, Alison McGonagle O'Connell, Adam Hyde, Juan Alperin, Paul Shannon, and Chris Kelty for being over-and-above available to my questioning. Thanks to John MacFarlane, the developer of Pandoc and the Gitit wiki engine, for providing the tools we used to compile and write this report. Finally, this research has really been a group effort; thanks so much to Erik, Leena, Carmen, Kim, Avvai, Melody, Emma, and Ellen for being part of it!

Disclosure

The world of scholarly communications isn't a large one; many of the projects represented here are ones I've been following with interest for a very long time. As Director of the Canadian Institute for Studies in Publishing at Simon Fraser University, I am very well acquainted with the Public Knowledge Project (PKP); while I have no role with the project, members of the PKP's leadership team are my colleagues at SFU and indeed friends of mine. I hope my long history of being critical of Open Journal Systems (OJS) helps keep this report as objective as it can be. I have known Adam Hyde of the Coko Foundation for many years and have helped promote Coko and Adam's ideas generally. I am currently an advisory board member of the Rebus Foundation. Last, my financial sponsor in this project has been the MIT Press, who are a major stakeholder in PubPub. – John W. Maxwell
