Prospects

by John W Maxwell, Erik Hanson, Leena Desai, Carmen Tiampo, Kim O'Donnell, Avvai Ketheeswaran, Melody Sun, Emma Walter, and Ellen Michelle
Where do we go from here? Are partnerships, collaborations, and integrations the solution to creating a gapless ecosystem?
Published Aug 02, 2019

Beyond the individual projects in our catalogue and the individual contributions they make, we also have to consider the larger ecosystem: how these projects relate to one another (both formally and informally), how they might be sustained over time, and how the higher-level goals of furthering scholarly communications are actually addressed by individual efforts and approaches.

Two larger-scale themes seem apparent to us after looking at the details for many months. The first has to do with the problem of siloed development. Many projects we surveyed operate largely in isolation from one another. The goals of collaboration, interoperability, and integration are very secondary to the specific, internal goals of each project. Incentives for collaboration between projects are few, even though there is a general recognition that where possible, collaboration, standardization, and even common code layers can provide considerable benefit to project ambitions, functionality, and sustainability.

The second theme has to do with the organization of the community-owned ecosystem itself: what are the forces—and organizations—that serve the larger community, that mediate between individual projects, between projects and use cases, and between projects and resources? The enormous plurality of approaches and strategies is both a positive (in the sense that the scholarly project more generally treats pluralism as a good), and a negative (plurality tends to work against the scale that is needed for efficiency—and indeed sustainability in a market paradigm). Neither a chaotic plurality of disparate projects nor an efficiency-driven, enforced standard is itself desirable, but mediating between these two will require broad agreement about high-level goals, governance, and funding priorities—and perhaps some agency for integration/mediation.

Collaboration and its benefits

Funding—and especially substantial government and foundation grants—has supported the development of many of the projects in this survey. Most such funding, though, derives from a research-funding model that prioritizes new knowledge creation. It rewards the novel, the exceptional, and the singular. There is, by contrast, relatively little funding available for long-term development, and little funding, or incentive, for collaboration across initiatives. The result is that individual projects end up competing for the same funding sources, potentially at cross-purposes, and at the risk of unsustainability.

A culture of competitiveness and prestige in funding—itself inherent in academic research funding structures—privileges innovation over stability for many projects. From a funder’s perspective, the return on investment (ROI) is more obvious where innovation is the goal than in long-term infrastructure investments. From an awardee’s perspective, the flip side of this is prestige. In Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure, Nadia Eghbal noted:

Older projects have a harder time finding contributors, because many developers prefer to work on new and exciting projects. This phenomenon has been referred to as “magpie developer” syndrome, where developers are attracted to “new and shiny” things.1

Long-term survival, though, is not shiny. It’s just hard. In a pure market-driven environment, sheer perseverance, pluck, and luck are what lead to sustainability. But if we are talking about community-supported infrastructure, what are the equivalent dynamics? What would a serious funding environment look like without competition for resources at its heart? What would project funding look like if it prioritized community governance, collaboration, and integration across a wider ecosystem?

But aren’t open-source projects collaborative by their very nature? If the code is available to all, then anyone who wants to contribute or integrate a project is free to do so. This framing, however, underplays the role of labour and active attention. OSS projects eventually end when they run out of steam—when the enthusiasm and energy of developers and supporters dry up—or when they are swept aside by newer or better-resourced projects that attract developer time and supporter attention. The transparency part of the OSS rationale suggests that, because the code remains available, in theory there are no dead projects, only dormant ones that could still be forked or reanimated, perhaps by a group of interested users. But this underestimates the scale and cost of OSS 2.0 projects. Indeed, what truly keeps OSS projects alive is communities of people who care—as developers, supporters, or users. Such care is not a cheap commodity; the OSS landscape is no Field of Dreams.

Partnerships and collaborations, whether among peers or among groups with aligned interests, are important to keeping more energy—and thus resources—flowing. The vast majority of projects we studied are small; their development communities are often fewer than a dozen people, and the direct interest in a project rarely extends beyond the institution that sponsors it. If the care and upkeep of projects could be extended to multiple groups, multiple institutions, then not only is there a larger and more diverse set of people who care, but opportunities for resourcing increase, and also, when one group’s priorities inevitably shift, it is less likely that a project is simply abandoned.

The Coko Foundation’s strategy is based on this idea. Coko—whose founders Adam Hyde and Kristen Ratan both had a wealth of experience trying to sustain projects—set out to build a community of interested institutions first, and then to design a set of software components that could work across use cases. Coko’s initial set of institutional partners—including eLife Sciences, Hindawi, the University of California Press, California Digital Library, EuropePMC, and the Wormbase project—represents a very diverse set of needs and use cases. But Coko’s foundational framework, PubSweet, is at work in all of these contexts.

Representatives from eLife and Hindawi spoke, at the Society of Scholarly Publishing conference in San Diego in May 2019, about how they had built very different review and editorial workflow software—with different business needs and user scenarios—on top of the common Coko codebase. This kind of collaboration both strengthens the core community and also provides more support to individual participants; for each new participant that joins the community, there is less work to be done on foundational pieces, leaving more time and resources for integration and customization.

A challenge here is in designing software for a broad, non-specific application (that can be built upon by others). Who will fund such an initiative in the first place, and who will direct the design? Coko has apparently succeeded with its PubSweet components, but at the cost of considerable community-building effort. By contrast, OJS 3 was developed with a modular workflow system so that a partner developer could customize the way it works; but unless PKP spends substantial time (that is, prioritizes resources) actively promoting this facility to potential partners, the software’s capacity for more specialized configuration goes mostly unused. Another example is the Texture editor, which has the potential to become a standardized JATS editing and typesetting environment, yet its ‘consortium’ consists of just two organizations. Who will direct its design going forward? Can Texture realistically become a general-purpose JATS editor under such circumstances?

A slightly different story surrounds the ProseMirror editor framework, which isn’t developed by a consortium. Rather, ProseMirror is the work of just one developer, Marijn Haverbeke. ProseMirror’s GitHub repository has twenty-odd contributors, but Haverbeke also runs a crowdfunding campaign that has more than 400 contributors—among them many scholarly communication organizations. ProseMirror has found its way into many other projects, including PubPub and Coko’s Wax editor. Notably, ProseMirror itself isn’t the end product for this community; ProseMirror is the framework upon which others build their products.

Collaboration in these projects does not just mean alignment around a single tool; it often means approaching development as a stack of software layers that work together, some of which might be one’s own primary concern, and others drawn from the community. But while it may make conventional sense to a downstream developer to re-use an existing modular component, who is responsible for doing the upstream work? Or for working on the generalized design and specification work for it?

It seems to us that there is an opportunity, either via funding mechanisms or by some agency for community stewardship, to provide clearer incentives for collaborative development, rather than projects proceeding from singular vision to an isolated codebase. If the goal of community-owned infrastructure is to succeed, then structural attention needs to be paid to the integration of projects, goals, and development efforts across the ecosystem. Nadia Eghbal noted that, “Not unlike technology startups, new digital infrastructure projects rely upon network effects for adoption.”2 The example of big publishers like Elsevier and technology companies like Digital Science shows that such network effects, and the integration of components across myriad workflow touchpoints, are key to succeeding in an interconnected world.3 In an interview, anthropologist Chris Kelty pointed out that since ‘infrastructure’ layers can be harder to fund than ‘applications,’ Elsevier’s focus on integration provides a major advantage.

Ecosystem integration and role(s) of service providers

The integration of various functional components needs to be seen not just from the perspective of development, but also deployment. Connecting usable software with publishers and users is not straightforward, and there are—again—a variety of approaches within the group of projects we’ve examined.

The PKP’s OJS has always embraced a DIY, download-and-deploy methodology, and this has been key to a great deal of this platform’s adoption. OJS’s success in promoting Open Access (OA) publishing is partly because anyone who wanted to start an OA journal could do so, simply by installing OJS, setting up an editorial board, and publishing. Relatedly, OJS’s significant popularity in the Global South is partly due to the self-contained nature of the software; any institution capable of running a webserver became able to participate in the scholarly communications environment. This model is very much in the spirit of first-wave open-source software; indeed, OJS’s deployment model has much in common with WordPress.

At the other end of the spectrum, centralized commercial publishing and hosting platforms serve a different kind of end-user. Ubiquity Press, using the same software platform, built out a different service model around OJS, appealing to a different set of user needs. Both the Fulcrum project from Michigan and MIT’s PubPub are hosted centrally, where the open-source software platform relies on a set of services that can only be effectively delivered with the support of the host institution: preservation strategy, identifier and discovery layers, and so on.

We have of course seen myriad examples in the middle range of this spectrum and in hybrid approaches. PKP, for instance, has put considerable energy into nurturing (and educating) libraries to become local hosting and deployment services for OJS. As well, PKP Publishing Services now offers fee-based hosting and integration. A hybrid approach to deployment has served Hypothes.is as well. The download-and-go model has allowed thousands of individual users to integrate Hypothes.is with their scholarly practice, while the organization has actively pursued publishers and platforms to integrate the annotation service natively. Across the landscape we’ve surveyed are a host of perspectives on this issue, and the challenging questions it poses: If we rely on publishers to download and host themselves, will we scale the community to meaningful levels? And, conversely: If we offer centralized hosting, does that put us in market competition with organizations that would otherwise be our peers and partners?

A recurring theme in conversations with several projects has been the expectation that a layer of third-party service providers would emerge in the coming years, allowing the challenges of deployment to be mediated by commercial (or non-profit) partners who would provide hosting, customization, and integration for a service fee. Such partners would become, in effect, development partners in the software, and help expand the community of stakeholders around a project.

This sounds encouraging, but who exactly will these third parties be? One answer might be libraries and university IT service departments, as in PKP’s model. Another possibility is that commercial web-hosting providers could specialize into this market, offering scholarly publishing tools in addition to the usual WordPress or Drupal content management systems. A third possibility is a class of purpose-built providers who emerge around specific publishing communities, as Ubiquity Press did. Indeed, the rhetoric of community-owned infrastructure leads to a vision of a network of integration partners who make all these tools work like a unified network, rather than as a lot of competing projects.

As much as we like this idea, and we can imagine what this might look like once established, it is far from clear, in the summer of 2019, how we get there from here. John Chodacki and colleagues, in their guidebook, Supporting Research Communications, paint a picture of a fragmented and somewhat confused research communications ecosystem, with as many differences as commonalities, even amongst supporters.4 In such an environment, the development of a coordination and integration layer across diverse publishers and diverse functions will take effort, money, and initiative. It isn’t something that will magically emerge from the current landscape.

Encouragingly, people are talking about this. The Joint Roadmap for Open Science Tools (JROST) initiative launched in 2018 with the observation that “we are aware there are obvious synergies that are not being pursued, and likely many others still waiting to be discovered” and talked of common goals, consolidation of effort, shared governance models, and standardization.5 In summer 2018, Code for Science & Society’s Open Source Alliance for Open Scholarship (OSAOS) working group released a report of their discussions, which outlined, in particular, a possible vision for how funding could be better coordinated to support open infrastructure.6

In the spring of 2019, the launch of the Invest in Open Infrastructure (IOI) went further, adopting a more action-oriented agenda that pulls together research on scholarly infrastructure broadly, with a focus on collaboration and interoperability, and seeking solutions for funding to sustain it.7 One of the first concrete outcomes has been Educopia’s report Mapping the Scholarly Communication Landscape: 2019 Census, presenting the initial results from a broad and deep “Census of Scholarly Communication Infrastructure.”

Educopia’s report8 makes a number of important calls to action, including the need for a standardized taxonomy of the functional components of scholarly infrastructure. This is a big task. The present landscape analysis goes only a small way towards providing a common language and framework for talking about scholarly infrastructure as a whole; this is but a baby step toward what is ultimately needed. The Educopia report importantly underscores the challenges projects face in “raising and sustaining appropriate levels of funding to enable them to build and maintain services over time,” and, relatedly, the need for “scaled, leveraged efficiencies” to make development sustainable and more risk-tolerant.

To our eyes, the most important call to action made by the Educopia report is for community organization: in “guidance, mentorship, training,” in “clarity in their expressions of their purposes and goals,” and in the need to bring more stability and predictability to both the technical and financial aspects of infrastructure development.

From the perspective of our survey of the landscape of open-source publishing projects, the most important feature is scale. Almost all of the projects we examined are small—too small to gain critical developer mass as open-source projects (compared, say, to Internet infrastructure projects like Apache or Node.js or the React framework), and too niche or specialized to develop a market-based clientele that might provide meaningful revenue. OJS and Hypothes.is are the projects here with the largest scale, but neither is yet successful or mature enough to provide a sustainability model for other projects. Most projects are too small, too niche to be sustainable on their own, and will require extrinsic funding sources going forward. But to say so simply shifts the sustainability problem up a level: how does a government or private funding agency continue to fund myriad small projects, with new ones coming onstream all the time?

The lack of scale should not be seen as a failure to grow. Chodacki and colleagues wrote helpfully about the critical importance of trusted relationships in open scholarly communication, and how the emphasis on trust presents challenges for scalability.9 At the Force2018 conference, Adam Hyde of the Coko Foundation also commented on the need to scale Coko’s community slowly enough to maintain a sense of trust among community members.

But inability to scale can mean trouble raising revenue and hence with sustainability over time. There are two common approaches to the problem of scale. One is consolidation: let the market shake out so that it supports only a small number of projects that can take the lion’s share of available funding and thereby become at least affordable, if not self-sufficient. But this is an unpopular idea, for some obvious reasons. Consolidation like this will squeeze out innovation and adaptability. No one wants a Soviet-style, centrally planned scholarly infrastructure. Similarly, there is considerable concern around the spectre of corporate-style consolidation. Indeed, this is the scenario that led to the idea of community-owned infrastructure in the first place.

The other approach to the problem of scale is coordination and integration—which is what the open ecosystem significantly lacks currently. The opportunity at hand—for funders, for organizers and integrators, and for all actors who would further the overall goal of community-owned scholarly communication—seems to have come to rest here. How can we build incentives for collaboration and interoperability? How can we encourage, if not technical standardization per se, at least standards around APIs and module-level functions? How can we develop financial, governance, and sustainability capacity in the community, so projects have a better long-term footing? At a higher level, how can we leverage the intellectual riches that a plurality of approaches and innovators provides without being mired in a counter-productive environment in which these projects are in competition with each other for users, funding, and a chance to succeed? Competition is well and good, but if the goal is community-owned infrastructure, competition alone isn’t likely to provide it. That scholarly publishing is a classic example of “market failure” is not a new idea.10

Concluding thoughts

All of this is to restate the JROST, OSAOS, and IOI agendas, and we welcome new work and new development on these levels. If scale is a structural problem facing many of these projects, then community coordination may go some distance towards addressing it. If longer-term funding for sustainability is needed, then a mediating layer might productively function as a broker of such funding, assuming overhead costs remain low.

We hope this research begins to build a bridge between, on the one hand, thinking about these projects in terms of innovation, features, and interfaces and, on the other hand, the opportunities, and challenges, of supporting community-owned/governed infrastructure. We see a gap between the way we all talk about projects—like Manifold, OJS, Editoria, Libero, and so on—and the way we talk about the need for infrastructure. The projects do not add up to infrastructure on their own; they are all potential infrastructure components, but have not yet cohered into a comprehensive, networked environment.

In Roads and Bridges, Nadia Eghbal offered some reasoned advice for developing effective support strategies for software as infrastructure—rather than as product or research tool. Eghbal wrote, “Supporting infrastructure requires embracing the concept of stewardship rather than control.”11 Control is what firms seek in a competitive market, as a means of mitigating risk and consolidating position. If we continue to employ market-informed metaphors and models for these projects—in the idea of competition for funding, for users, for mindshare; in seed funding for innovation as analogous to venture capital; in our product focus—we miss the opportunity to make investments in infrastructure qua infrastructure. Eghbal’s “roads and bridges” wasn’t just a picturesque name; we might add schools, hospitals, and universities.

The key lesson here, then, might be that layers that support integration, networking, and longer-term sustainability are what need to be funded and developed at this point. If there is a gap, it is not in software; it is in ecosystem integration.


Bibliography and Footnotes

  1. Eghbal, Nadia. “Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure.” Ford Foundation, July 14, 2016, 42. https://www.fordfoundation.org/about/library/reports-and-studies/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure

  2. Eghbal, "Roads and Bridges," 45.

  3. Schonfeld, Roger C. “Strategy & Integration Among Workflow Providers.” The Scholarly Kitchen (blog), November 7, 2017. https://scholarlykitchen.sspnet.org/2017/11/07/strategy-integration-workflow-providers/

  4. Chodacki, John, Patricia Cruse, Jennifer Lin, Cameron Neylon, Damian Pattinson, and Carly Strasser. “Supporting Research Communications: A Guide.” September 2018. https://www.supporters.guide/

  5. https://jrost.org/

  6. https://osaos.codeforscience.org/

  7. https://investinopen.org/

  8. Skinner, Katherine. “Mapping the Scholarly Communication Landscape: 2019 Census.” Atlanta: Educopia Institute, June 20, 2019. https://educopia.org/2019-census/

  9. Chodacki et al. "Supporting Research Communications," 23.

  10. Raym Crow made the case for monograph publishing as an example of the economic concept of “market failure” in the 2012 AAU-ARL report “A Rational System for Funding Scholarly Monographs.” John B. Thompson’s 2005 Books in the Digital Age offers an older but equally detailed exploration of this phenomenon.

  11. Eghbal, "Roads and Bridges," 125.

Comments
Lars Wieneke: The notion of “market” and “soviet-style central planning” confused me a bit. I guess there are actually two perspectives on consolidation: either through market processes (offer/demand etc.), in which case corporate actors might have the edge, as you argue below; or through selection, where funding bodies actually select the happy few that will take up the lion’s share of funding. And in this case we will lose innovation and adaptability.
Tim Elfenbein: One of the main benefits of looking toward cyberinfrastructure projects in the sciences is that we already have a social-science literature studying the social and technical dynamics of scholarly infrastructure, with an expanding conceptual apparatus to better describe and understand what is going on. On the tensions between projects and infrastructures, see e.g., https://repository.library.georgetown.edu/handle/10822/557392, or https://doi.org/10.1007/s10606-010-9113-z.
Tim Elfenbein: An analytical comment: First, we could use better ways of distinguishing and describing levels of scale. “Big” and “small” don’t get us the purchase we need to describe these projects. Second, and related, we need to think more about comparisons: the comparators used here are to other OSS projects/infrastructure. Comparing scholarly publishing infrastructure instead to cyberinfrastructure projects in the sciences (i.e., data repositories, ontologies, etc.) would open up a host of other important questions: who the “community” is for particular projects, possible governance and funding models, etc.
adam hyde: There is a lot of confusion out there right now around these issues. ‘Infrastructure’ is being used as an open semantic bucket to mean anything you want it to. ‘Community owned’ is also an open-ended term. Isn’t a healthy OSS project ‘community owned’ by virtue of the collaboration of disparate parties around it? I worry that the language of ‘infrastructure’ and ‘community owned’ in current discussions is getting very muddled. There are also some legitimate reasons to be concerned about the current direction of these discussions. Will projects that aim to be ‘community owned infrastructure’ merely turn out to be ‘community owned’ funding sources? This could simply turn out to be a centralization of funding power and not solve any of the ‘infrastructure’ issues it is professing to address.