GROBID (or Grobid) stands for GeneRation Of BIbliographic Data. It is a machine-learning library for extracting, parsing, and restructuring journal articles in PDF format into structured TEI-encoded documents, which can then be transformed into JATS XML. Grobid is a best-of-breed example (see https://arxiv.org/abs/1802.01168) of the shift from traditional rule-based parsers to machine-learning models for converting legacy documents to XML. Grobid is used in the Public Knowledge Project's (PKP) Open Typesetting Stack.
Institutional host: independent
URL: https://grobid.readthedocs.io/en/latest/Introduction/
Principal investigator: Patrice Lopez
Contact:
Lead developer: Patrice Lopez
Funding sources:
Development partners: various
Partners:
Initial release: 2011
Version (as of June 2019): 0.5.4
URL: https://github.com/kermitt2/grobid
Language: Java
License: Apache 2.0
Last commit: 2018-04-25
Contributors: 28