LexBib bibliodata workflow overview: Difference between revisions

LexBib bibliodata workflow overview (view source)

182 bytes added , 2 years ago

12,347

edits

@@ Line 28: / Line 28: @@
 * We use the [https://grobid.readthedocs.io GROBID tool]. zotexport.py (see below) leaves a copy of all PDF in a folder, which is processed by GROBID; GROBID produces a TEI-XML representation of the PDF content. The article body (i.e. the part usually starting after the abstract and ending before the references section) is enclosed in a tag called <body>.
-* In cases where GROBID fails, we manually isolate the text body from the Zotero-produced TXT ([https://lexbib.org/blog/grobid-txt-validation/ tutorial]).
+* In cases where GROBID fails, we manually isolate the text body from the Zotero-produced TXT ([https://lexbib.org/blog/grobid-txt-validation/ tutorial]). Full texts that do not follow a standard structure, most typically because they don't contain an abstract (this is usual in book chapters), are often not propertly parsed by GROBID.
 =LexBib wikibase: Conversion into Linked Data=