LexBib bibliodata workflow overview: Difference between revisions



* Completeness of publication metadata is manually checked. The editing team uses [https://www.zotero.org/groups/ Zotero group synchronization] ([https://lexbib.org/blog/getting-started-with-zotero/ tutorial]).
* Every item is annotated with the first author's location, which is required for the dataset to be exported to [[Elexifinder]]. An English Wikipedia page URL (serving as an unambiguous identifier) is placed in the Zotero "extra" field. zotexport.py (see below) maps that URL to the corresponding LexBib place item ([https://lexbib.org/blog/author-and-article-location-tutorial/ tutorial]).
* The Zotero "language" field (publication language) must contain a two-letter ISO 639-1 or a three-letter ISO 639-3 language code.
* In the sources, person names (author, editor) are often disordered or incomplete. We try to validate name forms already at this stage. Proper disambiguation (with an unambiguous ID) is not possible in Zotero.
* Items are annotated with Zotero tags that contain shortcodes, which are interpreted by zotexport.py. The shortcodes point either to LexBib wikibase items (Q-ID), or to pre-defined values:
** '':container Qxx'' points to a containing item (a [[Item:Q12|BibCollection]] item describing a journal issue or an edited volume).
** '':event Qxx'' points to a corresponding event (an item describing a conference iteration or a workshop). A property pointing to the event location is attached to the LexBib wikibase [[Item:Q6|Event]] item.
** '':abstractLanguage en'' indicates that the abstract contained in the dataset is given in [[Item:Q201|English]] (and not in the language of the article)
** '':collection x'' points to an Elexifinder collection number.
** '':type Review'' classifies the item as a [[Item:Q15|review article]].
** '':type Community'' classifies the item as a piece of [[Item:Q26|community communication]] (anniversaries, obituaries, etc.).
** '':type Report'' classifies the item as an [[Item:Q25|event report]].
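
The checks described in the list above can be sketched in Python. This is only an illustration and not the actual code of zotexport.py: the function names, the regular expressions, and the exact tag syntax (a leading colon, a keyword, whitespace, then a value) are assumptions derived from the examples given here. The Q-IDs for the three item types are the ones linked in the list. The language check tests only the shape of the code (two or three lowercase letters), not whether it is a registered ISO 639 code.

```python
import re

# Assumed tag syntax: ":keyword value", e.g. ":container Q1234" or ":type Review".
SHORTCODE = re.compile(r"^:(container|event|abstractLanguage|collection|type)\s+(\S+)$")
QID = re.compile(r"^Q\d+$")

# Item types and their LexBib wikibase items, as linked in the list above.
TYPE_ITEMS = {"Review": "Q15", "Community": "Q26", "Report": "Q25"}


def parse_shortcodes(tags):
    """Return a dict of recognised shortcodes; ordinary tags are ignored."""
    result = {}
    for tag in tags:
        match = SHORTCODE.match(tag.strip())
        if match is None:
            continue  # not a shortcode tag
        key, value = match.groups()
        if key in ("container", "event"):
            if QID.match(value) is None:
                raise ValueError(f"expected a Q-ID in tag {tag!r}")
        elif key == "collection":
            value = int(value)  # raises ValueError if not a number
        elif key == "type":
            if value not in TYPE_ITEMS:
                raise ValueError(f"unknown item type in tag {tag!r}")
            value = TYPE_ITEMS[value]
        result[key] = value
    return result


def has_language_code_shape(code):
    """Shape check only: two (ISO 639-1) or three (ISO 639-3) lowercase letters."""
    return re.fullmatch(r"[a-z]{2,3}", code) is not None
```

For example, an item tagged with '':container Q1234'', '':type Review'' and an ordinary keyword tag would yield a dictionary with the keys "container" and "type", while the keyword tag is skipped.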


==Full text TXT cleaning==