LexBib bibliodata workflow overview

* For Elexifinder version 2 (spring 2021), we reduced the roughly 5,000 distinct person names then present in the database to around 4,000 unique person items, using the clustering algorithms in [http://openrefine.org OpenRefine]. Persons in LexBib have up to six name variants (see the query at [[Main_Page#See_what.27s_in_the_database|Main Page]]).
* For subsequent updates, we use our own [https://github.com/wetneb/openrefine-wikibase Wikibase reconciliation service with OpenRefine]. Person name literals are matched against the person items that already exist in LexBib Wikibase, where all name literals previously matched to a person item are stored. [https://github.com/elexis-eu/elexifinder/blob/master/wikibase/sparql/authorliteralsforopenrefine.rq This query] exports the Wikibase statements that point to unmatched persons, and [https://github.com/elexis-eu/elexifinder/blob/master/wikibase/maintenance/newcreatorsfromopenrefine.py newcreatorsfromopenrefine.py] processes the reconciliation results, creates new items for names that remain unmatched, and updates the statements and the literals associated with persons.
* This part of the workflow will soon be simplified, as the [http://wbstack.com wikibase.cloud] developers are about to build OpenRefine into Wikibase: a wikibase.cloud Wikibase will by default ship its own OpenRefine instance for reconciling literal values (i.e. matching them to LexBib Wikibase ontology entities) and for uploading the reconciliation results to Wikibase. This will shortcut the export-reconciliation-import process described above, which still involves manually configuring OpenRefine and our own reconciliation service, as well as the upload of reconciled data.
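
The name clustering mentioned in the first item can be illustrated with a small key-collision sketch. The snippet below is a rough Python approximation of the "fingerprint" keying method documented for OpenRefine. Whether this particular method was the one applied to the LexBib person names is an assumption, and the function names and sample names are invented for illustration only.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a normalised key for a name (approximation of fingerprint keying)."""
    # Decompose accented characters and drop the non-ASCII combining marks.
    folded = unicodedata.normalize("NFKD", name)
    folded = folded.encode("ascii", "ignore").decode("ascii")
    # Lowercase and remove punctuation such as commas and full stops.
    folded = folded.lower().translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, de-duplicate and sort the tokens, then rejoin them.
    tokens = sorted(set(folded.split()))
    return " ".join(tokens)


def cluster_names(names: list[str]) -> dict[str, list[str]]:
    """Group name variants that share the same fingerprint key."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for name in names:
        key = fingerprint(name)
        # Keep each distinct spelling once, in order of first appearance.
        if name not in clusters[key]:
            clusters[key].append(name)
    return dict(clusters)


if __name__ == "__main__":
    # Invented sample literals, not taken from the LexBib database.
    sample = ["Müller, Anna", "Anna Muller", "Smith, John", "John Smith"]
    for key, variants in cluster_names(sample).items():
        print(key, "->", variants)
```

Each resulting cluster is a candidate for a single person item with several name variants. Keys of this kind do not merge initials with full first names (for example "A. Muller" and "Anna Muller"), so such cases would still need manual review or a fuzzier method.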