DMLEX on Wikibase

Revision as of 18:11, 28 April 2024 by DavidL

A serialization of the DMLEX model, for LexBib Wikibase

This page describes how lexical resource datasets that follow the DMLEX model are represented on this Wikibase instance. The following sections describe how each DMLEX core class is represented on LexBib Wikibase. The aim is to present DMLEX datasets to the user on collaboratively editable entity pages, and to allow SPARQL queries over them.

This model is heavily inspired by the DMLEX Ontology (the RDF serialization of DMLEX, which builds on OntoLex-Lemon).

DMLEX on Lexbib Wikibase

Global properties

  • "id": P186 (string) - the entry, sense, or form ID in the source dataset
  • "sameAs": P57 (url) - should be a proper URI
  • "listingOrder": P33 (string) - integer is converted to string
  • "langCode": P56 (item) - the IETF language code is mapped to the Wikibase item representing the language
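As a minimal sketch of how the global properties can be queried, the following looks up an entity by its ID in the source dataset. It assumes the properties are reachable through the direct-property prefix used elsewhere on this page; the ID value "example-entry-id" is a hypothetical placeholder, not an ID from any real dataset.

```sparql
PREFIX ldp: <https://lexbib.elex.is/prop/direct/>

# Find the entry, sense, or form that carries a given source-dataset ID (P186),
# together with its "sameAs" URI (P57) and language item (P56), where present.
SELECT ?entity ?sourceId ?sameAs ?language
WHERE {
  VALUES ?sourceId { "example-entry-id" }   # hypothetical placeholder ID
  ?entity ldp:P186 ?sourceId .
  OPTIONAL { ?entity ldp:P57 ?sameAs . }
  OPTIONAL { ?entity ldp:P56 ?language . }
}
```

Because "listingOrder" (P33) is stored as a string, a query that sorts by it would need to cast the value first, for example with xsd:integer(...).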

Lexicographical Resource

Entities of this class are modelled as Q-entities of the class Lexicographical Resource.

Object properties

Some properties attached to entities of this class that belong to the DMLEX Controlled Values module point to Q-items belonging to the following classes:

DMLEX controlled values are fully reified here: they are Q-entities, not blank nodes. As a result, a statement that attaches a literal dmlex:tag value to dictionary content can be qualified with the corresponding controlled value entity.

Datatype properties

  • "title": P6 (string)
  • "uri": P112 (url)

Entry

Datatype properties

  • "headword" is mapped to wikibase:lemma, to which the language code corresponding to the Lexicographical Resource's "langCode" property value is attached.
  • "homographNumber": P187 (string)
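The two datatype properties above could be retrieved together as sketched below. The query assumes that "homographNumber" is available as a direct property on the lexeme; LANG() returns the language code attached to the lemma, and the cast is needed because P187 is stored as a string.

```sparql
PREFIX ldp: <https://lexbib.elex.is/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# List headwords with their language code and homograph number.
SELECT ?lexeme ?lemma (LANG(?lemma) AS ?lemmaLang) ?homographNumber
WHERE {
  ?lexeme wikibase:lemma ?lemma ;
          ldp:P187 ?homographNumber .   # only entries that have a homograph number
}
ORDER BY ?lemma ASC(xsd:integer(?homographNumber))
LIMIT 100
```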

Object properties, represented using Wikibase shallow reification (using qualifiers)

  • "partOfSpeech": the "tag" value of the dmlex:PartOfSpeech object is mapped to P195 (string). If that string matches one of the controlled values specified for the Lexicographical Resource, the statement is qualified with the matching controlled value item using P201 (string). A "listingOrder" value is also attached as a qualifier.
  • "label": the "tag" value of the dmlex:Label object is mapped to P195 (string). If that string matches one of the controlled values specified for the Lexicographical Resource, the statement is qualified with the matching controlled value item using P203 (string). A "listingOrder" value is also attached as a qualifier.
  • "pronunciation": the "text" value of the dmlex:Transcription object is mapped to P204 (string). The "scheme" value (an IETF language tag) is attached as a qualifier using P205 (string). If the literal value matches one of the controlled values specified for the Lexicographical Resource, a P206 (item) qualifier is attached as well.
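The shallow reification described above can be explored with a query along the following lines. It is a sketch that assumes the standard Wikibase statement and qualifier prefixes shown in the SPARQL section of this page, and that "listingOrder" is attached with P33 as listed under the global properties.

```sparql
PREFIX lp: <https://lexbib.elex.is/prop/>
PREFIX lps: <https://lexbib.elex.is/prop/statement/>
PREFIX lpq: <https://lexbib.elex.is/prop/qualifier/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# For each entry, return the literal part-of-speech tag (P195) and, where the tag
# matched a controlled value, the qualifier value (P201) and the listing order (P33).
SELECT ?lexeme ?lemma ?posTag ?controlledValue ?listingOrder
WHERE {
  ?lexeme wikibase:lemma ?lemma ;
          lp:P195 ?posStatement .                # statement node
  ?posStatement lps:P195 ?posTag .               # literal "tag" string
  OPTIONAL { ?posStatement lpq:P201 ?controlledValue . }
  OPTIONAL { ?posStatement lpq:P33 ?listingOrder . }
}
ORDER BY ?lemma ASC(xsd:integer(?listingOrder))
LIMIT 100
```

Since the list above maps both "partOfSpeech" and "label" to P195, the statement nodes returned here may include label tags; replacing lpq:P201 with lpq:P203 would pick out the controlled value qualifier for labels instead.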

Sense

On Wikibase, a lexeme sense is by default modeled as an instance of ontolex:LexicalSense, and the DMLEX class dmlex:Sense is mapped to it. Note: in dmlex.ttl, dmlex:Sense is declared a subclass of ontolex:LexicalConcept, not of ontolex:LexicalSense.

Inflected Form

SPARQL

Slovar slovenskih členkov (Q34165)

#title: Slovar slovenskih členkov entries

PREFIX lwb: <https://lexbib.elex.is/entity/>
PREFIX ldp: <https://lexbib.elex.is/prop/direct/>
PREFIX lp: <https://lexbib.elex.is/prop/>
PREFIX lps: <https://lexbib.elex.is/prop/statement/>
PREFIX lpq: <https://lexbib.elex.is/prop/qualifier/>
PREFIX lpr: <https://lexbib.elex.is/prop/reference/>
PREFIX lno: <https://lexbib.elex.is/prop/novalue/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# For each entry of the resource, count its senses and the values of P209 (?def) and P208 (?expl)
select ?lexeme ?lexeme_nr ?lemma (count(distinct ?sense) as ?num_of_senses) (count(distinct ?def) as ?num_of_defs) (count(distinct ?expl) as ?num_of_examples)
where {
  ?lexeme ldp:P207 lwb:Q34165; wikibase:lemma ?lemma; ontolex:sense ?sense.
  optional {?sense ldp:P209 ?def.}
  optional {?sense ldp:P208 ?expl.}
  # numeric part of the lexeme ID, used for filtering and sorting
  bind (xsd:integer(strafter(str(?lexeme),"https://lexbib.elex.is/entity/L")) as ?lexeme_nr)
  filter (?lexeme_nr > 34) # this is because of bug https://phabricator.wikimedia.org/T363312
}
# group only by the non-aggregated variables; the counts are computed per group
group by ?lexeme ?lexeme_nr ?lemma
order by ?lexeme_nr
