DMLEX on Wikibase

From LexBib
'''A serialization of the DMLEX model, for LexBib Wikibase'''


This page describes how lexical resource datasets that follow the [https://docs.oasis-open.org/lexidma/dmlex/v1.0/csd02/dmlex-v1.0-csd02.pdf DMLEX model] are represented on this Wikibase instance. The following sections describe how the DMLEX core classes are mapped to LexBib Wikibase entities. The aim is to present DMLEX datasets to users on collaboratively editable entity pages, and to make them queryable with SPARQL.
This model is heavily inspired by the [https://github.com/oasis-tcs/lexidma/blob/master/dmlex-v1.0/specification/serializations/RDF/ontology/dmlex.ttl DMLEX Ontology] (the RDF serialization of DMLEX, which uses OntoLex-Lemon).


= DMLEX on LexBib Wikibase =


== Global properties ==
 
* "id": [[Property:P186|P186]] (string) - the entry, sense, or form ID in the source dataset
* "sameAs": [[Property:P57|P57]] (url) - should be a proper URI
* "listingOrder": [[Property:P33|P33]] (string) - the integer value is stored as a string
* "langCode": [[Property:P56|P56]] (item) - the IETF language code is mapped to the Wikibase item representing the language
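The global property mapping can be sketched as a Python function that turns the common DMLEX attributes of one object into (property, value) pairs. This is a minimal sketch, not the LexBib import code: the input dict layout, the function name `global_statements`, and the caller-supplied `lang_items` table (IETF code to Wikibase item) are hypothetical; only the property IDs come from the list above.

```python
from typing import Any
from urllib.parse import urlparse


def global_statements(obj: dict[str, Any],
                      lang_items: dict[str, str]) -> list[tuple[str, str]]:
    """Map the DMLEX global attributes of one object to (property, value) pairs.

    lang_items is a hypothetical, caller-supplied table from IETF language
    codes to the Wikibase items that represent those languages.
    """
    statements: list[tuple[str, str]] = []
    if "id" in obj:
        # P186 (string): entry, sense, or form ID in the source dataset
        statements.append(("P186", str(obj["id"])))
    if "sameAs" in obj:
        uri = str(obj["sameAs"])
        # P57 (url): a minimal check that the value has a URI scheme
        if not urlparse(uri).scheme:
            raise ValueError(f"sameAs is not an absolute URI: {uri!r}")
        statements.append(("P57", uri))
    if "listingOrder" in obj:
        # P33 (string): the integer is converted to a string
        statements.append(("P33", str(int(obj["listingOrder"]))))
    if "langCode" in obj:
        # P56 (item): raises KeyError if the language has no known item
        statements.append(("P56", lang_items[obj["langCode"]]))
    return statements
```

For example, `global_statements({"id": "e1", "listingOrder": 2}, {})` returns `[("P186", "e1"), ("P33", "2")]`.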
 
== Lexicographical Resource ==
 
Entities of this class are modelled as Q-entities of the class [[Item:Q100|Lexicographical Resource]].
 
=== Object properties ===


Some properties of this class that belong to the DMLEX Controlled Values module point to Q-items of the following classes:
* [[Item:Q103|Label Tag]]
* [[Item:Q104|Label Type Tag]]
* [[Item:Q105|Part of Speech Tag]] - should contain a [[Property:P202|P202]] statement pointing to the Wikibase item describing the corresponding LexInfo 3.0 part of speech
* [[Item:Q106|Source Identity Tag]]
* [[Item:Q107|Transcription Scheme Tag]]


This full reification of DMLEX controlled values (they are Q-entities, not blank nodes) makes it possible to qualify a statement that carries a literal ''dmlex:tag'' value on dictionary content with the corresponding controlled value entity ([[Lexeme:L170#S3|example]]).
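To illustrate the reification idea, the sketch below keeps the controlled values of one Lexicographical Resource in an index keyed by controlled-value class (one of the Q-items listed above) and tag string, and looks up the item with which a literal tag statement would be qualified. The `ControlledValue` record, its fields, and both functions are hypothetical; how the tag string is actually stored on each controlled-value item is not specified on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlledValue:
    item: str         # Q-ID of the controlled value entity (hypothetical field)
    value_class: str  # class item, e.g. "Q105" for Part of Speech Tag
    tag: str          # the literal tag string


def build_index(values: list[ControlledValue]) -> dict[tuple[str, str], str]:
    """Index controlled values by (class item, tag string)."""
    index: dict[tuple[str, str], str] = {}
    for cv in values:
        key = (cv.value_class, cv.tag)
        # Two different items for the same class and tag cannot be resolved
        if key in index and index[key] != cv.item:
            raise ValueError(f"ambiguous controlled value: {key}")
        index[key] = cv.item
    return index


def qualifier_item(index: dict[tuple[str, str], str],
                   value_class: str, tag: str) -> Optional[str]:
    """Return the controlled value item for a literal tag, or None if the
    tag matches no controlled value of that class (no qualifier is added)."""
    return index.get((value_class, tag))
```

Keying by class as well as tag keeps identical strings in different controlled-value lists (for example a label tag and a part-of-speech tag) from being confused.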
 
=== Datatype properties ===
 
* "title": [[Property:P6|P6]] (string)
* "uri": [[Property:P112|P112]] (url)
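As a sketch, a Lexicographical Resource could be serialized as a Q-entity of class [[Item:Q100|Q100]] carrying the two datatype properties above. The `INSTANCE_OF` property ID is a hypothetical placeholder (this page does not name the property that links an item to its class), and the output is a simplified dict, not the full Wikibase JSON format.

```python
from urllib.parse import urlparse

# Hypothetical placeholder: the class-membership property is not named here
INSTANCE_OF = "P_INSTANCE_OF"
LEXICOGRAPHICAL_RESOURCE = "Q100"


def resource_item(title: str, uri: str) -> dict:
    """Simplified representation of a Lexicographical Resource Q-entity."""
    # P112 has datatype url, so reject values without a URI scheme
    if not urlparse(uri).scheme:
        raise ValueError(f"uri is not absolute: {uri!r}")
    return {
        "claims": [
            (INSTANCE_OF, LEXICOGRAPHICAL_RESOURCE),
            ("P6", title),   # "title" (string)
            ("P112", uri),   # "uri" (url)
        ]
    }
```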
 
== Entry ==
 
=== Datatype properties ===
 
* "headword" is mapped to ''wikibase:lemma'', to which the language code corresponding to the Lexicographical Resource's "langCode" property value is attached.
* "homographNumber": [[Property:P187|P187]] (string)
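A simplified sketch of the Entry datatype mapping. The `lemmas` layout follows the Wikibase lexeme JSON format; the function name `entry_lexeme` and the tuple-based `claims` list are illustrative assumptions, and the lexeme's language and lexical category items, which a real Wikibase lexeme also requires, are omitted.

```python
from typing import Optional


def entry_lexeme(headword: str, resource_lang_code: str,
                 homograph_number: Optional[int] = None) -> dict:
    """Simplified lexeme for a DMLEX entry.

    The headword becomes the lemma, tagged with the language code of the
    Lexicographical Resource ("langCode"); homographNumber goes to P187.
    """
    lexeme: dict = {
        # Same shape as the "lemmas" object in Wikibase lexeme JSON
        "lemmas": {
            resource_lang_code: {"language": resource_lang_code, "value": headword}
        },
        "claims": [],
    }
    if homograph_number is not None:
        # P187 is a string property, so the number is stored as text
        lexeme["claims"].append(("P187", str(homograph_number)))
    return lexeme
```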
 
=== Object properties (Wikibase shallow reification using qualifiers) ===
 
* "partOfSpeech": the "tag" value of the ''dmlex:PartOfSpeech'' object is mapped to [[Property:P195|P195]] (string). If that string matches one of the controlled values specified for the Lexicographical Resource, the statement is qualified with the matching controlled value item using [[Property:P201|P201]] (string). A "listingOrder" value is also attached as a qualifier.
* "label": the "tag" value of the ''dmlex:Label'' object is mapped to [[Property:P195|P195]] (string). If that string matches one of the controlled values specified for the Lexicographical Resource, the statement is qualified with the matching controlled value item using [[Property:P203|P203]] (string). A "listingOrder" value is also attached as a qualifier.
* "pronunciation": the "text" value of the ''dmlex:Transcription'' object is mapped to [[Property:P204|P204]] (string). The "scheme" value (an IETF language tag) is attached as a qualifier using [[Property:P205|P205]] (string), together with a [[Property:P206|P206]] (item) qualifier if the literal value matches one of the controlled values specified for the Lexicographical Resource.
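The shallow reification described above can be sketched with two hypothetical helpers; this is not the actual import code. Statements are plain dicts, controlled values are passed in as tag-to-item mappings, and using P33 as the "listingOrder" qualifier property is an assumption based on the global property list.

```python
from typing import Optional


def tag_statement(prop: str, tag: str, qualifier_prop: str,
                  controlled: dict[str, str],
                  listing_order: Optional[int] = None) -> dict:
    """Literal tag statement, qualified with the matching controlled value
    item (if any) and with listingOrder (assumed P33, stored as a string)."""
    qualifiers: dict[str, str] = {}
    if tag in controlled:
        qualifiers[qualifier_prop] = controlled[tag]
    if listing_order is not None:
        qualifiers["P33"] = str(listing_order)
    return {"property": prop, "value": tag, "qualifiers": qualifiers}


def pronunciation_statement(text: str, scheme: str,
                            schemes: dict[str, str]) -> dict:
    """dmlex:Transcription text on P204; the scheme literal on P205 and,
    if it matches a controlled value, the scheme item on P206."""
    qualifiers = {"P205": scheme}
    if scheme in schemes:
        qualifiers["P206"] = schemes[scheme]
    return {"property": "P204", "value": text, "qualifiers": qualifiers}
```

A part of speech would be built with `tag_statement("P195", tag, "P201", pos_values, listing_order)`, and a label with P203 as the qualifier property; the property IDs are parameters so the helpers stay aligned with the mapping table rather than hard-coding it.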
 
== Sense ==
 
On Wikibase, a lexeme sense is modelled by default as an instance of ''ontolex:LexicalSense'', and the DMLEX class ''dmlex:Sense'' is mapped to it. '''Note: in [https://github.com/oasis-tcs/lexidma/blob/master/dmlex-v1.0/specification/serializations/RDF/ontology/dmlex.ttl dmlex.ttl], ''dmlex:Sense'' is declared a subclass of ''ontolex:LexicalConcept'', not of ''ontolex:LexicalSense''.'''
 
=== Datatype properties ===


* "definition": [[Property:P209|P209]] (string)
* "example": [[Property:P208|P208]] (string)
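A minimal sketch of the Sense datatype mapping, producing one string statement per definition or example text. The input layout (a sense dict with `definitions` and `examples` lists of objects carrying a `text` value) is an assumption about how the DMLEX data is read, and `sense_statements` is a hypothetical name.

```python
def sense_statements(sense: dict) -> list[tuple[str, str]]:
    """Map the definitions and examples of one DMLEX sense to
    (property, value) pairs: P209 for definitions, P208 for examples."""
    # Each DMLEX Definition and Example object is assumed to carry a "text" value
    statements = [("P209", d["text"]) for d in sense.get("definitions", [])]
    statements += [("P208", e["text"]) for e in sense.get("examples", [])]
    return statements
```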


== Inflected Form ==


= SPARQL =
== Slovar slovenskih členkov ([[Item:Q34165|Q34165]]) ==
<sparql tryit="1">
#title: Slovar slovenskih členkov entries


PREFIX lwb: <https://lexbib.elex.is/entity/>
PREFIX ldp: <https://lexbib.elex.is/prop/direct/>
PREFIX lp: <https://lexbib.elex.is/prop/>
PREFIX lps: <https://lexbib.elex.is/prop/statement/>
PREFIX lpq: <https://lexbib.elex.is/prop/qualifier/>
PREFIX lpr: <https://lexbib.elex.is/prop/reference/>
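PREFIX lno: <https://lexbib.elex.is/prop/novalue/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>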


select ?lexeme ?lexeme_nr ?lemma (count (distinct ?sense) as ?num_of_senses) (count (distinct ?def) as ?num_of_defs) (count (distinct ?expl) as ?num_of_examples)
where {
  ?lexeme ldp:P207 lwb:Q34165; wikibase:lemma ?lemma; ontolex:sense ?sense.
  optional {?sense ldp:P209 ?def.} optional {?sense ldp:P208 ?expl.}
  bind (xsd:integer(strafter(str(?lexeme),"https://lexbib.elex.is/entity/L")) as ?lexeme_nr)
  filter (?lexeme_nr > 34) # this is because of bug https://phabricator.wikimedia.org/T363312
}
group by ?lexeme ?lexeme_nr ?lemma
order by ?lexeme_nr
</sparql>
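To process the results of this query outside the query service, the rows can be read from a SPARQL 1.1 JSON results document (as returned by a SPARQL endpoint when JSON output is requested). A minimal Python sketch; the function names are hypothetical, and `lexeme_number` mirrors the query's `bind` step, except that it raises an error where SPARQL would leave `?lexeme_nr` unbound.

```python
import json


def lexeme_number(uri: str, prefix: str = "https://lexbib.elex.is/entity/L") -> int:
    """Python equivalent of the query's bind(): numeric part of a lexeme URI."""
    if not uri.startswith(prefix):
        raise ValueError(f"not a LexBib lexeme URI: {uri!r}")
    return int(uri[len(prefix):])


def parse_rows(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL 1.1 JSON results into one dict per row (var -> value).

    Type and datatype information is dropped, so counts arrive as strings.
    """
    data = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]
```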

Latest revision as of 18:23, 28 April 2024
