DMLEX on Wikibase: Difference between revisions
(13 intermediate revisions by the same user not shown) | |||
Line 5: | Line 5: | ||
This model is, of course, heavily inspired by the [https://github.com/oasis-tcs/lexidma/blob/master/dmlex-v1.0/specification/serializations/RDF/ontology/dmlex.ttl DMLEX Ontology] (the RDF serialization of DMLEX deploying Ontolex-Lemon). | This model is, of course, heavily inspired by the [https://github.com/oasis-tcs/lexidma/blob/master/dmlex-v1.0/specification/serializations/RDF/ontology/dmlex.ttl DMLEX Ontology] (the RDF serialization of DMLEX deploying Ontolex-Lemon). | ||
= | = DMLEX on Lexbib Wikibase = | ||
Entities of this class are modelled as Q-entities of the class [[Item:Q100| | = Global properties = | ||
* "id": [[Property:P186|P186]] (string) - the entry, sense, or form ID in the source dataset | |||
* "sameAs": [[Property:P57|P57]] (url) - should be a proper URI | |||
* "listingOrder": [[Property:P33|P33]] (string) - integer is converted to string | |||
* "langCode": [[Property:P56|P56]] (item) - the IETF language code is mapped to the Wikibase item representing the language | |||
== Lexicographical Resource == | |||
Entities of this class are modelled as Q-entities of the class [[Item:Q100|Lexicographical Resource]]. | |||
=== Object properties === | |||
Some properties attached to entities of this class that belong to the DMLEX Controlled Values module point to Q-items belonging to the following classes: | Some properties attached to entities of this class that belong to the DMLEX Controlled Values module point to Q-items belonging to the following classes: | ||
Line 15: | Line 26: | ||
* [[Item:Q103|Label Tag]] | * [[Item:Q103|Label Tag]] | ||
* [[Item:Q104|Label Type Tag]] | * [[Item:Q104|Label Type Tag]] | ||
* [[Item:Q105|Part of Speech Tag]] | * [[Item:Q105|Part of Speech Tag]] - should contain a [[Property:P202|P202]] statement pointing to the Wikibase item describing the corresponding LexInfo 3.0 POS | ||
* [[Item:Q106|Source Identity Tag]] | * [[Item:Q106|Source Identity Tag]] | ||
* [[Item:Q107|Transcription Scheme Tag]] | * [[Item:Q107|Transcription Scheme Tag]] | ||
This full reification of DMLEX controlled values (i.e., that they are not blank nodes, but Q-entities) allows to qualify the | This full reification of DMLEX controlled values (i.e., they are Q-entities rather than blank nodes) makes it possible to qualify statements whose properties hold literal ''dmlex:tag'' values attached to dictionary content with the corresponding controlled value entity ([[Lexeme:L170#S3|example]]). | ||
=== Datatype properties === | |||
* "title": [[Property:P6|P6]] (string) | |||
* "uri": [[Property:P112|P112]] (url) | |||
== Entry == | |||
=== Datatype properties === | |||
* "headword" is mapped to ''wikibase:lemma'', to which the language code corresponding to the Lexicographical Resource's "langCode" property value is attached. | |||
* "homographNumber": [[Property:P187|P187]] (string) | |||
=== Object properties, represented using Wikibase shallow reification (using qualifiers) === | |||
* "partOfSpeech": the "tag" value of the ''dmlex:PartOfSpeech'' object is mapped to [[Property:P195|P195]] (string) and, if that string matches one of the controlled values specified for the Lexicographical Resource, is qualified with the matching controlled value item using [[Property:P201|P201]] (string). A "listingOrder" value is also attached as a qualifier. | |||
* "label": the "tag" value of the ''dmlex:Label'' object is mapped to [[Property:P195|P195]] (string) and, if that string matches one of the controlled values specified for the Lexicographical Resource, is qualified with the matching controlled value item using [[Property:P203|P203]] (string). A "listingOrder" value is also attached as a qualifier. | |||
* "pronunciation": the "text" value of the ''dmlex:Transcription'' object is mapped to [[Property:P204|P204]] (string). The "scheme" value (an IETF language tag) is attached as a qualifier using [[Property:P205|P205]] (string); if that literal value matches one of the controlled values specified for the Lexicographical Resource, a [[Property:P206|P206]] (item) qualifier is attached as well. | |||
== Sense == | |||
A lexeme sense on Wikibase is by default modeled as an instance of ''ontolex:LexicalSense'', and the DMLex class ''dmlex:Sense'' is mapped to it. '''Note: in [https://github.com/oasis-tcs/lexidma/blob/master/dmlex-v1.0/specification/serializations/RDF/ontology/dmlex.ttl dmlex.ttl], ''dmlex:Sense'' is declared a subclass of ''ontolex:LexicalConcept'', not of ''ontolex:LexicalSense''.''' | |||
=== Datatype properties === | |||
* "definition" is mapped to [[Property:P209|P209]], datatype "string". | |||
* "example" is mapped to [[Property:P208|P208]], datatype "string". | |||
== Inflected Form == | |||
= | = SPARQL = | ||
== Slovar slovenskih členkov ([[Item:Q34165|Q34165]]) == | |||
<sparql tryit="1"> | |||
#title: Slovar slovenskih členkov entries | |||
PREFIX lwb: <https://lexbib.elex.is/entity/> | |||
PREFIX ldp: <https://lexbib.elex.is/prop/direct/> | |||
PREFIX lp: <https://lexbib.elex.is/prop/> | |||
PREFIX lps: <https://lexbib.elex.is/prop/statement/> | |||
PREFIX lpq: <https://lexbib.elex.is/prop/qualifier/> | |||
PREFIX lpr: <https://lexbib.elex.is/prop/reference/> | |||
PREFIX lno: <https://lexbib.elex.is/prop/novalue/> | |||
select ?lexeme ?lexeme_nr ?lemma (count (distinct ?sense) as ?num_of_senses) (count (distinct ?def) as ?num_of_defs) (count (distinct ?expl) as ?num_of_examples) | |||
where { | |||
?lexeme ldp:P207 lwb:Q34165; wikibase:lemma ?lemma; ontolex:sense ?sense. | |||
optional {?sense ldp:P209 ?def.} optional {?sense ldp:P208 ?expl.} | |||
bind (xsd:integer(strafter(str(?lexeme),"https://lexbib.elex.is/entity/L")) as ?lexeme_nr) | |||
filter (?lexeme_nr > 34) # this is because of bug https://phabricator.wikimedia.org/T363312 | |||
} | |||
group by ?lexeme ?lexeme_nr ?lemma | |||
order by ?lexeme_nr | |||
</sparql> |
Latest revision as of 18:23, 28 April 2024
A serialization of the DMLEX model, for LexBib Wikibase
This page describes how lexical resource datasets that follow the DMLEX model are represented on this Wikibase instance. The following sections describe how the DMLEX core classes are represented on LexBib Wikibase. The aim is to present DMLEX datasets to the user on collaboratively editable entity pages, and to allow SPARQL queries over them.
This model is, of course, heavily inspired by the DMLEX Ontology (the RDF serialization of DMLEX deploying Ontolex-Lemon).
DMLEX on Lexbib Wikibase
Global properties
- "id": P186 (string) - the entry, sense, or form ID in the source dataset
- "sameAs": P57 (url) - should be a proper URI
- "listingOrder": P33 (string) - integer is converted to string
- "langCode": P56 (item) - the IETF language code is mapped to the Wikibase item representing the language
Lexicographical Resource
Entities of this class are modelled as Q-entities of the class Lexicographical Resource.
Object properties
Some properties attached to entities of this class that belong to the DMLEX Controlled Values module point to Q-items belonging to the following classes:
- Definition Type Tag
- Inflected Form Tag
- Label Tag
- Label Type Tag
- Part of Speech Tag - should contain a P202 statement pointing to the Wikibase item desribing the corresponding LexInfo 3.0 POS
- Source Identity Tag
- Transcription Scheme Tag
This full reification of DMLEX controlled values (i.e., they are Q-entities rather than blank nodes) makes it possible to qualify statements whose properties hold literal dmlex:tag values attached to dictionary content with the corresponding controlled value entity (example).
Datatype properties
- "title": P6 (string)
- "uri": P112 (url)
Entry
Datatype properties
- "headword" is mapped to wikibase:lemma, to which the language code corresponding to the Lexicographical Resource's "langCode" property value is attached.
- "homographNumber": P187 (string)
Object properties, represented using Wikibase shallow reification (using qualifiers)
- "partOfSpeech": the "tag" value of the dmlex:PartOfSpeech object is mapped to P195 (string) and, if that string matches one of the controlled values specified for the Lexicographical Resource, is qualified with the matching controlled value item using P201 (string). A "listingOrder" value is also attached as a qualifier.
- "label": the "tag" value of the dmlex:Label object is mapped to P195 (string) and, if that string matches one of the controlled values specified for the Lexicographical Resource, is qualified with the matching controlled value item using P203 (string). A "listingOrder" value is also attached as a qualifier.
- "pronunciation": the "text" value of the dmlex:Transcription object is mapped to P204 (string). The "scheme" value (an IETF language tag) is attached as a qualifier using P205 (string); if that literal value matches one of the controlled values specified for the Lexicographical Resource, a P206 (item) qualifier is attached as well.
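The shallow reification described above could be queried roughly as follows. This is a minimal sketch, not a tested query: it assumes that this instance exposes statement nodes in the usual Wikibase RDF layout through the lp:, lps: and lpq: prefixes declared in the query further down this page, that wikibase: is predefined on the endpoint, and that P207 links a lexeme to its Lexicographical Resource as in that query. Since P195 is listed above for both "partOfSpeech" and "label", the result mixes both kinds of tag; a bound ?pos_item or ?label_item tells them apart.

```sparql
#title: Tag statements with their controlled-value qualifiers (sketch)
PREFIX lwb: <https://lexbib.elex.is/entity/>
PREFIX ldp: <https://lexbib.elex.is/prop/direct/>
PREFIX lp: <https://lexbib.elex.is/prop/>
PREFIX lps: <https://lexbib.elex.is/prop/statement/>
PREFIX lpq: <https://lexbib.elex.is/prop/qualifier/>

select ?lexeme ?lemma ?tag ?pos_item ?label_item ?order
where {
  # lexemes of one resource, each with a P195 statement node
  ?lexeme ldp:P207 lwb:Q34165; wikibase:lemma ?lemma; lp:P195 ?tag_statement.
  # the literal "tag" value stored as the main statement value
  ?tag_statement lps:P195 ?tag.
  # qualifiers are present only if the tag matched a controlled value
  optional {?tag_statement lpq:P201 ?pos_item.}
  optional {?tag_statement lpq:P203 ?label_item.}
  # "listingOrder" qualifier (P33, stored as a string)
  optional {?tag_statement lpq:P33 ?order.}
}
order by ?lexeme
```

Keeping the qualifier patterns optional means that tags without a matching controlled value still appear in the result, with the item columns left unbound.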
Sense
A lexeme sense on Wikibase is by default modeled as an instance of ontolex:LexicalSense, and the DMLex class dmlex:Sense is mapped to it. Note: in dmlex.ttl, dmlex:Sense is declared a subclass of ontolex:LexicalConcept, not of ontolex:LexicalSense.
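Datatype properties
- "definition" is mapped to P209, datatype "string".
- "example" is mapped to P208, datatype "string".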
Inflected Form
SPARQL
Slovar slovenskih členkov (Q34165)
#title: Slovar slovenskih členkov entries
PREFIX lwb: <https://lexbib.elex.is/entity/>
PREFIX ldp: <https://lexbib.elex.is/prop/direct/>
PREFIX lp: <https://lexbib.elex.is/prop/>
PREFIX lps: <https://lexbib.elex.is/prop/statement/>
PREFIX lpq: <https://lexbib.elex.is/prop/qualifier/>
PREFIX lpr: <https://lexbib.elex.is/prop/reference/>
PREFIX lno: <https://lexbib.elex.is/prop/novalue/>
select ?lexeme ?lexeme_nr ?lemma (count (distinct ?sense) as ?num_of_senses) (count (distinct ?def) as ?num_of_defs) (count (distinct ?expl) as ?num_of_examples)
where {
?lexeme ldp:P207 lwb:Q34165; wikibase:lemma ?lemma; ontolex:sense ?sense.
optional {?sense ldp:P209 ?def.} optional {?sense ldp:P208 ?expl.}
bind (xsd:integer(strafter(str(?lexeme),"https://lexbib.elex.is/entity/L")) as ?lexeme_nr)
filter (?lexeme_nr > 34) # this is because of bug https://phabricator.wikimedia.org/T363312
}
group by ?lexeme ?lexeme_nr ?lemma
order by ?lexeme_nr