(10 intermediate revisions by the same user not shown) | |||
Line 2: | Line 2: | ||
The vocabulary is being developed for different purposes: | The vocabulary is being developed for different purposes: | ||
* [[Elexifinder]] ''categories'' are derived from LexVoc, and are available for search and filter functions in the [[Elexifinder]] app, complementary to [[Elexifinder]] ''concepts'' (obtained through automatic wikification). | * [[Elexifinder]] ''categories'' are derived from LexVoc, and are available for search and filter functions in the [[Elexifinder]] app, complementary to [[Elexifinder]] ''concepts'' (obtained through automatic wikification, which is done using components of the Event Registry software built into [[Elexifinder]]; see the [https://doi.org/10.1145/2567948.2577024 article] describing it). | ||
* Content describing indexation of [[LexBib Zotero]] bibliographical items, by mining their corresponding full texts. | * Content describing indexation of [[LexBib Zotero]] bibliographical items, by mining their corresponding full texts. | ||
* Content describing indexation of lexical-conceptual resources such as dictionaries (planned). | * Content describing indexation of lexical-conceptual resources such as dictionaries (planned). | ||
Line 9: | Line 9: | ||
LexVoc terms have English preferred and alternative lexicalizations; relations between them are represented according to the [https://www.w3.org/2004/02/skos/ W3C SKOS standard]. Lexicalizations in other languages have been drafted from BabelNet and Wikidata, and will be manually validated and completed (see [[LexVoc translation on Lexonomy]]). | LexVoc terms have English preferred and alternative lexicalizations; relations between them are represented according to the [https://www.w3.org/2004/02/skos/ W3C SKOS standard]. Lexicalizations in other languages have been drafted from BabelNet and Wikidata, and will be manually validated and completed (see [[LexVoc translation on Lexonomy]]). | ||
* Graph | * Graph view, complete, and coloured, for terms attached to LexVoc main facets: [https://lexbib.elex.is/query/#%23defaultView%3AGraph%0A%23%20SKOS%20Tree%20with%20specified%20root%20node%20%28%22facet%20node%22%29.%20Facets%20are%20LexVoc%20top-level%20concepts.%0A%23%20P72%20broader%2C%20P73%20narrower%2C%20P77%20closeMatch%0APREFIX%20lwb%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fentity%2F%3E%0APREFIX%20ldp%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fdirect%2F%3E%0APREFIX%20lp%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2F%3E%0APREFIX%20lps%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fstatement%2F%3E%0APREFIX%20lpq%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fqualifier%2F%3E%0APREFIX%20lpr%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Freference%2F%3E%0APREFIX%20lno%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fnovalue%2F%3E%0A%0Aselect%20%3Fs%20%3FsLabel%20%3Frgb%20%3FedgeLabel%20%3Fo%20%3FoLabel%20%28xsd%3Ainteger%28%3Fcount%29%20as%20%3Fcorpus%29%20where%20%7B%0A%20%20%0A%20%20BIND%20%28lwb%3AQ16094%20as%20%3Ffacet%29%20%23%20specify%20root%20node%20for%20the%20representation%20here%2C%20e.g.%20%22lwb%3AQ14280%22%20for%20%22Linguistic%20Property%22%0A%0A%23%20Q-IDs%20of%20LexVoc%20facets%20%0A%23%20%20%20Q-ID%09facet%20label%0A%23%20Q14280%09linguistic%20property%0A%23%20Q14285%09dictionary%20function%0A%23%20Q14290%09dictionary%20distribution%0A%23%20Q14291%09dictionary%20structure%0A%23%20Q14317%09dictionary%20use%0A%23%20Q14352%09NLP%0A%23%20Q16014%09dictionary%20linguality%0A%23%20Q16015%09lexicographical%20process%0A%23%20Q16094%09dictionary%20scope%0A%23%20Q16129%09dictionary%20access%0A%20%20%20%0A%20%20%3Fo%20ldp%3AP5%20lwb%3AQ7%20%3B%20ldp%3AP72%2a%20%3Ffacet%20%3B%20rdfs%3Alabel%20%3FoLabel%20.%20FILTER%20%28lang%28%3FoLabel%29%3D%22en%22%29%0A%20%20%7B%3Fs%20ldp%3AP5%20lwb%3AQ7%20%3B%20ldp%3AP72%2a%20%3Ffacet%20.%7D%20UNION%20%7B%3Fs%20ldp%3AP77%20%3Fo.%7D%0A%20%20%3Fs%20rdfs%3Alabel%20%3FsLabel%20.%20FILTER%20%28lang%28%3FsLabel%29%3D%22en%
22%29%0A%20%20%3Fs%20%3Fp%20%3Fo%20.%20%0A%20%20filter%20%28%3Fp%20%3D%20ldp%3AP73%20%7C%7C%20%3Fp%20%3D%20ldp%3AP77%29%20%23%20skos%3Anarrower%20%2F%20skos%3AcloseMatch%20%20%20%0A%20%20%3Fedge%20wikibase%3AdirectClaim%20%3Fp%20%3B%20rdfs%3Alabel%20%3FedgeLabel%20.%0A%20%23%20OPTIONAL%20%7B%3Fs%20lp%3AP109%20%3Fcountstatement%20.%20%3Fcountstatement%20lps%3AP109%20%3Fcount%3B%20lpq%3AP84%20%22LexBib%20Oct%202021%20stopterms%22%20.%20%7D%20%0A%0A%20%20%23%20distance%20from%20facet%20node%20%28number%20of%20broader%20concepts%29%0A%20%20%7B%20select%20%3Fo%20%28count%20%28%3Fbroader%29%20as%20%3Fdistance%29%20where%20%7B%0A%20%20%20%20%20%20OPTIONAL%20%7B%3Fo%20ldp%3AP72%2B%20%3Fbroader%20.%20%7D%7D%20GROUP%20BY%20%3Fo%20%3Fdistance%20%7D%0A%0A%20%20%23%20colouring%0A%20%20BIND%20%28%0A%20%20COALESCE%28%0A%20%20%20%20IF%28%3Fs%20%3D%20%3Ffacet%20%2C%20%220000CC%22%2C%201%2F0%29%2C%0A%20%20%20%20IF%28str%28%3Fdistance%29%3D%221%22%20%2C%20%22FF9999%22%2C%201%2F0%29%2C%0A%20%20%20%20IF%28str%28%3Fdistance%29%3D%222%22%20%2C%20%22FFB266%22%2C%201%2F0%29%2C%0A%20%20%20%20IF%28str%28%3Fdistance%29%3D%223%22%20%2C%20%22FFFF99%22%2C%201%2F0%29%2C%0A%20%20%20%20IF%28str%28%3Fdistance%29%3D%224%22%20%2C%20%22CCFF99%22%2C%201%2F0%29%2C%0A%20%20%20%20IF%28str%28%3Fdistance%29%3D%225%22%20%2C%20%22CCFFE5%22%2C%201%2F0%29%2C%0A%20%20%20%20IF%28str%28%3Fdistance%29%3D%226%22%20%2C%20%22DDFFE5%22%2C%201%2F0%29%2C%0A%20%20%20%20%22FFFF99%22%0A%20%20%29%20AS%20%3Frgb%0A%29%0A%20%20%7D%20GROUP%20BY%20%3Fs%20%3FsLabel%20%3Fdistance%20%3Frgb%20%3FedgeLabel%20%3Fo%20%3FoLabel%20%3Fcount Build live]. | ||
==Sources== | |||
Sources for LexVoc have been the following: | Sources for LexVoc have been the following: | ||
Line 18: | Line 20: | ||
# The [[Item:Q14504|index]] of the volume ''Using Online Dictionaries'' ([[Item:Q3589|Müller-Spitzer 2014]]), members [https://lexbib.elex.is/w/index.php?title=Special:WhatLinksHere/Item:Q14504&limit=500 here] | # The [[Item:Q14504|index]] of the volume ''Using Online Dictionaries'' ([[Item:Q3589|Müller-Spitzer 2014]]), members [https://lexbib.elex.is/w/index.php?title=Special:WhatLinksHere/Item:Q14504&limit=500 here] | ||
# The [[Item:Q14512|''Linguistic Property'' branch]] of the [http://linguistics-ontology.org/ GOLD ontology], members [https://lexbib.elex.is/w/index.php?title=Special:WhatLinksHere/Item:Q14512&limit=500 here] | # The [[Item:Q14512|''Linguistic Property'' branch]] of the [http://linguistics-ontology.org/ GOLD ontology], members [https://lexbib.elex.is/w/index.php?title=Special:WhatLinksHere/Item:Q14512&limit=500 here] | ||
# The [https://www.lexinfo.net/ LexInfo ontology], members [https://lexbib.elex.is/ | # The [https://www.lexinfo.net/ LexInfo ontology], members [https://lexbib.elex.is/w/index.php?title=Special:WhatLinksHere/Item:Q15469&limit=500 here] | ||
We have merged all concepts stemming from sources (1) to (6), and set relations between them, so that terms can be represented as nodes in a single graph, with SKOS relations as edges. | We have merged all concepts stemming from sources (1) to (6), and set relations between them, so that terms can be represented as nodes in a single graph, with SKOS relations as edges. | ||
Line 24: | Line 26: | ||
In a second step, we have (7) extended the vocabulary with a [[Item:Q14510|manually revised subset]] ([https://lexbib.elex.is/w/index.php?title=Special:WhatLinksHere/Item:Q14510&limit=500 members]) of salient term candidates, extracted from a corpus compiled using all English full texts present in the collection used for [[Elexifinder]] version 2 (Spring 2021). We have then extended the vocabulary further, using term extraction results from subsets of our English full texts. This has been done for the field of [[Item:Q15007|dictionary digitization]], which belongs to the [[Item:Q14318|lexicographical process]] facet. | In a second step, we have (7) extended the vocabulary with a [[Item:Q14510|manually revised subset]] ([https://lexbib.elex.is/w/index.php?title=Special:WhatLinksHere/Item:Q14510&limit=500 members]) of salient term candidates, extracted from a corpus compiled using all English full texts present in the collection used for [[Elexifinder]] version 2 (Spring 2021). We have then extended the vocabulary further, using term extraction results from subsets of our English full texts. This has been done for the field of [[Item:Q15007|dictionary digitization]], which belongs to the [[Item:Q14318|lexicographical process]] facet. | ||
While (1), (2), (5) and (6) are "top-down" designed models, (3), (4) and (7) are collections that can be characterized as built "bottom-up". Terms stemming from the bottom-up collections have been connected to terms stemming from the top-down models using [[Property:P72|"skos:broader"]]. Broader relations have been defined from the point of view of article indexation: if an article mentions a term, it is taken to deal with the broader terms of that term as well. This makes it possible to refine faceted searches in [[Elexifinder]]. For example, a search result containing articles that talk about [[Item:Q15506|"grammatical case"]] can be refined to articles talking about [[Item:Q15061|"ergative case"]]. | While (1), (2), (5) and (6) are "top-down" designed models, (3), (4) and (7) are collections that can be characterized as built "bottom-up". Such a mixed strategy for ontology (or vocabulary) building has [http://doi.org/10.1017/S0269888900007797 elsewhere] been called a "middle-out" approach. Terms stemming from the bottom-up collections have been connected to terms stemming from the top-down models using [[Property:P72|"skos:broader"]]. Broader relations have been defined from the point of view of article indexation: if an article mentions a term, it is taken to deal with the broader terms of that term as well. This makes it possible to refine faceted searches in [[Elexifinder]]. For example, a search result containing articles that talk about [[Item:Q15506|"grammatical case"]] can be refined to articles talking about [[Item:Q15061|"ergative case"]]. | ||
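The indexation rule for broader relations can be illustrated with a minimal Python sketch. It is not the LexBib implementation: the toy hierarchy, the function names, and the placement of "grammatical case" directly under "linguistic property" are assumptions made for illustration only.

```python
from typing import Dict, Set

# Hypothetical toy hierarchy: term -> direct broader terms (skos:broader).
# The real LexVoc hierarchy may place these terms differently.
BROADER: Dict[str, Set[str]] = {
    "ergative case": {"grammatical case"},
    "grammatical case": {"linguistic property"},
    "linguistic property": set(),
}


def broader_closure(term: str, broader: Dict[str, Set[str]]) -> Set[str]:
    """Return all direct and indirect broader terms of `term`."""
    seen: Set[str] = set()
    stack = list(broader.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:  # the `seen` set also guards against cycles
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen


def index_terms(mentioned: Set[str], broader: Dict[str, Set[str]]) -> Set[str]:
    """Terms an article is indexed with: mentioned terms plus all their broaders."""
    indexed = set(mentioned)
    for term in mentioned:
        indexed |= broader_closure(term, broader)
    return indexed
```

Under these assumptions, an article mentioning only "ergative case" is also indexed with "grammatical case", which is why a result set for the broader term can be narrowed to the more specific one.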
LexVoc top-level concepts ("facets" of [[Item:Q1|Lexicography]]) as they are defined today are listed below. Names of natural languages were part of the vocabulary in LexBib experimental version 2, and articles were indexed with language names found in their full texts ([https://data.lexbib.org/wiki/Item:Q385 example]). This facet is currently not maintained and is not used as an [[Elexifinder]] category, since natural languages are available as a search filter through wikification ([[Elexifinder]] ''concepts'', not ''categories''). | LexVoc top-level concepts ("facets" of [[Item:Q1|Lexicography]]) as they are defined today are listed below. Names of natural languages were part of the vocabulary in LexBib experimental version 2, and articles were indexed with language names found in their full texts ([https://data.lexbib.org/wiki/Item:Q385 example]). This facet is currently not maintained and is not used as an [[Elexifinder]] category, since natural languages are available as a search filter through wikification ([[Elexifinder]] ''concepts'', not ''categories''). | ||
Line 30: | Line 32: | ||
== Elexifinder Categories == | == Elexifinder Categories == | ||
Terms that serve as [[Elexifinder]] categories belong to the first | Terms that serve as [[Elexifinder]] categories belong to the first four [[Property:P72|skos:broader]] hierarchy levels below the root. Terms deeper in the hierarchy are also considered in article indexation, as are [[Property:P77|closeMatch]] terms without a broader hierarchy of their own, but the category shown on [[Elexifinder]] is the corresponding broader category at the third level below a LexVoc facet, i.e. the fourth level below the [[Item:Q1|root concept]]. | ||
* Graph view, [[Elexifinder]] categories only (upper | * Graph view, [[Elexifinder]] categories only (upper four hierarchy levels): [https://lexbib.elex.is/query/#%23defaultView%3AGraph%0A%23%20this%20shows%20only%20three%20levels%20of%20skos%3Abroader%20relation%20below%20Q1.%0APREFIX%20lwb%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fentity%2F%3E%0APREFIX%20ldp%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fdirect%2F%3E%0APREFIX%20lp%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2F%3E%0APREFIX%20lps%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fstatement%2F%3E%0APREFIX%20lpq%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fqualifier%2F%3E%0APREFIX%20lpr%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Freference%2F%3E%0APREFIX%20lno%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fnovalue%2F%3E%0A%0Aselect%20distinct%20%0A%3Fs%20%3FsLabel%20%0A%23%3Fdistance%20%0A%3Frgb%20%0A%3Ft%20%3FtLabel%0A%0Awhere%20%7B%0A%20%20%3Fs%20ldp%3AP5%20lwb%3AQ7.%0A%20%20%3Fs%20ldp%3AP72%7Cldp%3AP72%2Fldp%3AP72%20%3Ffacet.%0A%20%20%3Ffacet%20ldp%3AP131%20lwb%3AQ1.%0A%23%20%20%20UNION%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%0A%23%20%20%20%7B%20%3Ft1%20ldp%3AP131%20lwb%3AQ1.%20%3Ft%20ldp%3AP72%20%3Ft1.%20%7D%20%0A%23%20%20%20UNION%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%0A%23%20%20%20%7B%20%3Ft1%20ldp%3AP131%20lwb%3AQ1.%20%3Ft%20ldp%3AP72%2Fldp%3AP72%20%3Ft1.%20%7D%20%0A%20%20%0A%20%20%3Fs%20ldp%3AP72%20%3Ft.%0A%20%20%3Fs%20rdfs%3Alabel%20%3FsLabel%20.%20FILTER%20%28lang%28%3FsLabel%29%3D%22en%22%29%0A%20%20%3Ft%20rdfs%3Alabel%20%3FtLabel%20.%20FILTER%20%28lang%28%3FtLabel%29%3D%22en%22%29%0A%20%20%0A%20%20%23%20distance%20from%20facet%20node%20%28number%20of%20broader%20concepts%29%0A%20%20%7B%20select%20%3Fs%20%28count%20%28%3Fbroader%29%20as%20%3Fdistance%29%20where%20%7B%0A%20%20%20%20%20%20OPTIONAL%20%7B%3Fs%20ldp%3AP72%2B%20%3Fbroader%20.%20%7D%7D%20GROUP%20BY%20%3Fs%20%3Fdistance%20%7D%0A%0A%20%20%23%20colouring%0A%20%20BIND%20%28%0A%20%20COALESCE%28%0
A%20%20%20%20IF%28str%28%3Fdistance%29%3D%221%22%20%2C%20%22CCFFE5%22%2C%201%2F0%29%2C%0A%20%20%20%20IF%28str%28%3Fdistance%29%3D%222%22%20%2C%20%22FF9999%22%2C%201%2F0%29%2C%0A%20%20%20%20IF%28str%28%3Fdistance%29%3D%223%22%20%2C%20%22FFB266%22%2C%201%2F0%29%2C%0A%20%20%20%23%20IF%28str%28%3Fdistance%29%3D%223%22%20%2C%20%22FFFF99%22%2C%201%2F0%29%2C%0A%20%20%20%20%220000CC%22%0A%20%20%29%20AS%20%3Frgb%0A%20%20%29%0A%20%20%7D%20GROUP%20BY%20%0A%20%20%3Fs%20%3FsLabel%20%20%20%0A%20%20%23%3Fdistance%0A%20%20%3Frgb%20%0A%20%20%3Ft%20%3FtLabel%0A%20%20 Query]. | ||
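The mapping from an indexed term to the category shown on Elexifinder can be sketched as follows. This is a simplified illustration, not the actual Elexifinder code: it assumes that every term has at most one broader term (real SKOS hierarchies may have several), and the names `parent`, `close_match` and `category_for` are hypothetical.

```python
from typing import Dict, List, Optional

MAX_CATEGORY_DEPTH = 4  # hierarchy levels below the root concept, as stated above


def path_to_root(term: str, parent: Dict[str, str]) -> List[str]:
    """Follow broader links upwards; returns [term, ..., topmost ancestor]."""
    path = [term]
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in path:
            raise ValueError(f"cycle in broader hierarchy at {nxt!r}")
        path.append(nxt)
    return path


def category_for(
    term: str,
    parent: Dict[str, str],
    root: str,
    close_match: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the category displayed for `term`, or None if it is not linked to `root`."""
    close_match = close_match or {}
    # closeMatch terms without a hierarchy of their own borrow that of their match.
    if term not in parent and term in close_match:
        term = close_match[term]
    path = path_to_root(term, parent)
    if path[-1] != root:
        return None
    depth = len(path) - 1  # number of broader steps up to the root
    if depth <= MAX_CATEGORY_DEPTH:
        return term
    return path[depth - MAX_CATEGORY_DEPTH]  # ancestor at the fourth level below root
```

A term six levels below the root is thus displayed under its ancestor at level four, while a term at level three is displayed as itself.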
== LexVoc Facets == | == LexVoc Facets == | ||
Line 42: | Line 44: | ||
Click on the “Graph” links to get a graph representation of the following top-level "facet" concepts and all levels of narrower concepts. | Click on the “Graph” links to get a graph representation of the following top-level "facet" concepts and all levels of narrower concepts. | ||
* Dictionary access: [https://lexbib | |||
== LexVoc development and full text indexation process == | == LexVoc development and full text indexation process == | ||
Line 59: | Line 61: | ||
Regarding inclusion of new terms: As soon as a new term (i.e. a member of [[Item:Q7|Class "Term"]], and of at least one [[Item:Q33|skos:Collection]]) is linked to a member of the main SKOS graph (i.e., the terms defined as narrowers of one LexVoc Facet) using [[Property:P72|skos:broader]] or [[Property:P77|skos:closeMatch]], it is considered in subsequent iterations of article full text indexation. On the other hand, if a term has no [[Property:P72|skos:broader]] or [[Property:P77|skos:closeMatch]] relation to another term, it is no longer considered for article full text indexation. An article full text indexation iteration (performed locally by [[User:DavidL]]) thus reflects the state of LexVoc at that particular point in time. | Regarding inclusion of new terms: As soon as a new term (i.e. a member of [[Item:Q7|Class "Term"]], and of at least one [[Item:Q33|skos:Collection]]) is linked to a member of the main SKOS graph (i.e., the terms defined as narrowers of one LexVoc Facet) using [[Property:P72|skos:broader]] or [[Property:P77|skos:closeMatch]], it is considered in subsequent iterations of article full text indexation. On the other hand, if a term has no [[Property:P72|skos:broader]] or [[Property:P77|skos:closeMatch]] relation to another term, it is no longer considered for article full text indexation. An article full text indexation iteration (performed locally by [[User:DavidL]]) thus reflects the state of LexVoc at that particular point in time. | ||
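The inclusion rule above can be sketched in Python as a reachability check. This is an assumption-laden illustration rather than the script actually used: the function name and input dictionaries are hypothetical, the membership checks for Class "Term" and skos:Collection are omitted, and facets themselves are counted as part of the main graph.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def indexable_terms(
    facets: Iterable[str],
    broader: Dict[str, Set[str]],
    close_match: Dict[str, Set[str]],
) -> Set[str]:
    """Terms considered for full text indexation in this sketch."""
    # Invert skos:broader so the graph can be walked downwards from the facets.
    narrower: Dict[str, Set[str]] = defaultdict(set)
    for term, parents in broader.items():
        for parent in parents:
            narrower[parent].add(term)

    # Main SKOS graph: the facets plus everything below them.
    main: Set[str] = set()
    stack = list(facets)
    while stack:
        current = stack.pop()
        if current not in main:
            main.add(current)
            stack.extend(narrower[current])

    # Terms linked to the main graph only through skos:closeMatch.
    linked = {t for t, targets in close_match.items() if targets & main}
    return main | linked
```

Because the set is recomputed from the current relations on every run, a term whose last broader or closeMatch link has been removed simply drops out of the next indexation iteration.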
Full text indexation with LexVoc terms, version 3, | Full text indexation with LexVoc terms, version 3, has recently been done for articles written in English and Spanish. You can see in how many articles a term (i.e., at least one of the labels of that term) has been found in LexBib full texts (see below). You can also see single articles and the 10 most frequent associated LexVoc terms on a single bibliographical item's wikipage ([[Item:Q13936|example]]). | ||
In how many articles have we found LexVoc terms (narrowers of "[[Item:Q1|Lexicography]]")? [https://lexbib.elex.is/query/#PREFIX%20lwb%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fentity%2F%3E%0APREFIX%20ldp%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fdirect%2F%3E%0APREFIX%20lp%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2F%3E%0APREFIX%20lps%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fstatement%2F%3E%0APREFIX%20lpq%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fqualifier%2F%3E%0APREFIX%20lpr%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Freference%2F%3E%0APREFIX%20lno%3A%20%3Chttps%3A%2F%2Flexbib.elex.is%2Fprop%2Fnovalue%2F%3E%0A%0Aselect%20%3FTerm%20%3FenPrefLabel%20%28group_concat%28%3FenAltLabel%3BSEPARATOR%3D%22%3B%22%29%20as%20%3FenAltLabels%29%20%3Fcorpus_hits%0A%0Awhere%20%7B%0A%20%20%3FTerm%20ldp%3AP5%20lwb%3AQ7%3B%0A%20%20rdfs%3Alabel%20%3FenPrefLabel.%20filter%28lang%28%3FenPrefLabel%29%3D%22en%22%29%0A%20%7B%20%3FTerm%20ldp%3AP72%2a%20%3Ffacet%20.%20%3Ffacet%20ldp%3AP131%20lwb%3AQ1.%7D%20%23%20present%20in%20narrower-broader-tree%20with%20LexVoc%20facet%20as%20broader%0A%20%20%20UNION%0A%20%7B%20%3FTerm%20ldp%3AP77%20%3FcloseMatch.%20%3FcloseMatch%20ldp%3AP72%2a%20lwb%3AQ1%20.%20%7D%20%23%20includes%20closeMatch%20items%20without%20own%20broader-rels%0A%20%0A%20OPTIONAL%20%7B%20%3FTerm%20skos%3AaltLabel%20%3FenAltLabel.%0A%20%20%20%20%20%20%20%20%20%20%20%20filter%28lang%28%3FenAltLabel%29%3D%22en%22%29%20%7D%0A%20OPTIONAL%20%7B%20%3FTerm%20lp%3AP109%20%5Blps%3AP109%20%3Fcorpus_hits%20%3B%20lpq%3AP84%20%22LexBib%20en%2Fes%2012-2021%22%5D.%7D%0A%0A%20%20%7D%20group%20by%20%3FTerm%20%3FenPrefLabel%20%3Fcorpus_hits%0A%20%20%20%20order%20by%20DESC%20%28xsd%3Ainteger%28%3Fcorpus_hits%29%299 Query]. |
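How such per-term article counts could be computed is sketched below. The actual LexBib indexation pipeline is not described here, so the matching strategy (case-insensitive whole-phrase matching, no lemmatization, no stop terms) and all names are assumptions.

```python
import re
from typing import Dict, List


def article_hits(
    term_labels: Dict[str, List[str]],
    articles: Dict[str, str],
) -> Dict[str, int]:
    """For each term, count the articles containing at least one of its labels."""
    hits: Dict[str, int] = {}
    for term, labels in term_labels.items():
        if not labels:
            hits[term] = 0
            continue
        # Lookarounds instead of \b so labels may start or end with punctuation.
        pattern = re.compile(
            "|".join(rf"(?<!\w){re.escape(label)}(?!\w)" for label in labels),
            re.IGNORECASE,
        )
        # An article counts once per term, however many labels or matches occur.
        hits[term] = sum(1 for text in articles.values() if pattern.search(text))
    return hits
```

Counting each article once per term mirrors the statement that a term is found when at least one of its labels occurs in the full text. Note that, in this sketch, inflected forms such as plurals are not matched unless they are listed as labels.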