LexVoc

In a second step (7), we extended the vocabulary with a [[Item:Q14510|manually revised subset]] ([https://lexbib.elex.is/w/index.php?title=Special:WhatLinksHere/Item:Q14510&limit=500 members]) of salient term candidates extracted from a corpus comprising all English full texts in the collection used for [[Elexifinder]] version 2 (Spring 2021). We then extended the vocabulary further using term extraction results from subsets of our English full texts. This has been done for the field of [[Item:Q15007|dictionary digitization]], which belongs to the [[Item:Q14318|lexicographical process]] facet.


While (1), (2), (5) and (6) are models designed "top-down", (3), (4) and (7) are collections that can be characterized as built "bottom-up". Such a mixed strategy for building an ontology (or vocabulary) has [http://doi.org/10.1017/S0269888900007797 elsewhere] been called a "middle-out" approach. Terms from the bottom-up collections have been connected to terms from the top-down models using [[Property:P72|"skos:broader"]]. Broader relations have been defined from the point of view of article indexation: if an article mentions a term, it is taken to deal with all broader concepts of that term as well. This allows users to refine faceted searches in [[Elexifinder]]. For example, a search result containing articles that talk about [[Item:Q15506|"grammatical case"]] can be narrowed down to articles talking about [[Item:Q15061|"ergative case"]].
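The indexing criterion for broader relations can be illustrated with a short Python sketch. It is a minimal illustration, not the LexBib implementation, and assumes the vocabulary is available as a plain mapping from each term to its directly broader terms. The function names, the article identifiers and the parent term "grammar" are hypothetical; only the "ergative case" to "grammatical case" relation follows the example in the text.

```python
from collections import deque

# Toy skos:broader hierarchy (term -> directly broader terms).
# "ergative case" -> "grammatical case" follows the example above;
# "grammar" is a HYPOTHETICAL parent added only to show transitivity.
BROADER: dict[str, list[str]] = {
    "ergative case": ["grammatical case"],
    "grammatical case": ["grammar"],
    "grammar": [],
}


def with_broaders(term: str, broader: dict[str, list[str]]) -> set[str]:
    """Return the term plus every term reachable via skos:broader (transitively)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in broader.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def expand_index(mentions: dict[str, set[str]],
                 broader: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index each article with the terms it mentions and all their broader terms."""
    return {
        article: set().union(*(with_broaders(t, broader) for t in terms))
        for article, terms in mentions.items()
    }


def search(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Return articles indexed with ALL given terms; adding a term refines the result."""
    wanted = set(terms)
    return {article for article, indexed in index.items() if wanted <= indexed}


# HYPOTHETICAL articles with the terms found in their full texts.
MENTIONS = {
    "article A": {"ergative case"},
    "article B": {"grammatical case"},
}

INDEX = expand_index(MENTIONS, BROADER)
print(sorted(search(INDEX, "grammatical case")))                   # ['article A', 'article B']
print(sorted(search(INDEX, "grammatical case", "ergative case")))  # ['article A']
```

Because "article A" only mentions "ergative case", it is still found by a search for "grammatical case"; adding "ergative case" as a second filter then narrows the result to that article.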


The LexVoc top-level concepts ("facets" of [[Item:Q1|Lexicography]]), as currently defined, are listed below. Names of natural languages were part of the vocabulary in LexBib experimental version 2, and articles were indexed with the language names found in their full texts ([https://data.lexbib.org/wiki/Item:Q385 example]). This facet is currently not maintained and is not used as an [[Elexifinder]] category, since natural languages are available as a search filter through wikification ([[Elexifinder]] ''concepts'', not ''categories'').