LexVoc

Revision as of 09:21, 12 February 2024 by DavidL (talk | contribs) (→‎LexVoc Facets)

LexVoc Vocabulary of Lexicographic Terms

The vocabulary is being developed for different purposes:

  • Elexifinder categories are derived from LexVoc and are available for search and filter functions in the Elexifinder app. They complement Elexifinder concepts, which are obtained through automatic wikification using components of the Event Registry software built into Elexifinder (see describing article).
  • Content-describing indexation of LexBib Zotero bibliographical items, by mining their corresponding full texts.
  • Content-describing indexation of lexical-conceptual resources such as dictionaries. Concept schemes derived from LexVoc branches are used as the range of LexMeta properties, i.e., some properties describing lexical resources point to LexVoc concepts.
  • LexVoc is organized as a thesaurus-like concept tree, with SKOS relations as edges and “Lexicography” as the root node. This concept tree can be seen as a ramification of the “Lexicography” subject heading as listed in cross-domain library vocabularies; see e.g. “Lexicography” in LCSH, or “Lexicography” in BLL.

LexVoc concepts have been translated into a range of languages (see LexVoc_translation_on_Lexonomy), so that LexVoc can be regarded as a multilingual lexicography thesaurus. Concept lexicalizations in languages other than the source language have been drafted from BabelNet and Wikidata, and manually validated and completed in a community effort during the lifetime of the Elexis project.

  • A complete, coloured graph visualization of the terms attached to the LexVoc main facets: Build on the fly.
  • A list of LexVoc concepts with English lexicalizations, Wikidata equivalents, and LexBib corpus hits (for the time being, only English and Spanish articles have been processed) here.

Sources

Sources for LexVoc have been the following:

  1. An updated and extended version of the index of Bibliografía Temática de la Lexicografía (Córdoba Rodríguez 2003) translated to English, members here
  2. The typology of dictionaries by Engelberg and Storrer (2016), members here
  3. The Glossary of Lexicographic Terms by Kipfer (2013), members here
  4. The index of the volume Using Online Dictionaries (Müller-Spitzer 2014), members here
  5. The Linguistic Property branch of the GOLD ontology, members here
  6. The LexInfo ontology, members here

We have merged all concepts stemming from sources (1) to (6), and set relations between them, so that terms can be represented as nodes in a single graph, with SKOS relations as edges.

In a second step, we have (7) extended the vocabulary with a manually revised subset (members) of salient term candidates, extracted from a corpus compiled from all English full texts present in the collection used for Elexifinder version 2 (Spring 2021). We have then extended the vocabulary further, using term extraction results from subsets of our English full texts. So far, this has been done for the field of dictionary digitization, which belongs to the lexicographical process facet.

While (1), (2), (5) and (6) are models designed "top-down", (3), (4) and (7) are collections that can be characterized as built "bottom-up". Such a mixed strategy for ontology (or vocabulary) building has elsewhere been called a "middle-out" approach. Terms stemming from the bottom-up collections have been connected to terms stemming from the top-down models using "skos:broader". The criterion for defining broader-relations has been the point of view of article indexation: if an article mentions a term, it is taken to deal with the broaders of that term as well. This allows faceted searches in Elexifinder to be refined. For example, a search result containing articles that talk about "grammatical case" can be narrowed down to articles talking about "ergative case".
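The indexation criterion described above (a mentioned term implies its broaders) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy `BROADER` mapping and the function names are hypothetical and not part of LexVoc or Elexifinder, and placing "grammatical case" under "linguistic property" is an assumption made for the example.

```python
# Hypothetical toy data: each term maps to its direct skos:broader terms.
# The placement of these terms is an assumption made for illustration.
BROADER = {
    "ergative case": ["grammatical case"],
    "grammatical case": ["linguistic property"],
    "linguistic property": [],
}


def broader_closure(term, broader=BROADER):
    """Return all transitive broader terms of `term` (the term itself excluded)."""
    seen = set()
    stack = list(broader.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, []))
    return seen


def index_article(mentioned_terms, broader=BROADER):
    """Index an article with the mentioned terms plus all of their broaders."""
    indexed = set(mentioned_terms)
    for term in mentioned_terms:
        indexed |= broader_closure(term, broader)
    return indexed
```

With this closure in place, an article that only mentions "ergative case" is also found by a search for "grammatical case", which is what makes the refinement from the broader to the narrower term possible.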

LexVoc top-level concepts ("facets" of Lexicography) as they are defined today are listed below. Names of natural languages were part of the vocabulary in LexBib experimental version 2, and articles were indexed with the language names found in their full texts (example). This facet is currently not maintained and not used as an Elexifinder category, since natural languages are available as a search filter through wikification (as Elexifinder concepts, not categories).

Elexifinder Categories

Terms that serve as Elexifinder categories belong to the first four skos:broader hierarchy levels below the root. Terms deeper in the hierarchy are considered in article indexation, and so are closeMatch terms without a broader-hierarchy of their own, but the assigned category visible on Elexifinder will be the corresponding broader category of the third level below a LexVoc facet, i.e. the fourth level below the root concept.
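As a rough sketch of the category assignment described above, the following Python snippet walks from a term up to the root and returns the ancestor at the deepest visible level. All names here (`PARENT`, `CLOSE_MATCH`, `MAX_CATEGORY_DEPTH`, `visible_category`) and the placeholder terms are hypothetical; the sketch also assumes that every term has a single parent and that the hierarchy is acyclic, which may not hold for the real vocabulary.

```python
# Hypothetical toy hierarchy: child -> parent. The facet is linked to the
# root via "is facet of", the other terms via skos:broader.
PARENT = {
    "facet": "Lexicography",
    "level 2 term": "facet",
    "level 3 term": "level 2 term",
    "level 4 term": "level 3 term",
    "level 5 term": "level 4 term",
    "level 6 term": "level 5 term",
}

# Hypothetical closeMatch term without a broader-hierarchy of its own.
CLOSE_MATCH = {"external synonym": "level 5 term"}

# Deepest visible level, counted from the root (root = 0, facet = 1).
MAX_CATEGORY_DEPTH = 4


def path_from_root(term):
    """Return the chain root -> ... -> term (assumes an acyclic hierarchy)."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    path.reverse()
    return path


def visible_category(term):
    """Return the category shown for `term`: the term itself if it sits
    within the visible levels, otherwise its ancestor at the deepest one."""
    if term not in PARENT and term in CLOSE_MATCH:
        term = CLOSE_MATCH[term]  # resolve through the closeMatch target
    path = path_from_root(term)
    depth = min(len(path) - 1, MAX_CATEGORY_DEPTH)
    return path[depth]
```

In this toy hierarchy, "level 6 term" and the closeMatch term "external synonym" both resolve to "level 4 term", while "level 2 term" is shown as its own category.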

LexVoc Facets

The following concepts are directly linked to the term node “Lexicography” using "is facet of". These are Elexifinder main categories, i.e. top-level concepts. Two narrower-levels below each facet are also considered as (visible) Elexifinder categories, while concepts deeper in the hierarchy are likewise used in article indexation.

The definition of LexVoc facets is semantically more correct than simply linking top-level concepts to the root node “Lexicography” using "skos:broader" (as was done in previous versions of LexVoc), since the top-level concepts represent different facets or "lenses" rather than narrower terms of "Lexicography". Furthermore, this model not only offers a straightforward way to define Elexifinder main categories, but also paves the way for defining properties that describe dictionary content. A lexical-conceptual resource (such as a dictionary) can be indexed with properties that mirror the different facets, each having a defined value range that consists of the narrower terms of the corresponding facet. For example, a dictionary may be indexed using the "dictionary scope" property with any value defined as a narrower term of the facet "dictionary scope". The facets have been defined with this goal in mind; accordingly, "dictionary linguality" has been taken out of "dictionary scope" and defined as a facet of its own, so that its narrowers can serve as the range for the property "linguality type", and so on. [Note: a data model dedicated to dictionary metadata and a workflow for content-describing indexation of dictionaries is planned for 2022.]
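The idea of a facet's narrower terms serving as the value range of a property can be illustrated with a small Python sketch. The property-to-facet pairs follow the examples given above; everything else (the narrower terms in `NARROWER`, the function names) is hypothetical and only serves the illustration.

```python
# Hypothetical toy data: facet or term -> direct narrower terms.
NARROWER = {
    "dictionary linguality": ["monolingual", "multilingual"],
    "multilingual": ["bilingual"],
    "dictionary scope": ["general language", "specialized language"],
}

# Property -> facet whose narrowers form the property's value range.
PROPERTY_FACET = {
    "linguality type": "dictionary linguality",
    "dictionary scope": "dictionary scope",
}


def value_range(facet):
    """Return all transitive narrower terms of a facet."""
    result = set()
    stack = list(NARROWER.get(facet, []))
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(NARROWER.get(term, []))
    return result


def is_valid(prop, value):
    """Check that `value` lies in the range defined by the property's facet."""
    facet = PROPERTY_FACET.get(prop)
    return facet is not None and value in value_range(facet)
```

Under these assumptions, "bilingual" is an acceptable value for "linguality type" but not for "dictionary scope", which is the separation that defining "dictionary linguality" as a facet of its own is meant to achieve.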

Click on the “Graph” links to get a graph representation of the following top-level "facet" concepts and all levels of narrower concepts.

  • Dictionary access: Graph.
  • Dictionary distribution: Graph.
  • Dictionary function: Graph.
  • Dictionary linguality: Graph.
  • Dictionary scope: Graph.
  • Dictionary structure: Graph.
  • Dictionary use: Graph.
  • Lexicographical process: Graph.
  • Linguistic property: Graph.
  • NLP: Graph.

LexVoc development and full text indexation process

LexVoc, in its current state, is by no means complete or finished; it is ongoing work. On Elexis LexMeet, you can contribute to the discussion about the structure of LexVoc, i.e. the definition of facets and their inner organization. We also call for collaboration on the LexVoc translation on Lexonomy (see our 2021 Euralex paper).

Regarding the inclusion of new terms: As soon as a new term (i.e. a member of Class "Term", and of at least one skos:Collection) is linked to a member of the main SKOS graph (i.e., the terms defined as narrowers of one LexVoc Facet) using skos:broader or skos:closeMatch, it is considered in subsequent iterations of article full text indexation. Conversely, if a term has no skos:broader or skos:closeMatch relation to another term, it is not (or no longer) considered for article full text indexation. An article full text indexation iteration (performed locally by User:DavidL) thus reflects the state of LexVoc at that particular point in time.
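The inclusion rule above can be sketched as a reachability check. The following Python snippet is an illustration only: the function name, its parameters and the toy data shapes are hypothetical, and the actual indexation workflow may be implemented differently.

```python
def indexable_terms(facets, broader, close_match, collections):
    """Return the non-facet terms that would be considered for indexation.

    facets:      set of facet names (top of the main SKOS graph)
    broader:     term -> list of direct skos:broader terms
    close_match: term -> list of skos:closeMatch targets
    collections: term -> list of skos:Collection memberships
    """
    # Grow the main graph downwards until no further term can be attached.
    main = set(facets)
    changed = True
    while changed:
        changed = False
        for term, parents in broader.items():
            if term not in main and any(p in main for p in parents):
                main.add(term)
                changed = True

    # Terms attached to the main graph only through skos:closeMatch.
    linked = {
        term
        for term, targets in close_match.items()
        if any(target in main for target in targets)
    }

    # Keep only terms that belong to at least one collection.
    candidates = (main - set(facets)) | linked
    return {term for term in candidates if collections.get(term)}
```

A term whose only links point to terms outside the main graph, or that has no links at all, never enters the result, which mirrors the statement that such terms are not considered for indexation.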

Full text indexation with LexVoc terms, version 3, has recently been done for articles written in English and Spanish. You can see in how many articles a term (i.e., at least one of the labels of that term) has been found in LexBib full texts (see below). You can also see single articles and the 10 most frequent associated LexVoc terms on a single bibliographical item's wikipage (example).

In how many articles have we found LexVoc terms (narrowers of "Lexicography")? Query.
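To make the counting criterion concrete (an article counts for a term if at least one of the term's labels occurs in its full text), here is a minimal Python sketch. The matching strategy (case-insensitive whole-word regular expressions) is an assumption; how labels are actually matched against LexBib full texts is not specified here, and the function name and data shapes are hypothetical.

```python
import re


def article_hits(term_labels, articles):
    """Count, per term, the articles whose text contains at least one label.

    term_labels: term identifier -> list of labels (any language)
    articles:    article identifier -> full text
    """
    counts = {}
    for term, labels in term_labels.items():
        if not labels:
            counts[term] = 0
            continue
        # Assumption: case-insensitive whole-word match on any of the labels.
        alternatives = "|".join(re.escape(label) for label in labels)
        pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
        counts[term] = sum(
            1 for text in articles.values() if pattern.search(text)
        )
    return counts
```

Each article is counted at most once per term, no matter how many labels or occurrences it contains, so the result is a number of articles and not a number of mentions.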