Calelh

Domenge published on
4 min, 748 words

https://calelh.osca.dev

Calelh presentation

The Calelh application is a digitized version of Louis Alibert's dictionary. The digital format gives new shine to the linguist's remarkable work. The dictionary is currently available only in a facsimile edition; however, a digital version of the second part (entries and definitions) was typed up by the Paul Valéry University of Montpellier, and these data were used to populate the database.

The first part of the dictionary is the booklet, which we typed up ourselves as part of the Calelh project.

Alibert's work concentrates on listing lemmas as entries. Each lemma is expanded into all the terms produced from it by derivation or composition. Data processing starts from the lemma and develops its productions in an ontological form.

The booklet (first introductory pages)

There is plenty of information in the first pages of a dictionary.

Those pages are organized according to an ensenhador (a table of contents). The site mimics the book by following the same organization.

The booklet has four parts:

  • Phonetic mutations of the Lengadocian dialect, which also sets out its linguistic terminology;
  • Morphology (how popular Occitan words are formed);
  • How Greek and Latin words are used to form scientific and scholarly Occitan words;
  • The list of abbreviations, grouped by type.

Throughout the booklet, the text is enriched to ease reading by highlighting the recommended forms and distinguishing them from the other forms in use. Internally, the markup language and CSS help to isolate and mark the terms for easy and reliable extraction.

Types of abbreviation

  • POS for part of speech,
  • LOC for the word localization,
  • STRUCT to qualify the structure of the definitions,
  • ACCEPTION to disambiguate the different meanings,
  • META for contextual information not yet put to use.

Calelh user guide and grammar

Domenge published on
3 min, 599 words

Calelh user guide and grammar

Definitions

Vedeta (headword): the root of the hierarchy is the Occitan entry.

Clausa (clause): a sequence of letters whose last character is a ‘:’, with no space between the two.

Example:

  • derv:
  • f:

ClausaQualificativa (qualifying clause): a clause that qualifies the attribute that follows it. It is terminal. The qualification is on the same line.

Example:

  • pos: adj. for PartOfSpeech adjective;
  • f: languir for "languir" in French.

ClausaLocalizaira (localizing clause): a qualifying clause specialized in localization; it may be non-terminal.
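
As a minimal sketch of the definitions above, a line can be split into its clause and its qualification with a regular expression. It assumes that a clause name consists only of letters and that everything after the ‘:’ on the same line is the qualification. The name `parse_clause` is hypothetical and is not taken from Calelh.

```python
import re

# Assumption: a clause is a run of letters immediately followed by ':'
# (no space in between); the rest of the line is the qualification.
CLAUSE_RE = re.compile(r"^\s*([^\W\d_]+):\s*(.*)$")


def parse_clause(line):
    """Return (clause, qualification), or None if the line has no clause."""
    match = CLAUSE_RE.match(line)
    if match is None:
        return None
    clause, qualification = match.groups()
    return clause, qualification.strip()


if __name__ == "__main__":
    for sample in ("pos: adj.", "f: languir", "derv:", "pos : adj."):
        print(repr(sample), "->", parse_clause(sample))
```

A line such as `pos : adj.` is rejected because a space separates the letters from the ‘:’, which matches the definition of a Clausa.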

Feminine rules from Cantalausa's dictionary

Domenge published on
3 min, 465 words

These rules are extracted from Cantalausa's dictionary. Where necessary, a dictionary entry gives its feminine forms as an abbreviation, because space is scarce in the paper edition. This is hard to read and does not give the feminine forms their due consideration.

In a digital dictionary there is no place for such biased treatment. So we have to restore the feminine entries by establishing an optimized list of the rules inferred from the abbreviations.

Then we must take care to apply each rule in the right order, so that an overly greedy rule does not override a more precise one.
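
The ordering concern can be sketched as follows. The two suffix rules are hypothetical and chosen only for illustration; they are not the rules inferred from Cantalausa's dictionary. The sketch also assumes that "more precise" can be approximated by "longer suffix", which may not hold for the real rule set.

```python
# Hypothetical rules, for illustration only: (masculine suffix, feminine suffix).
# They are NOT taken from Cantalausa's dictionary.
RULES = [
    ("", "a"),         # greedy catch-all: matches every word, appends "a"
    ("aire", "aira"),  # more precise rule for one specific ending
]


def order_rules(rules):
    """Sort rules so the longest (assumed most precise) suffix is tried first."""
    return sorted(rules, key=lambda rule: len(rule[0]), reverse=True)


def feminine(word, rules):
    """Apply the first matching rule, trying the most precise rule first."""
    for suffix, replacement in order_rules(rules):
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


if __name__ == "__main__":
    print(feminine("cantaire", RULES))
    print(feminine("grand", RULES))
```

Without the sort, the catch-all rule listed first would match `cantaire` before the precise rule had a chance to apply.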

Steps for building the database for Cantalausa's dictionary

Domenge published on
5 min, 878 words

Cantalausa project

The project covers two works by the abbot Louis Combes, aka Cantalausa: the dictionary and Lenga Viva.

Cantalausa dictionary

Cantalausa's dictionary comes as a set of .pdf files, one for each letter of the Occitan alphabet. For each letter we extract the content of the PDF as text, ending up with a batch of [letter].txt files in a dedicated directory.

These files will be the raw material of the process to populate the database.
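
A minimal sketch of how such a batch could be read, assuming the files are UTF-8 text and that each file name (without its extension) is the letter it covers. The function name `iter_letter_files` is hypothetical; the PDF-to-text step and the database loading are left out.

```python
from pathlib import Path


def iter_letter_files(directory):
    """Yield (letter, lines) for each [letter].txt file, in sorted name order."""
    for path in sorted(Path(directory).glob("*.txt")):
        letter = path.stem  # "a.txt" -> "a"
        # Assumption: the extracted text is encoded as UTF-8.
        text = path.read_text(encoding="utf-8")
        # Keep non-empty lines only; how lines map to entries is not decided here.
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        yield letter, lines


if __name__ == "__main__":
    # Lists the .txt files of the current directory, if any.
    for letter, lines in iter_letter_files("."):
        print(letter, len(lines))
```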
