Feminine rules from Cantalausa's dictionary

Domenge published on
3 min, 465 words

These rules are extracted from Cantalausa's dictionary. Because space is scarce in the paper edition, an entry gives its feminine forms only as an abbreviation, where one is needed at all. This is hard to read and does not give the feminine forms their due consideration.

In a digital dictionary there is no place for such biased treatment. We therefore have to restore the feminine entries by establishing an optimized list of the rules inferred from the abbreviations.

We must then apply the rules in the right order, so that an overly greedy rule does not override a more precise one.
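As a minimal sketch of why the order matters, assume each rule is a pair (masculine ending, feminine ending) and that a rule with a longer masculine ending is the more precise one. The three rules below are illustrative only and are not taken from the dictionary's actual abbreviation list; the names RULES, order_rules and feminine are hypothetical.

```python
# Illustrative (masculine ending, feminine ending) pairs.
# Assumption: these are NOT the rules inferred from the dictionary.
RULES = [
    ("", "a"),      # catch-all: append -a (greedy, matches every word)
    ("ic", "iga"),  # amic -> amiga
    ("t", "da"),    # polit -> polida
]


def order_rules(rules):
    """Put the longest masculine ending first, so the catch-all comes last."""
    return sorted(rules, key=lambda rule: len(rule[0]), reverse=True)


def feminine(word, rules):
    """Apply the first rule whose masculine ending matches the word."""
    for masc, fem in rules:
        if word.endswith(masc):
            return word[: len(word) - len(masc)] + fem
    return word


ORDERED = order_rules(RULES)

print(feminine("amic", RULES))    # greedy catch-all fires first
print(feminine("amic", ORDERED))  # the precise -ic rule fires first
```

With the unordered list, the catch-all matches before the more precise rule gets a chance. Sorting by the length of the masculine ending is only one possible ordering criterion; a hand-maintained priority list would work as well.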


Steps for building the database for Cantalausa's dictionary

Domenge published on
5 min, 878 words

Cantalausa project

The project covers two works by Abbé Louis Combes, known as Cantalausa: the dictionary and Lenga Viva.

Cantalausa dictionary

Cantalausa's dictionary comes as a set of PDF files, one for each letter of the Occitan alphabet. For each letter we extract the content of the PDF as text, ending up with a batch of [letter].txt files in a dedicated directory.

These files are the raw material for the process that populates the database.
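A minimal sketch of how that batch might be read back before any parsing, assuming the text files are UTF-8 encoded and sit directly in one directory. The function name load_letter_files and the directory name in the usage note are hypothetical; the PDF-to-text step itself is not shown.

```python
from pathlib import Path


def load_letter_files(directory):
    """Return a dict mapping each file stem (the letter) to its raw text.

    Assumption: every *.txt file in the directory is one letter of the
    dictionary, encoded as UTF-8.
    """
    texts = {}
    for path in sorted(Path(directory).glob("*.txt")):
        texts[path.stem] = path.read_text(encoding="utf-8")
    return texts
```

Calling load_letter_files("txt") on a directory holding a.txt and b.txt would give a dictionary with the keys "a" and "b", ready to be handed to whatever step splits the text into entries.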
