Steps for building the database for Cantalausa's dictionary

Domenge published on
5 min, 878 words

Cantalausa project

The project works on two works by the abbot Louis Combes, known as Cantalausa: the dictionary and Lenga Viva.

Cantalausa dictionary

Cantalausa's dictionary comes as a set of .pdf files, one for each letter of the Occitan alphabet. For each letter, we extract the content of the PDF as plain text, ending up with a batch of [letter].txt files in a dedicated directory.
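The extraction step could be scripted roughly as follows. This is a minimal sketch, not the project's actual tooling: it assumes the Poppler `pdftotext` command-line tool is installed, and that each PDF is named after its letter (e.g. `a.pdf`), which is a hypothetical naming scheme. `extraction_commands` and `extract_all` are illustrative helper names.

```python
import subprocess
from pathlib import Path


def extraction_commands(pdf_dir: Path, txt_dir: Path) -> list[list[str]]:
    """Build one pdftotext call per letter PDF.

    Assumes (hypothetically) that each PDF is named after its letter,
    so a.pdf in pdf_dir becomes a.txt in txt_dir.
    """
    commands = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        txt = txt_dir / (pdf.stem + ".txt")
        # -enc UTF-8 keeps Occitan accented letters (à, è, ò, ç...) intact
        commands.append(["pdftotext", "-enc", "UTF-8", str(pdf), str(txt)])
    return commands


def extract_all(pdf_dir: Path, txt_dir: Path) -> None:
    """Run pdftotext (from Poppler, assumed installed) for every letter PDF."""
    txt_dir.mkdir(parents=True, exist_ok=True)
    for cmd in extraction_commands(pdf_dir, txt_dir):
        # check=True stops the batch on the first failed conversion
        subprocess.run(cmd, check=True)
```

Building the command list separately from running it makes it easy to inspect what will be executed before touching the filesystem.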

These files are the raw material for the process that populates the database.
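As an illustration of how that raw material might be picked up, here is a minimal sketch that reads every [letter].txt file from the directory into a dictionary keyed by letter. `load_letters` is a hypothetical helper; taking the letter from the file name and dropping blank lines are assumptions, not steps described by the project.

```python
from pathlib import Path


def load_letters(txt_dir: Path) -> dict[str, list[str]]:
    """Map each letter to the non-empty lines of its [letter].txt file.

    Hypothetical helper: the letter is taken from the file name (a.txt -> "a").
    """
    letters = {}
    for txt in sorted(txt_dir.glob("*.txt")):
        text = txt.read_text(encoding="utf-8")
        # Assumption: blank lines carry no dictionary content, so drop them
        # and trim surrounding whitespace left over from PDF extraction.
        letters[txt.stem] = [line.strip() for line in text.splitlines() if line.strip()]
    return letters
```

Keeping the lines grouped per letter mirrors the way the dictionary itself is split, so later parsing steps can be run and checked one letter at a time.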
