TransLiTex: A Parallel Corpus of Translated Literary Texts - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
Conference paper, Year: 2018

TransLiTex: A Parallel Corpus of Translated Literary Texts

Abstract

In this paper, we present our ongoing work to create a massively parallel corpus of translated literary texts for use in computational linguistics, translation studies and cross-linguistic corpus studies. Using a crowdsourcing approach, we identified and collected 29 translations of Mark Twain's Adventures of Huckleberry Finn, published in 23 languages, including less-resourced languages. We report on the current status of the corpus, which contains 5 chapter-aligned translations (English-Dutch, two English-Hungarian, English-Polish and English-Russian). We evaluated the correctness of the chapter alignment by computing the percentage of words in common between the English version and each translation. The percentages range from 43% to 64%, which we take as evidence that the chapter alignment is highly correct.
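The evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not state how words were tokenized or normalized, nor how words in different languages were made comparable, so everything below is an assumption: the function names `tokenize`, `common_word_percentage` and `alignment_scores` are hypothetical, words are lowercased and compared as distinct types, and the percentage is taken relative to the source chapter's vocabulary. It is not the authors' implementation.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of distinct word types.

    Assumption: a "word" is any run of Unicode word characters.
    """
    return set(re.findall(r"\w+", text.lower()))


def common_word_percentage(source_chapter, target_chapter):
    """Percentage of distinct source words that also occur in the target.

    Assumption: the denominator is the source vocabulary size; the
    abstract does not say which normalization was used.
    """
    source_words = tokenize(source_chapter)
    if not source_words:
        return 0.0
    shared = source_words & tokenize(target_chapter)
    return 100.0 * len(shared) / len(source_words)


def alignment_scores(source_chapters, target_chapters):
    """Score each pair of chapters that occupy the same position."""
    return [
        common_word_percentage(source, target)
        for source, target in zip(source_chapters, target_chapters)
    ]
```

Under these assumptions, a correctly aligned chapter pair would be expected to score higher than a misaligned one, since shared tokens such as character and place names cluster within the same chapter.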
Main file
11_W34.pdf (183.66 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01827884, version 1 (02-07-2018)

Identifiers

  • HAL Id: hal-01827884, version 1

Cite

Amel Fraisse, Quoc-Tan Tran, Ronald Jenn, Patrick Paroubek, Shelley Fisher Fishkin. TransLiTex: A Parallel Corpus of Translated Literary Texts. Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Beijing Advanced Innovation Center for Language Resources, May 2018, Miyazaki, Japan. ⟨hal-01827884⟩
669 views
349 downloads
