Publications of year 2014

Conference articles

Ergun Biçici, Qun Liu, and Andy Way. Parallel FDA5 for Fast Deployment of Accurate Statistical Machine Translation Systems. In Ninth Workshop on Statistical Machine Translation, Baltimore, MD, USA, pages 59-65, June 2014. Keyword(s): Machine Translation, Machine Learning, Language Modeling. Abstract:

We use parallel FDA5, an efficiently parameterized and optimized parallel implementation of feature decay algorithms for fast deployment of accurate statistical machine translation systems, taking only about half a day for each translation direction. We build Parallel FDA5 Moses SMT systems for all language pairs in the WMT14 translation task and obtain SMT performance close to the top Moses systems with an average of 3.49 BLEU points difference using significantly less resources for training and development.

@InProceedings{Bicici:ParFDA:WMT2014,
author = {Ergun Bi\c{c}ici and Qun Liu and Andy Way},
title = {Parallel {FDA5} for Fast Deployment of Accurate Statistical Machine Translation Systems},
booktitle = {{N}inth {W}orkshop on {S}tatistical {M}achine {T}ranslation},
month = jun,
year = {2014},
pages = {59--65},
address = {Baltimore, MD, USA},
url = {https://aclanthology.info/papers/W14-3303/w14-3303},
keywords = {Machine Translation, Machine Learning, Language Modeling},
pdf = {http://bicici.github.io/publications/2014/ParFDA5forFDASMT.pdf},
abstract = {We use parallel FDA5, an efficiently parameterized and optimized parallel implementation of feature decay algorithms for fast deployment of accurate statistical machine translation systems, taking only about half a day for each translation direction. We build Parallel FDA5 Moses SMT systems for all language pairs in the WMT14 translation task and obtain SMT performance close to the top Moses systems with an average of $3.49$ BLEU points difference using significantly less resources for training and development.},

}

Ergun Biçici and Andy Way. RTM-DCU: Referential Translation Machines for Semantic Similarity. In SemEval-2014: Semantic Evaluation Exercises - International Workshop on Semantic Evaluation, Dublin, Ireland, pages 487-496, August 2014. Keyword(s): Machine Translation, Machine Learning, Performance Prediction, Semantic Similarity. Abstract:

We use referential translation machines (RTMs) for predicting the semantic similarity of text. RTMs are a computational model for identifying the translation acts between any two data sets with respect to interpretants selected in the same domain, which are effective when making monolingual and bilingual similarity judgments. RTMs judge the quality or the semantic similarity of text by using retrieved relevant training data as interpretants for reaching shared semantics. We derive features measuring the closeness of the test sentences to the training data via interpretants, the difficulty of translating them, and the presence of the acts of translation, which may ubiquitously be observed in communication. RTMs provide a language independent solution to all similarity tasks and achieve top performance when predicting monolingual cross-level semantic similarity (Task 3) and good results in the semantic relatedness and entailment (Task 1) and multilingual semantic textual similarity (STS) (Task 10). RTMs remove the need to access any task or domain specific information or resource.

@InProceedings{Bicici:RTM:SEMEVAL2014,
author = {Ergun Bi\c{c}ici and Andy Way},
title = {{RTM-DCU}: Referential Translation Machines for Semantic Similarity},
booktitle = {{SemEval-2014}: Semantic Evaluation Exercises - International Workshop on Semantic Evaluation},
month = aug,
year = {2014},
pages = {487--496},
address = {Dublin, Ireland},
keywords = {Machine Translation, Machine Learning, Performance Prediction, Semantic Similarity},
pdf = {http://bicici.github.io/publications/2014/RTM_STS.pdf},
url = {https://aclanthology.info/papers/S14-2085/s14-2085},
abstract = {We use referential translation machines (RTMs) for predicting the semantic similarity of text. RTMs are a computational model for identifying the translation acts between any two data sets with respect to interpretants selected in the same domain, which are effective when making monolingual and bilingual similarity judgments. RTMs judge the quality or the semantic similarity of text by using retrieved relevant training data as interpretants for reaching shared semantics. We derive features measuring the closeness of the test sentences to the training data via interpretants, the difficulty of translating them, and the presence of the acts of translation, which may ubiquitously be observed in communication. RTMs provide a language independent solution to all similarity tasks and achieve top performance when predicting monolingual cross-level semantic similarity (Task 3) and good results in the semantic relatedness and entailment (Task 1) and multilingual semantic textual similarity (STS) (Task 10). RTMs remove the need to access any task or domain specific information or resource.},

}

Ergun Biçici and Andy Way. Referential Translation Machines for Predicting Translation Quality. In Ninth Workshop on Statistical Machine Translation, Baltimore, MD, USA, pages 313-321, June 2014. Keyword(s): Machine Translation, Machine Learning, Performance Prediction, Natural Language Processing. Abstract:

We use referential translation machines (RTM) for quality estimation of translation outputs. RTMs are a computational model for identifying the translation acts between any two data sets with respect to interpretants selected in the same domain, which are effective when making monolingual and bilingual similarity judgments. RTMs achieve top performance in automatic, accurate, and language independent prediction of sentence-level and word-level statistical machine translation (SMT) quality. RTMs remove the need to access any SMT system specific information or prior knowledge of the training data or models used when generating the translations and achieve the top performance in WMT13 quality estimation task (QET13). We improve our RTM models with the Parallel FDA5 instance selection model, with additional features for predicting the translation performance, and with improved learning models. We develop RTM models for each WMT14 QET (QET14) subtask, obtain improvements over QET13 results, and rank 1st in all of the tasks and subtasks of QET14.

@InProceedings{Bicici:RTM:WMT2014,
author = {Ergun Bi\c{c}ici and Andy Way},
title = {Referential Translation Machines for Predicting Translation Quality},
booktitle = {{N}inth {W}orkshop on {S}tatistical {M}achine {T}ranslation},
month = jun,
year = {2014},
pages = {313--321},
address = {Baltimore, MD, USA},
url = {https://aclanthology.info/papers/W14-3339/w14-3339},
keywords = {Machine Translation, Machine Learning, Performance Prediction, Natural Language Processing},
pdf = {http://bicici.github.io/publications/2014/RTMforQE.pdf},
abstract = {We use referential translation machines (RTM) for quality estimation of translation outputs. RTMs are a computational model for identifying the translation acts between any two data sets with respect to interpretants selected in the same domain, which are effective when making monolingual and bilingual similarity judgments. RTMs achieve top performance in automatic, accurate, and language independent prediction of sentence-level and word-level statistical machine translation (SMT) quality. RTMs remove the need to access any SMT system specific information or prior knowledge of the training data or models used when generating the translations and achieve the top performance in WMT13 quality estimation task (QET13). We improve our RTM models with the Parallel FDA5 instance selection model, with additional features for predicting the translation performance, and with improved learning models. We develop RTM models for each WMT14 QET (QET14) subtask, obtain improvements over QET13 results, and rank $1$st in all of the tasks and subtasks of QET14.},

}

Internal reports

Ergun Biçici et al. Quality Estimation for Extending Good Translations. Technical report, Dublin City Univ., 2014. Note: EU project QTLaunchPad D2.2.2: www.qt21.eu/launchpad/content/delivered. Keyword(s): Machine Translation, Performance Prediction. Abstract:

We present experiments using quality estimation models to improve the performance of statistical machine translation (SMT) systems by supplementing their training corpora or by building sentence-specific SMT models for instances predicted as having potential for improvement by the ITERPE model. The experiments with quality-informed active learning strategy select, among alternative machine translations, those which are: (i) predicted to have high quality, and thus can be added to the machine translation system training set; (ii) predicted to have low quality, and thus need to be corrected/translated by humans, with the human corrections added to the machine translation system training set. Improvement is measured by the increase in the performance of the overall machine translation systems on held-out datasets, where performance is measured by automatic evaluation metrics comparing the scores of the original machine translation system against the score of the improved machine translation system after additional material is used. The experiments with ITERPE consist in automatically grouping translation instances into different quality bands, for instance for re-translation or for post-editing [Bicici and Specia, 2014]. This method can be helpful in automatic identification of quality barriers in MT to achieve high quality machine translation.

@techreport{QTLPD2.2.2,
author = {Ergun Bi\c{c}ici and others},
title = {Quality Estimation for Extending Good Translations},
institution = {Dublin City Univ.},
year = {2014},
note = {EU project QTLaunchPad D2.2.2: www.qt21.eu/launchpad/content/delivered},
keywords = {Machine Translation, Performance Prediction},
abstract = {We present experiments using quality estimation models to improve the performance of statistical machine translation (SMT) systems by supplementing their training corpora or by building sentence-specific SMT models for instances predicted as having potential for improvement by the ITERPE model. The experiments with quality-informed active learning strategy select, among alternative machine translations, those which are: (i) predicted to have high quality, and thus can be added to the machine translation system training set; (ii) predicted to have low quality, and thus need to be corrected/translated by humans, with the human corrections added to the machine translation system training set. Improvement is measured by the increase in the performance of the overall machine translation systems on held-out datasets, where performance is measured by automatic evaluation metrics comparing the scores of the original machine translation system against the score of the improved machine translation system after additional material is used. The experiments with ITERPE consist in automatically grouping translation instances into different quality bands, for instance for re-translation or for post-editing [Bicici and Specia, 2014]. This method can be helpful in automatic identification of quality barriers in MT to achieve high quality machine translation.},

}

Ergun Biçici et al. Quality Estimation for System Selection and Combination. Technical report, Dublin City Univ., 2014. Note: EU project QTLaunchPad D2.2.1: www.qt21.eu/launchpad/content/delivered. Keyword(s): Machine Translation, Performance Prediction. Abstract:

We present experiments using state of the art quality estimation models to improve the performance of machine translation systems without changing the internal functioning of such systems. The experiments include the following approaches: (i) n-best list re-ranking, where translation candidates (segments) produced by a machine translation system are re-ranked based on predicted quality scores such as to get the best translation ranked top; (ii) n-best list recombination, where sub-segments from the n-best list are mixed using a lattice-based approach, and the complete generated segments are scored using quality predictions and then re-ranked as in (i); (iii) system selection, where translations produced by multiple machine translation systems and a human translator are sorted according to predicted quality to select the best translated segment, including the challenging case where the source of the translation (i.e., which system/human produced it) is unknown, and (iv) diagnosis of statistical machine translation systems by looking at internal features of the decoder and their correlation with translation quality, as well as using them to predict groups of errors in the translations.

@techreport{QTLPD2.2.1,
author = {Ergun Bi\c{c}ici and others},
title = {Quality Estimation for System Selection and Combination},
institution = {Dublin City Univ.},
year = {2014},
note = {EU project QTLaunchPad D2.2.1: www.qt21.eu/launchpad/content/delivered},
keywords = {Machine Translation, Performance Prediction},
abstract = {We present experiments using state of the art quality estimation models to improve the performance of machine translation systems without changing the internal functioning of such systems. The experiments include the following approaches: (i) n-best list re-ranking, where translation candidates (segments) produced by a machine translation system are re-ranked based on predicted quality scores such as to get the best translation ranked top; (ii) n-best list recombination, where sub-segments from the n-best list are mixed using a lattice-based approach, and the complete generated segments are scored using quality predictions and then re-ranked as in (i); (iii) system selection, where translations produced by multiple machine translation systems and a human translator are sorted according to predicted quality to select the best translated segment, including the challenging case where the source of the translation (i.e., which system/human produced it) is unknown, and (iv) diagnosis of statistical machine translation systems by looking at internal features of the decoder and their correlation with translation quality, as well as using them to predict groups of errors in the translations.},

}

Patents, standards

Ergun Biçici. ITERPE: Identifying Translation Errors Regardless of Prediction Errors. Invention Disclosure, 2014. Note: Invention Disclosure, DCU Invent Innovation and Enterprise: www.dcu.ie/invent/.

@patent{BiciciIDF_ITERPE2014,
title = {{ITERPE}: Identifying Translation Errors Regardless of Prediction Errors},
author = {Ergun Bi\c{c}ici},
year = {2014},
number = {Invention Disclosure},
note = {Invention Disclosure, DCU Invent Innovation and Enterprise: www.dcu.ie/invent/},

}

Disclaimer:

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Last modified: Sun Feb 5 17:37:19 2023
Author: ebicici.

This document was translated from BibTeX by bibtex2html