
Publications of year 2013
Articles in journals and book chapters
  1. Ergun Biçici, Declan Groves, and Josef van Genabith. Predicting Sentence Translation Quality Using Extrinsic and Language Independent Features. Machine Translation, 27(3-4):171-192, 2013. ISSN: 0922-6567. [doi:10.1007/s10590-013-9138-4] Keyword(s): Machine Translation, Machine Learning, Performance Prediction.
    Abstract:
    We develop a top performing model for automatic, accurate, and language independent prediction of sentence-level statistical machine translation (SMT) quality with or without looking at the translation outputs. We derive various feature functions measuring the closeness of a given test sentence to the training data and the difficulty of translating the sentence. We describe mono feature functions that are based on statistics of only one side of the parallel training corpora and duo feature functions that incorporate statistics involving both source and target sides of the training data. Overall, we describe novel, language independent, and SMT system extrinsic features for predicting the SMT performance, which also rank high during feature ranking evaluations. We experiment with different learning settings, with or without looking at the translations, which help differentiate the contribution of different feature sets. We apply partial least squares and feature subset selection, both of which improve the results and we present ranking of the top features selected for each learning setting, providing an exhaustive analysis of the extrinsic features used. We show that by just looking at the test source sentences and not using the translation outputs at all, we can achieve better performance than a baseline system using SMT model dependent features that generated the translations. Furthermore, our prediction system is able to achieve the 2nd best performance overall according to the official results of the Quality Estimation Task (QET) challenge when also looking at the translation outputs. Our representation and features achieve the top performance in QET among the models using the SVR learning model.

    @article{Bicici:MTPP:MTJ2013,
    author = {Ergun Bi\c{c}ici and Declan Groves and Josef van Genabith},
    title = {Predicting Sentence Translation Quality Using Extrinsic and Language Independent Features},
    journal = {Machine Translation},
    year = {2013},
    volume = {27},
    number = {3-4},
    pages = {171--192},
    doi = {10.1007/s10590-013-9138-4},
    issn = {0922-6567},
    keywords = {Machine Translation, Machine Learning, Performance Prediction},
    abstract = {We develop a top performing model for automatic, accurate, and language independent prediction of sentence-level statistical machine translation (SMT) quality with or without looking at the translation outputs. We derive various feature functions measuring the closeness of a given test sentence to the training data and the difficulty of translating the sentence. We describe \texttt{mono} feature functions that are based on statistics of only one side of the parallel training corpora and \texttt{duo} feature functions that incorporate statistics involving both source and target sides of the training data. Overall, we describe novel, language independent, and SMT system extrinsic features for predicting the SMT performance, which also rank high during feature ranking evaluations. 
    
    We experiment with different learning settings, with or without looking at the translations, which help differentiate the contribution of different feature sets. We apply partial least squares and feature subset selection, both of which improve the results and we present ranking of the top features selected for each learning setting, providing an exhaustive analysis of the extrinsic features used. We show that by just looking at the test source sentences and not using the translation outputs at all, we can achieve better performance than a baseline system using SMT model dependent features that generated the translations. Furthermore, our prediction system is able to achieve the $2$nd best performance overall according to the official results of the Quality Estimation Task (QET) challenge when also looking at the translation outputs. Our representation and features achieve the top performance in QET among the models using the SVR learning model.},
    
    }
    


  2. Kashif Shah, Eleftherios Avramidis, Ergun Biçici, and Lucia Specia. QuEst - Design, Implementation and Extensions of a Framework for Machine Translation Quality Estimation. The Prague Bulletin of Mathematical Linguistics, 100:19-30, 2013. [doi:10.2478/pralin-2013-0008]
    @article{Bicici:Quest:PBML2013,
    author = {Kashif Shah and Eleftherios Avramidis and Ergun Bi\c{c}ici and Lucia Specia},
    title = {{QuEst} - Design, Implementation and Extensions of a Framework for Machine Translation Quality Estimation},
    journal = {The Prague Bulletin of Mathematical Linguistics},
    year = {2013},
    volume = {100},
    pages = {19--30},
    doi = {10.2478/pralin-2013-0008},
    
    }
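
The sentence-level quality estimation approach of the first journal article above combines extrinsic, language-independent features with a support vector regression (SVR) learner, optionally preceded by partial least squares (PLS) and feature subset selection. The Python sketch below only illustrates that general kind of pipeline using scikit-learn; the random feature matrix, the choice of 10 PLS components, and the SVR hyperparameters are assumptions made for the example, not values from the paper.

    # Illustrative sketch (not the paper's implementation): extrinsic features
    # + optional PLS dimensionality reduction + SVR regression, in the spirit
    # of sentence-level translation quality estimation. Data and
    # hyperparameters below are made up for demonstration.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    # X: one row per test sentence; columns stand in for extrinsic features
    # (e.g. coverage of the sentence's n-grams in the SMT training data,
    # sentence length, LM perplexity). y: a quality score such as HTER.
    X = rng.random((200, 20))
    y = rng.random(200)

    # Optional PLS step projects the features onto a smaller number of
    # latent components before regression.
    X_reduced = PLSRegression(n_components=10).fit_transform(X, y)[0]

    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
    scores = cross_val_score(model, X_reduced, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    print("mean RMSE:", -scores.mean())

Feature subset selection, also used in the paper, could be slotted in before the regression step (for instance with scikit-learn's SelectKBest); it is omitted here to keep the sketch short.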
    


Conference articles
  1. Ergun Biçici. Feature Decay Algorithms for Fast Deployment of Accurate Statistical Machine Translation Systems. In Eighth Workshop on Statistical Machine Translation, Sofia, Bulgaria, pages 78-84, August 2013. [PDF] Keyword(s): Machine Translation, Machine Learning, Language Modeling.
    Abstract:
    We use feature decay algorithms (FDA) for fast deployment of accurate statistical machine translation systems taking only about half a day for each translation direction. We develop parallel FDA for solving computational scalability problems caused by the abundance of training data for SMT models and language models and still achieve SMT performance that is on par with using all of the training data or better. Parallel FDA runs separate FDA models on randomized subsets of the training data and combines the instance selections later. Parallel FDA can also be used for selecting the LM corpus based on the training set selected by parallel FDA. The high quality of the selected training data allows us to obtain very accurate translation outputs close to the top performing SMT systems. The relevancy of the selected LM corpus can reach up to 86% reduction in the number of OOV tokens and up to 74% reduction in the perplexity. We perform SMT experiments in all language pairs in the WMT13 translation task and obtain SMT performance close to the top systems using significantly less resources for training and development.

    @InProceedings{Bicici:ParFDA:WMT2013,
    author = {Ergun Bi\c{c}ici},
    title = {Feature Decay Algorithms for Fast Deployment of Accurate Statistical Machine Translation Systems},
    booktitle = {{E}ighth {W}orkshop on {S}tatistical {M}achine {T}ranslation},
    month = {8},
    year = {2013},
    pages = {78--84},
    address = {Sofia, Bulgaria},
    keywords = {Machine Translation, Machine Learning, Language Modeling},
    pdf = {http://bicici.github.io/publications/2013/FDAforFDA.pdf},
    abstract = {We use feature decay algorithms (FDA) for fast deployment of accurate statistical machine translation systems taking only about half a day for each translation direction. We develop parallel FDA for solving computational scalability problems caused by the abundance of training data for SMT models and language models and still achieve SMT performance that is on par with using all of the training data or better. Parallel FDA runs separate FDA models on randomized subsets of the training data and combines the instance selections later. Parallel FDA can also be used for selecting the LM corpus based on the training set selected by parallel FDA. The high quality of the selected training data allows us to obtain very accurate translation outputs close to the top performing SMT systems. The relevancy of the selected LM corpus can reach up to $86\%$ reduction in the number of OOV tokens and up to $74\%$ reduction in the perplexity. We perform SMT experiments in all language pairs in the WMT13 translation task and obtain SMT performance close to the top systems using significantly less resources for training and development.},
    
    }
    


  2. Ergun Biçici. Referential Translation Machines for Quality Estimation. In Eighth Workshop on Statistical Machine Translation, Sofia, Bulgaria, pages 343-351, August 2013. [WWW] [PDF] Keyword(s): Machine Translation, Machine Learning, Performance Prediction, Natural Language Processing.
    Abstract:
    We introduce referential translation machines (RTM) for quality estimation of translation outputs. RTMs are a computational model for identifying the translation acts between any two data sets with respect to a reference corpus selected in the same domain, which can be used for estimating the quality of translation outputs, judging the semantic similarity between text, and evaluating the quality of student answers. RTMs achieve top performance in automatic, accurate, and language independent prediction of sentence-level and word-level statistical machine translation (SMT) quality. RTMs remove the need to access any SMT system specific information or prior knowledge of the training data or models used when generating the translations. We develop novel techniques for solving all subtasks in the WMT13 quality estimation (QE) task (QET2013) based on individual RTM models. Our results achieve improvements over last year's QE task results (QET 2012), as well as our previous results, provide new features and techniques for QE, and rank 1st or 2nd in all of the subtasks.

    @InProceedings{Bicici:RTM:WMT2013,
    author = {Ergun Bi\c{c}ici},
    title = {Referential Translation Machines for Quality Estimation},
    booktitle = {{E}ighth {W}orkshop on {S}tatistical {M}achine {T}ranslation},
    month = {8},
    year = {2013},
    address = {Sofia, Bulgaria},
    pages = {343--351},
    url = {https://aclanthology.info/papers/W13-2242/w13-2242},
    keywords = {Machine Translation, Machine Learning, Performance Prediction, Natural Language Processing},
    pdf = {http://bicici.github.io/publications/2013/RTMforQE.pdf},
    abstract = {We introduce referential translation machines (RTM) for quality estimation of translation outputs. RTMs are a computational model for identifying the translation acts between any two data sets with respect to a reference corpus selected in the same domain, which can be used for estimating the quality of translation outputs, judging the semantic similarity between text, and evaluating the quality of student answers. RTMs achieve top performance in automatic, accurate, and language independent prediction of sentence-level and word-level statistical machine translation (SMT) quality. RTMs remove the need to access any SMT system specific information or prior knowledge of the training data or models used when generating the translations. We develop novel techniques for solving all subtasks in the WMT13 quality estimation (QE) task (QET2013) based on individual RTM models. Our results achieve improvements over last year's QE task results (QET 2012), as well as our previous results, provide new features and techniques for QE, and rank $1$st or $2$nd in all of the subtasks.},
    
    }
    


  3. Ergun Biçici and Josef van Genabith. CNGL-CORE: Referential Translation Machines for Measuring Semantic Similarity. In *SEM 2013: The Second Joint Conf. on Lexical and Computational Semantics, Atlanta, GA, USA, pages 234-240, June 2013. [WWW] [PDF] Keyword(s): Machine Translation, Machine Learning, Performance Prediction, Natural Language Processing, Artificial Intelligence.
    Abstract:
    We invent referential translation machines (RTMs), a computational model for identifying the translation acts between any two data sets with respect to a reference corpus selected in the same domain, which can be used for judging the semantic similarity between text. RTMs make quality and semantic similarity judgments possible by using retrieved relevant training data as interpretants for reaching shared semantics. An MTPP (machine translation performance predictor) model derives features measuring the closeness of the test sentences to the training data, the difficulty of translating them, and the presence of acts of translation involved. We view semantic similarity as paraphrasing between any two given texts. Each view is modeled by an RTM model, giving us a new perspective on the binary relationship between the two. Our prediction model is the 15th on some tasks and 30th overall out of 89 submissions in total according to the official results of the Semantic Textual Similarity (STS 2013) challenge.

    @InProceedings{Bicici:RTM_STS:SEMEVAL2013,
    author = {Ergun Bi\c{c}ici and Josef van Genabith},
    title = {{CNGL-CORE}: Referential Translation Machines for Measuring Semantic Similarity},
    booktitle = {{*SEM 2013}: The Second Joint Conf. on Lexical and Computational Semantics},
    month = {6},
    year = {2013},
    address = {Atlanta, GA, USA},
    pages = {234--240},
    url = {https://aclanthology.info/papers/S13-1034/s13-1034},
    keywords = {Machine Translation, Machine Learning, Performance Prediction, Natural Language Processing, Artificial Intelligence},
    pdf = {http://bicici.github.io/publications/2013/STS_RTM.pdf},
    abstract = {We invent referential translation machines (RTMs), a computational model for identifying the translation acts between any two data sets with respect to a reference corpus selected in the same domain, which can be used for judging the semantic similarity between text. RTMs make quality and semantic similarity judgments possible by using retrieved relevant training data as interpretants for reaching shared semantics. An MTPP (machine translation performance predictor) model derives features measuring the closeness of the test sentences to the training data, the difficulty of translating them, and the presence of acts of translation involved. We view semantic similarity as paraphrasing between any two given texts. Each view is modeled by an RTM model, giving us a new perspective on the binary relationship between the two. Our prediction model is the 15th on some tasks and 30th overall out of 89 submissions in total according to the official results of the Semantic Textual Similarity (STS 2013) challenge.},
    
    }
    


  4. Ergun Biçici and Josef van Genabith. CNGL: Grading Student Answers by Acts of Translation. In SemEval-2013: Semantic Evaluation Exercises - International Workshop on Semantic Evaluation, Atlanta, GA, USA, pages 585-591, June 2013. [WWW] [PDF] Keyword(s): Machine Translation, Machine Learning, Performance Prediction, Natural Language Processing.
    Abstract:
    We invent referential translation machines (RTMs), a computational model for identifying the translation acts between any two data sets with respect to a reference corpus selected in the same domain, which can be used for automatically grading student answers. RTMs make quality and semantic similarity judgments possible by using retrieved relevant training data as interpretants for reaching shared semantics. An MTPP (machine translation performance predictor) model derives features measuring the closeness of the test sentences to the training data, the difficulty of translating them, and the presence of acts of translation involved. We view question answering as translation from the question to the answer, from the question to the reference answer, from the answer to the reference answer, or from the question and the answer to the reference answer. Each view is modeled by an RTM model, giving us a new perspective on the ternary relationship between the question, the answer, and the reference answer. We show that all RTM models contribute and a prediction model based on all four perspectives performs the best. Our prediction model is the 2nd best system on some tasks according to the official results of the Student Response Analysis (SRA 2013) challenge.

    @InProceedings{Bicici:RTM_SRA:SEMEVAL2013,
    author = {Ergun Bi\c{c}ici and Josef van Genabith},
    title = {{CNGL}: Grading Student Answers by Acts of Translation},
    booktitle = {{SemEval-2013}: Semantic Evaluation Exercises - International Workshop on Semantic Evaluation},
    month = {6},
    year = {2013},
    address = {Atlanta, GA, USA},
    pages = {585--591},
    url = {https://aclanthology.info/papers/S13-2098/s13-2098},
    keywords = {Machine Translation, Machine Learning, Performance Prediction, Natural Language Processing},
    pdf = {http://bicici.github.io/publications/2013/SRA_AOT.pdf},
    abstract = {We invent referential translation machines (RTMs), a computational model for identifying the translation acts between any two data sets with respect to a reference corpus selected in the same domain, which can be used for automatically grading student answers. RTMs make quality and semantic similarity judgments possible by using retrieved relevant training data as interpretants for reaching shared semantics. An MTPP (machine translation performance predictor) model derives features measuring the closeness of the test sentences to the training data, the difficulty of translating them, and the presence of acts of translation involved. We view question answering as translation from the question to the answer, from the question to the reference answer, from the answer to the reference answer, or from the question and the answer to the reference answer. Each view is modeled by an RTM model, giving us a new perspective on the ternary relationship between the question, the answer, and the reference answer. We show that all RTM models contribute and a prediction model based on all four perspectives performs the best. Our prediction model is the 2nd best system on some tasks according to the official results of the Student Response Analysis (SRA 2013) challenge.},
    
    }
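
The parallel FDA system in the first conference paper above builds on the feature decay algorithm: training sentences are picked greedily for the test-set n-gram coverage they add, and an n-gram's weight decays each time a selected sentence covers it, so later picks favour still-uncovered n-grams. The sketch below is a simplified illustration of that greedy loop under stated assumptions (unigram/bigram features, uniform initial weights, a 0.5 decay factor, toy data); it is not the paper's exact formulation.

    # Simplified FDA-style greedy instance selection sketch. Assumptions that
    # are not from the paper: 1/2-gram features, uniform initial weights,
    # multiplicative decay of 0.5, length-normalized sentence scores.

    def ngrams(tokens, n_max=2):
        """Set of all 1..n_max-grams of a token list."""
        return {tuple(tokens[i:i + n])
                for n in range(1, n_max + 1)
                for i in range(len(tokens) - n + 1)}

    def fda_select(train_sents, test_sents, k, decay=0.5):
        """Greedily pick k training sentences covering the test set's n-grams."""
        test_feats = set().union(*(ngrams(s.split()) for s in test_sents))
        weights = {f: 1.0 for f in test_feats}      # current feature weights

        def score(sent):
            toks = sent.split()
            covered = ngrams(toks) & weights.keys()
            return sum(weights[f] for f in covered) / max(len(toks), 1)

        pool, selected = list(train_sents), []
        for _ in range(min(k, len(pool))):
            best = max(pool, key=score)             # highest remaining coverage
            pool.remove(best)
            selected.append(best)
            for f in ngrams(best.split()) & weights.keys():
                weights[f] *= decay                 # decay covered features
        return selected

    train = ["the cat sat on the mat", "dogs bark loudly", "the mat is red"]
    test = ["the red mat"]
    print(fda_select(train, test, k=2))

Parallel FDA, as described in the abstract, would run this kind of selection independently on randomized subsets of the training data and merge the selected instances; the selected training set can then also guide the choice of the language model corpus.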
    


Internal reports
  1. Ergun Biçici et al. Definition of Interfaces. Technical report, Dublin City Univ., 2013. Note: EU project QTLaunchPad D3.2.1: www.qt21.eu/launchpad/content/delivered. Keyword(s): Machine Translation.
    Abstract:
    The aim of this report is to define the interfaces for the tools used in the MT development and evaluation scenarios as included in the QTLaunchPad (QTLP) infrastructure. Specification of the interfaces is important for the interaction and interoperability of the tools in the developed QTLP infrastructure. In addressing this aim, the report provides: 1. Descriptions of the common aspects of the tools and their standardized data formats; 2. Descriptions of the interfaces for the tools for interoperability, where the tools are categorized into preparation, development, and evaluation categories including the human interfaces for quality assessment with multidimensional quality metrics. Interface specifications allow a modular tool infrastructure, flexibly selecting among alternative implementations, enabling realistic expectations to be made at different sections of the QTLP information flow pipeline, and supporting the QTLP infrastructure. D3.2.1 allows the emergence of the QTLP infrastructure and helps the identification and acquisition of existing tools (D4.4.1), the integration of identified language processing tools (D3.3.1), their implementation (D3.4.1), and their testing (D3.5.1). QTLP infrastructure will facilitate the organization and running of the quality translation shared task (D5.2.1). We also provide human interfaces for translation quality assessment with the multidimensional quality metrics (D1.1.1). D3.2.1 is a living document until M12, which is when the identification and acquisition of existing tools (D4.4.1) and the implementation of identified language processing tools (D3.4.1) are due.

    @techreport{QTLPD3.2.1,
    author = {{Ergun Bi\c{c}ici et al.}},
    title = {Definition of Interfaces},
    institution = {Dublin City Univ.},
    year = {2013},
    note = {EU project QTLaunchPad D3.2.1: www.qt21.eu/launchpad/content/delivered},
    keywords = {Machine Translation},
    abstract = {The aim of this report is to define the interfaces for the tools used in the MT development and evaluation scenarios as included in the QTLaunchPad (QTLP) infrastructure. Specification of the interfaces is important for the interaction and interoperability of the tools in the developed QTLP infrastructure. In addressing this aim, the report provides: 1. Descriptions of the common aspects of the tools and their standardized data formats; 2. Descriptions of the interfaces for the tools for interoperability. where the tools are categorized into preparation, development, and evaluation categories including the human interfaces for quality assessment with multidimensional quality metrics. Interface specifications allow a modular tool infrastructure, flexibly selecting among alternative implementations, enabling realistic expectations to be made at different sections of the QTLP information flow pipeline, and supporting the QTLP infrastructure. D3.2.1 allows the emergence of the QTLP infrastructure and helps the identification and acquisition of existing tools (D4.4.1), the integration of identified language processing tools (D3.3.1), their implementation (D3.4.1), and their testing (D3.5.1). QTLP infrastructure will facilitate the organization and running of the quality translation shared task (D5.2.1). We also provide human interfaces for translation quality assessment with the multidimensional quality metrics (D1.1.1). D3.2.1 is a living document until M12, which is when the identification and acquisition of existing tools (D4.4.1) and the implementation of identified language processing tools (D3.4.1) are due.},
    
    }
    


  2. Ergun Biçici et al. Definition of Machine Translation and Evaluation Workflows. Technical report, Dublin City Univ., 2013. Note: EU project QTLaunchPad D3.1.1: www.qt21.eu/launchpad/content/delivered. Keyword(s): Machine Translation.
    @techreport{QTLPD3.1.1,
    author = {{Ergun Bi\c{c}ici et al.}},
    title = {Definition of Machine Translation and Evaluation Workflows},
    institution = {Dublin City Univ.},
    year = {2013},
    note = {EU project QTLaunchPad D3.1.1: www.qt21.eu/launchpad/content/delivered},
    keywords = {Machine Translation},
    
    }
    


  3. Ergun Biçici et al. Quality Estimation for Dissemination. Technical report, Dublin City Univ., 2013. Note: EU project QTLaunchPad D2.1.3: www.qt21.eu/launchpad/content/delivered. Keyword(s): Machine Translation, Performance Prediction.
    Abstract:
    We present benchmarking experiments for the intrinsic and extrinsic evaluation of an extended version of our open source framework for machine translation quality estimation QUEST, which is described in D2.1.2. We focus on the application of quality predictions for dissemination by estimating post-editing effort. As an extrinsic task, we use quality predictions to rank alternative translations from multiple MT systems according to their estimated quality. Additionally, we experiment with a small dataset annotated for quality labels with different levels of granularity in an attempt to predict multidimensional quality metric (MQM) scores.

    @techreport{QTLPD2.1.3,
    author = {{Ergun Bi\c{c}ici et al.}},
    title = {Quality Estimation for Dissemination},
    institution = {Dublin City Univ.},
    year = {2013},
    note = {EU project QTLaunchPad D2.1.3: www.qt21.eu/launchpad/content/delivered},
    keywords = {Machine Translation, Performance Prediction},
    abstract = {We present benchmarking experiments for the intrinsic and extrinsic evaluation of an extended version of our open source framework for machine translation quality estimation QUEST, which is described in D2.1.2. We focus on the application of quality predictions for dissemination by estimating post-editing effort. As an extrinsic task, we use quality predictions to rank alternative translations from multiple MT systems according to their estimated quality. Additionally, we experiment with a small dataset annotated for quality labels with different levels of granularity in an attempt to predict multidimensional quality metric (MQM) scores.},
    
    }
    


Patents, standards
  1. Ergun Biçici. Referential Translation Machines. Invention Disclosure, 2013. Note: Invention Disclosure, DCU Invent Innovation and Enterprise: www.dcu.ie/invent/.
    @patent{BiciciIDF_RTM2013,
    title = {Referential Translation Machines},
    author = {Ergun Bi\c{c}ici},
    year = {2013},
    number = {Invention Disclosure},
    note = {Invention Disclosure, DCU Invent Innovation and Enterprise: www.dcu.ie/invent/},
    
    }
    







Disclaimer:

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.





Last modified: Sun Feb 5 17:37:19 2023
Author: ebicici.


This document was translated from BibTeX by bibtex2html