Publications of year 2009
Conference articles
We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction in Turkish. We present flexible n-gram models, FlexGrams, which assume that the n-1 tokens that determine the probability of a given token can be chosen anywhere in the sentence rather than the preceding n-1 positions. Our final model achieves 27% perplexity reduction compared to the standard n-gram model.
@InProceedings{yuret-bicici:2009:Short,
  author = {Deniz Yuret and Ergun Bi\c{c}ici},
  title = {Modeling Morphologically Rich Languages Using Split Words and Unstructured Dependencies},
  booktitle = {ACL-IJCNLP 2009 Conf. Short Papers},
  month = {8},
  year = {2009},
  address = {Suntec, Singapore},
  pages = {345--348},
  url = {http://www.aclweb.org/anthology/P/P09/P09-2087},
  keywords = {Language Modeling},
  pdf = {bicici.github.io/publications/2009/MMRLACL.pdf},
  ps = {bicici.github.io/publications/2009/MMRLACL.ps},
  abstract = {We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction in Turkish. We present flexible n-gram models, FlexGrams, which assume that the n-1 tokens that determine the probability of a given token can be chosen anywhere in the sentence rather than the preceding n-1 positions. Our final model achieves 27\% perplexity reduction compared to the standard n-gram model.},
}
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
The documents contained in these directories are made available by the contributing authors to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all other rights are retained by the authors and by the copyright holders, even though they present their work here in electronic form. Persons copying this information must adhere to the terms and constraints covered by each author's copyright. These works may not be made available elsewhere without the explicit permission of the copyright holder.
This document was translated from BibTeX by bibtex2html