Statistical Language Model for Chinese Text Proofreading
Abstract
Statistical language modeling techniques are investigated in order to construct a language model for Chinese text proofreading. After the defects of the n-gram model are analyzed, a novel statistical language model for Chinese text proofreading is proposed. This model takes full account of the information located both before and after the target word wi, and of the relationship between non-adjacent words wi and wj in the linguistic environment (LE). First, the word association degree between wi and wj, where wj is l words apart from wi in the LE, is defined using a distance-weighted factor. Next, the Bayes formula is used to calculate the LE related degree of word wi. Finally, the LE related degree is taken as the criterion for judging how reasonable it is for word wi to appear in its context. When the proposed model is compared with the traditional n-gram model in an automatic error detection system for Chinese text, the experimental results show that both the recall rate and the precision rate of error detection are improved.
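The three steps described above can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration only: the distance weight is taken as 1/l, the association degree as a distance-weighted co-occurrence ratio, the LE related degree as a weighted average over a symmetric window, and the names `train`, `association`, `le_related_degree`, `flag_suspicious`, the window size, and the threshold are all hypothetical. The input is assumed to be text that has already been segmented into words.

```python
from collections import Counter


def train(sentences, window=3):
    """Count words and ordered co-occurring pairs within `window` positions on either side."""
    unigram, pair = Counter(), Counter()
    for sent in sentences:
        unigram.update(sent)
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair[(w, sent[j])] += 1
    return unigram, pair


def association(wi, wj, l, unigram, pair):
    """Assumed association degree: weight 1/l times the co-occurrence ratio of wi given wj."""
    if unigram[wj] == 0:
        return 0.0  # unseen context word contributes no evidence
    return (1.0 / l) * pair[(wi, wj)] / unigram[wj]


def le_related_degree(sent, i, unigram, pair, window=3):
    """Assumed LE related degree: weighted average of associations over both sides of word i."""
    total, norm = 0.0, 0.0
    for j in range(max(0, i - window), min(len(sent), i + window + 1)):
        if j == i:
            continue
        l = abs(i - j)  # distance in words between wi and wj
        total += association(sent[i], sent[j], l, unigram, pair)
        norm += 1.0 / l
    return total / norm if norm else 0.0


def flag_suspicious(sent, unigram, pair, threshold=0.05, window=3):
    """Return positions whose LE related degree falls below the (hypothetical) threshold."""
    return [
        i for i in range(len(sent))
        if le_related_degree(sent, i, unigram, pair, window) < threshold
    ]


# Toy segmented corpus, invented for this example only.
corpus = [
    ["我", "喜欢", "吃", "苹果"],
    ["他", "喜欢", "吃", "香蕉"],
    ["我", "喜欢", "看", "电影"],
]
unigram, pair = train(corpus)
print(flag_suspicious(["我", "喜欢", "吃", "桌子"], unigram, pair))
```

Unlike an n-gram model, which conditions only on the preceding words, this sketch scores each word against neighbors on both sides and lets non-adjacent words contribute with a weight that decays with distance. A real system would need smoothing and a threshold tuned on held-out data.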