Statistical Language Model for Chinese Text Proofreading

ZHANG Yang-sen; CAO Yuan-da

ZHANG Yang-sen, CAO Yuan-da. Statistical Language Model for Chinese Text Proofreading[J]. Journal of Beijing Institute of Technology, 2003, 12(4): 441-445.

Abstract

Statistical language modeling techniques are investigated in order to construct a language model for Chinese text proofreading. After the defects of the n-gram model are analyzed, a novel statistical language model for Chinese text proofreading is proposed. This model takes full account of the information located both before and after the target word w_i, and of the relationship between non-adjacent words w_i and w_j in the linguistic environment (LE). First, the word association degree between w_i and w_j, where w_j is l words apart from w_i in the LE, is defined using a distance-weighted factor. Then the Bayes formula is used to calculate the LE related degree of word w_i. Lastly, the LE related degree is taken as the criterion for predicting the reasonability of word w_i as it appears in context. In experiments comparing the proposed model with the traditional n-gram model in an automatic error detection system for Chinese text, the results show that the recall and precision of the system's error detection are improved.
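
As a rough illustration of the three steps named in the abstract (a distance-weighted association degree, an LE related degree for the target word, and a decision based on that degree), the following Python sketch may help. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the 1/l distance weight, the window size, the count ratio used as a conditional estimate, the weighted-average aggregation, the threshold, and all function names. It is not the authors' implementation, and it assumes the text has already been segmented into words.

```python
from collections import defaultdict


def train(sentences, window=3):
    """Collect distance-weighted co-occurrence statistics.

    Assumption: a pair (w_i, w_j) that is l words apart contributes 1/l.
    """
    pair_weight = defaultdict(float)  # (w_i, w_j) -> sum of 1/l
    word_count = defaultdict(int)     # w -> number of occurrences
    for words in sentences:
        for i, w_i in enumerate(words):
            word_count[w_i] += 1
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_weight[(w_i, words[j])] += 1.0 / abs(i - j)
    return pair_weight, word_count


def association(w_i, w_j, pair_weight, word_count):
    """Hypothetical association degree: weighted count of w_i near w_j,
    divided by the count of w_j (a rough conditional estimate)."""
    if word_count.get(w_j, 0) == 0:
        return 0.0
    return pair_weight.get((w_i, w_j), 0.0) / word_count[w_j]


def le_related_degree(words, i, pair_weight, word_count, window=3):
    """Hypothetical LE related degree of words[i]: a distance-weighted
    average of its association with the words before and after it."""
    lo = max(0, i - window)
    hi = min(len(words), i + window + 1)
    total, norm = 0.0, 0.0
    for j in range(lo, hi):
        if j == i:
            continue
        weight = 1.0 / abs(i - j)
        total += weight * association(words[i], words[j], pair_weight, word_count)
        norm += weight
    return total / norm if norm else 0.0


def flag_suspects(words, pair_weight, word_count, threshold=0.1, window=3):
    """Return the positions whose LE related degree falls below the threshold."""
    return [
        i for i in range(len(words))
        if le_related_degree(words, i, pair_weight, word_count, window) < threshold
    ]
```

With a toy corpus such as `[["a", "b", "c"], ["a", "b", "c"], ["x", "y", "z"]]`, calling `train` and then `flag_suspects(["a", "b", "z"], ...)` marks only the last position, because "z" never co-occurs with "a" or "b". Unlike a left-to-right n-gram, the score uses context on both sides of the target word and words that are not adjacent to it, which is the property the abstract emphasizes.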
