
A Study on Removing Duplication Using N-gram Terms for Chinese Text

WANG Xiao-hua, LU Xiao-kang (Institute of Computer Application Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China)

Abstract: Removing duplication from Chinese text is one of the important directions of natural language processing. In this paper, an algorithm based on N-gram terms and feature mapping of a text is presented. In this algorithm, firstly, a sequence of N-gram terms is extracted from a text as the features of that text. Secondly, all extracted N-gram terms are mapped to values by a hash function. Finally, whether the text is a duplicate or not can be judged by searching the hash values. By making use of a hash functi…
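The three steps named in the abstract (N-gram extraction, hash mapping, lookup) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of MD5 as the hash function, and the Jaccard-overlap threshold for the final judgment are all assumptions made here for the example.

```python
# Illustrative sketch of N-gram + hash-based duplicate detection.
# The hash function (MD5) and the overlap threshold are assumptions,
# not necessarily those used in the paper.
import hashlib

def ngram_terms(text: str, n: int = 3) -> list[str]:
    """Step 1: slide an n-character window over the text. Character
    N-grams suit Chinese, which has no word delimiters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def fingerprint(text: str, n: int = 3) -> set[int]:
    """Step 2: map each N-gram term to a hash value; the set of
    values serves as the text's feature fingerprint."""
    return {
        int(hashlib.md5(term.encode("utf-8")).hexdigest(), 16) & 0xFFFFFFFF
        for term in ngram_terms(text, n)
    }

def is_duplicate(a: str, b: str, n: int = 3, threshold: float = 0.8) -> bool:
    """Step 3: judge duplication by comparing the two hash sets
    (here via their Jaccard overlap)."""
    fa, fb = fingerprint(a, n), fingerprint(b, n)
    if not fa or not fb:
        return False
    return len(fa & fb) / len(fa | fb) >= threshold
```

In practice the hash values would be stored in a hash table or index so that an incoming text can be checked against a whole collection by lookup rather than pairwise comparison.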
  • DOI:

    10.13954/j.cnki.hdu.2010.01.022

  • Series:

    (I) Electronic Technology & Information Science

  • Subject:

    Computer Software and Application of Computer

  • Classification Code:

    TP301.6


Downloads: 118 | Pages: 26-29 | Page count: 4 | Size: 206K
