A Study on Removing Duplication Using N-gram Terms for Chinese Text
WANG Xiao-hua, LU Xiao-kang (Institute of Computer Application Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China)
Abstract: Removing duplicated content from Chinese text is an important direction in natural language processing. In this paper, an algorithm based on N-gram terms and feature mapping of a text is presented. First, a sequence of N-gram terms is extracted from a text as the features of that text. Second, all extracted N-gram terms are mapped to values by a hash function. Finally, whether the text is a duplicate is judged by searching the hash values. By making use of a hash functi...
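The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the N-gram length, the hash function, or the decision rule, so the character-level N-grams, the MD5-based `term_hash`, the pooled `seen` set, and the `threshold` of 0.8 are all assumptions chosen for illustration, as are the names `ngram_terms` and `DuplicateDetector`.

```python
import hashlib


def ngram_terms(text, n=3):
    """Step 1: extract overlapping character N-grams (whitespace ignored)."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def term_hash(term):
    """Step 2: map one N-gram term to a 64-bit integer.

    MD5 is an assumption; the abstract only says "a hash function".
    """
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class DuplicateDetector:
    """Step 3: judge duplication by looking up hash values already seen."""

    def __init__(self, n=3, threshold=0.8):
        self.n = n                  # N-gram length (assumed)
        self.threshold = threshold  # share of known hashes that counts as duplicate (assumed)
        self.seen = set()           # hash values of all accepted texts

    def check_and_add(self, text):
        """Return True if text looks duplicated; otherwise index it and return False."""
        hashes = {term_hash(t) for t in ngram_terms(text, self.n)}
        is_dup = bool(hashes) and (
            len(hashes & self.seen) / len(hashes) >= self.threshold
        )
        if not is_dup:
            self.seen |= hashes
        return is_dup


detector = DuplicateDetector(n=2)
for doc in ["去除中文文本中的重复内容", "去除中文文本中的重复内容", "完全不同的另一段话语"]:
    print(doc, detector.check_and_add(doc))
```

Pooling every hash into one set keeps the lookup cheap, but a text whose N-grams are scattered across many earlier texts could be flagged by mistake. A per-document index would avoid that at the cost of more memory.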
- DOI: 10.13954/j.cnki.hdu.2010.01.022
- Series: (I) Electronic Technology & Information Science
- Subject: Computer Software and Application of Computer
- Classification Code: TP301.6
- Pages: 26-29 (4 pages)