Efficient Search for Plagiarism on the Web

Malcolm, J. and Lane, P.C.R. (2008) Efficient Search for Plagiarism on the Web. pp. 206-211. ISSN 1997-7697

Understanding the characteristics of written English allows Internet search for the source of a document to be carried out efficiently. There is a Zipfian distribution of word frequencies in natural language, with some words common and many words rare. If we take a group of three words, the rarity of most of these triples is extreme. This can be exploited to detect web pages similar to a given target document: while a Google search for some triples from the target may return many hits, other triples will only be found in a few documents on the Internet. These documents may well be similar to the target, and are certainly worth examining more closely. Initial experiments show that this approach is very promising, and it is being implemented in a software tool called WebFerret.
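The idea in the abstract can be illustrated with a minimal sketch. It is not the WebFerret implementation, which this record does not describe. It assumes that a triple is three consecutive words, that a small in-memory `corpus` stands in for the web, and that counting the documents containing a triple stands in for the hit count of a search query. The names `triples`, `rank_candidates` and `max_hits` are hypothetical.

```python
import re
from collections import Counter


def triples(text):
    """Return the set of consecutive three-word sequences in text.

    Assumption: words are lowercased runs of letters, digits and apostrophes.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def rank_candidates(target, corpus, max_hits=2):
    """Rank corpus documents by how many rare target triples they contain.

    corpus   -- dict mapping a document name to its text (stand-in for the web)
    max_hits -- hypothetical threshold: a triple is "rare" if it occurs in
                at least one and at most this many corpus documents
    """
    doc_triples = {name: triples(text) for name, text in corpus.items()}

    # Document frequency of each triple, playing the role of a hit count.
    hits = Counter()
    for doc_set in doc_triples.values():
        hits.update(doc_set)

    # Keep only the target triples that are found, but in few documents.
    rare = {t for t in triples(target) if 0 < hits[t] <= max_hits}

    # Score each document by the number of rare triples it shares.
    scores = {name: len(rare & doc_set) for name, doc_set in doc_triples.items()}
    return sorted(
        ((name, score) for name, score in scores.items() if score > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )
```

Common triples shared by many documents contribute nothing to the score, so only documents that share rare triples with the target are returned for closer examination. Against a real search engine, the document-frequency step would be replaced by queries, and the choice of threshold would need to be tuned.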

Item Type	Article
Uncontrolled Keywords	plagiarism; search engines; ferret; natural language processing
Divisions	sbu_scs; ri_st
Date Deposited	18 Nov 2024 11:40
Last Modified	18 Nov 2024 11:40

File: kuwait-v09.pdf, available under Creative Commons 4.0