22/07-11
Press releases
New plagiarism detection system to clamp down on thieves and cheats
Experts at the University of Surrey have developed a new computer system to analyse and flag up highly similar content across sets of documents.
The new system is significantly quicker than known rivals and can process thousands of documents to detect cases of plagiarised content in a matter of minutes.
The approach, explored in part in a Surrey PhD thesis by Neil Cooke, is being kept under wraps while a patent application is progressed.
The speed of analysis makes the system suited to the kinds of very large scale plagiarism detection that would be needed, for example, to detect the leakage of Intellectual Property (IP) onto the internet or into other organisations.
IP theft has been characterised as a £9.2 billion problem in the UK alone that is “greatly assisted by an ‘insider’”, according to a recent report from Detica, a specialist security firm which is part of BAE Systems, and the Cabinet Office.
The detection of leaked Intellectual Property presents an interesting challenge because the searcher needs to be able to look for content without revealing the set of queries being used.
The experts believe this is now possible.
The software could also be used to detect plagiarism as might occur when students cheat in their assessments, or when authors reuse their own content or, worse, the content of others.
The software was recently tested in an internationally competitive plagiarism detection task and came fourth.
The plagiarism detection task, running for a third year as part of the 5th International Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN’11), involves identifying the precise extent of passages plagiarised from source documents and inserted into other documents, either as they are or with some attempt made to modify the plagiarised text.
This task, referred to as external detection, involves both source and suspicious documents being provided, with last year’s competition involving the search for some 68,558 plagiarism cases across 27,073 documents.
Dr Lee Gillam, Lecturer in the Department of Computing, said: “Our aim in undertaking the competition was to show that the novel approach being employed by Surrey could cope quickly with the volume of competition data, which itself provides a challenge to other competitors, and still attain very competitive detection performance.
“In the previous competition, three of the top four competitors reported using between 8 and 32 processor cores, with one still taking some 40 hours to process the data. There are some overheads in dealing with the competition data that we can reduce, but the core plagiarism detection analysis takes just 12 minutes using one processor core on one machine running in the Amazon Cloud. Using similar approaches, we could bring that time down further.”
The competing team hopes to attend the PAN’11 workshop in Amsterdam in September and discuss their approach with their competitors, at least in as much detail as they can without disclosing how it works.
Notes for Editors:
For results and more information about PAN’11, see: www.uni-weimar.de
PAN’11 also includes tasks for correctly identifying authors, detecting plagiarism based on changes in writing style, and identifying vandalised Wikipedia entries.
The first joint Government and industry report on UK-based cyber crime, from the Office of Cyber Security & Information Assurance in the Cabinet Office and information intelligence experts Detica, puts the overall cost of cyber crime to the UK economy at some £27bn per year. The £9.2bn due to Intellectual Property theft represents a substantial share of this: www.cabinetoffice.gov.uk
The team are grateful to Amazon Web Services (AWS) for providing a supporting grant for this research and for competition use of both EC2 and EBS services.
Company
University of Surrey
Guildford
Surrey, GU2 7XH,
England
+44 (0)1483 686141