The Scamseek Project - Text Mining for Financial Scams on the Internet

  • Herke, Maria (PhD Student)
  • Patrick, Jon (Primary Chief Investigator)
  • Matthiessen, Christian (Primary Chief Investigator)

Project: Research

Description

The Scamseek project has a $2.2 million budget to build a surveillance tool for identifying financial scams on the Internet. It is funded by the Capital Markets CRC, the Australian Securities and Investments Commission and the participating universities. This is Australia’s largest research project in language technology. The project has two phases. Phase 1, called ScamAlert, aims to perform document classification of Internet pages. Two principal types of document are of concern: those in which unregistered advisors give financial advice, and those promoting illegal investment schemes. The system has two major features. First, linguists analyse documents from known scams to identify the features that make them distinctive. Second, machine-learning strategies are applied to the documents to derive further features that may be useful in classification and to extract named entities. The results of the linguistic and machine-learning investigations are combined to create a unified document classifier. The classifier is fed by a web spider that searches the Internet for potential scam sites 24 hours a day, 7 days a week.
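The idea of combining linguist-identified features with automatically derived ones in a single classifier can be illustrated with a small sketch. This is not the Scamseek implementation: the description does not name the learning algorithm or the features used, so the cue phrases, the training sentences, the class labels and the choice of a naive Bayes model below are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical cue phrases standing in for linguist-identified features.
LINGUISTIC_CUES = ["guaranteed return", "risk free", "act now"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text):
    """Merge bag-of-words counts with hand-crafted cue-phrase features."""
    lowered = text.lower()
    features = Counter("word=" + token for token in tokenize(text))
    for cue in LINGUISTIC_CUES:
        if cue in lowered:
            features["cue=" + cue] += 1
    return features


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed model)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = {}
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            feats = extract_features(text)
            self.label_counts[label] += 1
            self.feature_counts.setdefault(label, Counter()).update(feats)
            self.vocab.update(feats)

    def predict(self, text):
        feats = extract_features(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(n_docs / total_docs)
            for feature, n in feats.items():
                score += n * math.log((counts[feature] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data; real input would come from crawled pages.
TRAINING = [
    ("Guaranteed return of 40% a month, risk free, act now", "scam"),
    ("Secret offshore scheme with guaranteed return, wire money today", "scam"),
    ("The bank published its annual report and audited accounts", "legitimate"),
    ("Licensed adviser explains fees, risks and the product disclosure statement", "legitimate"),
]

classifier = NaiveBayes()
classifier.train(TRAINING)
```

Calling `classifier.predict(page_text)` on the text of a crawled page returns one of the two toy labels. Both kinds of feature share one feature space, so the model weighs a linguist's cue phrase and an ordinary word count in the same way.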

Phase 2 aimed to widen the scope of the materials investigated and to improve the performance of the classifiers.
Short title: ScamSeek
Status: Finished
Effective start/end date: 2/09/02 to 30/06/04