Government Technology

Web Data Mining: Automatically Distinguishing Personal and Commercial Opinions



May 21, 2009

Information engineers in India and Japan have been developing a way for software to distinguish personal web pages from commercial pages that, they say, are designed to fool consumers. After all, if you are reading reviews of a product, how do you know they are genuine?

Writing in a forthcoming issue of the International Journal of Business Intelligence and Data Mining, Takahiro Hayashi of Niigata University, and colleagues, explain that their approach extracts subjective expressions from web pages. The system then scores them by degree of subjectivity and provides the reader with an indication of whether the website content expresses personal opinions or marketing speak about a product or service.

Part of the problem they are trying to solve, they say, is that personal homepages, personal blogs, web forums and smaller customer-opinion sites, which are regarded as personal pages, generally don't appear high in search engine results pages (SERPs). As a result, they say, finding genuine personal opinions is much harder than finding commercially biased sites.

The system they have developed relies on the fact that marketing copywriters and advertisers tend not to report negative comments about a product or service. In contrast, the personal opinions of users of the product or service will be littered with both positive and negative comments depending on their standpoint.

These expressions are extracted from a web page and fed into the researchers' algorithm, which computes a weighted, categorized ratio of negative to positive expressions. This ratio is the basic indicator the system uses to decide automatically whether a page is commercial or personal.
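The article does not publish the researchers' algorithm, but the general idea can be sketched in Python under loudly stated assumptions. The tiny polarity lexicon, its weights, the `negative_share` measure (negative weight divided by total weight, used here instead of a raw negative-to-positive ratio so that pages with no positive words don't divide by zero) and the 0.2 threshold are all hypothetical; the study's expression categories are not modelled at all.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: expression -> (polarity, subjectivity weight).
# The researchers' real lexicon, weights and expression categories are not
# described in the article; these entries are illustrative only.
LEXICON = {
    "excellent": ("positive", 1.0),
    "great": ("positive", 0.8),
    "recommend": ("positive", 0.6),
    "disappointing": ("negative", 1.0),
    "broke": ("negative", 0.8),
    "overpriced": ("negative", 0.7),
}


@dataclass
class PolarityScores:
    positive: float = 0.0
    negative: float = 0.0

    @property
    def total(self) -> float:
        return self.positive + self.negative

    @property
    def negative_share(self) -> float:
        """Weighted share of negative expressions, in [0, 1]."""
        return self.negative / self.total if self.total else 0.0


def extract_scores(text: str) -> PolarityScores:
    """Sum the subjectivity weights of lexicon words found in the text."""
    scores = PolarityScores()
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'()")  # drop surrounding punctuation
        if word in LEXICON:
            polarity, weight = LEXICON[word]
            if polarity == "positive":
                scores.positive += weight
            else:
                scores.negative += weight
    return scores


def classify(text: str, threshold: float = 0.2) -> str:
    """Label a page 'personal' if enough of its opinion weight is negative.

    The 0.2 threshold is a hypothetical value, not one from the study.
    """
    scores = extract_scores(text)
    if scores.total == 0:
        return "undetermined"  # no subjective expressions found
    return "personal" if scores.negative_share >= threshold else "commercial"


ad = "An excellent camera! Great value. We recommend it to everyone."
review = "Great screen, but the battery is disappointing and the hinge broke."
print(classify(ad))      # commercial
print(classify(review))  # personal
```

The all-praise advertising copy scores zero negative weight and is labelled commercial, while the mixed review, where negative words carry most of the weight, is labelled personal. A real system would need far richer extraction than single-word lookup, for example to handle negation ("not great").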

The team evaluated their system on 1,200 web pages collected from four categories: products, tourist spots, restaurants and movies. In all four categories, they found their method much more effective at finding personal opinion pages than a general search engine such as Google, which typically ranks personal web pages lower.
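The article does not say how effectiveness was measured. One common way to compare two rankings is precision at k: the fraction of the top k results that are genuinely personal pages. The sketch below assumes that metric and uses made-up page IDs, labels and personal-opinion scores; none of it is data from the study.

```python
from __future__ import annotations

from typing import Sequence


def precision_at_k(ranking: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked page IDs that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranking[:k]
    if not top:
        return 0.0
    return sum(page in relevant for page in top) / len(top)


def rerank(pages: Sequence[str], personal_score: dict[str, float]) -> list[str]:
    """Order pages by descending personal-opinion score; ties keep input order."""
    return sorted(pages, key=lambda p: personal_score[p], reverse=True)


# Toy, made-up data: a search engine's order and hand-labelled personal pages.
engine_order = ["shop1", "shop2", "blog1", "shop3", "forum1", "blog2"]
personal_pages = {"blog1", "forum1", "blog2"}
scores = {"shop1": 0.05, "shop2": 0.1, "blog1": 0.6,
          "shop3": 0.0, "forum1": 0.5, "blog2": 0.4}

print(precision_at_k(engine_order, personal_pages, 3))                  # 0.3333333333333333
print(precision_at_k(rerank(engine_order, scores), personal_pages, 3))  # 1.0
```

In this toy example the engine's top three contain one personal page, while re-ranking by the hypothetical personal-opinion score puts all three personal pages first.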

