A personalized recommender agent for the world wide web--a semantic perspective

Date

2010-12

Publisher

[Honolulu] : [University of Hawaii at Manoa], [December 2010]

Abstract

Web personalization aims to deliver useful information from the Internet while avoiding information overload. Most web personalization studies focus on a single website. Recommending pages across websites is challenging for several reasons: the dynamic web and diverse user interests, conflicts of interest, the need for shared communication protocols, and model reusability. This work explores the potential of augmenting Wikipedia's categories with page keywords, within an agent system, for semantic user modeling that recommends pages across websites. The recommender agent models individual web users' topical interests using content-based usage analysis on the client side. The system also promotes serendipity (novel and interesting information) as a major factor in its recommendations by considering the coverage of a user's interests via the Diversity Index, which uses the categorical topology.
This dissertation's main research question is "Does our recommender based on Wikipedia's content provide topically relevant recommendations, promoting serendipity, of pages from different websites in a selected domain?" Three sub-questions were investigated sequentially in the computer science domain:
1. Can our content model correctly identify the topics of a web page?
2. Does our content-based user model semantically capture a user's interests?
3. Does Diversity Index measure the coverage of a user's interests in the computer science domain?
The investigation examined the system's core components separately and tuned each component before investigating the overall performance of recommendations.
This study evaluated the system's recommendation performance regarding topicality and serendipity with 25 computer science professionals as participants. Results indicate that our system's performance is slightly better than the pure content-based vector space model (VSM) regarding topicality, and significantly better regarding serendipity. A further investigation reveals that our system is able to identify serendipitous recommendations that VSM may fail to recommend. The system's superior performance in serendipity is possibly due to the augmentation of Wikipedia's categories with keywords as well as the use of the categories' topology.
This work is significant for four reasons. First, it emphasizes the convergence between content modeling and user modeling by means of augmenting Wikipedia's content and usage mining. Second, using Wikipedia's semantics (vocabulary, categorical association, etc.) for user modeling with serendipity in mind is worthwhile, as serendipity is not addressed extensively in the literature. The model is deliberately constructed as a research platform based on heuristic information extraction on keywords, and it allows for more heuristics. Third, a user's topical interests are modeled using Wikipedia's categories, which yields a simple model that can interoperate across different websites. Because of its simplicity, the model resides on the client side, allowing more user control and reducing privacy concerns. Fourth, a methodology is supplied to researchers for further development of similar recommender agents.
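For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a pure content-based vector space model: pages and a user profile are turned into term-count vectors and pages are ranked by cosine similarity to the profile. It is a generic illustration, not the dissertation's implementation; the function names (`tf_vector`, `cosine`, `rank_pages`), the whitespace tokenizer, and the raw term-count weighting are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_vector(text):
    """Map lowercased whitespace tokens to raw term counts (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_pages(profile_text, pages):
    """Return page ids ordered by descending similarity to the profile.

    `pages` maps a hypothetical page id to that page's text.
    Ties are broken by page id so the ordering is deterministic.
    """
    profile = tf_vector(profile_text)
    scored = [(cosine(profile, tf_vector(text)), pid) for pid, text in pages.items()]
    return [pid for _, pid in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

A model of this kind scores a page only by the terms it shares with the profile, so a page on a related topic that uses different vocabulary scores zero. That limitation is one plausible reading of why the abstract reports that VSM may miss serendipitous recommendations.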

Description

Ph.D. University of Hawaii at Manoa 2010.
Includes bibliographical references.

Keywords

personalization, ontology, Wikipedia

Related To

Theses for the degree of Doctor of Philosophy (University of Hawaii at Manoa). Computer Science.

Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.