
 
 


       
 

"Combining Probability Model and Web Mining Model:
A Framework for Proper Name Transliteration"

 
   
 

Speaker:

ZHOU Yilu
PhD Candidate
Department of Management Information Systems
University of Arizona

 

Date:

06 March 2006 (Monday)  
 

Time:

3:30 pm to 5:30 pm  
 

Venue:

Meeting Room 4.4, Level 4
School of Information Systems


  Abstract  

 

The World Wide Web has become the biggest knowledge repository, with Web pages in almost every popular language. However, language boundaries prevent information sharing and discovery across countries. Web search engines, such as Google, are adding multilingual capabilities to search for foreign documents. Proper names, such as organization names, company names, product names, and person names, play an important role in search queries. However, they are often foreign names that are translated phonetically, a process referred to as transliteration. Previous transliteration models can be categorized into three approaches: rule-based, machine learning, and statistical. In this research, we propose a generic framework for proper name transliteration which combines an enhanced Hidden Markov Model (HMM) statistical model and a Web mining model. We improved traditional statistical transliteration in three areas: 1) incorporating a phonetic transliteration knowledge base; 2) incorporating a bigram and a trigram HMM; 3) incorporating a Web mining model that uses word frequency-of-occurrence information from the Web. We evaluated the framework on two language pairs, English-Arabic and English-Chinese. For English/Arabic transliteration, we found that a combination of the bigram and trigram HMM methods performed the best. While the bigram model alone achieved fairly good performance, the trigram model alone did not. The Web mining approach boosted the performance by 46%. For English/Chinese transliteration, we found that a combination of the bigram and trigram HMM methods again performed the best. Here the trigram model outperformed the bigram model. The Web mining approach again improved the performance, this time by 12%. Overall, our framework achieved a precision of 86%-93% when the eight best transliterations were considered.
Our results are encouraging and show the promise of applying transliteration techniques to multilingual Web retrieval.
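
As a rough illustration of the idea described in the abstract, the sketch below combines scores from a bigram and a trigram model and then re-ranks candidates using Web frequency counts. It is not the speaker's implementation: the function names, the linear interpolation, the log-count weighting, and all numbers are assumptions made for this example, and the candidate strings are placeholders rather than real transliterations.

```python
import math

def combine_models(bigram_prob, trigram_prob, weight=0.5):
    """Linearly interpolate two model probabilities (an assumed combination rule)."""
    return weight * bigram_prob + (1.0 - weight) * trigram_prob

def rerank_with_web_counts(candidates, web_counts, top_n=8, alpha=1.0):
    """Rank candidate transliterations by model score plus a Web-frequency bonus.

    candidates: dict mapping candidate -> (bigram_prob, trigram_prob)
    web_counts: dict mapping candidate -> hypothetical occurrence count on the Web
    alpha:      assumed weight of the Web-frequency term
    """
    scored = []
    for name, (bigram_prob, trigram_prob) in candidates.items():
        model_score = math.log(combine_models(bigram_prob, trigram_prob))
        # log1p keeps unseen candidates (count 0) at a bonus of zero
        web_score = math.log1p(web_counts.get(name, 0))
        scored.append((model_score + alpha * web_score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]

# Toy data: probabilities and counts are invented for illustration only.
toy_candidates = {
    "cand_a": (0.50, 0.50),
    "cand_b": (0.40, 0.40),
    "cand_c": (0.10, 0.10),
}
toy_web_counts = {"cand_b": 1000, "cand_c": 3}

model_only = rerank_with_web_counts(toy_candidates, {})
with_web = rerank_with_web_counts(toy_candidates, toy_web_counts)
```

In this toy setup the model-only ranking places cand_a first, while the Web counts promote cand_b, which mirrors the abstract's claim that Web frequency information can improve a purely statistical ranking. The top_n default of 8 reflects the "eight best transliterations" evaluation setting mentioned above.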

 
  About the Speaker  
 

Yilu Zhou is a doctoral candidate in the Department of Management Information Systems at the University of Arizona, where she is also a research associate of the Artificial Intelligence Lab. Her research interests include multilingual knowledge discovery, Web mining, and human-computer interaction. She received a B.S. in Computer Science from Shanghai Jiaotong University.

 

We look forward to welcoming you at this Research Talk.

© Copyright 2005 by Singapore Management University, School of Information Systems. All Rights Reserved.