| Research Areas
Data management and business intelligence
Overview
The digital information of an enterprise can include structured data (e.g. a relational database), semi-structured data (e.g. a web site), and unstructured data (e.g. audio streams and natural-language text). Moreover, the data may be captured or delivered over mobile, content delivery network, or peer-to-peer platforms. The Data Management and Business Intelligence lab investigates techniques for storing and retrieving these diverse data types in a secure and scalable manner. We also work on data mining and knowledge discovery mechanisms that turn data collections into knowledge that can be exploited for business advantage.
Faculty Members
MOURATIDIS Kyriakos, Assistant Professor
PANG Hwee Hwa, Associate Professor
SHEN Jialie, Assistant Professor
ZHENG Baihua, Assistant Professor
Research Staff and Students
CHEN Jinmiao, Post-Doctoral Fellow
GAO Yunjun, Post-Doctoral Fellow
LIN Yimin, PhD Student
PANG Chen, Research Engineer
ZHANG Jilian, PhD Student
Research Projects
1. Learning Causal Dependencies for Context-Aware Recommenders
2. Location-Based Information Services
3. Data Mining for Business Intelligence
4. Continuous Query Monitoring
5. Advanced Query Processing on Large Multimedia Collections
Learning Causal Dependencies for Context-Aware Recommenders
Description:
Acquiring context parameter readings poses unique challenges to context-aware recommenders. Resource limitations make minimizing context acquisition a practical need, and missing and erroneous context values are also a concern. This project aims to overcome those challenges by automatically learning the causal dependencies between context parameters; those dependencies are then exploited to minimize the set of context parameters that must be acquired for a particular user and recommendation task, and also to compensate for missing and erroneous inputs.
Selected Publications:
[1] Ghim-Eng Yap, Ah-Hwee Tan, HweeHwa Pang, “Discovering and Exploiting Causal Dependencies for Robust Mobile Context-Aware Recommenders”, IEEE Transactions on Knowledge and Data Engineering, Volume 19, Number 7, July 2007, 977-992.
[2] Ghim-Eng Yap, Ah-Hwee Tan, HweeHwa Pang, “Learning Causal Models for Noisy Biological Data Mining: An Application to Ovarian Cancer Detection”, Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI-07), Vancouver, British Columbia, July 2007.
[3] Ghim-Eng Yap, Ah-Hwee Tan, HweeHwa Pang, “Dynamically Optimized Context in Recommender Systems”, Proceedings of the 6th International Conference on Mobile Data Management (MDM'05), Ayia Napa, Cyprus, May 2005, 265-272.
[4] Ghim-Eng Yap, Ah-Hwee Tan, HweeHwa Pang, “Discovering Causal Dependencies in Mobile Context-Aware Recommenders”, Proceedings of the 7th International Conference on Mobile Data Management (MDM'06), Nara, Japan, May 2006.

Location-Based Information Services
Description:
Information is only useful when it is available at the right time and in the right place. Location-based information services have to consider where the user request is issued, as well as the limited wireless bandwidth and resources available on mobile devices. The project aims to develop efficient data dissemination approaches that deliver the right information to the right users quickly, without consuming too much bandwidth or requiring expensive computation on the client side.
Selected Publications:
[1] Baihua Zheng, Dik Lun Lee, “Information Dissemination via Wireless Broadcast”, Communications of the ACM, Vol. 48, No. 5, May 2005, 105-110.
[2] Baihua Zheng, Wang-chien Lee, Dik Lun Lee, “Spatial Queries in Wireless Broadcast Systems”, ACM Wireless Networks, 10(6), December 2004, 723-736.
[3] Wang-chien Lee, Baihua Zheng, “DSI: A Fully Distributed Spatial Index for Wireless Data Broadcast”, Proceedings of the 25th International Conference on Distributed Computing Systems (ICDCS'05), Columbus, OH, USA, June 2005, 349-358.

Data Mining for Business Intelligence
Description:
Knowing the patterns of relationships in a social network is very useful: businesses can exploit relationships to sell products, individuals can use them to network with others, and law enforcement agencies can use them to investigate collaborations among criminals. After all, it is not just what you know, but also whom you know, that matters. However, finding out who is related to whom on a large scale is a complex problem. Recent advances in technology have allowed more data about the activities of individuals to be collected. Such data may be mined to reveal associations between these individuals.
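As a rough illustration of mining activity data for associations, the sketch below counts how often two individuals appear at the same place in the same time slot. It is a simple co-occurrence heuristic under assumed inputs, not the method of the publications listed for this project; the function name `co_occurrence_counts`, the (person, place, time slot) record layout, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_occurrence_counts(events):
    """Count, for each pair of people, the (place, time slot) groups they share.

    `events` is an iterable of (person, place, time_slot) tuples.
    This record layout is an assumption made for illustration.
    """
    # Group people by the place and time slot in which they were observed.
    groups = defaultdict(set)
    for person, place, time_slot in events:
        groups[(place, time_slot)].add(person)

    # Every pair of people in the same group co-occurred once.
    counts = Counter()
    for people in groups.values():
        for pair in combinations(sorted(people), 2):
            counts[pair] += 1
    return counts


# Hypothetical activity records.
events = [
    ("alice", "lab", 1), ("bob", "lab", 1),
    ("alice", "cafe", 2), ("bob", "cafe", 2), ("carol", "cafe", 2),
    ("carol", "lab", 3),
]
counts = co_occurrence_counts(events)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with higher counts are candidate associations. A realistic system would also need to account for chance co-occurrence, for example by weighting meetings at crowded places less heavily.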
Selected Publications:
[1] Hady W. Lauw, Ee-Peng Lim, HweeHwa Pang, “TUBE (Text-cUBE) for Discovering Documentary Evidence of Associations among Entities”, Proceedings of the ACM Symposium on Applied Computing, Seoul, Korea, March 2007.
[2] Hady W. Lauw, Ee-Peng Lim, [...] from Spatio-Temporal Events”, Computational & Mathematical Organization Theory, 11(2), July 2005, 97-118.

Continuous Query Monitoring
Description:
Traditional database systems are designed to answer transient, snapshot queries over persistent data. However, the evolution of wireless communications, positioning devices (e.g. GPS) and sensor technologies has recently given rise to a new data processing model. In this model, multiple long-running queries require continuous evaluation as the data change dynamically. These are called continuous queries, and they arise in location-based services (e.g. "keep me updated about the 10 SMU students who are closest to my location as I walk along Orchard Road"), network traffic monitoring (e.g. "monitor the 100 users that cause the highest network overhead"), online decision support systems (e.g. "continuously report the 5 most interesting stocks according to my investment criteria"), and many other domains. The aim of this project is to study continuous queries and to design algorithms for their efficient processing.
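To make the processing model concrete, here is a minimal sketch of a continuous top-k query over a count-based sliding window. It is a naive baseline that recomputes the result on every arrival, not one of the algorithms from the publications below; the class name `SlidingWindowTopK`, its parameters, and the sample stream are assumptions made for illustration.

```python
from collections import deque
import heapq


class SlidingWindowTopK:
    """Naive continuous top-k monitor over the most recent arrivals."""

    def __init__(self, k, window_size):
        self.k = k
        # A bounded deque drops the oldest entry when a new one arrives.
        self.window = deque(maxlen=window_size)

    def insert(self, item, score):
        """Register a new arrival and return the refreshed top-k result."""
        self.window.append((score, item))
        return self.result()

    def result(self):
        """Recompute the k highest-scoring items currently in the window."""
        return [item for _, item in heapq.nlargest(self.k, self.window)]


# Hypothetical stream of (item, score) arrivals.
monitor = SlidingWindowTopK(k=2, window_size=3)
for item, score in [("a", 5), ("b", 9), ("c", 1), ("d", 7), ("e", 2)]:
    print(item, "->", monitor.insert(item, score))
```

Unlike a snapshot query, the result is refreshed whenever the data change: when "d" arrives, "a" expires from the window and the answer changes from b, a to b, d. Recomputing from scratch on each update is what efficient continuous-query algorithms try to avoid.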
Selected Publications:
[1] Kyriakos Mouratidis, Spiridon Bakiras, Dimitris Papadias, "Continuous Monitoring of Top-k Queries over Sliding Windows", ACM Conference on Management of Data (SIGMOD), 2006.
[2] Kyriakos Mouratidis, Marios Hadjieleftheriou, Dimitris Papadias, "Conceptual Partitioning: An Efficient Method for Continuous Nearest Neighbor Monitoring", ACM Conference on Management of Data (SIGMOD), 2005.
[3] Kyriakos Mouratidis, Dimitris Papadias, Spiridon Bakiras, Yufei Tao, "A Threshold-based Algorithm for Continuous Monitoring of k Nearest Neighbors", IEEE Transactions on Knowledge and Data Engineering (TKDE), 17(11), 1451-1464, 2005.

Advanced Query Processing on Large Multimedia Collections
Description:
The proliferation of multimedia data and associated applications over the past decade has enriched everyday life. Multimedia data is now used for a wide range of purposes, such as entertainment, education and psychology, and its unique characteristics pose substantial challenges for information retrieval, knowledge discovery and data management. Unlike standard alphanumeric data, multimedia data has a much more complex structure, which may involve spatial and/or temporal dependencies. From a representation point of view, multimedia data can also be very large and generally carries rich high-level semantic meaning. Consequently, to manage and access multimedia information efficiently in real-life applications, there is an urgent need for technological advances in data structures for efficient organization, intelligent content representation and system performance evaluation.
Selected Publications:
[1] Bin Cui, Ling Liu, Calton Pu, Jialie Shen and Kian-Lee Tan. "QueST: Querying Music Databases by Acoustic and Textual Features". Proceedings of the 15th ACM International Conference on Multimedia (ACM MM'07), Augsburg, Germany, 2007.
[2] Jialie Shen and John Shepherd. "Efficient Benchmarking of Content-based Image Retrieval via Resampling". Proceedings of the 14th ACM International Conference on Multimedia (ACM MM'06), Santa Barbara, USA, October 2006.
[3] Jialie Shen, John Shepherd, and Anne H.H. Ngu. "Towards Effective Content Based Music Retrieval with Multiple Acoustic Feature Combination". IEEE Transactions on Multimedia (IEEE TMM), vol. 8, issue 6, 2006.
[4] Jialie Shen, Bin Cui, John Shepherd, and Kian-Lee Tan. "Towards Efficient Automated Singer Identification in Large Music Databases". Proceedings of the 29th Annual International ACM SIGIR Conference on Research & Development on Information Retrieval (SIGIR'06), Seattle, USA, August 2006.

|