My research addresses the challenges inherent in detecting and extracting relevant information from the virtually unlimited resources available on the Web, most of which are irrelevant to the user's particular need or interest. The report from the most recent workshop on research directions for Digital Libraries, supported by the National Science Foundation, states that "humankind's ability to generate and collect data exceeds our ability to organize, manage and effectively use it." This trend is unlikely to abate without continued research and development. The challenge becomes even more critical as Web users, now better equipped with advanced Web technologies, especially Web 2.0 technologies, become not only information consumers but also information producers.
My research goal is to design and develop methodologies and systems that enable users to retrieve relevant information effectively, regardless of their personal experience or language proficiency. I approach my research from an interdisciplinary perspective. My education and training have been primarily in information technology and computer science; however, the problems and solutions that I have chosen to explore aim to support human users and to include those users as major factors in the solutions. I believe that there can be harmony in the interaction between humans and information systems. Machines bring speed, scale, and consistency, whereas humans bring intention, intelligence, and flexibility. This harmony is achieved not by making either users or information systems smarter, but by enhancing the synergy between human users and the systems. My work develops and builds on this synergy for more effective information retrieval. My position as an associate professor in the School of Information Sciences and as an affiliate associate professor in the Intelligent Systems Program, both at the University of Pittsburgh, provides me with a unique and exceptional opportunity to conduct my research and to reach my goal of improving information retrieval for all users.
In this project, we work with our CMU partner to develop new technologies for Adaptive Filtering (AF).
GALE Project: Adaptive, Robust, and Distilled Information Access
Funded by DARPA, 2005-2010
We propose to develop and demonstrate an integrated set of techniques for building an adaptive, robust and
distilled information access module for the Distillation Engine. Our
module will be able to handle electronic newswire, transcriptions of
broadcast news, telephone conversations, and talk shows, and web
documents from newsgroups and weblogs, whether they are
originally in English, Chinese, Arabic, or a surprise language.
DiLearn Project: A Digital Library for Learning Digital Library
Funded by University of Pittsburgh, Provost's Innovation in
Project Duration: 2005-2006
This project will
provide students with an interactive, integrated and active
learning environment. The goals of the proposed project are: 1) to
review and organize current research and practice activities in the
DL area with the aim of helping students to quickly understand and
master the core concepts, challenges, and approaches of the field;
2) to design and develop a digital library that stores the outcome
of our study in goal 1, and more importantly, to provide easy access
to the stored information; 3) to evaluate rigorously the usefulness
of the developed DL in the context of teaching DL courses; and 4) to
raise awareness of the advantages of teaching courses with the help
of a DL system in different departments and universities. The
proposed DL system can improve the quality of the student's learning
experience and increase the effectiveness of that experience.
Evidence Combination in Enterprise Searches Project
Funded by School of Information Sciences Dean's Research Award
Project Duration: 2005-2006
The goals of this project are 1) to explore the
problem space of identifying and combining multiple sources of evidence to improve search effectiveness;
and 2) to use both CLEF and TREC experiments as realistic evaluation frameworks for testing the ideas and systems developed
for evidence identification and combination.
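As an illustration of what evidence combination can look like in practice, the sketch below fuses scores from two evidence sources with CombSUM, a standard score-fusion rule, after min-max normalization. It is a minimal example under stated assumptions and not necessarily the method used in this project: the function names, the two sources (content and anchor text), and the scores are all hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one source's scores to [0, 1]; a constant source maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(evidence_sources):
    """CombSUM: sum each document's normalized scores across all sources.

    A document missing from a source contributes 0 for that source.
    Returns (doc_id, fused_score) pairs, best first; ties break on doc_id.
    """
    fused = defaultdict(float)
    for scores in evidence_sources:
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += s
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from two evidence sources for the same query.
content_scores = {"d1": 12.0, "d2": 7.0, "d3": 2.0}
anchor_scores = {"d2": 0.9, "d3": 0.4, "d4": 0.1}

ranking = comb_sum([content_scores, anchor_scores])
for doc, score in ranking:
    print(f"{doc}\t{score:.3f}")
```

Normalizing before summing keeps a source with a larger raw score range from dominating the fused ranking; weighted variants would multiply each source by a tuned weight before summing.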
Smart Intermediary (SIM) Project
Having reference librarians or human intermediaries in the
information retrieval process is one of the big success stories in the
development of libraries and library science. However, with the wide use of
the Internet and the Web, more and more searches are performed without the help of human
intermediaries. The goal of the SIM project is to transfer the knowledge
that people have developed for the reference process into modern automatic retrieval processes,
and to design a retrieval system that is a useful assistant when working
with librarians and a smart automatic intermediary when acting alone.
Interactive Cross-Language Information Access (ICLIA)
Even though advances in computer and network
technology make it possible for people to access information globally, our abilities are still held back by the fact that
most of us lack proficiency
in foreign languages. Research on Cross-Language Information Retrieval (CLIR)
helps develop algorithms for identifying relevant information in foreign
languages, but many more difficult research problems must be solved before a
useful interactive cross-language information access system can be delivered. The
ICLIA research project aims to attack those problems, to develop tools
based on natural language processing techniques that help people access information
regardless of language, and eventually to develop an efficient, user-friendly
cross-language information access system.
High Accuracy Retrieval from Documents (HARD)
Project Duration: 2003-2005
With the Web as the document collection, holding a vast amount
of information on almost every possible topic, the key to the success of a retrieval
process is not to identify the relevant information, but to make the relevant
information easily accessible to the user. The HARD project tries to address this problem by bringing the user
into the loop. Our research interests in this project concentrate mainly on
using the HARD framework to explore various means of interacting with users during
the search process, and on identifying chunks of text (called passages) rather than
whole documents to satisfy users' information needs.
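To make the idea of returning passages rather than whole documents concrete, the sketch below splits a document into overlapping fixed-size word windows and returns the window that shares the most terms with the query. It is only a simple illustration, assuming whitespace tokenization and term-overlap scoring; the function names, window size, and stride are hypothetical and do not describe the HARD project's actual passage identification method.

```python
def split_into_passages(text, window=30, stride=15):
    """Split text into overlapping windows of `window` words, moving `stride` words each step."""
    words = text.split()
    if not words:
        return []
    passages = []
    for start in range(0, len(words), stride):
        passages.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reaches the end of the text
    return passages


def score_passage(passage, query):
    """Count the passage tokens that also occur in the query (case-insensitive)."""
    query_terms = set(query.lower().split())
    return sum(1 for token in passage.lower().split() if token in query_terms)


def best_passage(text, query, window=30, stride=15):
    """Return the highest-scoring passage, or None for empty text."""
    passages = split_into_passages(text, window, stride)
    if not passages:
        return None
    return max(passages, key=lambda p: score_passage(p, query))


# Hypothetical document text and query.
document = ("alpha beta gamma delta cross language retrieval "
            "helps users epsilon zeta")
result = best_passage(document, "cross language retrieval", window=4, stride=2)
print(result)
```

Overlapping windows reduce the chance that a relevant stretch of text is cut in half at a passage boundary, at the cost of scoring more candidate passages.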