Project Details


This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).

The traditional separation between the 'structure-only' Database world and the 'text-only' Information Retrieval world is fading. Databases now routinely include text components, while documents are being augmented with structural information. The goal of this project is to design novel techniques and develop tools to efficiently query and retrieve relevant information in a heterogeneous data environment where flexibility in the conditions on both the content and the structure of the data, as well as in the response to a query, is desirable. The first main contribution of the project is the design of quality scoring mechanisms that unify content and structure scores in an integrated fashion. The scoring techniques take into account the similarity between the query and the answer to assign scores. The second main contribution of the project is the development of heterogeneous data index structures and query processing algorithms that efficiently identify exact and approximate query answers and return them in order of relevance to the query. The work resulting from this project will be evaluated through an in-depth study of the impact of the scoring strategies on answer quality, and through performance experiments on the query processing techniques. The results of this project are expected to enable users to identify the data that best fits their needs, in a variety of heterogeneous data environments, without requiring preexisting knowledge of the underlying data schema or content. This project integrates research and education through curriculum development, student advising, and outreach to women in Computer Science. Results of this project, including publications, data sets, and software, will be made available on the project website (
Effective start/end date: 7/15/09 to 6/30/14


  • National Science Foundation (NSF)

