Holistic schema matching for web query interfaces

Authors: 
Su, W; Wang, J; Lochovsky, F
Year: 
2006
Venue: 
Proc. EDBT, LNCS 3896
URL: 
http://www.springerlink.com/index/7G5N755552M9N356.pdf
Citations: 
45

Web databases, which dynamically provide information in response to user queries, are a significant part of today’s Web. To help users submit queries to different Web databases, the query interface matching problem needs to be addressed. To solve this problem, we propose a new complex schema matching approach, Holistic Schema Matching (HSM). By examining the query interfaces of real Web databases, we observe that attribute matchings can be discovered from attribute-occurrence patterns. For example, in the Books domain, First Name often appears together with Last Name, while it is rarely co-present with Author. Thus, we design a count-based greedy algorithm to identify which attributes are more likely to be matched in the query interfaces. In particular, HSM can identify both simple matchings, i.e., 1:1 matchings, and complex matchings, i.e., 1:n or m:n matchings, between attributes. Our experiments show that HSM can discover both simple and complex matchings accurately and efficiently on real data sets.
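
The co-occurrence intuition in the abstract can be illustrated with a small sketch. This is not the HSM algorithm itself: the abstract does not give its scoring formulas, so the two measures below (`match_score`, `group_score`), the thresholds `min_match` and `min_group`, and the toy Books interfaces are all assumptions chosen for illustration. The idea is that attributes which never appear together are candidate matches, attributes which always appear together are candidate groups, and a greedy pass picks the highest-scoring candidates first.

```python
from collections import Counter
from itertools import combinations


def count_occurrences(interfaces):
    """Count how often each attribute, and each attribute pair, appears."""
    single, pair = Counter(), Counter()
    for attrs in interfaces:
        attrs = sorted(set(attrs))
        single.update(attrs)
        pair.update(frozenset(p) for p in combinations(attrs, 2))
    return single, pair


def find_matchings(interfaces, min_match=1.0, min_group=1.0):
    """Greedy, count-based matching sketch (illustrative, not the paper's method)."""
    single, pair = count_occurrences(interfaces)

    def co(a, b):
        return pair[frozenset((a, b))]

    def match_score(a, b):
        # Assumed measure: zero if the pair ever co-occurs, otherwise
        # larger when both attributes are frequent.
        if co(a, b) > 0:
            return 0.0
        return single[a] * single[b] / (single[a] + single[b])

    def group_score(a, b):
        # Assumed measure: share of the rarer attribute's occurrences
        # in which the other attribute is also present.
        return co(a, b) / min(single[a], single[b])

    # Rank all attribute pairs by matching score, best first.
    candidates = sorted(
        ((match_score(a, b), a, b) for a, b in combinations(sorted(single), 2)),
        key=lambda t: (-t[0], t[1], t[2]),
    )

    used, matchings = set(), []
    for score, a, b in candidates:
        if score < min_match:
            break
        if a in used or b in used:
            continue
        left, right = [a], [b]
        # Grow either side with attributes grouped with it that never
        # co-occur with the opposite side (yields 1:n or m:n matchings).
        for c in sorted(single):
            if c in used or c in (a, b):
                continue
            if group_score(c, b) >= min_group and all(match_score(x, c) > 0 for x in left):
                right.append(c)
            elif group_score(c, a) >= min_group and all(match_score(c, y) > 0 for y in right):
                left.append(c)
        used.update(left + right)
        matchings.append((tuple(left), tuple(right)))
    return matchings


# Hypothetical query interfaces from a Books domain.
interfaces = [
    ["Title", "Author", "ISBN"],
    ["Title", "First Name", "Last Name", "ISBN"],
    ["Title", "Author"],
    ["Title", "First Name", "Last Name"],
]
result = find_matchings(interfaces)
print(result)  # [(('Author',), ('First Name', 'Last Name'))]
```

On this toy input, Author never co-occurs with First Name or Last Name, while First Name and Last Name always appear together, so the sketch reports the 1:n matching Author = {First Name, Last Name}. Title is excluded because it co-occurs with every other attribute.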