Evaluating Instance-based Matching of Web Directories

Authors: 
Massmann, S. ; Rahm, E.
Year: 
2008
Venue: 
11th International Workshop on the Web and Databases (WebDB 2008)
URL: 
http://dbs.uni-leipzig.de/file/massmann,rahm.FINAL.2.pdf
Citations: 
9
Citations range: 
1 - 9
Attachment: 
massmann,rahm.FINAL_.2.pdf (188.18 KB)

Web directories such as Yahoo or Google Directory semantically categorize many websites and are heavily used to find relevant websites in a particular domain of interest. Mappings between different web directories can be useful to integrate the information of different directories and to improve query and search results. The creation of such mappings is a challenging match task due to the large size and heterogeneity of web directories. Our study evaluates to what degree current match technology can be used to automatically determine directory mappings. We further propose specific instance-based match techniques utilizing the URL, name and description of the categorized websites. We evaluate the instance-based approaches for different similarity measures and study their combination with metadata-based approaches.
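
To make the idea of URL-based instance matching concrete, the following is a minimal sketch, not the authors' implementation. It assumes each directory is given as a mapping from category path to the list of URLs categorized under it, and it scores category pairs by the Dice coefficient of their normalized URL sets. The function names, the normalization rule, the choice of Dice as the similarity measure, and the 0.5 threshold are all illustrative assumptions; the abstract does not specify them.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparable key (hypothetical rule, not from the paper).

    Keeps the lowercased host without a leading 'www.' plus the path
    without trailing slashes. Assumes the URL includes a scheme.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def dice(a: set, b: set) -> float:
    """Dice coefficient of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def match_categories(dir_a: dict, dir_b: dict, threshold: float = 0.5) -> list:
    """Return (category_a, category_b, similarity) triples at or above threshold.

    dir_a and dir_b map a category path to an iterable of website URLs.
    Results are sorted by descending similarity.
    """
    keys_a = {c: {normalize_url(u) for u in urls} for c, urls in dir_a.items()}
    keys_b = {c: {normalize_url(u) for u in urls} for c, urls in dir_b.items()}
    matches = []
    for cat_a, urls_a in keys_a.items():
        for cat_b, urls_b in keys_b.items():
            sim = dice(urls_a, urls_b)
            if sim >= threshold:
                matches.append((cat_a, cat_b, sim))
    return sorted(matches, key=lambda m: -m[2])
```

Calling match_categories on two small dictionaries of category paths and URLs yields the category pairs whose websites overlap most. Comparing all pairs is quadratic in the number of categories, so a directory of realistic size would likely need an index from URL to category instead; the exhaustive loop is kept here only for readability.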