Quickmig: automatic schema matching for data migration projects

Authors: 
Drumm, C; Schmitt, M; Do, HH; Rahm, E
Year: 
2007
Venue: 
Proceedings of the sixteenth ACM conference on Conference on

Caravela: Semantic Content Management with Automatic Information Integration and Categorization

Authors: 
Aumueller, David; Rahm, Erhard
Year: 
2007

Semantic web content management imposes much manual work on
the community. To reduce this labour we have devised Caravela, a generic
approach to dynamic content integration and automatic categorization. Content
and documents of different types can be integrated from diverse semi-structured
sources and categorized along multiple dimensions. Automatic linking provides
dynamic categorizations at no user cost. We illustrate our approach by an online
bibliography categorizing scientific research publications.

A caravela is a small, highly manoeuvrable ship, used for exploration.

Evaluating Instance-based Matching of Web Directories

Authors: 
Massmann, S. ; Rahm, E.
Year: 
2008
Venue: 
11th International Workshop on the Web and Databases (WebDB 2008)

Web directories such as Yahoo or Google Directory semantically categorize many websites and are heavily used to find relevant websites in a particular domain of interest. Mappings between different web directories can be useful to integrate the information of different directories and to improve query and search results. The creation of such mappings is a challenging match task due to the large size and heterogeneity of web directories. Our study

Analyzing the Evolution of Life Science Ontologies and Mappings

Authors: 
Hartung, M.; Kirsten, T.; Rahm, E.
Year: 
2008
Venue: 
Data Integration in the Life Sciences (DILS) 2008

Ontologies are heavily developed and used in life sciences and undergo
continuous changes. However, the evolution of life science ontologies
and references to them (e.g., annotations) is not well understood and has received
little attention so far. We therefore propose a generic framework for analyzing
both the evolution of ontologies and the evolution of ontology-related
mappings, in particular annotations referring to ontologies and similarity
(match) mappings between ontologies. We use our framework for an extensive

Instance-Based Matching of Large Life Science Ontologies

Authors: 
Kirsten, T.; Thor, A.; Rahm, E.
Year: 
2007
Venue: 
Proc. of Data Integration in the Life Sciences (DILS), 2007

Ontologies are heavily used in life sciences so that there is increasing value to match different ontologies in order to determine related conceptual categories. We propose a simple yet powerful methodology for instance-based ontology matching which utilizes the associations between molecular-biological objects and ontologies. The approach can build on many existing ontology associations for instance objects like sequences and proteins and thus makes heavy use of available domain knowledge.
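
The idea of instance-based matching can be illustrated with a minimal sketch: two concepts from different ontologies are considered related when they share many associated instance objects. The Dice coefficient, the threshold, and all concept and protein identifiers below are assumptions made for illustration; the abstract does not state which similarity measure the authors use.

```python
from itertools import product


def dice(a: set, b: set) -> float:
    """Dice coefficient of two instance sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def instance_match(onto1: dict, onto2: dict, threshold: float = 0.5) -> list:
    """Return (concept1, concept2, similarity) for pairs at or above threshold,
    sorted by descending similarity."""
    matches = []
    for (c1, inst1), (c2, inst2) in product(onto1.items(), onto2.items()):
        sim = dice(inst1, inst2)
        if sim >= threshold:
            matches.append((c1, c2, sim))
    return sorted(matches, key=lambda m: -m[2])


# Hypothetical associations: concept -> set of annotated protein identifiers.
onto_a = {"kinase activity": {"P1", "P2", "P3"}, "transport": {"P4"}}
onto_b = {"phosphorylation": {"P1", "P2"}, "membrane": {"P4", "P5"}}

result = instance_match(onto_a, onto_b)
for c1, c2, sim in result:
    print(f"{c1} <-> {c2}: {sim:.2f}")
```

Because the similarity is derived purely from shared instances, the sketch needs no name or structure comparison, which is what makes existing annotation data usable as domain knowledge.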

A Versioning and Evolution Framework for RDF Knowledge Bases

Authors: 
Auer, S; Herre, H
Year: 
2006
Venue: 
Proceedings of Ershov Memorial Conference 2006

We present an approach to support the evolution of online,
distributed, reusable, and extendable ontologies based on the RDF data
model. The approach works on the basis of atomic changes, basically additions or deletions of statements to or from an RDF graph. Such atomic
changes are aggregated to compound changes, resulting in a hierarchy
of changes, thus facilitating the human reviewing process on various levels of detail. These derived compound changes may be annotated with

Matching large Schemas: Approaches and evaluation

Authors: 
Do, H.-H.; Rahm, E.
Year: 
2007
Venue: 
Information Systems

Current schema matching approaches still have to improve for large and complex schemas. The large search space increases the likelihood for false matches as well as execution times. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces.

Instance Matching with COMA++

Authors: 
Engmann, D.; Massmann, S.
Year: 
2007
Venue: 
BTW 2007 Workshop: Model Management und Metadaten-Verwaltung

Schema matching is the process of identifying semantic
correspondences between schemas. COMA++ is a matching prototype which uses
several characteristics of schemas to determine similarities between them, for
example the names and data types of the schema elements and structural
information. In this paper we propose two instance-based matchers for COMA++
to gain a further quality improvement. The features of the matchers and first results
are described.

An Online Bibliography on Schema Evolution

Authors: 
Rahm, Erhard; Bernstein, Philip A.
Year: 
2006
Venue: 
Sigmod Record

We briefly motivate and present a new online bibliography on schema evolution, an area which has recently gained much interest in both research and practice.

Schema and ontology matching with COMA++

Authors: 
Aumueller, David; Do, Hong Hai; Massmann, Sabine; Rahm, Erhard
Year: 
2005
Venue: 
SIGMOD Conference 2005: 906-908

We demonstrate the schema and ontology matching tool COMA++. It extends our previous prototype COMA utilizing a composite approach to combine different match algorithms [3]. COMA++ implements significant improvements and offers a comprehensive infrastructure to solve large real-world match problems. It comes with a graphical interface enabling a variety of user interactions. Using a generic data representation, COMA++ uniformly supports schemas and ontologies, e.g. the powerful standard languages W3C XML Schema and OWL.

Matching Large XML Schemas

Authors: 
Rahm, E.; Do, H.H.; Massmann, S.
Year: 
2004
Venue: 
Sigmod Record 33(4), 2004

Current schema matching approaches still have to improve for very large and complex schemas. Such schemas are increasingly written in the standard language W3C XML Schema, especially in E-business applications. The high expressive power and versatility of this schema language, in particular its type system and support for distributed schemas and namespaces, introduce new issues. In this paper, we study some of the important problems in matching such large XML schemas.

Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching

Authors: 
Melnik, Sergey; Garcia-Molina, Hector; Rahm, Erhard
Year: 
2002
Venue: 
18th International Conference on Data Engineering (ICDE), 2002

Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and if necessary adjust the results.
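
A fixpoint computation of this kind can be sketched as follows. This is a deliberately simplified illustration, not the authors' exact algorithm: the uniform initial similarity, the equal splitting of a pair's similarity among its neighbours, and the example schemas are all assumptions. Node pairs become neighbours when both graphs contain an equally labelled edge between them, and similarity flows between neighbouring pairs until the values stop changing.

```python
from collections import defaultdict


def similarity_flooding(edges1, edges2, max_iter=100, eps=1e-6):
    """Simplified fixpoint propagation of similarity between node pairs.

    Each graph is a list of (source, label, target) edges.
    Returns a dict mapping (node1, node2) pairs to a similarity in [0, 1].
    """
    # Node pairs become neighbours when joined by equally labelled edges.
    neighbours = defaultdict(set)
    for a, label1, a2 in edges1:
        for b, label2, b2 in edges2:
            if label1 == label2:
                neighbours[(a, b)].add((a2, b2))
                neighbours[(a2, b2)].add((a, b))
    if not neighbours:
        return {}

    sigma = {pair: 1.0 for pair in neighbours}  # uniform initial similarity
    for _ in range(max_iter):
        nxt = dict(sigma)  # each pair keeps its current similarity ...
        for pair, nbrs in neighbours.items():
            share = sigma[pair] / len(nbrs)
            for n in nbrs:
                nxt[n] += share  # ... and receives a share from each neighbour
        top = max(nxt.values())
        nxt = {p: v / top for p, v in nxt.items()}  # normalise to [0, 1]
        delta = max(abs(nxt[p] - sigma[p]) for p in nxt)
        sigma = nxt
        if delta < eps:  # fixpoint reached
            break
    return sigma


# Hypothetical schemas expressed as labelled edges.
g1 = [("Person", "has", "Name"), ("Person", "has", "Address")]
g2 = [("Customer", "has", "CName")]

sigma = similarity_flooding(g1, g2)
best = max(sigma, key=sigma.get)
print(best, sigma[best])
```

Selecting the final mapping from `sigma` (for example with a threshold or a best-match-per-node filter) is left out here; the abstract notes that this choice depends on the matching goal and that a human is expected to review the result.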

Developing Metadata-Intensive Applications with Rondo

Authors: 
Melnik, Sergey; Rahm, Erhard; Bernstein, Philip A.
Year: 
2003
Venue: 
Journal of Web Semantics, 2003

The future of the Semantic Web depends on whether or not we succeed in reliably integrating thousands of online applications, services, and databases. These systems are tied together using mediators, mappings, database views, and transformation scripts. Model-management aims at reducing the amount of programming needed for the development of such integrated applications. We present a first complete prototype of a generic model-management system, in which high-level operators are used to manipulate models and mappings between models.

Generic Schema Matching With Cupid

Authors: 
Madhavan, J.; Bernstein, P. A.; Rahm, E.
Year: 
2001
Venue: 
VLDB, 2001

Schema matching is a critical step in many applications, such as XML message mapping, data warehouse loading, and schema integration. In this paper, we investigate algorithms for generic schema matching, outside of any particular data model or application. We first present a taxonomy for past solutions, showing that a rich range of techniques is available. We then propose a new algorithm, Cupid, that discovers mappings between schema elements based on their names, data types, constraints, and schema structure, using a broader set of techniques than past approaches.

Comparison of Schema Matching Evaluations

Authors: 
Do, H.; Melnik, S.; Rahm, E.
Year: 
2002
Venue: 
Proc. GI-Workshop Web and Databases, 2002 (LNCS 2593, 2003)

Recently, schema matching has found considerable interest in both research and practice. Determining matching components of database or XML schemas is needed in many applications, e.g. for E-business and data integration. Various schema matching systems have been developed to solve the problem semi-automatically. While there have been some evaluations, the overall effectiveness of currently available automatic schema matching systems is largely unclear.

Supporting Executable Mappings in Model Management

Authors: 
Melnik, S.; Bernstein, P.; Halevy, A.; Rahm, E.
Year: 
2005
Venue: 
SIGMOD, 2005

Model management is an approach to simplify the programming
of metadata-intensive applications. It offers developers powerful
operators, such as Compose, Diff, and Merge, that are applied to
models, such as database schemas or interface specifications, and
to mappings between models. Prior model management solutions
focused on a simple class of mappings that do not have executable
semantics. Yet many metadata applications require that mappings
be executable, expressed in SQL, XSLT, or other data transformation
languages.
In this paper, we develop a semantics for model-management

COMA - A System for Flexible Combination of Schema Matching Approaches

Authors: 
Do, Hong Hai; Rahm, E.
Year: 
2002
Venue: 
VLDB 2002: 610-621

Schema matching is the task of finding semantic correspondences between elements of two schemas. It is needed in many database applications, such as integration of web data sources, data warehouse loading and XML message mapping. To reduce the amount of user effort as much as possible, automatic approaches combining several match techniques are required. While such match approaches have found considerable interest recently, the problem of how to best combine different match algorithms still requires further work.
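
What combining several match techniques can look like is sketched below. The two matchers, the averaging step, the threshold, and the example schemas are hypothetical choices for illustration only; the abstract does not describe the matchers or combination strategies that COMA actually provides.

```python
from difflib import SequenceMatcher


def name_matcher(e1, e2):
    """String similarity of the element names, in [0, 1]."""
    return SequenceMatcher(None, e1["name"].lower(), e2["name"].lower()).ratio()


def type_matcher(e1, e2):
    """1.0 if the declared data types are identical, else 0.0."""
    return 1.0 if e1["type"] == e2["type"] else 0.0


def combined_match(schema1, schema2, matchers, threshold=0.6):
    """Average the scores of all matchers and keep pairs above the threshold."""
    result = []
    for e1 in schema1:
        for e2 in schema2:
            scores = [m(e1, e2) for m in matchers]
            sim = sum(scores) / len(scores)
            if sim >= threshold:
                result.append((e1["name"], e2["name"], sim))
    return result


# Hypothetical schemas: each element has a name and a data type.
s1 = [{"name": "custName", "type": "string"}, {"name": "zip", "type": "int"}]
s2 = [{"name": "customerName", "type": "string"},
      {"name": "price", "type": "decimal"}]

matches = combined_match(s1, s2, [name_matcher, type_matcher])
for left, right, sim in matches:
    print(f"{left} <-> {right}: {sim:.2f}")
```

Averaging is only one possible aggregation; taking the maximum or a weighted sum would fit the same structure, and the choice among such strategies is the open problem the abstract refers to.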

Rondo: A Programming Platform for Generic Model Management

Authors: 
Melnik, S.; Rahm, E.; Bernstein, P.A.
Year: 
2003
Venue: 
Proc. SIGMOD 2003, pp. 193-204

Model management aims at reducing the amount of programming needed for the development of metadata-intensive applications. We present a first complete prototype of a generic model-management system, in which high-level operators are used to manipulate models and mappings between models. We define the key conceptual structures: models, morphisms, and selectors, and describe their use and implementation. We specify the semantics of the known model-management operators applied to these structures, suggest new ones, and develop new algorithms for implementing the individual operators.

Data Warehouse Scenarios for Model Management

Authors: 
Bernstein, P.A.; Rahm, E.
Year: 
2000
Venue: 
Proc. 19. Intl. Conf. on Conceptual Modeling (ER), LNCS, Springer-Verlag, Oct. 2000

Model management is a framework for supporting meta-data related applications where models and mappings are manipulated as first class objects using operations such as Match, Merge, ApplyFunction, and Compose. To demonstrate the approach, we show how to use model management in two scenarios related to loading data warehouses. The case study illustrates the value of model management as a methodology for approaching meta-data related problems. It also helps clarify the required semantics of key operations.
