The recovery of a schema mapping: bringing exchanged data back

Authors: 
Arenas, M; Perez, J; Riveros, C
Year: 
2009
Venue: 
ACM Transactions on Database Systems
URL: 
http://portal.acm.org/citation.cfm?id=1620585.1620589
Citations: 
50
Citations range: 
50 - 99

A schema mapping is a specification that describes how data from a source schema is to be mapped to a target schema. Once the data has been transferred from the source to the target, a natural question is whether one can undo the process and recover the initial data, or at least part of it. In fact, it would be desirable to find a reverse schema mapping from target to source that specifies how to bring the exchanged data back.

In this article, we introduce the notion of a recovery of a schema mapping: it is a reverse mapping, M′ for a mapping M, that recovers sound data with respect to M. We further introduce an order relation on recoveries. This allows us to choose mappings that recover the maximum amount of sound information. We call such mappings maximum recoveries. We study maximum recoveries in detail, providing a necessary and sufficient condition for their existence. In particular, we prove that maximum recoveries exist for the class of mappings specified by FO-to-CQ source-to-target dependencies. This class subsumes the class of source-to-target tuple-generating dependencies used in previous work on data exchange. For the class of mappings specified by FO-to-CQ dependencies, we provide an exponential-time algorithm for computing maximum recoveries, and a simplified version for full dependencies that works in quadratic time. We also characterize the language needed to express maximum recoveries, and we include a detailed comparison with the notion of inverse (and quasi-inverse) mapping previously proposed in the data exchange literature. In particular, we show that maximum recoveries strictly generalize inverses. We finally study the complexity of some decision problems related to the notions of recovery and maximum recovery.
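The notions of recovery and of the order on recoveries can be illustrated with a small brute-force sketch. It assumes the usual formalization, which the abstract itself does not spell out: a mapping is a binary relation between instances, M′ is a recovery of M when every source instance I satisfies (I, I) ∈ M ∘ M′, and one recovery is at least as informative as another when its round-trip composition with M is contained in the other's. The relation names `A`, `B`, `T`, the two-element domain, and all function names are hypothetical choices made for illustration; the code is not the algorithm from the article.

```python
from itertools import combinations

# Toy universe (an assumption for illustration): constants 1 and 2.
DOMAIN = (1, 2)


def subsets(xs):
    """Return all subsets of xs as frozensets."""
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


# A source instance is a pair (A, B) of unary relations; a target instance is T.
SOURCES = [(a, b) for a in subsets(DOMAIN) for b in subsets(DOMAIN)]
TARGETS = subsets(DOMAIN)

# Mapping M, given by A(x) -> T(x) and B(x) -> T(x).
M = {(src, t) for src in SOURCES for t in TARGETS if (src[0] | src[1]) <= t}

# Candidate reverse mapping: T(x) -> A(x) or B(x).
M_REV = {(t, src) for t in TARGETS for src in SOURCES if t <= (src[0] | src[1])}

# Trivial reverse mapping: relates every target to every source instance.
M_TRIVIAL = {(t, src) for t in TARGETS for src in SOURCES}

# Unsound reverse mapping: T(x) -> A(x), which wrongly forces B-values into A.
M_BAD = {(t, src) for t in TARGETS for src in SOURCES if t <= src[0]}


def compose(m1, m2):
    """Relational composition: pairs (i, k) with (i, j) in m1 and (j, k) in m2."""
    return {(i, k) for (i, j) in m1 for (j2, k) in m2 if j == j2}


def is_recovery(m, m_rev):
    """Check that every source instance is a possible result of its own round trip."""
    round_trip = compose(m, m_rev)
    return all((src, src) in round_trip for src in SOURCES)


def at_least_as_informative(m, rev1, rev2):
    """rev1 recovers at least as much as rev2 if its round trip is contained in rev2's."""
    return compose(m, rev1) <= compose(m, rev2)


if __name__ == "__main__":
    print(is_recovery(M, M_REV))
    print(is_recovery(M, M_TRIVIAL))
    print(is_recovery(M, M_BAD))
    print(at_least_as_informative(M, M_REV, M_TRIVIAL))
```

In this toy setting both `M_REV` and `M_TRIVIAL` pass the recovery check, but `M_REV` has the smaller round-trip relation, so it is the more informative of the two. The sketch only compares these specific candidates; it does not establish that `M_REV` is a maximum recovery.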