Schema Evolution in Wikipedia: Toward a Web Information System Benchmark

Curino, Carlo A.; Moon, Hjun J.; Tanca, Letizia; Zaniolo, Carlo
Citations range: 10-49

Evolving the database at the core of an information system is a difficult maintenance problem that has so far
been studied only in the framework of traditional information systems. The problem is likely to be even more
severe in web information systems, where open-source software is often developed through the contributions
and collaboration of many groups and individuals. In this paper, we therefore present an in-depth analysis of
the evolution history of the Wikipedia database and its schema; Wikipedia is the best-known example of a large
family of web information systems built using the open-source software MediaWiki. Our study is based on
(i) a set of Schema Modification Operators that provide a simple conceptual representation for complex schema
changes, and (ii) simple software tools that automate the analysis. This framework allowed us to dissect and
analyze 4.5 years of Wikipedia history: a short period, but one of intense growth and evolution. Beyond
confirming our initial intuition about the severity of the problem, our analysis suggests the need for better
methods and tools to support graceful schema evolution. We therefore briefly discuss documentation and
automation support systems for database evolution, and suggest that the Wikipedia case study can provide
the kernel of a schema evolution benchmark for testing and improving such systems.
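To make the idea of Schema Modification Operators (SMOs) concrete, here is a minimal, hypothetical sketch. It assumes a schema is modeled as a map from table names to ordered column lists, and it implements only a small subset of table- and column-level operators. The class names (CreateTable, DropTable, RenameTable, AddColumn, DropColumn, RenameColumn), the helpers apply, evolve and naive_diff, and the toy "article"/"revision" tables are illustrative assumptions, not the paper's notation or MediaWiki's actual schema. The sketch also shows why a naive version-to-version diff is a weak substitute for recorded SMOs: a column rename is recovered only as an unrelated add plus a drop.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical schema model: table name -> ordered list of column names.
Schema = Dict[str, List[str]]


# A small, illustrative subset of schema modification operators.
@dataclass(frozen=True)
class CreateTable:
    table: str
    columns: tuple


@dataclass(frozen=True)
class DropTable:
    table: str


@dataclass(frozen=True)
class RenameTable:
    old: str
    new: str


@dataclass(frozen=True)
class AddColumn:
    table: str
    column: str


@dataclass(frozen=True)
class DropColumn:
    table: str
    column: str


@dataclass(frozen=True)
class RenameColumn:
    table: str
    old: str
    new: str


SMO = Union[CreateTable, DropTable, RenameTable, AddColumn, DropColumn, RenameColumn]


def apply(schema: Schema, op: SMO) -> Schema:
    """Return a new schema with one SMO applied; the input is not mutated."""
    s = {t: list(cols) for t, cols in schema.items()}
    if isinstance(op, CreateTable):
        if op.table in s:
            raise ValueError(f"table {op.table} already exists")
        s[op.table] = list(op.columns)
    elif isinstance(op, DropTable):
        del s[op.table]
    elif isinstance(op, RenameTable):
        s[op.new] = s.pop(op.old)
    elif isinstance(op, AddColumn):
        s[op.table].append(op.column)
    elif isinstance(op, DropColumn):
        s[op.table].remove(op.column)
    elif isinstance(op, RenameColumn):
        cols = s[op.table]
        cols[cols.index(op.old)] = op.new
    else:
        raise TypeError(f"unknown SMO: {op!r}")
    return s


def evolve(schema: Schema, ops: List[SMO]) -> Schema:
    """Apply a sequence of SMOs in order (one schema-version step)."""
    for op in ops:
        schema = apply(schema, op)
    return schema


def naive_diff(old: Schema, new: Schema) -> List[SMO]:
    """Derive SMOs by set difference only: renames surface as add + drop."""
    ops: List[SMO] = []
    for t in sorted(old.keys() - new.keys()):
        ops.append(DropTable(t))
    for t in sorted(new.keys() - old.keys()):
        ops.append(CreateTable(t, tuple(new[t])))
    for t in sorted(old.keys() & new.keys()):
        ops.extend(AddColumn(t, c) for c in new[t] if c not in old[t])
        ops.extend(DropColumn(t, c) for c in old[t] if c not in new[t])
    return ops


# Toy evolution step (hypothetical tables, not MediaWiki's real schema):
# move page text into a new table and rename a column.
v1 = {"article": ["id", "title", "text"]}
v2 = evolve(v1, [
    CreateTable("revision", ("rev_id", "article_id", "text")),
    DropColumn("article", "text"),
    RenameColumn("article", "title", "name"),
])
print(v2)
print(naive_diff(v1, v2))
```

Replaying the naive diff reaches the same target schema, but the intent ("title was renamed to name", "text moved to revision") is lost. Recording evolution as explicit operators preserves that information, which is what makes automated analysis of a long schema history tractable.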