Efficient Combination of Ranked Result Sets in Multi-Feature Applications
Wolf-Tilo Balke
1st Examiner: Professor Dr. W. Kießling
2nd Examiner: Professor Dr. U. Güntzer
Applications such as multimedia databases or enterprise-wide information management systems have to meet the challenge of efficiently retrieving the best-matching objects from vast collections of data. In image retrieval, for instance, queries can be based on the similarity of objects, using several feature attributes such as shape, texture, color or text. Such multi-feature queries return a ranked result set instead of exact matches. Moreover, the user typically wants to see only the k top-ranked objects. In recent years, combining algorithms have been proposed to cope with this essentially different retrieval model.
Generally speaking, we distinguish three environments for the combination of ranked results. In homogeneous environments, the various features are evaluated on a set of objects that can be identified by a common key. Quasi-homogeneous environments use features on different collections of data that share some common, standardized attributes. The last and rather rare case is that of heterogeneous environments, where objects from different collections have to be compared using a complex function.
We present a new combining algorithm called Quick-Combine for combining multi-feature result lists in (quasi-)homogeneous environments, guaranteeing the correct retrieval of the k top-ranked results. For score aggregation, virtually any combining function can be used, including weighted queries. Compared to common algorithms, we have developed an improved termination condition, tuned in combination with a heuristic control flow that adapts itself closely to the particular score distribution. Top-ranked results can be computed and output incrementally. We show that we can dramatically improve performance, in particular for non-uniform score distributions. Benchmarks on practical data indicate efficiency gains by a factor of 30. For very skewed data, the observed speed-up factors are even larger. These performance results scale across different database sizes and numbers of result sets to combine.
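To make the retrieval model concrete, the following Python sketch combines ranked lists using sorted access plus random access and stops as soon as no unseen object can still enter the top k. It is not Quick-Combine itself: the improved termination condition and the heuristic control flow are not spelled out in this summary, so the sketch falls back on a plain round-robin, threshold-style scheme. The function names, the assumption that every object appears in every list, and the assumption that the combining function is monotone are all illustrative.

```python
import heapq


def top_k_threshold(ranked_lists, k, combine):
    """Return the k best (object_id, aggregated_score) pairs, best first.

    ranked_lists: one list per feature of (object_id, score) pairs, sorted by
                  descending score; assumed: every object is in every list.
    combine:      aggregation function over a list of scores, assumed monotone.
    """
    # Random access: look up any object's score for any feature by its key.
    lookup = [dict(lst) for lst in ranked_lists]
    aggregated = {}  # object_id -> aggregated score of every object seen so far
    best = []
    for depth in range(len(ranked_lists[0])):
        last_seen = []
        for lst in ranked_lists:
            obj, score = lst[depth]  # sorted access, one step per list
            last_seen.append(score)
            if obj not in aggregated:
                aggregated[obj] = combine([scores[obj] for scores in lookup])
        # No unseen object can score higher than the aggregate of the
        # scores last seen in each list (this relies on monotonicity).
        threshold = combine(last_seen)
        best = heapq.nlargest(k, aggregated.items(), key=lambda item: item[1])
        if len(best) == k and best[-1][1] >= threshold:
            break  # the current top k can no longer be displaced
    return best


def weighted_average(scores, weights=(0.5, 0.5)):
    """Example combining function for a weighted two-feature query."""
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical result lists for two features of the same four images.
color = [("a", 1.0), ("b", 0.75), ("c", 0.5), ("d", 0.25)]
texture = [("b", 1.0), ("c", 0.75), ("a", 0.5), ("d", 0.25)]
print(top_k_threshold([color, texture], 2, weighted_average))
```

On this toy data the loop stops after two rounds of sorted access, without ever reading the entries for object "d". Any heuristic that chooses which list to expand next would replace the fixed round-robin order used here.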
For heterogeneous environments, we also present an innovative algorithm called Stream-Combine for processing multi-feature queries on heterogeneous data sources. This algorithm can guarantee the correct retrieval of the k top-ranked results without using any random accesses. Stream-Combine implements sophisticated heuristics and is therefore self-adapting to different data distributions and to the specific kind of combining function. Furthermore, we present a new retrieval strategy that substantially speeds up the output of relevant objects.
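How a correct top-k answer can be obtained from sorted access alone may be easier to see in code. The Python sketch below is not Stream-Combine, whose heuristics are not described in this summary. It is a minimal bounds-based scheme under stated assumptions: scores lie in [0, 1], the combining function is monotone, every object eventually appears in every stream, and the streams are read in plain round-robin order. All names are hypothetical.

```python
def top_k_sorted_only(streams, k, combine):
    """Return the ids of k top-ranked objects using sorted access only.

    streams: one list per source of (object_id, score) pairs, sorted by
             descending score; assumed: scores lie in [0, 1].
    combine: aggregation function over a list of scores, assumed monotone.
    """
    m = len(streams)
    known = {}  # object_id -> per-stream scores, None where not yet seen
    for depth in range(len(streams[0])):
        last = []
        for i, stream in enumerate(streams):
            obj, score = stream[depth]  # sorted access, no lookups by key
            known.setdefault(obj, [None] * m)[i] = score
            last.append(score)
        # Lower bound: a missing score is at least 0.
        lower = {o: combine([0.0 if s is None else s for s in sc])
                 for o, sc in known.items()}
        # Upper bound: a missing score is at most the last score seen there.
        upper = {o: combine([last[i] if s is None else s
                             for i, s in enumerate(sc)])
                 for o, sc in known.items()}
        if len(known) >= k:
            top = sorted(known, key=lambda o: lower[o], reverse=True)[:k]
            rival_bounds = [upper[o] for o in known if o not in top]
            rival_bounds.append(combine(last))  # bound for unseen objects
            if lower[top[-1]] >= max(rival_bounds):
                return top  # no other object can still overtake these k
    return sorted(known, key=lambda o: lower[o], reverse=True)[:k]


def average(scores):
    """Example combining function: unweighted mean of the feature scores."""
    return sum(scores) / len(scores)


# Hypothetical ranked streams delivered by two independent sources.
color = [("a", 1.0), ("b", 0.75), ("c", 0.5), ("d", 0.25)]
texture = [("b", 1.0), ("c", 0.75), ("a", 0.5), ("d", 0.25)]
print(top_k_sorted_only([color, texture], 2, average))
```

The sketch guarantees the correct set of k objects, but it orders them by lower bound, which need not match the order of their final scores if some of their scores are still unknown at termination.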
As benchmarks on practical data indicate that our combining algorithms can dramatically improve performance, we also discuss interesting applications of the combination of ranked result sets in different areas. The applications for optimization in ranked query models are manifold. Generally speaking, we believe that all kinds of federated searches in database or portal technology can be supported, for example content-based retrieval, knowledge management systems or multi-classifier combination.