Increasingly, data is no longer only made available on demand (pull) but is actively sent by a source as a data stream (push). Application areas include traffic management, logistics, medical monitoring, production control ("Industrie 4.0") and energy management. In all of these applications, data must be integrated, transformed and analysed, results must be delivered in a timely manner, and detected problems must be reported immediately.
The monitoring and analysis of continuous data streams is still too often carried out with hand-written programs. Too much development time is spent on standard processing steps such as integrating sources or merging and aggregating data, instead of on the actual problem of "extracting information from data". Data stream management systems (DSMS), in contrast, allow these tasks to be expressed as declarative queries or rules that are executed in main memory. Just as it was recognised in the 1970s that managing data with a database management system offers clear advantages over storing it in the file system, a similar rethink is now taking place in data stream processing. Expressing applications with higher-level concepts in a query language makes them much quicker to build and easier to maintain. Domain experts can then formulate problems themselves instead of waiting for developers to implement them, while the DSMS takes care of optimisation and scaling.
Many research and commercial DSMSs are designed as general-purpose tools that often support only one data model (usually the relational one) and offer little customisation of core components such as the scheduler that controls query execution. In our experience, however, many data stream problems require the system itself to be adapted. Such adaptations can concern the data model (e.g. JSON, XML or RDF), the processing strategy (e.g. priority-based scheduling, or out-of-order, priority-controlled processing of important elements), and above all the set of operations available on the data and the source types the system must support.
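To illustrate what "exchanging the scheduler" can mean, the following is a minimal sketch, assuming a hypothetical `Scheduler` interface that the engine consults to pick the next task. `QueryTask`, `Scheduler`, `FifoScheduler`, `PriorityScheduler` and `Engine` are illustrative names only and are not Odysseus's actual API. The engine loop stays unchanged; only the plugged-in strategy decides the execution order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

// Hypothetical unit of work: one executable query (part) with a priority.
record QueryTask(String name, int priority) {}

// Hypothetical pluggable scheduler interface: decides which task runs next.
interface Scheduler {
    void submit(QueryTask task);
    QueryTask next(); // returns null when no task is pending
}

// Default strategy: first come, first served.
class FifoScheduler implements Scheduler {
    private final Queue<QueryTask> queue = new ArrayDeque<>();
    public void submit(QueryTask task) { queue.add(task); }
    public QueryTask next() { return queue.poll(); }
}

// Alternative strategy: higher priority first; ties keep submission order.
class PriorityScheduler implements Scheduler {
    private record Entry(QueryTask task, long seq) {}
    private long counter = 0;
    private final PriorityQueue<Entry> queue = new PriorityQueue<>(
        Comparator.comparingInt((Entry e) -> e.task().priority())
                  .reversed()
                  .thenComparingLong(Entry::seq));
    public void submit(QueryTask task) { queue.add(new Entry(task, counter++)); }
    public QueryTask next() {
        Entry e = queue.poll();
        return e == null ? null : e.task();
    }
}

// Hypothetical engine: submits all tasks and records the order in which they run.
class Engine {
    static List<String> run(Scheduler scheduler, List<QueryTask> tasks) {
        tasks.forEach(scheduler::submit);
        List<String> order = new ArrayList<>();
        for (QueryTask t = scheduler.next(); t != null; t = scheduler.next()) {
            order.add(t.name());
        }
        return order;
    }
}
```

With tasks `logging` (priority 1), `alarm` (10) and `stats` (1), the FIFO strategy runs them in submission order, whereas the priority strategy runs `alarm` first. The sequence counter keeps equal-priority tasks in arrival order, since `PriorityQueue` itself makes no ordering guarantee for ties.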
For this reason, since 2007 we at IS have been developing, in the Odysseus project, an OSGi-based framework of the same name for building data stream management systems. Its plug-in-based architecture makes it particularly easy to adapt individual components, to exchange the underlying data model, or even to use several data models simultaneously within one query. The main goal of Odysseus was, and still is, to provide a software platform for evaluating data stream processing methods.
For the integration of external sources, Odysseus offers a modular and extensible adapter framework in which the processing layers are kept separate: a transport layer (e.g. TCP, File, MessageBus) and a protocol layer (e.g. CSV, JSON, ByteBuffer). Combining one of each yields a suitable wrapper for a given source. In addition, the operator model is designed for extensibility: the framework handles the general processing, so that only the operator-specific logic (e.g. the condition of a filter) has to be implemented. Over the years, Odysseus has become a well-suited platform for addressing a wide range of data stream processing problems. It has been used, for example, to build driver assistance systems and for monitoring in domestic environments and in maritime security. Current research questions include the integration of recommender functionality, distributed execution in a peer-to-peer network and the provision of data stream processing mechanisms as services.
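The separation of transport and protocol layers and the "implement only the logic" operator model described above can be sketched as follows. This is a minimal illustration under stated assumptions: `Transport`, `Protocol`, `SourceWrapper`, `Operator`, `FilterOperator`, `StringTransport` and `SimpleCsvProtocol` are hypothetical names, not Odysseus's real classes, and the in-memory transport merely stands in for a file or socket.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical transport layer: delivers raw lines from some channel (TCP, file, message bus, ...).
interface Transport {
    void open(Consumer<String> sink) throws IOException;
}

// Example transport reading from an in-memory string (stand-in for a file or socket).
class StringTransport implements Transport {
    private final String content;
    StringTransport(String content) { this.content = content; }
    public void open(Consumer<String> sink) throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = reader.readLine()) != null) sink.accept(line);
        }
    }
}

// Hypothetical protocol layer: turns a raw line into a tuple of fields.
interface Protocol {
    List<String> decode(String raw);
}

// Naive CSV decoding (no quoting support), for illustration only.
class SimpleCsvProtocol implements Protocol {
    public List<String> decode(String raw) { return Arrays.asList(raw.split(",", -1)); }
}

// A wrapper is simply one transport combined with one protocol.
class SourceWrapper {
    private final Transport transport;
    private final Protocol protocol;
    SourceWrapper(Transport transport, Protocol protocol) {
        this.transport = transport;
        this.protocol = protocol;
    }
    void start(Consumer<List<String>> downstream) {
        try {
            transport.open(raw -> downstream.accept(protocol.decode(raw)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Operator base class: the framework part forwards results; subclasses supply only process().
abstract class Operator<T> implements Consumer<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();
    void subscribe(Consumer<T> subscriber) { subscribers.add(subscriber); }
    protected void transfer(T element) { subscribers.forEach(s -> s.accept(element)); }
    public final void accept(T element) { process(element); }
    protected abstract void process(T element);
}

// A filter only implements its condition check; forwarding is inherited.
class FilterOperator<T> extends Operator<T> {
    private final Predicate<T> predicate;
    FilterOperator(Predicate<T> predicate) { this.predicate = predicate; }
    protected void process(T element) {
        if (predicate.test(element)) transfer(element);
    }
}
```

Swapping `StringTransport` for a socket-based transport, or `SimpleCsvProtocol` for a JSON decoder, would leave both the wrapper and the filter untouched, which is the point of keeping the layers separate.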
Further information, downloads and documentation are available on the project page: http://odysseus.informatik.uni-oldenburg.de