On Monday, April 27, 2026, at 10:00 a.m.,
Daniel Lukats (Universität Oldenburg)
will give a talk as part of his planned dissertation, titled
Concept Drift Detection in Unlabeled Data Streams
The talk will take place online: https://studconf.uol.de/rooms/mpz-niv-bz5-40a/join
The talk will be given in English.
Abstract:
In long-running data streams, an issue called concept drift may arise. Concept drift denotes changes in the feature distribution P(X) or the posterior distribution P(Y | X), such that the performance of classifiers employed on the data stream may deteriorate severely. Beyond triggering the adaptation of classifiers, concept drifts can be worth detecting in their own right, e.g., in marine domains, where changes such as ocean fronts or algal blooms can be interpreted as concept drifts that merit further action.
Although various concept drift detectors have been suggested in the literature, most detectors are supervised: they assume that the true label becomes available immediately after each prediction. This assumption is hard to justify, since labeling a potentially infinite data stream in real time can become prohibitively expensive.
While a number of unsupervised approaches to concept drift detection exist, few have been demonstrated on real-world applications. Instead, these methods are commonly evaluated on artificially created problems without ground truth information about concept drift. Thus, proxy metrics are commonly used in the literature. However, these proxies are flawed, as they often reward frequent false positives more than correct detections.
Based on this interpretation of the state of the art, this work discusses the following research questions: How can unsupervised concept drift detectors detect concept drift on continuous features? How can the predictive quality and the timeliness of concept drift detectors be evaluated in an unbiased fashion?
To this end, existing metrics such as the F1 score are adapted for use in concept drift detection. Then a system encompassing an unsupervised concept drift detector and optional components to address common real-world properties such as seasonality, trends, and noisy features is evaluated. These research questions are discussed on synthetic data streams and in the context of oceanographic data streams, with a case study on the detection of algal blooms in the Baltic Sea.
Supervisor: Prof. Dr.-Ing. Axel Hahn