The "Data Science" working group studies scalable technologies for storing and processing data, with the aim of making large volumes of data manageable and extracting knowledge from them. Our focus is on approaches for continuous analysis and visualization, especially of data from the Web.
Core areas of our research include:
- Continuous Web Analytics & Real-User Monitoring: Technical measurement of page load times on the Web; quantification of user satisfaction; relationship between technical Web performance and business-critical metrics in e-commerce; distinguishing synthetic from natural traffic (bot detection); anomaly detection; data cleansing; duplicate detection; data visualization (continuous dashboarding); schema design and evolution
- Web Caching & Dynamic Data: Cache coherence in globally distributed caching infrastructures; provision of consistency guarantees using expiration-based caches; accelerated delivery of dynamic content on the Web; efficient detection and invalidation of stale cache entries
- NoSQL & Cloud Data Management: Decision support for choosing suitable systems based on concrete application requirements or requirement profiles; active mechanisms (such as triggers and change notifications); modeling of 1:n and n:m relationships in systems without support for joins; consolidation of specialized data management systems behind a uniform interface; query languages and their expressive power in comparison with SQL; multi-tenant cloud services and their provisioning (especially Backend-as-a-Service and Database-as-a-Service)
- Real-Time Databases & Database Benchmarking: Development of the real-time database system InvaliDB, which can continuously update query results (cf. Firebase, Meteor, RethinkDB); interfaces for processing result changes (expressiveness vs. ease of use); measurement of non-functional properties such as scalability and fault tolerance; quantification of data staleness; validation of benchmarks