Prof. Dr. Heike Wehrheim 


Marion Bramkamp

Uhlhornsweg 84 — Room A3 2-208

+49 441 798-2426

Topics for Thesis

We offer various topics for Bachelor and Master thesis, some of which can be found below. You are also welcome to contact one of the group members to ask for other topics.

Identity Transformations for Software Verification Testing


Software verification tools such as CPAchecker are becoming increasingly effective in proving the correctness of C programs or detecting programming errors. Albeit their effectiveness, there are still surprisingly small programs where these sophisticated verifiers fail or time out. The cause of failure in most cases is still unclear today and can possibly be attributed to a complex interaction of programming language features (e.g. nested loops together with pointers and arrays). 
Without a clear intuition, advancing existing verifiers becomes highly difficult.


During the thesis, the first steps towards uncovering the root cause of difficulties for common verification tools should be taken. The starting point are simplistic
C programs with a clear and human-verifiable verification goal. Then, the verification complexity should be iteratively increased by so called identity transformations.
An identity transformation injects code that do not change 
the program semantic but increase the verification difficulty (possibly by introducing new language elements). An example for such a transformation could be: 

int a = 5; -- T --> int a; int b = 5; a = b; 

It is possible to choose this topic for a bachelor or master thesis.

In the context of a bachelor thesis, the following goals should be achieved:
- the development of general framework to apply identity transformation on
   C code (e.g. via String matching)
- the implementation of a set of simple identity transformations
   that support central features of the C language
- Producing and evaluating on a small generated benchmark for the software verification tool CPAchecker

In the context of a master thesis, the goals are extended to:
- the development and implementation of more complex identity transformations
   (e.g. by transforming the program AST)
- a larger set of hand-crafted benchmarks used for transformations
- a large-scale evaluation on a generated benchmark with at least three state-of-the-art verification tools

Contact: ,

Classification of online verification features

The goal of software verification is to prove or disprove whether a program satisfies a given specification. In doing so, many approaches explore the state space of the program or generate predicates to abstract the state space. In the case of a faulty program, however, many existing techniques tend to waste time exploring correct parts of the program.  Adjusting the search strategy or switching to a different verification algorithm could be beneficial in these cases. With the help of machine learning, we aim to design an algorithm that detects these situations during verification.

Collecting relevant data and generating a meaningful dataset is an essential part of data-driven algorithms. Therefore, the goal of this work is to automatically generate a dataset for learning verification behaviors.The goal of the thesis is to develop and implement a technique to analyze the search behavior or the predicates generated by a verification algorithm, identify relevant events, and store the collected information in a suitable format.The developed approach shall then be integrated into the verification framework CPAchecker[1], which also provides the necessary verification algorithms. Ultimately, the novel technique will be used to generate a dataset, while the performance and quality of the generation process will be evaluated. Simple decision rules for predicting the probability of success will also be defined.

It is also possible to work on the topic for a master's thesis. In this case the data generation is only the first part. In a second part, a machine learning algorithm shall be developed to predict the success probability of the current verification. The learning algorithm will then be evaluated in terms of accuracy and runtime.


Testing WEKA ML Algorithms for Balanced Data Usage


In recent times, there has been an increased usage of Machine Learning (ML) models in several application domains. With the availability of data along with the advancement of ML algorithms are now making it possible to use such models in critical decision making software systems. As these models are frequently being used in such systems, quality assurance of such models naturally comes into question. This has prompted many researchers to delve into the topic of validating and verifying ML models with respect to properties like fairness, monotonicity, robustness etc. While most of such works focus on the ML models, surprisingly there are fewer works focusing on the learning algorithm which is used beforehand to `learn' from training data and generate such ML models.


In a previous work [1], we develop a metamorphic testing tool TiLe (in Python) for ML algorithms. 
The main idea of this testing framework is to apply several metamorphic transformations on the training data and find out whether the learned model before and after applying such transformations is the same. One example of such a transformation is randomly permuting the training data instances. 

The main goal of this thesis would be to use TiLe to validate the ML algorithms from WEKA library to find out whether they are sensitive to such transformations on the training data. First of all, the tool would be extended for working with the ML algorithms in Java. Then several other possible metamorphic transformations (for e.g. pre-processing on training data) would be considered. Finally, in contrast to the previous work, where we often use random testing to find out model equivalence, an improved validation technique (already developed) would be incorporated in the tool. 
Please note, this topic can only be considered for Bachelors thesis.

1. Arnab Sharma and Heike Wehrheim. Testing machine learning algorithms for balanced data usage. In 12th IEEE Conference on Software Testing, Validation and Verification, ICST, pages 125–135, 2019.

Contact: Arnab Sharma

(Changed: 2021-12-23)