# Topics for Theses

## Contact

**Prof. Dr. Claus Möbus**

Room: A02 2-226

Tel: +49 441 / 798-2900

claus.moebus@uol.de

-------------------------------------------

**Secretary**

**Manuela Wüstefeld**

Room: A02 2-228

Tel: +49 441 / 798-4520

manuela.wuestefeld@uol.de

-------------------------------------------

## Topics for Bachelor and Master Theses

Bachelor or Master theses in **probabilistic modelling**, **machine learning**, or **applied artificial intelligence** are supervised by me and other researchers.

The thesis work begins and ends with a presentation. In the initial presentation you introduce the topic and the milestone plan. This takes place in my research seminar "Probabilistic Modeling" (inf533 and/or inf534). You should register for this seminar via Stud.IP and qualify for a certificate of successful participation, which requires a successful presentation and a written milestone description.

In the final presentation you summarize the results of the thesis (if necessary with a demonstrator). This takes place in the corresponding seminar ("Oberseminar", etc.) of the co-examiner of the thesis. Depending on the field of the thesis (e.g. probabilistic robotics, computational intelligence, machine learning, business intelligence), the following colleagues are recommended as co-examiners: Prof. Fränzle, Prof. Kramer, Prof. Sauer.

Interested students should consult this guide ("Leitfaden") and are invited to contact me via email:

**Prof. Dr. Claus Möbus**

________________________________________________________________________________________________________________________________

## Algorithm for Generating the Smallest Sigma-Algebra \(\sigma(\varepsilon)\) from a Set System \(\varepsilon\)

"A σ-algebra ... is a set system in measure theory, i.e. a set of sets. A σ-algebra is characterized by its closure under certain set-theoretic operations. σ-algebras play a central role in modern stochastics and integration theory, since they appear there as domains of definition for measures and contain all sets to which one assigns an abstract volume or probability, respectively." (Wikipedia, 2021/03/07)

Every primer on stochastics (Behrends, 2013, p.11; Hable, 2015, p.9; Halpern, 2017, p.14ff; Hübner, 2009, p.17) contains the definition of the measure-theoretic concept of a sigma-algebra. The sigma-algebra contains the sets that define those events to which probabilities can be assigned. After the definition, typical textbooks (e.g. Hable, 2015, p.10) present some trivial examples with countable sets (e.g. \(\Omega = \{1, 2, 3, 4\}\)) or some very abstract, sometimes counterintuitive examples.

In practical applications one is interested only in a special set \(\varepsilon\) of events to which probabilities should be assigned. In most cases this set is not a full-fledged sigma-algebra. Further sets therefore have to be added to \(\varepsilon\) to "fill the gaps" and to embed \(\varepsilon\) into a sigma-algebra. The smallest sigma-algebra obtained in this way is denoted \(\sigma(\varepsilon)\), and we say that "\(\sigma(\varepsilon)\) has been generated by the set system \(\varepsilon\)". Sometimes (e.g. Pollard, 2010, p.19f) trivial examples with countable sets (e.g. \(\varepsilon = \{\{a, b, c\}, \{c, d, e\}\}\), \(\Omega = \{a, b, c, d, e\}\)) or some very abstract, sometimes counterintuitive examples are given.

Despite the theoretical importance of this construction, no stochastics textbook presents an algorithm for generating the smallest \(\sigma(\varepsilon)\).
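As a small worked illustration (not taken from the cited textbooks), the finite example \(\varepsilon = \{\{a, b, c\}, \{c, d, e\}\}\) on \(\Omega = \{a, b, c, d, e\}\) can be completed by hand. Writing \(E_1 = \{a, b, c\}\) and \(E_2 = \{c, d, e\}\), the nonempty intersections of the generators and their complements are

\[
E_1 \cap E_2 = \{c\}, \qquad E_1 \setminus E_2 = \{a, b\}, \qquad E_2 \setminus E_1 = \{d, e\},
\]

while \(\Omega \setminus (E_1 \cup E_2) = \emptyset\) contributes nothing. Every set in the generated sigma-algebra is a union of these three blocks, so it has \(2^3 = 8\) elements:

\[
\sigma(\varepsilon) = \{\emptyset,\ \{c\},\ \{a, b\},\ \{d, e\},\ \{a, b, c\},\ \{c, d, e\},\ \{a, b, d, e\},\ \Omega\}.
\]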

Such an algorithm should be developed in a BSc or MSc-thesis.

A solution sketch was provided by Prof. E. Behrends: "... this is an interesting question for which I know of no theoretical research. I think that the complexity grows strongly exponentially. More precisely, it looks like this. Let \(\Omega\) have \(r\) elements, let \(k\) subsets \(E_1, \ldots, E_k\) be given, and suppose one is interested in the sigma-algebra \(\Sigma\) generated by the \(E_i\). To do this, one must know the atoms of \(\Sigma\), the minimal nontrivial (i.e. nonempty) elements. If there are \(n\) atoms, \(\Sigma\) has \(2^n\) elements.

And how do you find the atoms? The easiest way is by induction on \(k\). For \(k=1\) there are (at most) 2 atoms, and at the transition \(k \rightarrow k+1\) one only has to intersect the atoms obtained for \(k\) with \(E_{k+1}\) and \(\Omega \setminus E_{k+1}\). In short: there will be at most \(2^k\) atoms, i.e. \(\Sigma\) can have up to \(2^{2^k}\) elements.

And this can happen in the worst case. For example, if \(\Omega = \{0,1\}^s\) and one chooses \(E_i\) as "i-th entry equals 1" for \(i = 1, \ldots, s\), the atoms are the one-element sets, so there are \(2^s\) atoms. Unfortunately, I cannot contribute further subtleties. For example, how expensive is it to compute the intersection of two subsets of an \(r\)-element set? I guess \(2r\) steps, and thus we end up with an effort of \(2r \cdot 2^{2^k}\)." (personal email from E. Behrends, 2021/03/04)
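The inductive idea in the sketch above can be tried out for small finite \(\Omega\). The following Python sketch is only one possible reading of it, not the algorithm the thesis is expected to deliver: the function names `atoms` and `generated_sigma_algebra` are hypothetical, sets are represented as `frozenset` objects, and no attempt is made to avoid the exponential blow-up discussed in the email.

```python
from itertools import combinations


def atoms(omega, generators):
    """Refine the trivial partition {omega} by one generator at a time."""
    blocks = [frozenset(omega)]
    for generator in generators:
        generator = frozenset(generator)
        refined = []
        for block in blocks:
            # Split each block into the part inside and the part outside
            # the generator; empty pieces are dropped.
            for piece in (block & generator, block - generator):
                if piece:
                    refined.append(piece)
        blocks = refined
    return blocks


def generated_sigma_algebra(omega, generators):
    """Return all unions of atoms, including the empty union."""
    parts = atoms(omega, generators)
    sigma = set()
    for size in range(len(parts) + 1):
        for chosen in combinations(parts, size):
            sigma.add(frozenset().union(*chosen))
    return sigma


if __name__ == "__main__":
    omega = set("abcde")
    epsilon = [set("abc"), set("cde")]
    for event in sorted(generated_sigma_algebra(omega, epsilon), key=sorted):
        print(sorted(event))
```

With \(n\) atoms the second function enumerates all \(2^n\) unions, so it is only practical for a handful of generators. For the worst-case family \(\Omega = \{0,1\}^s\) from the email, every point becomes its own atom.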

In a BSc thesis the algorithm for a finite \(\Omega\) should be formulated in pseudocode, its complexity should be specified, and it should be implemented in WebPPL, Julia, or Python.

In an MSc thesis, in addition to the achievements of a BSc thesis, the algorithmic idea should be transferred to the transfinite domain. A solution sketch is provided in Behrends (2013, p.29f).

**References**

- BEHRENDS, E., Elementare Stochastik - Ein Lernbuch, von Studierenden mitentwickelt, Springer Spektrum, 2013
- HABLE, R., Einführung in die Stochastik - Ein Begleitbuch zur Vorlesung, Springer Spektrum, 2015
- HALPERN, J.Y., Reasoning About Uncertainty, 2/e, MIT Press, 2017
- HÜBNER, G., Stochastik - Eine anwendungsorientierte Einführung für Informatiker, Ingenieure und Mathematiker, 5/e, Vieweg+Teubner, 2009
- POLLARD, D., A User's Guide to Measure Theoretic Probability, Cambridge University Press, 2010

Interested students should consult this guide ("Leitfaden") and are invited to contact me via email:

**Prof. Dr. Claus Möbus**

## Paradigmatic Problems for the Probabilistic Programming Languages Pyro and WebPPL

WebPPL and Pyro are relatively new, Turing-complete, universal probabilistic programming languages (PPLs). The development of both languages is supervised by Noah Goodman of Stanford University. WebPPL is embedded in the functional part of JavaScript (JS). Pyro is based on PyTorch and is embedded in Python. Both languages are open source, but the first is more academic and the second more industrial.

PPLs are used for building generative probabilistic models (GPMs). These models represent *causal background* knowledge, which is characteristic of experts. In contrast, deep learning (DL) models only represent *shallow* knowledge, which is useful for pattern matching. In that respect the latter have become rather successful. Lake et al. (2016) describe the fundamental difference between DL models and GPMs with many examples.

The thesis should survey models programmed in WebPPL and Pyro and order them according to a metric (e.g. code length). This set is called the paradigmatic example set WebPPL-problems \(\cup\) Pyro-problems. It should be partitioned into the intersection WebPPL-problems \(\cap\) Pyro-problems and the two difference sets Pyro-problems \(\setminus\) WebPPL-problems and WebPPL-problems \(\setminus\) Pyro-problems. The last two sets are of special interest. Do these sets exist by chance, or are there fundamental difficulties in formulating a problem solution in one of the two languages?

Let's take the example of penalty kicks in football. There are two agents in a penalty shootout: the goalkeeper and the shooter. Each agent has preferences for certain shots or certain defensive measures, e.g. the upper left corner. At the same time, each agent has guesses about the opponent's preferences. We know from WebPPL that its language constructs are sufficient to model such situations. But what about Pyro?

Such a question should be clarified in the thesis.
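To make the penalty-kick situation concrete, here is a minimal sketch of two agents that reason recursively about each other. It is plain Python with exact enumeration, so it is neither WebPPL nor Pyro code and says nothing about what either language can express. The function names, the restriction to two sides, and the simplifying assumption that both agents know each other's preference weights are illustrative choices, not part of the topic description.

```python
SIDES = ("left", "right")


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {side: weight / total for side, weight in weights.items()}


def shooter(depth, shooter_pref, keeper_pref):
    """Distribution over shot sides for a shooter reasoning `depth` levels deep."""
    if depth == 0:
        # A level-0 shooter simply follows its own preferences.
        return normalize(shooter_pref)
    keeper_dist = keeper(depth - 1, shooter_pref, keeper_pref)
    # Favour sides that are preferred and that the keeper is unlikely to cover.
    return normalize(
        {side: shooter_pref[side] * (1.0 - keeper_dist[side]) for side in SIDES}
    )


def keeper(depth, shooter_pref, keeper_pref):
    """Distribution over dive sides for a keeper reasoning `depth` levels deep."""
    if depth == 0:
        return normalize(keeper_pref)
    shooter_dist = shooter(depth - 1, shooter_pref, keeper_pref)
    # Favour sides that are preferred and that the shooter is likely to choose.
    return normalize(
        {side: keeper_pref[side] * shooter_dist[side] for side in SIDES}
    )


if __name__ == "__main__":
    shooter_pref = {"left": 3.0, "right": 1.0}  # illustrative weights
    keeper_pref = {"left": 1.0, "right": 1.0}
    for depth in range(3):
        print(depth, shooter(depth, shooter_pref, keeper_pref))
```

A thesis would have to replace the shared-knowledge assumption by explicit beliefs about the opponent and then check whether the same nesting can be written down in both languages.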

Interested students are invited to contact me via email:

**Prof. Dr. Claus Möbus**

## Bayesian Portfolio Optimisation through Diversification of Risks

At the latest when prices collapse, some owners of securities wish they had done something to spread and minimise risk. According to Martin Weber (Professor at the University of Mannheim), a layperson cannot outperform the market, but he or she can manage risk in the portfolio. The classical non-Bayesian procedure was described theoretically by Markowitz in 1952 in the article "Portfolio Selection", for which he was awarded the Nobel Prize in 1990. Weber gives practical advice in chapter 6 of his book 'Genial einfach investieren; Mehr müssen Sie nicht wissen - das aber unbedingt!'. Somewhat more mathematical, but still easy to read, is the treatment of portfolio optimisation in Section 4.2, "Diversification of Risks", of the book by Cottin & Döhler, *Risikoanalyse*, 2/e, 2013.
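The diversification effect can be illustrated with the standard two-asset formulas for portfolio variance and the minimum-variance weight. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the volatilities and the correlation are invented numbers, short selling is not excluded, and the degenerate case of two perfectly correlated assets with equal volatility is not handled.

```python
import math


def portfolio_std(w1, sigma1, sigma2, rho):
    """Standard deviation of a portfolio with weight w1 in asset 1 and 1 - w1 in asset 2."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    )
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding errors


def min_variance_weight(sigma1, sigma2, rho):
    """Weight of asset 1 that minimises the two-asset portfolio variance."""
    covariance = rho * sigma1 * sigma2
    return (sigma2 ** 2 - covariance) / (sigma1 ** 2 + sigma2 ** 2 - 2.0 * covariance)


if __name__ == "__main__":
    sigma1, sigma2, rho = 0.20, 0.30, 0.2  # illustrative values only
    w1 = min_variance_weight(sigma1, sigma2, rho)
    print("weight of asset 1:", round(w1, 4))
    print("portfolio std:", round(portfolio_std(w1, sigma1, sigma2, rho), 4))
```

With these illustrative inputs the combined portfolio is less volatile than either asset on its own, which is the effect the cited texts describe.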

In a previous BSc thesis the problem was solved for portfolios of up to 3 assets in the form of ETFs (see the news below on this website).

There are now two new challenges. (1) In another **BSc thesis** the optimization problem is to be solved for more than 3 ETFs, and the existing web app is to be developed further and evaluated for usability with financially interested laypeople.

(2) An **MSc thesis** will investigate how the classical approach of Markowitz can be extended in a Bayesian way. For this purpose, the posterior distribution of the shares of the individual securities in the portfolio must be calculated. A literature search is expected in the first third of the thesis. In the second third, possible realizations are to be examined. In the last third a small demonstrator is to be built.

**Previous knowledge:** The candidate should have successfully studied business informatics (Wirtschaftsinformatik, WI), have a basic knowledge of descriptive statistics (mean, standard deviation, variance, correlation, regression, etc.) and machine learning, understand chapter 6 of Weber's book and the existing BSc thesis, and be able to create dynamic web pages.

Interested students are invited to contact me via email:

**Prof. Dr. Claus Möbus**

## Cloning a Human Telecontroller with a Probabilistic or Deep Learning Model in the Realm of Sumo-Robotics

In 2017 an exploratory MSc thesis on a similar topic was finished. In that thesis, sumo bots of the Lego league were controlled by agents using simple probabilistic models of the "naive" Bayesian type. The results were encouraging, so a more refined methodology should yield better results. The lessons learnt are especially important for further research.
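For readers unfamiliar with models of the "naive" Bayesian type, the following Python sketch shows how such a model could infer an action from discretised observations. It is not the model used in the earlier thesis: the feature names (`opponent`, `edge`), the action labels, and the tiny training log are invented for illustration, and add-one smoothing is an assumption made here to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count actions and per-action feature values from (features, action) pairs."""
    action_counts = Counter(action for _, action in samples)
    feature_counts = defaultdict(Counter)  # (action, feature name) -> value counts
    values = defaultdict(set)              # feature name -> values seen in training
    for features, action in samples:
        for name, value in features.items():
            feature_counts[(action, name)][value] += 1
            values[name].add(value)
    return action_counts, feature_counts, values


def posterior(model, features):
    """Posterior over actions, assuming features are independent given the action."""
    action_counts, feature_counts, values = model
    total = sum(action_counts.values())
    log_scores = {}
    for action, count in action_counts.items():
        score = math.log(count / total)
        for name, value in features.items():
            seen = feature_counts[(action, name)][value]
            # Add-one smoothing over the values observed for this feature.
            score += math.log((seen + 1) / (count + len(values[name])))
        log_scores[action] = score
    top = max(log_scores.values())
    weights = {action: math.exp(score - top) for action, score in log_scores.items()}
    norm = sum(weights.values())
    return {action: weight / norm for action, weight in weights.items()}


if __name__ == "__main__":
    # Hypothetical log of a human telecontroller: observation -> chosen action.
    log = (
        [({"opponent": "ahead", "edge": "far"}, "forward")] * 3
        + [({"opponent": "left", "edge": "far"}, "turn_left")] * 2
        + [({"opponent": "ahead", "edge": "near"}, "reverse")] * 2
    )
    model = train(log)
    print(posterior(model, {"opponent": "ahead", "edge": "far"}))
```

Cloning a telecontroller in this sense means fitting such a model to logged human decisions and then letting the bot choose the most probable action for the current observation.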

The new research should improve the following aspects:

1. Human telecontrollers shouldn't rely solely on a bird's-eye view but should use the view of the controlled bot or a combination of both.

2. Model evaluation should be more systematic.

3. The behavior of the human telecontroller and the sumo bot should be organized in a behavior hierarchy along maneuvers, tactics, and strategies.

4. Bot actions should be inferred in real time by alternative Bayesian or deep learning models.

5. The sumo bots should fall into a standard sumo-bot competition category.

6. The sumo bots should be controlled by a standard game controller.

**Prerequisites:** The candidate should have basic knowledge in **(educational) robotics** *and* **probabilistic** **and/or** **deep learning modeling**. The *first* kind of knowledge can be obtained by playing with e.g. Lego bots, the *second* by attending the seminars "Probabilistic Modelling I & II" (Inf533, Inf534), and the *third* by reading the book "Deep Learning".

**Contact:** (MSc), (Prof. Dr.)

## Probabilistic Modeling with Model Fragments, Patterns, or Templates

WebPPL is a web-based probabilistic programming language (PPL) embedded in JavaScript. PPLs are used for implementing probabilistic models in domains with uncertain knowledge (cognition, medicine, traffic, finance, etc.). Depending on the situation, problem-specific questions are formalized as unknown (conditional) probabilities. Models and programs generate answers by numerical inference processes.

The interactive tutorial "Probabilistic Models of Cognition" provides a variety of WebPPL models. There is nearly always a fixed sequence of modelling steps: 1) modelling the causal process of interest (root causes, exposures -> syndromes -> symptoms), 2) observation of evidence (data), 3) (diagnostic) inference, most often contrary to the causal direction (symptoms -> syndromes -> exposures).
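The three modelling steps can be illustrated with a minimal chain exposure -> syndrome -> symptom. The following sketch is plain Python with exact enumeration rather than WebPPL, and all probabilities and names are invented for illustration. It shows step 1 as a joint distribution written in the causal direction, step 2 as the observed symptom, and step 3 as inference against the causal direction.

```python
from itertools import product

# Step 1: causal model, written in the causal direction (illustrative numbers).
P_EXPOSURE = 0.1
P_SYNDROME_GIVEN_EXPOSURE = {True: 0.8, False: 0.05}
P_SYMPTOM_GIVEN_SYNDROME = {True: 0.9, False: 0.2}


def bernoulli(p, value):
    """Probability that a Boolean variable with success probability p takes `value`."""
    return p if value else 1.0 - p


def joint(exposure, syndrome, symptom):
    """Joint probability of one complete assignment of the three variables."""
    return (
        bernoulli(P_EXPOSURE, exposure)
        * bernoulli(P_SYNDROME_GIVEN_EXPOSURE[exposure], syndrome)
        * bernoulli(P_SYMPTOM_GIVEN_SYNDROME[syndrome], symptom)
    )


def posterior_exposure(symptom_observed):
    """Steps 2 and 3: condition on the observed symptom and infer the exposure."""
    numerator = sum(
        joint(True, syndrome, symptom_observed) for syndrome in (True, False)
    )
    evidence = sum(
        joint(exposure, syndrome, symptom_observed)
        for exposure, syndrome in product((True, False), repeat=2)
    )
    return numerator / evidence


if __name__ == "__main__":
    print("P(exposure | symptom observed):", posterior_exposure(True))
```

A library of model fragments or templates would aim to make the recurring parts of such a model, here the causal chain and the conditioning step, reusable across domains.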

**Challenge:** The research question is whether the modelling process can be improved substantially by a library of model fragments, patterns, or templates.

**Prerequisites:** The topic is suited for a master thesis. Prior knowledge can be acquired by successful participation in the seminar "Probabilistic Modeling I & II" (Inf533, Inf534) and by studying the above-mentioned tutorial.

**Contact:**