# Part III/MPhil Projects Suggestions

I believe in supervising cutting edge research projects I’m also passionate about. While the project is entirely in the responsibility of the student, I believe the student can develop core research skills with close contact, meeting weekly or more often as the project requires. The transition from undergraduate studies to a research project is challenging, so as long as the student is willing to put in the effort, I provide support in terms of guidance, explanations, and reading material.

- See the Degree Project Management Tool page for this year’s proposals.

Please contact me even if only parts of the project sound interesting to you, and perhaps we could find a different project. You’re also encouraged to suggest your own projects!

## Research prospects

The short duration of an MSc dissertation means you will likely not do enough research for a publication in a top-tier venue in my area. Such venues usually require 1-3 years of work, even with experienced researchers. If you are interested in a more extensive Masters-level research degree, consider out MPhil programme.

# Past projects

Below are some of my past project suggestions and how they turned out. Please contact me even if only parts of the project sound interesting to you, and perhaps we could find a different project. You’re also encouraged to suggest your own projects!

## Dependently-typed regex matchers in Idris

Student: Katarzyna Marek

### Note

Katarzyna stayed for another 6 months as a research associate to complete this project into a publication:

- Kammar, Ohad, and Katarzyna Marek. “Idris TyRE: a dependently typed regex parser.” arXiv preprint arXiv:2305.04480 (2023).

and additional follow-up work.

## Description

In *type-driven development* we use advanced type-level
programming which help us write more flexible programs. In this project
you’ll be using type-driven development to implement regular expression
matchers and lexers. By translating different regexes into different
types, you can take advantage of the compiler’s runtime and memory
allocation routines to implement Thompson’s construction and its
analogues. The project can extend depending on your skills and
interests: you can verify the correctness of your construction, or you
can look into type-driven nested-word automata and lexers.

Edwin Brady. Type-driven Development with Idris. Manning Publications. 2017.

Typed parsing and unparsing for untyped regular expression engines (http://dx.doi.org/10.1145/3294032.3294082)

R. Alur and P. Madhusudan. Adding nesting structure to words. In DLT’06, pages 1–13.

Difficulty: Very hard

### Completion criteria

- Basic

- Type-driven implementation of Thompson’s construction 2a. Verify the construction; or 2b. Implement a typed-regex parser on top of this matcher.

- Extension

- Type-driven implementation of a nested-word regex matcher
- Performance evaluation and identification of bottlenecks

## Modular probabilistic programming with algebraic effects

Student: Oliver Goldstein

Follow-up poster: * Eff-Bayes: ProbProg with built-in effect handlers. Oliver Goldstein, Žiga Lukšič, Matija Pretnar, Daan Leijen, Ohad Kammar, and Adam Ścibior. Poster accepted at the Probabilistic Programming Conference 2020.

### Abstract

Probabilistic programming languages, which exist in abundance, are languages that allow users to calculate probability distributions implicit in probabilistic programs, by using inference algorithms. Inference algorithms calculate probability distributions implicit in the programs. However, the underlying inference algorithms are not implemented in a modular fashion, though, the algorithms are presented as a composition of other inference components. This discordance between the theory and the practice of Bayesian machine learning, means that reasoning about the correctness of probabilistic programs is more difficult, and composing inference algorithms together in code may not necessarily produce correct compound inference algorithms. In this dissertation, I create a modular probabilistic programming library, already a nice property as its not a standalone language, called Koka Bayes, that is based off of both the modular design of Monad Bayes – a probabilistic programming language developed in Haskell – and its semantic validation. The library is embedded in a recently created programming language, Koka, that supports algebraic effect handlers and expressive effect types – novel programming abstractions that support modular programming. Effects are generalizations of computational side-effects, and it turns out that fundamental operations in probabilistic programming such as probabilistic choice and conditioning are instances of effects.

## On the expressive power of monadic reflection and effect handlers

Student: Yannick Forster

Yannick’s dissertation page.

Resulted, after additional research, in these publications:

Yannick Forster, Ohad Kammar, Sam Lindley and Matija Pretnar, 2019. On the expressive power of user-defined effects: Effect handlers, monadic reflection, delimited control, Journal of Functional Programming, volume 29, p. e15.

Yannick Forster, Ohad Kammar, Sam Lindley and Matija Pretnar, 2017. On the expressive power of user-defined effects: effect handlers, monadic reflection, delimited control, Proceedings of the ACM on Programming Languages, volume 1, issue ICFP, pp. 13:1–13:29.

## A meta-theory for effectful optimisations

Student: Jan Polášek

Optimising compilers may employ type-and-effect systems to justify code transformations involving computational effects, such as memory accesses, I/O interaction, and probabilistic computation. For example, it is incorrect to remove code duplication in a block of code that reads and writes to memory, but if the duplicated code only writes to memory, we may only run it once:

```
let x = M in let x = M in
let y = M in is equivalent to let y = x in
N N
```

There is embarassingly little work on formally justifying these compiler optimisations, as establishing the contextual equivalence of two program phrases using operational semantics is difficult. Denotational semantics are particularly compelling for this task: to validate an optimisation, show both pieces of code have the same denotation.

The starting point in this project is the observation that these optimisations arise as appropriate algebraic laws underlying the language’s denotational semantics. For example, the validity of the optimisation above is equivalent to a generalised idempotency law, more specifically, generalising the unary idempotency law

`f(f(x)) = f(x)`

to n-ary terms:

```
f(f(x11, x12, ..., x1n), f(x11,
f(x21, x22, ..., x2n), x22,
..., = ...,
f(xn1, xn2, ..., xnn)) xnn)
```

This project aims to establish the meta-theory of effect-dependent source-to-source global compiler optimisations. Such a meta-theory will enable:

a characterisation of which optimisations have such algebraic descriptions;

an elegeant unification of many existing and tedious proofs for the semantic characterisation of these optimisations;

a formulation of practically useful meta-properties of these optimisations; and

modular computational models for establishing sound and complete decision procedures within optimising compilers.

This challenging project requires the student to obtain knowledge of categorical semantics of programming languages, the algebraic approach to semantics, and dependently-typed programming.

**Pre-requisite knowledge:** denotational
semantics, and category theory and
logic.

**Recommended knowledge:** types (for the
dependently-typed programming), optimising
compilers (for the high-level picture of the application area).