Interested in a software project topic, research project topic, or a thesis topic? Our topic areas include embedded and distributed systems, Internet of Things, virtualization and cloud computing, middleware and operating systems, performance engineering, software verification, program analysis, programming languages and tools, model-driven development, and more.
Naturally, we will tailor the assignments to your skills and interests, just contact us to get started. Also, do not hesitate to contact us if you have your own idea for a topic that is related to our topic areas.
Or, just take a look at the topic suggestions below, we can start from there:
- Verification of LLVM bit-code via a system of constrained Horn clauses (Jan Kofroň, Martin Blicha)
- MCSat-based SMT solver (Martin Blicha)
- Bit-vector solver for OpenSMT (Martin Blicha, Jan Kofroň)
- Array support in the Korn software verifier (Jan Kofroň, Martin Blicha)
- Robust monitoring service and corresponding data abstraction layer for the ExtremeXP framework (Tomáš Bureš)
- Prototypical example of Edge-cloud continuum (Tomáš Bureš)
- Models and their support for describing the experimentation process of optimising complex data analytics (Tomáš Bureš)
- Data exploration language for IVIS (Tomáš Bureš, Tomáš Petříček)
- Regression Benchmarking (Petr Tůma, Lubomír Bulej, Vojtěch Horký)
- Modularization of Dynamic Analyses (Lubomír Bulej)
- Modular Weaver for DiSL (Lubomír Bulej)
- Modern JVM Benchmark Suite (Petr Tůma, Lubomír Bulej, Vojtěch Horký)
- Integrating data in programming systems (Tomáš Petříček)
- History and philosophy of programming (Tomáš Petříček)
- Design of new interactive programming systems (Tomáš Petříček)
An approach to verification of software is model checking, which creates a logical model of the program and then checks the model for a given safety property. A logical formalism of constrained Horn clauses (CHCs) [1,2] represents a very useful model of programs that enriches the simple logical representation of theories of first-order logic with semantics being able to capture the behaviour of loops and recursion.
The goal of this thesis is to design and implement a translation from (a subset of) LLVM bit-code  to the language of CHCs (using appropriate logical theory) and experiment with different modelling approaches and different CHC back-ends.
- Horn clauses: https://en.wikipedia.org/wiki/Horn_clause
- LLVM project: https://llvm.org/
- Bjørner N., Gurfinkel A., McMillan K., Rybalchenko A. (2015) Horn Clause Solvers for Program Verification. In: Beklemishev L., Blass A., Dershowitz N., Finkbeiner B., Schulte W. (eds) Fields of Logic and Computation II. Lecture Notes in Computer Science, vol 9300. Springer, Cham. https://doi.org/10.1007/978-3-319-23534-9_2
- Bc. thesis, Master thesis or Research project
SMT solvers are the underlying reasoning engines for a wide range of automated reasoning techniques in the verification of software, protocols, distributed systems, etc. Most of the state-of-the-art SMT solvers are based on CDCL(T) framework. MCSat [1,2] represents a generalized approach which has, however, not been thoroughly explored yet. MCSat is particularly attractive for the applications in interpolation-based model checking , a popular technique for automated software verification. The goal of this work is to study the description of MCSat framework and implement an efficient prototype solver based on the architecture described in . The solver should support at least the boolean reasoning and the plugin for linear real arithmetic. It will be evaluated on the benchmarks from SMT-LIB  and compared to the existing state-of-the-art solvers.
- de Moura, L., Jovanović, D. (2013). A Model-Constructing Satisfiability Calculus. In: Giacobazzi, R., Berdine, J., Mastroeni, I. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2013. Lecture Notes in Computer Science, vol 7737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35873-9_1
- D. Jovanovic, C. Barrett and L. de Moura, “The design and implementation of the model constructing satisfiability calculus,” 2013 Formal Methods in Computer-Aided Design, 2013, pp. 173-180, doi: 10.1109/FMCAD.2013.7027033.
- Jovanović, D., Dutertre, B. (2021). Interpolation and Model Checking for Nonlinear Arithmetic. In: Silva, A., Leino, K.R.M. (eds) Computer Aided Verification. CAV 2021. Lecture Notes in Computer Science(), vol 12760. Springer, Cham. https://doi.org/10.1007/978-3-030-81688-9_13
Bit-vector solvers typically rely on two things: powerful pre-processing of bit-vector terms and efficient SAT solver to solve the bit-blasted problem, i.e., the transformed input formula. The goal of this thesis is to review the state-of-the-art approaches to bit-vector solving and implement the most promising one in the OpenSMT solver . For an approach using bit-blasting, a different heuristic for tuning the underlying SAT solver should be compared.
- OpenSMT: https://github.com/usi-verification-and-security/opensmt/
- SMT-LIB: http://smtlib.cs.uiowa.edu
- SMT-COMP: https://smt-comp.github.io/2019/
Verification of C programs is an active research field with many tools competing at an annual competition on software verification (SV-COMP) . Korn  is a prototype software verifier for C programs, which translates C programs into a system of constrained Horn clauses. The generated Horn clauses are then passed to a dedicated Horn solver, which does all the hard work. Korn currently supports control-flow analysis, including loops and functions. The goal of this project is to implement proper support for arrays, which is now very limited in Korn.
The goal is to design and implement a robust monitoring service and corresponding data abstraction layer for the ExtremeXP framework. It is necessary to create the data model supporting the adaptive experiment planning that will handle different types of data related to the framework including (i) metadata of simulators, ML, and data analytics processes, (ii) input data and results of simulators, ML, and data analytics processes (performance, resource consumption, time), (iii) status of running experiments, (iv) results of past experiments. Furthermore, the data layer will fuse contextual information and user feedback to enable further analysis of user profiles. The implementation of the data layer will be based on the IVIS platform.
The goal is to implement an application and related platform services, which demonstrate the idea of the Edge-cloud continuum (ECC). The emphasis is on show-casing the key characteristics of ECC (mobility, dynamic redeployment, device heterogeneity, machine learning,…). It is assumed that the implementation will mostly reuse existing services and other assets and will config them, integrate them, and extend them to provide a wholesome demonstration of the ECC concept.
Models and their support for describing the experimentation process of optimising complex data analytics
The goal is to design and implement meta-models and an associated domain specific modelling language (DSML), including a web-based tools (editor, etc.) for describing the experimentation process of optimising complex data analytics. The DSML will be defined using Eclipse EMF and will allow the specification of different variants of a complex analytics process to be mapped to a user-specified intent and context; such variants include different datasets/data sources, the flow of complex analytics tasks (visualization, simulation, ML), deployment options, algorithms, models, hyper-parameters. DSML will be a part of the ExtremeXP framework.
Regression benchmarking refers to the practice of evaluating performance of individual software releases with performance tests, much as it is done with functional tests in functional regression testing. The following topics are related to our ongoing research activities in regression benchmarking:
Evaluating regression detection accuracy
In the past years, we have collected a large database of measurements, partially annotated with information about performance regressions. This task is about coming up with data analysis algorithms, including possible applications of deep learning approaches, to identify performance regressions in the measurements, and evaluating the detection accuracy and efficiency.
Medium size project for people interested in data analysis and real life software performance, suitable as individual research project, bachelor or master thesis. Some knowledge of statistics and general experimental methodology is useful.
Alternatively, the task can be about implementing innovative approaches to detection research, for example in the form of an API that would permit crowdsourcing the data analysis.
Small to medium size project for people interested in programming, suitable as individual research project, bachelor or master thesis. Knowledge of Python and Django, plus API technologies in general, is useful.
Seeking regression root causes
Discovering the root cause (reason) of a particular performance regression is usually a manual task. This task is about coming up with methods and tools for easing this task. Of particular interest is the ability to identify incidental performance changes that do not require developer attention.
Medium to large size project for people who enjoy researching difficult open problems, suitable as individual research project or master thesis. Knowledge of compilers and other programming tools, and computer architectures, is useful.
Assessing performance test coverage
Although the concept of test coverage is well defined for functional tests, an analogous concept for performance tests is still missing. This task is about coming up with ways to characterize performance test coverage.
A medium to large size project for people who enjoy open research topics, suitable as individual research project or master thesis. Knowledge of compilers and general working of computers is useful.
- Master thesis
Dynamic program analysis tools based on code instrumentation serve many important software engineering tasks. However, constructing new analysis tools is difficult, because existing frameworks offer little or no support to the programmer beyond the incidental task of instrumentation. The goal of this project is to develop an extension of the DiSL instrumentation framework that would support composing dynamic analyses in a modular fashion so that tasks related to maintenance of analysis state could be separated and reused.
This topic can also be an extension of the Modular Weaver for DiSL project.
- Bc. thesis or Individual software project
DiSL is a bytecode instrumentation framework utilizing a domain-specific language for instrumentation hosted in Java. At the heart of DiSL is a weaver which inserts user-defined snippets of Java code into the base program. The existing weaver in the DiSL framework is rather monolithic, hindering maintenance and development of new features. The goal of the project is to modularize the DiSL weaver into multiple self-contained transformations that are applied on the Java bytecode in a predefined order.
The development of modern compilers and programming language runtimes relies on empirical measurements to assess application performance. These measurements require benchmark suites that are reasonably compact but also represent practical applications. One such suite is Renaissance.
Renaissance is in constant development and provides many contribution opportunities:
Extending the suite with benchmarks in other languages
Java Virtual Machine can execute a spectrum of languages supported by modern dynamic compilers (C, R, JS, Ruby and so on). Renaissance currently only supports Java and Scala, this task should extend the build system and include more workloads.
Medium size project for people who like coding in diverse environments, knowledge of Java, Scala build system, and build systems in general is useful.
Augmenting the existing workloads with validation
In the context of compiler development, execution correctness is not necessarily guaranteed. To detect incorrect compiler or runtime behavior, some workloads in Renaissance validate their own results. This task should extend validation to all existing Renaissance workloads.
Small to medium size project for people who like coding in diverse environments, knowledge of Java, Scala, and frameworks such as Spark is useful.
Analyzing the workload coverage
It is essential that the benchmark workloads are diverse enough to approximate many applications, but also compact enough to execute in relatively short time. This task should come up with ways to analyze the coverage of the compiler or runtime by the benchmark.
A medium to large size project for people who enjoy open research topics, knowledge of compilers and general working of computers is useful.
- Bc. thesis, Master thesis or Research project
Code written by data scientists to explore data (in Jupyter Notebooks and similar systems) is interesting for a number of reasons. First, it often works with concrete data sources and so you can more closely integrate data into programs. Second, it often interleaves execution and programming (you write a bit of code, run it and then continue editing). The following ideas for projects around programming with data use those two ideas in one way or another.
Analysing data science code - First of all, it would be interesting to know how exactly data analytical code looks like. The idea of this project is to scrape code from GitHub and capture the differences between code written to analyze data (e.g. Python in Jupyter Notebooks) and normal source code (e.g. Python libraries and applications). In what ways are the two different?
Data integration via Type Providers - One interesting way of integrating data into programming languages is using type providers (see F# Data library and TheGamma project). Type providers generate types, based on some schema or logic, that a programmer can then use to access data. Related project may include implementing type providers for interesting data sources (e.g. Semantic Web and Linked Data) and also add support for type providers to other languages (e.g. TypeScript).
AI-based tools for data exploration - Tools like Copilot help programmers write code - but how can they most effectively be integrated with tools for data exploration? One possible idea is to see how to use the underlying machine learning models to semi-automatically construct code for accessing data (possibly using the above type providers) that the user is interested in, based on natural language queries.
Data visualizations to encourage critical thinking - How can we visualize data so that the result makes viewers think more critically about what they see? A nice example of this is the You Draw It visualization by New York Times. How can we built other visualizations like this? And could we also encourage readers to critically think about model behind the data (e.g. for Agent-based economic models)?
- FSharp.Data: Data Access Made Simple - type provider library
- The Gamma: Democratizing data science - online playground
- The Gamma Programmatic Data Exploration for Non-programmers (PDF)
- You Draw It: What Got Better or Worse During Obama’s Presidency - nice data vis!
- Fluid: data-linked visualisations - explorable data visualizations
- Bc. thesis or Master thesis
Understanding the full complexity of programming requires an inter-disciplinary approach that is aware of the history and origins of computing concepts, understands their socio-technical nature and the historical developments of the field. If you are interested in looking at programming from a broader historical and philosophical perspective, there are many potential topics to work on:
Recovering ideas from past programming systems - There are many past programming systems that have been forgotten (Commodore 64 BASIC, Hypercard, Boxer, LISP machines), but have interesting ideas in them that could be adapted and used in programming. A project could document such systems using historical sources and reimplement some of the ideas.
Document how programming concepts evolve - Concepts like types, processes, objects or events have existed in many different programming environments over time (but also in logic or linguistics). The idea is to document the meanings of such concepts in the past, see how the meaning changes and how new technologies, formalisms or languages shape the meaning of our concepts (see also Lakatos (1976)).
The above are just two example ideas - if you find them interesting, please get in touch! A project like this would put more emphasis on rigorous working with historical sources and quality of writing and may be more suitable for students with some relevant background or existing interest in the topic. In other words, it is less technical but equally challenging as more technical work.
- The Lost Ways of Programming: Commodore 64 BASIC - online reconstruction
- What we talk about when we talk about monads (PDF)
- For more ideas, see HaPoP 5 abstracts and Computing and Programming in Context
- Imre Lakatos (1976). Proofs and Refutations: The Logic of Mathematical Discovery
- Bc. thesis, Master thesis or Research project
Programs are created not just by writing code, but by interacting with rich stateful programming systems or developer environments. Programming systems still include code, but they also encompass live interactive previews, structure editors, runtime environment and other developer tools. Thinking about programming systems, rather than just languages, can give us new ideas about making programming easier, trustworthy and accessible.
The ideas below are open-ended. They can be used as a starting point for designing a novel programming system. For a project or thesis, we could focus on one or two of those, build a prototype for a suitable application domain and evaluate the result in an appropriate way, e.g. by formailizing the design or developing substantial case study.
If you are interested, please get in touch to discuss the details!
Simplifying programming by working with concrete values - When we program, we typically work with abstract variables. But what if we were writing code only once we have concrete values for all our variables? That way, the programming environment could help us by looking at the concrete (sample) inputs. Something like this already happens in a number of places. A debugger may let you inspect values and edit code, a Jupyter Notebook lets you load concrete data before writing more code, and the Darklang project uses this idea when writing REST handlers (see Office Sign-in Application demo). The idea is to design a programming system that lets you easily construct programs by starting form concrete values.
Understanding programs by treating them as sequences of interactions - When programming, we invoke refactoring tools, copy code snippets, write and run tests or paste code into a REPL to test it, but the programming environment does not know about this! The idea of this project is to represent programs not as source code but as sequences of interactions with environment. This allows many interesting things. The programming environment can give you better hints (based on the result of running a code snippet or a test), merge conflicts get easier (renaming a variable in many places is just a single edit) and you can display a single program in multiple different views. The idea would be to develop a prototype programming system and explore capabilities of this new representation. See also the Histogram project for a few more ideas.
Bringing together program evaluation and editing - As a variation on the above, it would also be possible to use the “sequence of interactions” idea not just to represent a program, but also its execution. That is, when a program runs, the execution would generate a sequence of edits that enact the evaluation step-by-step. This would make debugging easier (you can manually replay programs) and allow you to define logic by something like “macro recording”. The idea may be applicable, for example, to programming education (you can manually simulate execution to understand it better).
Explainability via meta-programming over execution traces - In LISP, code is data and so you can write code that manipulates code (meta-programming). But what if the execution trace of a program was also data, accessible to the program? This way, you could write programs that analyze (or even modify) their own execution. This can be used for explainable data visualizations (see Fluid: data-linked visualisations), to track provenance (explain what inputs contributed to what outputs) or analyze test failures (what part of execution invalidated the test assumption).
Tree-based Turing machine models of programming systems - Languages like Subtext represent programs as documents - tree structures that include both data and code (and possibly also the execution trace as in the project above). Execution of such programs is done by rewriting the document (tree). The idea of this project is to define minimal formal model of such execution - a “Turing machine” that operates on a tree (can shift to parent/child nodes and rewrite values in nodes) and use the formalism to define the meaning of a small Subtext-like programming language.