Topics for student projects and theses

Interested in a software project topic, research project topic, or a thesis topic? Our topic areas include embedded and distributed systems, Internet of Things, virtualization and cloud computing, middleware and operating systems, performance engineering, software verification, program analysis, programming languages and tools, model-driven development, and more.

Naturally, we will tailor the assignments to your skills and interests, just contact us to get started. Also, do not hesitate to contact us if you have your own idea for a topic that is related to our topic areas.

Or, just take a look at the topic suggestions below, we can start from there:

Regression Benchmarking (Petr Tůma, Lubomír Bulej, Vojtěch Horký)
Modern JVM Benchmark Suite (Petr Tůma, Lubomír Bulej, Vojtěch Horký)
Integrating data in programming systems (Tomáš Petříček)
History and philosophy of programming (Tomáš Petříček)
Design of new interactive programming systems (Tomáš Petříček)
Theory Plugins into MCSat-based SMT solver Yaga (Martin Blicha, Jan Kofroň)

Regression Benchmarking

Contact

Regression benchmarking refers to the practice of evaluating performance of individual software releases with performance tests, much as it is done with functional tests in functional regression testing. The following topics are related to our ongoing research activities in regression benchmarking:

Evaluating regression detection accuracy

In the past years, we have collected a large database of measurements, partially annotated with information about performance regressions. This task is about coming up with data analysis algorithms, including possible applications of deep learning approaches, to identify performance regressions in the measurements, and evaluating the detection accuracy and efficiency.

Medium size project for people interested in data analysis and real life software performance, suitable as individual research project, bachelor or master thesis. Some knowledge of statistics and general experimental methodology is useful.

Alternatively, the task can be about implementing innovative approaches to detection research, for example in the form of an API that would permit crowdsourcing the data analysis.

Small to medium size project for people interested in programming, suitable as individual research project, bachelor or master thesis. Knowledge of Python and Django, plus API technologies in general, is useful.

Seeking regression root causes

Discovering the root cause (reason) of a particular performance regression is usually a manual task. This task is about coming up with methods and tools for easing this task. Of particular interest is the ability to identify incidental performance changes that do not require developer attention.

Medium to large size project for people who enjoy researching difficult open problems, suitable as individual research project or master thesis. Knowledge of compilers and other programming tools, and computer architectures, is useful.

Assessing performance test coverage

Although the concept of test coverage is well defined for functional tests, an analogous concept for performance tests is still missing. This task is about coming up with ways to characterize performance test coverage.

A medium to large size project for people who enjoy open research topics, suitable as individual research project or master thesis. Knowledge of compilers and general working of computers is useful.

Modern JVM Benchmark Suite

Contact

The development of modern compilers and programming language runtimes relies on empirical measurements to assess application performance. These measurements require benchmark suites that are reasonably compact but also represent practical applications. One such suite is Renaissance.

Renaissance is in constant development and provides many contribution opportunities:

Extending the suite with benchmarks in other languages

Java Virtual Machine can execute a spectrum of languages supported by modern dynamic compilers (C, R, JS, Ruby and so on). Renaissance currently only supports Java and Scala, this task should extend the build system and include more workloads.

Medium size project for people who like coding in diverse environments, knowledge of Java, Scala build system, and build systems in general is useful.

Extending the suite with support for natively compiled serverless workloads

With the introduction of native compilation for JVM based languages, serverless workloads can reduce cold start delays. In this task, we want to extend the Renaissance harness to support metrics relevant to serverless workloads (especially cold start time), and introduce workloads that represent typical serverless applications.

Medium size project for people interested in serverless and performance engineering. Knowledge of Java, Scala, and serverless platforms is useful.

Extending the existing workloads with an AI workload

The most important aspect of Renaissance suite is that it provides a diverse set of workloads. In this task, we would like to extend the suite with a new workload from the world of artificial intelligence. The workload must be a reasonably isolated representative of a typical AI task.

Small to medium size project for people interested in both AI and performance engineering. Knowledge of Java, Scala, and AI frameworks is useful.

Augmenting the existing workloads with validation

In the context of compiler development, execution correctness is not necessarily guaranteed. To detect incorrect compiler or runtime behavior, some workloads in Renaissance validate their own results. This task should extend validation to all existing Renaissance workloads.

Small to medium size project for people who like coding in diverse environments, knowledge of Java, Scala, and frameworks such as Spark is useful.

Analyzing the workload coverage

It is essential that the benchmark workloads are diverse enough to approximate many applications, but also compact enough to execute in relatively short time. This task should come up with ways to analyze the coverage of the compiler or runtime by the benchmark.

A medium to large size project for people who enjoy open research topics, knowledge of compilers and general working of computers is useful.

Integrating data in programming systems

Target

Bc. thesis, Master thesis or Research project

Contact

Tomáš Petříček

Code written by data scientists to explore data (in Jupyter Notebooks and similar systems) is interesting for a number of reasons. First, it often works with concrete data sources and so you can more closely integrate data into programs. Second, it often interleaves execution and programming (you write a bit of code, run it and then continue editing). The following ideas for projects around programming with data use those two ideas in one way or another.

Analysing data science code - First of all, it would be interesting to know how exactly data analytical code looks like. The idea of this project is to scrape code from GitHub and capture the differences between code written to analyze data (e.g. Python in Jupyter Notebooks) and normal source code (e.g. Python libraries and applications). In what ways are the two different?
Data integration via Type Providers - One interesting way of integrating data into programming languages is using type providers (see F# Data library and TheGamma project). Type providers generate types, based on some schema or logic, that a programmer can then use to access data. Related project may include implementing type providers for interesting data sources (e.g. Semantic Web and Linked Data) and also add support for type providers to other languages (e.g. TypeScript).
AI-based tools for data exploration - Tools like Copilot help programmers write code - but how can they most effectively be integrated with tools for data exploration? One possible idea is to see how to use the underlying machine learning models to semi-automatically construct code for accessing data (possibly using the above type providers) that the user is interested in, based on natural language queries.
Data visualizations to encourage critical thinking - How can we visualize data so that the result makes viewers think more critically about what they see? A nice example of this is the You Draw It visualization by New York Times. How can we built other visualizations like this? And could we also encourage readers to critically think about model behind the data (e.g. for Agent-based economic models)?

References

FSharp.Data: Data Access Made Simple - type provider library
The Gamma: Democratizing data science - online playground
The Gamma Programmatic Data Exploration for Non-programmers (PDF)
You Draw It: What Got Better or Worse During Obama’s Presidency - nice data vis!
Fluid: data-linked visualisations - explorable data visualizations

History and philosophy of programming

Target

Bc. thesis or Master thesis

Contact

Tomáš Petříček

Understanding the full complexity of programming requires an inter-disciplinary approach that is aware of the history and origins of computing concepts, understands their socio-technical nature and the historical developments of the field. If you are interested in looking at programming from a broader historical and philosophical perspective, there are many potential topics to work on:

Recovering ideas from past programming systems - There are many past programming systems that have been forgotten (Commodore 64 BASIC, Hypercard, Boxer, LISP machines), but have interesting ideas in them that could be adapted and used in programming. A project could document such systems using historical sources and reimplement some of the ideas.
Document how programming concepts evolve - Concepts like types, processes, objects or events have existed in many different programming environments over time (but also in logic or linguistics). The idea is to document the meanings of such concepts in the past, see how the meaning changes and how new technologies, formalisms or languages shape the meaning of our concepts (see also Lakatos (1976)).

The above are just two example ideas - if you find them interesting, please get in touch! A project like this would put more emphasis on rigorous working with historical sources and quality of writing and may be more suitable for students with some relevant background or existing interest in the topic. In other words, it is less technical but equally challenging as more technical work.

References

The Lost Ways of Programming: Commodore 64 BASIC - online reconstruction
What we talk about when we talk about monads (PDF)
For more ideas, see HaPoP 5 abstracts and Computing and Programming in Context
Imre Lakatos (1976). Proofs and Refutations: The Logic of Mathematical Discovery

Design of new interactive programming systems

Target

Bc. thesis, Master thesis or Research project

Contact

Tomáš Petříček

Programs are created not just by writing code, but by interacting with rich stateful programming systems or developer environments. Programming systems still include code, but they also encompass live interactive previews, structure editors, runtime environment and other developer tools. Thinking about programming systems, rather than just languages, can give us new ideas about making programming easier, trustworthy and accessible.

The ideas below are open-ended. They can be used as a starting point for designing a novel programming system. For a project or thesis, we could focus on one or two of those, build a prototype for a suitable application domain and evaluate the result in an appropriate way, e.g. by formailizing the design or developing substantial case study.

If you are interested, please get in touch to discuss the details!

Accessible, understandable programming systems - What would it take to create programs that can explain how a certain result was produced? The idea of this project is to represent programs and program execution in a more transparent way, so that it can be analyzed by tools in the programming environment. If we store trace of how the program runs, we can then analyze this trace to explain why the program resulted in a specific result. This can be used to build explainable data visualizations, trustworthy software systems and more. See Fluid, Linked visualizations and Subtext below.
Novel low-code/no-code programming tools - How can we make programming easier and allow non-programmers to solve programming problems? This can be approached from multiple directions. You could use “programming by demonstration” (where programs are created by interacting with concrete sample data and then generalized), “programming by example” (where programs are synthetized from pairs of input/output values), or design new graphical notations for programming (to express program logic in diagrams that are closer to the problem domain). See Histogram and Darklang below.
Formal models of interactive programming systems - Programming language theory, type systems and semantics are all created around programming languages - they see programs as formal entities written in some textual language with grammar. But many interesting aspects of programming happen in stateful interactive programming system (think modern IDEs with REPLs and debuggers, Notebook systems for data science or older systems like Smalltalk). Formally modelling such interactive systems is an interesting (a bit more) theoretical problem that can be a good fit for a more academically inclined student.

References

Technical dimensions of programming systems
Darklang and more videos about it
Histogram: You have to know the past to understand the present - online demo
Subtext: uncovering the simplicity of programming - project homepage
Fluid: data-linked visualisations - online demo
Linked Visualisations via Galois Dependencies (PDF)

Theory Plugins into MCSat-based SMT solver Yaga

Target

Bc. thesis, Master thesis or Research project

Contact

SMT solvers are the underlying reasoning engines for a wide range of automated reasoning techniques in the verification of software, protocols, distributed systems, etc. Most of the state-of-the-art SMT solvers are based on CDCL(T) framework. MCSat [1,2] represents a generalized approach which has, however, not been thoroughly explored yet. MCSat is particularly attractive for the applications in interpolation-based model checking [3], a popular technique for automated software verification. The goal of this work is to extend the current version of Yaga SMT solver [5] being developed in the Formal Methods Group at D3S with new modules and functionality. This involves support for theories beyond Linear Real Arithmetic, in particular Linear Integer Arithmetic, Non-Linear Arithmetic, and Uninterpreted Functions. Additional desired extensions include support for incrementality and Craig interpolation. The tool is to be evaluated on the benchmarks from SMT-LIB [4] and compared to the existing state-of-the-art solvers.

Depending on the type of the thesis, a particular extension or a subset of the extensions above will be selected.

de Moura, L., Jovanović, D. (2013). A Model-Constructing Satisfiability Calculus. In: Giacobazzi, R., Berdine, J., Mastroeni, I. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2013. Lecture Notes in Computer Science, vol 7737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35873-9_1
D. Jovanovic, C. Barrett and L. de Moura, “The design and implementation of the model constructing satisfiability calculus,” 2013 Formal Methods in Computer-Aided Design, 2013, pp. 173-180, doi: 10.1109/FMCAD.2013.7027033.
Jovanović, D., Dutertre, B. (2021). Interpolation and Model Checking for Nonlinear Arithmetic. In: Silva, A., Leino, K.R.M. (eds) Computer Aided Verification. CAV 2021. Lecture Notes in Computer Science(), vol 12760. Springer, Cham. https://doi.org/10.1007/978-3-030-81688-9_13
SMT-LIB
Yaga MCSat-based SMT Solver