Writing code for data analysis and data visualization is also programming, but the kind of code that data analysts write often differs from ordinary code written by programmers. Analysts also have a different way of working: they often start from concrete data and write code interactively. There is a lot of work to be done on understanding how data analysts write code, analysing the code they write, and building better tools for this kind of programming.

  • Analysing Real-World Data Science Code. What exactly does data analysis code look like? We can find out by analysing data scraped from GitHub. A concrete project can then tackle different questions such as (1) how does data science code (e.g., Python in Jupyter notebooks) differ from ordinary code (e.g., Python libraries and applications), (2) what language features are used in data analysis code, or even (3) can we automatically detect certain kinds of bugs?

  • Tracking Provenance in Data Analysis and Visualizations. Data analyses and visualizations should make it possible to see how the visual representation links to the original data source (e.g., what contributed to the height of a bar in a bar chart). The aim of the project is to build a small data visualization language/tool, inspired by Fluid [2], that can track provenance (the source of data) through simple data transformations.

  • Data Visualizations to Encourage Critical Thinking. How can we visualize data so that the result makes viewers think more critically about what they see? A nice example of this is the You Draw It visualization by the New York Times [1]. How can we build other visualizations like this? And could we also encourage readers to think critically about the model behind the data (e.g., for agent-based economic models)?
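
As a rough illustration of question (2) in the first project, one possible starting point is to parse Python source with the standard `ast` module and count which kinds of syntax nodes occur. This is only a sketch under stated assumptions: the function name `count_language_features` and the two code snippets are hypothetical stand-ins, and a real study would run over notebooks and repositories collected from GitHub.

```python
import ast
from collections import Counter


def count_language_features(source: str) -> Counter:
    """Count how often each kind of syntax node appears in Python source."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


# Hypothetical snippets standing in for a notebook cell and a library function.
notebook_cell = "df = load()\ndf = df[df.x > 0]\ndf.plot()\n"
library_code = (
    "def mean(xs):\n"
    "    if not xs:\n"
    "        raise ValueError('empty')\n"
    "    return sum(xs) / len(xs)\n"
)

notebook_features = count_language_features(notebook_cell)
library_features = count_language_features(library_code)

# Compare a few selected features between the two styles of code.
for feature in ("FunctionDef", "Assign", "If", "Raise"):
    print(feature, notebook_features[feature], library_features[feature])
```

Counting node types is deliberately crude; it ignores which libraries are used and how cells are ordered, which a fuller analysis would probably need to consider.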

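To make the idea of tracking provenance through simple data transformations more concrete, the following minimal sketch attaches to every value the set of input rows that contributed to it. It is not how Fluid [2] works; it is a much simpler stand-in, and all names (`Traced`, `lift`, `add`, `total`, `bar_heights`) and the sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Traced:
    """A number together with the ids of the input rows it was computed from."""
    value: float
    sources: frozenset


def lift(value: float, row_id: int) -> Traced:
    """Wrap a raw input value, recording the row it came from."""
    return Traced(value, frozenset([row_id]))


def add(a: Traced, b: Traced) -> Traced:
    """Add two traced values; the result depends on the sources of both."""
    return Traced(a.value + b.value, a.sources | b.sources)


def total(items) -> Traced:
    """Sum traced values, accumulating their provenance."""
    result = Traced(0.0, frozenset())
    for item in items:
        result = add(result, item)
    return result


def bar_heights(rows) -> dict:
    """Group (category, amount) rows by category and sum each group."""
    groups = {}
    for row_id, (category, amount) in enumerate(rows):
        groups.setdefault(category, []).append(lift(amount, row_id))
    return {category: total(values) for category, values in groups.items()}


# Hypothetical input data: (category, amount) pairs.
rows = [("A", 3.0), ("B", 5.0), ("A", 4.0)]
bars = bar_heights(rows)

# For each bar, show its height and which input rows contributed to it.
for category, bar in bars.items():
    print(category, bar.value, sorted(bar.sources))
```

A visualization tool built on such values could highlight the contributing rows when a viewer selects a bar.
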
References