Prediction Framework

Project: Prediction Framework - Predicting software engineering artefacts
Managed by: ViliamSimko
Abstract: The second part of my PhD thesis focuses on predicting software engineering artefacts based on statistical classification.

This framework demonstrates the principles of domain model elicitation from a natural language specification. It uses the Stanford CoreNLP linguistic pipeline for analyzing documents and the Maximum Entropy classifier from the Apache OpenNLP project.
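
To illustrate what a maximum entropy classifier does, here is a minimal, self-contained sketch. It does not use the Apache OpenNLP API and is not the framework's implementation. The class name TinyMaxEnt, the feature strings (such as pos=NN) and the labels entity/other are hypothetical and chosen only for illustration. The sketch assumes binary string features per word and trains the weights by stochastic gradient ascent on the conditional log-likelihood. It needs Java 8 or newer because it uses a lambda.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Tiny maximum entropy (multinomial logistic regression) classifier over string features. */
public class TinyMaxEnt {
    private final List<String> labels;
    // feature name -> one weight per label (hypothetical in-memory model layout)
    private final Map<String, double[]> weights = new LinkedHashMap<>();

    public TinyMaxEnt(List<String> labels) {
        this.labels = labels;
    }

    /** Returns P(label | features) for every label, computed with a softmax over summed weights. */
    public double[] predict(List<String> features) {
        double[] scores = new double[labels.size()];
        for (String f : features) {
            double[] w = weights.get(f);
            if (w == null) continue; // a feature never seen in training contributes nothing
            for (int k = 0; k < scores.length; k++) scores[k] += w[k];
        }
        double max = scores[0];
        for (double s : scores) max = Math.max(max, s);
        double sum = 0.0;
        for (int k = 0; k < scores.length; k++) {
            scores[k] = Math.exp(scores[k] - max); // subtract max for numerical stability
            sum += scores[k];
        }
        for (int k = 0; k < scores.length; k++) scores[k] /= sum;
        return scores;
    }

    /** Returns the most probable label. */
    public String classify(List<String> features) {
        double[] p = predict(features);
        int best = 0;
        for (int k = 1; k < p.length; k++) if (p[k] > p[best]) best = k;
        return labels.get(best);
    }

    /** Stochastic gradient ascent: each active feature moves by (observed - expected). */
    public void train(List<List<String>> samples, List<String> gold, int epochs, double rate) {
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < samples.size(); i++) {
                double[] p = predict(samples.get(i));
                int y = labels.indexOf(gold.get(i));
                for (String f : samples.get(i)) {
                    double[] w = weights.computeIfAbsent(f, key -> new double[labels.size()]);
                    for (int k = 0; k < w.length; k++) {
                        w[k] += rate * ((k == y ? 1.0 : 0.0) - p[k]);
                    }
                }
            }
        }
    }

    /** Trains a model on four made-up samples (illustrative data only). */
    public static TinyMaxEnt demoModel() {
        TinyMaxEnt model = new TinyMaxEnt(Arrays.asList("entity", "other"));
        List<List<String>> samples = Arrays.asList(
                Arrays.asList("pos=NN", "prev=the"),
                Arrays.asList("pos=NN", "prev=a"),
                Arrays.asList("pos=VB", "prev=to"),
                Arrays.asList("pos=IN", "prev=user"));
        List<String> gold = Arrays.asList("entity", "entity", "other", "other");
        model.train(samples, gold, 200, 0.5);
        return model;
    }

    public static void main(String[] args) {
        TinyMaxEnt model = demoModel();
        List<String> word = Arrays.asList("pos=NN", "prev=the");
        System.out.println(model.classify(word) + " " + Arrays.toString(model.predict(word)));
    }
}
```

A real pipeline would derive such features from the linguistic annotations of each word and would use a library trainer rather than this hand-written loop.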

DEMO of the tool

Prerequisites and Notes

  • This demo works only on Linux. However, the framework itself is a pure Java application composed of multiple OSGi bundles.
    • Tested on Linux Mint 14 Nadia, 64-bit.
  • The prediction framework is a console-based application. Do not expect any GUI!
  • Java 1.7+ must be installed (check with java -version).
  • The ZIP contains some statistical models for natural language processing
    • edu.stanford.nlp.models_1.3.4.jar - models from Stanford CoreNLP
    • reprotool.dmodel.trained_1.0.0.jar - our MaxEnt models for domain model elicitation. It also includes a classification model for the sentence splitter annotator.
  • Currently, the framework can use only the bundled models. We plan to lift this limitation in the next release.

Instructions for running the DEMO on Linux

  1. Download the archive Prediction Framework demo (ZIP 190M)
  2. Open a terminal in the directory where you downloaded the ZIP file
  3. Extract the ZIP, e.g. unzip
  4. Go into the extracted directory: cd prediction-demo
  5. Here, you can see two directories:
    • prediction-example - contains all the demo examples
    • prediction-framework - contains the OSGi (Eclipse) based product
  6. Now, you can choose which demo to run:

preprocessing phase

  1. cd prediction-example/preprocessing
  2. ./ - runs the preprocessing phase. It loads either the annotated version of the HTML document or the same document without annotations. The preprocessing phase produces two XMI files.

elicitation phase

  1. cd prediction-example/elicitation
  2. ./ - loads the preprocessed XMI file containing a document without annotations. It then tries to predict the domain model from the document using the MaxEnt models bundled in the JAR file prediction-framework/plugins/reprotool.dmodel.trained_1.0.0.jar

Note: At the moment, unfortunately, it is not possible to replace the bundled trained models in an elegant way, because they are loaded by the classloader.
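
To show why classloader-based loading makes a model hard to swap, here is a minimal sketch of the general Java mechanism. It is not the framework's actual loading code. The class name BundledModelLoader and the resource path models/example-maxent.bin are hypothetical. A resource opened this way is resolved against the classpath of the JAR (in OSGi, the bundle) that contains the calling class, so substituting a different model generally means repackaging that JAR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BundledModelLoader {

    /** Opens a classpath resource; returns null when the resource is not on the classpath. */
    public static InputStream open(String resourcePath) {
        return BundledModelLoader.class.getClassLoader().getResourceAsStream(resourcePath);
    }

    /** Reports whether the resource is visible to this class's classloader. */
    public static boolean isBundled(String resourcePath) {
        // try-with-resources tolerates a null resource: close() is simply not called
        try (InputStream in = open(resourcePath)) {
            return in != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String path = "models/example-maxent.bin"; // hypothetical resource name
        System.out.println(path + " bundled: " + isBundled(path));
    }
}
```

A loader that accepted a file system path as an alternative to the classpath lookup would be one way to make the models replaceable, but the source does not say which approach is planned.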

training phase

  1. cd prediction-example/training
  2. ./ - loads the preprocessed XMI file containing the linguistically processed document, the domain model and the links between them. Then, using the given configuration file, it trains a number of MaxEnt models.

feature selection phase

  1. cd prediction-example/selection
  2. ./ - evaluates the prediction performance of various classification models based on the given configuration file. The results are stored in CSV files.
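
The source does not state which performance measures end up in the CSV files. As an assumption, the sketch below computes precision, recall and F1 for one target label, which is a common way to score a classifier, and formats them as a CSV row. The class name PredictionScores, the model name model-A, the labels and the column order are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PredictionScores {

    /** Returns {precision, recall, f1} for one target label. */
    public static double[] score(List<String> gold, List<String> predicted, String target) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("gold and predicted must have the same length");
        }
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < gold.size(); i++) {
            boolean g = gold.get(i).equals(target);
            boolean p = predicted.get(i).equals(target);
            if (g && p) tp++;        // correctly predicted target
            else if (p) fp++;        // predicted target, gold says otherwise
            else if (g) fn++;        // missed a gold target
        }
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double f1 = precision + recall == 0.0 ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    /** Formats one result line: model,precision,recall,f1 (assumed column order). */
    public static String toCsvRow(String model, double[] s) {
        return String.format(Locale.ROOT, "%s,%.3f,%.3f,%.3f", model, s[0], s[1], s[2]);
    }

    public static void main(String[] args) {
        // Made-up gold labels and predictions, for illustration only
        List<String> gold = Arrays.asList("entity", "entity", "other", "other", "entity");
        List<String> predicted = Arrays.asList("entity", "other", "other", "entity", "entity");
        System.out.println("model,precision,recall,f1");
        System.out.println(toCsvRow("model-A", score(gold, predicted, "entity")));
    }
}
```

Comparing such rows across configurations is what makes it possible to judge which feature sets are worth keeping.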

Theoretical Background

Technical Report

Simko V., Kroha P., Hnetynka P.: Implemented Domain Model Generation, Tech. Report No. 2013/3, D3S, Charles University in Prague, April 2013 PDF

Conference Paper