
# Machine Learning / Data Mining (mostly in R)


## Useful resources and tutorials

- A nice introduction to conditional random fields (CRFs)
- Numerically stable algorithms for calculating variance
- Black-box confidence intervals
- 10 types of regressions: which one to use?
- How to choose an analytic tool: http://www.datasciencecentral.com/forum/topics/how-to-choose-an-analytic-tool
- Topic modeling with LDA: MLlib meets GraphX
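To illustrate the variance item, here is a minimal R sketch of Welford's single-pass algorithm, which is one of the numerically stable methods. The function name `welford_var` is hypothetical. The method avoids the naive `sum(x^2) - n * mean(x)^2` formula, which suffers catastrophic cancellation when the mean is large relative to the spread of the data.

```r
# Welford's online algorithm: one pass over x that updates the running mean (mu)
# and the sum of squared deviations from the mean (m2).
welford_var <- function(x) {
  n <- 0
  mu <- 0
  m2 <- 0
  for (value in x) {
    n <- n + 1
    delta <- value - mu              # deviation from the old mean
    mu <- mu + delta / n             # update the running mean
    m2 <- m2 + delta * (value - mu)  # uses both the old and the new mean
  }
  if (n < 2) return(NA_real_)
  m2 / (n - 1)  # sample variance, same (n - 1) convention as stats::var()
}

# A large offset leaves the variance unchanged but hurts sum(x^2) - n * mean^2.
x <- c(4, 7, 13, 16) + 1e9
welford_var(x)  # 30
```

The single-pass form is mainly useful when data arrives as a stream and cannot be revisited. For data that is already in memory, a two-pass computation (mean first, then squared deviations) is also stable.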

## Useful resources related to R

### Natural Language Processing

- LSA in R (Latent Semantic Analysis)
- The `topicmodels` package, containing LDA (Latent Dirichlet Allocation)
  - A paper describing the `topicmodels` package
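LSA amounts to a truncated singular value decomposition of a term-document matrix. The following is a minimal base-R sketch using `svd()`. It is not the API of any LSA package, and the names `lsa_space` and `cosine` as well as the toy matrix are hypothetical. Documents are compared by cosine similarity in the reduced k-dimensional space.

```r
# Toy term-document matrix: rows are terms, columns are documents (counts).
tdm <- matrix(
  c(2, 1, 0, 0,   # "data"
    1, 2, 0, 0,   # "model"
    0, 0, 1, 2,   # "wavelet"
    0, 0, 2, 1),  # "signal"
  nrow = 4, byrow = TRUE,
  dimnames = list(c("data", "model", "wavelet", "signal"), paste0("doc", 1:4))
)

# Truncated SVD: keep the k largest singular values and their vectors.
lsa_space <- function(m, k) {
  s <- svd(m)
  d_k <- diag(s$d[1:k], k)
  list(
    terms = s$u[, 1:k, drop = FALSE] %*% d_k,  # term coordinates
    docs  = s$v[, 1:k, drop = FALSE] %*% d_k   # document coordinates
  )
}

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

space <- lsa_space(tdm, k = 2)
docs <- space$docs
rownames(docs) <- colnames(tdm)
cosine(docs["doc1", ], docs["doc2", ])  # close to 1: same "topic"
cosine(docs["doc1", ], docs["doc3", ])  # close to 0: different "topic"
```

On real corpora, a weighting such as tf-idf is usually applied to the matrix before the SVD.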

### Visualization in R

### Optimization and fitting

- Multi-criteria optimization using genetic algorithms
  - the `nsga2()` function implements the NSGA-II algorithm, which minimizes a multidimensional function to approximate its Pareto front.
- Curve, surface and function fitting with an emphasis on splines, spatial data and spatial statistics
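NSGA-II ranks candidate solutions by Pareto dominance. The following is a minimal base-R sketch of the dominance test and of extracting the non-dominated set (the first front) from a matrix of objective values, assuming all objectives are minimized. It is not the `nsga2()` implementation, and the helper names `dominates` and `pareto_front` are hypothetical.

```r
# TRUE if objective vector a Pareto-dominates b (minimisation):
# a is no worse in every objective and strictly better in at least one.
dominates <- function(a, b) all(a <= b) && any(a < b)

# Indices of the non-dominated rows of an n x m matrix of objective values,
# i.e. the first front produced by non-dominated sorting.
pareto_front <- function(obj) {
  n <- nrow(obj)
  keep <- vapply(seq_len(n), function(i) {
    dominated <- vapply(seq_len(n), function(j) {
      j != i && dominates(obj[j, ], obj[i, ])
    }, logical(1))
    !any(dominated)
  }, logical(1))
  which(keep)
}

# Two conflicting objectives, f1(x) = x^2 and f2(x) = (x - 2)^2, on a grid.
x <- seq(-1, 3, by = 0.5)
obj <- cbind(f1 = x^2, f2 = (x - 2)^2)
x[pareto_front(obj)]  # 0.0 0.5 1.0 1.5 2.0
```

This brute-force check is O(n^2) in the number of candidates. NSGA-II additionally sorts the remaining solutions into further fronts and uses crowding distance to keep the front well spread.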
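As a quick illustration of spline fitting, the following sketch uses base R's `stats::smooth.spline()` rather than the package mentioned above. The noisy sine data is made up for the example. With the default `cv = FALSE`, the smoothing parameter is chosen by generalized cross-validation.

```r
set.seed(1)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(length(x), sd = 0.2)   # synthetic noisy observations

fit <- smooth.spline(x, y)                 # smoothing chosen by GCV
pred <- predict(fit, x = c(pi / 2, pi))    # evaluate the fitted spline
pred$y                                     # roughly 1 and 0
```

`predict(fit, x, deriv = 1)` evaluates the first derivative of the fitted spline instead.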

### Time-series analysis

- Handling of regular and irregular time series
- Extends the older `zoo` package, allowing duplicate indexes
- Continuous wavelet transform, wavelet coherence, wavelet clustering
  - plotting the spectrum of a signal
  - feature extraction from time series
  - coherence analysis of two signals

- Discrete wavelet transform
  - noise reduction
  - signal compression
  - multiscale decomposition
  - plotting the wavelet filters

- The `wavethresh` package seems to be better than the `wavelets` package.
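To show what a discrete wavelet transform does, here is a minimal base-R sketch of the orthonormal Haar DWT, applied to multiscale decomposition and to noise reduction by soft-thresholding the detail coefficients. It is not the API of `wavelets` or `wavethresh`. The function names, the test signal, and the threshold choice (the "universal" threshold sigma * sqrt(2 log n) with a known noise sigma) are assumptions made for illustration.

```r
# One level of the orthonormal Haar DWT (input length must be even):
# a = scaled pairwise sums (approximation), d = scaled pairwise differences (detail).
haar_dwt <- function(x) {
  stopifnot(length(x) %% 2 == 0)
  odd  <- x[seq(1, length(x), by = 2)]
  even <- x[seq(2, length(x), by = 2)]
  list(a = (odd + even) / sqrt(2), d = (odd - even) / sqrt(2))
}

# Inverse of haar_dwt(): reconstructs the signal exactly.
haar_idwt <- function(w) {
  x <- numeric(2 * length(w$a))
  x[seq(1, length(x), by = 2)] <- (w$a + w$d) / sqrt(2)
  x[seq(2, length(x), by = 2)] <- (w$a - w$d) / sqrt(2)
  x
}

# Multilevel decomposition, soft-thresholding of every detail level,
# then reconstruction. length(x) must be divisible by 2^levels.
haar_denoise <- function(x, levels, threshold) {
  stopifnot(length(x) %% 2^levels == 0)
  details <- vector("list", levels)
  a <- x
  for (j in seq_len(levels)) {
    w <- haar_dwt(a)
    details[[j]] <- sign(w$d) * pmax(abs(w$d) - threshold, 0)  # soft threshold
    a <- w$a
  }
  for (j in rev(seq_len(levels))) {
    a <- haar_idwt(list(a = a, d = details[[j]]))
  }
  a
}

# Piecewise-constant signal plus Gaussian noise (sd = 0.5).
set.seed(42)
clean <- rep(c(0, 4), each = 32)
noisy <- clean + rnorm(64, sd = 0.5)
den <- haar_denoise(noisy, levels = 4, threshold = 0.5 * sqrt(2 * log(64)))
c(noisy = mean((noisy - clean)^2), denoised = mean((den - clean)^2))
```

With `threshold = 0`, the function returns the input unchanged, which is a handy sanity check. Smoother wavelets such as the Daubechies family generally suit smooth signals better than Haar.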

### R from command line (CLI)

- `Rscript` command (e.g. located in `/usr/bin/Rscript`). The first line of an executable R script may look like this:

  `#!/usr/bin/Rscript --default-packages=utils,argparser`

- Command-line argument parser
  - cross-platform, written purely in R with no external dependencies
  - very simple to use
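For very small scripts, base R's `commandArgs(trailingOnly = TRUE)` is enough to read positional arguments without an argument-parser package. The following is a minimal sketch under that assumption. The script name `scale.R` and the function `scale_values` are hypothetical, and unlike a dedicated parser it generates no `--help` text.

```r
#!/usr/bin/Rscript
# Hypothetical script scale.R: multiplies numbers by a factor, e.g.
#   ./scale.R 3 1 2 5

# Parse positional arguments: the first is the factor, the rest are values.
scale_values <- function(args) {
  if (length(args) < 2) {
    stop("usage: scale.R FACTOR NUMBER...", call. = FALSE)
  }
  nums <- suppressWarnings(as.numeric(args))  # non-numeric strings become NA
  if (anyNA(nums)) stop("all arguments must be numeric", call. = FALSE)
  nums[1] * nums[-1]
}

# Run only when invoked as a script with arguments (not when sourced).
args <- commandArgs(trailingOnly = TRUE)
if (!interactive() && length(args) > 0) {
  cat(scale_values(args), sep = "\n")
}
```

After `chmod +x scale.R`, running `./scale.R 3 1 2 5` should print 3, 6 and 15 on separate lines. Keeping the logic in a function that takes a character vector makes it easy to test without a shell.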