Interested in all kinds of statistical methods in NLP: sentiment analysis, document clustering & classification, regression analysis, suffix arrays, etc.
Our suffix-array-based tool for extracting 'discontinuous repeats' from large text corpora:
https://code.google.com/p/saphre/