I did the following project in 2012. I used Project Gutenberg to obtain a large text corpus and then built a suffix array over it. Using the suffix array, I wrote an algorithm that determined, for each word, which other word could most easily replace it, in the sense that the replacement occurred many times in the corpus within the same surrounding contexts. The suffix array was useful because the count operation used to gather these statistics was fast.
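
To make the idea concrete, here is a minimal sketch in Python, not the original 2012 code. It assumes a word-level suffix array and a context of one word on each side; the names `build_suffix_array`, `count`, and `best_replacement` are made up for illustration, and the construction is the naive one, which is fine for a toy corpus but far too slow for a real Gutenberg-sized one.

```python
from collections import Counter

def build_suffix_array(tokens):
    # Naive construction (assumption: small input): sort suffix start
    # positions by the token sequence that begins at each position.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def _bound(tokens, sa, phrase, upper):
    # Binary search for the first suffix whose length-k prefix is
    # >= phrase (upper=False) or > phrase (upper=True).
    k = len(phrase)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = tokens[sa[mid]:sa[mid] + k]
        if prefix < phrase or (upper and prefix == phrase):
            lo = mid + 1
        else:
            hi = mid
    return lo

def count(tokens, sa, phrase):
    # Occurrences of phrase = width of the matching block of suffixes.
    return _bound(tokens, sa, phrase, True) - _bound(tokens, sa, phrase, False)

def best_replacement(tokens, sa, word, vocab):
    # For every occurrence of `word`, count how often each candidate
    # appears between the same left and right neighbours.
    scores = Counter()
    for i in range(1, len(tokens) - 1):
        if tokens[i] != word:
            continue
        left, right = tokens[i - 1], tokens[i + 1]
        for cand in vocab:
            if cand == word:
                continue
            c = count(tokens, sa, [left, cand, right])
            if c:
                scores[cand] += c
    return scores.most_common(1)[0][0] if scores else None
```

Each `count` call costs two binary searches over the suffix array, so it stays cheap even when it is issued once per candidate per occurrence, which is what makes gathering the statistics practical.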

Updated: