An improved spelling corrector in Haskell

My previous post came up on the Haskell Reddit yesterday, and I got some great suggestions for improvements to the program, which I've checked into GitHub.

  • Map.fromListWith replaces List.foldl' and Map.insertWith'. fromListWith appears to be specifically designed for aggregating values that share a common key (in .NET terminology it's the GroupBy function).

  • I'll commonly use a construct like readFile "big.txt" >>= return . train . lowerWords, to bind a monad value readFile "big.txt" to a function (train . lowerWords), then produce a new monad value with return. A shorter way to write this is fmap (train . lowerWords) (readFile "big.txt"), or, with an import from Control.Applicative, (train . lowerWords <$> readFile "big.txt").

  • You can replace the lambda syntax (\w -> Map.lookup w nwords) with an operator section, ('Map.lookup' nwords). You might see this more commonly as (+1) in place of (\x -> x + 1); it's the same thing. Edit: the challenging Markdown syntax means that you'll have to imagine the single quotes around Map.lookup replaced with backticks, `.

I should have realised this at the time, but it didn't occur to me that you can use full pattern-matching syntax in the right-hand side of a Haskell list comprehension. If the match fails on some element then that element is filtered out. We can use this to improve on the Python version of the edits1 function: whereas the Python version combines its b[0] subscripts with an if b check, the Haskell version can use pattern matching to do both. We can also use the inits and tails functions to build substrings instead of brute-force applications of take and drop.

# Python
def edits1(word):
   s = [(word[:i], word[i:]) for i in range(len(word) + 1)]
   deletes    = [a + b[1:] for a, b in s if b]
   transposes = [a + b[1] + b[0] + b[2:] for a, b in s if len(b)>1]
   replaces   = [a + c + b[1:] for a, b in s for c in alphabet if b]
   inserts    = [a + c + b     for a, b in s for c in alphabet]
   return set(deletes + transposes + replaces + inserts)

> -- Haskell
> edits1 word = let s = zip (inits word) (tails word)
>                   deletes    = [ a ++ y     | (a, _:y  ) <- s ]
>                   transposes = [ a ++ y:x:z | (a, x:y:z) <- s ]
>                   replaces   = [ a ++ c:y   | (a, _:y  ) <- s, c <- alphabet ]
>                   inserts    = [ a ++ c:x   | (a, x    ) <- s, c <- alphabet ]
>                in Set.fromList $ concat [ deletes, transposes, replaces, inserts ]

The edits1 function is the densest one in both the Python and Haskell versions. It's the one where the intent of the code was least clear to me when I originally saw it, probably because of the extra clutter in the list comprehensions. In the latest Haskell revision it's more obvious what's going on.

I've consistently used the names x, y and z for the first three parts of the word's suffix. Because they consistently stand for the same thing, any inconsistencies jump out at you when you read the code:

  • a ++ y stands out because x is missing; it's been deleted. (x isn't even mentioned in the pattern match.)
  • a ++ y:x:z stands out because x and y are the wrong way round; we're transposing them
  • a ++ c:y stands out because we've got c instead of x; x has been replaced. (Again, x has dropped out of the pattern match.)
  • a ++ c:x has c in front of x; it's been inserted

After I'd written the first Haskell version I was skeptical. The Python version uses some typical Python idioms, like if b for checking for empty strings, using the same syntax for accessing lists and sets, and overloading the or operator to look for the first non-empty set, and Haskell doesn't have these same shortcuts. However the latest set of revisions makes the the core algorithm in the Haskell version more readable than the Python equivalent.

1 TrackBack

Update: Thanks to the commenters on this blog and on Reddit, I've got a much improved and more readable Haskell port of the spelling corrector. On Wednesday I attended the StackOverflow DevDay in London, which was a day of excellent... Read More

Leave a comment

Close