Finding the Levenshtein Distance in Scala
In the spirit of esoteric code snippets like this one, I thought I’d put my two bob in
I have written the Levenshtein Distance algorithm using Scala below. The Levensthein Distance algorithm is a Dynamic Programming Algorithm (DPA). This implementation is a little different to the Python one (which creates the arrays explicitly and fills them using loops):
- The code more closely represents the mathematical definition of the algorithm
- The code is easier to reason about because the destructive updates occur behind the scenes (in the memoisation library)
- The code below requires a third party library (Scalaz) note that Scalaz comes with demo code including the levenshtein distance and other DPAs
- The code has a better complexity than the typical loopy version by using lazy evaluation (notice that ‘c’ is not always evaluated)
- While the code memoises with an array, you could use say, a map and save some space as well
- The code builds the call stack as it traverses the matrix (the loopy one does not)
import scalaz.memo.Memo import scalaz.memo.SizedMemo.arraySizedMemo object Levenshtein { def levenshtein[A](x: Array[A], y: Array[A]): Int = { val im = arraySizedMemo val m = im[Memo[Int, Int]](x.length + 1) // the call matrix def mx(i: Int, j: Int): Int = if(i == 0) j else if(j == 0) i else { def f = (n: Int) => im[Int](y.length + 1) val a = m(f)(i - 1)(mx(i - 1, _))(j) + 1 val b = m(f)(i - 1)(mx(i - 1, _))(j - 1) + (if(x(i - 1) == y(j - 1)) 0 else 1) lazy val c = m(f)(i)(mx(i, _))(j - 1) + 1 if(a < b) a else if(b <= c) b else c } mx(x.length, y.length) } def main(args: Array[String]) = println(levenshtein(args(0).toCharArray, args(1).toCharArray)) }
To run this code:
$ wget http://projects.workingmouse.com/public/scalaz/artifacts/2.4/scalaz.jar # download Scalaz 2.4 $ scalac -classpath scalaz.jar Levenshtein.scala # compile $ scala -classpath .:scalaz.jar Levenshtein algorithm altruistic # find the distance! 6
May 8th, 2008 at 10:32 am
Hi,
I wrote a spellchecker years ago : I used lots of differents algos (including double-metaphone and algo finding ‘probable typos’) to find ‘candidates’, then I was using a Levenshtein edit distance to sort the candidates from ‘most relevant’ to ‘less relevant’. Anyway…
I’ve got questions : first, when working with large strings, using O(mn) memory, Chas Emerick noticed that he could not compare huge strings without getting out-of-memory errors, which lead him to write a version more efficient memory-wise (maybe he’s doing DNA string matching ? No idea…). Anyway here is his algo:
http://www.merriampark.com/ldjava.htm
Does your Scala version here use O(mn) memory ? If so, can it easily be changed to use less memory ?
You write: the code builds the call stack as it traverses the matrix (the loopy one does not). So your version is actually a recursive version right? If so then I guess it’s unsuitable for big strings (even if the O(mn) memory problem, should it be in your version, could be fixed)!?
Then you write :
The code has a better complexity than the typical loopy version by using lazy evaluation (notice that ‘c’ is not always evaluated)
I’m not familiar with Scala and I’m curious about it. Could the following :
lazy val c = m(f)(i)(mx(i, _))(j - 1) + 1
if(a
be rewritten with :
if(a
and… Would that change anything (Scala-wise) ?
In my view, you used what is called the ‘Top-down’ approach to implement the DP algo: memoization and recursion combined. While the more memory-efficient and more stack-efficient approach by Chas Emerick is the ‘Bottom-up’ approach (’Top-down’ and ‘Bottom-up’ are defined on Wikipedia, under the Dynamic programming entry).
I must admit I’m surprised by the definition on Wikipedia : to me the ‘top-down’ approach weren’t DP, but simply memoization + recursion.
Anectodically, I find the bottom-up DP version of this algo easier to read then the recursive+memoization version (I mean, even if you were to write a recursive+memoization version of this DP algo in Java
Lastly, after 0-1 Knapsak and edit distance, do you plan to have fun with all the DP algos ? (Floyd-Warshall being next !?
May 8th, 2008 at 10:34 am
erf… Post got fuxxored.
I meant could the two lines be rewritten to this :
if(a
May 8th, 2008 at 10:36 am
Gross… Preview is fin and posts are fuxxored : third try. That blog/post response is bugged… Preview is fine and it screws the posts once you click on ’submit comment’. What you see is NOT what you get
Would this work :
if(a &lt b) a else if(b &lt= c) b else m(f)(i)(mx(i, _))(j - 1) + 1