Archive for April, 2008

QOTD (I found it amusing anyway)

Thursday, April 24th, 2008

How funny :) Laziness is unpredictable!

I lurk on the Haskell Cafe mailing list and eager evaluation is often the solution to reliability or performance problems: laziness is unpredictable.

Finding the Levenshtein Distance in Scala

Thursday, April 24th, 2008

In the spirit of esoteric code snippets like this one, I thought I’d put my two bob in :)

I have written the Levenshtein Distance algorithm using Scala below. The Levensthein Distance algorithm is a Dynamic Programming Algorithm (DPA). This implementation is a little different to the Python one (which creates the arrays explicitly and fills them using loops):

  • The code more closely represents the mathematical definition of the algorithm
  • The code is easier to reason about because the destructive updates occur behind the scenes (in the memoisation library)
  • The code below requires a third party library (Scalaz) note that Scalaz comes with demo code including the levenshtein distance and other DPAs
  • The code has a better complexity than the typical loopy version by using lazy evaluation (notice that ‘c’ is not always evaluated)
  • While the code memoises with an array, you could use say, a map and save some space as well
  • The code builds the call stack as it traverses the matrix (the loopy one does not)
import scalaz.memo.Memo
import scalaz.memo.SizedMemo.arraySizedMemo
 
object Levenshtein {
  def levenshtein[A](x: Array[A], y: Array[A]): Int = {
    val im = arraySizedMemo
    val m = im[Memo[Int, Int]](x.length + 1)
    // the call matrix
    def mx(i: Int, j: Int): Int = if(i == 0) j else if(j == 0) i else {
      def f = (n: Int) => im[Int](y.length + 1)
      val a = m(f)(i - 1)(mx(i - 1, _))(j) + 1
      val b = m(f)(i - 1)(mx(i - 1, _))(j - 1) + (if(x(i - 1) == y(j - 1)) 0 else 1)
      lazy val c = m(f)(i)(mx(i, _))(j - 1) + 1
      if(a < b) a else if(b <= c) b else c
    }
    mx(x.length, y.length)
  }
 
  def main(args: Array[String]) =
    println(levenshtein(args(0).toCharArray, args(1).toCharArray))
}

To run this code:

$ wget http://projects.workingmouse.com/public/scalaz/artifacts/2.4/scalaz.jar # download Scalaz 2.4
$ scalac -classpath scalaz.jar Levenshtein.scala # compile
$ scala -classpath .:scalaz.jar Levenshtein algorithm altruistic # find the distance!
6

Another Call to Genocide

Saturday, April 12th, 2008


Rabbi Yisrael Rosen, director of the Tsomet Institute says:

All of the Palestinians must be killed; men, women, infants, and even their beasts.

Oh really? I didn’t know that! Thanks for letting me in on it.

Rosen asserted that there is evidence in the Torah to justify this stand.

Uh oh, an hallucinabating Rabbi.

Rosen, an authority able to issue religious opinions for Jews…

You mean, people who do not possess the ability to think independently? I had a few of those commenting on my blog recently!

… wrote that Palestinians are like the nation of Amalekites that attacked the Israelite tribes on their way to Jerusalem after they had fled from Egypt under the leadership of Moses.

Did you learn that at a nativity scene when you were being indoctrinated as a child?

He wrote that the Lord sent down in the Torah a ruling that allowed the Jews to kill the Amalekites,…

Oh the Lord did that? That’s OK then.

The true outrage is that most of those authorised to issue Jewish religious opinions support the view of Rabbi Rosen, as confirmed by Haaretz newspaper.

Well it has to be right then! I bet the aforementioned commenters agree too, so it’s super spectacularly right! Oh wait, the truth value of the proposition was never open for analysis. Sorry Authority.

The danger of these religious opinions lies in the fact that the religious authorities issuing them have wide respect among religious Jewish youth.

You mean, your children subscribe to views of self-appointed authority without question? Hey, we have children that do that too! Does your indoctrination scheme masquerade as an education system like ours?

Wasil Taha, Arab Knesset member from the Tajammu Party led by Azmi Bishara, says that these religious opinions lead to the committal of crimes… Taha holds that the sectors of the Palestinian population most likely to be harmed by these religious opinions are those living in the various cities populated by both Jews and Palestinians, such as Haifa, Jaffa, Lod, Ramleh and Jerusalem.

I feel sick, but not as much as the hallucstibating Rabbi appears to be.

Burning question: Do Atheists have cause to be angry, despite not bearing arms to express this anger (due to a more grounded sense of morality?)?

Automated Unit Testing your Java using ScalaCheck

Sunday, April 6th, 2008

What is automated unit testing?

I’m assuming you’re familiar with traditional mainstream unit testing techniques such as that purported by JUnit, NUnit and so on. Automated unit testing automates the generation of elements to apply a function to, in light of a stated algebraic property of that function. This is in contrast to traditional unit testing, where a proposition is made about the application of a function to specific elements of the domain.

Take, for example, the java.lang.String.endsWith function. Using JUnit, we might write:

  assertTrue("abc".endsWith(""));
  assertTrue("abc".endsWith("bc"));
  assertFalse("abc".endsWith("ab"));
  // ...and so on

Using automated testing, we might instead write something like this English statement:

For any two Strings (x and y), then (x + y).endsWith(y) is true.

Here is that same statement for formally:

∀ x. ∀y. (x + y) endsWith y

Note that we don’t have to declare x or y to be elements of the set java.lang.String, since the type system proves this fact. Note also that we assume correctness of + as a premise of the proposition, while the proposition itself is open (that is, this is the statement we are trying to disprove).

After putting forward the above open statement, the test software would automatically attempt to falsify (note that this is different to “prove”) this proposition by generating arbitrary values. As a user, you may wish to skew the distribution of these values, adjust the amount of attempts at falsification and various other configuration values. Many have reasonable defaults, such as when generating integer values, include 0, 1, -1, max, min, max-1 and min-1 and typically make 100 to 500 attempts at falsification by default (in critical code, for example, you may turn this up).

Credit where it is due

This concept of automated testing did not originate with ScalaCheck. In fact, I have recently noticed the mainstream fraternity have started replicating watered down versions of automated testing (note that it is extremely cumbersome to achieve effectively in mainstream languages). Indeed, advanced programming concepts often take a lot of time to bubble down from research/academia to mainstream in a typically diluted form. This concept originated from a paper titled QuickCheck: Automated Specification-Based Unit Testing. Note that JUnit et. al. is not automated, but (very) manual testing.

On with the story

The original language, Haskell, has language features called type-classes and higher-order functions, as does Scala. Both of these are required language features to use automated testing effectively, though it is possible to achieve in a lesser form in less useful languages. Note that Java does not have either of these language features. Are we hosed? No!

Take a look at this source file, which states our property about endsWith using Scala (see endsWithProperty):

// CheckString.scala
 
import org.scalacheck.Prop._
import org.scalacheck.ConsoleReporter.testStatsEx
import org.scalacheck.Test.check
 
object CheckString {
  // Here is the property
 
  val endsWithProperty = property((x: String, y: String) =>
    ((x + y) endsWith y)
  )
 
  // A List of properties - we only have one at the moment.
  val tests = scala.List(
      ("endsWithProperty", endsWithProperty)
  )
 
  // Run all our properties with the default configuration.
  // Are our properties false?
  def main(args: scala.Array[String]) =
    tests foreach { case (name, p) => testStatsEx(name, check(p)) }
}

You can run this source file by downloading ScalaCheck-1.2.jar and compiling it with the Scala compiler.

$ scalac -classpath ScalaCheck-1.2.jar CheckString.scala

Does our stated property falsify?

$ scala -classpath .:ScalaCheck-1.2.jar CheckString
+ OK, passed 100 tests.

No! After 100 unit tests, the stated property cannot be shown to be false. We might assume its correctness now. Note that this is java.lang.String that we are testing here running on a Java Virtual Machine, not some other esoteric String — yes, the one you use in your Java enterprise applications ;) You can apply this technique to any Java type.

How about this proposition:

For any two Strings (x and y), then x.reverse.startsWith(y.reverse)) is equivalent (in truth) to x.endsWith(y).

Again, startsWith is assumed correct as a premise of the proposition (or theorem). Can the statement be falsified by the test automation?

// CheckString.scala
 
import org.scalacheck.Prop._
import org.scalacheck.ConsoleReporter.testStatsEx
import org.scalacheck.Test.check
 
object CheckString {
  val endsWithProperty = property((x: String, y: String) =>
    ((x + y) endsWith y)
  )
 
  // Can you falsify this?
  val endsWithProperty2 = property((x: String, y: String) =>
    x.endsWith(y) == x.reverse.startsWith(y.reverse))
 
  val tests = scala.List(
      ("endsWithProperty", endsWithProperty),
      ("endsWithProperty2", endsWithProperty2)
  )
 
  def main(args: scala.Array[String]) =
    tests foreach { case (name, p) => testStatsEx(name, check(p)) }
}

Well, can it? Oh come on, please tell me! :)

$ scala -classpath .:ScalaCheck-1.2.jar CheckString
+ OK, passed 100 tests.
+ OK, passed 100 tests.

No. We have now successfully executed 200 unit tests so we may be prepared to declare satisfaction with the correctness of the propositions under analysis. Note again, that we have not proven these properties to be true (we did prove others to be true using the type system). In fact, doing so for the general program is equivalent to solving the halting problem, so good luck if you’re looking for something more rigorous ;)

You may have observed that I called a method on String that does not exist in the Java API — reverse. In Scala, you can add methods to objects in a type-safe manner. The reverse method is added by default in the Scala runtime. In Java, you might instead have written reverse as a static method of a StringUtils class or something. This fact does not negate the fact that we are still using the same ol’ java.lang.String that Java Jim is using next door.

What were the String values?

The test automation supplied 100 String values for each of the two properties in an attempt to falsify; where did they come from? ScalaCheck includes an abstract type called Arbitrary and an implementation for the String type is supplied by ScalaCheck. It is this that is used to generate String values. If we wish to generate values for our own types, we must write our own Arbitrary implementation for it.

I will state some properties now about java.util.LinkedList and its addFirst method. However, note that unlike java.lang.String, ScalaCheck does not include an arbitrary generator for linked lists, so we will have to write one. Luckily, generators are instances of what is known as a functor (also, a monad), so this task will be quite trivial. Let’s start with doing that.

We can generate arrays, since that ability comes with ScalaCheck, so we can use the map function (i.e. the essence of a functor) to take the generator for arrays to a generator for linked lists. Given an array, we can create a linked list using: new LinkedList[A](Arrays.asList(array)). This is precisely what the code below does using what is called a higher-order function inside the Arbitrary functor. You needn’t concern yourself with these advanced programming concepts in these demonstrations, but just acknowledge their existence and that the given code is trivial and it works.

import org.scalacheck.Arbitrary
import org.scalacheck.Arbitrary.arbitrary
import java.util.{LinkedList, Arrays}
 
object CheckLinkedList {
  implicit def arbitraryLinkedList[A](implicit a: Arbitrary[A]): Arbitrary[LinkedList[A]] =
    Arbitrary(arbitrary[Array[A]].map(array => new LinkedList[A](Arrays.asList(array))))
}

Note the implicit declaration on the method. This is required, however, this language feature is very powerful and only a very lengthy discussion would do it justice, so we will say nothing more than it is required in this case, it is very powerful and Java has no equivalent (you can mildly emulate it with the singleton anti-pattern — another topic).

What properties can we state now about addFirst? How about for any list and any element, adding the element to the list adds 1 to its size? We call this property addsLengthOne below.

// CheckLinkedList.scala
 
import org.scalacheck.Arbitrary
import org.scalacheck.Arbitrary.arbitrary
import java.util.{LinkedList, Arrays}
import org.scalacheck.Prop._
import org.scalacheck.ConsoleReporter.testStatsEx
import org.scalacheck.Test.check
 
object CheckLinkedList {
  implicit def arbitraryLinkedList[A](implicit a: Arbitrary[A]): Arbitrary[LinkedList[A]] =
    Arbitrary(arbitrary[Array[A]].map(array => new LinkedList[A](Arrays.asList(array))))
 
  val addsLengthOne = property((list: LinkedList[String], s: String) =>
    list.size + 1 == { list.addFirst(s); list.size }
  )
 
  val tests = scala.List(
      ("addsLengthOne", addsLengthOne)
  )
 
  def main(args: scala.Array[String]) =
    tests foreach { case (name, p) => testStatsEx(name, check(p)) }
}

I’ve copied and pasted the boilerplate to get the tests running for the demonstration, but please feel encouraged to put this kind of code in its own function to avoid this repetition.

How did we go?

$ scala -classpath .:ScalaCheck-1.2.jar CheckLinkedList
+ OK, passed 100 tests.

Yippee!!

We might state two more properties:

  • For any list and any element (s), list.add(s) then list.get(0) will always yield s.
  • For any list, any element (s) and any integer between 0 and list.length - 1 (n), then calling list.get(n) is equivalent to calling list.addFirst(s).get(n + 1).
$ scala -classpath .:ScalaCheck-1.2.jar CheckLinkedList
+ OK, passed 100 tests.
+ OK, passed 100 tests.
+ OK, passed 100 tests.

I have left off the code for the moment and I will give a complete example toward the end. Note that we have 300 passing unit tests — the code is around 25 lines (excluding the boilerplate to get the tests running).

At this point, we have given an unambiguous specification for addFirst. In other words, it is not possible to write a terminating function that does not satisfy these properties that is not equivalent to addFirst. However, this ignores the unfortunate potential for uncontrolled side-effects, which is an inherent property of Java (and Scala for that matter). We will assume it to be the case — no additional side-effects. It then follows that one additional benefit of automated testing is that we also obtain extremely valuable documentation for the function.

Consider being provided with:

  • The type signature for addFirst (the properties which are proved).
  • The three stated properties for addFirst.

Is there anything more useful to add in terms of documentation? Isn’t this unambiguous, highly readable specification more than you could possibly ask for? This is an interesting question to ponder, particularly in a more pure environment than the one we are discussing here, but I thought I would mention it anyway :)

I will state a property that is false, just to exemplify what happens. The idea is that it is extremely difficult (but not impossible — remember halting problem) to state a false property that ScalaCheck passes. I’m not going to look for a specific corner case that traditional unit testing is unlikely to pick up, but I encourage you to do so as an added exercise.

Instead, I will state an obviously false property; for any list and any element, adding the element with addFirst is equivalent to adding that element with addLast.

Here we go (note the fourth property — thisIsFalse):

// CheckLinkedList.scala
 
import org.scalacheck.Arbitrary
import org.scalacheck.Arbitrary.arbitrary
import java.util.{LinkedList, Arrays}
import org.scalacheck.Prop._
import org.scalacheck.ConsoleReporter.testStatsEx
import org.scalacheck.Test.check
 
object CheckLinkedList {
  implicit def arbitraryLinkedList[A](implicit a: Arbitrary[A]): Arbitrary[LinkedList[A]] =
    Arbitrary(arbitrary[Array[A]].map(array => new LinkedList[A](Arrays.asList(array))))
 
  val addsLengthOne = property((list: LinkedList[String], s: String) =>
    list.size + 1 == { list.addFirst(s); list.size }
  )
 
  val getZero = property((list: LinkedList[String], s: String) =>
    { list.addFirst(s); list.get(0) == s }
  )
 
  val getPlusOne = property((list: LinkedList[String], s: String, n: Int) =>
    (n >= 0 && n < list.size) ==>
    (list.get(n) == { list.addFirst(s); list.get(n + 1) })
  )
 
  val thisIsFalse = property((list: LinkedList[String], s: String) =>
    {
      val listt = list.clone.asInstanceOf[LinkedList[String]]
      list.addFirst(s)
      listt.addLast(s)
      list == listt
    }
  )
 
  val tests = scala.List(
      ("addsLengthOne", addsLengthOne),
      ("getZero", getZero),
      ("getPlusOne", getPlusOne),
      ("you're a big fat liar!", thisIsFalse)
  )
 
  def main(args: scala.Array[String]) =
    tests foreach { case (name, p) => testStatsEx(name, check(p)) }
}

And when we run:

$ scala -classpath .:ScalaCheck-1.2.jar CheckLinkedList
+ OK, passed 100 tests.
+ OK, passed 100 tests.
+ OK, passed 100 tests.
! Falsified after 2 passed tests:
> ARG_0 = "[, ²T, ]"
> ARG_1 = ""
java.lang.Error: you're a big fat liar!: Failed(List(Arg(,[, ²T, ],0), Arg(,,0)))

Our usual three properties pass just fine, but notice that ScalaCheck prints the counter-example to the false property. We can determine that this will never be the empty list since the property holds for the empty list (does it? don’t just take my word for it!). ScalaCheck likely tries the empty list one or two times before it declared to have “Falsified after 2 passed tests” using a non-empty list.

Imagine if ScalaCheck had printed a list of length 20 or 30. If our property was particularly complicated, it would be more difficult to reason about our code and the property to determine why what we stated was false. If the property is also false for a list of length 10, we’d probably want to see that counter-example instead to make debugging easier. Indeed, it would be even easier to have the shortest possible list length that still falsifies what we believed to be correct. This concept is known as shrinking and ScalaCheck certainly provides this ability — occasionally with some assistance from the user for user-defined types — a bit like having to declare your own Arbitrary, you also declare your own shrinking strategy and this is also typically trivial.

In our case, the linked list was not shrunk, since we hadn’t provided a shrinking strategy for linked list (and one is not supplied with ScalaCheck), so the first counter-example was used. In non-demonstration environments, it is encouraged to put this additional effort in to your testing practices by supplying a shrinking strategy for user-defined types (if possible and sensible — it is not always so).

Conclusion

Automated unit testing is an incredibly important tool to any programmer — even when using Java. It serves as rigorous documentation at the very worst and more typically, as a robust unit testing platform (300 unit tests in 25 lines in a demo — it’s usually even better than that! here is over 30,000 unit tests).

Coming up with good properties is an extremely (this adverb cannot be exaggerated) disciplined task and I implore any programmer to strive to do it effectively. Note that stating one property may imply others, so stating those others is potentially redundant. In other words, it is the set of properties that make up the unit for analysis — you cannot necessarily declare the legitimacy of each one independently.

This form of testing is a physical manifestation of an integral part of computer programming — the formation of the logical theorem (see Curry-Howard Isomorphism) that makes up the program. We conclude by noting that this more disciplined, robust and formal approach to programming requires considerably less effort than traditional unit testing techniques, but rigorous mental discipline. It can also be observed that this approach de-emphasises the English description (i.e. function name, javadoc, etc.) as a meaningful method of extrapolating function behaviour.

Yes you are in a cult, no I don’t recognise its legitimacy

Saturday, April 5th, 2008


I was trolled yesterday — I think — I still don’t quite know what “troll” means, since it seems to be a universally applicable verb to describe an action that you dislike. Maybe I wasn’t trolled. I was in the #ocaml programming language channel when a user started displaying an ignorance of the Haskell programming language. I made a small effort to point out that the propositions being put forward were incorrect, but it turns out that “of course I would say that, I am a Haskeller” and so the truth value of the propositions were never explored.

No, I am not a Haskeller. I am not an anti-Java, O’Caml hating, Scala using Haskell worshipper. I completely reject this attempt to assign cult status to the various programming languages. As a matter of fact, I completely reject the existence of the concept of X Programmer for any value of X such that X is an element of the set of all computer programming languages. Correct, Java programmers do not exist. Try telling that to your local recruitment agent.

The corporations are having a field day, while amateurs continue to repeat the many different brand/meme names. “Oh hi, I am a C# Programmer, what religion… err programming language fraternity do you belong to?”. My criticisms of Java do not stem from a personal distaste or apostasy; they stem from the fact that it is indeed, far removed from anything useful, practical or resembling soundness in the practice of computer programming.

Can you imagine two motor mechanics discussing their day jobs where one says, “I am an NGK spark plug installer”, to which the other replies, “oh, I am a Champion spark plug installer, but I use NGK on my own vehicle”? It is this degree of absurdity that I assign to any notion of Java Programmer, Haskeller or what-have-you. It is simply and extremely ridiculous. No really, it is. I won’t pretend otherwise despite any scoffing.

Take for example, the distinction between Java and C#, which is almost zero, in the greater scheme of computer programming language theory. What benefit does distinguishing a “C# Programmer” from a “Java Programmer” give? It serves only to resemble status and cult membership. They are both utterly useless as computer programming languages (yes, this is despite their popularity). Members of both of these cults will scorn at this proposition simply because it appears provocative and causes offence — not because there is evidence to the contrary (there is an enormous amount of evidence supporting the affirmative position, but that’s another topic). What if it is true?

I predict that the many members of the current (why was I born in this era?!) anti-intellectualism establishment probably wouldn’t read this rant to this point, so I’m probably preaching to the choir when I point out that the practice of computer programming is founded on various fields of mathematics.

What does this have to do with the latest advertisement on msdn.com or whatever bullshit marketing material you subscribe to? Nothing. Not a bit. So no, I’m not in your cult, I’m not in cults that rival you, I’m not in a cult that considers itself superior or inferior to yours. I’m an independent thinker.

Your inability to see beyond subscription to cleverly-constructed brand naming is complete bullshit, that’s what it is.