Thoughts about software craftsmanship

Wednesday, December 26, 2012

The Skyline problem

I recently learned that a common interview question for software engineering positions is the Skyline problem, and people struggle a lot to solve it. It also took me a few tries before I had the final solution, but once I got the idea the problem became a lot easier than I thought it would be. Below I will present and explain the different approaches one can take to solve the problem. I am also going to give running time analysis for each of them. Before we go into details, let's start with the problem statement first.

Problem statement

We have to design a program which helps drawing the skyline of a two-dimensional city given the locations of the rectangular buildings in the city. All of the buildings are built on a flat ground (that is they share a common bottom) and each building B_i is represented by a triplet of (l_i, r_i, h_i) where l_i and r_i are the left and right coordinates, respectively, of the ith building, and h_i is the height of the building. In the diagram below there are 8 buildings, represented from left to right by the triplets (1, 5, 11), (2, 7, 6), (3, 9, 13), (12, 16, 7), (14, 25, 3), (19, 22, 18), (23, 29, 13) and (24, 28, 4).

Input

The input of the program is a sequence of building triplets. In some versions of the problem further restrictions are applied on the input, like all coordinates are non-negative integers less than 10,000, there is at least 1 and at most 5,000 buildings in the input sequence, and the triplets are sorted by l_i (the left x-coordinate of the building) in ascending order. Sometimes the input source is a file and the format of this file is also described, but we're going to ignore it because pre-processing the input sequence of buildings is completely irrelevant to the problem.

Output

The output is a collection of points which describe the path of the skyline. In some versions of the problem this collection of points is represented by a sequence of numbers p₁, p₂, ..., p_n, such that the point p_i represents a horizontal line drawn at height p_i if i is even, and it represents a vertical line drawn at position p_i if i is odd. In our case the collection of points will be a sequence of P₁, P₂, ..., P_n pairs of (x_i, h_i) where P_i(x_i, h_i) represents the h_i height of the skyline at position x_i. Note that technically it is the exact same representation as the previous one, except having pairs of numbers representing the height at a given position is more meaningful than a parity-based interpretation of numbers. In the diagram above the skyline is drawn with a yellow line around the gray buildings and it is represented by the sequence of position-height pairs (1, 11), (3, 13), (9, 0), (12, 7), (16, 3), (19, 18), (22, 3), (23, 13) and (29, 0).

The brute-force solution

Let's simplify the problem by not ignoring all the restrictions mentioned in the Input paragraph. The most important piece of information is that we know that the left and right coordinates of each and every building are non-negative integers less than 10,000. Now why is this important? Because we can assign a height-value to every distinct x_i coordinate where i is between 0 and 9,999. How do we do that? It's simple.

Allocate an array for 10,000 elements and initialize all of the elements to 0. Let's call this array H.
Iterate over all of the buildings and for every B_i building iterate on the range of [l_i..r_i) where l_i is the left, r_i is the right coordinate of the building B_i.
For every x_j element of this range check if h_i > H[x_j], that is if building B_i is taller than the current height-value at position x_j. If so, replace H[x_j] with h_i.

Once we checked all the buildings, the H array stores the heights of the tallest buildings at every position. There is one more thing to do: convert the H array to the expected output format, that is to a sequence of position-height pairs. It's also easy: just map each and every i index to an (i, H[i]) pair. That's it, we just solved the Skyline problem!

Notice that the result is not an optimal skyline (a skyline with minimum number of position-height pairs) because it will likely contain redundant information. What points in the skyline are considered redundant? Those points where the height didn't change compared to the preceding point's height. More formally, given the sequence of P₁, P₂, ..., P_i-1, P_i, ..., P_n pairs of (x, h), the point P_i can be left out from the skyline if h_i = h_i-1. To get an optimal skyline we need to map only those i indexes where the height changes. Here's a working implementation in Scala which - instead of allocating an array for 10,000 elements - creates an array that is exactly as big as needed to store all possible building coordinates.

The implementation

First we're going to define some data structures to represent a building, a point in the skyline and the skyline:

case class Building(left: Int, right: Int, height: Int) {

  require(left <= right, "left must not be greater than right")
  require(height >= 0, "height must be non-negative")
}

case class Point(x: Int, height: Int)

class Skyline private (val points: List[Point])

Now we can implement our first solution:

object Skyline {

  def apply(buildings: Building*): Skyline = if (buildings.length == 0) {
    new Skyline(List.empty)
  } else {
    val (min, max) = buildings.foldLeft((Integer.MAX_VALUE, Integer.MIN_VALUE))((minMax: (Int, Int), b: Building) => (math.min(minMax._1, b.left), math.max(minMax._2, b.right)))
    val heights = Array.ofDim[Int](max - min + 1)
    buildings.foreach(b => Range(b.left, b.right).foreach(i => if (heights(i - min) < b.height) heights(i - min) = b.height))
    val points = ListBuffer.empty[Point]
    heights.foldLeft((0, 0)) { (res: (Int, Int), height: Int) => val (idx, prevHeight) = res
      if (height != prevHeight) points += Point(idx + min, height)
      (idx + 1, height)
    }
    new Skyline(points.toList)
  }
}

The brute force algorithm in action

Computed skyline

Time and space complexity

Now that we've solved the Skyline problem, let's have a look at the resources necessary to execute this algorithm. Even though the problem statement may define the maximum number of buildings and the biggest coordinate, we assume that the above algorithm is designed to work with inputs of arbitrary number of buildings and arbitrary (positive) integer coordinates. From now on n indicates the number of buildings in the input sequence and m indicates the maximum coordinate.

The best case is clearly when n = 0, that is the input sequence is empty (lines 3 and 4). The running time is O(1) (we have to check if the input sequence is empty) and the used space is also O(1) (we create an empty skyline).

But what's the running time when there are actually some buildings in the input sequence? In lines 6 and 7 we first find the minimum and maximum coordinate values then allocate an array of size max - min + 1. To do so, we have to search the whole input sequence once, which means linear running time, that is O(n) for n buildings^*, and the allocated array uses O(m) space in the worst case (when min = 0 and max = m - 1). Next (in line 8) we iterate over the input buildings and save the height value for the coordinates the actual building spans over. In the worst case we have n equal-size buildings, each having l = 0 left and r = m - 1 right coordinates, that is every building spans over the whole [0..m) interval. Thus the running time of setting the height of every position is O(n * m). Finally - in lines 9-13 - we go once more over the elements of this array to find those positions where the height changes and include them in the final result (in the ListBuffer defined in line 9). This is again O(n) running time. Therefore, the overall time-complexity is O(n) + O(n * m) + O(n) = O(n * (m + 2)), which is a lot larger than O(n²) if m > n. It seems that our algorithm performs pretty badly (worse than quadratic running time) and it also uses a lot of space to calculate the final solution.

* If we know in advance that all coordinates are less than 10,000, we can skip this min-max search and just simply allocate an array for 10,000 elements. What this means is that there is no O(n) penalty on the running time but we may have a bigger array than needed.

Notice that even in this simplest solution we can completely ignore whether or not the buildings are sorted in the input sequence. But something is not quite right besides the very bad running time and used space. This solution only works if the building coordinates are positive integers (so that we can use an array to calculate the heights). And what if the maximum coordinate is way bigger than 10,000? Can we be sure that we can always allocate a big enough array to store all the heights? And if the coordinates are not even integers or perhaps negative? And finally, can we come up with a more time- and space-efficient algorithm? Can we do better?

The inductive solution

Sure we can! It'd be a huge speed-up if somehow we could determine the skyline by calculating the height for those coordinates only where it matters, wouldn't it? Intuition tells us that if we can insert a building into an existing skyline then instead of all the coordinates the building spans over we only need to check the height at the left and right coordinates of the building plus those coordinates of the skyline the building overlaps with and may modify. Similarly to the mathematical induction we can construct an algorithm that does exactly that.

The base case is trivial, every building can be converted into a skyline: given B(l, r, h) the skyline is P(l, h), P(r, 0). Now the inductive step is that we already have a skyline into which we need to insert a new building. Obviously, if we can insert a building into the skyline then we can insert all the buildings to get the final solution. The algorithm is as follows:

If the current skyline is empty add the two P(l₁, h₁), P(r₁, 0) points representing B₁ to the skyline.
Otherwise find the first P_i(x, h) point of the skyline which immediately precedes l_j, the left coordinate of the building B_j, that is x_i < l_j ≤ x_{i + 1}. Save h_i to H if there is such a P_i point, otherwise set H to 0.
Prepend the skyline with P(l_j, h_j) if there is no such P_i point or insert P after P_i if h_i < h_j.
For every P_k(x, h) point where l_j ≤ x_k < r_j replace P_k with P(x_k, h_j) if h_j > h_k and h_j ≠ h_{k - 1}, or remove P_k if h_j ≥ h_k and h_j = h_{k - 1}. Save h_k to H.
Let P_l(x, h) denote the first point in the skyline such that r_j ≤ x_l. Append P(r_j, 0) to the skyline if there is no such point or insert P(r_j, H) before P_l if r_j < x_l and H < h_j.

This algorithm always produces an optimal skyline since in step 4 it removes those points which would be duplicates. This is done for every inserted building, so the final skyline is described with the minimum number of points. Notice that there is no need to store all the heights for all the coordinates of the tallest buildings in an array, so technically the coordinates can be negative or real numbers. Nevertheless, we are going to work with the previously defined Building and Point data structures which use integers. This, however, is not a requirement.

The implementation

Now let's see some code, a functional, recursive implementation. There are some minor differences between those few lines of code below and the algorithm described above which are worth noting, even though they are only implementation details. While the algorithm above suggests an implementation where the actual skyline is being modified on the fly in every iteration by inserting and removing points, the Scala code below, however, emits a new skyline when a building is inserted and uses that as the input skyline for the next building. Another difference is that the described algorithm works with a building and a skyline, while the code below first converts the actual building into two Points as if it was a skyline. This is only done so that it's easier to work with the building's coordinates recursively.

object Skyline {

  def apply(buildings: Building*): Skyline = new Skyline(
    buildings.foldLeft(List.empty[Point]) { (points: List[Point], b: Building) =>
      mergePoints(List(Point(b.left, b.height), Point(b.right, 0)), points, 0, 0, 0, ListBuffer.empty[Point])
    }
  )

  @tailrec private def mergePoints(p1: List[Point], p2: List[Point], prevHgt1: Int, prevHgt2: Int, currHgt: Int, res: ListBuffer[Point]): List[Point] = if (p2.isEmpty) {
    (res ++= p1).toList
  } else if (p1.isEmpty) {
    (res ++= p2).toList
  } else {
    val (firstPt1, firstPt2) = (p1.head, p2.head)
    val (hgt1, hgt2) = if (firstPt1.x == firstPt2.x) (firstPt1.height, firstPt2.height) else if (firstPt1.x < firstPt2.x) (firstPt1.height, prevHgt2) else (prevHgt1, firstPt2.height)
    val maxHgt = math.max(hgt1, hgt2)
    val (points1, points2) = if (firstPt1.x == firstPt2.x) (p1.tail, p2.tail) else if (firstPt1.x < firstPt2.x) (p1.tail, p2) else (p1, p2.tail)
    mergePoints(points1, points2, hgt1, hgt2, maxHgt, if (maxHgt != currHgt) res += (Point(math.min(firstPt1.x, firstPt2.x), maxHgt)) else res)
  }
}

The inductive algorithm in action

Building
Existing skyline
New skyline

Time and space complexity

How much did we improve the brute force solution with this more sophisticated approach? The main work is done inside the mergePoints function. The trivial case, when the skyline is empty, is handled in lines 9-10 by simply adding all the points from the p1 list - which represents the building as skyline points and as such it contains maximum 2 points at any moment - to the result. This clearly runs in O(1) time. If the algorithm already inserted n - 1 buildings and it's about to insert the nth, things get a bit more complicated, but it'll become clear in a moment as I explain.

Since there is an existing skyline and there is a building to be inserted, the mergePoints function skips the first few lines of code and continues in line 14 where it selects the first point (left-hand side) of the building and the first point of the skyline. Because of it always creates a new skyline from scratch when it inserts a building, the algorithm has to determine the height of the point to be added to the skyline (lines 15-16). There are three different cases: the building starts right where the skyline does, or it is before or after the first skyline point. If the two coordinates are the same, then the taller one is selected. If the left-hand side of the building comes before the skyline, then the building's height is selected, otherwise the algorithm has not yet found the building's position in the skyline and the first skyline point's height is selected. In line 17 the algorithm decides how to proceed with the unprocessed points. If the building starts where the skyline does then both points are processed and only the right hand side of the building and the rest of the skyline points need to be merged in the next recursive call. If the building's left-hand side is before the skyline then only the building's left coordinate is processed, the skyline is still to be merged with the right-hand side of the building. Finally, if the whole building is positioned after the first skyline point then only this first skyline point is processed and the building still needs to be merged with the rest of the skyline. The last step is adding the point that comes first with the selected height to the newly built skyline (if its height is different than the previously added point) and recursively call the function with the unprocessed points, the actual heights of the building and the skyline and the current height of the newly built skyline. By doing so the algorithm finds the positions where the building's points are to be inserted and it also maintains the correct height throughout the recursion. If during the algorithm either the building or every point of the skyline is fully processed then it just appends the rest of the building / skyline points to the new skyline in lines 10 or 12.

In the worst case none of the previous n buildings are overlapping so there is a gap between each building in the skyline, and the new building is also distinct. A building is described by 2 points in the skyline (P(l, h), and P(r, 0)) which means that the p2 list contains 2 * n elements. Since the new building is also not overlapping with any of the previous buildings and the above algorithm consumes one point at a time, its running time is O(2 * n + 2), where the +2 is coming from the two points representing the building to be inserted. For the sake of simplicity let's swap the number of buildings by the number of points such that we have n / 2 buildings and n points. For the analysis we can ignore the +2, since this is only a negligible constant. Therefore, the running time of the mergePoints function is linear: O(n).

There is one more thing to consider, though: lines 4-6 where the algorithm iterates through all the buildings. Now that we know that mergePoints is linear in terms of the number of buildings, we can say that when the algorithm inserts the (n + 1)th building, it has to scan through the skyline points of 0, then 1, then 2, ..., then n buildings. Hence the overall running time is 0 + 1 + 2 + ... + n = n * (n + 1) / 2 = O((n² + n) / 2) ≈ O(n²), which is still quadratic.

The used space is always linear, though, in terms of the number of buildings, and it does not depend on the size of the range the buildings span over. The actual skyline is stored in the p2 list and is garbage-collected at the end of every iteration where a new skyline is created from the points of the skyline and the inserted building. Even though the algorithm does not depend on m and the coordinates need not to be non-negative integers anymore, unfortunately it seems we didn't really improve the brute force solution. The question remains: can we do better?

The divide and conquer solution

As a matter of fact, yes we can! We already represented a building in the previous algorithm as if it was a skyline so you might ask if merging two arbitrary skylines is a significantly different problem than merging a building and a skyline? No, it's not. Luckily, we already have everything in place! The above mergePoints function has no idea about what the p1 list represents, be it only a building or a skyline of multiple buildings. The function merges the two skylines in linear time: if p1 contains n₁, p2 contains n₂ points, then the time complexity is O(n₁ + n₂). This is as good as it gets, it can't be improved because we need to check every point in both skylines to decide which one to keep and which one to drop. If we want to improve, we have to find it elsewhere.

Inserting the buildings one after the other is not the fastest way to solve this problem as we've seen it above. If, however, we first merge pairs of buildings into skylines, then we merge pairs of these skylines into bigger skylines, and then merge pairs of these bigger skylines into even bigger ones, then - since the problem size is halved in every step - after log(n) steps we can compute the final skyline. We can still use the mergePoints function to merge two skylines, which eventually runs in O(n) time in every step (since in the worst case none of the buildings are overlapping), so the overall running time of the divide and conquer approach will be - as expected - O(n * log(n)), which is much-much faster than any of the previous two algorithms. And all this is achieved by merging partial skylines instead of inserting the buildings one by one. Let's see how.

The implementation

Compared to the previous solution only the implementation of the apply method changes, so I am not going to repeat the mergePoints function here.

def apply(buildings: Building*): Skyline = {
  def toSkyline(from: Int, to: Int): List[Point] =
    if (to - from == 1) List(Point(buildings(from).left, buildings(from).height), Point(buildings(from).right, 0))
    else {
      val middle = (from + to) / 2
      mergePoints(toSkyline(from, middle), toSkyline(middle, to), 0, 0, 0, ListBuffer.empty[Point])
    }

  new Skyline(if (buildings.length == 0) List.empty else toSkyline(0, buildings.length))
}

Unlike the mergePoint function, toSkyline is not tail recursive. This should not be a problem, though, for we don't expect this method to go very deeply by the recursive calls: it reaches maximum log(n) depth which is very small even for a very a large n.

The divide and conquer algorithm in action

Skyline #1
Skyline #2
New skyline

Final notes on the used Scala collections

In case you are not or less familiar with Scala, you may be interested in the performance characteristics of the different collection types used throughout the different approaches, namely the mutable Array and ListBuffer and the immutable List and Range. There is an implicit 5th type, the WrappedArray, which is the actual type of the vararg method parameter buildings: Building*, and is actually just an Array, "victim" of Scala's implicit type-conversions.

The methods used on the arrays and the range are foreach and foldLeft. Both iterate through all the elements of the underlying collection, thus their running time is O(n).

The methods used on the lists are isEmpty, head and tail. All of these methods are blazing fast, they run in O(1) time and basically have no impact on the performance of the algorithms.

Lastly, the methods used on the list buffer are += and toList. Both of these methods require constant running time so they also don't change significantly the overall performance of the Skyline algorithm. We must be careful, though, with modifying a ListBuffer after it has been exported by the toList method: after an export, the next modifying method will run in O(n) time because all of the elements in the buffer must first be copied in order to not to modify the exported list. Fortunately, we don't have to be worried about it, because the toList method is always the last operation after which the ListBuffer will be garbage-collected.

You can download the complete source code and test cases from here.

Monday, March 19, 2012

Understanding Scala's partially applied functions and partial functions

The two similar terms - partially applied function and partial function - quickly confuse anybody new to functional programming languages. Mastering Scala is difficult by its own which doesn't help understanding the differences between the two similarly named function types. I try to dispel this confusion through some examples.

Before we start, we need to define some terms.

A method in Scala, just like in Java, is a part of a class (or object) and it has a complete signature: name, arguments, return type, and if it's not an abstract method, a body that implements the method. For example:
```
class Math {

  def add(a: Int, b: Int): Int = a + b
}
```
As a syntactic sugar the return type can be omitted, because Scala can infer the type from the last expression of the method.
A Scala function on the other hand is in fact a class. There is a series of traits in Scala, each of them represents a function with various number of arguments: Function0 for parameterless functions, Function1 for functions with one parameter, Function2 for functions with two parameters, and so on up until Function22. Given these traits, a function is a class that mixes in one of them (depending on the actual number of parameters), and as such, it has methods. The most important one is the apply method, which contains the implementation of the body of the function. The number of arguments of the apply method is exactly the same as the number in the name of the mixed in trait, that is it's 0 for Function0, 1 for Function1, 2 for Function2, and so on. For example:
```
object add extends Function2[Int, Int, Int] {

  override def apply(a: Int, b: Int): Int = a + b
}
```
Scala has many syntactic sugars, one of them is the special apply-syntax: if you write a symbol name followed by an argument list in parentheses, Scala translates that into a call to the apply method of the named object. Using the previous example instead of writing add.apply(1, 2), we can simply write add(1, 2) to get the same result.
A function literal or anonymous function is an alternate syntax to define functions. Instead of creating a new class that mixes in one of the FunctionN traits and then overriding the apply method, we can just simply list the named function parameters in parentheses, then write a right arrow, and then the body of the function. For example:
```
(a: Int, b: Int) => a + b
```
From such function literals the Scala compiler generates a function object which mixes in one of the FunctionN traits: the left hand side of the => becomes the parameter list and the right hand side becomes the implementation of the apply method. As you see, it is really nothing more than a shorthand form of a function.
A function value is an instance of some class that extends one of the FunctionN traits. A function literal, for example, is first compiled into a class that mixes in a FunctionN trait and then instantiated runtime. The created instance is a function value. Since function values are objects, we can store them in a variable, but they are functions, too, so we can invoke them using the parentheses function-call notation (in fact it will be converted to a call to the apply method of the function class instance which is assigned to the variable). For example:
```
val add = (a: Int, b: Int) => a + b
add(1, 2) // returns 3
```

Partially applied functions

Now that we're familiar with these terms we arrived to the partially applied functions. From now on, the terms method and function are interchangeable. While in general this is not true, I'll use this simplification because it really doesn't matter when we talk about partially applied functions.

Normally, when we invoke a method or a function, we pass in all the required arguments. The function is then applied to these arguments and it computes some value. In Scala, however, it is not strictly necessary to use all of the arguments. We can decide to pass in only some of them or none of them. In either case, Scala cannot compute any result, since arguments are missing, so instead of doing so, it will automatically generate a partially applied function using the supplied arguments and then creates a function value from it. The generated partially applied function mixes in one of the FunctionN traits, where N depends on the number of missing function-arguments. The body of the generated function's apply method is basically the invocation of the original function's apply method with the supplied arguments completed with the arguments passed in to the apply method of the generated partially applied function. While it may sound very complicated, it turns out to be very simple if it's demonstrated through an example:

val add = (a: Int, b: Int) => a + b
val inc = add(_: Int, 1)
inc(10) // returns 11

What happens here is the following: we invoke the add(Int, Int) function with a fixed argument (the number 1) but the other Int argument is missing. Each missing argument is substituted with an underscore. Since only some of the arguments are supplied, the add(Int, Int) function cannot be executed, so inc will not store the result of the function but it will be a reference to the function value instantiated from the generated partially applied function. What is the type of this generated function? Well, there is only one missing argument, hence its type will be Function1. Function1 has two type parameters: one for the function's input parameter and one for its result type. The type of the missing argument of add is Int and its return type is also Int, so both of the type parameters will be Int. And what exactly does its apply(Int) method do? It simply invokes the original add function with the parameter supplied to apply and the fixed argument value 1. Given all this information we could even implement our own partially applied function (fortunately, the Scala compiler takes care of this so we don't have to):

object inc extends Function1[Int, Int] {

  override def apply(v: Int): Int = add(v, 1)
}
inc(10) // returns 11

Another use case is when none of the arguments are given. One can substitue each of the missing arguments with the underscore character but Scala provides an even simpler format: a single underscore character after the name of the function replaces an entire parameter list, for example add _. (Notice the space between the function name and the underscore! This is needed, otherwise the Scala compiler thinks we're referring to the function called add_.) We can use an even more concise form by leaving off the underscore character. This format, however, can only be used at places where a function is required at that point in the code. Such a point in the code is a higher order function which expects another function as an argument, for example the foreach method defined by the GenTraversableOnce trait, which accepts a single-parameter function: Array(1, 2, 3).foreach(println). In this last example the println(Any) method (defined in the scala.Predef object) is passed in to the foreach method of the array. No parameters are given to println and at that point in the code a function is expected, so we simply leave of the underscore character after println. The Scala compiler generates a partially applied function from the println method, which takes exactly one argument. The actual argument passed in to the partially applied println is the current element of the array as the foreach method iterates on it, so in every iteration the current element is printed to the console.

Partial functions

Despite the similar name, partial functions have nothing to do with the partially applied functions. If you studied mathematics, however, you already know what they are: a partial function from X to Y is a function ƒ: X' → Y, where X' is a subset of X. In other words the function is not defined for all elements of the X set. The most obvious example is perhaps the division function, which maps real numbers to real numbers, but it is not defined for the value 0. More formally ƒ(x, y) = x / y, where x, y ∈ ℝ and y ≠ 0.

Scala's partial functions do exactly the same: the function is defined only on a subset of the possible input arguments. Before we go into details about how they are actually implemented, let's take a little detour to the world of patterns. They're beyond the scope of this post so for now it's enough to know that Scala heavily uses patterns, starting from variable definitions, through for expressions to pattern matching. Most of the time you will encounter them inside a match expression as a series of cases, which let you select from a number of alternatives:

args(0) match {
  case "foo" => println("bar")
  case _ => println("?")
}

There are different kinds of patterns, but to simplify things let's just say whatever follows the case keyword is the pattern, and if the expression matches this pattern, the right hand side of the arrow will be executed. The match expression is not the only place though where we can use case sequences. As a matter of fact a case sequence in curly braces can be used anywhere a function literal can be used. Why? Because such a construction is a function literal itself, but it's more general: while a normal function or method has a single entry point with a given parameter list, case sequences have multiple entry points, and each of them has its own parameters (defined by the patterns). Another difference is that a single function has exactly one function body, while case sequences have as many function bodies as many cases form the sequence, that is the right hand side of each case (or entry point) is a function body. This generalization leads us to the partial functions, because such a case sequence is only interpreted on a subset of its arguments (where any of the patterns matches), and it's undefined in every other case.

val div: (Double, Double) => Double = {
  case (x, y) if y != 0 => x / y
}

The above example is a possible implementation of the aforementioned division function. While it does not demonstrate the usage of multiple cases, the single pattern (x, y) if y != 0 is enough to describe our partial function: it is defined for every pair of numbers except when the value of the divisor is 0. What is the output then of the following function call: div(1, 0)? Well, it will fail with a MatchError, because there is no pattern defined in our function which would match the case when the divisor is 0.

Defining a sequence of cases this way has two small problems though. The first one is that there is no way to test (other than invoking the function itself) if the input arguments are in the subset of the valid values. To avoid exceptions we have to put our function call into a try-catch block but that would pollute our code. The second one is that if the Scala compiler has enough information about all of the possible values of the patterns used in the case sequence, it will complain with a warning message if we don't define all of them. The compiler is absolutely right in this case, since forgotten pattern values lead to runtime exceptions. In case of partial functions, however, we don't want to define all of them, only a subset of them.

Fortunately Scala has an elegant way out of both problems. If we tell the compiler that we want to create a partial function, then we can not only check if the function is defined at the input arguments but the Scala compiler will also stop complaining about non-exhaustive matches on the patterns. So how can we let the compiler know that what we want is really a partial function? It's simple: we only have to define the type of the function as PartialFunction. Here's the div function again, this time as a real partial function:

val div: PartialFunction[(Double, Double), Double] = {
  case (x, y) if y != 0 => x /y
}

PartialFunction is basically a specialized Function1. It has all the methods Function1 has, plus some very handy new methods. For example the isDefinedAt() which we can use to check if the function is defined at the input arguments.

div.isDefinedAt(1, 0) // returns false

By using this method we can avoid the try-catch block, and additionally it's more expressive than catching a MatchError.

As you get more and more familiar with Scala, you'll notice that partially applied functions and partial functions are used at many places. For instance, when you use higher order functions, you are actually working with partially applied functions. Or if you look at a try-catch block, you will notice that the catch block is essentially a partial function. I hope this post helped to clarify the "myth" of partially applied and partial functions, and - should you find a use case for it - you can already start writing your own.

Sunday, February 12, 2012

Neo4j transactions in Scala

Recently I started to work on a project which depends on Neo4j. Neo4j - using is its own definition - "is a graph database, storing data in the nodes and relationships of a graph".

Once we have a database instance, it's fairly simple to work with it. There are methods to get, create, update and delete nodes and relationships. There is, however, a requirement: every operation that writes to the database must be in a transaction. A transaction must either be committed or rolled back when it's finished. Neo4j is written in Java, so the simplest way to demonstrate a basic transaction is a Java example:

public class Neo4jExample {

    private final GraphDatabaseService db;

    public Neo4jExample(GraphDatabaseService db) {
        this.db = db;
    }

    public Node createFooNode() {
        Transaction tx = db.beginTx();
        try {
            Node node = db.createNode();
            node.setProperty("foo", "bar");
            tx.success();
            return node;
        } finally {
            tx.finish();
        }
    }
}

First we need to start a new transaction in the database with the db.beginTx() method. When the transaction is started we can create a new node in the database by calling db.createNode(), then we can set the foo property to bar by using node.setProperty("foo", "bar"). To finish a transaction we need to call tx.finish(), which will either commit or roll back the transaction. It will always roll back unless the transaction was marked as successful with tx.success(), in which case the transaction is committed.

Now notice the try-finally block which encloses the whole transaction. This is the recommended way to work with Neo4j transactions, because this is the only way which ensures that the transaction is properly finished. If an exception is thrown in the middle of the try-block before tx.success() is even being called, the transaction will be rolled back since it was not marked as successful.

The solution is fail-safe and simple, but there is one problem with it. The transaction-handling pollutes our code. The only important thing is in the two middle lines in which we create a node and set a property on that node. The other six lines surrounding it are required by design, but they only take our attention away. Additionally, if we need transactions at many different parts of our code base, we need to repeat those six lines everywhere, which is a lot of code duplication. We could, of course, overcome it by using AOP or by rewriting the above code to use the command pattern, but the former requires some AOP library and the latter makes this simple code a lot more complicated.

A more elegant solution exists in Scala which does not require additional libraries or an utterly complicated design. In Scala not only can we define functions and call
them, but we can write down functions as unnamed literals and then pass
them around as values. Scala supports first-class functions, which means we
can express functions in function literal syntax, i.e., (x: Int) => x + 1,
and that functions can be represented by objects, which are called function values.

So all we need is a method that takes a function-literal argument and executes that function inside a transaction:

trait TransactionSupport {

  protected def transaction[A <: Any](db: GraphDatabaseService)(dbOp: => A): A = {
    val tx = db.beginTx()
    try {
      val result = dbOp
      tx.success()
      result
    } finally {
      tx.finish()
    }
  }
}

We can use this method in our code where we need a transaction:

class Neo4jExample(db: GraphDatabaseService) extends TransactionSupport {

  def createFooNode(): Node = transaction(db) {
    val node = db.createNode()
    node.setProperty("foo", "bar")
    node
  }
}

The transaction handling is defined in a trait which can be mixed in to any class where transactions are needed. The transaction method takes two argument lists: the first one is the database in which the transaction is started, the second one is the function that must be executed inside the transaction boundaries. This signature makes it possible to call the method in the transaction(db) { ... } format which makes it look like the transaction handling is supported by the language itself. The method simply returns whatever the passed in function returns. The body of transaction follows the exact same structure of the above Java example: it starts a transaction, then does something in the try-block, and last it finalizes the transaction in the finally-block. The only difference is that instead of specific database operations it executes the passed in function.

Using the above transaction support is very simple: just wrap the code which must be executed in a transaction into the transaction(db) { ... } method. The code duplication is reduced to mixing in the TransactionSupport trait and calling the transaction method whenever it's necessary. No external libraries are used and no complicated patterns are implemented. And the resulting code is not only not complicated but also very easy to read.

The TransactionSupport trait, however, is not perfect. As long as we don't need complex transaction handling it's good enough, but as soon as we have to roll back transactions when certain conditions are met, or we must acquire read- or write-locks we need to come up with some other solution.

Wednesday, August 31, 2011

Conventions

I'm fond of clean coding. I can't imagine programming on a way that leaves mess and garbage behind. If you do so, then you fail to take enough care over the code you're responsible for.

You always wash the dishes after you used them and most likely you always iron and fold your clothes before you put them into the closet. You don't signal to the left when you turn right, and you don't go through the red light and stop at the green light. You start the sentences with capital letter and end them with the proper punctuation mark.

Some of these examples are only habits, others are forced by some law, yet another is only a matter of following some rules. What they have in common is that they're only conventions and as such they're are all breakable. What happens if you intentionally break these habits, laws and rules? Hopefully nothing serious, and everything goes on normally. On the other hand... You can put your crumpled clothes into the closet but when you wear them you'll look austere. You can cross the red light but you endanger your and other's safety. You can start a sentence with lower case letter but other's may think you're uneducated. You don't want, of course, any of these to happen, so you don't do these things. You simply follow the rules and habits because you know that's the way to do it.

The same principles are true when it comes to clean coding. You don't want others to clean up your mess. You format your code, so it's nice to read. You don't call your method add when it does subtraction. You don't call Iterator.next() when Iterator.hasNext() returns false. And you don't start a class name with lower case letter, or a method name with capital letter. Why? It's obvious. Because only applying the industry standards ensures that you don't make your colleagues' life harder when they have to read, understand and maintain the code you wrote.

For most of us it's common sense. It goes without saying or without thinking. We learned these conventions, principles and rules and use them all the time without making extra effort or wasting time. But unfortunately there's always a few exceptions whose concern towards the code ends as soon as it builds. The code they produce is syntactically and semantically correct, though, but the hair on the back of your neck stands when you have a look at it. When I see such a code I often wonder if the author of the code is as negligent in other aspects of his life as much he was when he wrote the code, or it's only his indolence that prevented him doing things on the right way...

Sunday, August 7, 2011

Wicket state

I heard a question from a colleague on Friday: In Wicket, what is a stateless form? That's easy to answer. It's a form that doesn't have any state. No kidding, really? But how do I know if my form has or has not a state? That's also easy to answer. If you create an instance of the class StatelessForm then your form does not maintain any state. Is it really as simple as that? Well, not exactly... First we need to understand what state means in Wicket.

Being a server-side web framework, Wicket must maintain the state of the web application for every user. When a user interacts with a Wicket web application and accesses a web page an instance of a Page subclass is stored in the memory. Pages visited previously by the user are serialized to the disk and can be loaded later should the user visit them again. But what if a page can have multiple states during a session? Does Wicket serialize every different state of the page to the disk? No, that's not the case. There's another state-handling mechanism, called page versioning, which maintains the current state of the page by logging the changes made to the page. Should the user work only forward only the current state of the pages are important, but should he go backward the previous states and versions become equally important. Both page serialization and versioning happen automatically and every default Wicket component logs the changes to their own state. Custom components on the other hand - like a subclass of the Page class with some declared fields - must implement their own logger by extending the Change class and registering a new instance of this class with the Component.addStateChange(Change) method every time when the page's state changes.

Everything mentioned above, however, is only true for stateful Wicket pages. If the page is stateless then there's no need to version it, to keep it in the session, or to serialize it to the disk, since stateless pages can be instantiated every time when they're needed. So what are the prerequisites of a stateless page? A page is considered stateless if it is bookmarkable and it contains only stateless components. This requires a little bit of explanation.

Bookmarkable page means that a URL can be assigned to the page and this URL does not contain any session-related information, so when the user clicks the link, a new instance of the Page is created. To make a page bookmarkable it must have either a default no-arg constructor, or a constructor that accepts a PageParameters argument, or both. Pages not having such constructors (or these constructors are not public) can only be instantiated from within another page for there's no way Wicket can figure it out what constructor to use with what arguments. When the user first visits a non-bookmarkable page, Wicket serializes it to the session because the only way to recreate the page without the constructors when the user returns to it is to deserialize it. For this reason, every page without the aforementioned two public constructors are considered stateful even if they don't maintain any real state.

Stateless components are components where the stateless hint says they're stateless and the hint of every behavior added to them says the behavior is stateless. Wicket operates in stateless mode by default (as long as the pages have at least one of the two constructors) because most of the Wicket components are stateless. Keep in mind though, that as soon as you add a Link or a Form to your page, or add an AJAX behavior to any of the components contained by the page, Wicket silently switches to stateful mode. If you want to keep your application stateless, you can always use the StatelessForm and StatelssLink components coming with Wicket, which are the same as Form and Link except their stateless hint says they're stateless.

In the first paragraph I asked the question if having a stateless form is really as simple as creating a new instance of the StatelessForm class. I just mentioned that a stateless form only hints that it's stateless but it's still up to you to make it sure that it does not rely on anything else but the data coming from the HTTP request. If the stateless form uses any field from the containing page then it isn't really stateless - unless the page is stateless. Using stateless forms in stateful pages is of course possible, but in my opinion it doesn't make any sense, since the page gets serialized to the session anyway, and stateless forms are meant to help keeping your application stateless.

Thursday, August 4, 2011

How to unit test Spring-based RESTful APIs

Even though the idea of the RESTful services is not new, REST APIs became popular only recently. There's not a single serious service without providing RESTful access to its resources. If there's any, then it's not serious...

There are many ways to implement a REST API: you can use different languages and can choose from multiple frameworks. Still being one of the most popular languages, my choice is Java with the Spring application framework. The annotation-based Spring controllers provide an elegant way to map the HTTP requests to services and then return a formatted response.

I had a problem, however, during the implementation. I love unit tests. I don't release software without testing it to its last bits. And those tests must be automatic, repeatable, fast, and natural to read. The only responsibilities of the controllers should be to map the HTTP requests to method calls and to translate the results into HTTP responses, but I couldn't find a way to test this most essential part of the REST API. At least not by googling for "how to unit test Spring controller request mapping". My research was not extensive, though, but none of the solutions and suggestions I found was to my liking. They were either not automatic, because a servlet container had to be started manually and the curl or wget commands were used later to try it out. Others required a lot of boilerplate code, for those started up an embedded Jetty, configured every HTTP request and then evaluated the HTTP responses. Yet another was just simply not easy enough to digest when I read it, because it used annotation-based test-configuration and relied on the ugly side effects of certain method calls.

After almost an hour of frustration I decided it is easier to think a bit harder than to find a solution I'd like, so I came up with my own that I'm going to share now. First we need a service with a REST API. For the sake of simplicity it'll be a feedback server where someone can leave a positive or a negative feedback - something similar to Facebook's Like and the missing Dislike buttons. An optional message can also be posted along with the feedback. Once a feedback is given, it can be viewed. I'll only show the relevant parts, but if you're interested, you can download the full source code from here. You won't find, however, a fancy client application with actual Like and Dislike buttons. Only the RESTful service is implemented which returns plain text or JSON to the caller. Here we go...

 @Controller
 public class FeedbackController {

     private final FeedbackService feedbackService;

     @Autowired
     public FeedbackController(FeedbackService feedbackService) {
         Validate.notNull(feedbackService, "feedbackService is required");

         this.feedbackService = feedbackService;
     }

     @RequestMapping(value = "/thumbsup", method = {RequestMethod.POST})
     public HttpEntity<String> saveThumbsUpFeedback(@RequestParam(value = "message", required = false) String message) {
         feedbackService.savePositiveFeedback(message);

         return new HttpEntity<String>("Thank you for your feedback", createHeader(TEXT_PLAIN));
     }

     @RequestMapping(value = "/thumbsdown", method = {RequestMethod.POST})
     public HttpEntity<String> saveThumbsDownFeedback(@RequestParam(value = "message", required = false) String message) {
         feedbackService.saveNegativeFeedback(message);

         return new HttpEntity<String>("Thank you for your feedback", createHeader(TEXT_PLAIN));
     }

     @RequestMapping(value = "/list", method = {RequestMethod.GET})
     public HttpEntity<String> listFeedbacks() {
         Gson gson = new GsonBuilder().setDateFormat("H:m:s dd:MM:yyyy").create();

         return new HttpEntity<String>(gson.toJson(feedbackService.listFeedbacks()), createHeader(APPLICATION_JSON));
     }

     private HttpHeaders createHeader(MediaType mediaType) {
         HttpHeaders httpHeaders = new HttpHeaders();
         httpHeaders.setContentType(mediaType);

         return httpHeaders;
     }

     @ExceptionHandler({HttpRequestMethodNotSupportedException.class})
     public void handleRequestExceptions(HttpServletRequest request, HttpServletResponse response) throws IOException {
         response.sendError(SC_METHOD_NOT_ALLOWED, "Request method '" + request.getMethod()
         + "' is not supported on " + request.getRequestURI());
     }

     @ExceptionHandler
     public void handleExceptions(Exception e, HttpServletResponse response) throws IOException {
         response.sendError(SC_INTERNAL_SERVER_ERROR, "An internal error occurred.");
     }
 }

As you can see, the controller above is indeed quite simple:

when a POST request comes in to /thumbsup or /thumbsdown it instructs the feedback service to save the feedback
when a GET request comes in to /list it asks the feedback service to return every feedback and then converts them to JSON
if something goes wrong it handles the exceptions so that the client receives an HTTP status code along with a short message

I created this class with TDD in mind, so tests came first, then the implementation. The tests, however, were very useless: I didn't mock the FeedbackService class, so the tests only retested what was already tested. This was the moment when I started to look for solutions about testing the controller mappings. An hour later came the moment when I gave up and implemented what was necessary to test those nasty Spring annotations.

So what exactly do we need in our unit test? If you know Spring's Web MVC framework then you know that to make our controller work a DispatcherServlet has to be created in our web application's web.xml. This DispatcherServlet will then automatically take care of delegating the requests to their destinations. Simply put, we'll need a web application context and a DispatcherServlet instance in this context. Having configured everything, we can send mocked HTTP requests to the servlet which in turn will reply with mocked HTTP responses. Let's create our own test web application context with a DispatcherServlet:

 public class MockXmlWebApplicationContext extends XmlWebApplicationContext {

     public MockXmlWebApplicationContext(String webApplicationRootDirectory, String servletName, String... configLocations) throws ServletException {
         init(webApplicationRootDirectory, servletName, configLocations);
     }

     private void init(String webApplicationRootDirectory, String servletName, String... configLocations) throws ServletException {
         MockServletContext servletContext = new MockServletContext(webApplicationRootDirectory, new FileSystemResourceLoader());
         MockServletConfig servletConfig = new MockServletConfig(servletContext, servletName);
         servletContext.setAttribute(WebApplicationContext.ROOT_WEB_APPLICATION_CONTEXT_ATTRIBUTE, this);

         DispatcherServlet dispatcherServlet = new DispatcherServlet();

         setServletConfig(servletConfig);
         setConfigLocations(configLocations);
         addBeanFactoryPostProcessor(new MockBeanFactoryPostProcessor(servletName, dispatcherServlet));
         addApplicationListener(new SourceFilteringListener(this, new ContextRefreshedEventListener(dispatcherServlet)));
         registerShutdownHook();
         refresh();

         dispatcherServlet.init(servletConfig);
     }

     private final class ContextRefreshedEventListener implements ApplicationListener<ContextRefreshedEvent> {

         private final DispatcherServlet dispatcherServlet;

         private ContextRefreshedEventListener(DispatcherServlet dispatcherServlet) {
             this.dispatcherServlet = dispatcherServlet;
         }

         @Override
         public void onApplicationEvent(ContextRefreshedEvent event) {
             dispatcherServlet.onApplicationEvent(event);
         }
     }

     private final class MockBeanFactoryPostProcessor implements BeanFactoryPostProcessor {

         private final String servletName;
         private final DispatcherServlet dispatcherServlet;

         private MockBeanFactoryPostProcessor(String servletName, DispatcherServlet dispatcherServlet) {
             this.servletName = servletName;
             this.dispatcherServlet = dispatcherServlet;
         }

         @Override
         public void postProcessBeanFactory(ConfigurableListableBeanFactory beanFactory) throws BeansException {
             beanFactory.registerSingleton(servletName, dispatcherServlet);
         }
     }
 }

What's happening in this class? First of all it extends the XmlWebApplicationContext class which is needed if we want to simulate a servlet environment. Later in the init(String, String, String...) method we set up the servlet context and the servlet configuration. These will be used by the DispatcherServlet to find it's resources. If otherwise not specified, an XmlWebApplicationContext would try to load the standard applicationContext.xml from the WEB-INF directory. The applicationContext.xml, however, is not always in this directory or one may choose a different name for it, so it's configurable with the String... configLocations constructor argument. The locations listed in this argument are then used to load all the necessary configurations.

A DispatcherServlet is instantiated but it is not yet registered in the web application context. We can't simply add new beans to an existing application context - normally the beans are loaded from the applicationContext.xml - but we can register a post processor (a BeanFactoryPostProcessor) which will take care of it when the application context gets refreshed. The DispatcherServlet can receive events when the application context is refreshed so it's a good idea to set up an event listener (an ApplicationListener). Since the dispatcher servlet is only interested in the ContextRefreshedEvent, we can filter out the other application events by wrapping our event listener into a SourceFilteringListener, which will only delegate events that the listener can handle. An optional step is to register a shut down hook on our web application context so that the owned resources are properly released when the JVM is shut down. Having configured it all only two more steps are required: the application context should be refreshed so that the whole configuration gets processed and to initialize the dispatcher servlet.

From now on we can use the new MockXmlWebApplicationContext in our unit test with the proper arguments: we need to pass in the root directory of our web application, the name of the dispatcher servlet, and all the applicationContext.xml files which are needed by our application to work. Here is an example unit test demonstrating the usage of the web application context implemented above. The unit test checks if a POST request was sent to /thumbsup then a positive feedback is saved, and also checks if the request method is not supported on the accessed resource, the correct HTTP status code is set in the response along with an error message.

 public class FeedbackControllerTest {

     private static ApplicationContext applicationContext;

     private DispatcherServlet subject;

     @BeforeClass
     public static void setUpUnitTest() throws Exception {
         applicationContext = new MockXmlWebApplicationContext("src/main/webapp", "feedback-controller", "classpath:spring/applicationContext.xml");
     }

     @Before
     public void setUp() throws ServletException {
         subject = applicationContext.getBean(DispatcherServlet.class);
     }

     @Test
     public void shouldSavePositiveFeedbackWithoutMessage() throws Exception {
         MockHttpServletRequest request = new MockHttpServletRequest("POST", "/thumbsup");
         MockHttpServletResponse response = new MockHttpServletResponse();

         subject.service(request, response);

         assertThat(response.getStatus(), is(SC_OK));
         assertThat(response.getContentAsString(), is("Thank you for your feedback"));
         assertSavedFeedback(POSITIVE, null);
     }

     @Test
     public void shouldSetMethodNotAllowedStatusCodeIfPositiveFeedbackIsAccessedWithGet() throws Exception {
         MockHttpServletRequest request = new MockHttpServletRequest("GET", "/thumbsup");
         MockHttpServletResponse response = new MockHttpServletResponse();

         subject.service(request, response);

         assertThat(response.getStatus(), is(SC_METHOD_NOT_ALLOWED));
         assertThat(response.getErrorMessage(), is("Request method 'GET' is not supported on /thumbsup"));
     }
 }

The tests above meet all of my requirements. Not because I wrote them but because they satisfy the aforementioned criteria:

Being JUnit tests they can be automatically executed every time when the application is built. For the same reason they're also repeatable.
Other then loading the Spring configuration files by the MockXmlWebApplicationContext once in the beginning, the tests are executed very fast for they are not communicating with a real server.
The amount of boilerplate code is reduced to the short implementation of a web application context and to the straightforward configuration of this context. The web application context is reusable for every other controller, and there are no side effects utilized. No real server is started, no real requests are set up and sent to the server in every test, and no cumbersome parsing of HTTP responses are implemented, yet every bit of the controller is tested by simulating the communication between the client and the server.
The tests cases are short, descriptive, and are very natural to my eyes when I read them. I admit, this statement is subjective, but I still like that in only a few lines I set up an HTTP request, create an HTTP response object in which the result is expected, send the request to the dispatcher servlet, and then assert the response.