Skip to content

Ray Tracer in F# – Part III

In part 2 we created a working lighting model for our ray tracer and got a step closer to creating a realistic-looking sphere. Soon we will increase the realism further by applying reflections and creating shapes other than spheres. However, this article is going to be about multiplicity. We are going to extend our program to handle multiple spheres and multiple light sources. This will help teach you about working with important concepts in functional programming as well as learn about some of the F# collections libraries.

But before we get started I just want to tidy up the code a little, because it is just a bit badly formed.

At the moment a sphere has a diffuse colour and we are assuming that specular highlights are always white. While this generally gives good results, it’s not perfect. Because we will want to do more with a shape’s material than just lighting at some point (textures, translucency, etc) let’s break out material properties into another type. Also, the intersection type doesn’t really need a reference to the shape it intersects – it only needs to know the material (as well as the other information it has). So let’s change that too. After you make these simple changes to your types, go through the code and make the appropriate changes, including fixing the code that breaks and removing magic constants from the colorAt function.

type Material = { diffuseColor: Color; specularColor: Color; shininess:float }
type Sphere   = { center:Point3D; radius:float; material:Material }
type Intersection = { normal:Vector3D; point:Point3D; ray:Ray; material:Material; t:float }

//.... and constructing the sphere will look like
let material = { diffuseColor=Color(0.8, 0.1, 0.1); specularColor=Color(0.7,0.7,0.7); shininess=40.0 }
let sphere = { center=Point3D(1.0,1.0,1.0); radius=0.4; material=material }

For completeness, here is the colorAt function.

let colorAt intersections scene =
    let closest = List.minBy(fun i -> i.t) intersections
    let kd = closest.material.diffuseColor
    let ks = closest.material.specularColor
    let Ia = scene.ambientLight
    let L = norm (scene.light.position - closest.point)
    let Id = Math.Max(0.0, Vector3D.DotProduct(L, closest.normal))
    let V = scene.camera.position - closest.point
    let H = norm (L + V)
    let Is = Math.Pow(Math.Max(0.0, Vector3D.DotProduct(H,closest.normal)), closest.material.shininess)
    let ambient = kd * Ia
    let diffuse = kd * Id
    let specular = ks * Is
    ambient + diffuse + specular

I’m still not quite happy with the layout of this code. It’s weird that colorAt does the calculation to find the closest intersection point to the camera (or ray origin, more precisely). I’d prefer to move that into the calling function, so in the main nested loop

match intersects with
            | [] -> bmp.SetPixel(x,y,Color.Gray)
            | _ -> let color = colorAt intersects scene
                   bmp.SetPixel(x,y, Color.FromArgb(255, (int)(color.r*255.0), (int)(color.g*255.0), (int)(color.b*255.0)))

becomes

match intersects with
            | [] -> bmp.SetPixel(x,y,Color.Gray)
            | _ -> let color = colorAt (List.minBy(fun i -> i.t) intersects) scene
                   bmp.SetPixel(x,y, Color.FromArgb(255, (int)(color.r*255.0), (int)(color.g*255.0), (int)(color.b*255.0)))

And making the corresponding change in colorAt is simple:

let colorAt intersection scene =
    let kd = intersection.material.diffuseColor
    let ks = intersection.material.specularColor
    // etc...

Now we want to allow for multiple spheres. Change the definition for Scene so it looks like

type Scene = { camera:Camera; spheres:Sphere list; ambientLight:Color; light:Light }

Note the word ‘list’. This means that the ‘spheres’ field will carry a typed list of Sphere values.

This of course breaks our castRay function, which depends heavily on the Scene having just a single sphere. So currently the castRay function performs the task of calculating intersection points. Really what castRay should be doing is delegating this task to another function and simply collecting and returning the results. So let’s make some changes to castRay and also define a separate intersect function to handle the sphere intersect code.

let intersect ray sphere =
    let s = ray.origin - sphere.center
    let rayDir = norm ray.direction
    let sv = Vector3D.DotProduct(s,rayDir)
    let ss = Vector3D.DotProduct(s,s)
    let discr = sv*sv - ss + sphere.radius*sphere.radius
    if discr < 0.0 then []
    else
        let normalAtTime t = norm (pointAtTime ray t - sphere.center)
        let (t1,t2) = (-sv + sqrt(discr), -sv - sqrt(discr))
        [ { normal = normalAtTime t1; point = pointAtTime ray t1; ray = ray; material=sphere.material; t=t1 };
          { normal = normalAtTime t2; point = pointAtTime ray t2; ray = ray; material=sphere.material; t=t2 } ]

let castRay ray (scene:Scene) =
    scene.spheres |> List.collect (fun x -> intersect ray x)

The intersect function merely returns a list of intersections with a sphere. We will be making this more generic to handle other shapes later, but for now this works fine. Incidentally, this is the reason we’re using a list instead of a tuple or something where handling only two results would have been more appropriate.

Notice the |> operator in castRay. This is called the pipeline operator and Dom Syme, the man behind the F# language, it is “perhaps the most important operator in F# programming.” (This quote is from Expert F# 2.0). Basically it allows us to “pipe” the value of some expression into another expression, which results in some pretty cool looking code. We’ll be using the pipelining feature more later on.

So this is pretty much all we need to render multiple shapes. Let’s add another shape to the scene and take a look.

    let material1 = { diffuseColor=Color(0.8, 0.1, 0.1); specularColor=Color(0.7,0.7,0.7); shininess=40.0 }
    let sphere1 = { center=Point3D(-0.1, -0.1, 1.8); radius=0.39; material=material1 }
    let material2 = { diffuseColor=Color(0.1, 0.1, 0.8); specularColor=Color(0.7,0.7,0.7); shininess=20.0 }
    let sphere2 = { center=Point3D(0.4, 0.3, 2.0); radius=0.3; material=material2 }

    let scene = { camera=camera; spheres=[sphere1;sphere2]; ambientLight=Color(0.2,0.2,0.2); light=light }

Of course you are free to use as many spheres as you like – experiment with procedurally creating arrays of spheres, it can be pretty fun.

We’re going to take a similar approach for using multiple light sources. Here we’re going to use nested inner functions (closures) to break our code into functions which we can hide from the calling code. colorAt now looks like

let colorAt intersection scene =
    // nested function for ambient color
    let ambientColorAt intersection scene =
        scene.ambientLight * intersection.material.diffuseColor

    // nested function for specular color
    let specularColorAt intersection scene =
        let ks = intersection.material.specularColor
        let V = scene.camera.position - intersection.point

        let specularAtLight light =
            let L = norm (light.position - intersection.point)
            let H = norm (L + V)
            let Is = light.color * Math.Pow(Math.Max(0.0, Vector3D.DotProduct(H,intersection.normal)), intersection.material.shininess)
            ks * Is

        List.sumBy(fun x -> specularAtLight x) scene.lights

    //nested function for diffuse color
    let diffuseColorAt intersection scene =
        let kd = intersection.material.diffuseColor

        let diffuseAtLight light =
            let L = norm (light.position - intersection.point)
            let Id = light.color * Math.Max(0.0, Vector3D.DotProduct(L, intersection.normal))
            kd * Id

        List.sumBy(fun x -> diffuseAtLight x) scene.lights

    let ambient = ambientColorAt intersection scene
    let specular = specularColorAt intersection scene
    let diffuse = diffuseColorAt intersection scene

    ambient + diffuse + specular

On a side note, I just noticed that throughout this series I have made an embarrassing mistake – I forgot to multiply the light intensity values with the surface reflectance, resulting in all lights behaving as though they were coloured 1.0,1.0,1.0. This is fixed in the preceding code.

You can see from this snippet that we are summing the values of all colours collected from all light to create specular and diffuse values. Setting up a second light source allows us to view the results.

    let light = { position=Point3D(1.0,4.0,-3.0); color=Color(0.5,0.5,0.5) }
    let light2 = { position=Point3D(-3.0,-2.0,-1.0); color=Color(0.8,0.8,0.8) }
    let scene = { camera=camera; spheres=[sphere1;sphere2]; ambientLight=Color(0.2,0.2,0.2); lights=[light;light2] }

Now there’s one more thing I want to do before I close off this entry. Right now other objects are ignored when we are summing up the lights. If we just check for intersections between each intersection point and the light source we are inspecting we can see if that intersection point should be in shadow. Seems simple enough, so let’s implement some simple shadowing.

So all we need to do is check whether a ray between the intersection point and the light source intersects ANY other shapes. If not, we know there’s a clear path to the light source. If so, that light source doesn’t contribute to the lighting at that intersection point. We can ignore ambient lighting, because it applies whether we are in shadow or not. Make the following modifications to the colorAt function.

let colorAt intersection scene =
    // check if we're in shadow
    let inShadow point light =
        let ray = { origin = intersection.point; direction = light.position - intersection.point}
        let intersections = castRay ray scene
        if intersections.Length = 0 then false
        else true
// ...then where diffuse light is calculated...
    let diffuseColorAt intersection scene =
        let kd = intersection.material.diffuseColor

        let diffuseAtLight light =
            let L = norm (light.position - intersection.point)
            let Id = light.color * Math.Max(0.0, Vector3D.DotProduct(L, intersection.normal))
            if inShadow intersection.point light then Color(0.0,0.0,0.0)
            else kd * Id            

        List.sumBy(fun x -> diffuseAtLight x) scene.lights

Although I’m sure technically we should do the same for specular light, in my experience it looks like ass with these sharp shadows, so I keep specular light around. I’ll let you judge for yourself which is better. At some stage we’ll implement more advanced lighting and shadowing and we won’t need to decide. So here’s the scene rendered earlier with our shadow model. On the left if with specular lighting shadowed, the right not shadowed. I prefer the right.

The banding effect on the shadows occurs because I included a third light to demonstrate that shadows can have multiple levels of darkness, depending on how many lights are blocked.

In the next article we will finally start actually ray tracing, which means reflections! We’ll get to play with recursion and we’ll also start using shapes other than spheres. The source code so far is available here.

Tagged , , , , , , ,

Introducing Trip, an open-source UML editor for Eclipse

UML Editing in Trip

In my final year of university we were tasked with building a capstone project. The aim was a round-trip engineering tool based on UML as an Eclipse plugin. Round-trip engineering is the process of taking code, producing some kind of graphical (or otherwise) model of that code, manipulating it and being able to push your changes through to the underlying source code, and vice versa.

My group’s project was the most successful in the class, and I decided to continue working on it in an open-source capacity. I have done almost nothing since the final grading so far, but I have plans to continue work immediately.

Some of the code is kind of sketchy. We were learning a lot about design as we went so it’s kind of a dependency nightmare, but honestly it’s no worse than production code I run into every day at work. My initial plans for the project are to first improve the design. I intend to use modern design techniques like dependency injection and inversion of control to get the dependencies to a more manageable and testable state. I will also be incorporating many more unit tests and fixing some of the “unit tests” in the project which are actually more like integration tests.

I intend to use this project as a way to improve my own OO design skills, as well as learning more discipline in areas like TDD. I may use it to experiment with modern methodologies such as BDD, and I intend to dig deeper into the modelling platform on which it’s built and perhaps try my hand with a DSL, but that will happen later in the project.

Take a look at the project on BitBucket. Trip is licensed under the Eclipse Public License (EPL).

Trip is built on the Eclipse platform using the Graphical Editing Framework. Special thanks go to my wonderful class mates from Canterbury university, Karen Harris, Will Gittoes, Myse Elmadani, Peter Empson and Ed Robinson.

Ray Tracer in F# – Part II

In the previous post we set up the bones of a ray caster and got it rendering a very basic “sphere” with no shading or anything. Today we’re going to make the sphere look more like a real sphere by applying a Phong shading model to it.

If we want to shade each pixel on the sphere correctly then we’re going to need to get some information about the point of intersection between the ray and the sphere. Before getting into code, we’ll just start with the mathematical description of our lighting model. This will be brief and I encourage you to investigate further if you have never done any kind of shading or lighting before.

We are going to be working with three different kinds of lighting:

  1. Ambient – ambient lighting is a general light that applies to every surface in the scene, regardless of orientation, shadows, etc. Ambient light has no source. You can think of ambient light as the base-line or “zero-point” of the scene’s lighting. It’s the amount of light that a portion of the scene will have when in complete shadow. Although there is no basis for ambient light in physics, without it things just look unrealistic. Perfectly black shadows look wrong.
  2. Diffuse – When light from a source strikes a surface, the surface reflects that light in the same quantity in every direction (think of this as a surface reflecting photons in perfectly random directions). The strength of a diffuse reflection is dependent on the angle of the surface with the light travel path. If a surface is perfectly perpendicular to the ray of light then the reflection will be total (100%). The reflectivity drops as the surfaces turns its face on an angle to the light, until it reaches 0% when the surface is 90′ to the light source (this will be described in more detail soon).
  3. Specular – Specular lights are those little highlights in a scene, particularly on curved surfaces. Like diffuse reflections, specular highlights are dependent on the angle of the surface, but in a different way. Specular lighting will be described in detail towards the end of this article.

The three kinds of light add together to form a fully lighted scene:

So let’s define a new data type in our program that represents an intersection between a ray and our sphere. For now we’re still working under the assumption that the scene will only ever have a single sphere. I find working iteratively like this is a good workflow for F#, especially with F# interactive.

type Intersection = { normal:Vector3D; point:Point3D; ray:Ray; sphere:Sphere; t:float }

An intersection has a surface normal, a point in space, an associated ray and a t value, which describes how far along the ray the intersection happens. We can just derive t from point (or vice versa) but I find it convenient to store both.

Now we’re going to want our castRay function to return Intersection objects instead of true or false. It should be capable of returning multiple intersections for a given ray. A sphere will usually have two intersections with a ray, and it’s not difficult to imagine shapes that will have more than two. So we’ll return a list of intersections.

In the previous article all we did for the ray-sphere intersection was compute the discriminant of a quadratic, but in order to find the exact point in space of the intersection we’ll need to solve the full equation for both solutions (if you don’t know the quadratic formula, read this). Here is the modified castRay function. I have also included a small utility function which we’ll make use of, pointAtTime, which just computes the point of some line given by p = a + bt for a given value of t.

/// Get the position of a ray at a given time
let pointAtTime ray time =
    ray.origin + time * ray.direction

let castRay ray (scene:Scene) =
    let s = ray.origin - scene.sphere.center
    let rayDir = norm ray.direction
    let sv = Vector3D.DotProduct(s,rayDir)
    let ss = Vector3D.DotProduct(s,s)
    let discr = sv*sv - ss + scene.sphere.radius*scene.sphere.radius
    if discr < 0.0 then []
    else
        let normalAtTime t = norm (pointAtTime ray t - scene.sphere.center)
        let (t1,t2) = (-sv + sqrt(discr), -sv - sqrt(discr))
        [ (t1, { normal = normalAtTime t1; point = pointAtTime ray t1; ray = ray; sphere=scene.sphere });
          (t2, { normal = normalAtTime t2; point = pointAtTime ray t2; ray = ray; sphere=scene.sphere }) ]

Just for a sanity check let’s modify the guts of the nested for-loop that computes the image just to make sure that what we have got is still producing a correct image.

for x in 0..(width-1) do
        for y in 0..(height-1) do
            let rayPoint = vpc + float(x-width/2)*pw*u + float(y-height/2)*ph*v
            let rayDir = norm (rayPoint - scene.camera.position)
            let ray = { origin = scene.camera.position; direction = rayDir }
            match castRay ray scene with
            | [] -> bmp.SetPixel(x,y,Color.Gray)
            | _ -> bmp.SetPixel(x,y,Color.Red)

The output is now the same image we were getting at the end of the previous article. Now for some shading. What we need to do is create a function that will transform a list of intersections into the correct colour value. The process for this will be

  1. Get intersection with the lowest t value (closest to the camera).
  2. Add the ambient, diffuse and specular lighting components for this ray on this sphere with this surface normal.
  3. Return the aggregate colour value

The fact that I can easily break the function into 3 parts like that suggests it shouldn’t be a single function, but we are going to refactor later so let’s just go for it.

Before we can light something, we need to give it a colour, so let’s give our sphere some colour. Because we’re going to be doing some operations on colours we’ll define our own colour type. The built-in .NET colours don’t have affordance for multiplying or adding colours together or multiplying colours by scalar values.

type Color(r: float, g: float, b:float) =
    member this.r = r
    member this.g = g
    member this.b = b
    static member ( * ) (c1:Color, c2:Color) =
        Color (c1.r*c2.r, c1.g*c2.g, c1.b*c2.b)
    static member ( * ) (c:Color, s:float) =
        Color (c.r*s, c.g*s, c.b*s)
    static member ( + ) (c1:Color, c2:Color) =
        let r = Math.Min (c1.r+c2.r, 1.0)
        let g = Math.Min (c1.g+c2.g, 1.0)
        let b = Math.Min (c1.b+c2.b, 1.0)
        Color (r,g,b)
    static member Zero = Color(0.0,0.0,0.0)

Here we’re defining a simple 3-tuple for RGB colour values where 0.0 <= colour <= 1.0. We're defining operations on these types in the same way you would in C# or other OO languages. We've defined a multiplication operation for both scalar and colour multiplication.

Quick note on spelling. I’m from New Zealand where we use mostly English spelling for words like “colour”. However in my code I use American spelling because I feel that is the de factor worldwide standard for programming.

One last thing to do before computing some shading here is to actually place a light in the scene. We’ll also set the scene’s ambient light. In this code the Light type is new, the Scene and Sphere types are modified.

type Sphere = { center:Point3D; radius:float; diffuseColor:Color }
type Light  = { position:Point3D; color:Color }
type Scene  = { camera:Camera; sphere:Sphere; ambientLight:Color; light:Light }

Also modify our scene declaration a bit

    // scene
    let light = { position=Point3D(0.0,3.0,4.0); color=Color(0.8,0.8,0.8) }
    let scene = { camera=camera; sphere=sphere; ambientLight=Color(0.2,0.2,0.2); light=light }

Okay, let’s do some light computation. We’ll start with the simplest one – ambient light. Ambient light is given by the expression
I = {k_d}{I_a}, ~ 0 <= k_d <=1
where I_a is the intensity of our ambient lighting in the scene and k_d is the diffuse reflectivity of the sphere’s material. Sometimes people model ambient light with k_a, giving objects a different diffuse and ambient reflectivity. In my opinion this case is rare enough that we can skip it for now.

let colorAt intersections scene =
    let closest = List.maxBy(fun i -> i.t) intersections
    let kd = closest.sphere.diffuseColor
    let Ia = scene.ambientLight
    Ia * kd

And a small modification to the nested for-loop to make this all work.

    for x in 0..(width-1) do
        for y in 0..(height-1) do
            let rayPoint = vpc + float(x-width/2)*pw*u + float(y-height/2)*ph*v
            let rayDir = norm (rayPoint - scene.camera.position)
            let ray = { origin = scene.camera.position; direction = rayDir }
            let intersects = castRay ray scene
            match intersects with
            | [] -> bmp.SetPixel(x,y,Color.Gray)
            | _ -> let color = colorAt intersects scene
                   bmp.SetPixel(x,y, Color.FromArgb(255, (int)(color.r*255.0), (int)(color.g*255.0), (int)(color.b*255.0)))


We should really create a new function to do the transformation between our Color type and the built-in one, but for now we’ll leave it. Running the program now results in the image on the right. This is the sphere lit only by our 20% ambient light.

Now for diffuse lighting. The diffuse portion is the diffuse colour of the sphere multiplied by the projection of a vector from the intersection point to the light source (normalized) on the normal vector (normalized) of the intersection point. In other words, the dot product of these two vectors. Mathematically, diffuse lighting is given by
I = {k_d}(L . N)I_d
where k_d is the material’s diffuse reflectivity (as before), L is the vector between the light source and the intersection point (light.position – intersection.point), N is the normal of the intersection point (both normalized) and I_d is the intensity of the light. (That’s a dot product between L and N, amazingly my maths library doesn’t seem to have a dot product character!).

So we add diffuse reflection and ambient together:

let colorAt intersections scene =
    let closest = List.maxBy(fun i -> i.t) intersections
    let kd = closest.sphere.diffuseColor
    let Ia = scene.ambientLight
    let Id = Math.Max(0.0, Vector3D.DotProduct(norm (scene.light.position - closest.point), norm closest.normal))
    let ambient = kd * Ia
    let diffuse = kd * Id
    ambient + diffuse

We’re getting there! See the image on the left. I’d like to take a moment to explain that List.maxBy call above, because it is quintessential functional programming. If you have never done functional programming before it probably looks a bit strange. Something to understand about this paradigm is that functions are first-class objects. This means that functions can be passed around as parameters and returned from other functions. Essentially we can treat functions as though they were data members in functional languages.

List.maxBy is a function that takes two parameters – the first parameter is another function which takes some object of the list type (T’) and returns some other object of type U’ with which we make the comparison. The second parameter is a list of type T’ and the return type is the T’ which has the maximum value of all of the U’s associated with the T’s in the list. In this case T’ is type Intersection and U’ is type float (the t value of our intersection). So we’re asking the List.maxBy function to give us the Intersection in this list which has the highest t value. Understand? It’s a little tricky if you’ve never seen it before, but it’s a very powerful technique.

All we have left to compute is the specular component and we have a complete lighting model. Specular light can be given as
I = {k_s}{(N . H)^alpha}I_s
where k_s is the specular reflectivity of the material (the same as ambient/diffuse reflectivity earlier, but usually a white colour), N is the surface normal at intersection (normalized), I_s is the specular intensity of the light source, alpha is a material constant (usually on the order or 100-500) and H is a vector called the “half-way vector”. H is the vector half way between the view vector (camera.position – intersection.point) and the light vector (light.position – intersection.point). I won’t go into detail about the geometry that makes this work because it has been covered all over the place (see Blinn-Phong shading model), but suffice it to say that this expression will give us nice-looking specular highlights :) You can vary the value of alpha to change the size and sharpness of the specular highlight – this value is often referred to as the “shininess” value of a material. So here is the modified code to include all three shading modes.

let colorAt intersections scene =
    let closest = List.maxBy(fun i -> i.t) intersections
    let kd = closest.sphere.diffuseColor
    let Ia = scene.ambientLight
    let L = norm (scene.light.position - closest.point)
    let Id = Math.Max(0.0, Vector3D.DotProduct(L, closest.normal))
    let V = scene.camera.position - closest.point
    let H = norm (L + V)
    let Is = Math.Pow(Math.Max(0.0, Vector3D.DotProduct(H,closest.normal)), 500.0)
    let ambient = kd * Ia
    let diffuse = kd * Id
    let specular = Color(1.0,1.0,1.0) * Is
    ambient + diffuse + specular

This is a little messy, but we’ll tidy up later! I have actually cheated here and just used pure white light for I_s. We will implement a material type in the next article.

Notice how when we add all of these shading methods together we get the expression
I = {k_a}{I_a} + {k_d}(L . N)I_d + {k_s}{(N . H)^alpha}I_s
When dealing with multiple light sources, this will become
I = {k_a}{I_a} + sum{m subset lights}{}({k_d}(L_m . N)I_dm + {k_s}{(N . H)^alpha}I_sm)
Which is the Phong shading model! (modified to use the Blinn-Phong specular method). We will implement this multiplicity (of lights and shapes) in the next article (now available). Stay tuned!

Source code so far.

Tagged , , , , , , ,

Ray Tracer in F# – Part I

In this series of posts I’m going to demonstrate how to implement a ray tracing engine in F#. The purpose of these posts isn’t just to build a ray tracer – instead, they are put together as a tutorial on the F# language. This program will demonstrate most (if not all) of F#’s major features and show you how to use standard F# patterns and idioms.

I don’t know how long this series will be at this stage. I chose to use a ray tracer for a couple of reasons. Firstly, I already implemented a basic ray tracer in F# for a university assignment last year (see image on right), so I have the skeleton of the code available to me already (although the final product will be much more polished). Another reason is that ray tracing is an excellent application of functional programming. It’s a process that is conducive to immutable data structures, side-effect-free functions and stateless programming. It’s also simple enough to be able to blog about, but complex enough to not seem like a “toy example” to demonstrate F# in.

You should be warned before going into this that there will be some mathematics. For the purposes of learning F# it’s not imperative that you understand the maths, but if you want to understand the ray tracer itself then it will certainly help. I will explain the ray tracing background to everything that I do, but not in excruciating detail, because I want F# to be the focus here.

So let’s begin. First of all, what is a ray tracer? Ray casting (not tracing) is the process of casting a “ray” from some imaginary eye point, through a screen and into a scene. Whatever the ray strikes in the scene is what we will display at the point on the screen where that ray passed through it. If we repeat this process at every point on the screen that will correspond to a pixel, then we can build an image of the scene. This image might help illuminate things.

These images come from Richard Lobb's ray caster demo, used with permission.


The eye point is the black sphere on the left. You can see a ray for each pixel on the “screen” passing from the eye point and through the center of each pixel. Where the ray intersects an object in the scene (everything behind the screen) the pixel that ray ran through becomes coloured appropriately. There will also be a default “background” colour for rays which intersect nothing in the scene.

I should note that some people refer to this as “reverse ray casting, which actually makes a bit more sense. Obviously in real life rays travel from light sources and into the eye. In our ray caster we do that backwards. Ray tracing is the logical extension of this idea when we take into account things like reflection and refraction. We will define this more rigorously when we get there, but for this iteration we will stick to just ray casting.

So that’s the extremely brief introduction to the geometry of ray casting, before we get into the code we need just one more very quick intro lesson, and that’s lighting. I encourage you to explore this further yourself because I’m going to be very brief here. We’re going to use a simple Phong illumination model (actually it’s a slightly modified Phong model known as Blinn-Phong) to light our objects. This takes into account the ambient, specular and diffuse components of light, and we’ll (eventually) be applying this to each of multiple light sources in a scene. The referenced Wikipedia articles are actually a pretty good start to understanding this.

So let’s get started. The very first thing we are going to do is just render a flat sphere with no shading at all. It’s not very impressive but getting something onto the screen ensures we have a decent understanding of the ray tracing basics.

Let’s set up some simple data types for our ray tracer.

open System
open System.Windows.Media.Media3D
open System.Drawing
open System.Windows.Forms

type Sphere = { center:Point3D; radius: float }
type Camera = { position:Point3D; lookAt:Point3D; lookUp:Vector3D }
type Scene  = { camera:Camera; sphere:Sphere }
type Ray    = { origin:Point3D; direction:Vector3D }

We’re going to start very basic, with a scene that has no lighting and just a simple sphere. These type declarations give us a jumping-off point. When I’m writing F# code, especially when I’m just getting started, I like to throw everything into F# interactive, just to give myself a sanity check that my types all look right. You can quickly do this by hitting ctrl-A to select all of your code and hitting alt-enter to execute it in F# interactive.

One annoying thing about the System.Windows.Media.Media3D.Vector3D type is that there is no built-in immutable way to normalize vectors. Calling the Normalize method will actually mutate the vector you call it on, so let’s quickly define our own functional version of normalization.

let norm (v:Vector3D) =
    let abs = sqrt (v.X * v.X + v.Y * v.Y + v.Z * v.Z)
    v / abs

Unfortunately for this ray tracer there’s a fair bit of annoying plumbing code to make things work. I keep this function at the bottom of the main source file. Most of what is in it is not difficult to understand, it’s just necessary.

do
    let width = 480
    let height = 320

    // Vertical and horizontal field of view
    let hfov = System.Math.PI/3.5
    let vfov = hfov * float(height)/float(width)

    // Pixel width and height
    let pw = 2.0 * System.Math.Tan(float(hfov/2.0))/float(width)
    let ph = 2.0 * System.Math.Tan(float(vfov/2.0))/float(height)    

    let box = new PictureBox(BackColor = Color.White, Dock = DockStyle.Fill, SizeMode = PictureBoxSizeMode.CenterImage)
    let bmp = new Bitmap(width,height)

    // sphere
    let sphere = { center=Point3D(1.0,1.0,1.0); radius=0.4 }

    // camera
    let camera = { position=Point3D(0.0,0.0,0.0); lookAt=Point3D(1.0,1.0,1.0); lookUp=Vector3D(0.0,1.0,0.0) }

    // scene
    let scene = { camera=camera; sphere=sphere }

    // set up the coordinate system
    let n = norm (camera.position - camera.lookAt)
    let u = norm (Vector3D.CrossProduct(camera.lookUp, n))
    let v = norm (Vector3D.CrossProduct(n, u))
    let vpc = camera.position - n

    for x in 0..(width-1) do
        for y in 0..(height-1) do
            let rayPoint = vpc + float(x-width/2)*pw*u + float(y-height/2)*ph*v
            let rayDir = norm (rayPoint - scene.camera.position)
            let ray = { origin = scene.camera.position; direction = rayDir }
            match castRay ray scene with
            | true -> bmp.SetPixel(x, y, Color.Red)
            | false -> bmp.SetPixel(x, y, Color.Gray)

    bmp.Save("output.jpg")

I’ll explain some of this, because it’s not all completely obvious, although there’s nothing particularly fancy in F# terms in here.
hfov andvfov define our horizontal and vertical fields of view. Vertical field of view is defined in terms of the horizontal field of view and the aspect ratio (determined by the width and height of our screen). This ensures a correct aspect ratio in our images.

pw and ph are pixel width and pixel height. Because we want to send our rays through the center of each pixel, we can’t think of pixels are infinitely small. Each pixel has some height (defined in terms of world coordinates) and width. Hopefully you understand enough trigonometry to see how we got these values in terms of the field of view (an angular measurement).

The next few lines define some really basic things like where the camera is and where it’s pointing, where our sphere is, etc.

Next is the really tricky part (if you have never done 3D graphics before). If you don’t understand how points and vectors interact in terms of addition and subtraction you may have some trouble understanding this. Basically what we’re after here is a set of three lines which are mutually perpendicular and a single (world) unit in length. These lines will define our camera’s coordinate system. So we start by defining an arbitrary line, n, as being the vector between the point the camera is looking at and the point where the camera is located, normalized. This gives us some vector as long as the camera is not located exactly where it is looking.

We then find a vector orthogonal to this one by taking the cross product of n and the camera’s look-up vector. Because a cross product gives a vector mutually perpendicular to the input vectors, this is guaranteed to be perpendicular to n. The constraint on this is that the camera must not be looking directly upwards, or u will be zero in magnitude and impossible to normalize.

To find a third orthogonal vector we simple take the cross product of the former two vectors and normalize. There we have it, three mutually orthogonal normalized vectors! Note also that n is negative in the direction that the camera looks. We use this to create vpc, or “view port center”, a point one unit along the camera look-at vector.

The nested for-loop construct should be fairly straight-forward (if not especially functional, but don’t worry, we’ll refactor this as we go). We’re filling in a bitmap grid so we’re just going to loop over each pixel in the grid. For each pixel we are calculating rayPoint, which is some arbitrary point on the ray, which allows us to calculate rayDir. Studying these two lines for a few minutes should allow you to see how this works – the maths isn’t terribly complicated. From this ray direction we can compose a ray our of the camera position and the ray direction. We then cast the ray, and if it hits we render the pixel red, otherwise black.

I’d like to take a small digression to explain that match syntax. This keyword allows for pattern matching, a very powerful technique in F#. We will get into more complex pattern matching scenarions later in this series. The basic idea is that you can pass some expression into a match function along with a series of pipe-separated options for possible matches. The code under the option that matches the expression first is the branch that will execute. This may sound somewhat basic (and with mine looking pretty much like an if-statement, it is in this case) but this actually facilitates some really cool stuff. If you just can’t wait until my next couple of posts, I highly recommend doing some Googling to find out what makes this so interesting.

So pretty much all we have left to do is the castRay function. Again I will post the code and then explain.

let castRay ray scene =
    let s = ray.origin - scene.sphere.center
    let rayDir = norm ray.direction
    let sv = Vector3D.DotProduct(s,rayDir)
    let ss = Vector3D.DotProduct(s,s)
    let discr = sv*sv - ss + scene.sphere.radius*scene.sphere.radius
    if discr < 0.0 then false
    else true

The gist of this is that if the ray intersects the sphere we will return true, otherwise false. For general ray-casting or ray-tracing this isn’t especially interesting, and we’ll change it later, but for now it allows us to at least render our sphere.

I’ll quickly go over the maths of this function because if you haven’t done a lot of geometry you’re probably wondering WTF is going on there. My crappy variable names probably aren’t helping.

We can define the ray parametrically as r(t) = r_0 + {r_d}t with r_0 = [x_0, y_0, z_0] and r_d = [x_d, y_d, z_d].
Further, we can define our sphere as the set of points [x_s, y_s, z_s] such that (x_s - x_c)^2 + (y_s - y_c)^2 + (z_s - z_c)^2 = {S_r}^2 where [x_c, y_c, z_c] is the sphere’s center point and S_r is the sphere’s radius. Solve by substituting the ray equation into the sphere equation to get
Ray:
x = x_0 + {x_d}t
y = y_0 + {y_d}t
z = z_0 + {z_d}t
Sub into the sphere equation for x, y and z
(x_0 + {x_d}t - x_c)^2 + (y_0 + {y_d}t - y_c)^2 + (z_0 + {z_d}t - z_c)^2 = {S_r}^2
if you expand this out you’ll find you get
At^2 + Bt + C = 0
A = {x_d}^2 + {y_d}^2 + {z_d}^2
B = 2(x_d (x_0 - x_c) + y_d (y_0 - y_c) + z_d (z_0 - z_c))
C = (x_0 - x_c)^2 + (y_0 - y_c)^2 + (z_0 - z_c)^2 - {S_r}^2

With a bit of inspection you should be able to see how this fits into our F# code above. If the ray direction is normalized then A will always be 1. You can see that B is the dot product of the ray’s direction and a vector between the ray origin and the sphere’s center (sv in the code) and C is the vector between the ray’s origin and the sphere’s center, dotted with itself (ss in the code). Google “sphere ray intersection” for more detail.

So now that we have an expression for the intersection points of a ray and a sphere, all we’re doing is seeing if the intersection actually exists by solving the quadratic, using the regular quadratic formula. In this case we’re just checking for the discriminant, because if it’s negative then there is no intersection, otherwise there are two. For now we don’t care where the intersections happened, just that they did.

Now we put all of that together and we get this:

Okay, so it’s not exactly thrilling, but it is a genuine ray caster. The next step is to add some lighting to make this look like a real sphere. The next article will be posted within a weekis now available. To whet your appetite, here is an image and a quick video of what the product will look like after just a few articles.

Source code so far

Tagged , , , ,

Poor UI Design can Kill

On January 20, 1992, Air Inter Flight 148 crashed into the Vosges Mountains while approaching Strasbourg Airport in Strasbourg, France. 87 of the 96 crew and passengers on board were killed. So why am I writing about a horrible accident on my programming blog? Because of the cause of the crash.

Many factors came together to cause Air Inter Flight 148 to hit a mountain. However one of the primary causes should be of concern to anyone developing software or designing the UI over top of it. The Airbus A320 had the most advanced cockpit in the world. The majority of the gauges and instruments in front of the pilots in this machine are digital.

When the captain of the aircraft was approaching the runway through thick clouds, he thought he had set a descent angle of a mild 3.3 degrees for a gentle descent above the Vosges Mountains. However, unbeknownst to him the plane was descending at a rate of 3,300 feet per minute, much faster than the descent would have been at 3.3′. The aircraft struck a mountain top and exploded. Air accident investigators noticed the similarities between those two numbers – the pilot’s intended descent rate (3.3′) and the actual descent rate (3,300 fpm).

As it turns out, the airbus A320 used a single display for two different descent modes – Vertical Speed (VS) Mode and Flight Path Angle (FPA) mode. The display showed FPA as two digits separated by a decimal point, and VS as just two digits (and both had a minus sign where their values were negatives). To the right is an image of the display showing an FPA of -2.8′. I wasn’t able to find a picture of the display in VS mode, but you can see from the pictures in this study how similar the two displays are – nearly impossible to tell apart.

This simple design error led the pilot to believe he was in one mode while he was in another. It was not the only factor in the accident, but the crash almost certainly would not have happened had the aircraft’s user interface been more clear. This UI design error is one of a well-known class of errors called mode errors.

Despite the fact that anyone who has taken a semester in HCI knows that modes are a Bad Thing, they are still ubiquitous in today’s software, and they’re still as annoying as ever. Unfortunately it’s simply not easy to find alternatives to mode-based interactions sometimes. On the left is an image of Google Chrome in two different modes – regular and ‘private browsing’ mode. The difference between these modes is very subtle – a small picture with little contrast in an area of the window I never look. But the implications of this mode are not trivial. If you close a tab in ‘private browsing’ mode, you can’t simply ctrl-shift-T to bring it back from the dead. It’s gone forever. The browser will not remember your login details, your Amazon shopping cart or your browsing history. Obviously this is the whole point of private browsing, but it’s also significant that this mode is barely distinguishable from regular browsing mode unless you have already made the mistake of closing a tab you still wanted.

I faced my own problems with mode errors when building my university capstone project, a round-trip UML modelling tool. To draw a relationship between two classes, the user selects a relationship type from the palette on the right and then draws the relationship. But how should the tool behave after that? Is it better to immediately jump out of “relationship drawing mode” after the user draws the relationship? Or should the tool remain available for further use? Due to strict time and resource constraints we were never able to perform much usability testing, but from what I saw neither of those answers was correct. They were both awkward and they both led to errors frequently. The modal paradigm is simply broken beyond repair, in my humble opinion.

So I’m not exactly the first one to say it, but I guess the moral of the story is avoid modes whenever possible. Oh, and in case you’re wondering, the new Airbus A320 cockpit displays show Flight Path Angle as a two-digit number while Vertical Speed is shown with a 4-digit representation. It’s not a perfect fix, but it’s a heck of a lot better.

Tagged , , , ,

Writing Testable Web API Wrappers

Writing a wrapper for a web API seems like a pretty simple thing to do. I mean, most of the code is already written for you, all you are doing is adapting the response from some web sever into your own object model. In most cases your object model looks very similar to the server’s web service representation, so what’s difficult about this?

In fact there are several pitfalls and architectural decisions to be made when writing web API wrappers. I intend to write about some of these in the coming months, but today I’m going to talk about writing web API wrappers in a testable way, following a Test-Driven Development (TDD) approach to the problem. This basically means that before I write a lick of code, I must first write the tests to exercise the code.

My sample application is going to be very simple, with a single API method and a single test, against the Flickr API. It is not designed as a demonstration of how to structure an API wrapper, it is merely for the purposes of demonstrating the principles of this article.

Before we jump into the code, let’s carefully consider exactly what it is we want to test. Are we testing that the Flickr API gives us an XML response that looks like what we expect, given a valid input? NO!! We approach this problem under the assumption that the Flickr team has already thoroughly tested their API, and that given a valid input we will get the correct output. Do we test that our HTTP layer is in fact able to connect to the Flickr service and send a request? Again, no! Presumable whoever wrote the HTTP library we are using has tested this.

In preparing for this article I looked at a ton of REST API wrapper projects in the .NET, Ruby and Python communities, wrapping Twitter, Flickr, Chargify and others. Nearly all of them run “unit tests” that involve actual communication with the servers they are working against. This is not a unit test! These end-to-end tests are called ‘integration tests’ (or if they are very broad, ‘smoke tests’). They are valuable, but they are not unit tests. Unit tests need to be very fast and very reliable to run, in order to ensure developers run them frequently. Testing over a network connection and a server half way across the world is neither fast nor reliable, particularly when running hundreds of tests at a time.

So now that we have that rant out of the way, let’s look at how we structure our API wrapper in such a way that we can test it independently of the API itself. As I said, we’re using the Flickr API and I’m going to be calling a method that returns the public ‘favorites’ for a particular User ID, flickr.favorites.getPublicList. I’ll be working in C# with the Visual Studio testing tools, but the principles here apply to almost any language and framework.

Click to enlarge


First I create a new solution with a C# class library project and a unit test project. Don’t forget to reference the API project from the unit test project. After I delete the goo that comes with new projects and create a couple of my own classes my project looks like the image on the right.

Now we need to write the test. This has to happen before we write any code if we are following TDD principles. Let’s consider exactly what we want to test. We’re going to test a method on the Flickr class that should return a list of photos given a user ID. We are not going to test the HTTP layer or the service. This means that we need to mock up the service/HTTP stuff. Here is the code for our test:

using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using TestableApi;
using Moq;
using System.Xml.Linq;

namespace UnitTests
{
    [TestClass]
    public class FlickrTests
    {
        private const string ApiKey = "[My API key]";

        [TestMethod]
        public void GetFavouritesTest()
        {
            // Arrange
            string userId = "22860405@N02";
            string url = "http://api.flickr.com/services/rest/";
            var args = new List<KeyValuePair<string,string>>
            {
                new KeyValuePair<string,string>("method", "flickr.favorites.getPublicList"),
                new KeyValuePair<string,string>("api_key", ApiKey),
                new KeyValuePair<string,string>("user_id",userId)
            };
            var helper = new Mock<IWebHelper>();
            helper.Setup(q => q.HttpGet(url, args)).
                Returns(GetFavouritesResponse);

            Flickr flickr = new Flickr(helper.Object);

            // Act
            var result = flickr.GetFavouritesFor(userId);

            //Assert
            Assert.IsNotNull(result);
            Assert.AreEqual(1, result.Count);
            Assert.AreEqual(1, result.First().Id);
        }

        // Helpers
        private string GetFavouritesResponse
        {
            get
            {
                return new XDocument(
                    new XDeclaration("1.0", "UTF-8", "yes"),
                    new XElement("photos",
                        new XAttribute("page", "1"),
                        new XAttribute("pages", "1"),
                        new XAttribute("perpage", "100"),
                        new XAttribute("total", "1"),
                        new XElement("photo",
                            new XAttribute("id", "1"),
                            new XAttribute("owner", "22860405@N02"),
                            new XAttribute("secret", "abc"),
                            new XAttribute("server", "123"),
                            new XAttribute("farm", "1"),
                            new XAttribute("title", "title"),
                            new XAttribute("date_faved", "1200000000")))).ToString();

            }
        }
    }
}

I know this looks like a bit much for a unit test, but it’s really quite simple.

Arrange

This is where most of the action is happening. I’m using the Moq framework for mocking objects. If you haven’t used this before then it may look a little confusing. But read at the first few lines of GetFavouritesTest() like this: Given valid user ID, API endpoint URL and API arguments, my mock implementation of the IWebHelper interface should return valid XML. This is how we mock the service – we assume that given the right input we will get the expected output. The GetFavouritesResponse property is just some XML we would expect the server to send us given these arguments. So what we’re really testing is our class’s ability to transform this response into something useful, in this case a collection of Photo objects.

Act

This is nice and simple, we just call the method and store the output in our result variable.

Assert

And here we just check that the output is what we would expect, given the input.

So now we get to writing the actual code. I’m going to start with the Photo class, just to get it out of the way. Note that while doing true TDD you will be writing a test for Photo as well, before you create this Photo class. In this demonstration I’m going to skip this, and give a simplified version of Photo.

using System;
using System.Xml.Linq;

namespace TestableApi
{
    public class Photo
    {
        public Photo(XElement xmlNode)
        {
            Id = Convert.ToInt32(xmlNode.Attribute("id").Value);
            // other properties from XML
        }

        public int Id { get; set; }
        // all other Photo fields
    }
}

The Photo class takes an XElement argument and sets the properties as needed. Whether this is a good design or not is up for debate and is something I will be discussing in a future blog post, but it’s quick and it works, so it fits perfectly with the TDD mantra, Red-Green-Refactor.

Next we’re going to look at the IWebHelper interface.

using System.Collections.Generic;

namespace TestableApi
{
    public interface IWebHelper
    {
        string HttpGet(string url, IEnumerable<KeyValuePair<string,string>> args);
    }
}

In a proper system we would also have a POST method, but this works for now. Finally, Flickr class. For now we write just enough code to allow us to compile and run the test. We are expecting it to fail.

using System.Collections.Generic;
using System.Xml.Linq;

namespace TestableApi
{
    public class Flickr
    {
        private readonly IWebHelper webHelper;

        public Flickr(IWebHelper webHelper)
        {
            this.webHelper = webHelper;
        }

        public ICollection<Photo> GetFavouritesFor(string userid)
        {
            return null;
        }
    }
}

Notice how we are using dependency injection (in this case, constructor injection) to allow consumers of the API to inject their own implementations of our dependencies through the constructor. This is crucial to making your libraries testable. Without the ability to inject dependencies we are simply unable to provide mock services.

Now we run our tests, and as expected:

Cool! We have a failing test. Now it’s time to get to work in making it pass. Remember, in TDD we write just enough code to pass the tests, then STOP! In my opinion this discipline leads us to simple, easy-to-read code and saves lots of time in fluffing about writing code that is never fully exercised. I’ll just go ahead and paste in the complete Flickr class:

using System.Collections.Generic;
using System.Xml.Linq;

namespace TestableApi
{
    public class Flickr
    {
        private const string ApiKey = "[My API Key]";
        private readonly IWebHelper webHelper;

        // Uncomment this when we create and test the real concrete
        // WebHelper
        //public Flickr() : this(new WebHelper()){ }

        internal Flickr(IWebHelper webHelper)
        {
            this.webHelper = webHelper;
        }

        public ICollection<Photo> GetFavouritesFor(string userid)
        {
            var result = new List<Photo>();

            var args = new List<KeyValuePair<string,string>>();
            args.Add(new KeyValuePair<string,string>("method", "flickr.favorites.getPublicList"));
            args.Add(new KeyValuePair<string,string>("api_key", ApiKey));
            args.Add(new KeyValuePair<string,string>("user_id", userid));

            string resp = webHelper.HttpGet("http://api.flickr.com/services/rest/", args);
            var doc = XDocument.Parse(resp);
            foreach (var node in doc.Element("photos").Elements("photo"))
                result.Add(new Photo(node));

            return result;
        }
    }
}

A quick note about the constructors: notice how I made the injectable constructor internal. Whether you do this or not is up to you and will depend on the nature of your project and your dependencies. In this case, I don’t want to expose the IWebHelper interface to my library users and they will have no need to use this constructor. Clever readers may have noticed that the IWebHelper interface is publicly exposed. I have to do this so that the Moq mocking framework will be able to mock it in the test. If someone knows a way around this so that I can keep IWebHelper internal, I’d love to hear it. Also if you’re using C#, to make internal members available to the unit test project, simply add this line to your project’s AssmbleInfo.cs file:

[assembly: InternalsVisibleTo("UnitTests")]

(Replace “UnitTests” with your unit test project’s name).

Now when we run the tests:

:)

Summary and Remarks

First, a couple of remarks on this code. Yes, there absolutely should be more unit tests in this code. Obviously this is a demonstration project, so take it for what it is – not production code. Some people may also take issue with the granularity of the unit test I provided. This test is technically exercising more than one functional unit of work. It tests that our method actually calls into the web helper class, and it also tests that it handles the output of this class appropriately. I absolutely agree that this should be refactored and broken into more tests. The purpose of this article was to show how a project can be structured in such a way that it is testable, so again, take it for what it is.

To summarize, here are the steps we took to make our project nicely unit testable.

  • Identify exactly which parts of the system we want to test, and which parts we don’t
  • Where we are interacting with parts of the system that are not being tested, hide them behind interfaces
  • Provide affordance for your class library consumers to inject their own implementations of these dependencies, as seen in the Flickr class constructors above
  • In the test, mock these interfaces and provide appropriate output from the mock given the input. Don’t forget to run tests with mocks that behave correctly given bad input as well as good.

Hope this helps, thanks for reading.

Tagged , , , , , , , ,

Collision Handling the OO Way with C# Multiple Dispatch

Update: The code in this post has been updated due to an error in the first publication.
Collision handling has always been a tricky thing with OO languages. Historically, most strongly-typed (and therefore performant) OO languages have not been multiple-dispatch languages. Read the linked article for details, but the upshot of this is that the exact method overload to be called in most OO languages is determined at compile-time. This is an issue for tasks like collision handling, where we want to use abstract implementations of our types, rather than concrete types, yet we still need to handle collisions differently depending on the colliding types.

Typically this problem is solved by using type switches, enums or similar non-OO techniques. But no more! In C# 4.0 we were introduced to the dynamic keyword. dynamic allows us to tell the compiler, “hey, compiler, get your sticky hands off my method calls and let the runtime decide how to handle this”. Method calls are dynamically dispatched at runtime. So what’s the big deal? Well in terms of collision handling it allows us to write much more elegant and robust code. We no longer have need for type switches, and each class is responsible for handling how it responds to collisions with each other class, or ignore collision altogether.

Here’s a little sample program I put together to demonstrate how I handle this. Note how concrete classes are able to handle the collisions they are interested in, and ignore the ones they don’t want to handle (these are delegated to the default implementation). This is my first crack at OO collision handling, so if you have any critique or a better technique, pleae leave a comment.

using System;

namespace TestGame
{
    class Program
    {
        static void Main(string[] args)
        {
            Entity w = new Wall();
            Entity s = new Snake();

            ((dynamic)w).Collide((dynamic)s);
            ((dynamic)s).Collide((dynamic)s);
            ((dynamic)s).Collide((dynamic)w);
            ((dynamic)w).Collide((dynamic)w);
            Console.ReadLine();
        }
    }

    abstract class Entity
    {
        public virtual void Collide(Entity other)
        {
            // Do nothing in default implementation
        }
    }

    class Snake : Entity
    {
        public void Collide(Snake s)
        {
            Console.WriteLine("Snake collide with snake");
        }
        public void Collide(Wall wall)
        {
            Console.WriteLine("Snake collide with wall");
        }
    }

    class Wall : Entity
    {
        public void Collide(Snake snake)
        {
            Console.WriteLine("Wall collide with snake");
        }
    }
}

And the output from this program

C# Tip: Override Equals() on Value Types for Better Performance

In C# we have the option of using value types (structs) or reference types (classes). The distinction is important and you should consider carefully which kind of object your types should be. In this post I won’t go into detail about how to make this choice, but I’ll give you a quick tip for improving performance in your application.

The default implementation of instance Equals() depends on your object type. Reference types inherit System.Object’s default implementation, which is a simple object identity (ReferenceEquals()) check. By default, reference types will return true on instance Equals() if and only if they are the same object instance.

Value types, on the other hand, inherit from System.ValueType. This class overrides System.Object.Equals with an implementation that uses reflection to check each field in the struct for equality (using instance Equals()). As you may know, reflection is a powerful tool but is also very inefficient. To drive the point home I wrote a quick program to check the difference in performance between the default equality comparison and a custom implementation that simply compares the fields of the struct, but without using reflection to inspect them.

  class Program
    {
        static void Main(string[] args)
        {
            Address a1 = new Address()
            {
                StreetName = "Some Street",
                StreetNumber = 1
            };
            Address a2 = new Address()
            {
                StreetName = "Other Street",
                StreetNumber = 2
            };

            AddressWithEquals a3 = new AddressWithEquals()
            {
                StreetName = "Some Street",
                StreetNumber = 1
            };
            AddressWithEquals a4 = new AddressWithEquals()
            {
                StreetName = "Other Street",
                StreetNumber = 2
            };

            StringBuilder builder = new StringBuilder();
            builder.Append("Iterations,WithoutEquals,WithEquals\n");
            for (int i = 5; i < 10; i++)
            {
                int iterations = (int)Math.Pow(10.0, (double)i);
                bool b;

                DateTime start = DateTime.Now;
                for (int j = 0; j < iterations; j++)
                {
                    b = a1.Equals(a2);
                }
                double finish1 = (DateTime.Now - start).TotalMilliseconds;
                start = DateTime.Now;
                for (int j = 0; j < iterations; j++)
                {
                    b = a3.Equals(a4);
                }
                double finish2 = (DateTime.Now - start).TotalMilliseconds;

                builder.Append(String.Format("{0},{1},{2}\n", iterations,finish1,finish2));
            }

            StreamWriter writer = new StreamWriter(@"C:\Users\Martin\Desktop\structOutput.txt");
            writer.Write(builder.ToString());
            writer.Close();
        }
    }

    struct Address
    {
        public string StreetName { get; set; }
        public int StreetNumber { get; set; }
    }

    struct AddressWithEquals
    {
        public string StreetName { get; set; }
        public int StreetNumber { get; set; }

        public override bool Equals(object obj)
        {
            if (obj == null) return false;
            if (obj.GetType() != this.GetType()) return false;

            AddressWithEquals other = (AddressWithEquals)obj;
            return other.StreetName == this.StreetName &&
                   other.StreetNumber == this.StreetNumber;
        }
    }

What I found was that using your own implementation was consistently as much as 7 times faster than using the default implementation. The ratio between the two running times was nearly perfectly constant as the number of iterations increased, and as I varied the number of fields for comparison.

Here are a couple of graphs showing some of my results. The data is identical, but the second graph uses a logarithmic scale on the Y-axis to match that on the X-axis. The difference in performance is approximately 6-fold, sometimes as high as 7-fold. You can see how perfectly it scales with iterations. The scaling with the number of fields is the same.

Equality checking is a fundamental operation. It is often said that you should never optimize your code until you find a performance bottleneck. In my opinion equality comparison of structs in C# is an exception – always override .Equals() with your structs.

Tagged , , ,

The Broken Java Set Interface

Finding flaws in the Java Collections framework is like shooting fish in a barrel. From an OO design point of view the framework is just plain bad. From the infamous “vector is a list” gaffe to consistent use of inheritance for implementation, it’s enough to send shivers down the spine of any object fanatic.

Today I want to post about an issue I came across in the Java collections hierarchy that I had never seen before. This is a trap that is easy to fall into, so I was surprised that I had never read about it in the many articles I have read about Java collections. The problem is simple and easy to understand, yet it is insidious enough to potentially lead you into a debugging rabbit hole. The problem arises from Oracle’s neglect of contracts (sorry Oracle, when you acquired Sun you acquired their baggage too!).

Java Set Implementations

The wonderful thing about OO design, abstraction and interfaces is the implicit guarantee that if I have some object that conforms to some interface, I don’t need to care what kind of object I’m dealing with. It doesn’t matter if I’m driving a Honda Accord or a Mac truck, when I turn the wheel clockwise I expect my vehicle to turn right accordingly. The fact that the Mac truck implements this through a huge steering box while my little Accord relies on a flimsy rack and pinion system doesn’t matter to me, the driver. The vehicle has exposed an interface to me – a steering wheel – which means they have agreed that it will behave like every other steering wheel I have used. Simple, elegant and the essence of object design.

So let’s take a look at the two main implementations of Java.Collections.Set. We have HashSet and TreeSet, two fundamentally different implementations with very different performance characteristics. All I know is that if someone gives me a Set, it better behave like every other Set I have used.

So what’s the problem here? The problem is, these Sets don’t behave the same, in a subtle but important way. Let’s explore by example. Inspect the following code and try to predict the outcome.

import java.util.*;

public class SetTest {

	Set<Thing> hashSet;
	Set<Thing> treeSet;

	public SetTest() {
		hashSet = new HashSet<Thing>();
		treeSet = new TreeSet<Thing>();

		Thing t1 = new Thing("Thing 1");
		Thing t2 = new Thing("Thing 2");

		hashSet.add(t1);
		hashSet.add(t2);
		treeSet.add(t1);
		treeSet.add(t2);

		System.out.println("Before t1.setString");
		System.out.println("HashSet contains t1: " + hashSet.contains(t1) + ", t2: " + hashSet.contains(t2));
		System.out.println("TreeSet contains t1: " + treeSet.contains(t1) + ", t2: " + treeSet.contains(t2));
		System.out.println();

		t1.setString("Different String");

		System.out.println("After t1.setString");
		System.out.println("HashSet contains t1: " + hashSet.contains(t1) + ", t2: " + hashSet.contains(t2));
		System.out.println("TreeSet contains t1: " + treeSet.contains(t1) + ", t2: " + treeSet.contains(t2));
		System.out.println();
	}

	public static void main(String[] args) {
		new SetTest();
	}
}

class Thing implements Comparable<Thing> {
	private String str;

	public Thing(String str) {
		this.str = str;
	}

	public void setString(String newStr) { str = newStr; }

	public int compareTo(Thing o) { return str.compareTo(o.str); }

	@Override
	public boolean equals(Object o) {
		// flawed equals method for simplicity
		if (o instanceof Thing) return ((Thing)o).str.equals(str);
		return false;
	}

	@Override
	public int hashCode() {
		// flawed hashcode method for simplicity
		return str.hashCode();
	}
}

For those who aren’t in the mood for reading code, here’s what’s happening. We have some class Thing, with a String field, str. This class identifies as its field, str. This means it must also hash on this field. There have been many words written on the theory of hashCode/equals, so I’ll leave it to you to Google around if you don’t know why it’s important to define these methods together.

We now instantiate two Thing objects and two Sets of Things, one HashSet and one TreeSet. The Things are added to each set, and we check that they are indeed in the sets. We then change the name of one Thing and check again. The result?

Before t1.setString
HashSet contains t1: true, t2: true
TreeSet contains t1: true, t2: true

After t1.setString
HashSet contains t1: false, t2: true
TreeSet contains t1: true, t2: true

The HashSet no longer thinks it contains the Thing! So why did this happen? It’s probably fairly obvious by now, but I’ll make it explicit. A HashSet uses a hash table to store values. It uses the hash algorithm to figure out if a value is contained in the table. This means that a HashSet is not only a Set, it’s also a Map, but the details of the Map are hidden behind the Set interface. This seems logical until you consider the contract for a Map.

Maps rely on key values that do not change. It is imperative that the key in a mapped object is immutable in order to find the object later. When we changed that str field in Thing, we implicitly altered the key (the hash code of the previous String). Unfortunately this change was not abstracted away properly in the Set. When we use the contains method we are asking the hash table for an object that is identical to an object already in the set, but that hashes differently to how it hashed when first added.

Incidentally this bug also allows us to add duplicate values to a HashSet.

The result of all of this is that we cannot rely on all Java Sets to behave the same. In our own code we could just start using TreeSets if we know we want to have mutable hashable values. But what if we are given a Set by some other piece of code? The whole point of OO encapsulation and information hiding is that we aren’t supposed to care how it’s implemented. Unfortunately we simply can’t behave like this in the Java Collections framework.

It should be noted that Sun/Oracle did seem to consider, at some point, the issues with mutable Set members. In the Set API they state

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

but then fail to follow through with anything at all in the HashSet API description. Regardless, this business of noting in your API, “by the way, depending on the implementation things might break” is not acceptable OO practice, in my opinion.

So what is the solution? If we were to re-implement the Set interface and implementations, what would we do differently? One hacky solution is to force the HashSet to perform a linear search on its hash table when contains() misses. This seems to solve the problem, but we lose the main advantage of a HashSet, O(1) adding, removal and containment checking. Is there a better solution to this problem? What if we were to rewrite the Set hierarchy from scratch, what would we do differently? I don’t have the answers right now, but it’s something I’ll be thinking about.

Tagged , , , ,

Implementing the Mandelbrot set in F#

Mandelbrot Set SmallOne of my favourite things to do when learning a new language (or graphics package for that matter) is to write an implementation of a Mandelbrot set viewer. So here’s my F# version, annotated to help out anyone who is trying to get to know this language.

I’m still quite new to F# so if you spot something that just looks plain wrong (I know how you functional sticklers can be!) then let me know!

This program will help you understand

  • Interacting with .NET in F#, particularly Windows Forms
  • Recursion in functional programming
  • Working with the new .NET 4 Complex built-in numeric type

An understanding of complex numbers is assumed.

I’m going to start with the core functions of this program – the functions that will determine which points are in the set and which aren’t. I apologize in advance for the code coloring, I haven’t been able to find a code coloring WordPress plugin for F#. If you know of one, let me know! Thanks a lot to Christian Abildsø for suggesting a working F# syntax highlighter in the comments! Take a look at his blog for some more great F# posts.

// Recursively performs the transform Z -> Z^2 + c for the given number
// of iterations or until escapes the Z > |2| threshold
let rec transform (originalNum:Complex) (currentNum:Complex) maxIterations iterationNumber =
  if (iterationNumber = maxIterations) then 0
  else if (currentNum.Magnitude > 2.0) then iterationNumber
  else
    let sqr = Complex.Multiply(currentNum, currentNum)
    let newNum = Complex.Add(sqr, originalNum)
    let newIter = iterationNumber + 1
    transform originalNum newNum maxIterations newIter

// Returns whether or not the given complex number is in the set to the
// best possible accuracy we can get from the given number of iterations
let inMandelbrotSet (num:Complex) maxIterations =
  let iters = transform num num maxIterations 1
  if (iters = 0) then true
  else false

The inMandelbrotSet function takes a complex number and an int and outputs a boolean (Complex -> int -> boolean in Curried notation). You might notice I often prefer curried parameters rather than tuples – this is a personal preference and there really isn’t a good reason to choose one over the other. If you don’t know the difference then you should read this excellent blog entry.

Essentially all this function does is pass on the work of figuring out whether it’s in the set onto the transform function (Complex -> Complex -> int -> int -> int). transform recursively calls itself and checks if the value of the complex number “escapes” the threshold (I don’t want to turn this post into a tutorial on the Mandelbrot set, but suffice it to say that if you apply the Z -> Z^2 + c transformation an infinite number of times and at some point it gives a result |Z| > 2 then the value can not possibly be in the set). If it escapes it returns the number of iterations it took to escape, otherwise it returns 0.

So basically now we have a function, inMandelbrotSet that will tell us whether or not a complex number is in the set.

Let’s set up a Windows Form and a bitmap image to display the set.

// Setting up the UI components
let mainForm = new Form(Width = xSize, Height = ySize, Text = "Mandelbrot Set")
let box =
  new PictureBox
  (BackColor = Color.White, Dock = DockStyle.Fill,
   SizeMode = PictureBoxSizeMode.CenterImage)
mainForm.Controls.Add(box)

// Sets up the canvas and generates the set on it
let generateSet (width:int) (height:int) =
  let bmp = new Bitmap(width, height)
  let gfx = Graphics.FromImage(bmp)
  for x in 0..(width-1) do
    for y in 0..(height-1) do
      let c = complexCoordinates(float(x),float(y))
      if (inMandelbrotSet c 200) then bmp.SetPixel(x, y, Color.Black)
  bmp

xSize and ySize are constants we’ll define in a minute. generateSet is a function that creates a bitmap, builds the Mandelbrot set on it and returns the bitmap. The other stuff behind it is fairly self-explanatory, just setting up an ordinary .NET Windows form. Take a look at the constructor for the PictureBox – notice we can define values inline. This is pretty cool, it means that we can treat the PictureBox as an immutable object even though it really isn’t. Just helps to make things feel more “functional”.

Now basically the only thing left to do is define some constants and transform our Bitmap coordinates to numbers on the complex plane. Here’s the remaining code:

open System
open System.Drawing
open System.Windows.Forms
open System.Numerics

// Size of window and canvas
let xSize = 640
let ySize = 480

// Min and max complex coordinates to display on the canvas
let xMin = -2.0
let xMax = 1.0
let yMin = -1.2
let yMax = 1.2

// Size of each horizontal and vertical pixel in terms of complex numbers
let xFactor = (xMax - xMin) / float(xSize)
let yFactor = (yMax - yMin) / float(ySize)

// Transforms pixel positions into corresponding complex numbers
let complexCoordinates(x,y) =
  new Complex((x*xFactor + xMin), (y*yFactor - yMax))

The constants at the top of the file define the ratio of a pixel to a complex number on the plane – the xFactor and yFactor. This is for mapping bitmap pixels to complex numbers. Finally, complexCoordinates does the actual mapping, turning two integers into a complex number.

Finally we give the program an entry point:

// Kick start the program
[<STAThread>]
do
    box.Image <- ((generateSet xSize ySize) :> Image)
    Application.Run(mainForm)

That funny :> operator is a cast in F#. This particular version is an unchecked runtime cast. The arrow notation is a way of changing an object’s properties – generally something you don’t want to do in functional programming (where everything is immutable) but of course F# has to interact with the .NET framework, so it’s really a hybrid language.

Put it all together and this is what you get:

Mandelbrot Set in F#

Mandelbrot Set in F#

Tagged , , , ,