Skip to content

Splines and curves part I – Bézier Curves

In this series I want to give an introduction to curves and splines for the programmer. These are typically used in computer graphics, game design scenarios (especially camera motion) and engineering to generate smooth shapes from simple datasets.

Hopefully by the end of this series you will be able to not only generate your own curves in your program, but will understand why they are formed the way they are and have the tools to understand which curve technique to select for your own scenarios.

Splines were originally a tool for engineering draftsman for building smooth curves in disciplines like boat-building to generate smooth curves using control points. The flexibility of the flat metal or wood spline along with the placement and constraint of the control points defined the shape of the curve. Most spline techniques in mathematics and software borrow from these ideas to form the shapes desired by the user.

Wikipedia defines splines in the mathematical context as

In mathematics, a spline is a smooth polynomial function that is piecewise-defined, and possesses a high degree of smoothness at the places where the polynomial pieces connect (which are known as knots).

We’ll get into what exactly is meant by “high degree of smoothness” and the other technical terms as we go. For now you can think of a spline as a series of points which we will generally define parametrically over the unit interval, determined by some spline function. We are free to determine the number of points along the interval depending on the context in which the spline is to be used.

As an aside, did you know that the famous Utah Teapot is defined as a set of control points over which we build splines with an arbitrary number of points over each interval? The number of points drawn over each curve interval determines the number of polygons drawn, which is what is meant by the previous paragraph. Using a single small data set we can build a teapot with an arbitrary number of polygons. Neat, huh?

Approximation versus Interpolation

Before we get started it’s important to note the difference between an approximation technique and an interpolation technique for curve generation. All curves are defined interactively by placing control points. If the algorithm for generating the curve passes through all control points then it is referred to as an interpolation algorithm. Algorithms that build curves that are affected by the control points but don’t necessarily pass through them all are referred to as approximation algorithms. Which one you use will depend on your particular use case. We will explore the differences in more detail further down.

Bézier curves

The first curve we will investigate is the infamous Bezier curve. It should be noted that the Bezier curve is not, in fact, a spline, but it’s still a useful starting point. We will start with an example, and delve into some details of how this curve works and why. Note that the interactive examples in this article use the HTML5 canvas element so may not work correctly on older browsers and some mobile devices. Hover your mouse over the curve to see how it is constructed.



Hover over canvas to see line construction. Click & drag control points to modify line.

This is a quadratic Bezier, which is a fancy way of saying it’s a Bezier curve with 3 control points (the reason for the names quadratic, cubic etc will become clear soon). So what’s going on here? You can see when you hover your mouse over the above image that as we vary t, we take the parametric point t along each of the lines that form our control points (I have drawn these straight lines in light grey for your convenience). We then take these new points and draw a line between them. The point that determines the point t on our curve is the point t along this new line.

We can implement this quadratic Bezier using an algorithm known as de Casteljau’s Algorithm. It is essentially a sequence of tweening steps, in this case to generate a part of a parabola. Think of it like this: if we label the two straight lines defined by our three control points A and B, we linearly interpolate across these lines and then repeat the interpolation to find the point P(t) that resides a fraction t of the way between A(t) and B(t):
A(t) = (1 - t){P_0} + t{P_1}
B(t) = (1 - t){P_1} + t{P_2}
P(t) = (1 - t)A + tB

You can see we’re using the basic linear interpolation formula for parametric lines here. If you’re not familiar with linear interpolation then read up!

Anyway, by direct substitution (try this out on paper if you like) we can get to this formula for P(t):
P(t) = {{(1 - t)}^2}{P_0} + 2t(1 - t){P_1} + {t^2}{P_2}

Here’s the code I have used to implement this in Javascript. Please note that most of the code in this guide is rushed and not optimal. This article is more about the concepts than the code, and you should put some thought into how to write optimal versions before you implement your own. Also, for the supporting code around this check out the bottom of this post, I’ll include the code I used for drawing, mouse events, etc at the end because it’s not directly relevant to this post.

    function quadraticBezier(clicks, ctx) {
        function bezierFunc(points, t) {
            var p0 = points[0];
            var p1 = points[1];
            var p2 = points[2];
            var t2 = t*t;

            var dx = dot([p0.x, p1.x, p2.x], [(1-t)*(1-t), 2*t*(1-t), t2]);
            var dy = dot([p0.y, p1.y, p2.y], [(1-t)*(1-t), 2*t*(1-t), t2]);

            return point(dx,dy);
        }

        var p = [];
        for (var j = 0; j < lineSegments+1; j++) {
            p.push(bezierFunc(controlPoints, j/lineSegments));
        }
        curveSegment(p, ctx);
    }

    function dot(v1, v2) {
        var sum = 0;
        for (var i = 0; i < v1.length; i++) {
            sum += v1[i]*v2[i];
        }
        return sum;
    }

    function lineSegment(p0, p1, ctx) {
        ctx.beginPath();
        ctx.moveTo(p0.x,p0.y);
        ctx.lineTo(p1.x,p1.y);
        ctx.stroke();
        ctx.closePath();
    }

    function point(x, y) {
        return {
            x:x,
            y:y
        };
    }

    function curveSegment(points, ctx) {
        ctx.lineWidth = 2;
        ctx.strokeStyle = '#000';
        for (var i = 0; i < points.length-1; i++) {
            lineSegment(points[i], points[i+1], ctx);
        }
    }

In this code I am using a dot product that I defined to make multiplying over points a little cleaner. You don’t have to do it this way but I prefer it. Also some libraries have extremely efficient matrix and vector multiplication algorithms so I tend to gravitate towards these functions.

To explain very briefly what’s happening in this code, given a set of control points we take the first three (we’re only dealing with quadratic Beziers for now) and apply the above formula. We can use a dot product because each control point appears once and only once in the formula, which is thankfully true of Bezier curves of all degree.

We can extend this technique to Bezier curves of higher order. To generate a cubic Bezier curve, we add another control point. We then apply the same interpolation we just did across the first three points, and we apply that interpolation across the last three points. Finally we interpolate across the two points we have just generated with respect to t and this point defines our curve at t. See the demonstration below, where the blue line represents our newly interpolated line:


Hover over canvas to see line construction. Click & drag control points to modify line.

It’s not difficult to find the formula for the cubic Bezier (left as an exercise):
P(t) = {P_0}{{(1-t)}^3} + {P_1}3{{(1-t)}^2}t + {P_2}3(1-t){t^2} + {P_3}{t^3}

For completeness, here is the code to generate this curve, which is very similar to the previous code.

    function cubicBezier(controlPoints, ctx) {
        function bezierFunc(points, t) {
            var p0 = points[0];
            var p1 = points[1];
            var p2 = points[2];
            var p3 = points[3];
            var t3 = t*t*t;
            var t2 = t*t;

            var dx = dot([p0.x, p1.x, p2.x, p3.x], [(1-t)*(1-t)*(1-t), 3*(1-t)*(1-t)*t, 3*(1-t)*t2, t3]);
            var dy = dot([p0.y, p1.y, p2.y, p3.y], [(1-t)*(1-t)*(1-t), 3*(1-t)*(1-t)*t, 3*(1-t)*t2, t3]);

            return point(dx,dy);
        }

        var p = [];
        for (var j = 0; j < lineSegments+1; j++) {
            p.push(bezierFunc(controlPoints, j/lineSegments));
        }
        curveSegment(p, ctx);
    }

In fact, we can extend this to arbitrary degrees. The terms by which we multiply each of our control points turn out to be (for reasons outside the scope of this article) the Bernstein polynomials. You can get these terms by raising the expression a(t) = (1 - t + t) to the nth power (of course this expression is simply a(t) = 1, so all terms always add to 1, an important result that we won’t go into here, but if you’re familiar with affine combinations you’ll surely see why this is pretty nifty).

The Bezier curve extends to any number of control points using this Bernstein polynomial result thusly:
P(t) = sum{k=0}{L}{P_k}{{B_k}^L}(t)
This is pretty cool, but not really used very widely because computing large polynomials is computationaly expensive and there are nicer alternatives for generating curves based on lots of control points. However you should certainly experiment with generating arbitrarily large Bezier curves.

Properties of Bezier curves
Bezier curves have some interesting properties that make them useful for certain purposes but less so for others. Bezier curves are approximation curves in that they don’t pass through all control points, however they will always pass through their endpoints (can you see why, in terms of Bernstein polynomials?). This makes them useful for designers who need to visually generate smooth curves but need well-defined start and finish points.

Bezier curves are also affine invariant (hinted at earlier). This means we can use familiar matrix transformation techniques to shift the control points and the curve will transform as expected – rotating, scaling and skewing the curve is as simple as transforming the control points. This is extremely useful in CAD and graphics design contexts.

Another interesting property of Bezier curves is the convex hull property. All points on a Bezier curve are within a convex hull defined by the control points of the curve. If you don’t know what a convex hull is then you probably don’t need to, but I assume this property is useful for someone.

Problems with Bezier curves
Bezier curves are C^1 smooth functions. This means that the Bezier curve can generally be differentiated once across the unit interval. C^1 smoothness means that, for example, a camera following the curve along the unit interval at a constant speed (with respect to t) will move at a continuous velocity (not a constant velocity, but continuous with no breaks or jumps). With C^1 smoothness acceleration is not guaranteed to be continous.

Typically to get Bezier curves with more than 4 control points we will join multiple Bezier curves together. In this case the complete curve will be C^0 smooth, which means that the curve is continuous but the first derivative is not guaranteed to be, ie there could be jumps in velocity. This can be mitigated with careful placements of the control points. For more information of curve smoothness, read this.

The biggest problem with Bezier curves is the lack of local control over the curve. What I mean by this is that getting precisely the shape of the curve you want is difficult because changing a single control point in a Bezier curve will affect the entire curve. You can see from the formulae that each point on the curve (except for the endpoints) is affected to some degree by the position of the other points. Maybe other curve/spline types will allow us greater local control over our curve…

In the next article I will cover C-Spline and B-Spline curves and the famous Catmull Rom spline. We’ll look at what it takes to work with C^2 smooth splines.

The (very rough) code I wrote for this article can be found here and here.

Problem opening OpenXml files after modifying them programmatically? Read this.

Microsoft has some pretty good documentation around modifying OpenXml files using the System.IO.Packaging API, but I found every time I tried to open an Excel file I had modified it was telling me the file was damaged and needed to be repaired. The repair process would go ahead and remove the files I had modified from the package altogether. If I looked inside the XLSX file before doing the repair my changes were there and seemed to be just fine.

Turns out I was using the wrong combination of MIME type and compression when saving my changes into the package. The correct MIME type for OpenXML spreadsheet package parts is

application/vnd.openxmlformats-officedocument.spreadsheetml.connections+xml

and the compression level should be

CompressionOption.SuperFast

Here is a code snippet I’m using the save the package part.

string partContents = // some XML here
Uri partUri = new Uri(/*uri of part in package*/, UriKind.Relative);
const string OpenXmlSpreadsheetMime = "application/vnd.openxmlformats-officedocument.spreadsheetml.connections+xml";
var newPart = package.CreatePart(partUri, OpenXmlSpreadsheetMime, CompressionOption.SuperFast);
if (newPart != null)
{
    using (var stream = newPart.GetStream())
    using (var writer = new StreamWriter(stream))
    {
         writer.Write(partContents);
    }
}

The other OpenXML MIME types have been documented by Doug Mahugh:

File Extension: Content Type:
docm application/vnd.ms-word.document.macroEnabled.12
docx application/vnd.openxmlformats-officedocument.wordprocessingml.document
dotm application/vnd.ms-word.template.macroEnabled.12
dotx application/vnd.openxmlformats-officedocument.wordprocessingml.template
ppsm application/vnd.ms-powerpoint.slideshow.macroEnabled.12
ppsx application/vnd.openxmlformats-officedocument.presentationml.slideshow
pptm application/vnd.ms-powerpoint.presentation.macroEnabled.12
pptx application/vnd.openxmlformats-officedocument.presentationml.presentation
xlsb application/vnd.ms-excel.sheet.binary.macroEnabled.12
xlsm application/vnd.ms-excel.sheet.macroEnabled.12
xlsx application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
xps application/vnd.ms-xpsdocument

Tagged

Importing non-.rdl reports using Microsoft Dynamics CRM Web Services

This post is mostly for my own benefit for the future, in case I need to do this again.

I spent a good hour trying to figure out how to import XLSX reports into CRM using the web services, mostly because it’s not clear how one should encode the report into the BodyBinary field of the Report object. For background details take a look at this article on MSDN, which gives most of the details except what I just mentioned.

The trick is to encode your report in a Base64 string, which doesn’t seem to be documented anywhere. Failing to do this will result in an OrganizationServiceFault with the following message:

The field ‘bodybinary’ contains one or more invalid characters.

Here is the correct code for creating the Report object

var report = new Report
                {
                    Name = ReportName,
                    BodyBinary = Convert.ToBase64String(File.ReadAllBytes(ReportFileName)),
                    FileName = ReportFileName,
                    LanguageCode = ReportLanguageCode,
                    ReportTypeCode = 2 // 2 == "Existing File"
                };
Tagged ,

How to include a large directory tree in a Wix installer using a Visual Studio Wix project

Wix is an awesome tool, but part of the challenge of using it is some dense and difficult to follow documentation. Today I spent a large amount of time trying to build an installer that included a very large directory tree without having to manually add each item. Hopefully this article will help get you through some of the problems I ran into while doing it.

My first approach was to use a HeatDirectory MSBuild task. When building a Wix project in Visual Studio this will cause heat.exe to be called in a pre-build step. For most cases this would work fine, but unforunately I couldn’t make this work. The problem I ran into was

Error 1 The system cannot find the file ‘..\..\Very\Long\File\Path\…\BusinessServiceHostConfigurationImplementation.csproj.FileListAbsolute.txt’ with type ”. light.exe

For the record, here is my original HeatDirectory build task. For most cases this will work perfectly fine.

<Target Name="BeforeBuild">
  <HeatDirectory
   DirectoryRefId="INSTALLLOCATION"
   OutputFile="source.wxs"
   Directory="..\..\MySourceDir"
   ComponentGroupName="SourceComponentGroup"
   ToolPath="$(WixToolPath)"
   PreprocessorVariable="var.SourcePath"
   AutogenerateGuids="true">
  </HeatDirectory>
</Target>

The issue here is that light.exe can’t handle file paths of longer than 255 characters. If your file paths are very, very long then you may well be out of luck here – as far as I know there’s no workaround for this problem.

Luckily for me I was working with folder paths that are long, but not absurdly long. I was just on the borderline so I figured if I copied the directory tree to some reasonable location like c:\st then I could work with light.exe.

The next issue here is that using these VS Wix projects the HeatDirectory tasks are executed even before pre-build steps. I elected to do the directory tree copy and heat.exe call in pre-build steps. To make the project build nicely without having to dick around too much with the Wix VS project I keep the target file in the project and overwrite it with my heat.exe target file. This way you can just check in an empty .wxs and things will work nicely.

Here is my pre-build event code that will copy the directory tree and execute Heat.exe

robocopy "$(SolutionDir)..\MySourcePath" %SystemDrive%\st /MIR
"%WIX%\bin\heat.exe" dir %SystemDrive%\st -dr INSTALLLOCATION -cg SourceComponentGroup -var var.SourcePath -ag -out "$(SolutionDir).\WixProject\source.wxs"

A couple of cool things you might want to take note of in case you don’t know:
%WIX%: This is an environment variable set by the Wix installer. It points to your Wix installation directory, so getting to that /bin folder is nice and easy.
-var: This is an argument to heat.exe. Without it your output .wxs file will point all elements at SourceDir in their Source attribute. Yeah, I’ll be fucked if I know why, but there you go. You can set this variable from your project properties:

Now that we have a new .wxs file in our solution just refer to this component group in your main Wix file using

<ComponentGroupRef Id="SourceComponentGroup"/>

in one of your features and you’re all done.

Yeah, this isn’t the most thrilling subject to read (or write) about, but it can be frustrating to figure this stuff out for yourself.

Tagged , ,

Ray Tracer in F# – Part III

In part 2 we created a working lighting model for our ray tracer and got a step closer to creating a realistic-looking sphere. Soon we will increase the realism further by applying reflections and creating shapes other than spheres. However, this article is going to be about multiplicity. We are going to extend our program to handle multiple spheres and multiple light sources. This will help teach you about working with important concepts in functional programming as well as learn about some of the F# collections libraries.

But before we get started I just want to tidy up the code a little, because it is just a bit badly formed.

At the moment a sphere has a diffuse colour and we are assuming that specular highlights are always white. While this generally gives good results, it’s not perfect. Because we will want to do more with a shape’s material than just lighting at some point (textures, translucency, etc) let’s break out material properties into another type. Also, the intersection type doesn’t really need a reference to the shape it intersects – it only needs to know the material (as well as the other information it has). So let’s change that too. After you make these simple changes to your types, go through the code and make the appropriate changes, including fixing the code that breaks and removing magic constants from the colorAt function.

type Material = { diffuseColor: Color; specularColor: Color; shininess:float }
type Sphere   = { center:Point3D; radius:float; material:Material }
type Intersection = { normal:Vector3D; point:Point3D; ray:Ray; material:Material; t:float }

//.... and constructing the sphere will look like
let material = { diffuseColor=Color(0.8, 0.1, 0.1); specularColor=Color(0.7,0.7,0.7); shininess=40.0 }
let sphere = { center=Point3D(1.0,1.0,1.0); radius=0.4; material=material }

For completeness, here is the colorAt function.

let colorAt intersections scene =
    let closest = List.minBy(fun i -> i.t) intersections
    let kd = closest.material.diffuseColor
    let ks = closest.material.specularColor
    let Ia = scene.ambientLight
    let L = norm (scene.light.position - closest.point)
    let Id = Math.Max(0.0, Vector3D.DotProduct(L, closest.normal))
    let V = scene.camera.position - closest.point
    let H = norm (L + V)
    let Is = Math.Pow(Math.Max(0.0, Vector3D.DotProduct(H,closest.normal)), closest.material.shininess)
    let ambient = kd * Ia
    let diffuse = kd * Id
    let specular = ks * Is
    ambient + diffuse + specular

I’m still not quite happy with the layout of this code. It’s weird that colorAt does the calculation to find the closest intersection point to the camera (or ray origin, more precisely). I’d prefer to move that into the calling function, so in the main nested loop

match intersects with
            | [] -> bmp.SetPixel(x,y,Color.Gray)
            | _ -> let color = colorAt intersects scene
                   bmp.SetPixel(x,y, Color.FromArgb(255, (int)(color.r*255.0), (int)(color.g*255.0), (int)(color.b*255.0)))

becomes

match intersects with
            | [] -> bmp.SetPixel(x,y,Color.Gray)
            | _ -> let color = colorAt (List.minBy(fun i -> i.t) intersects) scene
                   bmp.SetPixel(x,y, Color.FromArgb(255, (int)(color.r*255.0), (int)(color.g*255.0), (int)(color.b*255.0)))

And making the corresponding change in colorAt is simple:

let colorAt intersection scene =
    let kd = intersection.material.diffuseColor
    let ks = intersection.material.specularColor
    // etc...

Now we want to allow for multiple spheres. Change the definition for Scene so it looks like

type Scene = { camera:Camera; spheres:Sphere list; ambientLight:Color; light:Light }

Note the word ‘list’. This means that the ‘spheres’ field will carry a typed list of Sphere values.

This of course breaks our castRay function, which depends heavily on the Scene having just a single sphere. So currently the castRay function performs the task of calculating intersection points. Really what castRay should be doing is delegating this task to another function and simply collecting and returning the results. So let’s make some changes to castRay and also define a separate intersect function to handle the sphere intersect code.

let intersect ray sphere =
    let s = ray.origin - sphere.center
    let rayDir = norm ray.direction
    let sv = Vector3D.DotProduct(s,rayDir)
    let ss = Vector3D.DotProduct(s,s)
    let discr = sv*sv - ss + sphere.radius*sphere.radius
    if discr < 0.0 then []
    else
        let normalAtTime t = norm (pointAtTime ray t - sphere.center)
        let (t1,t2) = (-sv + sqrt(discr), -sv - sqrt(discr))
        [ { normal = normalAtTime t1; point = pointAtTime ray t1; ray = ray; material=sphere.material; t=t1 };
          { normal = normalAtTime t2; point = pointAtTime ray t2; ray = ray; material=sphere.material; t=t2 } ]

let castRay ray (scene:Scene) =
    scene.spheres |> List.collect (fun x -> intersect ray x)

The intersect function merely returns a list of intersections with a sphere. We will be making this more generic to handle other shapes later, but for now this works fine. Incidentally, this is the reason we’re using a list instead of a tuple or something where handling only two results would have been more appropriate.

Notice the |> operator in castRay. This is called the pipeline operator and Dom Syme, the man behind the F# language, it is “perhaps the most important operator in F# programming.” (This quote is from Expert F# 2.0). Basically it allows us to “pipe” the value of some expression into another expression, which results in some pretty cool looking code. We’ll be using the pipelining feature more later on.

So this is pretty much all we need to render multiple shapes. Let’s add another shape to the scene and take a look.

    let material1 = { diffuseColor=Color(0.8, 0.1, 0.1); specularColor=Color(0.7,0.7,0.7); shininess=40.0 }
    let sphere1 = { center=Point3D(-0.1, -0.1, 1.8); radius=0.39; material=material1 }
    let material2 = { diffuseColor=Color(0.1, 0.1, 0.8); specularColor=Color(0.7,0.7,0.7); shininess=20.0 }
    let sphere2 = { center=Point3D(0.4, 0.3, 2.0); radius=0.3; material=material2 }

    let scene = { camera=camera; spheres=[sphere1;sphere2]; ambientLight=Color(0.2,0.2,0.2); light=light }

Of course you are free to use as many spheres as you like – experiment with procedurally creating arrays of spheres, it can be pretty fun.

We’re going to take a similar approach for using multiple light sources. Here we’re going to use nested inner functions (closures) to break our code into functions which we can hide from the calling code. colorAt now looks like

let colorAt intersection scene =
    // nested function for ambient color
    let ambientColorAt intersection scene =
        scene.ambientLight * intersection.material.diffuseColor

    // nested function for specular color
    let specularColorAt intersection scene =
        let ks = intersection.material.specularColor
        let V = scene.camera.position - intersection.point

        let specularAtLight light =
            let L = norm (light.position - intersection.point)
            let H = norm (L + V)
            let Is = light.color * Math.Pow(Math.Max(0.0, Vector3D.DotProduct(H,intersection.normal)), intersection.material.shininess)
            ks * Is

        List.sumBy(fun x -> specularAtLight x) scene.lights

    //nested function for diffuse color
    let diffuseColorAt intersection scene =
        let kd = intersection.material.diffuseColor

        let diffuseAtLight light =
            let L = norm (light.position - intersection.point)
            let Id = light.color * Math.Max(0.0, Vector3D.DotProduct(L, intersection.normal))
            kd * Id

        List.sumBy(fun x -> diffuseAtLight x) scene.lights

    let ambient = ambientColorAt intersection scene
    let specular = specularColorAt intersection scene
    let diffuse = diffuseColorAt intersection scene

    ambient + diffuse + specular

On a side note, I just noticed that throughout this series I have made an embarrassing mistake – I forgot to multiply the light intensity values with the surface reflectance, resulting in all lights behaving as though they were coloured 1.0,1.0,1.0. This is fixed in the preceding code.

You can see from this snippet that we are summing the values of all colours collected from all light to create specular and diffuse values. Setting up a second light source allows us to view the results.

    let light = { position=Point3D(1.0,4.0,-3.0); color=Color(0.5,0.5,0.5) }
    let light2 = { position=Point3D(-3.0,-2.0,-1.0); color=Color(0.8,0.8,0.8) }
    let scene = { camera=camera; spheres=[sphere1;sphere2]; ambientLight=Color(0.2,0.2,0.2); lights=[light;light2] }

Now there’s one more thing I want to do before I close off this entry. Right now other objects are ignored when we are summing up the lights. If we just check for intersections between each intersection point and the light source we are inspecting we can see if that intersection point should be in shadow. Seems simple enough, so let’s implement some simple shadowing.

So all we need to do is check whether a ray between the intersection point and the light source intersects ANY other shapes. If not, we know there’s a clear path to the light source. If so, that light source doesn’t contribute to the lighting at that intersection point. We can ignore ambient lighting, because it applies whether we are in shadow or not. Make the following modifications to the colorAt function.

let colorAt intersection scene =
    // check if we're in shadow
    let inShadow point light =
        let ray = { origin = intersection.point; direction = light.position - intersection.point}
        let intersections = castRay ray scene
        if intersections.Length = 0 then false
        else true
// ...then where diffuse light is calculated...
    let diffuseColorAt intersection scene =
        let kd = intersection.material.diffuseColor

        let diffuseAtLight light =
            let L = norm (light.position - intersection.point)
            let Id = light.color * Math.Max(0.0, Vector3D.DotProduct(L, intersection.normal))
            if inShadow intersection.point light then Color(0.0,0.0,0.0)
            else kd * Id            

        List.sumBy(fun x -> diffuseAtLight x) scene.lights

Although I’m sure technically we should do the same for specular light, in my experience it looks like ass with these sharp shadows, so I keep specular light around. I’ll let you judge for yourself which is better. At some stage we’ll implement more advanced lighting and shadowing and we won’t need to decide. So here’s the scene rendered earlier with our shadow model. On the left if with specular lighting shadowed, the right not shadowed. I prefer the right.

The banding effect on the shadows occurs because I included a third light to demonstrate that shadows can have multiple levels of darkness, depending on how many lights are blocked.

In the next article we will finally start actually ray tracing, which means reflections! We’ll get to play with recursion and we’ll also start using shapes other than spheres. The source code so far is available here.

Tagged , , , , ,

Introducing Trip, an open-source UML editor for Eclipse

UML Editing in Trip

In my final year of university we were tasked with building a capstone project. The aim was a round-trip engineering tool based on UML as an Eclipse plugin. Round-trip engineering is the process of taking code, producing some kind of graphical (or otherwise) model of that code, manipulating it and being able to push your changes through to the underlying source code, and vice versa.

My group’s project was the most successful in the class, and I decided to continue working on it in an open-source capacity. I have done almost nothing since the final grading so far, but I have plans to continue work immediately.

Some of the code is kind of sketchy. We were learning a lot about design as we went so it’s kind of a dependency nightmare, but honestly it’s no worse than production code I run into every day at work. My initial plans for the project are to first improve the design. I intend to use modern design techniques like dependency injection and inversion of control to get the dependencies to a more manageable and testable state. I will also be incorporating many more unit tests and fixing some of the “unit tests” in the project which are actually more like integration tests.

I intend to use this project as a way to improve my own OO design skills, as well as learning more discipline in areas like TDD. I may use it to experiment with modern methodologies such as BDD, and I intend to dig deeper into the modelling platform on which it’s built and perhaps try my hand with a DSL, but that will happen later in the project.

Take a look at the project on BitBucket. Trip is licensed under the Eclipse Public License (EPL).

Trip is built on the Eclipse platform using the Graphical Editing Framework. Special thanks go to my wonderful class mates from Canterbury university, Karen Harris, Will Gittoes, Myse Elmadani, Peter Empson and Ed Robinson.

Tagged

Ray Tracer in F# – Part II

In the previous post we set up the bones of a ray caster and got it rendering a very basic “sphere” with no shading or anything. Today we’re going to make the sphere look more like a real sphere by applying a Phong shading model to it.

If we want to shade each pixel on the sphere correctly then we’re going to need to get some information about the point of intersection between the ray and the sphere. Before getting into code, we’ll just start with the mathematical description of our lighting model. This will be brief and I encourage you to investigate further if you have never done any kind of shading or lighting before.

We are going to be working with three different kinds of lighting:

  1. Ambient – ambient lighting is a general light that applies to every surface in the scene, regardless of orientation, shadows, etc. Ambient light has no source. You can think of ambient light as the base-line or “zero-point” of the scene’s lighting. It’s the amount of light that a portion of the scene will have when in complete shadow. Although there is no basis for ambient light in physics, without it things just look unrealistic. Perfectly black shadows look wrong.
  2. Diffuse – When light from a source strikes a surface, the surface reflects that light in the same quantity in every direction (think of this as a surface reflecting photons in perfectly random directions). The strength of a diffuse reflection is dependent on the angle of the surface with the light travel path. If a surface is perfectly perpendicular to the ray of light then the reflection will be total (100%). The reflectivity drops as the surfaces turns its face on an angle to the light, until it reaches 0% when the surface is 90′ to the light source (this will be described in more detail soon).
  3. Specular – Specular lights are those little highlights in a scene, particularly on curved surfaces. Like diffuse reflections, specular highlights are dependent on the angle of the surface, but in a different way. Specular lighting will be described in detail towards the end of this article.

The three kinds of light add together to form a fully lighted scene:

So let’s define a new data type in our program that represents an intersection between a ray and our sphere. For now we’re still working under the assumption that the scene will only ever have a single sphere. I find working iteratively like this is a good workflow for F#, especially with F# interactive.

type Intersection = { normal:Vector3D; point:Point3D; ray:Ray; sphere:Sphere; t:float }

An intersection has a surface normal, a point in space, an associated ray and a t value, which describes how far along the ray the intersection happens. We can just derive t from point (or vice versa) but I find it convenient to store both.

Now we’re going to want our castRay function to return Intersection objects instead of true or false. It should be capable of returning multiple intersections for a given ray. A sphere will usually have two intersections with a ray, and it’s not difficult to imagine shapes that will have more than two. So we’ll return a list of intersections.

In the previous article all we did for the ray-sphere intersection was compute the discriminant of a quadratic, but in order to find the exact point in space of the intersection we’ll need to solve the full equation for both solutions (if you don’t know the quadratic formula, read this). Here is the modified castRay function. I have also included a small utility function which we’ll make use of, pointAtTime, which just computes the point of some line given by p = a + bt for a given value of t.

/// Get the position of a ray at a given time
let pointAtTime ray time =
    ray.origin + time * ray.direction

let castRay ray (scene:Scene) =
    let s = ray.origin - scene.sphere.center
    let rayDir = norm ray.direction
    let sv = Vector3D.DotProduct(s,rayDir)
    let ss = Vector3D.DotProduct(s,s)
    let discr = sv*sv - ss + scene.sphere.radius*scene.sphere.radius
    if discr < 0.0 then []
    else
        let normalAtTime t = norm (pointAtTime ray t - scene.sphere.center)
        let (t1,t2) = (-sv + sqrt(discr), -sv - sqrt(discr))
        [ (t1, { normal = normalAtTime t1; point = pointAtTime ray t1; ray = ray; sphere=scene.sphere });
          (t2, { normal = normalAtTime t2; point = pointAtTime ray t2; ray = ray; sphere=scene.sphere }) ]

Just for a sanity check let’s modify the guts of the nested for-loop that computes the image just to make sure that what we have got is still producing a correct image.

for x in 0..(width-1) do
        for y in 0..(height-1) do
            let rayPoint = vpc + float(x-width/2)*pw*u + float(y-height/2)*ph*v
            let rayDir = norm (rayPoint - scene.camera.position)
            let ray = { origin = scene.camera.position; direction = rayDir }
            match castRay ray scene with
            | [] -> bmp.SetPixel(x,y,Color.Gray)
            | _ -> bmp.SetPixel(x,y,Color.Red)

The output is now the same image we were getting at the end of the previous article. Now for some shading. What we need to do is create a function that will transform a list of intersections into the correct colour value. The process for this will be

  1. Get intersection with the lowest t value (closest to the camera).
  2. Add the ambient, diffuse and specular lighting components for this ray on this sphere with this surface normal.
  3. Return the aggregate colour value

The fact that I can easily break the function into 3 parts like that suggests it shouldn’t be a single function, but we are going to refactor later so let’s just go for it.

Before we can light something, we need to give it a colour, so let’s give our sphere some colour. Because we’re going to be doing some operations on colours we’ll define our own colour type. The built-in .NET colours don’t have affordance for multiplying or adding colours together or multiplying colours by scalar values.

type Color(r: float, g: float, b:float) =
    member this.r = r
    member this.g = g
    member this.b = b
    static member ( * ) (c1:Color, c2:Color) =
        Color (c1.r*c2.r, c1.g*c2.g, c1.b*c2.b)
    static member ( * ) (c:Color, s:float) =
        Color (c.r*s, c.g*s, c.b*s)
    static member ( + ) (c1:Color, c2:Color) =
        let r = Math.Min (c1.r+c2.r, 1.0)
        let g = Math.Min (c1.g+c2.g, 1.0)
        let b = Math.Min (c1.b+c2.b, 1.0)
        Color (r,g,b)
    static member Zero = Color(0.0,0.0,0.0)

Here we’re defining a simple 3-tuple for RGB colour values where 0.0 <= colour <= 1.0. We're defining operations on these types in the same way you would in C# or other OO languages. We've defined a multiplication operation for both scalar and colour multiplication.

Quick note on spelling. I’m from New Zealand where we use mostly English spelling for words like “colour”. However in my code I use American spelling because I feel that is the de factor worldwide standard for programming.

One last thing to do before computing some shading here is to actually place a light in the scene. We’ll also set the scene’s ambient light. In this code the Light type is new, the Scene and Sphere types are modified.

type Sphere = { center:Point3D; radius:float; diffuseColor:Color }
type Light  = { position:Point3D; color:Color }
type Scene  = { camera:Camera; sphere:Sphere; ambientLight:Color; light:Light }

Also modify our scene declaration a bit

    // scene
    let light = { position=Point3D(0.0,3.0,4.0); color=Color(0.8,0.8,0.8) }
    let scene = { camera=camera; sphere=sphere; ambientLight=Color(0.2,0.2,0.2); light=light }

Okay, let’s do some light computation. We’ll start with the simplest one – ambient light. Ambient light is given by the expression
I = {k_d}{I_a}, ~ 0 <= k_d <=1
where I_a is the intensity of our ambient lighting in the scene and k_d is the diffuse reflectivity of the sphere’s material. Sometimes people model ambient light with k_a, giving objects a different diffuse and ambient reflectivity. In my opinion this case is rare enough that we can skip it for now.

let colorAt intersections scene =
    let closest = List.maxBy(fun i -> i.t) intersections
    let kd = closest.sphere.diffuseColor
    let Ia = scene.ambientLight
    Ia * kd

And a small modification to the nested for-loop to make this all work.

    for x in 0..(width-1) do
        for y in 0..(height-1) do
            let rayPoint = vpc + float(x-width/2)*pw*u + float(y-height/2)*ph*v
            let rayDir = norm (rayPoint - scene.camera.position)
            let ray = { origin = scene.camera.position; direction = rayDir }
            let intersects = castRay ray scene
            match intersects with
            | [] -> bmp.SetPixel(x,y,Color.Gray)
            | _ -> let color = colorAt intersects scene
                   bmp.SetPixel(x,y, Color.FromArgb(255, (int)(color.r*255.0), (int)(color.g*255.0), (int)(color.b*255.0)))


We should really create a new function to do the transformation between our Color type and the built-in one, but for now we’ll leave it. Running the program now results in the image on the right. This is the sphere lit only by our 20% ambient light.

Now for diffuse lighting. The diffuse portion is the diffuse colour of the sphere multiplied by the projection of a vector from the intersection point to the light source (normalized) on the normal vector (normalized) of the intersection point. In other words, the dot product of these two vectors. Mathematically, diffuse lighting is given by
I = {k_d}(L . N)I_d
where k_d is the material’s diffuse reflectivity (as before), L is the vector between the light source and the intersection point (light.position – intersection.point), N is the normal of the intersection point (both normalized) and I_d is the intensity of the light. (That’s a dot product between L and N, amazingly my maths library doesn’t seem to have a dot product character!).

So we add diffuse reflection and ambient together:

let colorAt intersections scene =
    let closest = List.maxBy(fun i -> i.t) intersections
    let kd = closest.sphere.diffuseColor
    let Ia = scene.ambientLight
    let Id = Math.Max(0.0, Vector3D.DotProduct(norm (scene.light.position - closest.point), norm closest.normal))
    let ambient = kd * Ia
    let diffuse = kd * Id
    ambient + diffuse

We’re getting there! See the image on the left. I’d like to take a moment to explain that List.maxBy call above, because it is quintessential functional programming. If you have never done functional programming before it probably looks a bit strange. Something to understand about this paradigm is that functions are first-class objects. This means that functions can be passed around as parameters and returned from other functions. Essentially we can treat functions as though they were data members in functional languages.

List.maxBy is a function that takes two parameters – the first parameter is another function which takes some object of the list type (T’) and returns some other object of type U’ with which we make the comparison. The second parameter is a list of type T’ and the return type is the T’ which has the maximum value of all of the U’s associated with the T’s in the list. In this case T’ is type Intersection and U’ is type float (the t value of our intersection). So we’re asking the List.maxBy function to give us the Intersection in this list which has the highest t value. Understand? It’s a little tricky if you’ve never seen it before, but it’s a very powerful technique.

All we have left to compute is the specular component and we have a complete lighting model. Specular light can be given as
I = {k_s}{(N . H)^alpha}I_s
where k_s is the specular reflectivity of the material (the same as ambient/diffuse reflectivity earlier, but usually a white colour), N is the surface normal at intersection (normalized), I_s is the specular intensity of the light source, alpha is a material constant (usually on the order or 100-500) and H is a vector called the “half-way vector”. H is the vector half way between the view vector (camera.position – intersection.point) and the light vector (light.position – intersection.point). I won’t go into detail about the geometry that makes this work because it has been covered all over the place (see Blinn-Phong shading model), but suffice it to say that this expression will give us nice-looking specular highlights :) You can vary the value of alpha to change the size and sharpness of the specular highlight – this value is often referred to as the “shininess” value of a material. So here is the modified code to include all three shading modes.

let colorAt intersections scene =
    let closest = List.maxBy(fun i -> i.t) intersections
    let kd = closest.sphere.diffuseColor
    let Ia = scene.ambientLight
    let L = norm (scene.light.position - closest.point)
    let Id = Math.Max(0.0, Vector3D.DotProduct(L, closest.normal))
    let V = scene.camera.position - closest.point
    let H = norm (L + V)
    let Is = Math.Pow(Math.Max(0.0, Vector3D.DotProduct(H,closest.normal)), 500.0)
    let ambient = kd * Ia
    let diffuse = kd * Id
    let specular = Color(1.0,1.0,1.0) * Is
    ambient + diffuse + specular

This is a little messy, but we’ll tidy up later! I have actually cheated here and just used pure white light for I_s. We will implement a material type in the next article.

Notice how when we add all of these shading methods together we get the expression
I = {k_a}{I_a} + {k_d}(L . N)I_d + {k_s}{(N . H)^alpha}I_s
When dealing with multiple light sources, this will become
I = {k_a}{I_a} + sum{m subset lights}{}({k_d}(L_m . N)I_dm + {k_s}{(N . H)^alpha}I_sm)
Which is the Phong shading model! (modified to use the Blinn-Phong specular method). We will implement this multiplicity (of lights and shapes) in the next article (now available). Stay tuned!

Source code so far.

Tagged , , , , ,

Ray Tracer in F# – Part I

In this series of posts I’m going to demonstrate how to implement a ray tracing engine in F#. The purpose of these posts isn’t just to build a ray tracer – instead, they are put together as a tutorial on the F# language. This program will demonstrate most (if not all) of F#’s major features and show you how to use standard F# patterns and idioms.

I don’t know how long this series will be at this stage. I chose to use a ray tracer for a couple of reasons. Firstly, I already implemented a basic ray tracer in F# for a university assignment last year (see image on right), so I have the skeleton of the code available to me already (although the final product will be much more polished). Another reason is that ray tracing is an excellent application of functional programming. It’s a process that is conducive to immutable data structures, side-effect-free functions and stateless programming. It’s also simple enough to be able to blog about, but complex enough to not seem like a “toy example” to demonstrate F# in.

You should be warned before going into this that there will be some mathematics. For the purposes of learning F# it’s not imperative that you understand the maths, but if you want to understand the ray tracer itself then it will certainly help. I will explain the ray tracing background to everything that I do, but not in excruciating detail, because I want F# to be the focus here.

So let’s begin. First of all, what is a ray tracer? Ray casting (not tracing) is the process of casting a “ray” from some imaginary eye point, through a screen and into a scene. Whatever the ray strikes in the scene is what we will display at the point on the screen where that ray passed through it. If we repeat this process at every point on the screen that will correspond to a pixel, then we can build an image of the scene. This image might help illuminate things.

These images come from Richard Lobb's ray caster demo, used with permission.


The eye point is the black sphere on the left. You can see a ray for each pixel on the “screen” passing from the eye point and through the center of each pixel. Where the ray intersects an object in the scene (everything behind the screen) the pixel that ray ran through becomes coloured appropriately. There will also be a default “background” colour for rays which intersect nothing in the scene.

I should note that some people refer to this as “reverse ray casting, which actually makes a bit more sense. Obviously in real life rays travel from light sources and into the eye. In our ray caster we do that backwards. Ray tracing is the logical extension of this idea when we take into account things like reflection and refraction. We will define this more rigorously when we get there, but for this iteration we will stick to just ray casting.

So that’s the extremely brief introduction to the geometry of ray casting, before we get into the code we need just one more very quick intro lesson, and that’s lighting. I encourage you to explore this further yourself because I’m going to be very brief here. We’re going to use a simple Phong illumination model (actually it’s a slightly modified Phong model known as Blinn-Phong) to light our objects. This takes into account the ambient, specular and diffuse components of light, and we’ll (eventually) be applying this to each of multiple light sources in a scene. The referenced Wikipedia articles are actually a pretty good start to understanding this.

So let’s get started. The very first thing we are going to do is just render a flat sphere with no shading at all. It’s not very impressive but getting something onto the screen ensures we have a decent understanding of the ray tracing basics.

Let’s set up some simple data types for our ray tracer.

open System
open System.Windows.Media.Media3D
open System.Drawing
open System.Windows.Forms

type Sphere = { center:Point3D; radius: float }
type Camera = { position:Point3D; lookAt:Point3D; lookUp:Vector3D }
type Scene  = { camera:Camera; sphere:Sphere }
type Ray    = { origin:Point3D; direction:Vector3D }

We’re going to start very basic, with a scene that has no lighting and just a simple sphere. These type declarations give us a jumping-off point. When I’m writing F# code, especially when I’m just getting started, I like to throw everything into F# interactive, just to give myself a sanity check that my types all look right. You can quickly do this by hitting ctrl-A to select all of your code and hitting alt-enter to execute it in F# interactive.

One annoying thing about the System.Windows.Media.Media3D.Vector3D type is that there is no built-in immutable way to normalize vectors. Calling the Normalize method will actually mutate the vector you call it on, so let’s quickly define our own functional version of normalization.

let norm (v:Vector3D) =
    let abs = sqrt (v.X * v.X + v.Y * v.Y + v.Z * v.Z)
    v / abs

Unfortunately for this ray tracer there’s a fair bit of annoying plumbing code to make things work. I keep this function at the bottom of the main source file. Most of what is in it is not difficult to understand, it’s just necessary.

do
    let width = 480
    let height = 320

    // Vertical and horizontal field of view
    let hfov = System.Math.PI/3.5
    let vfov = hfov * float(height)/float(width)

    // Pixel width and height
    let pw = 2.0 * System.Math.Tan(float(hfov/2.0))/float(width)
    let ph = 2.0 * System.Math.Tan(float(vfov/2.0))/float(height)    

    let box = new PictureBox(BackColor = Color.White, Dock = DockStyle.Fill, SizeMode = PictureBoxSizeMode.CenterImage)
    let bmp = new Bitmap(width,height)

    // sphere
    let sphere = { center=Point3D(1.0,1.0,1.0); radius=0.4 }

    // camera
    let camera = { position=Point3D(0.0,0.0,0.0); lookAt=Point3D(1.0,1.0,1.0); lookUp=Vector3D(0.0,1.0,0.0) }

    // scene
    let scene = { camera=camera; sphere=sphere }

    // set up the coordinate system
    let n = norm (camera.position - camera.lookAt)
    let u = norm (Vector3D.CrossProduct(camera.lookUp, n))
    let v = norm (Vector3D.CrossProduct(n, u))
    let vpc = camera.position - n

    for x in 0..(width-1) do
        for y in 0..(height-1) do
            let rayPoint = vpc + float(x-width/2)*pw*u + float(y-height/2)*ph*v
            let rayDir = norm (rayPoint - scene.camera.position)
            let ray = { origin = scene.camera.position; direction = rayDir }
            match castRay ray scene with
            | true -> bmp.SetPixel(x, y, Color.Red)
            | false -> bmp.SetPixel(x, y, Color.Gray)

    bmp.Save("output.jpg")

I’ll explain some of this, because it’s not all completely obvious, although there’s nothing particularly fancy in F# terms in here.
hfov andvfov define our horizontal and vertical fields of view. Vertical field of view is defined in terms of the horizontal field of view and the aspect ratio (determined by the width and height of our screen). This ensures a correct aspect ratio in our images.

pw and ph are pixel width and pixel height. Because we want to send our rays through the center of each pixel, we can’t think of pixels are infinitely small. Each pixel has some height (defined in terms of world coordinates) and width. Hopefully you understand enough trigonometry to see how we got these values in terms of the field of view (an angular measurement).

The next few lines define some really basic things like where the camera is and where it’s pointing, where our sphere is, etc.

Next is the really tricky part (if you have never done 3D graphics before). If you don’t understand how points and vectors interact in terms of addition and subtraction you may have some trouble understanding this. Basically what we’re after here is a set of three lines which are mutually perpendicular and a single (world) unit in length. These lines will define our camera’s coordinate system. So we start by defining an arbitrary line, n, as being the vector between the point the camera is looking at and the point where the camera is located, normalized. This gives us some vector as long as the camera is not located exactly where it is looking.

We then find a vector orthogonal to this one by taking the cross product of n and the camera’s look-up vector. Because a cross product gives a vector mutually perpendicular to the input vectors, this is guaranteed to be perpendicular to n. The constraint on this is that the camera must not be looking directly upwards, or u will be zero in magnitude and impossible to normalize.

To find a third orthogonal vector we simple take the cross product of the former two vectors and normalize. There we have it, three mutually orthogonal normalized vectors! Note also that n is negative in the direction that the camera looks. We use this to create vpc, or “view port center”, a point one unit along the camera look-at vector.

The nested for-loop construct should be fairly straight-forward (if not especially functional, but don’t worry, we’ll refactor this as we go). We’re filling in a bitmap grid so we’re just going to loop over each pixel in the grid. For each pixel we are calculating rayPoint, which is some arbitrary point on the ray, which allows us to calculate rayDir. Studying these two lines for a few minutes should allow you to see how this works – the maths isn’t terribly complicated. From this ray direction we can compose a ray our of the camera position and the ray direction. We then cast the ray, and if it hits we render the pixel red, otherwise black.

I’d like to take a small digression to explain that match syntax. This keyword allows for pattern matching, a very powerful technique in F#. We will get into more complex pattern matching scenarions later in this series. The basic idea is that you can pass some expression into a match function along with a series of pipe-separated options for possible matches. The code under the option that matches the expression first is the branch that will execute. This may sound somewhat basic (and with mine looking pretty much like an if-statement, it is in this case) but this actually facilitates some really cool stuff. If you just can’t wait until my next couple of posts, I highly recommend doing some Googling to find out what makes this so interesting.

So pretty much all we have left to do is the castRay function. Again I will post the code and then explain.

let castRay ray scene =
    let s = ray.origin - scene.sphere.center
    let rayDir = norm ray.direction
    let sv = Vector3D.DotProduct(s,rayDir)
    let ss = Vector3D.DotProduct(s,s)
    let discr = sv*sv - ss + scene.sphere.radius*scene.sphere.radius
    if discr < 0.0 then false
    else true

The gist of this is that if the ray intersects the sphere we will return true, otherwise false. For general ray-casting or ray-tracing this isn’t especially interesting, and we’ll change it later, but for now it allows us to at least render our sphere.

I’ll quickly go over the maths of this function because if you haven’t done a lot of geometry you’re probably wondering WTF is going on there. My crappy variable names probably aren’t helping.

We can define the ray parametrically as r(t) = r_0 + {r_d}t with r_0 = [x_0, y_0, z_0] and r_d = [x_d, y_d, z_d].
Further, we can define our sphere as the set of points [x_s, y_s, z_s] such that (x_s - x_c)^2 + (y_s - y_c)^2 + (z_s - z_c)^2 = {S_r}^2 where [x_c, y_c, z_c] is the sphere’s center point and S_r is the sphere’s radius. Solve by substituting the ray equation into the sphere equation to get
Ray:
x = x_0 + {x_d}t
y = y_0 + {y_d}t
z = z_0 + {z_d}t
Sub into the sphere equation for x, y and z
(x_0 + {x_d}t - x_c)^2 + (y_0 + {y_d}t - y_c)^2 + (z_0 + {z_d}t - z_c)^2 = {S_r}^2
if you expand this out you’ll find you get
At^2 + Bt + C = 0
A = {x_d}^2 + {y_d}^2 + {z_d}^2
B = 2(x_d (x_0 - x_c) + y_d (y_0 - y_c) + z_d (z_0 - z_c))
C = (x_0 - x_c)^2 + (y_0 - y_c)^2 + (z_0 - z_c)^2 - {S_r}^2

With a bit of inspection you should be able to see how this fits into our F# code above. If the ray direction is normalized then A will always be 1. You can see that B is the dot product of the ray’s direction and a vector between the ray origin and the sphere’s center (sv in the code) and C is the vector between the ray’s origin and the sphere’s center, dotted with itself (ss in the code). Google “sphere ray intersection” for more detail.

So now that we have an expression for the intersection points of a ray and a sphere, all we’re doing is seeing if the intersection actually exists by solving the quadratic, using the regular quadratic formula. In this case we’re just checking for the discriminant, because if it’s negative then there is no intersection, otherwise there are two. For now we don’t care where the intersections happened, just that they did.

Now we put all of that together and we get this:

Okay, so it’s not exactly thrilling, but it is a genuine ray caster. The next step is to add some lighting to make this look like a real sphere. The next article will be posted within a weekis now available. To whet your appetite, here is an image and a quick video of what the product will look like after just a few articles.

Source code so far

Tagged , , , , ,

Poor UI Design can Kill

On January 20, 1992, Air Inter Flight 148 crashed into the Vosges Mountains while approaching Strasbourg Airport in Strasbourg, France. 87 of the 96 crew and passengers on board were killed. So why am I writing about a horrible accident on my programming blog? Because of the cause of the crash.

Many factors came together to cause Air Inter Flight 148 to hit a mountain. However one of the primary causes should be of concern to anyone developing software or designing the UI over top of it. The Airbus A320 had the most advanced cockpit in the world. The majority of the gauges and instruments in front of the pilots in this machine are digital.

When the captain of the aircraft was approaching the runway through thick clouds, he thought he had set a descent angle of a mild 3.3 degrees for a gentle descent above the Vosges Mountains. However, unbeknownst to him the plane was descending at a rate of 3,300 feet per minute, much faster than the descent would have been at 3.3′. The aircraft struck a mountain top and exploded. Air accident investigators noticed the similarities between those two numbers – the pilot’s intended descent rate (3.3′) and the actual descent rate (3,300 fpm).

As it turns out, the airbus A320 used a single display for two different descent modes – Vertical Speed (VS) Mode and Flight Path Angle (FPA) mode. The display showed FPA as two digits separated by a decimal point, and VS as just two digits (and both had a minus sign where their values were negatives). To the right is an image of the display showing an FPA of -2.8′. I wasn’t able to find a picture of the display in VS mode, but you can see from the pictures in this study how similar the two displays are – nearly impossible to tell apart.

This simple design error led the pilot to believe he was in one mode while he was in another. It was not the only factor in the accident, but the crash almost certainly would not have happened had the aircraft’s user interface been more clear. This UI design error is one of a well-known class of errors called mode errors.

Despite the fact that anyone who has taken a semester in HCI knows that modes are a Bad Thing, they are still ubiquitous in today’s software, and they’re still as annoying as ever. Unfortunately it’s simply not easy to find alternatives to mode-based interactions sometimes. On the left is an image of Google Chrome in two different modes – regular and ‘private browsing’ mode. The difference between these modes is very subtle – a small picture with little contrast in an area of the window I never look. But the implications of this mode are not trivial. If you close a tab in ‘private browsing’ mode, you can’t simply ctrl-shift-T to bring it back from the dead. It’s gone forever. The browser will not remember your login details, your Amazon shopping cart or your browsing history. Obviously this is the whole point of private browsing, but it’s also significant that this mode is barely distinguishable from regular browsing mode unless you have already made the mistake of closing a tab you still wanted.

I faced my own problems with mode errors when building my university capstone project, a round-trip UML modelling tool. To draw a relationship between two classes, the user selects a relationship type from the palette on the right and then draws the relationship. But how should the tool behave after that? Is it better to immediately jump out of “relationship drawing mode” after the user draws the relationship? Or should the tool remain available for further use? Due to strict time and resource constraints we were never able to perform much usability testing, but from what I saw neither of those answers was correct. They were both awkward and they both led to errors frequently. The modal paradigm is simply broken beyond repair, in my humble opinion.

So I’m not exactly the first one to say it, but I guess the moral of the story is avoid modes whenever possible. Oh, and in case you’re wondering, the new Airbus A320 cockpit displays show Flight Path Angle as a two-digit number while Vertical Speed is shown with a 4-digit representation. It’s not a perfect fix, but it’s a heck of a lot better.

Tagged , ,

Writing Testable Web API Wrappers

Writing a wrapper for a web API seems like a pretty simple thing to do. I mean, most of the code is already written for you, all you are doing is adapting the response from some web sever into your own object model. In most cases your object model looks very similar to the server’s web service representation, so what’s difficult about this?

In fact there are several pitfalls and architectural decisions to be made when writing web API wrappers. I intend to write about some of these in the coming months, but today I’m going to talk about writing web API wrappers in a testable way, following a Test-Driven Development (TDD) approach to the problem. This basically means that before I write a lick of code, I must first write the tests to exercise the code.

My sample application is going to be very simple, with a single API method and a single test, against the Flickr API. It is not designed as a demonstration of how to structure an API wrapper, it is merely for the purposes of demonstrating the principles of this article.

Before we jump into the code, let’s carefully consider exactly what it is we want to test. Are we testing that the Flickr API gives us an XML response that looks like what we expect, given a valid input? NO!! We approach this problem under the assumption that the Flickr team has already thoroughly tested their API, and that given a valid input we will get the correct output. Do we test that our HTTP layer is in fact able to connect to the Flickr service and send a request? Again, no! Presumable whoever wrote the HTTP library we are using has tested this.

In preparing for this article I looked at a ton of REST API wrapper projects in the .NET, Ruby and Python communities, wrapping Twitter, Flickr, Chargify and others. Nearly all of them run “unit tests” that involve actual communication with the servers they are working against. This is not a unit test! These end-to-end tests are called ‘integration tests’ (or if they are very broad, ‘smoke tests’). They are valuable, but they are not unit tests. Unit tests need to be very fast and very reliable to run, in order to ensure developers run them frequently. Testing over a network connection and a server half way across the world is neither fast nor reliable, particularly when running hundreds of tests at a time.

So now that we have that rant out of the way, let’s look at how we structure our API wrapper in such a way that we can test it independently of the API itself. As I said, we’re using the Flickr API and I’m going to be calling a method that returns the public ‘favorites’ for a particular User ID, flickr.favorites.getPublicList. I’ll be working in C# with the Visual Studio testing tools, but the principles here apply to almost any language and framework.

Click to enlarge


First I create a new solution with a C# class library project and a unit test project. Don’t forget to reference the API project from the unit test project. After I delete the goo that comes with new projects and create a couple of my own classes my project looks like the image on the right.

Now we need to write the test. This has to happen before we write any code if we are following TDD principles. Let’s consider exactly what we want to test. We’re going to test a method on the Flickr class that should return a list of photos given a user ID. We are not going to test the HTTP layer or the service. This means that we need to mock up the service/HTTP stuff. Here is the code for our test:

using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using TestableApi;
using Moq;
using System.Xml.Linq;

namespace UnitTests
{
    [TestClass]
    public class FlickrTests
    {
        private const string ApiKey = "[My API key]";

        [TestMethod]
        public void GetFavouritesTest()
        {
            // Arrange
            string userId = "22860405@N02";
            string url = "http://api.flickr.com/services/rest/";
            var args = new List<KeyValuePair<string,string>>
            {
                new KeyValuePair<string,string>("method", "flickr.favorites.getPublicList"),
                new KeyValuePair<string,string>("api_key", ApiKey),
                new KeyValuePair<string,string>("user_id",userId)
            };
            var helper = new Mock<IWebHelper>();
            helper.Setup(q => q.HttpGet(url, args)).
                Returns(GetFavouritesResponse);

            Flickr flickr = new Flickr(helper.Object);

            // Act
            var result = flickr.GetFavouritesFor(userId);

            //Assert
            Assert.IsNotNull(result);
            Assert.AreEqual(1, result.Count);
            Assert.AreEqual(1, result.First().Id);
        }

        // Helpers
        private string GetFavouritesResponse
        {
            get
            {
                return new XDocument(
                    new XDeclaration("1.0", "UTF-8", "yes"),
                    new XElement("photos",
                        new XAttribute("page", "1"),
                        new XAttribute("pages", "1"),
                        new XAttribute("perpage", "100"),
                        new XAttribute("total", "1"),
                        new XElement("photo",
                            new XAttribute("id", "1"),
                            new XAttribute("owner", "22860405@N02"),
                            new XAttribute("secret", "abc"),
                            new XAttribute("server", "123"),
                            new XAttribute("farm", "1"),
                            new XAttribute("title", "title"),
                            new XAttribute("date_faved", "1200000000")))).ToString();

            }
        }
    }
}

I know this looks like a bit much for a unit test, but it’s really quite simple.

Arrange

This is where most of the action is happening. I’m using the Moq framework for mocking objects. If you haven’t used this before then it may look a little confusing. But read at the first few lines of GetFavouritesTest() like this: Given valid user ID, API endpoint URL and API arguments, my mock implementation of the IWebHelper interface should return valid XML. This is how we mock the service – we assume that given the right input we will get the expected output. The GetFavouritesResponse property is just some XML we would expect the server to send us given these arguments. So what we’re really testing is our class’s ability to transform this response into something useful, in this case a collection of Photo objects.

Act

This is nice and simple, we just call the method and store the output in our result variable.

Assert

And here we just check that the output is what we would expect, given the input.

So now we get to writing the actual code. I’m going to start with the Photo class, just to get it out of the way. Note that while doing true TDD you will be writing a test for Photo as well, before you create this Photo class. In this demonstration I’m going to skip this, and give a simplified version of Photo.

using System;
using System.Xml.Linq;

namespace TestableApi
{
    public class Photo
    {
        public Photo(XElement xmlNode)
        {
            Id = Convert.ToInt32(xmlNode.Attribute("id").Value);
            // other properties from XML
        }

        public int Id { get; set; }
        // all other Photo fields
    }
}

The Photo class takes an XElement argument and sets the properties as needed. Whether this is a good design or not is up for debate and is something I will be discussing in a future blog post, but it’s quick and it works, so it fits perfectly with the TDD mantra, Red-Green-Refactor.

Next we’re going to look at the IWebHelper interface.

using System.Collections.Generic;

namespace TestableApi
{
    public interface IWebHelper
    {
        string HttpGet(string url, IEnumerable<KeyValuePair<string,string>> args);
    }
}

In a proper system we would also have a POST method, but this works for now. Finally, Flickr class. For now we write just enough code to allow us to compile and run the test. We are expecting it to fail.

using System.Collections.Generic;
using System.Xml.Linq;

namespace TestableApi
{
    public class Flickr
    {
        private readonly IWebHelper webHelper;

        public Flickr(IWebHelper webHelper)
        {
            this.webHelper = webHelper;
        }

        public ICollection<Photo> GetFavouritesFor(string userid)
        {
            return null;
        }
    }
}

Notice how we are using dependency injection (in this case, constructor injection) to allow consumers of the API to inject their own implementations of our dependencies through the constructor. This is crucial to making your libraries testable. Without the ability to inject dependencies we are simply unable to provide mock services.

Now we run our tests, and as expected:

Cool! We have a failing test. Now it’s time to get to work in making it pass. Remember, in TDD we write just enough code to pass the tests, then STOP! In my opinion this discipline leads us to simple, easy-to-read code and saves lots of time in fluffing about writing code that is never fully exercised. I’ll just go ahead and paste in the complete Flickr class:

using System.Collections.Generic;
using System.Xml.Linq;

namespace TestableApi
{
    public class Flickr
    {
        private const string ApiKey = "[My API Key]";
        private readonly IWebHelper webHelper;

        // Uncomment this when we create and test the real concrete
        // WebHelper
        //public Flickr() : this(new WebHelper()){ }

        internal Flickr(IWebHelper webHelper)
        {
            this.webHelper = webHelper;
        }

        public ICollection<Photo> GetFavouritesFor(string userid)
        {
            var result = new List<Photo>();

            var args = new List<KeyValuePair<string,string>>();
            args.Add(new KeyValuePair<string,string>("method", "flickr.favorites.getPublicList"));
            args.Add(new KeyValuePair<string,string>("api_key", ApiKey));
            args.Add(new KeyValuePair<string,string>("user_id", userid));

            string resp = webHelper.HttpGet("http://api.flickr.com/services/rest/", args);
            var doc = XDocument.Parse(resp);
            foreach (var node in doc.Element("photos").Elements("photo"))
                result.Add(new Photo(node));

            return result;
        }
    }
}

A quick note about the constructors: notice how I made the injectable constructor internal. Whether you do this or not is up to you and will depend on the nature of your project and your dependencies. In this case, I don’t want to expose the IWebHelper interface to my library users and they will have no need to use this constructor. Clever readers may have noticed that the IWebHelper interface is publicly exposed. I have to do this so that the Moq mocking framework will be able to mock it in the test. If someone knows a way around this so that I can keep IWebHelper internal, I’d love to hear it. Also if you’re using C#, to make internal members available to the unit test project, simply add this line to your project’s AssmbleInfo.cs file:

[assembly: InternalsVisibleTo("UnitTests")]

(Replace “UnitTests” with your unit test project’s name).

Now when we run the tests:

:)

Summary and Remarks

First, a couple of remarks on this code. Yes, there absolutely should be more unit tests in this code. Obviously this is a demonstration project, so take it for what it is – not production code. Some people may also take issue with the granularity of the unit test I provided. This test is technically exercising more than one functional unit of work. It tests that our method actually calls into the web helper class, and it also tests that it handles the output of this class appropriately. I absolutely agree that this should be refactored and broken into more tests. The purpose of this article was to show how a project can be structured in such a way that it is testable, so again, take it for what it is.

To summarize, here are the steps we took to make our project nicely unit testable.

  • Identify exactly which parts of the system we want to test, and which parts we don’t
  • Where we are interacting with parts of the system that are not being tested, hide them behind interfaces
  • Provide affordance for your class library consumers to inject their own implementations of these dependencies, as seen in the Flickr class constructors above
  • In the test, mock these interfaces and provide appropriate output from the mock given the input. Don’t forget to run tests with mocks that behave correctly given bad input as well as good.

Hope this helps, thanks for reading.

Tagged , , ,