Skip to content

The Broken Java Set Interface

Finding flaws in the Java Collections framework is like shooting fish in a barrel. From an OO design point of view the framework is just plain bad. From the infamous “vector is a list” gaffe to consistent use of inheritance for implementation, it’s enough to send shivers down the spine of any object fanatic.

Today I want to post about an issue I came across in the Java collections hierarchy that I had never seen before. This is a trap that is easy to fall into, so I was surprised that I had never read about it in the many articles I have read about Java collections. The problem is simple and easy to understand, yet it is insidious enough to potentially lead you into a debugging rabbit hole. The problem arises from Oracle’s neglect of contracts (sorry Oracle, when you acquired Sun you acquired their baggage too!).

Java Set Implementations

The wonderful thing about OO design, abstraction and interfaces is the implicit guarantee that if I have some object that conforms to some interface, I don’t need to care what kind of object I’m dealing with. It doesn’t matter if I’m driving a Honda Accord or a Mac truck, when I turn the wheel clockwise I expect my vehicle to turn right accordingly. The fact that the Mac truck implements this through a huge steering box while my little Accord relies on a flimsy rack and pinion system doesn’t matter to me, the driver. The vehicle has exposed an interface to me – a steering wheel – which means they have agreed that it will behave like every other steering wheel I have used. Simple, elegant and the essence of object design.

So let’s take a look at the two main implementations of Java.Collections.Set. We have HashSet and TreeSet, two fundamentally different implementations with very different performance characteristics. All I know is that if someone gives me a Set, it better behave like every other Set I have used.

So what’s the problem here? The problem is, these Sets don’t behave the same, in a subtle but important way. Let’s explore by example. Inspect the following code and try to predict the outcome.

import java.util.*;

public class SetTest {

	Set<Thing> hashSet;
	Set<Thing> treeSet;

	public SetTest() {
		hashSet = new HashSet<Thing>();
		treeSet = new TreeSet<Thing>();

		Thing t1 = new Thing("Thing 1");
		Thing t2 = new Thing("Thing 2");

		hashSet.add(t1);
		hashSet.add(t2);
		treeSet.add(t1);
		treeSet.add(t2);

		System.out.println("Before t1.setString");
		System.out.println("HashSet contains t1: " + hashSet.contains(t1) + ", t2: " + hashSet.contains(t2));
		System.out.println("TreeSet contains t1: " + treeSet.contains(t1) + ", t2: " + treeSet.contains(t2));
		System.out.println();

		t1.setString("Different String");

		System.out.println("After t1.setString");
		System.out.println("HashSet contains t1: " + hashSet.contains(t1) + ", t2: " + hashSet.contains(t2));
		System.out.println("TreeSet contains t1: " + treeSet.contains(t1) + ", t2: " + treeSet.contains(t2));
		System.out.println();
	}

	public static void main(String[] args) {
		new SetTest();
	}
}

class Thing implements Comparable<Thing> {
	private String str;

	public Thing(String str) {
		this.str = str;
	}

	public void setString(String newStr) { str = newStr; }

	public int compareTo(Thing o) { return str.compareTo(o.str); }

	@Override
	public boolean equals(Object o) {
		// flawed equals method for simplicity
		if (o instanceof Thing) return ((Thing)o).str.equals(str);
		return false;
	}

	@Override
	public int hashCode() {
		// flawed hashcode method for simplicity
		return str.hashCode();
	}
}

For those who aren’t in the mood for reading code, here’s what’s happening. We have some class Thing, with a String field, str. This class identifies as its field, str. This means it must also hash on this field. There have been many words written on the theory of hashCode/equals, so I’ll leave it to you to Google around if you don’t know why it’s important to define these methods together.

We now instantiate two Thing objects and two Sets of Things, one HashSet and one TreeSet. The Things are added to each set, and we check that they are indeed in the sets. We then change the name of one Thing and check again. The result?

Before t1.setString
HashSet contains t1: true, t2: true
TreeSet contains t1: true, t2: true

After t1.setString
HashSet contains t1: false, t2: true
TreeSet contains t1: true, t2: true

The HashSet no longer thinks it contains the Thing! So why did this happen? It’s probably fairly obvious by now, but I’ll make it explicit. A HashSet uses a hash table to store values. It uses the hash algorithm to figure out if a value is contained in the table. This means that a HashSet is not only a Set, it’s also a Map, but the details of the Map are hidden behind the Set interface. This seems logical until you consider the contract for a Map.

Maps rely on key values that do not change. It is imperative that the key in a mapped object is immutable in order to find the object later. When we changed that str field in Thing, we implicitly altered the key (the hash code of the previous String). Unfortunately this change was not abstracted away properly in the Set. When we use the contains method we are asking the hash table for an object that is identical to an object already in the set, but that hashes differently to how it hashed when first added.

Incidentally this bug also allows us to add duplicate values to a HashSet.

The result of all of this is that we cannot rely on all Java Sets to behave the same. In our own code we could just start using TreeSets if we know we want to have mutable hashable values. But what if we are given a Set by some other piece of code? The whole point of OO encapsulation and information hiding is that we aren’t supposed to care how it’s implemented. Unfortunately we simply can’t behave like this in the Java Collections framework.

It should be noted that Sun/Oracle did seem to consider, at some point, the issues with mutable Set members. In the Set API they state

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

but then fail to follow through with anything at all in the HashSet API description. Regardless, this business of noting in your API, “by the way, depending on the implementation things might break” is not acceptable OO practice, in my opinion.

So what is the solution? If we were to re-implement the Set interface and implementations, what would we do differently? One hacky solution is to force the HashSet to perform a linear search on its hash table when contains() misses. This seems to solve the problem, but we lose the main advantage of a HashSet, O(1) adding, removal and containment checking. Is there a better solution to this problem? What if we were to rewrite the Set hierarchy from scratch, what would we do differently? I don’t have the answers right now, but it’s something I’ll be thinking about.

Tagged , , , ,

Implementing the Mandelbrot set in F#

Mandelbrot Set SmallOne of my favourite things to do when learning a new language (or graphics package for that matter) is to write an implementation of a Mandelbrot set viewer. So here’s my F# version, annotated to help out anyone who is trying to get to know this language.

I’m still quite new to F# so if you spot something that just looks plain wrong (I know how you functional sticklers can be!) then let me know!

This program will help you understand

  • Interacting with .NET in F#, particularly Windows Forms
  • Recursion in functional programming
  • Working with the new .NET 4 Complex built-in numeric type

An understanding of complex numbers is assumed.

I’m going to start with the core functions of this program – the functions that will determine which points are in the set and which aren’t. I apologize in advance for the code coloring, I haven’t been able to find a code coloring Wordpress plugin for F#. If you know of one, let me know! Thanks a lot to Christian Abildsø for suggesting a working F# syntax highlighter in the comments! Take a look at his blog for some more great F# posts.

// Recursively performs the transform Z -> Z^2 + c for the given number
// of iterations or until escapes the Z > |2| threshold
let rec transform (originalNum:Complex) (currentNum:Complex) maxIterations iterationNumber =
  if (iterationNumber = maxIterations) then 0
  else if (currentNum.Magnitude > 2.0) then iterationNumber
  else
    let sqr = Complex.Multiply(currentNum, currentNum)
    let newNum = Complex.Add(sqr, originalNum)
    let newIter = iterationNumber + 1
    transform originalNum newNum maxIterations newIter

// Returns whether or not the given complex number is in the set to the
// best possible accuracy we can get from the given number of iterations
let inMandelbrotSet (num:Complex) maxIterations =
  let iters = transform num num maxIterations 1
  if (iters = 0) then true
  else false

The inMandelbrotSet function takes a complex number and an int and outputs a boolean (Complex -> int -> boolean in Curried notation). You might notice I often prefer curried parameters rather than tuples – this is a personal preference and there really isn’t a good reason to choose one over the other. If you don’t know the difference then you should read this excellent blog entry.

Essentially all this function does is pass on the work of figuring out whether it’s in the set onto the transform function (Complex -> Complex -> int -> int -> int). transform recursively calls itself and checks if the value of the complex number “escapes” the threshold (I don’t want to turn this post into a tutorial on the Mandelbrot set, but suffice it to say that if you apply the Z -> Z^2 + c transformation an infinite number of times and at some point it gives a result |Z| > 2 then the value can not possibly be in the set). If it escapes it returns the number of iterations it took to escape, otherwise it returns 0.

So basically now we have a function, inMandelbrotSet that will tell us whether or not a complex number is in the set.

Let’s set up a Windows Form and a bitmap image to display the set.

// Setting up the UI components
let mainForm = new Form(Width = xSize, Height = ySize, Text = "Mandelbrot Set")
let box =
  new PictureBox
  (BackColor = Color.White, Dock = DockStyle.Fill,
   SizeMode = PictureBoxSizeMode.CenterImage)
mainForm.Controls.Add(box)

// Sets up the canvas and generates the set on it
let generateSet (width:int) (height:int) =
  let bmp = new Bitmap(width, height)
  let gfx = Graphics.FromImage(bmp)
  for x in 0..(width-1) do
    for y in 0..(height-1) do
      let c = complexCoordinates(float(x),float(y))
      if (inMandelbrotSet c 200) then bmp.SetPixel(x, y, Color.Black)
  bmp

xSize and ySize are constants we’ll define in a minute. generateSet is a function that creates a bitmap, builds the Mandelbrot set on it and returns the bitmap. The other stuff behind it is fairly self-explanatory, just setting up an ordinary .NET Windows form. Take a look at the constructor for the PictureBox – notice we can define values inline. This is pretty cool, it means that we can treat the PictureBox as an immutable object even though it really isn’t. Just helps to make things feel more “functional”.

Now basically the only thing left to do is define some constants and transform our Bitmap coordinates to numbers on the complex plane. Here’s the remaining code:

open System
open System.Drawing
open System.Windows.Forms
open System.Numerics

// Size of window and canvas
let xSize = 640
let ySize = 480

// Min and max complex coordinates to display on the canvas
let xMin = -2.0
let xMax = 1.0
let yMin = -1.2
let yMax = 1.2

// Size of each horizontal and vertical pixel in terms of complex numbers
let xFactor = (xMax - xMin) / float(xSize)
let yFactor = (yMax - yMin) / float(ySize)

// Transforms pixel positions into corresponding complex numbers
let complexCoordinates(x,y) =
  new Complex((x*xFactor + xMin), (y*yFactor - yMax))

The constants at the top of the file define the ratio of a pixel to a complex number on the plane – the xFactor and yFactor. This is for mapping bitmap pixels to complex numbers. Finally, complexCoordinates does the actual mapping, turning two integers into a complex number.

Finally we give the program an entry point:

// Kick start the program
[<STAThread>]
do
    box.Image <- ((generateSet xSize ySize) :> Image)
    Application.Run(mainForm)

That funny :> operator is a cast in F#. This particular version is an unchecked runtime cast. The arrow notation is a way of changing an object’s properties – generally something you don’t want to do in functional programming (where everything is immutable) but of course F# has to interact with the .NET framework, so it’s really a hybrid language.

Put it all together and this is what you get:

Mandelbrot Set in F#

Mandelbrot Set in F#

Tagged , , , ,

Salvaging a Bad Photograph in Photoshop

I was at the Northern end of Lewis Pass when I came across the scene. What I saw was an ancient and proud late-spring mountain towering over the valley. She looked almost angry that any man would dare scar her age old nest with a callous and rude roadway. The brooding sky lay a dramatic drape over her.

What I captured on the sensor of my camera… well, I’ll let the image speak for itself.

Original Image


Flat, uninspired and altogether disappointing. But the image I saw when I passed through stuck with me, and I decided it was time to have a crack at making my photograph reflect more accurately how I felt when I took it.

I’m going to start this with my perennial piece of advice – always shoot in RAW. I’m not going to go into detail about why this is because it has been discussed to death all over the internet. If you don’t know what RAW format is or don’t know why you should be using it, enroll yourself in Google University and find out – you’ll thank yourself for the results, trust me.

Now thankfully I had the presence of mind when shooting this scene to take two exposures. I could see that although my eye saw a dark and foreboding sky, that wasn’t going to be reflected in the final image due to the limited dynamic range of the camera’s sensor. So I shot one image to correctly expose the sky, and one image to correctly expose the road and mountains. This is not going to be an HDR image in the traditional sense, instead I am combining two separate images by manually blending them. On the right is the original image correctly exposed for the sky.

After doing the basic RAW conversion (very little was done in this step in this particular case so I won’t go into detail) I open both images in Photoshop. Don’t worry if you don’t have the latest and greatest CS4, all of the tools I’ll be using have been in PS for several years (except Smart Sharpen at the end, but you can substitute Unsharp Mask). I select the entire light image (Ctrl+A) and paste it on top of the dark image, so it’s on a new layer (see left). I create a new Layer Mask for this new layer (see right). This mask allows me to use a brush to determine which parts of this layer are actually visible on top of the layer beneath.

I select the layer mask by clicking on the white box that appears next to the layer thumbnail and simply start drawing on the canvas with the Brush tool. I use a large brush with zero hardness to ensure that the edges of my mask can’t be easily picked out in the final image. Jobs like this are most easily done with a drawing tablet (I personally use a Wacom Bamboo) but you can get by without one. In fact my tablet was not available for me to use when I made this image, so I did it all with just a mouse. After masking I have an image that looks like this:

and with a mask that looks like this:

I want to pull a little more contrast out of the sky, so I inserted a new Levels Adjustment Layer in between the background and new layer (Layer -> New Adjustment Layer -> Levels…). The name of the game here is non-destructive processing. We try to do everything in layers, and this allows us to make changes to the final product without affecting the original image. An adjustment layer is a layer that makes certain adjustments to all visible layers beneath it. The advantage is that the adjustment layer can be hidden or deleted and the original images remain intact. I chose a Levels adjustment so that I could change the output white, mid and black levels with respect to the input levels, thereby increasing contrast and effectively “stretching” the dynamic range of the sky. On the right is my levels adjustment.

Here’s how the levels dialog works. That graph represents the brightness of the image – on the X axis are levels of brightness, from 0 to 255 (where 0 is black, 255 is white). The Y axis gives the number of pixels at that brightness level. So on the histogram on the right you can see we have a bunch of pixels at around 20-30 units of brightness, and another peak at around 170-190 units. The little sliders beneath the graph allow us to change the “black point” and “white point”. If we slide the left (black point) slider to the right then anything to the left of the slider becomes black, and similarly for white. Then the part of the histogram between the sliders is essentially stretched across the full 256 levels. The mid point does something similar with the middle of the histogram, and although it’s not as intuitively obvious how it will behave I encourage you to play with it to see how it affects contrast. Remember I’m using this layer just above the background so it will only affect the sky in this image.

My next step is to create a new layer at the top of this image. My goal here is to add more depth to the photograph and use lighting effects to draw the viewer in and also to try to give the scene that overbearing look that made me feel positively tiny when I was there. I use subtle light tricks to give the mountains a sense of scale that wasn’t present in the original image. I will use a special type of layer called an Overlay Layer. I click Layer->New Layer… and in the dialog box under Mode I select Overlay. This opens an option below the list box that says “Fill with Overlay-neutral color (50% gray)”. Tick this box. And overlay layer is a layer that affects the layers beneath it by changing the brightness of the lower layers depending on the brightness of the overlay layer. So areas where the overlay layer is brighter than 50% gray become brighter, and conversely for darker areas. This is why a 50% gray color is overlay-neutral, and why creating this layer has no effect on the final image at first.

We want to be relatively subtle with our effects on the overlay layer. A heavy hand here leads to an image that is so blatantly “Photoshopped” that that’s all people will see in your product. I use the dodge and burn tools (on the toolbar, hotkey O) with an exposure around 3-7% to change the overly layer. I have to be sure I’m actually working on the overlay and not on the image itself – remember, I’m working non-destructively here (and besides, the dodge and burn tools won’t give the look I’m after if applied to the image itself). Two useful hotkeys here are ‘[' and ']‘, which change the size of your brush tools. Again, this job is much easier with a tablet but it’s not totally necessary.

I use dodge to make the image lighter in certain places – just above the road and on the portion of the mountain on the upper right third of the image. I burn other parts of the mountain to make them darker. By carefully choosing where to dodge and burn I can give the image a sense of depth and scale. It’s mostly a matter of experience and experimentation that gives the knowledge of where to dodge and burn. I never studied aesthetics so I can’t quantify what it is that makes an image look “deep”, I just know from doing it many, many times. If you need a concrete explanation of how to do this you’ll need to speak to some BA student in a black turtle-neck, because I fly this by the seat of my pants.

I want to add a couple of specific touches on this overlay layer as well. I burn the road heavily to give it a “black top” look, instead of the bright and washed out look it had in the original. I just think it looks cool. I also use a large burn brush to give the image a vignette, an effect that is used to draw the viewer into an image in a kind of “tunnel vision” way. Please, be gentle with the vignette brush. A harsh vignette is highly distracting and in my opinion just looks silly. This image is about as far as I’d ever go with it. After all of these steps I’m left with this image

and this overlay layer

My final step is sharpening. Images pulled from RAW are generally quite soft. This is because a good RAW image has no processing done to it, and it’s a good thing. It means that we, the photographers, have full control over the sharpening step, rather than some mysterious algorithm designed by some software developer of indeterminate skill doing the work for us. I’m a big fan of the Photoshop Smart Sharpen (I think it has been around since CS3). I don’t usually even bother with the advanced settings as I find that changing just the Amount and Radius settings give me as fine a degree of control as I need in most cases. Filter -> Sharpen -> Smart Sharpen. For this image I used something like Amount = 150%, Radius = 1.4 pixels (I’m sorry, I can’t remember the exact settings but it’s around here). Experiment to find the settings that work for your picture. Generally if you have very fine, small details you’ll want to keep the radius very low (1 pixel or less in some cases), while having large details will allow you to bump the radius quite high, sometimes above 2.5 pixels.

Again I stress non-destructive editing, so before I apply any sharpening I hit Ctrl+Alt+Shift+E to create a new layer on top that is a merge of all visible layers below it. I then apply the sharpening to this new layer. Here’s another view of the original image, followed by the final product. I hope this article was helpful, and encourage you to leave a comment if you want to discuss things further.

Drama at Lewis Pass

Drama at Lewis Pass

Tagged , ,

Using the Google Reader API – Part 3

Google Reader API series

In part 3 of this series on using the Google Reader API I will discuss using the POST API to edit your Google Reader data. Although the REST architecture and HTTP in general doesn’t enforce it, Google appears to have chosen to make any commands which alter the state of your Reader account POST (as opposed to GET) requests.

In Part I I went over how to program for the HTTP POST requests so for now I’ll just go over the API. Arguments to POST calls are stored in a separate message body, unlike GET arguments which are stored in the URL exclusively. This is the main differentiating factor between the two types of calls. All POST calls to the Google Reader API take a single GET argument in the URL string – the client argument, which is the same as the client argument seen in all of the API calls in Part II of this series – just use a unique string identifying your own software package if in doubt.

For the purposes of this tutorial I’ll split the calls into a number of categories.

  • Subscription editing – adding, editing and removal of subscriptions and their folders/tags
  • Item editing – editing of items, such as adding/removing tags and stars, “liking” items, sharing (broadcasting) items, keeping items unread, etc.
  • Email – Emailing items using your Google Gmail account


Subscription Editing

URL: http://www.google.com/reader/api/0/subscription/edit?client=[your client]

POST Data

Argument Description
s The feed identifier. This is the URL of the feed preceeded by feed/, for example feed/http://blog.martindoms.com/feed/ for this blog’s RSS feed.

ac The action to take on this feed. Possible values are subscribe to subscribe, unsubscribe to unsubscribe and edit to edit the feed.

t The title to give the feed, only relevant in subscribe and edit actions.

r A label to remove from the feed. This is in the format user/[UserID]/label/[LabelName]. As usual UserID can be replaced with a single dash. For example, to remove the label “MyLabel” you’d use the string user/-/label/MyLabel

a A label to add to the feed. See the notes for the r argument above.

T Your token (see Part I if you’re unsure of what this is). This is an argument in every POST call.

Examples

Adding a feed – let’s add this blog’s main RSS feed to our subscriptions, but give it the title “Mr Doms’ Blog”. We’ll POST the following data to the relevant URL (at the top of this section). You’ll need to URL encode this data before posting but I’ve left it unencoded for ease of reading. Let’s assume I’ve already got myself a token which looks like abcde.ABCD1234abc
s=feedhttp://blog.martindoms.com/feed/&ac=subscribe&t=Mr Doms’ Blog&T=abcde.ABCD1234abc
Editing a feed – we want to add this feed to our “Infrequently Updated Blogs” folder on Google Reader to keep things organized. POST this data:
a=user/-/label/Infrequently Updated Blogs&s=feed/http://blog.martindoms.com/feed/&ac=edit&
T=abcde.ABCD1234abc

Removing a feed – finally, let’s remove this blog from our subscriptions out of frustration at Martin’s lack of regular updates.
s=feed/http://blog.martindoms.com/feed/&ac=unsubscribeT=abcde.ABCD1234abc


Item Editing

URL: http://www.google.com/reader/api/0/edit-tag?client=[your client]

POST Data

Argument Description
a A new tag to add. This can take one of several formats depending on the type of tag you’re adding to the item. See the notes at the bottom of this table for details.
r Remove an existing tag. This tag can take one of several formats depending on the type of tag you’re removing from the item. See the notes at the bottom of this table for details.
async Can be either true or false. This appears to be a utility argument for Google’s own website, as it doesn’t seem to make a difference to my API calls.
s The feed that this item belongs to. See the previous section for information on the format of this argument.
i The item you want to tag. This is in the format tag:google.com,2005:reader/item/[item identifier].
T As usual this is your token. See Part I if you’re unsure about what this is.

Tag identifier formatting – to add or remove a tag on an item you need to use a tag identifier string in the a or r argument. Google doesn’t seem to have a fully consistent way of formatting these tag identifiers, but here’s what I’ve figured out.

  • Add/remove a custom tag (item label): user/-/label/[NewTag]
  • Mark an item read or unread: user/-/com.google/read
  • Star an item: user/-/state/com.google/starred
  • Add or remove a “keep unread” tag: user/-/state/com.google/tracking-kept-unread
  • “Like” or un-”like” an item: user/-/state/com.google/like
  • Share or unshare an item with Google Reader friends: user/-/state/com.google/broadcast

Examples

To use any tag features you’ll need to retrieve an item identifier. Presumably if you’re displaying items in your software then you’ve already got an item identifier. If you don’t know how to retrieve these items, see Part II for the listing API. For the purpose of these examples I’ll be working with an item that represents Part I of this series, which happens to be item 0816b6dbac1d768d. We’ll also be assuming a token string abcde.ABCD1234abc as before.
Add a star – to add a star we send the POST data a=user/-/state/com.google/starred&async=true&s=feed/http://blog.martindoms.com/feed/&i=tag:google.com,2005:reader/item/0816b6dbac1d768d&T=abcde.ABCD1234abc
Mark as unread – we want to remove the “read” tag to mark an item as unread. Use this POST data: a=user/-/state/com.google/read&async=true&s=feed/http://blog.martindoms.com/feed/&i=tag:google.com,2005:reader/item/0816b6dbac1d768d&T=abcde.ABCD1234abc

The item tagging doesn’t get any more complicated than that. Easy!


Emailing

URL: http://www.google.com/reader/email-this?ck=[CurrentUnixTime]&client=[YourClient]
(current Unix time argument is optional)

POST Data

Argument Description
i The item identifier. See the previous section for details.
emailTo The email addresses to send the item to. You can send to multiple recipients. These addresses are in the format , ,
comment The email body
subject The email subject
ccMe Whether to send the email to your own GMail account also. Can be true or false.
T Your token. See Part I if you’re unsure of what this is.

Example

Here’s some example POST data to send an email about Part I of this series of articles to the address example@example.com
i=i=tag:google.com,2005:reader/item/0816b6dbac1d768&emailTo= <example@example.com>, &comment=Check this out&subject=Google Reader API&ccMe=false&T=abcde.ABCD1234abc

Tagged , , , , ,

Using the Google Reader API – Part 2

Google Reader API series

In this post I will discuss the listing API for Google reader. This encapsulates all of the calls you’ll need to list items and feeds in various ways. The table below summarizes the API URLs you’ll be using. Simply follow the table from left to right to find your target API handle. Click on a terminal item to see a summary of what it does and how to use it.

Note that for the user items I have found that substituting a dash for the user number works in every case I’ve tested, because a user needs to be logged in with a valid token to use these calls anyway.

If this is your first foray into using the Google Reader API then you’ll want to read my previous post which describes how to acquire a valid token and cookie, and make calls to the API. Unlike part 1 of this series, this part is totally language agnostic.

URL structure: http://www.google.com/…

/reader/api/0/
unread-count
user-info
stream/contents/
feed/[feedurl]
user/[userNumber]/
label/[labelName]
state/com.google/
reading-list
starred
broadcast
created


unread-count

Gets a list of feeds for the current user and the number of unread items in each feed. Also returns a list of labels that the user has created.
Parameters

  • allcomments=[true|false] : I’m not sure what this parameter does, please leave a comment if you figure this one out.
  • output=[json|xml] : Determines the format of the response. Default: xml
  • ck=[unix timestamp] : Use the current Unix time here, helps Google with caching
  • client=[your client] : You can use the default Google client (scroll), but it doesn’t seem to make a difference. Google probably uses this field to gather data on who is accessing the API, so I’d advise using your own unique string to identify your software.

Sample Output (XML)

GET http://www.google.com/reader/api/0/unread-count?allcomments=true&output=xml&ck=1255643091105&client=scroll
<object>
    <number name="max">1000</number>
    <list name="unreadcounts">
        <object>
            <string name="id">feed/http://www.pwop.com/feed.aspx?show=dotnetrocks</string>
            <number name="count">3</number>
            <number name="newestItemTimestampUsec">1255568508813466</number>
        </object>
        <object>
            <string name="id">feed/http://rssnewsapps.ziffdavis.com/audioblogs/crankygeeks/cg.audio.xml</string>
            <number name="count">1</number>
            <number name="newestItemTimestampUsec">1255578279704094</number>
        </object>
        <object>
            <string name="id">user/01723985652832499840/label/myLabel</string>
            <number name="count">4</number>
            <number name="newestItemTimestampUsec"> 1256030778673945 </number>
        </object>
    </list>
</object>

Sample Output (JSON)

GET http://www.google.com/reader/api/0/unread-count?allcomments=false&output=json&ck=1255643091105&client=myApplication
{
"max":1000,
"unreadcounts":
    [
    {"id":"feed/http://www.pwop.com/feed.aspx?showu003ddotnetrocks","count":3,"newestItemTimestampUsec":"1255568508813466"},
    {"id":"feed/http://rssnewsapps.ziffdavis.com/audioblogs/crankygeeks/cg.audio.xml","count":1,"newestItemTimestampUsec":"1255578279704094"},
    {"id":"user/01723985652832499840/label/myLabel","count":4,"newestItemTimestampUsec":"1256030778673945"}
    ]
}


user-info

Gets information on the current user
Parameters

  • ck=[unix timestamp] : Use the current Unix time here, helps Google with caching
  • client=[your client] : You can use the default Google client (scroll), but it doesn’t seem to make a difference. Google probably uses this field to gather data on who is accessing the API, so I’d advise using your own unique string to identify your software.


Sample Output

GET http://www.google.com/reader/api/0/user-info?&ck=1255643091105&client=myApplication
{
    "userId":"01723985652832499840",
    "userName":"username",
    "userProfileId":"123456789123456789123",
    "userEmail":"username@gmail.com",
    "isBloggerUser":true,
    "signupTimeSec":1234515320
}


feed/[feedurl]

Gets items belonging to a particular feed.
Parameters

  • ot=[unix timestamp] : The time from which you want to retrieve items. Only items that have been crawled by Google Reader after this time will be returned.
  • r=[d|n|o] : Sort order of item results. d or n gives items in descending date order, o in ascending order.
  • xt=[exclude target] : Used to exclude certain items from the feed. For example, using xt=user/-/state/com.google/read will exclude items that the current user has marked as read, or xt=feed/[feedurl] will exclude items from a particular feed (obviously not useful in this request, but xt appears in other listing requests).
  • n=[integer] : The maximum number of results to return.
  • ck=[unix timestamp] : Use the current Unix time here, helps Google with caching.
  • client=[your client] : You can use the default Google client (scroll), but it doesn't seem to make a difference. Google probably uses this field to gather data on who is accessing the API, so I'd advise using your own unique string to identify your software.


Sample Output

GET http://www.google.com/reader/api/0/stream/contents/feed/http://astronomycast.com/podcast.xml?ot=1253052000&r=n&xt=user/-/state/com.google/read&n=20&ck=1255646178892&client=myApplication
{
    "id":"feed/http://astronomycast.com/podcast.xml",
    "title":"Astronomy Cast",
    "continuation":"CLbwxoSAsZsC",
    "self":[{"href":"http://www.google.com/reader/api/0/stream/contents/feed/http://astronomycast.com/podcast.xml?"}],
    "alternate":[{"href":"http://www.astronomycast.com","type":"text/html"}],
    "updated":1255415774,
    "items":[
    {
        "crawlTimeMsec":"1255415774867",
        "id":"tag:google.com,2005:reader/item/bfbd19b1936fcccf",
        "categories":
        [
            "user/01723985652832499840/state/com.google/reading-list",
            "user/01723985652832499840/state/com.google/fresh"
        ],
        "title":"Ep. 156: Famous Stars","published":1253491200,
        "updated":1253491200,
        "enclosure":
        [
            {
                "href":"http://feedproxy.google.com/~r/astronomycast/~5/4TPW83-0h4Y/AstroCast-090921.mp3",
                "type":"audio/mpeg","length":"17610000"
            }
        ],
        "alternate":
        [
            {
                "href":"http://feedproxy.google.com/~r/astronomycast/~3/89_Ck5HN--w/",
                "type":"text/html"
            }
        ],
        "mediaGroup":{"content":[{"url":"http://feedproxy.google.com/~r/astronomycast/~5/4TPW83-0h4Y/AstroCast-090921.mp3"}]},
        "summary":
        {
            "direction":"ltr",
            "content":"This week we're going to talk about famous stars. But not those boring ..."
        },
        "author":"info@astronomycast.com (Fraser Cain \u0026 Dr. Pamela Gay)",
        "likingUsers":[],
        "comments":[],
        "annotations":[],
        "origin":
        {
            "streamId":"feed/http://astronomycast.com/podcast.xml",
            "title":"Astronomy Cast",
            "htmlUrl":"http://www.astronomycast.com"
        }
    },
    {
        "crawlTimeMsec":"1255415774867",
        "id":"tag:google.com,2005:reader/item/008d8a870f2a9dee",
        ...
    }]
}

Output summary
This request returns an object that contains:

  • Feed id string
  • Feed title string
  • Continuation string - what is this? Leave a comment if you know
  • Self - an array of objects which contain a link to this list
  • Alternate - an array of objects which contain non-Google links to the feed (usually the author's RSS)
  • Updated timestamp
  • Items - an array of item objects which contain
    • Crawl time - when the item was crawled by Google Reader
    • Id - a unique identifier for the item
    • Categories - an array of categories to which this item belongs. These appear to be user-specific
    • Item title string
    • Updated timestamp
    • Enclosure - an array of enclosure objects. These are used mostly for podcasts/videocasts, they contain media items enclosed in the feed item. Enclosure items are composed of
    • An array of alternate links to the item
    • Media Group - appears to be an object storing an array of direct links to the enclosure media
    • Summary - an object containing a description of the feed item and a direction (ltr or rtl) for reading. Is this direction a byte ordering hack? Someone let me know in the comments.
    • Author string
    • An array of users who "like" the item
    • An array of comments on the item
    • An array of annotations on the item
    • An origin object which stores
      • The stream id - this is the feed/[feedurl] that Google uses to identify the feed
      • The feed title string
      • The HTML url to the feed's homepage


label/[labelName]

Lists all items from a given label.
Parameters

GET http://www.google.com/reader/api/0/stream/contents/user/-/label/label?ot=1253066400&r=n&xt=user/-/com.google/read&n=20&ck=1255660064860&client=myApplication

Sample Output - See feed list


reading-list

List all items in the user's reading list. This is a list of all of the items from all of the user's feeds, excluding items depending on the xt (exclude target) parameter.
Parameters

GET http://www.google.com/reader/api/0/stream/contents/user/-/state/com.google/reading-list?ot=1253066400&r=n&xt=user/-/state/com.google/read&n=20&ck=1255660536125&client=myApplication

Sample Output - See feed list


starred

List all items that the user has marked as "starred"
Parameters

  • n=[integer] : The maximum number of results to return.
  • ck=[unix timestamp] : Use the current Unix time here, helps Google with caching.
  • client=[your client] : You can use the default Google client (scroll), but it doesn't seem to make a difference. Google probably uses this field to gather data on who is accessing the API, so I'd advise using your own unique string to identify your software.
GET http://www.google.com/reader/api/0/stream/contents/user/-/state/com.google/starred?n=20&ck=1255660749447&client=myApplication

Sample Output - See feed list


broadcast

List all items that the user has chosen to share publicly.
Parameters

  • n=[integer] : The maximum number of results to return.
  • ck=[unix timestamp] : Use the current Unix time here, helps Google with caching.
  • client=[your client] : You can use the default Google client (scroll), but it doesn't seem to make a difference. Google probably uses this field to gather data on who is accessing the API, so I'd advise using your own unique string to identify your software.
GET http://www.google.com/reader/api/0/stream/contents/user/-/state/com.google/broadcast?n=20&ck=1255660975267&client=myApplication

Sample Output - See feed list


created

List all notes that the user has created.
Parameters

  • n=[integer] : The maximum number of results to return.
  • ck=[unix timestamp] : Use the current Unix time here, helps Google with caching.
  • client=[your client] : You can use the default Google client (scroll), but it doesn't seem to make a difference. Google probably uses this field to gather data on who is accessing the API, so I'd advise using your own unique string to identify your software.

Sample Output

GET http://www.google.com/reader/api/0/stream/contents/user/-/state/com.google/created?n=20&ck=1255661349906&client=myApplication
{
    "id":"user/01723985652832499840/source/com.google/post",
    "title":"\"post\" via iamaelephant in Google Reader",
    "self":
    [
        {"href":"http://www.google.com/reader/api/0/stream/contents/user/01723985652832499840/state/com.google/created?n\u003d20\u0026ck\u003d1255661572704\u0026client\u003dscroll"}
    ],
    "author":"iamaelephant",
    "updated":1255661564,
    "items":
    [
        {
            "crawlTimeMsec":"1255661564095",
            "id":"tag:google.com,2005:reader/item/2dc5e875a623fe5e",
            "categories":
            [
                "user/01723985652832499840/source/com.google/post",
                "user/01723985652832499840/state/com.google/broadcast",
                "user/01723985652832499840/state/com.google/read",
                "user/01723985652832499840/state/com.google/fresh"
            ],
            "published":1255661564,
            "updated":1255661564,
            "related":
            [
                {"href":"http://www.google.com/reader/shared/01723985652832499840","streamId":"user/01723985652832499840/state/com.google/broadcast"}
            ],
            "alternate":
            [
                {"href":"http://www.google.com/reader/item/tag:google.com,2005:reader/item/2dc5e875a623fe5e","type":"text/html"}
            ],
            "content":
            {
                "direction":"ltr",
                "content":"this is the note text"
            },
            "author":"iamaelephant",
            "likingUsers":[],
            "comments":[],
            "annotations":[],
            "origin":
            {
                "streamId":"user/01723985652832499840/source/com.google/post",
                "htmlUrl":"http://www.google.com/reader/shared/01723985652832499840"
            }
        }
    ]
}

That's an awful lot of information for a little note, but most of it corresponds closely to item information, so if you're unsure of how to interpret a field, have a look at the output summary for the feed/[feedurl] call.

Phew! That mostly does it for the listing methods. There are some other minor calls I've left out for brevity, but most of what you'll want to achieve is in here, and I intend on releasing complete documentation when I have finished my .NET Google Reader API library. In my next post I'll discuss the POST methods so you can start altering the contents of your Google Reader list, star/un-star items, etc. Stay tuned!

Tagged , , , , ,

Using the Google Reader API – Part 1

Google Reader API series

Google have never officially released API documentation for Google Reader, so this information is unofficial and subject to change.

When I first started looking around for API docs to interface with Google Reader, this site seemed to be pretty much the only resource available. Unfortunately a lot of the information in that document is outdated, so I set about trying to figure out the API myself, with some success.

To interface with Google reader, you’ll need to collect the following things from their authentication system:

  • SID – A session ID, which remains valid until you log out
  • Token – Similar to a session ID, but expires quickly. Used to access direct API calls
  • Cookie – An ordinary cookie that uses your SID to authenticate your session on API calls

The code samples I’m using will be in C#, but should be easily translatable to any language.

To get an SID, we send a GET request to https://www.google.com/accounts/ClientLogin with arguments service=reader&Email=[your Google username]&Passwd=[your Google password]
This will return some text with key=value pairs for LSID, SID and User. SID is the only part of this we need, so use your string library to grab this information.

Now that we have an SID we need a token. We send a GET request to http://www.google.com/reader/api/0/token with a cookie we generate from the SID. The cookie information is

  • Name: SID
  • Value: [your SID]
  • Path: /
  • Domain: .google.com

This request returns just the token text.

So here is the code I wrote for getting SID and token:

using System.Net;
using System.IO;

namespace GoogleReader.NET
{
    public class GoogleReader
    {
        private string _sid = null;
        private string _token = null;
        private Cookie _cookie = null;

        private string _username;
        private string _password;

        public GoogleReader(string username, string password)
        {
            _username = username;
            _password = password;

            connect();
        }

        private bool connect()
        {
            getToken();
            return _token != null;
        }
        private void getToken()
        {
            getSid();
            _cookie = new Cookie("SID", _sid, "/", ".google.com");

            string url = "http://www.google.com/reader/api/0/token";

            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
            req.Method = "GET";
            req.CookieContainer = new CookieContainer();
            req.CookieContainer.Add(_cookie);

            HttpWebResponse response = (HttpWebResponse)req.GetResponse();
            using (var stream = response.GetResponseStream())
            {
                StreamReader r = new StreamReader(stream);
                _token = r.ReadToEnd();
            }
        }
        private void getSid()
        {
            string requestUrl = string.Format
                ("https://www.google.com/accounts/ClientLogin?service=reader&Email={0}&Passwd={1}",
                _username, _password);
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(requestUrl);
            req.Method = "GET";

            HttpWebResponse response = (HttpWebResponse)req.GetResponse();
            using (var stream = response.GetResponseStream())
            {
                StreamReader r = new StreamReader(stream);
                string resp = r.ReadToEnd();

                int indexSid = resp.IndexOf("SID=") + 4;
                int indexLsid = resp.IndexOf("LSID=");
                _sid = resp.Substring(indexSid, indexLsid - 5);
            }
        }
    }
}

From here were can make interface calls (I plan to document the complete set of API calls in the very near future, but for now I’ll give a couple of examples). To add a subscription, make an HTTP POST to http://www.google.com/reader/api/0/subscription/quickadd?client=scroll with POST arguments quickadd=[url of feed]&T=[your token] and don’t forget to include your cookie. I wrote a couple of quick helper methods for making POST and GET calls first:

private HttpWebResponse httpGet(string requestUrl, string getArgs)
        {
            string url = string.Format("{0}?{1}", requestUrl, getArgs);

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "GET";
            request.CookieContainer = new CookieContainer();
            request.CookieContainer.Add(_cookie);

            try
            {
                return (HttpWebResponse)request.GetResponse();
            }
            catch
            {
                // handle error
                return null;
            }
        }

        private HttpWebResponse httpPost(string requestUrl, string postArgs)
        {
            byte[] buffer = Encoding.GetEncoding(1252).GetBytes(postArgs);

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(requestUrl);
            request.Method = "POST";
            request.CookieContainer = new CookieContainer();
            request.CookieContainer.Add(_cookie);
            request.ContentType = "application/x-www-form-urlencoded";
            request.ContentLength = buffer.Length;

            Stream PostData = request.GetRequestStream();

            PostData.Write(buffer, 0, buffer.Length);
            PostData.Close();

            try
            {
                return (HttpWebResponse)request.GetResponse();
            }
            catch
            {
                //handle error
                return null;
            }
}

Now for the add subscription code:

        public bool AddFeed(string feedUrl)
        {
            string data = String.Format("quickadd={0}&T={1}", feedUrl, _token);
            string url = "http://www.google.com/reader/api/0/subscription/quickadd?client=scroll";
           
            HttpWebResponse response = httpPost(url, data);
            if (response == null) return false;
            return true;
        }

We can add a label to a feed by POSTING to http://www.google.com/reader/api/0/subscription/edit?client=scroll with arguments a=user/-/label/[new label]&s=feed/[feed url]&ac=edit&T=[token] like so:

        public bool AddLabelToFeed(string label, string feedUrl)
        {
            string data = String.Format
                ("a=user/-/label/{0}&s=feed/{1}&ac=edit&T={2}", label, feedUrl, _token);
            string url = "http://www.google.com/reader/api/0/subscription/edit?client=scroll";

            HttpWebResponse response = httpPost(url, data);

            if (response == null) return false;
            return true;
        }

There are of course listing methods for getting feeds. So far I’ve only been able to figure out how to get feed information in JSON format, but if anyone knows how to grab XML data please leave a comment. To list items we send a GET request to http://www.google.com/reader/api/0/stream/contents/user/-/state/com.google/reading-list?ck=[current UNIX time] . Of course there are many more otpional arguments for all of these methods, which I will be documenting. Here’s the list code:

public void ListAll()
        {
            string url = string.Format
                ("http://www.google.com/reader/api/0/stream/contents/user/-/" +
                 "state/com.google/reading-list");
            string args = string.Format
                ("ck={0}", getUnixTimeNow());
            HttpWebResponse response = httpGet(url, args);

            Stream str = response.GetResponseStream();
            StreamReader sr = new StreamReader(str);
            string s = sr.ReadToEnd();
            // Handle JSON data in s
           
            sr.Close();
        }

getUnixTimeNow() is a small helper method I wrote for getting the current Unix time:

private long getUnixTimeNow()
        {
            TimeSpan ts = (DateTime.UtcNow - new DateTime(1970, 1, 1, 0, 0, 0));
            long unixTime = (long)ts.TotalSeconds;
            return unixTime;
        }

This has been a small taste of what’s to come as I work my way through this API. Stay tuned for more. If you want to get hacking on it yourself, the easiest way is to download Fiddler, a great web debugging tool, and watch the HTTP requests as you use Google Reader.

Part 2 now available.

Tagged , , , , , , , ,

JD’s The Perils of Travel Video transcribed

The following is the text transcript from a video posted at JD’s Man Stories titled The Perils of Travel. This is not my content, but I felt the video was such a terrible medium for such a great story that a text only version was necessary. I hope you enjoy it as much as I did, and be sure to visit JD’s blog, it’s a lot of fun. Video at the bottom.

During the Summer of 2007, I had the opportunity to backpack around Europe for 2 weeks.

I talked about it often before I left. My girlfriend however, although great in many respects, was not the world’s greatest listener.

I left on Friday June 1st. Despite even calling her to say goodbye the night before, she never realized I left. When I arrived home 2 weeks later, I had several emails from her, waiting in my inbox…

June 1st 11:31am
Subject: Maddy Babe!

*Smooch*

Hey doll, me and the girls are gonna hit up the maddy tonight. I’ll call you in a bit, you and the boys should come out too.

Love ya,
Em

June 1st 4:40m
Subject: Come out come out, hehe

Hey hun, me again.

Tried calling you cell a few times today but it kept going right to your voice mail. You’re not screening me are you? ;)

Give me a call when you get this.

My cell was off for the entire trip so that I wouldn’t get insane roaming charges in Europe.

June 2nd 9:12am
Subject: Missed you! (frowny face)

JDddddddddd… I missed you last night! There were these two suepr creepy guys who kept trying to talk to us all night. I let the one guy buy me a drink and as soon as he handed it to me I told him he reminds me of my boyfriend, ha ha ha.

I tried you cell again but again no answer, having cell problems dear? Anyway, Maria and me are going out for brunch to try and get over our hangovers…. tonight, you and me, dinner at your place.

Call me today.

Later Baber,
Em

June 3rd 10:33am
Subject: Uh… hello…?

JD, wtf. Why are you not responding to my calls and emails? Where are you?! I waited all night for you to get in touch with me. I’m not happy here babe… call me ASAP!!!

June 3rd 8:41pm
Subject: (no subject)

June 5th 5:50pm
Subject: WTF

What the fuck is going on?! Why are you avoiding me? You’re not answering your cell, you won’t return me texts… JD wtf?!

I know you’re around! Your friend Jeremy is such a bullshitter. When he said he hadn’t seen you all weekend I could hear you talking in the background!
I thought you were different fuck nut!

Last chance… call me tonight with the best excuse of your life or I walk asshole.

Apparently she call one of my friend to find out where I was on the weekend. I have no idea what she’s talking about here though.

June 5th 8:11pm
Subject: Read this asshole!

I tried calling you, I’ve tried calling your friends, you mom. What the fuck did I do? Was it just time? Time to dump me? You could at least tell it to my face asshole.

We’re through… don’t call me, don’t text me, don’t bother now. You’ll never know what you lost, I was the one, and now I feel sorry for you because you’ll never have that again. I feel so sorry for you, ha ha ha.

She left messages at my mom’s demanding I call her back. My mom would’ve called her to tell her I was in Europe except she didn’t leave her number and my mom doesn’t have it.

I just got dumped while I was on a different continent than my girlfriend…

I’ve been in Europe for only 5 days…

And the emails are just beginning.

June 8th 1:07am
Subject: (no subject)

I hate you.

June 10th 3:44am
Subject: FUUUUUUUUUUCK U

Hey fuck fcee,

remmembe r that friend of mine that I was you were jealous of who said that nuffin ever will happen with well I was crying with him about you and he told me how amazing I was, how he always thought so and so i fucked him to show you I’m right! Now who’s the stuipd one? I can’t get any guy I want and whatr are you doing just sitting at home crying over how you lost me? Well don’t cry for me because you’ve already lost me asswhole! Ha ha ha ha ha

I would say I’ve been cheated on… but technically I’m single at this point

June 10th 4:01am
Subject: (no subject)

June 12th 2:20pm
Subject: Read ASAP

JD, call me, we need to talk.

June 12th 8:11pm
Subject: Just listen…

Ok fine, you don’t want to call me then just listen. I’m mad and hurt right now. I really felt something between us and now you’ve gone and thrown it all away and I have no idea why… JD, we were amazing together weren’t we?

We always had fun, and I tried to be so easy going and happy with yo. We were the coupel that could spend and evening out with our friends or laze about on the couch and either way end it all in fantastic sex and confessions of love. I know that something has happened to change all that, but you have to admit that you still feel something for me. To deny that is to deny your very soul.

I know you’ll call me tonight. We have a lot to discuss. A lot of bad and good. It may not change thing and we may still be broken up, but at least you owe me a conversation. A chance.

Em

I’ve been in Europe for 12 days. I am coming home on Saturday. On this day I go shopping in Rome and get a necklace for Em and write a postcard to my grandparents.

June 14th 7:01am
Subject: I tried…

I tried to reach out to you JD, I really did. But I take back all those nice things I said. I’m glad we’re broken up. You’re boring as shit to be with. I pretended so many times to like the stupid shows you like, to watch the stupid movies you like, to enjoy spending time with your asinine friends. I’ve moved on. I realize that you are not the right person for me in any way whatsoever. You bring out the worst traits in me and I’m a million times better without you.

I’m bringing a bunch of your stuff to your mom’s house. So long JD, I’d like to say it was fun but it really wasn’t. Believe me when I say I never want to hear from you again.

Em

June 14th 8:21am
Subject: OPEN FIRST!!! DO NOT READ ANY OTHER EMAILS!!!

If you love me, you will delete every email I’ve sent over the past week without reading it. JD please I am begging you that when you get back you do not read any email but this one. We’ve all made mistakes while you’ve been away. I can explain it all to you, call me ASAP. I love you with all my heart and soul!

Em

When she went to my mom’s house to drop off my stuff, she bumped into my mom… my mom told her I was in Europe until Saturday.

I read the emails… Em and I are no longer together. I learned two important lessons from this whole ordeal…

1. Careful when you date passionate people, because passion swings both ways. Sometimes they’ll love you, but other times they’ll hate you. And when they hate you… boy do they hate you.

2. When you go to Europe for 2 weeks, leave your fucking phone on.

jdsmanstories.blogspot.com

Tagged , , ,

Regular Expressions in C – Having Trouble with the Pipe ‘|’ Character?

I was working on an assignment for my C class recently and spent an inordinate amount of time solving a simple problem with regular expressions. Hopefully this quick post will help save someone else the frustration.

Let’s say you have the regular expression Cat|Dog|Horse (should match Cat OR Dog OR Horse). You compile the regex using

char* regStr = "Cat|Dog|Horse";
regex_t reg;
regcomp(&reg, regStr, REG_ICASE);

So far so good. But now we try testing this regex against the word Dog

int result = regexec(&reg, "Dog", 0, NULL, 0);
printf("Result: %d", result);

And it fails! Well don’t despair, the solution is simple. The pipe ‘|’ operator in C regular expression is only included in the extended set of regex operators. When compiling the regular expression using regcomp simply use REG_EXTENDED in the last argument. If you still want your regular expression to be case insensitive you can pipe two arguments together, like

regcomp(&reg, regStr, REG_ICASE|REG_EXTENDED);

Hope that helps!

Tagged , ,

Be Careful with Random Numbers in .NET

[private void ColorWindowButton_Click(object sender, RoutedEventArgs e)
{
// todo fix this, crashes if window previously closed
_colorView.Show();
}[

I’m working on a little game for some friends that involves rolling five dice on a board and came across an interesting little problem, which I’ll illustrate with a very small dummy program. Here we have a simple die class for creating individual die objects:

using System;

namespace DiceRoller
{
    class Die
    {
        public int Face { get; private set; }

        private const int maxRoll = 6;
        private Random ran = new Random();

        // Constructor
        public Die()
        {
            Roll();
        }

        // Roll the die
        public int Roll()
        {
            Face = ran.Next(maxRoll) + 1;
            return Face;
        }
    }
}

And a small console program that will instantiate 5 dice and roll all five of them, 5 times.

using System;

namespace DiceRoller
{
    class Program
    {
        static void Main(string[] args)
        {
            Die[] dice = { new Die(),
                           new Die(),
                           new Die(),
                           new Die(),
                           new Die()
                         };

            for (int j = 0; j < 5; j++)
            {
                for (int i = 0; i < dice.Length; i++)
                {
                    Console.Write(dice[i].Roll());
                }
                Console.WriteLine("\n");
            }

            Console.ReadLine();
        }
    }
}

And here is the output.
Output

Whoa! Not very random. The default .NET random number generator appears to be using the system time to generate the random numbers. Because our calls to ran.Next() are so close together, we’re getting mostly the same result. There are a couple of ways to fix this. One option is to tell the thread to stop executing for some time before the roll, by using Thread.Sleep(10) at the start of the Roll() method. This works, but it’s ugly.

A better solution is to make the Random object in the Die class a static member. This way, every die will share the same Random, and so each time ran.Next() is called it iterates the random number on the same object. This solution is perfect. You only need to change one line in the Die class:

private static Random ran = new Random();

And the result:
Output

Perfect! In this case it was easy to see the problem and very simple to fix, but the moral of the story is, be careful with the RNG if you’re using it for anything non-trivial, especially when you’re generating multiple numbers from different instances of Random.


Tagged , , ,

Elephantcatch – WPF podcast catcher and aggregator

This program aggregates podcast RSS feeds, automatically checks them for new items and downloads the audio and video files. Built from the ground up in WPF using the Model-View-ViewModel pattern. UI colors and transparency can be customized. This software is still in very early stages and contains many bugs.

Download! (requires Windows XP, Vista or 7, .NET Framework 3.5 SP1)
Location 1
Location 2

Get the code!
WebSVN
SVN

ElephantCatch UI

Customize the UI colors

Customize the UI colors

ElephantCatch