Prefix tree (trie) in F#

What is a prefix tree?

A prefix tree - otherwise known as a trie (pronounced “tray”) - is a data structure designed for fast searching over prefixes - for example given the list of words [CAT, CATS, CART, CARTS], which words are prefixed with “CAR”? Performing this search on a hashset would require you to check every value (O(n)) complexity). A sorted list would need to perform a fast search for the first item with the prefix and then scan or search the list until no matching items remain - O(logn) + O(logm) where m is the sublist which has the prefix - which is fast, but a sorted list is slow to build, slower than a hashset to search for single items and slow to tell me if a given string is a valid prefix to any word in the list.

With a prefix tree this search is performed in O(m*p) time (again m is the sublist of prefixed strings, and p is the length of the average result string - typically very small compared to the string list) and search for a single string in the prefix tree is performed in O(p) where p is the length of the string to search for (not as fast a a hash set, but very near for most cases). At a glance this seems worse than a sorted list, but in typical use cases, such as storing dictionary words, the length of the word you’re searching for is a much smaller number than the number of words you’re storing.

And the coolest thing about a prefix tree is that it’s actually really simple. A prefix tree is a node with (optionally) a value, and a map of values to subnodes. To construct a prefix tree, start with a node with no value and add a string by appending that node’s map with the first character, mapping to a new trie with that character as its value, and add the rest of the string to that node recursively in the same way.

Implementation

I have implemented a simple prefix tree in F#. The code is available here and you can find it on nuget. This trie works well in C# and F# (and any other CLR language) and unlike other F# prefix trees I’ve spotted around the internet, mine is completely immutable.

Here’s an example of usage from F#:

open FTrie
[<EntryPoint>]
let main argv = 
    let t = Trie(["abcde";"abba";"ababab"])
    printfn "%A" (t.getWords()) // seq ["ababab"; "abba"; "abcde"]

    let t2 = t.withWord("cdfge");
    printfn "%A" (t2.getWordsPrefixedBy("ab")) // seq ["ababab"; "abba"; "abcde"]
    0

And in C#

using FTrie;
using System;
class Program
{
    static void Main(string[] args)
    {
        Trie t = new Trie(new[] { "cars", "carts", "cats", "cartoons" });
        Console.WriteLine(string.Join(", ", t.getWordsPrefixedBy("car"))); // cars, cartoons, carts

        var t2 = t.withWord("carve");
        Console.WriteLine(t.isPrefix("car")); // True
        Console.WriteLine(t.isPrefix("cra")); // False;
        Console.ReadLine();
    }
}

A better add function

This immutable implementation creates a new trie when you call the withWord() function instead of adding the word to the existing data structure. It does this by just pulling out every existing word and calling the Trie constructor with the plus the new one. With very large tries this could become quite memory and computationally intensive.

You could implement a better add function using the following algorithm. I didn’t bother because I didn’t need this functionality, so I leave it as an exercise for you (feel free to open a pull request into my repo if you find a good solution).

add(word)
1. currentNode = root
2. while currentNode contains first letter of word
3.   currentNode = currentNode.children[first letter of word]
4. clone currentNode except with a new children map containing the rest of this word chain
5. replace the chain of parents with new tries that reference the new nodes

What you’ll end up with is two trees that reference identical nodes in all but one branch and have different roots. This would be much faster than creating whole new prefix trees from scratch, and use less memory. Because all branches are immutable no one with a reference to another trie could mess around with data in your branches. Don’t forget to account for the EOW bit!

New scriptcs script pack - MemberPrint

I have just published a really simple little ScriptCs (I'm still not sure if I should be capitalising that or not) script pack. It's intended to be used on the REPL. All it does it print some details about the objects and types to help you navigate the REPL more easily. As usual, the code is available on GitHub.

To install, run the following command:

scriptcs -install scriptcs.memberprint

Then just run scriptcs. The easiest way to figure out the API is to run it over itself, like this:

c:\test>scriptcs
scriptcs (ctrl-c or blank to exit)
> var print = Require();
> print.Methods(print);
+   Constructors(Type t, BindingFlags flags, String regex, Boolean verbose) : Void
+   Constructors(Object o, Boolean verbose) : Void
+   Constructors(Object o, String regex, Boolean verbose) : Void
+   Constructors(Type type, Boolean verbose) : Void
+   Constructors(Type type, String regex, Boolean verbose) : Void
+   Constructors(Object o, BindingFlags flags, Boolean verbose) : Void
+   Constructors(Type t, BindingFlags flags, Boolean verbose) : Void
+   Constructors(Object o, BindingFlags flags, String regex, Boolean verbose) : Void
+   Equals(Object obj) : Boolean
+   Events(Object o, Boolean verbose) : Void
+   Events(Object o, String regex, Boolean verbose) : Void
+   Events(Type t, Boolean verbose) : Void
+   Events(Type t, String regex, Boolean verbose) : Void
..... (etc)

MemberPrint supports a few filtering options. You can filter using BindingFlags, just as though you were doing the reflection yourself:

> print.Methods(new System.Text.RegularExpressions.Regex(""), BindingFlags.Static|BindingFlags.Public);
+ # CompileToAssembly(RegexCompilationInfo[] regexinfos, AssemblyName assemblyname) : Void
+ # CompileToAssembly(RegexCompilationInfo[] regexinfos, AssemblyName assemblyname, CustomAttributeBuilder[] attributes) : Void
+ # CompileToAssembly(RegexCompilationInfo[] regexinfos, AssemblyName assemblyname, CustomAttributeBuilder[] attributes, String resourceFile) : Void
+ # Escape(String str) : String
+ # get_CacheSize() : Int32
+ # IsMatch(String input, String pattern) : Boolean
..... (etc)

You can also filter using regular expression text.

> print.Methods(new System.Text.RegularExpressions.Regex(""), "^Is.+");
+ # IsMatch(String input, String pattern) : Boolean
+ # IsMatch(String input, String pattern, RegexOptions options) : Boolean
+ # IsMatch(String input, String pattern, RegexOptions options, TimeSpan matchTimeout) : Boolean
+   IsMatch(String input) : Boolean
+   IsMatch(String input, Int32 startat) : Boolean

You can run it over types as well as instances. Can't remember the constructors for System.DateTime?

> print.Constructors(typeof(DateTime))
+   .ctor(Int64 ticks)
+   .ctor(Int64 ticks, DateTimeKind kind)
+   .ctor(Int32 year, Int32 month, Int32 day)
+   .ctor(Int32 year, Int32 month, Int32 day, Calendar calendar)
+   .ctor(Int32 year, Int32 month, Int32 day, Int32 hour, Int32 minute, Int32 second)
+   .ctor(Int32 year, Int32 month, Int32 day, Int32 hour, Int32 minute, Int32 second, DateTimeKind kind)
+   .ctor(Int32 year, Int32 month, Int32 day, Int32 hour, Int32 minute, Int32 second, Calendar calendar)
+   .ctor(Int32 year, Int32 month, Int32 day, Int32 hour, Int32 minute, Int32 second, Int32 millisecond)
+   .ctor(Int32 year, Int32 month, Int32 day, Int32 hour, Int32 minute, Int32 second, Int32 millisecond, DateTimeKind kind)
+   .ctor(Int32 year, Int32 month, Int32 day, Int32 hour, Int32 minute, Int32 second, Int32 millisecond, Calendar calendar)
+   .ctor(Int32 year, Int32 month, Int32 day, Int32 hour, Int32 minute, Int32 second, Int32 millisecond, Calendar calendar, DateTimeKind kind)

There are multiple overloads for each of the following methods:

  • Methods(object o)
  • Properties(object o)
  • Events(object o)
  • Constructors(object o)
  • Members(object o) (this one just calls all of the others)

The code is a little rough around the edges at the moment, but so far I have found it useful enough that I decided to share the script pack.

Solutions to some problems hosting Windows Workflow Foundation applications in AppFabric

I'm trying to stand up some Windows Workflow Foundation applications in AppFabric for a spike on an application I'm building and so far it has been... unpleasant. Here are a couple of solutions to some frustrating problems I encountered. Hopefully I can help save someone else a bit of time.

Workflow persistence is not functional because the net.pipe binding is not enabled for this web site

Encountered while trying to configure my WF service on an IIS virtual directory. In particular I encountered this error on the Workflow Persistence tab and a similar one onthe Workflow Host Management tab in the Configure WCF and WF for Application dialog.

Enable the net.pipe protocolNow, in theory this will be automatically fixed when you click Apply. And in theory, even if it's not then it should be resolved by taking the following steps:

  • Right click the web site that hosts your service in the IIS Connections pane, click Manage Websites -> Advanced Settings and add ,net.pipe to the end of the "Enabled Protocols" setting (no spaces!)
  • With the same site selected, click Edit Bindings on the Actions pane and add a net.pipe binding with Binding Information = *

And in theory, practice and theory are the same. But in practice, unfortunately, they are not. If you're not running Windows Server (or your install is a bit wonky) then there's an extra step to take, because you'll find that net.pipe is not an available binding type in the bindings dialog.

To resolve: Open the "Turn Windows features on or off" dialog (find it by searching in the Start menu/screen, under Settings). Open up the Microsoft .NET 3.5 folder and tick "Windows Communication Foundation Non-HTTP Activation". This should make the net.pipe binding available to you. You may need to run iireset in a console as admin first.


Error: Unable to start debugging on the Web Server

or, if you try running without debugging

Error: Cannot obtain Metadata from xamlx The remote server returned an unexpected response: (405) Method Not Allowed.

or, if you drill further into that error

HTTP Error 404.3 - Not Found The page you are requesting cannot be served because of the extension configuration. If the page is a script, add a handler. If the file should be downloaded, add a MIME map

The honest truth is I'm not sure what caused this error. I tried a whole bunch of stuff. I added some handlers to my root Web.config, installed some Workflow hosting modules in IIS, played with permissions, all kinds of stuff and I couldn't get the site to handle the .xamlx request. Ultimately here's what resolved the issue for me. The first step may not be necessary, I honestly don't know (I recommend reading the linked article). But it worked.

  • Run the following command: "%WINDIR%\Microsoft.Net\Framework\v3.0\Windows Communication Foundation\ServiceModelReg.exe" -r
  • Uninstall and reinstall the following Windows features from the "Turn Windows features on or off" dialog: Windows Communication Foundation HTTP Activation, Windows Communication Non-HTTP Activation. They are both under the .NET Framework 3.5 feature.

applicationHost.config : Unrecognised attribute: 'serviceAutoStartMode'

After I got my workflow up and running I realised that every other service and website on my machine was broken. It seems that AppFabric stepped all over my applicationHost.config file and left an invalid attribute in there. Take a look at this thread for the solutions to this issue.

Building a scriptcs script pack

If you haven't seen scriptcs yet you should check it out. It's a scripting environment for C#, but more interestingly it incorporates a module system similar to what you'd see in node.js using NuGet as a packaging mechanism. I won't go into an intro to scriptcs, because others like Scott Hanselman and the man himself, Glenn Block (and contributors) have done a better job that I could have.

Instead I'll talk about a concept that (so far) hasn't had a lot of documentation love - script packs.

What is a scriptcs script pack?

The basic idea is that a script pack is an object (generally compiled in a DLL and available in your script's bin directory) that either exposes some functionality to the scripting environment or simplifies the usage of some other functionality. This seems like an overly broad definition so let's start with an example of script pack usage. From the scriptcs.WebApi script pack.

public class TestController : ApiController {
  public string Get() {
    return "Hello world!";
  }
}
var webApi = Require();
var server = webApi.CreateServer("http://localhost:8080");
server.OpenAsync().Wait();
Console.WriteLine("Listening...");
Console.ReadKey();
server.CloseAsync().Wait();

That's the complete script - no using statements and ridiculously simple API. The script pack takes care of "injecting" using statements into any script that uses the script pack (ie, calls that Require method above - we'll get into that) and returning a script-friendly object to the shell.

Let's get started building a script pack. I want to create a script pack that allows me to pull down arbitrary strings from an HTTP request - let's keep it really nice and simple.

Fire up Visual Studio (note: you don't actually need Visual Studio for this, but I'll assume if you're comfortable enough to use other tools then you'll know when to adjust the steps) and create a new Class Library project. Delete that damn Class1.cs file, open the Package Manager console and execute the command

Install-Package scriptcs.contracts

The contracts library has a couple of interfaces we'll need to implement to make our script pack work. Let's start by creating our "convenience object". We'll call it Http, and all it's going to do is make GET requests and return strings. So let's implement this class.

using System.Net;
using ScriptCs.Contracts;
namespace ScriptCs.Http
{
    public class Http : IScriptPackContext
    {
        public string Get(string uri)
        {
            var client = new WebClient();
            return client.DownloadString(uri);
        }
    }
}

That IScriptPackContext interface there is a marker interface that scriptcs seems to use to identify objects that are capable of being returned by script packs. It has no members, so there's nothing to implement.

The next step is to implement the script pack itself. This object will be picked up by the scriptcs engine and is capable of generating and returning IScriptPackContext objects. Because our script pack is so simple we don't need to do much in the implementation of this interface:

using ScriptCs.Contracts;
namespace ScriptCs.Http
{
    public class HttpScriptPack : IScriptPack
    {
        IScriptPackContext IScriptPack.GetContext()
        {
            return new Http();
        }
        void IScriptPack.Initialize(IScriptPackSession session)
        {}
        void IScriptPack.Terminate()
        {}
    }
}

This is fairly self-explanatory. The scriptcs engine will cache all available IScriptPack objects and when a particular script pack is requested (see below) then the appropriate script pack context will be returned.

The Initialize method here is interesting. We're not doing anything with it in this particular program, but it's worth going over what can be done. This method gives us an IScriptPackSession object which has two important methods:

  • AddReference - use this method to add library references to be available in your script. After your script pack is loaded into a script the specified references will be available for your use with no further code inside the script.
  • ImportNamespace - this method can import namespaces for use in your scripts. It's just a nice convenience so that your user's scripts can stay nice and clean. We don't need to do this in our current script pack, but if you refer back to the WebApi script above you'll notice we're using the ApiController class without any using statements. This is because the WebApi script pack imports this namespace for you.

Obviously the Terminate method should be used for cleaning up any resources after yourself and leaving the machine in a good state.

So that's pretty much all there is to building a script pack. Now we'll go ahead and use it. Build your library, grab the DLLs from your bin directory and move it into a test folder like c:\myscripts\bin. Create a new script that looks like

var http = Require();
Console.WriteLine(http.Get("http://google.com"));

Save it into your scripts folder (c:\myscripts\myscript.csx in my above example) and execute it using

scriptcs myscript.csx

And observe the output. All done! Now all that's left is to package your script up in a NuGet package and publish.

.NET Daily Quiz Archive - Week 9

.NET Daily Quiz #041

Can you predict the output of this program? Do you know the rules determining the order in which these constructors are called?

using System;
public class MyApplication
{
    public static void Main(string[] args)
    {
        Console.WriteLine(Foo.i);
        Console.WriteLine(Foo.i);
        new Foo();
        new Foo();
        new Foo();
        Console.WriteLine(Foo.i.ToString());
        Console.WriteLine(Foo.i.ToString());
    }
}
class Foo
{
    public static int i = 0;
    static Foo()
    {
        i++;
    }
    public Foo()
    {
        i++;
    }
}

Answer

The output is

1
1
3
2

Here's what's happening. When

Console.WriteLine(Foo.i);
is called, the constructed type Foo is realized. This constructed type contains a static variable i=0. Now according to the C# spec (section 10.12):

The execution of a static constructor is triggered by the first of the following events to occur within an application domain (emphasis mine):

  • An instance of the class type is created.
  • Any of the static members of the class type are referenced.

Next we call

Console.WriteLine(Foo.i);
and according to the C# spec (same section) a new static constructor execution is called because we have a new closed type:

A static constructor is a member that implements the actions required to initialize a closed class type.

We now have two closed types, each with their own static members. From here the execution is trivial to predict - each closed type acts on its own static members.


.NET Daily Quiz #042

See comments in code for question…

using System;
public class Dynamics
{
    public static void Main(string[] args)
    {
        // Why doesn't this work...
        IGeneric c = new Generic();
        // ... but this does?
        string s = new Generic().Thing();
    }
}
interface IGeneric<out T>
{
    T Thing();
}
class Generic : IGeneric
{
    public T Thing()
    {
        return default(T);
    }
}

Answer

The dynamic type in C# is essentially compiler smoke and mirrors. When you compile this:

// Why doesn't this work...
IGeneric c = new Generic();

All the CLR sees is:

IGeneric c = new Generic();

Covariance and contravariance on object is not allowed (because there is no implicit conversion from object to anything else), so the above code fails. There is an implicit conversion from string to object, so swapping the type arguments works:

// The following works - hooray?
IGeneric c = new Generic();

The next piece of code has a lot more going on:

// ... but this does?
string s = new Generic().Thing();

When compiled, the CLR sees something like this:

IGeneric c = MagicalDynamicRuntimeTypeCast(new Generic().Thing());

The upshot is that dynamic is not a feature of the CLR — it’s a fancy trick by the C# compiler to generate the appropriate runtime code (which itself will instantiate the compiler). The CLR has no knowledge that a dynamic object is being used.

This means that as far as the CLR knows, the runtime type of dynamic is just object (or as Eric Lippert says, ‘"dynamic" as a type is nothing more than "object" with a funny hat on’). So any covariance/contravariance operations with a dynamic constructed typed behave exactly as if you replaced dynamic with object.


.NET Daily Quiz #043

This one might be a bit of a brain-bender. Think carefully.

Is the call to method M bound statically or dynamically (ie, is the call site for this method call determined at compile time, or is it dynamically dispatched)? Can you describe the line of reasoning that leads to your conclusion?

using System;
using System.Collections.Generic;
public class C1
{
    public static void Main(string[] args)
    {
        IEnumerable strings = null;
        IEnumerable dynamics = null;
        var result = C1.M(strings, dynamics);
        result.NoCompilerErrorHere();
    }
    public static T M(IEnumerable a1, IEnumerable a2)
    {
        return default(T);
    }
}

Hint: This program compiles. This information is relevant.

Answer

The correct answer is a static dispatch, although the reasoning is quite subtle (if you want to answer without diving into reflector or ildasm.exe). Most of my reference comes from Chris Burrows’ excellent blog. Burrows is the guy who designed much of and implemented almost all of dynamic in C#.

First you need to understand the rules behind two features: overload resolution with dynamic and assignment conversions.

Overload resolution — any method call with a dynamic argument is dispatched dynamically. This is simple.

Assignment conversions — assignment conversions are a new type of conversion introduced into C# to support dynamic. Just as implicit conversions are a subset of explicit conversions, assignment conversions are a superset of implicit conversions and a subset of explicit conversions. So the conversion hierarchy says that all implicit conversions are assignment conversions and all assignment conversions are explicit conversions. See the image to the right.

conversions

Why this was done is out of the scope of this quiz — see this article.

The main point here is the set of rules for dynamic conversions (emphasis mine):

  1. There is an implicit conversion to dynamic from every type (modulo the weirdos, like pointer types). Basically, if it has an implicit conversion to object, it has an implicit conversion to dynamic too.
  2. There are no implicit conversions from dynamic to any type aside from itself and object.
  3. However (this is the clever part), there are implicit conversions from any dynamic expression (not type!) to any type.
  4. There are implicit conversions between types (in both directions) when they differ only by object and dynamic (as in the exception in rule 2).

Now to answer the quiz question. Our first argument is IEnumerable. Now string is a candidate for type T. The second argument is of type IEnumerable, so dynamic is a candidate for T. The compiler does not throw an exception so the call must be unambiguous, which proves that there must be a conversion from one type to the other in the set {string,dynamic}, but not the other way.

From rule 1 above we know there must be an implicit conversion from type string to type dynamic. From rule 3 we know that the type dynamic is NOT implicitly convertible to type string. Because we’re dealing with types and not expressions then the call must be static — if we used expressions (eg, if we used naked dynamic and string objects) then this whole analysis would be deferred until run time — the static type analysis would never be performed because of the overload resolution rule given above.


.NET Daily Quiz #044

Continuing on with our exploration of dynamic. What is the result of this program, and why?

using System;
using System.Collections;
using System.Dynamic;
public class DynamicConverter : DynamicObject
{
    public override bool TryConvert(ConvertBinder binder, out object result)
    {
        Console.WriteLine("Converting to type {0}", binder.Type.Name);
        result = null;
        return true;
    }
    public static void Main(string[] args)
    {
        dynamic d = new DynamicConverter();
        IEnumerable i1 = d;
        IEnumerable i2 = (IEnumerable)d;
    }
}

Answer

The application prints the first conversion but crashes attempting the second. But why?

This one is slightly tricky, even if you know dynamic well. It’s important to note that DynamicObject is just a convenience object that hides a lot of the complexity of creating dynamic types (the more difficult but more flexible way is to implement IDynamicMetaObjectProvider, which is not trivial).

As a result, DynamicObject has a weird and interesting way of defining call sites for what appear to be dynamic methods. If you call a method that exists statically on a DynamicObject then the static call site will be used — basically, if the underlying language (whether it be C# or something else) gets first dibs on resolving any method calls, and this includes conversions.

So what’s happening in my example code? When we call IEnumerable i1 = d; the C# compiler says, “do I have an implicit conversion from variable d of type DynamicConverter to IEnumerable?” The answer is no, so the compiler dispatches the call dynamically. No problem. Next, we call IEnumerable i2 = (IEnumerable)d; and the compiler asks itself if it can perform this conversion. The answer is “yes!” because in C# there is always an explicit conversion from any class type to any interface type. Because C# can handle this conversion, it is bound statically and fails at run time.

This is an important point to keep in mind if you’re using DynamicObject. If you try to do something like define and call a method named ToString then you should be aware that C# will bind this method statically at compile time — you will not get the dynamic behaviour you might expect!


.NET Daily Quiz #045

Why is my application crashing? I know for a fact that dynamicCollection implements get_Count!

using System;
using System.Collections;
public class DynamicCount
{
    public static void Main(string[] args)
    {
        PrintCount(new int[10]);
    }
    public static void PrintCount(ICollection collection)
    {
        dynamic dynamicCollection = collection;
        Console.WriteLine(collection.Count);
        Console.WriteLine(dynamicCollection.Count);
    }
}

Result:

10
Unhandled Exception: Microsoft.CSharp.RuntimeBinder.RuntimeBinderException: 'System.Array' does not contain a definition for 'Count'
   at CallSite.Target(Closure , CallSite , Object )
   at System.Dynamic.UpdateDelegates.UpdateAndExecute1[T0,TRet](CallSite site, T0 arg0)
   at DynamicCount.PrintCount(ICollection collection)
   at DynamicCount.Main(String[] args)

Answer

Count is implemented explicitly on instances of Array.

When a method is bound at runtime due to dynamic arguments, the runtime type of the dynamic argument is used. In this case the runtime type is System.Array, while the compile-time type is ICollection. The statically bound call to ICollection.Count succeeds because System.Array explicitly implements the ICollection interface. The dynamically bound call fails because… well for the same reason — ICollection is explicitly implemented, so Count is not available through the System.Array type.