.NET Daily Quiz archive — week 6

.NET Daily Quiz #026

Welcome to Javascript week! This week it's all JS, which is good for me because coming up with Javascript gotchas is like shooting fish in a barrel.

Let's start with a softball. What does this function return?

function helloThere(name) {
    return
    {
        theName: name
    };
}

Answer

The function returns 'undefined'.

In Javascript the expression being returned must start on the same line as the return keyword (return is a "restricted production" in the grammar). In my example the expression starts on the next line, so automatic semicolon insertion places a semicolon immediately after return. The resulting code looks like

return;
{
    theName:name
};

and the returned value is undefined. The solution is to write the statement like so:

return {
    theName:name
};

You should know the rules for semicolon insertion by heart - not because you are relying on the interpreter to insert your semicolons, but so that you can explicitly use semicolons in every case. Never rely on semicolon insertion. If you follow this rule then any time you do spot a case of semicolon insertion in your JS you can assume it's a bug and fearlessly change it.
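To see the gotcha end to end, here's a small runnable sketch (function names invented for illustration): the broken version returns undefined, the fixed version returns the object.

```javascript
// brokenHello puts the object literal on the line after `return`, so
// automatic semicolon insertion turns it into `return;` followed by an
// unreachable block. fixedHello opens the brace on the same line.
function brokenHello(name) {
    return
    {
        theName: name
    };
}

function fixedHello(name) {
    return {
        theName: name
    };
}

console.log(brokenHello("Anakin"));        // undefined
console.log(fixedHello("Anakin").theName); // Anakin
```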

Semicolon insertion rules:
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf (PDF warning, page 26)

7.9.1 Rules of Automatic Semicolon Insertion
There are three basic rules of semicolon insertion:

1. When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:
- The offending token is separated from the previous token by at least one LineTerminator.
- The offending token is }.

2. When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.

3. When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation "[no LineTerminator here]" within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.

However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted
automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become
one of the two semicolons in the header of a for statement (see 12.6.3).


.NET Daily Quiz #027

There’s a major problem with this function. Do you know what it is?

function getDocument(txt, abstr) {
    var c = "lower";
    for (var i = 0; i < txt.length; i++)
        if (txt.charAt(i) == txt.charAt(i).toUpperCase()) c = "mixed";
    return {
        text: txt,
        abstract: abstr,
        native: true,
        case: c,
        export: function () {
            // export to .zip'd JSON
            return zip;
        }
    };
}
var doc = getDocument("This is my document", "brief abstract");

Warning: this will work in some implementations, but it’s broken according to the spec.

Answer:

Believe it or not, the following names from the example are reserved words in Javascript: abstract, native, case, export

In fact, Javascript has reserved a whole slew of words that are not used in the language and are illegal if used as identifiers. Following is the complete list of reserved words in JS.

abstract    else        instanceof    super
boolean     enum        int           switch
break       export      interface     synchronized
byte        extends     let           this
case        false       long          throw
catch       final       native        throws
char        finally     new           transient
class       float       null          true
const       for         package       try
continue    function    private       typeof
debugger    goto        protected     var
default     if          public        void
delete      implements  return        volatile
do          import      short         while
double      in          static        with

http://www.javascripter.net/faq/reserved.htm
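Note that modern engines follow ES5, which dropped several of the ES3 future reserved words (abstract, native, goto and friends are legal identifiers in current Chrome) and which allows reserved words as property names, so the original snippet actually runs today; the hard keywords still throw. A quick probe, with a made-up helper name:

```javascript
// Compile a declaration with the Function constructor so a SyntaxError
// can be caught at runtime instead of killing the whole script.
function isLegalIdentifier(name) {
    try {
        new Function("var " + name + " = 1;");
        return true;
    } catch (e) {
        return false;
    }
}

console.log(isLegalIdentifier("myVar")); // true
console.log(isLegalIdentifier("case"));  // false: `case` is a keyword
```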


.NET Daily Quiz #028

This Javascript program works perfectly:

(function () {
    function printArgs(arr) {
        for (var i = 0; i < arr.length; i++) {
            console.log(arr[i]);
        }
    }
    function getLastTwoValues(arr) {
        return arr.slice(-2);
    }
    var arr = [1, 2, 3, 4, 5];
    printArgs(arr);
    console.log(getLastTwoValues(arr));
})();

Does this one behave the same? If yes, how? If not, why not?

(function () {
    function printArgs() {
        for (var i = 0; i < arguments.length; i++) {
            console.log(arguments[i]);
        }
    }
    function getLastTwoValues() {
        return arguments.slice(-2);
    }
    printArgs(1, 2, 3, 4, 5);
    console.log(getLastTwoValues(1, 2, 3, 4, 5));
})();

Note: I’m aware that the console object is not cross-browser compatible. Unless otherwise specified you can always assume my JS quizzes target the latest mainstream version of Chrome. V8 Engine is best engine.

Answer:

The second program will fail inside getLastTwoValues(): arguments has no slice method, so calling it throws a TypeError. Here's why:

Every function call supplies two extra values in addition to its declared parameters: this and arguments. this depends on how the function is called (we'll get into this in a future quiz) and arguments is an array-like object holding every argument passed into the function, even if no parameters are declared in the function signature.

The problem is that arguments isn't actually an Array. It looks and acts like one in some cases: it has indexed elements and a length property, but its prototype is Object.prototype, not Array.prototype. Calling slice() (or any other Array method) directly on arguments will fail.

If you really need to apply Array methods on your arguments parameter you can convert arguments to an array. The most common way of doing this is using the following idiom:

var actualArray = Array.prototype.slice.apply(arguments);

This is kind of hacky, but it seems to be common enough (Crockford uses it fairly extensively in his famous book, Javascript: The Good Parts).

Array.prototype.slice is a function that returns a subarray. The apply method is on the Function prototype and applies a given function, using the first argument as the this variable and subsequent arguments as function arguments. slice always returns an array, even when no arguments are supplied, so when we apply slice to arguments the returned value is an array with every element from arguments.

The complete code for our function becomes:

function getLastTwoValues() {
    var slice = Array.prototype.slice;
    return slice.apply(arguments).slice(-2);
}

In my opinion using Array.prototype.slice.apply is hackish, although in fairness the ECMAScript spec explicitly notes that slice is intentionally generic and may be applied to array-like objects. Either way, so much code now relies on this behaviour that it's unlikely to change in the near future. I'll use it, but I'll shower afterwards.
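Here's the idiom in action as a quick sketch (using call rather than apply, which behaves identically when no extra arguments are passed):

```javascript
function lastTwo() {
    console.log(Array.isArray(arguments)); // false: array-like, not an Array
    var args = Array.prototype.slice.call(arguments);
    console.log(Array.isArray(args));      // true: a real Array now
    return args.slice(-2);
}

console.log(lastTwo(1, 2, 3, 4, 5)); // [ 4, 5 ]
```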


.NET Daily Quiz #029

My program is not behaving as expected. What is the console output and why? How can I fix it?

var myValue = 0;
(function () {
    for (; myValue < 10; myValue++) {
        console.log(myValue);
    }
    var myValue = 0;
    console.log(myValue);
}());

Answer:

The output looks like

0

The issue here is a concept called variable hoisting. Because we prefix myValue inside the function with var we create a new inner variable that hides the outer myValue (which is then never actually used). This inner declaration is hoisted to the top of the function, where the variable starts out undefined, as per the Javascript specification. The resulting code looks something like:

var myValue = 0;
(function () {
    var myValue;
    for (; myValue < 10; myValue++) {
        console.log(myValue);
    }
    myValue = 0;
    console.log(myValue);
}());

Here the loop terminates immediately because "myValue < 10" evaluates to "undefined < 10", which is false (undefined converts to NaN, and every comparison involving NaN is false).

As an addendum, here’s the correct code if you really want myValue to remain in the global namespace (or the current function scope if my entire program happened to be wrapped in a function of its own):

var myValue = 0;
(function () {
    for (; myValue < 10; myValue++) {
        console.log(myValue);
    }
    myValue = 0;
    console.log(myValue);
}());

The only change I have made here is removing the var from the inner reference to myValue. No new shadowing variable is declared, so there is nothing to hoist to the top of the function and both references see the outer myValue. The program now behaves as expected.
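The hoisting behaviour is easy to see in isolation with a small sketch (names invented): the read before the inner var sees undefined, not the outer value.

```javascript
var inner = "outer value";
var seen = (function () {
    var before = inner;        // undefined: the var below is hoisted above this line
    var inner = "inner value";
    return [before, inner];
}());
console.log(seen); // [ undefined, 'inner value' ]
```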


.NET Daily Quiz #030

Can you explain the behaviour of this program?

(function () {
    var MyObject = function (id) {
        this.id = id;
        this.printId = function () {
            console.log("My id is " + id);
        }
    }
    var obj1 = new MyObject(123);
    var obj2 = {
        id: 123,
        printId: function () { console.log("My id is " + this.id); }
    };
    obj1.printId();              // prints "My id is 123"
    obj2.printId();              // prints "My id is 123"
    setTimeout(obj1.printId, 1); // prints "My id is 123"
    setTimeout(obj2.printId, 1); // prints "My id is undefined"
}());

Answer:

As usual with Javascript it comes down to the behaviour of the this variable. What's happening here is that setTimeout invokes the callback as a plain function, with no receiver; you can picture it calling Function.apply with the global object as the this argument (see quiz #028). Because the id field is not set on the this supplied (the global object) it is undefined.

The reason obj1 is behaving “correctly” is because there’s a bug in the constructor function! We assign the id argument to this.id, but that field is not what printId() uses: printId() reads the id argument captured in its closure. Changing the id field on obj1 has no effect because the closure is over the argument, not the field. We can fix the MyObject constructor by writing it as

var MyObject = function (id) {
    this.id = id;
    this.printId = function () {
        console.log("My id is " + this.id);
    }
}

And we make our code work correctly by calling the function in setTimeout like this:

setTimeout(function () { obj1.printId(); }, 1); // prints "My id is 123"

Javascript is a subtle language. There were two bugs (intentional) in the original code, but those bugs appeared to make the broken object work correctly and the correct object look broken.
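Another fix worth knowing is Function.prototype.bind (ES5), which returns a new function with this permanently fixed, so setTimeout(obj2.printId.bind(obj2), 1) would also print the id. A minimal sketch with made-up names:

```javascript
var obj = {
    id: 123,
    getId: function () { return this.id; }
};

var unbound = obj.getId;         // the receiver is lost when called bare:
                                 // this.id is undefined in non-strict mode
                                 // (or a TypeError in strict mode)
var bound = obj.getId.bind(obj); // this is locked to obj forever

console.log(bound()); // 123
```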

.NET Daily Quiz archive - week 5

.NET Daily Quiz #021

I wrote an application with two assemblies:

// Assembly 1
using System;
namespace Assembly1
{
    class Program
    {
        static void Main(string[] args)
        {
            Assembly2.Class1.PrintNumber();
            Console.ReadLine();
        }
    }
}
// Assembly 2
using System;
namespace Assembly2
{
    public class Class1
    {
        public static void PrintNumber(int optional = 10)
        {
            Console.WriteLine(optional);
        }
    }
}

When I compile this application I get the expected output — 10. Later I recompile and deploy Assembly 2 with the following change:

public static void PrintNumber(int optional = 20)
{
      // etc

The output I receive when I run the application is not what I expected. Why not?

Answer:

(This answer is a direct copy & paste from my colleague Ben Fox, whose answer had more detail than I prepared)

Optional parameters are a compiler service — the C# compiler puts metadata attributes on the optional parameters to indicate how they should be treated. E.g.

// Library assembly
static void Test(int i, int j = 0, string s = "this")
{
    Console.WriteLine(s, i, j);
}

Compiles to something like:

static void Test(int i, [Optional, DefaultParameterValue(0)] int j, [Optional, DefaultParameterValue("this")] string s)
{
    Console.WriteLine(s, i, j);
}

The compiler which compiles a call to this method automatically substitutes in the appropriate values in the calling assembly. E.g.

// Calling assembly
static void Main()
{
    Assembly2.Thing.Test(10);
    Assembly2.Thing.Test(11, j: 10);
    Assembly2.Thing.Test(12, s: "that");
    Assembly2.Thing.Test(13, s: "the other", j: 23);
}

Compiles to something like:

private static void Main()
{
    Assembly2.Thing.Test(10, 0, "this");
    Assembly2.Thing.Test(11, 10, "this");
    Assembly2.Thing.Test(12, 0, "that");
    Assembly2.Thing.Test(13, 23, "the other");
}

Thus the behaviour described in the quiz: because the old default values are also compiled into the calling assembly you must recompile both assemblies in order to reflect the new default values.

My meagre addition
The upshot is that when client code omits an optional parameter, the default value is baked into the call site as a constant in the client assembly. Optional parameters are syntactic sugar, and importantly that sugar is applied to the client code, not the receiving end! Avoid changing optional parameter defaults; to existing callers it is silently a breaking change until they recompile. Consider when you should prefer optional arguments to overloads and vice versa (more on this in an upcoming quiz).


.NET Daily Quiz #022

I write the following program:

using System;
using System.Collections.Generic;
class Program
{
    static void Main(string[] args)
    {
        var strings = new List<string>();
        DoStuff(strings);
        var objects = new List<object>();
        DoStuff(objects);
        DoStuff(new[] {"hello", "there"});
        Console.ReadLine();
    }
    static void DoStuff(object o)
    {
        Console.WriteLine("I found an object!!");
    }
    static void DoStuff(IEnumerable<object> objects)
    {
        Console.WriteLine("I found a collection of objects!!");
    }
}

I’m using the C# 4 compiler. What is the result when I compile targeting .NET 3.5? What about .NET 4? If they’re different, why?

Answer:

Output (.NET 3.5):
I found an object!!
I found a collection of objects!!
I found a collection of objects!!

Output (.NET 4):
I found a collection of objects!!
I found a collection of objects!!
I found a collection of objects!!

The reason is that in .NET 3.5 generic parameters were not covariant. This is just a fancy way of saying that IEnumerable<T> (and all other generic types) had no implicit conversions to IEnumerable<G> where G is a super-type of T. Keeping in mind that overload resolution is determined at compile time, this means that while List<object> has an implicit conversion to IEnumerable<object> (because a List<object> is an IEnumerable<object>) there is no covariant conversion from List<string> to IEnumerable<object> (or even IEnumerable<string> to IEnumerable<object>).

Importantly you can see that there is a conversion from string[] to IEnumerable<object> — arrays have been covariant since .NET 1.0. The compiler sees a conversion from string[] to IEnumerable<object> because there is a covariant conversion from string[] to object[] and an object[] is an IEnumerable<object>.

Since .NET 4 all generic type parameters in interfaces (and delegates) have respected covariance (and contravariance, to be introduced in a future quiz). The compiler recognizes the covariant conversion from IEnumerable<string> to IEnumerable<object>.

There are a ton of good resources for covariance and contravariance on the web. For a .NET perspective you could start at this article.


.NET Daily Quiz #023

Most people know that finalization is significantly more costly than disposing of an object. Do you know why? Under what conditions should you prefer a finalizer to IDisposable?

Answer:

Before garbage collecting an object the garbage collector will look for a finalizer. If the object contains a finalizer then instead of collecting the object the GC will place the object on the finalizer queue.

At some time in the future (exactly when is non-deterministic) the finalizer thread will work through the queue, calling Finalize on each item. Each object is popped from the finalizer queue and marked as finalized. The GC is then free to collect the object on its next sweep.

Finalization keeps dead objects around for much longer than necessary in most cases. This is why it is common practice to call GC.SuppressFinalize() in your Dispose method: it ensures the GC will never place your object on the finalization queue.

Finalization is complex and if done incorrectly can lead to deadlocks and other really weird stuff (like zombie objects that bring themselves back to life during finalization). Writing finalizers should be avoided in most cases.

For some really candid discussion on finalizers from Eric Lippert, check out these two StackOverflow answers.
http://stackoverflow.com/questions/5223325/garbage-collection-and-finalizers-finer-points
http://stackoverflow.com/questions/6652044/c-sharp-language-garbage-collection-suppressfinalize


.NET Daily Quiz #024

I have the following application:

using System.IO;
using System.Threading;
using System;
class Program
{
    static object syncLock = new object();
    static void Main(string[] args)
    {
        var t1 = new Thread(Thread1);
        t1.Start();
        // ... do some long-running stuff
        // ... at some point in the future:
        t1.Abort();
        t1 = new Thread(Thread1);
        t1.Start();
    }
    static void Thread1()
    {
        while (true)
        {
            lock (syncLock)
            {
                // do some stuff
            }
        }
    }
}

This program runs fine in production, but sometimes it deadlocks when running in debug mode (this has never happened in production). Why?

Hint: this one is subtle — it exploits a bug in the C# language itself.

Answer:

The lock(x)... statement translates precisely to the following (C# spec 8.12):

System.Threading.Monitor.Enter(x);
try {
      ...
}
finally {
      System.Threading.Monitor.Exit(x);
}

except that x is evaluated only once.

A consequence of this generated code is that in debug mode the compiler may insert a "no-op" instruction between Enter(x) and the try block as a natural place to put a breakpoint. If a thread context switch occurs during this no-op and the thread is aborted, the try block is never entered, so the finally block will not run, Exit(x) never executes and the lock is held indefinitely, leading to a deadlock.

This no-op instruction is considered a bug. According to a co-worker this has been resolved in C# 4 - I'm not 100% sure if this is the case (haven't done my research).


.NET Daily Quiz #025

Is this syntax legal?

int myInt = MyMethod(out myInt);

What about this?

var myVar = MyMethod(out myVar);

Why?

Answer:

The first example is perfectly legal (albeit strange and probably not recommended). The compiler knows the type of myInt, and a variable passed as an out argument does not need to be definitely assigned before the call: the called method is required to assign it before returning, after which the return value overwrites it again.

The second example is not legal. With var the compiler must infer the type of myVar from its initializer, but the initializer itself uses myVar as the out argument, so the compiler would need to know myVar's type before it has finished inferring it. Imagine multiple overloads of MyMethod and you can see that, in the general case, there is no way to resolve the call.

.NET Daily Quiz archive - week 4

.NET Daily Quiz #016

Today we'll think about program design. Can you define the following terms?
1. The no-throw guarantee
2. The strong exception guarantee
3. The weak exception guarantee

Which guarantee does this snippet offer, assuming RetrieveTaxRate could throw an exception?

var employees = GetAllEmployees();
var taxReport = employees.Select(e => e.RetrieveTaxRate());

What about this snippet?

var employees = GetAllEmployees();
var taxReport = employees.Select(e =>
      {
            var rate = e.RetrieveTaxRate();
            e.TotalTax += rate;
            return rate;
      });

Consider some code you have written recently. Which guarantees do you offer? Is there something you can do with your application to make stronger guarantees?

Answer:

These terms refer to the guarantees that your individual methods make with regards to their approach to exception-handling:
1. No-throw guarantees that a method will not throw an exception under any non-fatal circumstances. In my opinion (take it for what it’s worth) even a no-throw method can still be brought down by truly fatal exceptions such as OutOfMemoryException.
2. The strong exception guarantee states that the method may throw an exception, but if it does then the containing object and any data it is working on will remain in the state it was in before the method was called. Any changes to internal data structures will be rolled back before the exception is (re-)thrown.
3. The weak exception guarantee states that an exception may be thrown, but that the object is guaranteed to be left in a usable, legal (though unknown) state. The object and its internal data structures may have changed, but they will not be invalid. For example, suppose a method operates on a bunch of Money objects and at some intermediate stage they hold the internal value _dollars = -1 (an illegal value). One way of satisfying the weak exception guarantee would be to set these values to _dollars = 0 on failure: legal, but not necessarily correct.

Conversely to these guarantees we could consider the zeroeth exception guarantee — that all internal data structures cannot be guaranteed to be in a well-defined, legal state after an exception occurs. It is not possible to do meaningful work on a program that has reached this state — if an object does not make one of the 3 exception guarantees then an exception should crash the application or re-initialize it to a known good state.

The first code snippet offers a strong exception guarantee. The select statement makes no changes to the employee data. If RetrieveTaxRate throws, taxReport is never populated and nothing is changed from before the program was run.

The second code snippet offers a weak guarantee. The employees data is modified inside the select delegate (you should never do this, by the way). After the exception is thrown, each employee will have a legal TotalTax value, but not necessarily a correct one (especially if the exception handler retries the method).
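The usual recipe for upgrading a weak guarantee to a strong one is to do all the failable work on copies and commit only at the end. The idea is language-agnostic; here's a sketch in Javascript (function and data names are invented for illustration):

```javascript
function applyRatesStrong(employees, retrieveRate) {
    // Phase 1: compute everything on copies; retrieveRate may throw here.
    var updated = employees.map(function (e) {
        return { name: e.name, totalTax: e.totalTax + retrieveRate(e) };
    });
    // Phase 2: commit. We only reach this line if every lookup succeeded,
    // and nothing above mutated the caller's objects.
    return updated;
}

var staff = [{ name: "Ann", totalTax: 10 }, { name: "Bob", totalTax: 20 }];
var failed = false;
try {
    applyRatesStrong(staff, function () { throw new Error("tax service down"); });
} catch (e) {
    failed = true;
}
// The exception escaped, but staff is exactly as it was: strong guarantee.
console.log(failed, staff[0].totalTax, staff[1].totalTax); // true 10 20
```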


.NET Daily Quiz #017

In C++ and some other languages, a common way to increase the performance of a program is to strategically inline particular functions. Is this capability available in C#? If so, how can we go about it? If not, why not?

Answer:

Since .NET 4.5 the System.Runtime.CompilerServices.MethodImplAttribute attribute has offered the AggressiveInlining flag, which will aggressively attempt to inline the function (although it’s still not guaranteed).

However, like the C++ inline modifier, this should be used sparingly, if ever. For compiler-automated inlining, a method must be non-virtual, small in size (where “small” is defined by the compiler/JIT) and free of complex control structures such as try/catch. I’m not sure there are any more concrete rules than this; I believe the JIT/compiler implementer is free to make the determination.

Bottom line, the compiler is (usually) smarter than you — but this doesn’t mean you shouldn’t understand what it’s doing.


.NET Daily Quiz #018

How well do you know your standard .NET data structures? Let's explore System.Collections.Generic, starting today with SortedDictionary<TKey,TValue>

The following questions assume a reasonable IComparer is supplied.

Do you know the underlying implementation of SortedDictionary<K,V>?
What is the computational complexity (big-O) for retrieval from an instance of this class?
What is the worst-case complexity for retrieval?

What is the computational complexity (big-O) for insertion into an instance of this class?
What is the worst-case complexity for insertion?

What is the computational complexity (big-O) for removal from an instance of this class?
What is the worst-case complexity for removal?

Do you know roughly how much memory a SortedDictionary<K,V> with n items occupies?

Answer:

The documentation for SortedDictionary<TKey,TValue> tells us it is a binary search tree. If we open SortedDictionary in a decompiler we can see the underlying data structure is a TreeSet<KeyValuePair<TKey,TValue>>, which derives from SortedSet<T>, a red-black tree.

Red-Black trees offer O(log n) complexity guaranteed across all retrieval, insertion and deletion operations. The worst case for any operation is a re-balance involving 2 rotations and 3 re-colorings of the tree, an O(1) amortized process.

Memory usage of SortedDictionary<K,V> is approximately 40 bytes of overhead per entry (one object instance, a bool for the node's red/black colour and two references for the left/right children) plus the size of the keys and values themselves.
Memory is not required to be contiguous, so heap fragmentation does not get out of control with large numbers of inserts/deletes.


.NET Daily Quiz #019

What is the output of this application, and why? Hint: this isn't really another tricky threading question, think in terms of the IEnumerator state machine generated by the compiler.

using System;
using System.Collections.Generic;
using System.Threading;
class Program
{
    private static IEnumerator _enumerator;
    private static bool _moved = false;
    static void Main(string[] args)
    {
        _enumerator = GetStrings();
        var t1 = new Thread(MoveEnumerator);
        var t2 = new Thread(MoveEnumerator);
        t1.Start();
        Thread.Sleep(100);
        t2.Start();
        while (!_moved)
        {
        }
        Thread.Sleep(500);
        Console.WriteLine(_enumerator.Current);
        Console.ReadLine();
    }
    static IEnumerator GetStrings()
    {
        Thread.Sleep(1000);
        yield return "hello";
        yield return "world!";
    }
    static void MoveEnumerator()
    {
        _moved = _enumerator.MoveNext();
   }
}

Answer:

The reason we see the first item from the iterator block and not the second, despite MoveNext being called twice, is explained by the rules of the state machine that the C# compiler generates for iterator blocks.

The following is a very brief, simplified version of the iterator state machine. For full details, check out the C# spec, 10.14.4. Pay particular attention to the rule for calling MoveNext while the state is running.

- If enumerator state is before and MoveNext is called:
  - Change enumerator state to running
  - Execute the iterator body until interruption (the block finishes, an exception is encountered, or the block yields control)
- If enumerator state is running, the result of calling MoveNext is unspecified
- If enumerator state is suspended and MoveNext is called:
  - Change state to running
  - Resume execution of the iterator from the yielded position until interruption
- If enumerator state is after and MoveNext is called, return false

In the iterator block:

- When yield return is encountered:
  - Evaluate the return expression
  - Suspend execution of the iterator
  - Change the state of the iterator to suspended
  - MoveNext returns true
- When yield break is encountered:
  - Execute finally blocks as necessary
  - Change the state of the enumerator to after
  - MoveNext returns false
- When the iterator block completes:
  - Change the state of the enumerator to after
  - MoveNext returns false
- When an exception is encountered:
  - Execute finally blocks as necessary
  - Change the state of the enumerator to after
  - MoveNext returns false

The key is calling MoveNext while the state is running: according to the spec the result is unspecified. In this case it appears the second call is simply ignored by the iterator and returns false; chronologically it returns first, since the first call is still sleeping inside GetStrings.


.NET Daily Quiz #020

Users of my library are reporting resource leaks. What can I do to fix it?

namespace MyLibrary
{
    public interface IDoThings
    {
        void Things();
    }
    public class WorkDoer<T> where T : IDoThings, new()
    {
        public void Thinger()
        {
            var t = new T();
            t.Things();
        }
    }
    public class WorkDoerWithMember<T> where T : IDoThings, new()
    {
        T _t;
        public void MemberThinger()
        {
            if (_t == null) _t = new T();
            _t.Things();
        }
    }
}

Answer:

The supplied type parameter for either of our classes could implement IDisposable and this is a possibility we haven't considered in our class design. To work around this we need to defensively dispose of resources in the appropriate place.

For the class WorkDoer, T is only used locally in a method, so we can simply do it this way:

public class WorkDoer<T> where T : IDoThings, new()
{
      public void Thinger()
      {
            var t = new T();
            using (t as IDisposable)
            {
                  t.Things();
            }
      }
}

Believe it or not this works correctly for both types that implement IDisposable and types that don't (which kind of surprised me).

Our second class uses T as a member so it's a bit more hairy. We need to ensure our class implements IDisposable itself and disposes of resources correctly, whether T implements IDisposable or not.

public sealed class WorkDoerWithMember<T> : IDisposable where T : IDoThings, new()
{
      T _t;
      public void MemberThinger()
      {
            if (_t == null) _t = new T();
            _t.Things();
      }
      public void Dispose()
      {
            var d = _t as IDisposable;
            if (d != null)
            {
                  d.Dispose();
            }
      }
}

Note that I have marked the class as sealed. This ensures that subclasses can't reuse my Dispose method - to get around this, simply implement the full Dispose pattern (I skipped that for brevity). Also note that Dispose could be called more than once on our T object. We can't just set _t to null because our class supports value types. All IDisposable implementers must be able to handle multiple calls to Dispose, as per MSDN guidelines:

"If an object's Dispose method is called more than once, the object must ignore all calls after the first one. The object must not throw an exception if its Dispose method is called multiple times."

Consider better ways to design these classes, particularly in light of yesterday's dependency inversion session.

.NET Daily Quiz Archive - week 3

Daily Quiz #011

This code works correctly when I debug, but fails in release mode. Why? How can I fix it?

using System;
using System.Threading;
internal class Program
{
    public static void Main(string[] args)
    {
        bool complete = false;
        var t = new Thread(() =>
        {
            int count = 0;
            while (true)
            {
                if (complete)
                {
                    break;
                }
                count++; // Do some work.
            }
        });
        t.Start();
        Thread.Sleep(200);
        complete = true;
        if (!t.Join(2000))
        {
            t.Abort();
            Console.WriteLine("Fail - background thread did not complete.");
        }
        else
        {
            Console.WriteLine("Success");
        }
        Console.ReadLine();
    }
}

Answer:

The C# compiler and JIT optimizers only ever consider the single-threaded scenario. In this case complete is loaded into a register and never re-read, because as far as the optimizer is concerned it is never modified in the context of the closure in which it is checked.

There are two possible solutions. You can hoist the variable into a static field (static because Main is static) and mark it as volatile:

static volatile bool complete = false;
public static void Main(string[] args)
{
  // Etc.
}

The volatile keyword tells the compiler to never optimize away read access to this field (local variables cannot be marked as volatile, hence the field). In other words, every time the field is accessed its value is read from main memory. Note that in this case even hoisting the variable to a non-volatile field fixes the issue. I can’t explain why this is, but you can’t rely on this behaviour. See C# spec 10.4.3.

The other solution is to use a lock or explicit Thread.MemoryBarrier() calls, but in this case that is overkill and I wouldn’t advise it. Don’t worry, more on memory barriers in tomorrow’s quiz.
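Putting that together, here's a minimal sketch of the corrected program. Note that the field must be static, since Main is static:

```csharp
using System;
using System.Threading;

internal class Program
{
    // volatile forces every read of this field to come from memory,
    // so the background thread always observes the update from Main.
    // It must be a static field because Main is static.
    static volatile bool complete = false;

    public static void Main(string[] args)
    {
        var t = new Thread(() =>
        {
            int count = 0;
            while (!complete)
            {
                count++; // Do some work.
            }
        });
        t.Start();
        Thread.Sleep(200);
        complete = true;
        Console.WriteLine(t.Join(2000)
            ? "Success"
            : "Fail - background thread did not complete.");
    }
}
```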


Daily Quiz #012

Given the following class:

class MyClass
{
    int _answer;
    bool _complete;
    void Thread1()
    {
        _answer = 123;
        _complete = true;
    }
    void Thread2()
    {
        if (_complete) Console.WriteLine(_answer);
    }
}

If we execute methods Thread1 and Thread2 on separate concurrent threads, is it ever possible for Thread2 to print 0 instead of 123? If so, how does this happen?

Answer:

This is another case of the C# compiler only considering the single-threaded case when optimising your code. In some cases the compiler will re-order the execution of a method if it detects that the order of execution is not important. The CLR, compiler or even your CPU may also introduce caching optimisations that result in variable assignments being invisible to some threads for some time.

These optimisations will never affect single-threaded code, but beware of shared memory between threads for this reason.

The solution to this issue is to apply memory barriers around critical pieces of code. For our example:

class MyClass
{
    int _answer;
    bool _complete;
    void Thread1()
    {
        _answer = 123;
        Thread.MemoryBarrier();
        _complete = true;
        Thread.MemoryBarrier();
    }
    void Thread2()
    {
        Thread.MemoryBarrier();
        if (_complete) Console.WriteLine(_answer);
        Thread.MemoryBarrier();
    }
}

A memory barrier is a point in the code across which reads and writes cannot be reordered, so the surrounding code executes in the order in which it is written. For (much, much) more detail, I recommend this excellent website. If you’re going to be writing a lot of asynchronous/parallel code then it’s a great resource.

One more important point: a lot of .NET constructs implicitly generate memory barriers. These include lock statements, Interlocked members and anything that relies on signalling (including Task constructs). There is a performance overhead in using memory barriers, but in almost all cases it’s negligible and worth paying.
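For comparison, here's a sketch of the same class written with a lock instead of explicit barriers. Entering and exiting a lock implies full memory barriers, so no Thread.MemoryBarrier() calls are needed:

```csharp
using System;

class MyClass
{
    readonly object _gate = new object();
    int _answer;
    bool _complete;

    public void Thread1()
    {
        // Acquiring and releasing the lock generates the necessary
        // barriers, so the two writes cannot be reordered past it.
        lock (_gate)
        {
            _answer = 123;
            _complete = true;
        }
    }

    public void Thread2()
    {
        bool complete;
        int answer;
        // Read both fields under the same lock so we see a consistent pair.
        lock (_gate)
        {
            complete = _complete;
            answer = _answer;
        }
        if (complete) Console.WriteLine(answer);
    }
}
```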


Daily Quiz #013

Today we're moving away from multi-threading and into: Language fundamentals! Consider the following code:

using System;
public class MyFirstType {
    // some stuff
}
public class MySecondType {
    private MyFirstType _inner;
    // implicit conversion to MyFirstType
    public static implicit operator MyFirstType(MySecondType t) {
        return t._inner;
    }
    // some stuff
}
public static class MyFactory {
    public static object GetObject() {
        return new MySecondType();
    }
}
public class Program {
    public static void Main(string[] args) {
        object o = MyFactory.GetObject();
        // first conversion
        try {
            MyFirstType t1 = (MyFirstType)o;
            // do stuff
        }
        catch {
            // deal with conversion error
        }
        // second conversion
        MyFirstType t2 = o as MyFirstType;
        if (t2 != null) {
            // do stuff
        }
        else {
            // deal with conversion error
        }
    }
}

Will either of these conversions succeed? If no, why not? If yes, which one(s) and how?

Answer:

Both of these conversions will fail.

First conversion:
Casts will apply implicit user-defined conversions, so you may expect this to succeed. However implicit conversions are applied only at compile time. If an object's compile-time type is not known to have an implicit conversion or inheritance relationship with the target type then the cast will fail. In this case the compile-time type of the object is System.Object, the compiler is not aware of an implicit conversion, so it fails. This does not cause a compiler error, because the compiler will allow any cast that it cannot prove will fail.

Second conversion:
The is and as operators do inspect the runtime type of an object, but they never apply user-defined conversions. These operators only care about inheritance relationships and boxing/unboxing conversions between types. Just because a MySecondType can be converted to a MyFirstType doesn't mean that it is a MyFirstType.
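By contrast, when the compile-time type is MySecondType the compiler can find the user-defined conversion, and both forms succeed. A minimal sketch (note that I've initialized _inner here, which the original snippet left null):

```csharp
using System;

public class MyFirstType { }

public class MySecondType
{
    // Initialized so the conversion has something to return.
    private MyFirstType _inner = new MyFirstType();

    public static implicit operator MyFirstType(MySecondType t)
    {
        return t._inner;
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        MySecondType s = new MySecondType();
        MyFirstType a = s;              // implicit conversion found at compile time
        MyFirstType b = (MyFirstType)s; // a cast also applies the user-defined conversion
        Console.WriteLine(a != null && b != null);
    }
}
```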


Daily Quiz #014

Some people believe that simple class data should be exposed in fields and converted to properties when required. Why is this a terrible, terrible idea? (There are multiple reasons, but you should be able to justify them well). Note: I’m not asking for the reasons that properties are superior to fields — this is obvious. Think of the case of trying to change a field to a property in an existing (potentially large, disparate, distributed) application.

Answer:

The main issue here is binary compatibility. Access to a field generates different IL to property access. This means that if I deploy two assemblies, one of which references a field in the other, and I change that field to a property, then the first assembly will break unless it is compiled and redeployed against the changed one. This isn’t a problem if you always deploy your assemblies together, but it breaks if you are deploying assemblies separately.

Another problem is that fields and properties are inherently different, and they follow different rules in the C# language. For example, fields may be used as ref parameters, while properties may not. Even worse, there are subtle cases where changing a field to a property changes the functional meaning of code without issuing any warnings or errors. Consider this code:

using System;
public struct MyValueType
{
    public int MyThing { get; set; }
    public void ChangeMyThing(int newThing)
    {
        MyThing = newThing;
    }
}
public class MyReferenceType
{
    public MyValueType Value;
}
class Program
{
    static void Main(string[] args)
    {
        MyReferenceType a = new MyReferenceType();
        a.Value.ChangeMyThing(123);
        Console.WriteLine(a.Value.MyThing);
        Console.ReadLine();
    }
}

This works fine — the output is ‘123’. But if we change the line

public MyValueType Value;

to

public MyValueType Value { get; set; }

then the output is ‘0’, because the property getter returns a copy of the original value type, and ChangeMyThing mutates that temporary copy.
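To make the ref-parameter difference concrete, here's a hypothetical Counter class. The Interlocked call compiles only while the count is a field; change it to a property and the call site breaks:

```csharp
using System;
using System.Threading;

class Counter
{
    public int FieldCount;                 // a field is a variable...
    public int PropertyCount { get; set; } // ...a property is a pair of methods
}

class Program
{
    static void Main(string[] args)
    {
        var c = new Counter();
        Interlocked.Increment(ref c.FieldCount); // fine: fields can be passed by ref

        // Interlocked.Increment(ref c.PropertyCount);
        // The line above is a compile-time error (CS0206): a property
        // cannot be passed as a ref or out argument.

        Console.WriteLine(c.FieldCount);
    }
}
```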

For more information on the topic of why we should (almost) always be using properties for public-facing interfaces, check out Jon Skeet’s article, Why Properties Matter. It’s an easy read and well worth it.


Daily Quiz #015

using System;
using System.Collections.Generic;
public struct MyValueType
{
    private string _name;
    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }
}
public class Program
{
    public static void Main(string[] args)
    {
        var list = new List<MyValueType>();
        var t = new MyValueType();
        t.Name = "Martin";
        list.Add(t);
        var t1 = list[0];
        t1.Name = "Doms";
        Console.WriteLine(list[0].Name);
    }
}

The above program outputs "Martin".

using System;
using System.Collections.Generic;
public interface INameable
{
    string Name { get; set; }
}
public struct MyValueType : INameable
{
    private string _name;
    public string Name
    {
        get { return _name; }
        set { _name = value; }
    }
}
public class Program
{
    public static void Main(string[] args)
    {
        var list = new List<INameable>();
        var t = new MyValueType();
        t.Name = "Martin";
        list.Add(t);
        var t1 = list[0];
        t1.Name = "Doms";
        Console.WriteLine(list[0].Name);
    }
}

This program outputs "Doms".

Why?

How about the more interesting case - try the same using non-generic collections (System.Collections.ArrayList for example). Cast the output as normal. The outcome is the same, but for a different reason. What's happening in this case?

Answer:

First the solution to the initial problem:
In the first program List<MyValueType> allocates an array of type MyValueType[] under the covers. This array is heap-allocated (like all arrays) and each item is a value type. When an item is added, a copy of it is placed in the array. When the item is accessed via list[0] no boxing occurs: the list simply copies the first item in the array and returns it. We modify the copy that it returns and then throw it away.

The second program allocates an INameable[], and value types added to the list are boxed. Accessing the first item as an INameable returns a reference to the boxed instance, and modifications made through that reference are preserved, hence the “Doms” output. Note that all items exist on the heap in both cases, because the list allocates its backing array on the heap.

In the problem where we use non-generic collections, things are slightly more interesting. The ArrayList class allocates an object[] on the heap, and all value types added to it are boxed. So why does accessing an item not result in a throwaway copy? The answer is in the box itself: when a value is boxed, the box (a reference type) implements all the interfaces that the value's type implements. Because we cast the result to the interface, no unboxing occurs: the ArrayList access hands back the box itself, a reference type. Cast to the value type instead and we get yet another unboxed copy. Again, all items are heap-allocated.

Head hurt yet? The moral of the story from today and yesterday’s quizzes is for heaven’s sake, do not make your value types mutable. Look at this quiz and yesterday’s quiz — all of our problems result from using mutable value types. Just don’t do it.
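One way to follow that advice is to make the value type immutable, as in this sketch. There is no setter to call on a hidden copy, so mutation has to be an explicit replacement of the whole value:

```csharp
using System;
using System.Collections.Generic;

// An immutable version of the value type: state is set once, in the
// constructor, and there is no setter to call on a hidden copy.
public struct MyValueType
{
    private readonly string _name;
    public MyValueType(string name) { _name = name; }
    public string Name { get { return _name; } }
}

public class Program
{
    public static void Main(string[] args)
    {
        var list = new List<MyValueType> { new MyValueType("Martin") };

        // list[0].Name = "Doms";  // no setter: this would not compile

        // Mutation is now an explicit, visible replacement of the element.
        list[0] = new MyValueType("Doms");
        Console.WriteLine(list[0].Name);
    }
}
```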

.NET Daily Quiz Archive — week 2

I was at TechEd during most of this week so some of the quizzes are a bit half-assed. Sorry.

Daily Quiz #006

using System;
using System.Collections.Generic;
public class Program
{
    static void Main(string[] args)
    {
        var stringFactories = new List<Func<string>>();
        var urls = new List<Uri>
        {
            new Uri("http://google.com"), new Uri("http://bing.com")
        };
        foreach (var i in urls)
        {
            stringFactories.Add(() => i.ToString());
        }
        foreach (var i in stringFactories)
        {
            Console.WriteLine(i());
        }
    }
}

What is the output of this program when compiled in C# 4.0? What about C# 5.0? Why are they different?

Answer:

In C# 4.0:

http://bing.com
http://bing.com

In C# 5.0:

http://google.com
http://bing.com

Why?
The weird output in C# 4.0 (and all earlier versions going back to 1.0) is caused by the infamous "access to modified closure". The generated compiler output for the first foreach loop looks like

{
	IEnumerator e = ((IEnumerable)urls).GetEnumerator();
	try
	{
		Uri m; // here's your problem
		while(e.MoveNext())
		{
			m = (Uri)e.Current;
			stringFactories.Add(() => m.ToString());
		}
	}
	finally
	{
		if (e != null) ((IDisposable)e).Dispose();
	}
}

Note that the Uri m is declared outside the loop and is reused in each iteration. The problem is that the functions in stringFactories are executed on demand and each contains a closure. The closure captures a reference to the variable m itself (its value is not copied). When m is changed by the enumerator, every captured reference still points to that same variable, so both Funcs end up reading the same modified closed-over variable.

This problem has been fixed in C# 5.0. This is the only breaking change in C# 5.0. If your code relies on this behaviour then either you're an evil genius or you have done something wrong - in either case, you should fix it.

For completeness, here is the generated code from C# 5.0:

{
	IEnumerator e = ((IEnumerable)urls).GetEnumerator();
	try
	{
		while(e.MoveNext())
		{
			Uri m; // much better
			m = (Uri)e.Current;
			stringFactories.Add(() => m.ToString());
		}
	}
	finally
	{
		if (e != null) ((IDisposable)e).Dispose();
	}
}

Now each closure references a fresh variable. For more information check out Eric Lippert's excellent blog post, Closing over the loop variable considered harmful.
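If you're stuck on C# 4.0 or earlier, the classic workaround is to copy the loop variable into a local inside the loop body, so that each closure captures its own variable:

```csharp
using System;
using System.Collections.Generic;

public class Program
{
    static void Main(string[] args)
    {
        var stringFactories = new List<Func<string>>();
        var urls = new List<Uri>
        {
            new Uri("http://google.com"), new Uri("http://bing.com")
        };
        foreach (var i in urls)
        {
            var copy = i; // fresh variable per iteration, so each closure
                          // captures its own copy rather than sharing one
            stringFactories.Add(() => copy.ToString());
        }
        foreach (var f in stringFactories)
        {
            Console.WriteLine(f()); // Uri.ToString() adds a trailing slash
        }
    }
}
```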


Daily Quiz #007

Which is faster? Which uses less memory? Which should you use?

var s = string.Empty;
var s = "";

Answer:

They're the same, as near as makes no difference. Using "" results in at most a single heap allocation when the literal is first loaded; thanks to string interning, every later use refers to that same instance. It's inconsequential.

Which uses less memory? Again due to string interning it makes no difference. The common misconception is that string.Empty will use less memory because it's a static field while "" will result in an allocation for each use, but interning results in a single allocation for "".

Which should you use? This is the most important bit. Use whichever you like, but establish a convention in your team and stick to it. I agree with some people that "" could be mistaken for " " but honestly with a fixed-width font I don't see it happening.

If you haven't read about string interning and you are working on an application that does a lot of string allocation/manipulation then I strongly recommend you Google it on Bing.
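A quick sketch of interning in action: string literals (including compile-time constant concatenations) share a single interned instance, while strings built at runtime do not unless you intern them explicitly:

```csharp
using System;

public class Program
{
    public static void Main(string[] args)
    {
        string a = "abc";
        string b = "ab" + "c"; // constant-folded at compile time to the literal "abc"
        Console.WriteLine(ReferenceEquals(a, b)); // same interned instance

        string prefix = "ab";
        string c = prefix + "c"; // built at runtime, not interned automatically
        Console.WriteLine(ReferenceEquals(a, c)); // a distinct heap object

        Console.WriteLine(ReferenceEquals(a, string.Intern(c))); // back to the interned one
    }
}
```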


Daily Quiz #008

using System;
using System.Collections.Generic;

public struct Foo {
  public int i;
  public string j;
  public double k;
}
public class Program {
  public static void Main(string[] args) {
    IEnumerable<Foo> foos = ReadFooFile();
    Foo referenceFoo = ReadReferenceFoo();
    foreach (var f in foos) {
      if (f.Equals(referenceFoo)) {
        Console.WriteLine("Bar");
      }
    }
  }
  // other methods
}

Why is my application so slow? What is one very simple thing I can do to speed it up?

Answer:

The default implementation of object.Equals for value types uses reflection to compare each field individually. This is a major performance problem if you are comparing a large number of value types. (The default implementation for reference types is a simple reference-equality check.) The very simple fix is to override Equals on the struct and compare the fields directly.

For more details check out my old blog post (shameless plug).
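Here's a sketch of overriding Equals to avoid the reflection-based default. GetHashCode is overridden too, since the two must agree, and implementing IEquatable<Foo> lets callers compare without boxing:

```csharp
using System;

public struct Foo : IEquatable<Foo>
{
    public int i;
    public string j;
    public double k;

    // Strongly-typed comparison: no reflection, no boxing.
    public bool Equals(Foo other)
    {
        return i == other.i && j == other.j && k == other.k;
    }

    public override bool Equals(object obj)
    {
        return obj is Foo && Equals((Foo)obj);
    }

    public override int GetHashCode()
    {
        // A simple combining scheme; any reasonable scheme works,
        // as long as equal values produce equal hashes.
        int hash = i;
        hash = hash * 31 + (j == null ? 0 : j.GetHashCode());
        hash = hash * 31 + k.GetHashCode();
        return hash;
    }
}
```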


Daily Quiz #009

Today's daily quiz brought to you by a real-life situation faced by a colleague. This one is tricky but important.

public partial class MainWindow : Window
{
    TaskScheduler _ui;
    public MainWindow()
    {
        InitializeComponent();
        _ui = TaskScheduler.FromCurrentSynchronizationContext();
        Loaded += new RoutedEventHandler(MainWindow_Loaded);
    }
    void MainWindow_Loaded(object sender, RoutedEventArgs e)
    {
        Task.Factory.StartNew(() =>
        {
            DoLongOperation();
        })
        .ContinueWith(t =>
            {
                DoQuickOperation();
                Task.Factory.StartNew(() =>
                {
                    DoLongOperation();
                });
            }, _ui);
    }
    // etc...
}

When we execute this code we find that the first call occurs in a background thread as expected, but the second call (nested inside a continuation) occurs on the UI thread and blocks the UI. Why? How can we resolve this?

Answer:

(This answer comes directly from a co-worker, Ben Fox. His explanation was better than my prepared one).
Because we're explicitly specifying a TaskScheduler created from the UI SynchronizationContext in the first ContinueWith call, the Task library is helpfully re-using that scheduler for further tasks created within that context.

It’s a feature, not a bug! (but I’m not sure why — seems unintuitive to me).

You can override this behaviour in subsequent nested calls to Task.Factory.StartNew by passing it TaskScheduler.Default, which will cause the task to be scheduled on (I believe) the thread pool.
Read more.

And here's the correct code I prepared:

public partial class MainWindow : Window
{
    TaskScheduler _ui;
    // Store the default scheduler up front
    TaskScheduler _backgroundScheduler = TaskScheduler.Default;
    public MainWindow()
    {
        InitializeComponent();
        _ui = TaskScheduler.FromCurrentSynchronizationContext();
        Loaded += new RoutedEventHandler(MainWindow_Loaded);
    }
    void MainWindow_Loaded(object sender, RoutedEventArgs e)
    {
        Task.Factory.StartNew(() =>
        {
            DoLongOperation();
        })
        .ContinueWith(t =>
            {
                DoQuickOperation();
                Task.Factory.StartNew(() =>
                {
                    DoLongOperation();
                }, CancellationToken.None, TaskCreationOptions.None, _backgroundScheduler);
            }, _ui);
    }
    // etc...
}


Daily Quiz #010

using (var proxy = new MyWcfServiceProxy())
{
    proxy.Open();
    var result = proxy.MyServiceMethod();
    proxy.Close();
}

What have I done wrong?

Answer:

WCF clients should generally not be wrapped in using blocks because a faulted service channel will throw an exception when Close() is called (which is implicitly called by Dispose()).

If you really want to use using blocks for your WCF service clients then check out this excellent article. Otherwise, here is one correct way of managing a WCF service client.

public class CommunicationWrapper<T> : IDisposable
    where T : ICommunicationObject
{
    private readonly T _object;
    public T Object
    {
        get { return _object; }
    }
    public CommunicationWrapper(T communicationObject)
    {
        _object = communicationObject;
    }
    public void Dispose()
    {
        ServiceUtils.Dispose(Object);
    }
}
public static class ServiceUtils
{
    public static void Dispose(ICommunicationObject communicationObject)
    {
        if (communicationObject != null)
        {
            try
            {
                if (communicationObject.State != CommunicationState.Faulted)
                    communicationObject.Close();
                else
                    communicationObject.Abort();
            }
            catch (CommunicationException)
            {
                communicationObject.Abort();
            }
            catch (TimeoutException)
            {
                communicationObject.Abort();
            }
            catch (Exception)
            {
                communicationObject.Abort();
                throw;
            }
        }
    }
}
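For completeness, here's how the wrapper might be used in place of the broken using block above. MyWcfServiceProxy and MyServiceMethod are the hypothetical names from the quiz, and this assumes the proxy implements ICommunicationObject (as WCF clients do):

```csharp
// Hypothetical usage sketch - not compilable without a WCF service reference.
using (var wrapper = new CommunicationWrapper<MyWcfServiceProxy>(new MyWcfServiceProxy()))
{
    wrapper.Object.Open();
    var result = wrapper.Object.MyServiceMethod();
} // Dispose closes the channel cleanly, or aborts it if it has faulted
```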