.NET Daily Quiz Archive - week 7

.NET Daily Quiz #031

Note: this question and the next one generated a lot of intense discussion in my office, which was awesome. See if they do in yours, too!

I need my Javascript application to be fast. Really fast. There are some major performance bottlenecks for this program running on V8. Where are the bottlenecks and why does this cause my program to run slow?
(note: let’s pretend my algorithms are fine. We’re looking for bottlenecks in terms of the way my application is interpreted and compiled)

var Vector = function (x, y) {
    this.x = x;
    this.y = y;
    this.length = function() {
        return Math.sqrt(this.x * this.x + this.y + this.y);
    }
}
var vectors = [];
// some stuff to populate a huge contiguous list of points
for (var i = 0; i < vectors.length; i++) {
    // do some vector processing
    if (vector[i].length() > 10) {
        vector[i].z = 5.0;
    }
}
// do some stuff with my vectors
for (i = 0; i < vectors.length; i++) {
    if (vectors[i].length() > 15.0) {
        vectors[i].label = "large";
    }
}
vectors.push({ x: 1.0, y: 2.0 });
// keep working with vector list

Answer

This one is a little more thorny than most quiz questions, but I wanted to share some interesting stuff I found in a Google IO talk from earlier in the year, Breaking the Javascript Speed Limit with V8. I specified V8 for this question because I know for a fact that this is how it works, but I suspect other engines will generate machine code in a similar way. It probably can’t hurt to apply these techniques regardless of your target platform. It would be really interesting if someone put together some benchmarks to show whether or not this stuff has an effect cross-platform.

I’m not going to go into as much detail as Daniel Clifford did in his talk, so I highly recommend watching it.

The most significant and obvious bottleneck in this example occurs inside the first loop. We are conditionally appending a z field to Vector instances. This is a no-no. The first time the Vector constructor is called, the interpreter will generate a hidden class (at run time, not compile time) that represents the data inside a Vector. This class will be used when working with Vectors going forward. Now if we modify any instance of Vector a new hidden class must be generated.

This is a costly operation and it also results in us losing any optimizations that have been optimistically applied to our Vectors. V8 will apply optimizations under the assumption that things aren’t going to change from under it. If things do change (such as a hidden class change) these optimizations must be thrown out.

Another hidden class change is applied further down when we apply the label field to some vectors.

A couple more little issues that may or may not be lurking in this code:
Are x and y doubles or integers? Integers in V8 are represented as “Small Integers”, or SMIs. A SMI is a 31-bit integer and they are very fast to work with. When a supplied numeric value cannot be represented as a SMI (it’s not an integer or it overflows) it must be represented as a “boxed double”. In this case the memory previously used for the SMI is now a 31-bit object pointer to a boxed double-precision float.

How is the array being prepared?
If an array is initialized to be very big but sparse, for example by using "new Array(128);" then the compiler will use a less efficient data structure because it expects a sparse array (because that’s exactly what you’ve told it to expect). Similarly you should always fill an array in order instead of leaving lots of holes. Contiguous arrays are much faster than sparse arrays when you’re expecting contiguous data at the end.

This has already gone on much longer than I expected, so I’ll leave it here. Expect more Javascript performance questions in the near future. In the meantime, prepare yourself by watching Breaking the JavaScript Speed Limit with V8


.NET Daily Quiz #032

Back to C# (for now).
Is this a valid overload?

interface Foo
{
    void F();
    void F(int i);
    void F(out int i);
}

How about this?

interface Foo
{
    void F();
    void F(int i);
    void F(ref int i);
}

Is this?

interface Foo
{
    void F();
    void F(int i);
    int  F(out int i);
}

Or this?

interface Foo
{
    void F();
    void F(int i);
    void F(out int i);
    void F(ref int i);
}

Why/why not? Do you know the specific rules that determine a method signature?

Answer

The only overload that will fail is

interface Foo
{
    void F();
    void F(int i);
    void F(out int i);
    void F(ref int i);
}

The reason this overload fails is because of a special rule in the C# spec (section 3.6, Signatures and overloading). The definition of a signature is (basically) composed of the method name, type (generic) parameters and the ordered list of parameters (taking into account type and kind, ie value, reference and output). The method return type is NOT part of the method signature. Neither is a params modifier (the params parameter is part of the signature, the modifier is not), nor are type parameter constraints.

A correctly defined method overload must have a unique method signature. Although ref and out modifiers are considered part of the method signature and are distinct, this is an additional rule on overloads — method overload signatures must not differ only in ref and out modifiers. More specifically, a compile-time error occurs if all ref modifier are changed to out modifiers and there is a method signature collision.

Interestingly, the reason for this rule is so that C# programs can be easily translated to run on the CLI. The CLI doesn’t support the same ref and out semantics as C#.

For more detail check out the C# spec section 3.6.


.NET Daily Quiz #033

How many boxing/unboxing operations occur in the following program?

using System.Collections;
using System.Collections.Generic;
using System;
public class Boxer
{
    public static void Main(string[] args)
    {
        int i = 0;
        int j = 0;
        i.ToString();
        ((IConvertible)i).ToInt32(null);
        List l1 = new List();
        l1.Add(i);
        j = l1[0];
        List l2 = new List();
        l2.Add(i);
        j = (int)l2[0];
        ArrayList l3 = new ArrayList();
        l3.Add(i);
        j = (int)l3[0];
        DoThings(i);
    }
    static void DoThings(T t)
    {
        Console.WriteLine(t.ToString());
        List l = new List();
        l.Add(t);
    }
}

Answer

The quick answer here is 3 boxings and 2 unboxings.

Int32 implements its own version of ToString() so no boxed copy is required (more on this in tomorrow’s(?) quiz).

Casting i to IConvertible causes a boxed copy to be created (ToInt32() is implemented explicitly on all numeric types in .NET, so the cast is necessary).

Inserting a retrieving items from List requires no boxed copies — the internal data structure is a simple int[], itself a reference type but capable of containing value types.

List uses object[] as its internal data structure. Clearly this array must store references to types (otherwise how could it possibly allocate the correct amount of memory for any type of object). All value types placed inside a List must be boxed into a reference type for storage and unboxed for retrieval.

The same argument for the prior applies to ArrayList, which is backed by object[].

What about DoThings()? The answer is, there is no boxing or unboxing applied here. Thanks to the magic of generics, even the List is instantiated correctly as a class backed by int[] and therefore storing the correct value types. ToString() is called virtually (as always) but a value type dereference results in no boxing.

Two things to consider about this relatively simple example:

  • All .NET numeric types implement IConvertible explicitly. What are the performance implications of this? Some people believe this was a mistake in the design of the framework — do you agree?
  • You could very easily have answered this quiz by opening the compiled application in ildasm.exe and doing a search for “box” and “unbox”. Is it always this easy to find instances of boxing? (hint: no — wait for tomorrow’s(?) quiz to see why).

.NET Daily Quiz #034

Following on from yesterday’s quiz, is there a boxing operation in the following code? If yes, where and why?

public class Boxer
{
    public static void Main(string[] args)
    {
        ValueType t = default(ValueType);
        var str = t.ToString();
        System.Console.WriteLine(str);
    }
}
public struct ValueType
{
}

Answer

The answer is yes, there is indeed a boxing operation. If you compile this code and open it in ildasm you’ll find the following IL in the main method:

.method public hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       31 (0x1f)
  .maxstack  1
  .locals init (valuetype ValueType V_0,
           string V_1)
  IL_0000:  nop
  IL_0001:  ldloca.s   V_0
  IL_0003:  initobj    ValueType
  IL_0009:  ldloca.s   V_0
  IL_000b:  constrained. ValueType
  IL_0011:  callvirt   instance string [mscorlib]System.Object::ToString()
  IL_0016:  stloc.1
  IL_0017:  ldloc.1
  IL_0018:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_001d:  nop
  IL_001e:  ret
} // end of method Boxer::Main

So if there is no box opcode, where is the boxed copy? The clue is in the virtual function call to ToString(). Just before the ToString() call you can see the constrained opcode. According to the documentation for constrained:

If thisType is a value type and thisType does not implement method then ptr is dereferenced, boxed, and passed as the 'this' pointer to the callvirt method instruction.

So all of this is just a fancy way of saying that a boxed copy of a value type is always created when you call the virtual members on Object, if and only if your value type does not implement these virtual members. The relevant Object members are GetHashCode(), Equals() and ToString(). Implement these methods to avoid unwanted boxed copies!


.NET Daily Quiz #035

Note: Unless otherwise specified all SQL questions will apply to T-SQL on the latest version of MS SQL Server.

I’m trying to retrieve all log entries with a non-null severity level that is not level 1 or 2. Why isn’t my query returning anything?

select [Time], [Text] from [Logs] where [Logs].LogLevel not in (1,2,null)

Answer

SQL uses three-valued logic for logical predicates like this one. This means that in addition to the usual TRUE and FALSE logical values we have a NULL value. NULL is not the same as FALSE, but it does evaluate to NOT TRUE. So NULL = FALSE is false, as is NULL = TRUE, but NULL != TRUE is true. This causes the NOT IN predicate to fail on all NULL values (and every comparison involves a null comparison AND’ed together with comparison to 1 and 2 in our predicate).

A colleague pointed out to me that you can “resolve” this issue in T-SQL by using SET ANSI_NULLS OFF. He also correctly pointed at that you should never do this. According to the documentation:

In a future version of SQL Server, ANSI_NULLS will always be ON and any applications that explicitly set the option to OFF will generate an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.

There were a couple of alternative queries posted. One colleague offered this

Select [Time], [Text] from [Logs] where (ISNULL([Logs].LogLevel , -1) NOT IN (-1, 1, 2))

And another gave me:

select [Time], [Text] from [Logs] where [Logs].LogLevel not in (1,2) and [Logs].LogLevel is not null

I’m not enough of a SQL expert to know which is the better solution, but they both seem to work correctly.