Nerdy tidbits from my life as a software engineer

Tuesday, March 31, 2009

Very Annoying

I need to write some code that iterates through an array of IL byte code.  To do this, I need to be able to recognize a particular instruction and advance a pointer through the array.  It’s not that hard, except for one thing: there is no built-in array of OpCodes in the System.Reflection.Emit library.  This means that, in my code, I have to create a static array of them myself just so I can iterate through it in a nice, clean, abstract way.  There are at least 100 OpCodes I’d need to add to my list in order to do this.  So the question is, why doesn’t the OpCodes class define this array for me?  Wouldn’t that seem like a logical field to have in there?

I really hate tedious work.
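For what it’s worth, the list doesn’t have to be typed out entry by entry.  Here’s a rough sketch of one reflection-based workaround – just scrape the public static OpCode fields off the OpCodes class (still code you have to write and maintain yourself, which is exactly the point):

using System.Linq;
using System.Reflection;
using System.Reflection.Emit;

static class OpCodeTable
{
    // Sketch of a workaround: reflect over the public static OpCode fields on
    // System.Reflection.Emit.OpCodes instead of typing out 100+ entries by hand.
    public static readonly OpCode[] All =
        typeof(OpCodes)
            .GetFields(BindingFlags.Public | BindingFlags.Static)
            .Where(field => field.FieldType == typeof(OpCode))
            .Select(field => (OpCode)field.GetValue(null))
            .ToArray();
}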

Friday, March 27, 2009

In Defense Of Not Invented Here

Is it really so bad to reinvent the wheel?  Most people would say, “yes!”, and I would mostly agree.  But like all things in life, I think there are right answers, there are wrong answers, and there are questions that have no real answer.

You might think that on the surface there’s no point in writing your own continuous integration solution when there’s CruiseControl.net. But what if CruiseControl can’t do something your build process needs it to do?  Or what if it could work that way, but it would require so much work and take up so much time that the cost of making it work properly would outweigh the benefits?  At some point, you may wisely evaluate your situation and realize that, as much as you wish you could reuse somebody else's work, it’s simply not possible.  And then you’ll end up writing your own continuous integration solution – perhaps hating the fact that you have to, or maybe enjoying the experience – instead of using somebody else's.

Is that really such a terrible waste of resources?  After all, aren’t you better off having a continuous build solution than not having one at all?  Why are we lamenting the fact that somebody’s not using a solution that’s already out there when they can’t use it in the first place?

There are some things in life I am not going to write myself.  Ever.  Examples of these programs are:

  1. My operating system.
  2. My compiler.
  3. My web browser.
  4. Adobe Photoshop.
  5. Video games (yes, I will pay $50.00 to play Half Life 2, thank you very much)
  6. My printer driver (unlike Richard Stallman, I have no interest in starting a revolution in order to get my printer working again.  Nope.  I’ll just buy a new printer).

And then there are examples of things I could write myself, if I really had to:

  1. My unit testing framework.
  2. Windows calculator.
  3. Notepad.
  4. My continuous integration server.
  5. My mock framework.

Now, just to clarify: I didn’t write the list above because I want to reinvent these applications / APIs.  On the contrary.  I would prefer – much prefer – to use somebody else’s software than to write it myself.  But the software I listed above I find so essential to my productivity that if I were ever in a situation where, for whatever reason, I simply couldn’t use an existing solution, I would find it worth my time to write one myself.

And yes, I might enjoy the experience…but that’s beside the point.

So here’s my conclusion.  The people who really abhor NIH syndrome do so because they themselves are not in a situation where they have to reinvent the wheel.  They therefore cringe at any example of not invented here because they fail to see how anybody could be in a situation where they can’t reuse other people’s work.  The common misconception among this crowd, I think, is the belief that anybody exhibiting NIH does so because they like reinventing the wheel.  And I’m sure there are instances in the world where people really do like to reinvent the wheel.  But in my experience, nobody wants to waste their time.  Since I have no need to waste four years of my life writing my own C# compiler, I will happily use Visual Studio’s, even though it would be fun to write my own.

If we never have any competition in our industry, how could things get any better?  If new innovations and ideas come out of NIH projects, and other people benefit from them, can’t we be happy with that outcome?  And isn’t it a plus that we learn skills and techniques along the way?

And, if by reinventing the wheel but sharing the solution, we end up creating a system that other people who are in similar situations can benefit from, aren’t we making a net positive impact in the world?  Our custom continuous integration solution may have started just to benefit us, but if hundreds or thousands of people in similar situations can benefit from our system, I think we would have spent our time wisely.

Thursday, March 26, 2009

Foreach vs For Statements

Have you ever looked at a foreach statement in ILDasm before?  It’s very interesting, and probably not what you expect.  Take the following simple bit of code:

List<MethodExpectation> mMyList = new List<MethodExpectation>();
...
foreach(MethodExpectation expectation in mMyList)
{
}

Here’s what the compiler spits out for the foreach statement:

IL_0158:  ldarg.0
IL_0159:  ldfld      class [mscorlib]System.Collections.Generic.List`1<class [MyAssembly]MethodExpectation> MyClass::mMyList
IL_015e:  callvirt   instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0> class [mscorlib]System.Collections.Generic.List`1<class [MyAssembly]MethodExpectation>::GetEnumerator()
IL_0163:  stloc.s    CS$5$0003
.try
{
  IL_0165:  br.s       IL_0171
  IL_0167:  ldloca.s   CS$5$0003
  IL_0169:  call       instance !0 valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<class [MyAssembly]MethodExpectation>::get_Current()
  IL_016e:  stloc.2
  IL_0171:  ldloca.s   CS$5$0003
  IL_0173:  call       instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<class [MyAssembly]MethodExpectation>::MoveNext()
  IL_0178:  stloc.s    CS$4$0001
  IL_017a:  ldloc.s    CS$4$0001
  IL_017c:  brtrue.s   IL_0167
  IL_017e:  leave.s    IL_018f
}  // end .try
finally
{
  IL_0180:  ldloca.s   CS$5$0003
  IL_0182:  constrained. valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<class [MyAssembly]MethodExpectation>
  IL_0188:  callvirt   instance void [mscorlib]System.IDisposable::Dispose()
  IL_018e:  endfinally
}  // end handler
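Translated back into C# – a hand-written approximation of what that IL is doing, not the compiler’s literal output – the foreach expands to roughly this:

// Approximate C# equivalent of the IL above: grab List<T>'s struct enumerator,
// loop on MoveNext()/Current, and wrap it all in try / finally so the
// enumerator gets disposed (the IL does that last part with a constrained
// callvirt to IDisposable.Dispose).
List<MethodExpectation>.Enumerator enumerator = mMyList.GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        MethodExpectation expectation = enumerator.Current;
        // (empty body, as in the example above)
    }
}
finally
{
    enumerator.Dispose();
}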

I always imagined that using IEnumerable to iterate through a list was less efficient than a simple for loop.  But compared to the following, it’s hard to tell:

List<MethodExpectation> mMyList = new List<MethodExpectation>();
...
for(int i = 0; i < mMyList.Count; i++)
{
  MethodExpectation methodExpectation = mMyList[i];
}

You can see the difference fairly clearly:

IL_0156:  ldc.i4.0
IL_0157:  stloc.2
IL_0158:  br.s       IL_016d
IL_015a:  nop
IL_015b:  ldarg.0
IL_015c:  ldfld      class [mscorlib]System.Collections.Generic.List`1<class [MyAssembly]MethodExpectation> MyClass::mMyList
IL_0161:  ldloc.2
IL_0162:  callvirt   instance !0 class [mscorlib]System.Collections.Generic.List`1<class [MyAssembly]MethodExpectation>::get_Item(int32)
IL_0167:  stloc.3
IL_0168:  nop
IL_0169:  ldloc.2
IL_016a:  ldc.i4.1
IL_016b:  add
IL_016c:  stloc.2
IL_016d:  ldloc.2
IL_016e:  ldarg.0
IL_016f:  ldfld      class [mscorlib]System.Collections.Generic.List`1<class [MyAssembly]MethodExpectation> MyClass::mMyList
IL_0174:  callvirt   instance int32 class [mscorlib]System.Collections.Generic.List`1<class [MyAssembly]MethodExpectation>::get_Count()
IL_0179:  clt
IL_017b:  stloc.s    CS$4$0001
IL_017d:  ldloc.s    CS$4$0001
IL_017f:  brtrue.s   IL_015a

This is a pretty standard for loop.  The first two instructions set the i variable (stored in local 2) to 0, then the code branches to 016d, which does the less-than check (the result of the clt is stashed in the compiler-generated local CS$4$0001).  If the check passes, it branches back into the body of the loop, which starts at instruction 015a (another strange nop).  The increment of i happens at the end of the body, right before the less-than check runs again.

Contrast this with using the enumerator, which is actually fairly clean considering what it’s doing.  The main thing I wonder about is what kind of performance hit you take by setting up a try / finally block.  I honestly don’t know the answer, but I’ve always assumed that creating an exception frame is fairly expensive.  If that’s the case, then using foreach instead of for would clearly be less desirable from a performance standpoint.

I had always figured that foreach loops would be slower since I assumed you were newing an enumerator object on the heap (via GetEnumerator()) just to walk a simple loop, which is obviously more work than putting an integer on the stack.  (Judging by the IL above, List<T>’s enumerator is actually a value type sitting in a local, so at least in this case nothing is being allocated on the heap.)  But add a try / finally block on top of that, and one approach might end up being enough faster than the other that, in a large program, it makes a very noticeable difference.
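If you’d rather measure than squint at IL, a quick-and-dirty sketch like this will do it (a trivial sum over a million ints is hardly a realistic workload, so take the numbers with a grain of salt):

using System;
using System.Collections.Generic;
using System.Diagnostics;

class ForVsForeach
{
    static void Main()
    {
        List<int> list = new List<int>();
        for (int i = 0; i < 1000000; i++)
            list.Add(i);

        long sum = 0;

        // Plain for loop over the list.
        Stopwatch sw = Stopwatch.StartNew();
        for (int pass = 0; pass < 100; pass++)
            for (int i = 0; i < list.Count; i++)
                sum += list[i];
        sw.Stop();
        Console.WriteLine("for:     {0} ms", sw.ElapsedMilliseconds);

        // foreach over the same list (List<T>.Enumerator plus try / finally).
        sw = Stopwatch.StartNew();
        for (int pass = 0; pass < 100; pass++)
            foreach (int value in list)
                sum += value;
        sw.Stop();
        Console.WriteLine("foreach: {0} ms", sw.ElapsedMilliseconds);

        // Print the sum so the loops can't be optimized away entirely.
        Console.WriteLine(sum);
    }
}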

Then again, foreach statements are very convenient.  And if that bothers you, you probably don’t want to look at the code that lambda expressions generate…

Monday, March 23, 2009

Should Reflection.Emit Code Spit Out Nop’s?

Open any assembly in ildasm.exe (a debug build, at least) and you’ll see nop instructions all over the place.  Obviously, the compiler sees a need to emit these, even though they are documented to do nothing (maybe there’s some sort of optimization going on in the jitter that makes this meaningful?).  So, if I’m generating an assembly using reflection.emit, should I spit out nop instructions too?  To date, I’ve been skipping them, since there doesn’t seem to be a need for them.  But I can’t help but wonder if there’s something I’m missing.  Is there any reason I should sprinkle my generated code with nop’s even though they don’t do anything?

Wednesday, March 18, 2009

Debugging Reflection.Emit Code Generations

It would really be nice if there were an easy way to figure out what’s actually behind the helpful “System.InvalidProgramException: Common Language Runtime detected an invalid program” exception when you hit it.  The only way I’ve been able to debug problems with generated IL is trial and error: comment out a bunch of lines of generation code, recompile, and see if that works.  The real problem with generated code is that you can’t use the usual debugging technique, which is to compare your auto-generated output side by side with something the compiler produced.  This, really, is the main difficulty with generating assemblies in .NET.

I know you can use windbg to get to the bottom of a lot of this, but it’s not a particularly intuitive way to figure things out.  It will be really nice when they finish opening up the compiler so you can programmatically generate types from source code (have you seen Anders’ PDC presentation?  Very cool stuff).  That will make all of this reflection.emit stuff a thousand times easier.
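One thing that does help – assuming you’re on the full framework and can build your AssemblyBuilder with AssemblyBuilderAccess.RunAndSave – is dumping the generated assembly to disk so that peverify.exe and ildasm.exe can point at the exact instruction that’s bad.  A rough sketch:

using System;
using System.Reflection;
using System.Reflection.Emit;

static class EmitDebugging
{
    // Sketch: create the dynamic assembly as RunAndSave so it can be written to
    // disk, then run "peverify GeneratedStuff.dll" to get the offset and reason
    // for the invalid IL.
    public static void EmitAndDump()
    {
        AssemblyName name = new AssemblyName("GeneratedStuff");
        AssemblyBuilder assembly = AppDomain.CurrentDomain.DefineDynamicAssembly(
            name, AssemblyBuilderAccess.RunAndSave);
        ModuleBuilder module = assembly.DefineDynamicModule(
            name.Name, name.Name + ".dll");

        // ... define types and emit method bodies into 'module' as usual ...

        assembly.Save(name.Name + ".dll");
    }
}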

Friday, March 6, 2009

A Very Useful Extension Method

I’m working on a little project that requires a lot of dynamic type conversion.  So all over the place I have code that does:

public T DoSomething()
{  
  ...
  return (T)Convert.ChangeType(GetSomeValue(), typeof(T), CultureInfo.InvariantCulture);
}

Then it occurred to me that I could save myself a lot of headaches and hassle with the help of a super-simple extension method called ConvertTo<T>:

/// <summary>
/// Attempts to convert an object into another type.
/// </summary>
/// <typeparam name="T">The type to convert to</typeparam>
/// <param name="value">The original object before it's converted</param>
/// <returns>An instance or value of T</returns>
public static T ConvertTo<T>(this object value)
{
    return (T)Convert.ChangeType(value, typeof(T), CultureInfo.InvariantCulture);
}

So that now I can just write:

public T DoSomething()
{
  ...
  return GetSomeValue().ConvertTo<T>();
}

Which is a lot simpler, and much more convenient.  Why didn’t I think of that earlier?
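It reads nicely for one-off conversions too (keep in mind that Convert.ChangeType only handles types that implement IConvertible, so this is for the simple primitive-and-string cases, not arbitrary casts):

int answer = "42".ConvertTo<int>();        // string -> int
string text = 3.14.ConvertTo<string>();    // double -> string, invariant culture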

Monday, March 2, 2009

Mock State Visualizer To The Rescue

I spent some time a few months ago working on an implementation for a mock state visualizer.  Basically, I can now save an HTML file to the output directory where the unit test results are stored, which lets you visualize the expected calls against the actual calls.  Today, as I was wracking my brain trying to figure out why my expectations weren’t being fulfilled, I made use of this awesome feature for the first time.  Here’s what I found:

[Image: the recorded state machine on the left, the replay log on the right]

What you’re seeing here is the recorded state machine on the left and the attempt to replay it on the right.  Each state on the left has a number of links to the next state (and, in every case, back to itself, so that it can support ranged expectations, such as expecting set_LogFilePath to be called between 3 and 6 times).  Each bullet on the left represents a method expectation between that state and another one.  The red lines in the replay log column are the links between states that fail to validate.  Since none of these calls repeat, each self-link fails to validate – but that’s OK, because there are links in states 0 through 4 that do validate.  That means the mock framework found a way to validate the current method in each of those states and walk on to the next one.  This is how the validation process works.
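Roughly, the idea behind the walk is something like this (a toy sketch with made-up types, not the framework’s actual code):

using System.Collections.Generic;

// Toy illustration only: each recorded state has links -- each guarded by a
// method expectation -- back to itself (for ranged expectations) or on to the
// next state.  Replay succeeds as long as every actual call can follow some
// link out of the current state.
class StateLink
{
    public string ExpectedMethod;   // e.g. "set_LogFilePath"
    public int TargetState;         // same state, or the next one
}

class RecordedStateMachine
{
    private readonly List<StateLink>[] states;

    public RecordedStateMachine(List<StateLink>[] states)
    {
        this.states = states;
    }

    public bool TryReplay(IEnumerable<string> actualCalls, out int failedState)
    {
        int current = 0;
        foreach (string call in actualCalls)
        {
            StateLink link = states[current].Find(l => l.ExpectedMethod == call);
            if (link == null)
            {
                // No link out of the current state validates this call -- this
                // is the red "failed to validate" case in the chart.
                failedState = current;
                return false;
            }
            current = link.TargetState;
        }

        failedState = -1;
        return true;
    }
}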

The problem can be found in state 5: here, none of the links could be validated.  This means there is a mismatch between the expectations set in the unit test and the methods actually called in the target method.  You can see from the log that my unit test was setting an expectation for set_IsReplaying to be called, but get_IsReplaying was called instead.  So in this case the unit test is the problem, not the target code.

But wow, a picture really is worth 100,000 words.  Whereas before I was having trouble figuring out which call was bad and where it was coming from, a quick look at the chart above solves that problem.  Now that I know which call is failing and where it is, I can go fix my test and move on.

Implementing that replay log was time well spent.