Nerdy tidbits from my life as a software engineer

Wednesday, May 21, 2008

Garbage Collection Performance in the CLR

I recently did a research project at my office to gather some statistics on the performance impact of the CLR garbage collector, after I found it difficult to find such statistics on the internet. What I discovered is fascinating, and I realized that many other people would be interested in the results of my experiments - so I decided to put them up on the website for everybody to see.

I should preface this by saying a few things about garbage collection in general. First, I am not an expert. What I know I have picked up from books and websites - just like everybody else - and it is very likely that at least a few of the things written in this article are inaccurate. That said, the results of these tests have given me enough confidence to reach a few conclusions that I think others will concur with. Also, keep in mind that these tests were designed to stress the garbage collector as much as possible, so they are not necessarily indicative of real-world performance. Real-world systems are rarely architected with garbage collection performance in mind. And why should they be? The CLR abstracts all of this from you, and as long as things are behaving fast enough, nobody really sits down and questions how things could be redesigned so that the collector can operate as efficiently as possible. That lack of knowledge is part of why I had to do this research on my own: I found very little documentation on exactly how efficient (or inefficient) the garbage collector is. I have found a few resources online that illustrate the real-time performance of the CLR, as well as a few blogs that talk about optimizing for the garbage collector in a general way. But I have found very few performance statistics with actual, tangible numbers behind the collector's performance. Even the CLR Profiler, sadly, does not give you this information. So I set out to get some numbers to put behind our discussions, with the goal of pinning down the worst-case scenarios.

As a general rule, based on what I have read, my own empirical observations, and tests that I have run, here are some important observations I've made about garbage collection:

  • The CLR heap behaves very differently from the native heap. When a reference type is newed, the memory is simply allocated from the 'top' of the heap. This makes newing basically free - the heap is not searched for the first contiguous block of memory large enough to hold the requested amount, as it is in native code.
  • The previous point implies something very critical for understanding garbage collection: memory can only be reclaimed when the collector runs and compacts the heap. There is simply no way to tell the CLR that you're not using the memory at location x any more and that it can now be used by somebody else. There is no .NET equivalent to delete, even if you are using C++/CLI (which lets you 'delete' CLR types, but really just calls Dispose). The IDisposable pattern does not delete memory (I have found that this is a common misconception).
  • The garbage collector has a very important optimization based on the principle of longevity. It is built on the theory that objects that have hung around for a long time are likely to continue to hang around. This principle holds true most of the time. Most user interfaces, for instance, allocate a large number of objects at startup and then process events and data as the user interacts with them. Those initial allocations for the UI never really go anywhere, so any object that is reachable from those GUI objects is extremely unlikely to become unreachable in subsequent collections. The collector exploits this behavior and is therefore able to drastically improve performance by only looking at objects that have been recently newed.
  • The heap is split into three sections based on longevity. The first section is for objects that have been recently newed. Because the heap is allocated sequentially, these objects are clustered close together at the top of the heap and are called generation 0. When an object survives a round of garbage collection, it is promoted to generation 1 and relocated to the part of the heap that contains all of the other generation 1 objects. When a generation 1 object survives a collection, it is promoted to generation 2 - and again, relocated. The CLR decides when objects are relocated, and this decision is beyond your control. The garbage collector will rarely, as you shall shortly see, run a full-scale collection - especially if you design your system to exploit the generational nature of the CLR. Again, this optimization rests on the fact that most dynamically created objects either hang around for a long time or for a very short time. The long-lived objects get shoved down into generation 2 and are rarely inspected during a collection. The short-lived objects sit at the top of the heap and are inspected constantly. The cost of cleaning up objects in generation 0, as you will see shortly, is essentially nothing. The cost of cleaning up memory in generation 2, conversely, is extremely expensive.
  • The advice you'll read in various places basically tells you that garbage collection is out of your control and can run at any time. You do not have the ability to tell the runtime not to run the collector during certain periods, or to restrict it to, say, generation 0 collections. You also have no control over when the CLR decides it's time to run a collection. All of these things are out of your hands and are really impossible to know for sure unless you happen to work at Microsoft on the CLR team (I know for certain that there are all sorts of optimizations they do with, for instance, COM+ objects that you simply can't know about without inside knowledge).

    However, from my own observations, I have seen that the collector will not run unless you are newing data on the heap. Try it; you'll see. Load up an application - any application - that sits there and does nothing, and profile it in the CLR Profiler. You'll see that the collector simply never runs. I have no way to verify that this is entirely the case, but based on my observations I believe it's a fair conclusion to reach. I had one test application that created a number of structs on the stack, in a loop, again and again - and in the profiler, the collector never ran once the application was done loading. I could run that loop all day and never get a single collection. I had another test that used object pools to grab pre-allocated instances out of a repository, use them for a while, and return them - and the collector never ran. The point is that collections are triggered in response to something being newed on the heap. It seems that once the heap grows beyond a certain size - which is, again, beyond your control - the collector will run. The length and scope of that collection are also beyond your control. This uncertainty is, for very good reasons, a big turn-off for applications that are extremely time-critical.
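This allocation-triggers-collection behavior is easy to observe without the profiler. Here's a small sketch (my own test harness, not the logger code) that uses GC.CollectionCount to count how many generation 0 collections an allocation-heavy loop provokes:

```csharp
using System;

public static class GcTriggerDemo
{
    // Runs an allocation-heavy loop and reports how many generation 0
    // collections it provoked.
    public static int Gen0CollectionsDuring(int iterations)
    {
        int before = GC.CollectionCount(0);

        for (int i = 0; i < iterations; i++)
        {
            string s = i.ToString();   // a fresh heap allocation every pass
        }

        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        // Millions of short-lived strings should force the collector to
        // run many times; a struct-only loop would not.
        Console.WriteLine("Gen 0 collections provoked: "
            + Gen0CollectionsDuring(5000000));
    }
}
```

Run the same loop over a struct instead (say, summing ints on the stack) and the collection count stays put, which matches the profiler observations above.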
So, from the preceding text, there are a few observations that can be made about optimizing for the collector. First, if you don't want the collector to run constantly, don't put new objects on the heap unless you need to. Use structs, object pools, or simply re-use objects that have already been newed - but avoid constantly newing things. Second, be aware of how the collector optimizes based on object lifetimes. The basic principle to remember is that the longer something hangs around, the more expensive it becomes to promote it to older generations and then clean it up. Generally, you want to divide objects into two categories: objects with short lifespans and objects with long lifespans. Where you will run into trouble, as you are about to see, is when you start mixing the two constantly.
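As a sketch of the object-pool idea (the type and method names here are mine, purely for illustration): allocate instances once and recycle them, so that steady-state operation puts nothing new on the heap.

```csharp
using System.Collections.Generic;

// A minimal object pool. Instances are allocated only when the pool is
// empty and are recycled thereafter, so a warmed-up pool stops putting
// new objects on the heap entirely.
public class ObjectPool<T> where T : class, new()
{
    private readonly Stack<T> mFree = new Stack<T>();

    public T Get()
    {
        // Hand back a recycled instance when one is available; only
        // new something up when the pool is empty.
        return mFree.Count > 0 ? mFree.Pop() : new T();
    }

    public void Return(T item)
    {
        mFree.Push(item);
    }
}
```

A caller grabs an instance with Get, uses it, and hands it back with Return; once the pool is warm, Get never news anything and the collector has no reason to run.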

The Test Data

This particular test started as an investigation into the efficiency of a debug logging system that we have at work. The logger component starts when the application does. It installs a system tray icon that, when clicked, pops up a form with a list box full of debug messages. The messages are entered into the system from all over the application and contain various diagnostic information that can be very helpful in finding and fixing bugs. It is a very helpful tool for the developers, but the question I had was: is it efficient from a garbage collection perspective?

So for this test, I took a block of code from the logger that was doing a fair amount of string manipulation. Strings can be very hard on your heap, since they are immutable and must be newed and copied whenever they are modified (which is why StringBuilders are your friend). The first question I wanted to answer was: what is the CLR doing when we log messages? How frequently is it putting objects on the heap and causing collections to occur? So I set up a test where I invoked the logger 100 times a second and tracked the results in the CLR Profiler. This is what came out of it:

CLRProfiler1

CLRProfiler2

What this showed me is that every 15 seconds or so - roughly every 1500 log messages - the collector would run. That was useful information, sure, but how long did the collector run for? Did it take up enough time to impact performance in any tangible way? To find out, I took the logger code and moved it into another test application that used PerformanceCounters to track the percentage of time the collector spent cleaning up the heap over the course of the application. Here's the program:

/// <summary>
/// A test application that examines the performance of the CLR's garbage collector.
/// </summary>
/// <param name="args"></param>
public static void Main(string[] args)
{
    DateTime startTime = DateTime.Now;

    PerformanceCounter mTimeInGCCounter = new PerformanceCounter();
    mTimeInGCCounter.CategoryName = ".NET CLR Memory";
    mTimeInGCCounter.CounterName = "% Time in GC";
    mTimeInGCCounter.InstanceName = "_Global_";

    PerformanceCounter mNumberOfGen0CollectionsCounter = new PerformanceCounter();
    mNumberOfGen0CollectionsCounter.CategoryName = ".NET CLR Memory";
    mNumberOfGen0CollectionsCounter.CounterName = "# Gen 0 Collections";
    mNumberOfGen0CollectionsCounter.InstanceName = "_Global_";

    PerformanceCounter mNumberOfGen1CollectionsCounter = new PerformanceCounter();
    mNumberOfGen1CollectionsCounter.CategoryName = ".NET CLR Memory";
    mNumberOfGen1CollectionsCounter.CounterName = "# Gen 1 Collections";
    mNumberOfGen1CollectionsCounter.InstanceName = "_Global_";

    PerformanceCounter mNumberOfGen2CollectionsCounter = new PerformanceCounter();
    mNumberOfGen2CollectionsCounter.CategoryName = ".NET CLR Memory";
    mNumberOfGen2CollectionsCounter.CounterName = "# Gen 2 Collections";
    mNumberOfGen2CollectionsCounter.InstanceName = "_Global_";

    Console.WriteLine("Starting test.  Press the enter key to stop.");

    bool isRunning = true;
    Semaphore semaphore = new Semaphore(0, 1);
    
    ThreadPool.QueueUserWorkItem((WaitCallback)((o) =>
    {
        startTime = DateTime.Now;

        while(isRunning)
        {
            // Log code copied from elsewhere:
            string message = "0|:1:";
            message += DateTime.Now.Ticks.ToString();
            message += ":Module:SomeMessage";

            string delimiter = "|";
            string[] parts = message.Split(delimiter.ToCharArray(), 2);

            uint systemLevel = Convert.ToUInt32(parts[0]);

            delimiter = ":";
            parts = parts[1].Split(delimiter.ToCharArray(), 5);
            uint messageLevel = Convert.ToUInt32(parts[1]);
            long fileTime = Convert.ToInt64(parts[2]);
            DateTime timeStamp = DateTime.FromFileTime(fileTime);
            string moduleName = parts[3];
            string messageData = parts[4];

            string timeString = timeStamp.ToString();

            string completeMessage = String.Format(
                "{0}{1}|{2} - {3} - {4}",
                messageLevel,
                systemLevel,
                timeString,
                moduleName,
                messageData);

        }

        semaphore.Release();

    }));

    System.Console.ReadLine();
    isRunning = false;
    semaphore.WaitOne();
    TimeSpan appTime = DateTime.Now - startTime;

    float percentTimeInGC = mTimeInGCCounter.NextValue();
    int gen0CollectionCount = (int)mNumberOfGen0CollectionsCounter.NextValue();
    int gen1CollectionCount = (int)mNumberOfGen1CollectionsCounter.NextValue();
    int gen2CollectionCount = (int)mNumberOfGen2CollectionsCounter.NextValue();

    double msInGC = appTime.TotalMilliseconds * percentTimeInGC;

    StringBuilder messageBuilder = new StringBuilder();
    messageBuilder.Append("Total time of test: ");
    messageBuilder.Append(appTime.TotalMilliseconds);
    messageBuilder.Append("ms");
    messageBuilder.Append(Environment.NewLine);

    messageBuilder.Append("Total time in garbage collection: ");
    messageBuilder.Append(Convert.ToString(msInGC));
    messageBuilder.Append("ms (");
    messageBuilder.Append(percentTimeInGC * 100);
    messageBuilder.Append("%)");
    messageBuilder.Append(Environment.NewLine);

    messageBuilder.Append("Number of gen 0 collections: ");
    messageBuilder.Append(Convert.ToString(gen0CollectionCount));
    messageBuilder.Append(Environment.NewLine);

    messageBuilder.Append("Number of gen 1 collections: ");
    messageBuilder.Append(Convert.ToString(gen1CollectionCount));
    messageBuilder.Append(Environment.NewLine);

    messageBuilder.Append("Number of gen 2 collections: ");
    messageBuilder.Append(Convert.ToString(gen2CollectionCount));
    messageBuilder.Append(Environment.NewLine);

    double msPerGC = msInGC;
    msPerGC /= (gen0CollectionCount + gen1CollectionCount + gen2CollectionCount);

    messageBuilder.Append("Time per collection: ");
    messageBuilder.Append(Convert.ToString(msPerGC));
    messageBuilder.Append("ms");

    Console.WriteLine(messageBuilder.ToString());

}

(A quick note: the results in the following screenshots do not exactly mirror the preceding code, because the output format has evolved a bit since the screenshots were taken. But the meat of the application is identical, and the results are genuine.)

The purpose of this test is to see what happens when you really push the collector into running all the time, which is why the allocation thread is let loose to run repeatedly until the enter key is pressed. Here are the results from the program:

GCStats1 

I realize that this image is difficult to read, so here's what it says: the program ran for 103.59 seconds and spent 8% of that time collecting garbage (about 8.6 seconds). The collector ran generation 0 collections 9764 times (99.5%), generation 1 collections 43 times (0.39%), and generation 2 collections 10 times (0.11%). The average time for a single collection was 0.88165ms.

Read that again: 0.88165 milliseconds. That's it.

Now, this is not a real-world application. It has a very small footprint and doesn't do anything except make lots and lots of allocations on the heap, all of which can easily be discarded because they are generation 0 objects at the top of the heap. Clearly, this test shows that the cost of cleaning up a small heap - even in an abusive program - is basically nothing.

So I ran another test. This time, I wanted to see what would happen if the application consumed more memory while generating these log messages at the same rate. So I modified the program to first allocate roughly 150MB on the heap before running the test. Not surprisingly, performance was affected - but not quite as much as you might think:

GCStats4

Now, I should admit that I haven't put this test into the profiler, so it's quite possible that once the list reached a certain size it was placed in the large object heap, which may have had a big positive impact on these numbers. In any case, the numbers really don't change much. In this example the collector ran 15075 times - 99.68% of those collections were generation 0, 0.23% were generation 1, and 0.09% were generation 2 - and the average collection took 1.37ms. To me this means that the collector runs efficiently even as the size of the application grows, basically by keeping the number of full-scale collections to an absolute minimum and only dealing with anything beyond generation 0 when it has no choice.

All of these data are interesting, but still not quite real-world. The real question I was trying to answer was how efficient the logger is. In this example, strings are allocated on the heap but are then discarded, because the program is not storing the newly created references anywhere. The logger, however, is storing these strings: it is putting them in a ListBox, which is a generation 2 object that lives throughout the lifetime of the application. The newly created strings are generation 0 objects that are created all over the system and then sent to the logger, where they are aggregated. What happens when we constantly create new objects and shove them into a long-lived object?

So I modified the code to store half of the strings it created - no other modifications. I was purely curious to see what would happen if we started hanging onto some of these objects and forced the collector to do some relocations. Here's the code:

/// <summary>
/// A test application that examines the performance of the CLR's garbage collector.
/// </summary>
/// <param name="args"></param>
public static void Main(string[] args)
{
    DateTime startTime = DateTime.Now;
    List<string> mMessages = new List<string>();

    PerformanceCounter mTimeInGCCounter = new PerformanceCounter();
    mTimeInGCCounter.CategoryName = ".NET CLR Memory";
    mTimeInGCCounter.CounterName = "% Time in GC";
    mTimeInGCCounter.InstanceName = "_Global_";

    PerformanceCounter mNumberOfGen0CollectionsCounter = new PerformanceCounter();
    mNumberOfGen0CollectionsCounter.CategoryName = ".NET CLR Memory";
    mNumberOfGen0CollectionsCounter.CounterName = "# Gen 0 Collections";
    mNumberOfGen0CollectionsCounter.InstanceName = "_Global_";

    PerformanceCounter mNumberOfGen1CollectionsCounter = new PerformanceCounter();
    mNumberOfGen1CollectionsCounter.CategoryName = ".NET CLR Memory";
    mNumberOfGen1CollectionsCounter.CounterName = "# Gen 1 Collections";
    mNumberOfGen1CollectionsCounter.InstanceName = "_Global_";

    PerformanceCounter mNumberOfGen2CollectionsCounter = new PerformanceCounter();
    mNumberOfGen2CollectionsCounter.CategoryName = ".NET CLR Memory";
    mNumberOfGen2CollectionsCounter.CounterName = "# Gen 2 Collections";
    mNumberOfGen2CollectionsCounter.InstanceName = "_Global_";

    Console.WriteLine("Starting test.  Press the enter key to stop.");

    bool isRunning = true;
    Semaphore semaphore = new Semaphore(0, 1);
    
    ThreadPool.QueueUserWorkItem((WaitCallback)((o) =>
    {
        startTime = DateTime.Now;
        bool alternate = false;

        while(isRunning)
        {
            // Log code copied from elsewhere:
            string message = "0|:1:";
            message += DateTime.Now.Ticks.ToString();
            message += ":Module:SomeMessage";

            string delimiter = "|";
            string[] parts = message.Split(delimiter.ToCharArray(), 2);

            uint systemLevel = Convert.ToUInt32(parts[0]);

            delimiter = ":";
            parts = parts[1].Split(delimiter.ToCharArray(), 5);
            uint messageLevel = Convert.ToUInt32(parts[1]);
            long fileTime = Convert.ToInt64(parts[2]);
            DateTime timeStamp = DateTime.FromFileTime(fileTime);
            string moduleName = parts[3];
            string messageData = parts[4];

            string timeString = timeStamp.ToString();

            string completeMessage = String.Format(
                "{0}{1}|{2} - {3} - {4}",
                messageLevel,
                systemLevel,
                timeString,
                moduleName,
                messageData);

            string trimmedString = completeMessage.Trim();
            if(alternate)
                mMessages.Add(trimmedString);

            alternate = !alternate;

        }

        semaphore.Release();

    }));

    System.Console.ReadLine();
    isRunning = false;
    semaphore.WaitOne();
    TimeSpan appTime = DateTime.Now - startTime;

    float percentTimeInGC = mTimeInGCCounter.NextValue();
    int gen0CollectionCount = (int)mNumberOfGen0CollectionsCounter.NextValue();
    int gen1CollectionCount = (int)mNumberOfGen1CollectionsCounter.NextValue();
    int gen2CollectionCount = (int)mNumberOfGen2CollectionsCounter.NextValue();

    double msInGC = appTime.TotalMilliseconds * percentTimeInGC;

    StringBuilder messageBuilder = new StringBuilder();
    messageBuilder.Append("Total time of test: ");
    messageBuilder.Append(appTime.TotalMilliseconds);
    messageBuilder.Append("ms");
    messageBuilder.Append(Environment.NewLine);

    messageBuilder.Append("Total time in garbage collection: ");
    messageBuilder.Append(Convert.ToString(msInGC));
    messageBuilder.Append("ms (");
    messageBuilder.Append(percentTimeInGC * 100);
    messageBuilder.Append("%)");
    messageBuilder.Append(Environment.NewLine);

    messageBuilder.Append("Number of gen 0 collections: ");
    messageBuilder.Append(Convert.ToString(gen0CollectionCount));
    messageBuilder.Append(Environment.NewLine);

    messageBuilder.Append("Number of gen 1 collections: ");
    messageBuilder.Append(Convert.ToString(gen1CollectionCount));
    messageBuilder.Append(Environment.NewLine);

    messageBuilder.Append("Number of gen 2 collections: ");
    messageBuilder.Append(Convert.ToString(gen2CollectionCount));
    messageBuilder.Append(Environment.NewLine);

    double msPerGC = msInGC;
    msPerGC /= (gen0CollectionCount + gen1CollectionCount + gen2CollectionCount);

    messageBuilder.Append("Time per collection: ");
    messageBuilder.Append(Convert.ToString(msPerGC));
    messageBuilder.Append("ms");

    Console.WriteLine(messageBuilder.ToString());

}

I ran this code for around 30 seconds, and here are the truly amazing results:

GCStats6 

That simple difference of holding onto half of the allocated strings increases the percentage of larger collections dramatically. In this test the collector ran 2706 times, of which only 76% were generation 0 collections and nearly 22.7% were generation 1 collections. Most importantly, whereas the first program spent roughly 8% of its time collecting garbage, this version of the application spent more than 80% of its time cleaning up the heap. And the time per collection shot up 800%, from roughly 1ms to more than 8ms.

Data Analysis

The data show, pretty conclusively, that where you run into performance problems in the CLR is when bytes need to be relocated on the heap. The first two examples show that a larger memory footprint alone does not impact performance very much. The contrast between the first two tests and the third, however, is striking. If you want to kill performance, there really is no better way than to have long-lived objects - like a linked list, or a list box - slowly aggregate new objects over time. The reason is that the collector needs to copy memory around on the heap and compact it repeatedly, causing new objects - in this case, strings - to get promoted to generation 1 and, eventually, generation 2. Each time the collector runs, these objects need to be relocated - and the more objects you hang onto, the longer that relocation takes.
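This promotion path is easy to watch with GC.GetGeneration. A minimal sketch (the exact behavior can vary between the workstation and server collectors, so treat this as illustrative):

```csharp
using System;
using System.Collections.Generic;

public static class PromotionDemo
{
    public static int GenerationAfterTwoCollections()
    {
        // A long-lived container, standing in for the logger's ListBox.
        List<string> retained = new List<string>();
        retained.Add(new string('x', 100));   // starts life in generation 0

        // Each collection the string survives typically promotes it one
        // generation, until it settles in generation 2.
        GC.Collect();
        GC.Collect();

        return GC.GetGeneration(retained[0]);
    }

    public static void Main()
    {
        Console.WriteLine("Generation after two collections: "
            + GenerationAfterTwoCollections());
    }
}
```

On a typical workstation GC this reports 2 - and from that point on, the only collection that can ever reclaim the string is an expensive full generation 2 pass.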

So, my conclusion is that the logger is, in fact, very inefficient. Obviously it isn't receiving log messages at the rate this test application produces them, but since the ListBox is a long-lived object aggregating many short-lived objects, those short-lived objects slowly have to move up the generational ladder each time the collector runs until, eventually, they reach generation 2. Sadly, there is another performance problem with the logger: to prevent its size from growing too large, it starts discarding the oldest strings once it reaches a certain size. By the time it starts discarding these strings, they have likely already been promoted to generation 2 - which implies that the cost of cleaning them up is extremely high, because the entire heap has to be scanned before those strings can be released.

Since our application is extremely performance sensitive, we have decided to explore other solutions.
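One direction along the lines the data suggest - purely a sketch of the idea, not our actual replacement logger - is to recycle message buffers instead of allocating fresh strings: a fixed ring of StringBuilders that are overwritten in place, so that old messages are reused rather than promoted to generation 2 and later collected out of it. (All names here are mine, for illustration.)

```csharp
using System.Text;

// A sketch of a recycling log buffer: a fixed ring of StringBuilders
// whose internal char buffers are cleared and refilled in place, so the
// oldest message is overwritten rather than discarded to the collector.
public class RingLog
{
    private readonly StringBuilder[] mSlots;
    private int mNext;

    public RingLog(int capacity)
    {
        mSlots = new StringBuilder[capacity];
        for (int i = 0; i < capacity; i++)
            mSlots[i] = new StringBuilder(256);
    }

    public void Log(string message)
    {
        StringBuilder slot = mSlots[mNext];
        slot.Length = 0;            // reuse the slot's existing buffer
        slot.Append(message);
        mNext = (mNext + 1) % mSlots.Length;
    }

    // After the ring wraps, mNext points at the oldest surviving message.
    public string Oldest
    {
        get { return mSlots[mNext].ToString(); }
    }
}
```

With a capacity of 2, logging "a", "b", "c" overwrites "a" in place and leaves "b" as the oldest survivor. The incoming message strings are still allocations, of course - this only addresses the retention half of the problem.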

Thursday, May 8, 2008

Fun With Pointers in C#

We're starting a new project at my office to build a new X-ray machine. One of the requirements for the system is that it needs to render frames off the machine at 60 frames per second, which gives us very little time to do any processing on the data. I've spent the last week or so investigating various strategies for controlling a .NET application's memory footprint, to see whether or not we can use the framework. It's been an interesting research project, and it's produced a number of ideas and answers that will help us figure out what to do next.

I'll post some thoughts on garbage collection and efficient memory use in the .NET world at some point, but for now I want to focus on an interesting compiler issue that I just ran into.

The .NET framework exposes a struct called IntPtr, which is supposed to encapsulate a platform-specific pointer. An IntPtr can be created out of the memory address of a value type; that is, IntPtr allows you to do things like this:

public unsafe void Test()
{
    int i = 10;

    IntPtr pointer = new IntPtr(&i);
    int* pI = (int*)pointer.ToPointer();

    *pI = 20;
}

I am not entirely sure what the benefit is of passing around IntPtrs instead of int*s. From what I can tell, IntPtrs give you better interoperability with native APIs because they can point to anything. There are a number of intriguing methods in System.Runtime.InteropServices.Marshal that let you allocate space on the native heap using IntPtrs. For instance, you can allocate n bytes on the native heap using the AllocHGlobal method, which returns an IntPtr to the newly allocated memory. One important thing to note about pointers and the .NET framework is that, as far as I have been able to determine, there is simply no way to place a managed class anywhere other than the managed heap. This means that a managed type cannot be referenced by a pointer, since a managed type can be relocated. This is an unfortunate restriction. Even in the C++/CLI world, there is no way to place a managed type on the native heap and reference it by a pointer. I'm having difficulty understanding what the purpose of that language really is. Other than providing a migration path for legacy systems and making interoperability between native and managed code easier, I cannot see any real benefit to writing managed code in C++/CLI. But I'm probably missing something.
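Here is AllocHGlobal in action - a small sketch. I'm using Marshal.WriteInt32/ReadInt32 so it compiles without /unsafe, but you could equally cast ToPointer() to an int* as in the snippet above:

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeAllocDemo
{
    public static int RoundTrip(int value)
    {
        // Ask for four bytes on the *native* heap. The collector neither
        // moves nor frees this memory; FreeHGlobal is our responsibility.
        IntPtr block = Marshal.AllocHGlobal(sizeof(int));
        try
        {
            Marshal.WriteInt32(block, value);   // write through the IntPtr
            return Marshal.ReadInt32(block);    // and read it back
        }
        finally
        {
            Marshal.FreeHGlobal(block);         // no collector will do this for us
        }
    }

    public static void Main()
    {
        Console.WriteLine(NativeAllocDemo.RoundTrip(42));
    }
}
```

Note the try/finally: forget the FreeHGlobal and you have a good old-fashioned native memory leak, no matter how hard the garbage collector works.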

Anyway, back to IntPtrs. IntPtrs are created out of raw addresses and must be cast to a pointer to whatever type they point to. Since there is no way to get the address of a managed class, the only pointers you can create and dereference in a C# application are pointers to value types. But that value type can live anywhere - on the call stack, the managed heap, or the unmanaged heap. IntPtrs can point to memory in any of these spaces, but you run into a pretty serious problem, as you can see from the code above, when you start casting void*s into specific types. This is bad design 101. Wouldn't it be nice if we could be type-safe with these pointers? Enter the following code... which does not compile:

/// <summary>
/// This is a type safe wrapper of an IntPtr.
/// </summary>
/// <typeparam name="T"></typeparam>
public unsafe struct SafePtr<T> where T : struct
{
    /// <summary>
    /// This is the pointer to a memory location in the program.
    /// </summary>
    private IntPtr mPointer;

    /// <summary>
    /// Creates a SafePtr out of the address of a specified location of T.
    /// </summary>
    /// <param name="pointer">An IntPtr that we are holding onto.</param>
    public SafePtr(IntPtr pointer)
    {
        mPointer = pointer;
    }

    /// <summary>
    /// Returns a pointer to a T.
    /// </summary>
    /// <returns>A pointer to an instance of T.</returns>
    public T* ToPointer()
    {
        return (T*)mPointer.ToPointer();
    }

}

The compiler spits out, 'Cannot take the address of, get the size of, or declare a pointer to a managed type ('T')', which would certainly make sense if T were a class. But the where clause on the declaration explicitly states that T is a struct, and you can take the address of, get the size of, and declare a pointer to a struct. This, it seems, should compile. (In fairness, the struct constraint does not guarantee that T contains no reference-type fields - and a pointer to a struct holding references would be unsafe for the collector to track - which may be exactly why the compiler refuses.) But it does not compile, which means that any solution involving generics and pointers is likely to run into this sort of error.

As a general rule, I've noticed that the C# compiler does not seem to take the where clause on generic arguments into account when it does this kind of semantic analysis. I'm sure that's not entirely true, but this seems like something that should properly compile.
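One partial workaround I've found, for what it's worth: a GCHandle can pin a managed object so the collector cannot relocate it, and AddrOfPinnedObject then hands you a stable IntPtr to it. A sketch (pinning has its own GC cost, since pinned objects fragment the heap):

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningDemo
{
    public static int WriteThroughPinnedAddress()
    {
        int[] data = { 1, 2, 3 };

        // Pin the array so the collector cannot move it, then take its
        // address. The handle must be freed, or the array stays pinned
        // (and reachable) forever.
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();   // address of data[0]
            Marshal.WriteInt32(address, 42);                // overwrites data[0]
        }
        finally
        {
            handle.Free();
        }

        return data[0];
    }

    public static void Main()
    {
        Console.WriteLine(PinningDemo.WriteThroughPinnedAddress());
    }
}
```

It's not the type-safe SafePtr<T> I wanted, but it does let you hold a stable address to managed data long enough to hand it to native code.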

So once again, any solution that involves using the native heap, or some mix of these two worlds, seems impractical. It would really be nice if there were a way to have more control over memory in the .NET world. As it stands, we have good strategies in mind for minimizing the number of heap allocations and the impact of garbage collections, but some way to explicitly allocate and free data would be nicer still.