Saturday, June 26, 2010

Function Memoization Performance

For the last month I've been very busy with wrapping up a couple of projects for my Master's Degree, but now I'm done and I have some to start posting again. In my last post I demonstrated how to memoize a function and in this post I will discuss the performance of memoization.

Memoization offers significant performance benefits when the parameters of the function are relatively constant, the memoized function can retrieve the return value almost as fast as returning a constant. When the input values are diverse then the performance can be dramatically decreased.

There were two tests conducted to determine the performance of memoization. In the first test a function was memoized and was called multiple times without changing the input parameters. In the second test, the same function was memoized but was called with a randomly varying parameter (diverse). All the measurements are in CPU ticks to eliminate interference from any other processes running on the system at the given time. Below is the sample code for the tests where the input parameters were varied:

static Func<List<int>, int, int> twoParam = new Func<List<int>, int, int>(
(v, p) =>
{
return TwoParamFunc(v, p);
}).Memoize();

static int TwoParamFunc(List<int> vals, int period)
{
int sum = 0;
foreach (int val in vals)
{
sum += ((int)Math.Sqrt((double)val) * period);
}
return sum;
}

static void PerformanceTest()
{
Random rng = new Random(DateTime.Now.Millisecond);
int numRuns = 1000;
Stopwatch sw = new Stopwatch();
List<List<int>> valArrs = new List<List<int>>();
for (int i = 1; i < 10; ++i)
{
valArrs.Add(CreateList(10 * i, 1000 * i));
}

// two param memoized
double sumTicks = 0.0;
long count = 0;
for (int i = 0; i < numRuns; ++i)
{
foreach (List<int> vals in valArrs)
{
sw.Start();
twoParam(vals, rng.Next(1, numRuns) );
sw.Stop();
sumTicks += sw.ElapsedTicks;
sw.Reset();
count++;
}
}
Console.WriteLine("Avg ticks memoized:\t" + sumTicks / count);

// two param non-memoized
sumTicks = 0;
count = 0;
for (int i = 0; i < numRuns; ++i)
{
foreach (List<int> vals in valArrs)
{
sw.Start();
TwoParamFunc(vals, rng.Next(1, numRuns));
sw.Stop();
sumTicks += sw.ElapsedTicks;
sw.Reset();
count++;
}
}
Console.WriteLine("Avg ticks non-memoized:\t" + sumTicks / count);
}
A memoized function with uniform input parameters takes an average of 2.29 CPU ticks to execute, while a memoized function with a varying input can be over 40 times slower. If the input is diverse enough, then the memoized function will exceed the execution time of the direct call.



The benefit of the direct call is that the amount of CPU ticks it takes to execute will never change unless the function itself changes, but the drawback is that no matter what happens the performance cost will always be incurred. With memoization the performance cost can be significantly reduced, but it requires the user to analyze the input parameters and ensure that they’re uniform.

One more thing to note is that every time a memoized function is called with new input values, those values and the output will be cached in the memoization map. The map will grow indefinitely and the memory consumption is going to grow indefinitely... in order to reduce the memory consumption the map size must be limited. I came up with a simple solution where I keep track of the how frequently each value in the map is accessed and I remove those values which are used least frequently.


private static void RemoveLeastUsed<A,R>(Dictionary<A, Pair<R, int>> map)
{
List<A> orderedKeys = (from key in map.Keys
orderby map[key].AccessCount ascending
select key).Distinct().ToList<A>();

// Ignore the most frequently used keys
orderedKeys.RemoveRange(orderedKeys.Count/2, orderedKeys.Count/2);

// Remove teh least frequently used keys from the map
foreach (A key in orderedKeys)
{
map.Remove(key);
}

// Reset the access count
foreach(Pair<R, int> p in map.Values)
{
p.AccessCount = 0;
}
}

public static Func<A, R> Memoize<A, R>(this Func<A, R> function)
{
var map = new Dictionary<A, Pair<R, int>>();
return a =>
{
R r;
Pair<R, int> pair;
lock (sync)
{
if (!map.TryGetValue(a, out pair))
{
r = function(a);
map.Add(a, new Pair<R, int>(r, 0));
}
else
{
r = pair.Result;
pair.AccessCount++;
}

if (map.Count > Properties.Settings.Default.MemoizeMapSize)
{
RemoveLeastUsed<A,R>(map);
}
}
return r;
};
}