Tuesday, May 18, 2010

Function Memoization and Genetic Programming

In a GP nearly all of the computation is in the evaluation of the expression trees and any performance bumps in the area are always welcome. We can do a lot of optimization in the code, but we can only optimize to a certain point and after that we just have to take the performance hit for the function execution. Function memoization (not memorization) can be very helpful in this case. Wikipedia defines memoization as:
"an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously-processed inputs."

Genetic Programming is a particularly good domain for memoization, because you generally have a lot of specimens executing the same functions by providing similar or identical input parameters and the expected output only varies within the range of those parameters.

We've all seen the memoization examples with the Fibonacci sequence, but our world consists of more than just one parameter functions: we have two, three, four and more parameter functions, so we need a solution that provides memoization for a function with arbitrary number of parameters. Perhaps arbitrary number of parameters is an overstatement, but we should be able to at least get the same number of parameters as the C# Func delegate (up to four parameters). Of course if you want to memoize functions with more parameters, then you have to make your own Func-style delegate that supports more parameters.

First we'll start with the standard one parameter and one return value memoization function and we'll build on top of that:

public static Func<A, R> Memoize<A, R>(this Func<A, R> function)
{
var map = new Dictionary<A, R>();
return a =>
{
R r;
if (!map.TryGetValue(a, out r))
{
r = function(a);
map.Add(a, r);
}
return r;
};
}
As you can see there is nothing new there, but in order for two, three and four parameter functions to be memoizated we'll have to use anonymous types, lambdas and some tricky yet very innocent-looking programming :
private static Func<T, R> CastByExample<T, R>(Func<T, R> f, T example)
{
return f;
}

// Two parameter memoize function
public static Func<A, B, R> Memoize<A, B, R>(this Func<A, B, R> function)
{
var example = new { A = default(A), B = default(B) };
var tuplified = CastByExample(t => function(t.A, t.B), example);
var memoized = tuplified.Memoize();
return (a, b) => memoized(new { A = a, B = b });
}
When I first saw that I though it was a bunch of nonsense, but there is a method to the madness!

You can read up on the CastByExample function here and tuples + anonymous types here, but I'll tell you what it specifically achieves for us: it creates a tuple out of the input parameters, so since all the parameters are now a tuple we can pass the tuple into the one-param memoize function and the entire tuple will be treated as the key in the dictionary (i.e. all the parameters are automatically a key). In .NET 4.0 there is actually a Tuple class, but since I'm using .NET 3.5 I had to make do with anonymous types.

Here are the three and four parameter memoize functions:
// Three parameter memoize function
public static Func<A, B, C, R> Memoize<A, B, C, R>(this Func<A, B, C, R> function)
{
var example = new { A = default(A), B = default(B), C = default(C) };
var tuplified = CastByExample(t => function(t.A, t.B, t.C), example);
var memoized = tuplified.Memoize();
return (a, b, c) => memoized(new { A = a, B = b, C = c });
}

// Four parameter memoize function
public static Func<A, B, C, D, R> Memoize<A, B, C, D, R>(this Func<A, B, C, D, R> function)
{
var example = new { A = default(A), B = default(B), C = default(C), D = default(D) };
var tuplified = CastByExample(t => function(t.A, t.B, t.C, t.D), example);
var memoized = tuplified.Memoize();
return (a, b, c, d) => memoized(new { A = a, B = b, C = c, D = d });
}
That exhausts all the overloads of Func, so I'll show you a quick preview of how you can use these functions:

static Func<List<int>, int, int> twoParam = new Func<List<int>, int, int>(
(v, p) => { return TwoParamFunc(v, p); });

static int TwoParamFunc(List<int> vals, int period)
{
int sum = 0;
foreach (int val in vals)
{
sum += ((int)Math.Sqrt((double)val)*period);
}
return sum;
}

static List<int> CreateList(int start, int end)
{
List<int> vals = new List<int>();
for (int i = start; i < end; ++i)
{
vals.Add(i);
}
return vals;
}

static void Main(string[] args)
{
if (TwoParamFunc(CreateList(0, 1000), 10) == twoParam(CreateList(0, 1000), 10))
{
Console.WriteLine("The return values are correct!");
}
Console.ReadKey();
}
In the next blog I'll show you some of the tests and the performance analysis... I hope you enjoyed this.

References:

No comments:

Post a Comment