Sunday, August 1, 2010

Genetic Program Generated Strategy

I've been experimenting with Genetic Programming for some time now and we're starting to see some interesting results. I recently tried some USD/EUR currency data with 350 days training and 150 days validation, one of the resulting strategies was:

COS(ADD(Low, Close))

In simple words, add the low and the close then take the cosine of the result. A value greater than 0 is a Buy signal, a value less than 0 is a Sell signal, and a value equal to 0 is a Hold signal.

Over the 150 validation days this strategy made 13.7% and projected over a year it yielded around 25% profit. In the last year the USD/EUR has gone up about 9.2% and using that simple strategy is projected to produce 2.5 times more gains than the buy and hold.

Saturday, June 26, 2010

Function Memoization Performance

For the last month I've been very busy with wrapping up a couple of projects for my Master's Degree, but now I'm done and I have some to start posting again. In my last post I demonstrated how to memoize a function and in this post I will discuss the performance of memoization.

Memoization offers significant performance benefits when the parameters of the function are relatively constant, the memoized function can retrieve the return value almost as fast as returning a constant. When the input values are diverse then the performance can be dramatically decreased.

There were two tests conducted to determine the performance of memoization. In the first test a function was memoized and was called multiple times without changing the input parameters. In the second test, the same function was memoized but was called with a randomly varying parameter (diverse). All the measurements are in CPU ticks to eliminate interference from any other processes running on the system at the given time. Below is the sample code for the tests where the input parameters were varied:

static Func<List<int>, int, int> twoParam = new Func<List<int>, int, int>(
(v, p) =>
{
return TwoParamFunc(v, p);
}).Memoize();

static int TwoParamFunc(List<int> vals, int period)
{
int sum = 0;
foreach (int val in vals)
{
sum += ((int)Math.Sqrt((double)val) * period);
}
return sum;
}

static void PerformanceTest()
{
Random rng = new Random(DateTime.Now.Millisecond);
int numRuns = 1000;
Stopwatch sw = new Stopwatch();
List<List<int>> valArrs = new List<List<int>>();
for (int i = 1; i < 10; ++i)
{
valArrs.Add(CreateList(10 * i, 1000 * i));
}

// two param memoized
double sumTicks = 0.0;
long count = 0;
for (int i = 0; i < numRuns; ++i)
{
foreach (List<int> vals in valArrs)
{
sw.Start();
twoParam(vals, rng.Next(1, numRuns) );
sw.Stop();
sumTicks += sw.ElapsedTicks;
sw.Reset();
count++;
}
}
Console.WriteLine("Avg ticks memoized:\t" + sumTicks / count);

// two param non-memoized
sumTicks = 0;
count = 0;
for (int i = 0; i < numRuns; ++i)
{
foreach (List<int> vals in valArrs)
{
sw.Start();
TwoParamFunc(vals, rng.Next(1, numRuns));
sw.Stop();
sumTicks += sw.ElapsedTicks;
sw.Reset();
count++;
}
}
Console.WriteLine("Avg ticks non-memoized:\t" + sumTicks / count);
}
A memoized function with uniform input parameters takes an average of 2.29 CPU ticks to execute, while a memoized function with a varying input can be over 40 times slower. If the input is diverse enough, then the memoized function will exceed the execution time of the direct call.



The benefit of the direct call is that the amount of CPU ticks it takes to execute will never change unless the function itself changes, but the drawback is that no matter what happens the performance cost will always be incurred. With memoization the performance cost can be significantly reduced, but it requires the user to analyze the input parameters and ensure that they’re uniform.

One more thing to note is that every time a memoized function is called with new input values, those values and the output will be cached in the memoization map. The map will grow indefinitely and the memory consumption is going to grow indefinitely... in order to reduce the memory consumption the map size must be limited. I came up with a simple solution where I keep track of the how frequently each value in the map is accessed and I remove those values which are used least frequently.


private static void RemoveLeastUsed<A,R>(Dictionary<A, Pair<R, int>> map)
{
List<A> orderedKeys = (from key in map.Keys
orderby map[key].AccessCount ascending
select key).Distinct().ToList<A>();

// Ignore the most frequently used keys
orderedKeys.RemoveRange(orderedKeys.Count/2, orderedKeys.Count/2);

// Remove teh least frequently used keys from the map
foreach (A key in orderedKeys)
{
map.Remove(key);
}

// Reset the access count
foreach(Pair<R, int> p in map.Values)
{
p.AccessCount = 0;
}
}

public static Func<A, R> Memoize<A, R>(this Func<A, R> function)
{
var map = new Dictionary<A, Pair<R, int>>();
return a =>
{
R r;
Pair<R, int> pair;
lock (sync)
{
if (!map.TryGetValue(a, out pair))
{
r = function(a);
map.Add(a, new Pair<R, int>(r, 0));
}
else
{
r = pair.Result;
pair.AccessCount++;
}

if (map.Count > Properties.Settings.Default.MemoizeMapSize)
{
RemoveLeastUsed<A,R>(map);
}
}
return r;
};
}

Tuesday, May 18, 2010

Function Memoization and Genetic Programming

In a GP nearly all of the computation is in the evaluation of the expression trees and any performance bumps in the area are always welcome. We can do a lot of optimization in the code, but we can only optimize to a certain point and after that we just have to take the performance hit for the function execution. Function memoization (not memorization) can be very helpful in this case. Wikipedia defines memoization as:
"an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously-processed inputs."

Genetic Programming is a particularly good domain for memoization, because you generally have a lot of specimens executing the same functions by providing similar or identical input parameters and the expected output only varies within the range of those parameters.

We've all seen the memoization examples with the Fibonacci sequence, but our world consists of more than just one parameter functions: we have two, three, four and more parameter functions, so we need a solution that provides memoization for a function with arbitrary number of parameters. Perhaps arbitrary number of parameters is an overstatement, but we should be able to at least get the same number of parameters as the C# Func delegate (up to four parameters). Of course if you want to memoize functions with more parameters, then you have to make your own Func-style delegate that supports more parameters.

First we'll start with the standard one parameter and one return value memoization function and we'll build on top of that:

public static Func<A, R> Memoize<A, R>(this Func<A, R> function)
{
var map = new Dictionary<A, R>();
return a =>
{
R r;
if (!map.TryGetValue(a, out r))
{
r = function(a);
map.Add(a, r);
}
return r;
};
}
As you can see there is nothing new there, but in order for two, three and four parameter functions to be memoizated we'll have to use anonymous types, lambdas and some tricky yet very innocent-looking programming :
private static Func<T, R> CastByExample<T, R>(Func<T, R> f, T example)
{
return f;
}

// Two parameter memoize function
public static Func<A, B, R> Memoize<A, B, R>(this Func<A, B, R> function)
{
var example = new { A = default(A), B = default(B) };
var tuplified = CastByExample(t => function(t.A, t.B), example);
var memoized = tuplified.Memoize();
return (a, b) => memoized(new { A = a, B = b });
}
When I first saw that I though it was a bunch of nonsense, but there is a method to the madness!

You can read up on the CastByExample function here and tuples + anonymous types here, but I'll tell you what it specifically achieves for us: it creates a tuple out of the input parameters, so since all the parameters are now a tuple we can pass the tuple into the one-param memoize function and the entire tuple will be treated as the key in the dictionary (i.e. all the parameters are automatically a key). In .NET 4.0 there is actually a Tuple class, but since I'm using .NET 3.5 I had to make do with anonymous types.

Here are the three and four parameter memoize functions:
// Three parameter memoize function
public static Func<A, B, C, R> Memoize<A, B, C, R>(this Func<A, B, C, R> function)
{
var example = new { A = default(A), B = default(B), C = default(C) };
var tuplified = CastByExample(t => function(t.A, t.B, t.C), example);
var memoized = tuplified.Memoize();
return (a, b, c) => memoized(new { A = a, B = b, C = c });
}

// Four parameter memoize function
public static Func<A, B, C, D, R> Memoize<A, B, C, D, R>(this Func<A, B, C, D, R> function)
{
var example = new { A = default(A), B = default(B), C = default(C), D = default(D) };
var tuplified = CastByExample(t => function(t.A, t.B, t.C, t.D), example);
var memoized = tuplified.Memoize();
return (a, b, c, d) => memoized(new { A = a, B = b, C = c, D = d });
}
That exhausts all the overloads of Func, so I'll show you a quick preview of how you can use these functions:

static Func<List<int>, int, int> twoParam = new Func<List<int>, int, int>(
(v, p) => { return TwoParamFunc(v, p); });

static int TwoParamFunc(List<int> vals, int period)
{
int sum = 0;
foreach (int val in vals)
{
sum += ((int)Math.Sqrt((double)val)*period);
}
return sum;
}

static List<int> CreateList(int start, int end)
{
List<int> vals = new List<int>();
for (int i = start; i < end; ++i)
{
vals.Add(i);
}
return vals;
}

static void Main(string[] args)
{
if (TwoParamFunc(CreateList(0, 1000), 10) == twoParam(CreateList(0, 1000), 10))
{
Console.WriteLine("The return values are correct!");
}
Console.ReadKey();
}
In the next blog I'll show you some of the tests and the performance analysis... I hope you enjoyed this.

References: