Friday, April 23, 2010

Getting Data: EOD Historical Prices

One of the first issues that people encounter when trying to experiment with ML/AI algorithms in the trading field is getting (free) training data. It's pretty easy to get end of day (EOD) prices from google and yahoo- both providers allow you to download historical data in a csv format.

Let's take a look at both providers:

Google
  • Posts the EOD data earlier than Yahoo (2-3 hours before Yahoo posts the data).
  • Sample URL: http://www.google.com/finance/historical?q=NYSE:GOOG&output=csv
  • Has a limit on how many requests you can make per minute (don't know the exact figure, but if you request data too often google will block you for a period of time).
  • Good if you download the data manually and automatically (at low frequency).
Yahoo
  • Posts the EOD data a little later (around 7:00 p.m. CST).
  • Sample URL: http://ichart.finance.yahoo.com/table.csv?s=YHOO&d=3&e=23&f=2010&g=d&a=3&b=12&c=1996&ignore=.csv
  • Has no limit on how many requests you can make at all!
  • Good if you download the data both manually and automatically (no frequency limit).
The entire set of up-to-date EOD prices from Yahoo for all the stocks on all of the major exchanges is around 4GB. Google's database size is about the same.

I chose to use Yahoo, because it was important that there is no limit on the number of requests I can make... this allowed me to automate the process of downloading the data and populating my database.

Here is sample code on how to download the data:

void DownloadDataFromWeb(string symbol)
{
DateTime startDate = DateTime.Parse("1900-01-01");

string baseURL = "http://ichart.finance.yahoo.com/table.csv?";
string queryText = BuildHistoricalDataRequest(symbol, startDate, DateTime.Today);
string url = string.Format("{0}{1}", baseURL, queryText);

//Get page showing the table with the chosen indices
HttpWebRequest request = null;
HttpWebResponse response = null;
StreamReader stReader = null;

//csv content
string docText = string.Empty;
string csvLine = null;
try
{
request = (HttpWebRequest)WebRequest.CreateDefault(new Uri(url));
request.Timeout = 300000;

response = (HttpWebResponse)request.GetResponse();

stReader = new StreamReader(response.GetResponseStream(), true);

stReader.ReadLine();//skip the first (header row)
while ((csvLine = stReader.ReadLine()) != null)
{
string[] sa = csvLine.Split(new char[] { ',' });

DateTime date = DateTime.Parse(sa[0].Trim('"'));
Double open = double.Parse(sa[1]);
Double high = double.Parse(sa[2]);
Double low = double.Parse(sa[3]);
Double close = double.Parse(sa[4]);
Double volume = double.Parse(sa[5]);
Double adjClose = double.Parse(sa[6]);
// Process the data (e.g. insert into DB)
}
}
catch (Exception e)
{
throw e;
}
}

string BuildHistoricalDataRequest(string symbol, DateTime startDate, DateTime endDate)
{
// We're subtracting 1 from the month because yahoo
// counts the months from 0 to 11 not from 1 to 12.
StringBuilder request = new StringBuilder();
request.AppendFormat("s={0}", symbol);
request.AppendFormat("&a={0}", startDate.Month-1);
request.AppendFormat("&b={0}", startDate.Day);
request.AppendFormat("&c={0}", startDate.Year);
request.AppendFormat("&d={0}", endDate.Month-1);
request.AppendFormat("&e={0}", endDate.Day);
request.AppendFormat("&f={0}", endDate.Year);
request.AppendFormat("&g={0}", "d"); //daily

return request.ToString();
}

3 comments: