Exhaustive Set of Listed Equities

Discussion in 'Data Sets and Feeds' started by SunnyIsles, Jun 27, 2013.

  1. gmst

    gmst

    Thanks for the algorithm. I have never used Perl. Is there any other way to do it? I can write code in VBA. Btw, would it be possible for you to post your Perl code? It's OK if not. Maybe that will give me an idea of how to do this in VBA. Thanks.

    Also, can you please explain this line? I couldn't understand it:
    "I use Perl with the Furl, HTML::TreeBuilder, and Parallel::ForkManager modules."
     
    #11     Jul 3, 2013
  2. Bob111

    Bob111

    +1. I have a similar list of procedures. And that's how, ladies and gentlemen, we are getting the data, which is supposed to be free and easy to find on exchanges.
     
    #12     Jul 3, 2013

  3. Perl is the core scripting language. Similar to Ruby or Python or Lua, etc.

    Other people have written add-ons that enhance/simplify certain tasks, such as retrieving data from the web, parsing web pages, manipulating databases, creating charts, calculating statistics or option Greeks, etc.

    They donate their work to the public by submitting it to CPAN - the Comprehensive Perl Archive Network - cpan.perl.org.

    Ruby, Python, and R have similar repositories.

    The modules I mentioned and use:

    Furl - a faster alternative to LWP, which is the long-standing de facto standard module for downloading web pages and other data from the web.

    HTML::TreeBuilder - part of HTML::Tree. A module to help parse HTML pages. It parses the page into a tree of elements based on the HTML tags, from which you can pull out plain text. There are similar modules that convert the HTML into XML or DOM trees for easier, more orderly parsing.
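    As a rough illustration of the parsing step, here is a minimal sketch using HTML::TreeBuilder on an in-memory page. The HTML fragment, the "symbol" class name, and the extract_symbols helper are made up for the example; a real page would be fetched with Furl or LWP and would need its own selectors.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical page fragment; not taken from any real exchange site.
my $html = <<'HTML';
<table>
  <tr><td class="symbol">AAA</td><td class="name">Example Corp A</td></tr>
  <tr><td class="symbol">BBB</td><td class="name">Example Corp B</td></tr>
</table>
HTML

# Hypothetical helper: return the text of every <td class="symbol"> cell.
sub extract_symbols {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);

    # look_down returns every element matching all the given criteria.
    my @symbols = map { $_->as_trimmed_text }
                  $tree->look_down(_tag => 'td', class => 'symbol');

    $tree->delete;    # release the tree once we are done with it
    return @symbols;
}

print join(',', extract_symbols($html)), "\n";
```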

    Parallel::ForkManager is a simple module to help parallelize discrete procedures. So, instead of retrieving, parsing, and inserting the data into the database for the 7,200 stocks and ETFs one at a time, I fire up 50 instances to run at the same time. That reduces the total time from hours to about 30 min over a cable modem line.
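    For readers new to Parallel::ForkManager, the fan-out pattern described above might be sketched roughly as follows. The symbol list and the process_symbol routine are placeholders standing in for the real retrieve/parse/insert work; only the limit of 50 concurrent workers comes from the post.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @symbols = qw(AAA BBB CCC DDD);    # placeholder tickers
my $workers = 50;                     # concurrent children, as in the post

# Hypothetical stand-in for "retrieve, parse, insert" for one symbol.
sub process_symbol {
    my ($symbol) = @_;
    return { symbol => $symbol, ok => 1 };
}

my %done;
my $pm = Parallel::ForkManager->new($workers);

# Runs in the parent each time a child exits; $data is whatever the
# child passed as the second argument to finish().
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    $done{$ident} = $data if defined $data;
});

foreach my $symbol (@symbols) {
    $pm->start($symbol) and next;     # parent: go on to the next symbol
    my $result = process_symbol($symbol);    # child does the work
    $pm->finish(0, $result);          # child exits and hands back its result
}
$pm->wait_all_children;

printf "%d symbols processed\n", scalar keys %done;
```

    If the children talk to a database, each one should open its own connection after the fork rather than share the parent's handle.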

    I use it again when I update prices each day.

    DBI is the top-level module for database interaction. DBI works in conjunction with a database driver module (in my case, DBD::Pg for PostgreSQL interaction).

    There are others, but those are the most relevant.

    I run FreeBSD. If you want to continue with Windows, you'll probably want to install Cygwin to run a Unix-like environment on your Windows box. Or, you can install VirtualBox, and install Linux or FreeBSD as a guest.

    For the uninitiated, I highly recommend FreeBSD instead of Linux. The basic analogy I use to compare the two is this:

    Linux is a car - but you receive it in parts. It's up to you to put it together, tune it, and maintain it. No matter what distro you use, you'll still work harder to get it to do what you want.

    FreeBSD is a car, but it's delivered ready to run and use. You just have to change the oil and filters, rotate the tires, etc., every once in a while. You'll still need to build/configure your window manager if you're particular like me and don't want to use GNOME or KDE.

    The code I've written is not ready for public consumption. I definitely need to run through and eliminate inconsistencies and inefficiencies.

    But the gist of the design:

    I've created my own package (Perl-speak for a library, like the above-mentioned modules) called Finance::DataMining.

    The stocks portion is in Finance::DataMining::Stocks
    The futures is in Finance::DataMining::Futures
    The options portion is in Finance::DataMining::Options
    The analysis routines are in Finance::DataMining::Quant

    In Perl, the double-colons are equivalent to directory slashes. So, in Unix, Finance::DataMining::Stocks is equivalent to:

    $lib_dir/Finance/DataMining/Stocks.pm
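    The name-to-path mapping is mechanical, so it can be shown with a tiny sketch in core Perl. The module_to_path helper and the library directory below are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a package name into the file Perl would load.
sub module_to_path {
    my ($module, $lib_dir) = @_;
    (my $relative = $module) =~ s{::}{/}g;    # Finance::X -> Finance/X
    return "$lib_dir/$relative.pm";
}

# '/usr/local/lib/perl5' is only an example directory.
print module_to_path('Finance::DataMining::Stocks', '/usr/local/lib/perl5'), "\n";
```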

    I've put all of the routines into the Stocks.pm file and all the config stuff - default variables, SQL calls, URLs - into a config file called $etc/Stocks.conf

    I've written everything so I can merely do this to get all of the symbols and all of their closing prices:

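    use strict;
    use warnings;
    use Finance::DataMining::Stocks;    # my own package, not on CPAN

    my $stocks  = Finance::DataMining::Stocks->new;
    my @symbols = $stocks->getSymbols();

    # Each element of @symbols is an array ref; the ticker is element 0
    foreach my $row (@symbols) {
        my %prices = $stocks->getStockPricesBySymbol($row->[0]);

        # %prices is keyed by symbol, then by date
        foreach my $symbol (sort keys %prices) {
            foreach my $date (sort keys %{ $prices{$symbol} }) {
                print "$symbol - $date - $prices{$symbol}{$date}{close}\n";
            }
        }
    }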

    Granted, this isn't a very fast or very practical example (it took 409 seconds when I timed it), since I'm pulling and printing data from a heavily indexed table with 20 million rows (I've downloaded all the price data from 1970 where applicable). But if you're not in any hurry, it works, especially for EOD data...

    If one needs/wants to avoid survivorship bias issues, then they can buy the delisted data from premiumdata.net. I doubt they'll have other useful data like shares outstanding, sector, industry, fundamentals, etc.

    This is just the data collection stuff. If I finally get around to building a more complete package - charting, backtesting, order generator, portfolio manager, maybe a web or app-based GUI - then I'll release it.

    But it would need to be more complete and user-friendly than GeniusTrader. Which it isn't ... yet.

    And I may entertain the idea of rewriting it in Python. We'll see. No promises...

     
    #13     Jul 3, 2013
  4. gmst

    gmst

    blah12345678

    That was a very thorough reply and a nice introduction. Highly appreciated!
     
    #14     Jul 3, 2013