news analytics

Discussion in 'Automated Trading' started by ssrrkk, Oct 6, 2011.

  1. ssrrkk

    Does anyone out there know of a free or open-source news analytics software package or library? Or does anyone have experience with commercial packages such as this one?

    http://thomsonreuters.com/products_services/financial/financial_products/a-z/news_analytics/

    I am investigating ways to incorporate "market sentiment" based on the news at the time. I am guessing that the Thomson Reuters service is very expensive, and out of reach of retail traders. Any ideas would be appreciated.
     
  2. byteme

  3. ssrrkk

    Thanks, this looks really good; they even have a sentiment detection module. I am debating whether to use their training scheme, or just hand-code a dictionary of specific nouns and verbs and count occurrences. When you use trained statistical models, you never know what the model is latching on to (it could be latching on to noise). On the other hand, when you hand-code the nouns and verbs, there is no guarantee that you have covered enough of the linguistic space to detect sentiment accurately...
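For what it's worth, the hand-coded dictionary approach can be sketched in a few lines. This is only a minimal sketch: the word lists `POSITIVE` and `NEGATIVE` and the function name `lexicon_score` are made up for illustration, and a usable lexicon would need far broader coverage.

```python
import re
from collections import Counter

# Hypothetical word lists, for illustration only.
POSITIVE = {"beat", "beats", "gain", "gains", "rally", "upgrade", "surge"}
NEGATIVE = {"miss", "misses", "loss", "losses", "selloff", "downgrade", "plunge"}


def lexicon_score(text):
    """Return (positive hits - negative hits) / total hits, in [-1, 1].

    Returns 0.0 when no dictionary word occurs in the text.
    """
    # Lowercase and split into alphabetic tokens, then count each token.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    pos = sum(words[w] for w in POSITIVE)
    neg = sum(words[w] for w in NEGATIVE)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```

The upside is that you can see exactly which words drive the score; the downside is the coverage problem mentioned above, since any word missing from the lists is silently ignored.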
     
  4. An SMA(5) is the best sentiment indicator I know of.
     
  5. ssrrkk

    Yes, I agree that SMAs and EMAs contain sentiment information, and therefore information about market direction and momentum. However, they have certain problems. First, the shorter time frame SMAs cannot "see" too far forward (e.g., the 1-4 hour timeframe). They are easily affected by spurious noise spikes that may not be an indication of the longer-term momentum. The longer time scale SMAs do a better job, but they are horribly delayed and almost useless. But the biggest problem with SMAs is that they don't tell you the approximate target levels the SPX is headed toward. They tell you the direction but not the approximate destination. It is like navigating with a compass but no map. I am not saying the news can reconstruct a good map; it would be more like a map of the world from the 1500s (i.e., horribly inaccurate, but maybe marginally useful, and of course, risk and money management take care of the horrendous errors that may result from trusting the map too much).
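To make the noise-versus-delay trade-off concrete, here is a minimal sketch of both averages in plain Python. The function names are my own, and the EMA uses the common smoothing factor alpha = 2 / (n + 1), which is an assumption since nobody in the thread specified one.

```python
def sma(values, n):
    """Simple moving average; output j is the mean of values[j .. j+n-1]."""
    return [sum(values[i - n + 1:i + 1]) / n for i in range(n - 1, len(values))]


def ema(values, n):
    """Exponential moving average, seeded with the first value.

    Uses alpha = 2 / (n + 1), a common convention (assumed here).
    """
    alpha = 2 / (n + 1)
    out = [values[0]]
    for v in values[1:]:
        # Blend the new value with the previous average.
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out
```

On a clean step from one level to another, SMA(n) only reaches the new level once all n bars in its window sit at that level, so SMA(20) takes four times as long as SMA(5) to catch up. Conversely, a one-bar spike of size d moves SMA(n) by d / n, so the short average is four times more sensitive to the same spike.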
     
  6. gtor514

  7. ssrrkk

    This is very impressive. I will have to study this a little to figure out if I can use this quickly in my code. It looks like there is a command line executable version which presumably you can use to run pre-designed processes. That might be the way to use this. I could perhaps train a statistical model to do a categorical sentiment prediction, then save the process in the repository, and then periodically invoke that process in real-time through the command line from my code to analyze the latest news. Thanks!!
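The "invoke a saved process from my code" idea could look roughly like the sketch below. Everything about the external tool is an assumption here: I have not checked the real command-line arguments, or whether it reads the text on stdin at all, so `STAND_IN_COMMAND` is just a placeholder Python one-liner that always prints "neutral". The real invocation would have to come from the tool's documentation.

```python
import subprocess
import sys

# Placeholder command (hypothetical): stands in for the real executable and
# its arguments. It reads stdin and always prints "neutral".
STAND_IN_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdin.read(); print('neutral')",
]


def run_sentiment_process(command, news_text, timeout=60):
    """Pipe news_text to an external command and return its stdout, stripped.

    Raises subprocess.CalledProcessError on a non-zero exit code and
    subprocess.TimeoutExpired if the command runs longer than `timeout` seconds.
    """
    result = subprocess.run(
        command,
        input=news_text,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()
```

Calling `run_sentiment_process(STAND_IN_COMMAND, latest_headline)` from a timer or polling loop would give the periodic behavior. One thing to watch is the process start-up time on every call, which may matter if the news needs to be scored in near real time.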
     
  8. ssrrkk

    Okay, I have been studying RapidMiner for a little bit, and I am now a little skeptical that it will do what I want. It seems that it does not do a full grammatical analysis of sentence structure, but rather a very sophisticated word count. Without the correct grammatical interpretation, I am not sure one can accurately assess sentiment from a document -- for example, the verb "drop" or "decrease" could mean good or bad things depending on the subject, and on whether it is negated. I believe LingPipe or even NLTK will do a better job than this. With those packages, I can quickly categorize words into subjects and verbs, and they will also detect negations, so I can properly bin them before I run a learning algorithm. Does anyone have any experience with these kinds of things?
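To illustrate the subject and negation point, here is a toy sketch in plain Python. It does not use LingPipe or NLTK and does no real parsing: the `POLARITY` table, the `NEGATORS` list, and the prefix match on verbs are all crude stand-ins (my assumptions) for what a proper tagger or parser would provide.

```python
import re

# Hypothetical polarity table: the same verb flips meaning with its subject.
POLARITY = {
    ("unemployment", "drop"): +1,
    ("earnings", "drop"): -1,
    ("costs", "decrease"): +1,
    ("sales", "decrease"): -1,
}

# Hypothetical, incomplete list of negation words.
NEGATORS = {"not", "no", "never", "didn't", "doesn't", "won't"}


def score_sentence(sentence):
    """Score one sentence using (subject, verb) pairs, flipped if negated."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    # Crude negation check: any negator anywhere in the sentence.
    negated = any(t in NEGATORS for t in tokens)
    score = 0
    for (subject, verb), polarity in POLARITY.items():
        # Prefix match so "dropped" and "drops" both count as "drop".
        if subject in tokens and any(t.startswith(verb) for t in tokens):
            score += -polarity if negated else polarity
    return score
```

With this table, "Unemployment dropped sharply" scores +1 while "Earnings dropped sharply" scores -1, and "Earnings did not drop" flips back to +1. A plain word count would treat all three the same, which is the limitation described above.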