Universal Symbol/Security/Instrument Id?

Discussion in 'Order Execution' started by chaostheory, Feb 14, 2009.

  1. Folks,

    I'm contributing to an open source project and working on handling tick data.

    We have a need for a universal instrument id. Does anything like that already exist?

    Specifically, the ordinary way of identifying securities with string data like "GOOG", "USD/JPY", or "ES01" is extremely inefficient when processing millions of ticks at high speed.

    We need the universal id to be binary, such as a 32-bit or 64-bit integer value.

    Does anything like that exist?

    In the next post I'll mention a couple ideas we have for designing one and seek your opinions. If it comes to that we will probably check into the process for submitting this to an international standards organization for review and approval as a standard.
  2. The first idea to implement this is to follow the pattern of Unicode by simply assigning numbers to symbols directly as they are discovered.

    Under that scheme, GOOG on NASDAQ could be 243, the "USD/JPY" feed from one broker could be 473, the "USD/JPY" feed from a different broker could be 687, etc.

    The problem is that many securities are traded on multiple exchanges or reported by numerous different data providers, so checking whether two ids refer to the same security would require a lookup, which is very inefficient.
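
    A minimal sketch of the sequential scheme, to make the drawback concrete. The class and method names are hypothetical, and the ids come out in discovery order rather than matching the numbers quoted above.

```python
class SymbolRegistry:
    """Hands out integer ids in discovery order, Unicode-style (hypothetical sketch)."""

    def __init__(self):
        self._ids = {}    # (provider, symbol) -> integer id
        self._keys = []   # integer id -> (provider, symbol)

    def get_id(self, provider, symbol):
        key = (provider, symbol)
        if key not in self._ids:
            self._ids[key] = len(self._keys)
            self._keys.append(key)
        return self._ids[key]

    def lookup(self, instrument_id):
        return self._keys[instrument_id]


registry = SymbolRegistry()
goog = registry.get_id("NASDAQ", "GOOG")
usdjpy_a = registry.get_id("BrokerA", "USD/JPY")
usdjpy_b = registry.get_id("BrokerB", "USD/JPY")

# The two USD/JPY ids share no structure, so telling that they are the
# same pair needs a table lookup instead of plain integer arithmetic.
same_pair = registry.lookup(usdjpy_a)[1] == registry.lookup(usdjpy_b)[1]
```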

    Instead it seems more valuable to partition the set of bits so that bits represent different aspects of the security, like which data feed or broker, which exchange, which expiration, etc.

    Specifically we propose the following and request feedback on whether anything is being overlooked to create a truly universal id.

    The first 16 bits represent the provider of the data, which can be an exchange directly, a broker, or a data provider. Each of these receives a specific numerical identifier. 16 bits allow for 65,536 of these in the whole world (32,767 if the field is restricted to positive signed values). Is that enough?

    Considering that some will close and others open, how long before the number of provider ids runs out?

    Next 16 bits represents the instrument itself which means GOOG, USDJPY, ES.

    That makes 32 bits so far.

    The remaining 32 bits must identify the type of instrument and the expiration information for options, futures, indexes, etc.

    But strike prices for options can be stored separately from the universal id.

    So 8 bits represents the expiration type.
    0 means non-expiring like stocks, currencies.
    1 means American stock option expiration.
    2 means CME futures expiration.
    etc. etc.

    That allows 256 possibilities. Will that cover all known and future types of expiration?

    8 bits represent the expiration month or period within a year.
    Only 6 of those bits (64 possibilities) are needed to go down to weekly periods, so 8 bits leave room to spare.

    16 bits represent the expiration year as a signed offset, with 0 meaning the year 2000. This allows years to stretch 32,767 years into the future and 32,768 years into the past (using a negative number).
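
    The layout above can be sketched as a pair of pack/unpack functions. This is only an illustration: the function names are hypothetical, and placing the provider in the most significant bits is an assumption, since the proposal only says "first 16 bits".

```python
YEAR_BASE = 2000  # a stored year of 0 means the year 2000


def pack_id(provider, instrument, exp_type=0, period=0, year=YEAR_BASE):
    """Pack the five fields into one 64-bit integer (assumed bit order)."""
    if not 0 <= provider < (1 << 16):
        raise ValueError("provider must fit in 16 bits")
    if not 0 <= instrument < (1 << 16):
        raise ValueError("instrument must fit in 16 bits")
    if not 0 <= exp_type < (1 << 8):
        raise ValueError("exp_type must fit in 8 bits")
    if not 0 <= period < (1 << 8):
        raise ValueError("period must fit in 8 bits")
    offset = year - YEAR_BASE
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("year offset must fit in 16 signed bits")
    return ((provider << 48) | (instrument << 32) | (exp_type << 24)
            | (period << 16) | (offset & 0xFFFF))


def unpack_id(uid):
    """Return (provider, instrument, exp_type, period, year)."""
    offset = uid & 0xFFFF
    if offset >= (1 << 15):      # undo the two's-complement wraparound
        offset -= (1 << 16)
    return ((uid >> 48) & 0xFFFF, (uid >> 32) & 0xFFFF,
            (uid >> 24) & 0xFF, (uid >> 16) & 0xFF, YEAR_BASE + offset)


uid = pack_id(provider=243, instrument=17, exp_type=2, period=3, year=2009)
fields = unpack_id(uid)
```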

    The advantage of this design is that you can use binary arithmetic, which is hundreds of times faster than string comparisons, to filter, segregate, and organize incoming tick data.
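
    As a rough illustration of filtering by bit mask, assuming the instrument field sits in bits 32 to 47 and the provider in bits 48 to 63 (an assumed layout; all names and sample prices here are hypothetical):

```python
PROVIDER_SHIFT = 48
INSTRUMENT_SHIFT = 32
INSTRUMENT_MASK = 0xFFFF << INSTRUMENT_SHIFT


def make_uid(provider, instrument):
    return (provider << PROVIDER_SHIFT) | (instrument << INSTRUMENT_SHIFT)


def same_instrument(uid_a, uid_b):
    """True when both ids carry the same instrument, whatever the provider."""
    return ((uid_a ^ uid_b) & INSTRUMENT_MASK) == 0


def filter_by_instrument(ticks, instrument):
    """Keep (uid, price) ticks for one instrument across all providers."""
    target = instrument << INSTRUMENT_SHIFT
    return [t for t in ticks if (t[0] & INSTRUMENT_MASK) == target]


goog_nasdaq = make_uid(1, 243)
goog_other = make_uid(2, 243)
usdjpy = make_uid(3, 473)
ticks = [(goog_nasdaq, 345.10), (usdjpy, 91.52), (goog_other, 345.12)]
goog_ticks = filter_by_instrument(ticks, 243)
```

    Python is used here only to show the logic. Any real speed advantage over string comparison would have to be measured in the language the tick engine is actually written in.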

    It makes for extremely fast data loads from binary files of tick data since no string to binary conversion is necessary.
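
    A sketch of what such a binary load could look like with fixed-width records. The record layout (64-bit id, 64-bit float price, 32-bit size) is an assumption made for illustration, not a format anyone has agreed on.

```python
import struct

# Assumed record: little-endian uint64 id, float64 price, uint32 size (20 bytes).
TICK = struct.Struct("<QdI")


def write_ticks(ticks):
    """Serialize (uid, price, size) tuples into one bytes object."""
    return b"".join(TICK.pack(*t) for t in ticks)


def read_ticks(buf):
    """Decode every record in buf; no string parsing is involved."""
    return list(TICK.iter_unpack(buf))


sample = [(0x00F3001100000000, 345.25, 100), (0x01D9002A00000000, 91.5, 25000)]
blob = write_ticks(sample)
loaded = read_ticks(blob)
```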

    What are your thoughts?
  3. Very interesting and pretty well thought out.

    But why do you need 8 bits for determining the type of expiration? Non-expiring instruments could just have their expirations nulled out, while expiring options could just have regular expiration fields set. Why do you need to categorize what type of object is expiring?
  4. First, to explain that: the expiration type was a way to interpret the expiration month or period, since period 4 may mean a different expiry date under a monthly expiration type than under a quarterly one.

    Also, futures usually have different expiration dates than options on futures even if they expire in the same month.

    It seems American and European options have different expiration types.

    God only knows what expiration schedules are followed in other nations.


    Would it be preferable to simply specify the Julian day (the day number within the year) on which the instrument expires?

    (Zeroed out, of course, for non-expiring instruments.)

    9 bits can cover 1 to 366 as the day of expiration during a year.

    And 15 bits can represent the year as a signed offset, which allows 16,383 years into the future and 16,384 years into the past from year 2000.
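
    A minimal sketch of this day-of-year encoding. Putting the 9-bit day above the 15-bit year offset is an assumption, as are the function names; day 0 stands for "does not expire".

```python
from datetime import date, timedelta

YEAR_BASE = 2000


def pack_expiry(expiry=None):
    """Encode an expiration date in 24 bits: 9-bit day of year, 15-bit signed year offset."""
    if expiry is None:
        return 0                              # zeroed out: non-expiring
    day = expiry.timetuple().tm_yday          # 1..366
    offset = expiry.year - YEAR_BASE
    if not -(1 << 14) <= offset < (1 << 14):
        raise ValueError("year offset must fit in 15 signed bits")
    return (day << 15) | (offset & 0x7FFF)


def unpack_expiry(bits):
    day = bits >> 15
    if day == 0:
        return None
    offset = bits & 0x7FFF
    if offset >= (1 << 14):                   # negative offsets wrap around
        offset -= (1 << 15)
    return date(YEAR_BASE + offset, 1, 1) + timedelta(days=day - 1)


march_2009 = pack_expiry(date(2009, 3, 20))
round_trip = unpack_expiry(march_2009)
```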
  5. yayt


  6. What an awesome post!!

    Clearly ISIN will be useful since it is international. And so will CUSIP, since it forms the middle 9 characters of the ISIN for US securities.

    One problem: it covers every type of instrument except options and futures.

    So we have to find a way to solve the options and futures in the same universal identifier.

    Look, I did some calculations...

    For the Universal Id in binary, it can use the ISIN converted to binary for faster tick data performance.

    Since the ISIN has a 9-character alphanumeric code plus a 2-letter country code, that means 36^9 x 26^2 = 68,654,530,707,849,216 possible combinations. This ignores the ISIN check digit.

    Converted to bits, that requires 56 bits.
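
    The calculation can be checked with a short sketch that maps an ISIN (minus its check digit) onto an integer. The function names are hypothetical and the digit ordering is just one possible choice; the sample is Apple's ISIN, US0378331005.

```python
import string

ALNUM = string.digits + string.ascii_uppercase   # 36 symbols for the 9-character code
NSIN_SPACE = 36 ** 9
ISIN_SPACE = 26 ** 2 * NSIN_SPACE                # country code times 9-character code


def isin_to_int(isin):
    """Map the first 11 characters of an ISIN to an integer below ISIN_SPACE."""
    body = isin[:11].upper()
    if len(body) != 11 or not body[:2].isalpha():
        raise ValueError("expected 2 letters followed by 9 alphanumerics")
    country = (ord(body[0]) - ord("A")) * 26 + (ord(body[1]) - ord("A"))
    code = 0
    for ch in body[2:]:
        code = code * 36 + ALNUM.index(ch)       # raises ValueError on bad characters
    return country * NSIN_SPACE + code


def int_to_isin_body(value):
    """Inverse of isin_to_int; the check digit is not restored."""
    country, code = divmod(value, NSIN_SPACE)
    chars = []
    for _ in range(9):
        code, rem = divmod(code, 36)
        chars.append(ALNUM[rem])
    letters = chr(ord("A") + country // 26) + chr(ord("A") + country % 26)
    return letters + "".join(reversed(chars))


bits_needed = (ISIN_SPACE - 1).bit_length()
apple = isin_to_int("US0378331005")
```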

    However, the ISIN only identifies the security, without identifying the exchange or the data source, so that requires some additional design.

    The good news is that 8 bits remain. They can be used to identify which type of identifier is stored, so that futures can be covered.

    Does anyone know a standard identifier for all futures securities?

    Since options are always based on an underlying security, the option information like expiration month, strike price, etc can be stored separately.
  7. soks86


    There is no standard. Sure, many have been suggested, but expecting a standard identifier is a bit much if people can't even follow their own legal regulations. How that is related I dunno, but I'm just complaining.

    The real solution is to come up with your own system (ISIN isn't actually a bad one to follow) and have a system of converters depending on which exchange you're connecting to. Things like "product code" or "symbol" usually stay the same (AUD/USD is AUD/USD on most exchanges) and can be used alongside whichever unique identifier each exchange uses.
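
    Something along these lines, as a rough sketch. The venue names, venue symbols, and internal id below are all made up for illustration.

```python
class VenueConverter:
    """Translates between one venue's symbols and the internal ids (hypothetical)."""

    def __init__(self, venue, table):
        self.venue = venue
        self._to_internal = dict(table)
        self._to_venue = {uid: sym for sym, uid in self._to_internal.items()}

    def to_internal(self, venue_symbol):
        return self._to_internal[venue_symbol]

    def to_venue(self, internal_id):
        return self._to_venue[internal_id]


AUDUSD = 1001  # made-up internal id

venue_a = VenueConverter("VenueA", {"AUD/USD": AUDUSD})
venue_b = VenueConverter("VenueB", {"900123": AUDUSD})   # venue with numeric security ids

# Same internal id on the way in, venue-specific symbol on the way out.
inbound = venue_b.to_internal("900123")
outbound = venue_a.to_venue(inbound)
```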

    Truth is, at least with CME Group, if you connect you identify your contracts by a "Security Description" which includes the Product + month + year HOWEVER when you listen to CMEG's market data stream you are looking at "Security IDs" which are LIKE ISINs but aren't really.

    This isn't going to happen, the exchanges won't benefit from it enough and translation mechanisms are simple enough. Try order entry through CMEG to BM&F and you'll see some interesting ones, CMEG in fact has to perform (on the fly) translation of ISIN (Security IDs) while submitting orders all the way to Brazil and translate it again on the way back to the user.

    There is no reason to go up this tree. I assure you the fruit at the top isn't any tastier than the fruit at the bottom. At least in my opinion.
  8. The purpose of this endeavor is primarily to produce a way of identifying securities internal to the tradelink open source project on googlecode.

    That project needs a way to identify and store tick data such that it can be reloaded at high speed.

    Conversion of the broker or exchange codes to ISIN or another identifier is fine during real time capture or real time trading.

    The need for speed is during historical playback. At that point any translation will drastically slow the loading of millions of ticks.

    So a binary internal representation will be faster.

    So we're looking to design a binary representation of a security for very fast processing, sorting, filtering, etc.

    Proposing this as an international standard is not really important at all, especially now that we have learned of the ISIN.

    I hope this explains it.

    If we can't identify an already established standard for commodities, futures, and currencies, then we can just use the remaining 8 bits to identify which type of identifier it is.

    So 0 for type will mean the remaining bits represent an ISIN identifier.

    But 1 for type can mean that the remaining bits represent a currency.

    2 for type can indicate the remainder represents a future, etc.
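
    As a sketch of that tagging, assuming the 8-bit type sits in the top byte above a 56-bit payload (the bit placement and the names are assumptions):

```python
TYPE_ISIN = 0
TYPE_CURRENCY = 1
TYPE_FUTURE = 2

PAYLOAD_BITS = 56
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def tag(id_type, payload):
    """Combine an 8-bit identifier type with a 56-bit payload."""
    if not 0 <= id_type < (1 << 8):
        raise ValueError("type must fit in 8 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload must fit in 56 bits")
    return (id_type << PAYLOAD_BITS) | payload


def untag(uid):
    """Split a 64-bit id back into (type, payload)."""
    return uid >> PAYLOAD_BITS, uid & PAYLOAD_MASK


isin_id = tag(TYPE_ISIN, 12345)       # with type 0 the id equals the ISIN payload
future_id = tag(TYPE_FUTURE, 99)
kind, payload = untag(future_id)
```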

    The question is, what set of symbols to use for futures or options since there's no standard.

  9. Okay, that actually was our fall back plan if no standard exists.

    So we can just use the remaining bits to indicate the type of identifier.

    If it's 0 then the remaining 56 bits are the ISIN.

    If it's 1 then it will be our internal system for futures and commodities.

    For the internal system we can still imitate the ISIN by having a 2-letter country code for international support.

    Then the remaining 9 alphanumeric characters are sufficient to define any type of currency or future that isn't covered by the ISIN standard.

    (It's a shame that the US doesn't include futures and currencies.)
    #10     Feb 15, 2009