Polygon live trades in Binary

Discussion in 'Data Sets and Feeds' started by Zuiquan, Apr 14, 2021.

  1. Zuiquan


    Hi there,

    I am using the Polygon live feed for stocks with all tickers, trying to do some simple analysis with a small delay (a few seconds at most).
    I can now keep up with fetching all the JSON data, even at market open, with a delay of ~200 ms.
    However, parsing the JSON string in C++ to get the floating-point numbers and integers just wastes CPU resources.
    I would prefer to spare those CPU resources for the analysis part.

    I can understand the use of JSON for historical data, but for live data it is far from optimal.

    Do you know if there is a plan to have a binary feed?

  2. How much CPU does parsing the text take? You can use an independent thread to parse the data.
  3. Zuiquan


    I do not have the figure right now, but as far as I remember it was higher in the profiler than I expected.
    I gained some time by not using a JSON parser with its nodes and all,
    but parsing the C-string the old-school way ;-)

    That parsing is already done in independent processing threads.
    But I would prefer these CPU resources to be used for real processing, not for string-to-number conversion and scanning strings for terminating characters.

    Hence, a binary feed would be a better solution,
    plus I am pretty sure this is the way the data is stored in PolygonIO's data structures (at least to compute the bars).

    I am also thinking about the Level 2 data that may come someday, adding even more to the feed.
    I'd like to avoid buying a data-center :)
  4. qlai


    Are you running on a VPS and want to be as efficient as possible? Otherwise, it should be easy to dedicate another PC to parsing and reformatting.
  5. What is your experience of the data quality?
  6. Zuiquan


    For now, I have only tested this C/C++ code on a dedicated
    Intel i7-3770 3.4 GHz 4-Core 'home' computer,
    as the VPS is used for a more stable Python version (which accumulates a huge delay, especially at market open).
    In the end, it will probably stay on an
    AMD Ryzen 9 3900X 3.8 GHz 12-Core dedicated computer once stable, maybe offloading other lower-priority processes to the VPS.

    The parsing load is not so high that a dedicated computer is needed, but I find it sad to waste resources parsing text when the data is in essence made of numbers.
    To me, the introduction of WebSockets in browsers was the occasion to go binary, where everything else (Ajax) was text.
    Way too often I see text WebSockets, whereas there are now libs for using binary with JavaScript (protobuf, ...).

    I just hope that PolygonIO will offer the choice of a binary WebSocket...
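To illustrate why a binary feed would be cheaper to consume, here is a minimal sketch of decoding a fixed-size binary trade record. The layout is purely hypothetical: `BinaryTrade`, its fields, and `decode_trade` are invented for illustration and are not a Polygon format. It also assumes that sender and receiver share the same endianness and IEEE-754 doubles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-size wire record (NOT an actual Polygon format).
struct BinaryTrade {
    char          symbol[8];     // ticker, zero-padded
    double        price;         // trade price
    std::uint64_t timestamp_ms;  // Unix time in milliseconds
    std::uint32_t size;          // traded quantity
    std::uint32_t reserved;      // explicit padding to keep the size at 32 bytes
};
static_assert(sizeof(BinaryTrade) == 32, "unexpected padding in BinaryTrade");

// Copies one record out of a receive buffer. No text scanning and no
// string-to-number conversion: the whole decode is a bounds check plus memcpy.
// Assumes the bytes were produced on a machine with the same endianness.
inline bool decode_trade(const unsigned char* buf, std::size_t len, BinaryTrade& out) {
    assert(buf != nullptr);
    if (len < sizeof(BinaryTrade)) {
        return false;  // truncated message
    }
    std::memcpy(&out, buf, sizeof(BinaryTrade));
    return true;
}
```

With such a layout, the price and timestamp are usable as soon as the copy completes. A real binary protocol would also have to define byte order and versioning, which this sketch leaves out.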
  7. Zuiquan


    No complaints about the quality so far.
    I just reported a few inconsistencies in the field naming and at the API level, but no issue with the data itself.

    If I had one thing to ask for, it would be the possibility to specify, when subscribing, which data fields we want.
    For example, I subscribe to T.* and A.* when I only need the eligible price events with their associated timestamp, plus the accumulated volume.
    If I could subscribe by specifying:
    - price field
    - timestamp
    - accumulated volume
    - restricted to conditions:[......]
    that would be a good thing: I would receive far less data, just enough to fit my needs, and they would save bandwidth.
    Last edited: Apr 15, 2021
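Until such a server-side field selection exists, the same reduction can only be done on the client after parsing. The sketch below shows one way that might look. All names (`FullTrade`, `SlimTrade`, `TradeFilter`) are hypothetical, the excluded condition codes are placeholders chosen by the caller, and the volume is accumulated locally from trade sizes, which is an assumption and not the feed's own accumulated-volume field.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical parsed trade event with every field the feed delivers.
struct FullTrade {
    double           price;
    std::int64_t     timestamp_ms;
    std::uint32_t    size;
    std::vector<int> conditions;  // condition codes attached to the trade
};

// Only the fields the analysis actually needs.
struct SlimTrade {
    double        price;
    std::int64_t  timestamp_ms;
    std::uint64_t accumulated_volume;
};

class TradeFilter {
public:
    // `excluded` holds the condition codes that make a trade ineligible
    // (placeholder values, to be supplied by the caller).
    explicit TradeFilter(std::vector<int> excluded) : excluded_(std::move(excluded)) {}

    // Returns false and leaves `out` untouched if any condition is excluded;
    // otherwise adds the size to the running volume and fills `out`.
    bool accept(const FullTrade& in, SlimTrade& out) {
        for (int c : in.conditions) {
            if (std::find(excluded_.begin(), excluded_.end(), c) != excluded_.end()) {
                return false;
            }
        }
        volume_ += in.size;
        out.price = in.price;
        out.timestamp_ms = in.timestamp_ms;
        out.accumulated_volume = volume_;
        return true;
    }

private:
    std::vector<int> excluded_;
    std::uint64_t    volume_ = 0;
};
```

This saves nothing on bandwidth or parsing, which is exactly why doing the selection on the provider side would be preferable.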
  8. Zuiquan


    Thank you for sharing that interesting trick.

    In my case, I do not use std::string, just plain char* (zero-terminated C strings),
    with some hard-coded offsets wherever I can be certain of the sub-string length,
    to avoid searching for the quotes " or colons : that bloat the data.

    But I cannot avoid searching for some characters, because the events are not packed in a constant format: the symbol does not always have the same length, neither do the conditions, ...

    Then there is still the costly operation of converting text to a floating-point number, or text to a 64-bit integer.
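The approach described above could be sketched roughly as follows. This is not the poster's actual code: `Trade`, `value_after`, and `parse_trade` are hypothetical names, and the key names `"p"` (price) and `"t"` (timestamp) are assumed from the shape of a Polygon trade event. The sketch searches for keys rather than using hard-coded offsets, and relies on the standard `strstr`, `strtod`, and `strtoll` functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical minimal trade holding only the fields of interest.
struct Trade {
    double       price;
    std::int64_t timestamp_ms;
};

// Finds `key` (including its quotes and colon, e.g. "\"p\":") in a
// zero-terminated message and returns a pointer just past it, or nullptr.
static const char* value_after(const char* msg, const char* key) {
    const char* hit = std::strstr(msg, key);
    return hit ? hit + std::strlen(key) : nullptr;
}

// Extracts price and timestamp without building a JSON node tree.
// The two conversions at the end are the per-message cost that a
// binary feed would remove.
bool parse_trade(const char* msg, Trade& out) {
    assert(msg != nullptr);
    const char* p = value_after(msg, "\"p\":");
    const char* t = value_after(msg, "\"t\":");
    if (p == nullptr || t == nullptr) {
        return false;  // not a trade event, or field missing
    }
    char* end = nullptr;
    out.price = std::strtod(p, &end);  // text -> double (locale-dependent)
    if (end == p) {
        return false;  // no digits consumed
    }
    out.timestamp_ms = std::strtoll(t, &end, 10);  // text -> 64-bit integer
    return end != t;
}
```

Note that `strtod` honours the C locale's decimal separator, so this assumes the default "C" locale. Where the compiler supports it, `std::from_chars` is a locale-independent alternative.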
  9. qlai


    The sad part is that Polygon (or any provider) is actually receiving binary integer values and converting them to ASCII floating point. The wasted cycles could be better spent mining BTC :)
    #10     Apr 15, 2021