master Python programmer?

Discussion in 'Programming' started by Secret Santa, Jul 21, 2018.

  1. I'd like to hire a remote consultant who is a Python guru for several hours every month. It would be phone/chat discussions around the proverbial question "what's the best way to do X?". So it's partly about making intelligent design decisions and partly about knowing the vast array of libraries and their features.

    I will, obviously, pay cash and can reciprocate by chatting about various aspects of building and running systematic statistical and volatility strategies (can't give away anything proprietary, but I can give a lot of general guidance).

    PS. Not sure this belongs here or in the hook-up section, PM me please
     
  2. I think you'll need more than just an expert at Python coding. You'll want someone with graduate-level expertise in statistical inference (the types who tend to work more in Matlab or R than in Python, so not that easy to find). But even more so you'll need someone with a good deal of domain knowledge -- you're not going to impart the specific knowledge you've accumulated in 10+ years of working in the options/vol markets in a few hours of online chat a month.

    The guy you're after is probably already working in the industry, so you/he might have to work around NDAs/non-competes.

    Programming is, I think, the appropriate forum for this thread.
     
    Last edited: Jul 21, 2018
  3. You might try PM'ing Rosy. He is a good coder, has the domain knowledge, and appears to be independent or a freelancer.

    You could walk him through the linear algebra* parts a lot quicker than you could bring a non-market person up to speed on vol trading.

    * all stats, more or less, has a linalg representation. This may not be completely true, but it's not far off.
     
  4. fan27

    I might know a guy who I used to work with who could help. He is an algo trader and has extensive knowledge in C++ and python. He is working for ChinaSoft and here is what he is working on. Let me know if you would like me to reach out to him.

    • Member of a specialized technical team that applies data mining and machine learning algorithms to improve the operation of wireless networks. Current assignment is to add advanced machine learning capabilities to Huawei’s products and to move our code to a Spark/Hadoop environment.
    • Designed and implemented a front end GUI system to provide the user interface for our models. Users include both team members and internal clients that want to analyze new data. The GUI was written in Python and C++ and uses the Qt framework. I also built a low level framework that supports the communication between the python/QT UI and the R engine.
    • Developer and owner of the core Anomaly Detection (AD) models. My code is written in R and provides a flexible plug and play architecture to allow swapping the AD algorithms. The common infrastructure supports input, output, logging, reporting and charting and promotes code and design reuse. The framework also supports S4 objects for some key abstractions on the wireless domain (e.g. Cells and RNCs). AD algorithms: Uni and Multivariate statistical models, PAM, and PCA.
    • Developed models to support cell profiling by analyzing UE speed. Discovered a very useful metric that uses signal strength variability to estimate UE speed when GPS is not available. Wrote all the data processing code and was heavily involved in the initial coding of the HMM models.
    • Responsible for ETL in our team. Developed a set of tools in Python/C++ to organize, clean-up and facilitate use of the field data by the statistical analysis tools.
    • Implemented clustering and regression models in R to analyze the KPI versus resources relationships in wireless networks.
     
  5. Actually, stats/math stuff is fairly trivial in what I do, I am not as smart as some people are :). Plus, I have a fairly smart quant working with me who really knows most of the math aspects of the business.

    What we seem to really suck at is pure programming/infrastructure issues, and I usually don't even know what questions to ask in making the best decisions. A lot of times we already have the code, it's just that I need to figure out how to make it not suck or not have to re-write it for every new backtest or strategy. Sometimes it's stuff along the lines of "I am storing data for X, is it better to split the data by date and asset, or just by asset?" and sometimes it's something like "what are the advantages or disadvantages of using a numpy matrix directly vs using a data frame for task X?". Sometimes I might send a snippet of code and ask whether we can make it more generic or re-usable.
     
  6. traider

    Are you dealing with high frequency/high resolution data which requires you to optimize your data structures? Pandas is already using very efficient data structures behind the scenes so that you don't have to worry about efficiency most of the time. It's also good when you are in a research phase and want to query your data. To me, a numpy matrix is something more for production, when I already know exactly what to implement and need to squeeze max speed from it.
     
  7. Robert Morse (Sponsor)

    I might have someone. I'll ask him.
     
  8. runtrader

    I'm an experienced developer - worked in the City of London for various investment banks and software houses for many years. Experienced with various languages - 10 years of C++, 8 years of Java, and recently 2 years of Python. I now develop and manage my own automated trading systems full-time. I'd be happy to chat to see how/if I can help. Not expecting any payment, just happy to chat with like-minded people. Drop me a PM.
     
  9. It was just an example, but I find that anything that requires iteration (e.g. adding path dependency to a signal, such as hysteresis) slows pandas down a lot. Even in a non-HF mode, if I have a few hundred securities over 10 years, it's also pretty slow (well, it takes tens of seconds to run a backtest, for example).
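
For illustration, a minimal sketch of the kind of path-dependent signal described above: a two-threshold hysteresis rule evaluated in a single pass over plain values rather than through row-wise pandas operations. The function name, the thresholds and the 0/1 position convention are assumptions made for this example, not details from the thread. With pandas, one would typically feed it something like series.to_numpy() or series.tolist().

```python
from typing import Iterable, List


def hysteresis_positions(values: Iterable[float], enter: float, exit_: float) -> List[int]:
    """Return 1 while the signal is 'on' and 0 while it is 'off'.

    The state switches on when a value rises above `enter` and only
    switches off again when a value falls below `exit_`, so the output
    depends on the path taken, not just on the current value.
    (Hypothetical helper for illustration.)
    """
    if exit_ > enter:
        raise ValueError("exit_ threshold must not exceed enter threshold")

    state = 0
    positions = []
    for value in values:
        if state == 0 and value > enter:
            state = 1  # cross the upper threshold: switch on
        elif state == 1 and value < exit_:
            state = 0  # cross the lower threshold: switch off
        positions.append(state)
    return positions


if __name__ == "__main__":
    signal = [0.0, 1.2, 0.8, 0.4, 0.6, 1.5]
    print(hysteresis_positions(signal, enter=1.0, exit_=0.5))
```

Whether a loop like this is fast enough depends on the data size. The point is only that the state machine itself is a simple single pass once the values are out of the data frame.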
     
  10. runtrader

    I wouldn't focus on the performance of individual libraries to start with, since this is premature optimisation. I've come across multiple programmers who state that their backtests are taking too long, only to discover that they are maxing out 1 CPU on their multi-core, multi-CPU machine, i.e. only utilising 1 of a potential 16 available cores!

    Instead, ensure your data is structured in a manner that promotes concurrent computation. For example, I run backtests on long/short portfolios of 100s of stocks using years of intraday data. The data I use is structured in a manner that supports concurrent computation, i.e. multi-processing (not to be confused with multi-threading, which in Python is not what it sounds like because of the GIL!). When you run your backtest you want to be sure that all cores are being used 100%. Once you've got the data correctly structured you can offload concurrent computations to other machines, aka a grid, and massively improve the performance of your backtests.
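
A rough sketch of the multi-processing idea, assuming the data is already split into independent per-symbol price lists so that each backtest can run in its own process. The names backtest_one and run_backtests and the toy buy-and-hold P&L are hypothetical and only there for illustration; concurrent.futures.ProcessPoolExecutor is from the standard library.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, List, Optional, Tuple


def backtest_one(item: Tuple[str, List[float]]) -> Tuple[str, float]:
    """Toy per-symbol backtest: P&L of holding one unit over the whole series.

    Stands in for real strategy logic. It only touches its own symbol's
    data, which is what makes it safe to run in a separate process.
    """
    symbol, prices = item
    pnl = sum(later - earlier for earlier, later in zip(prices, prices[1:]))
    return symbol, pnl


def run_backtests(
    data: Dict[str, List[float]], max_workers: Optional[int] = None
) -> Dict[str, float]:
    """Run backtest_one for every symbol across worker processes.

    max_workers=None lets the executor pick a default based on the machine.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # Each (symbol, prices) pair is pickled and sent to a worker process.
        return dict(pool.map(backtest_one, data.items()))
```

On platforms that start workers by spawning a fresh interpreter, the call to run_backtests should sit under an `if __name__ == "__main__":` guard. The per-symbol split is an assumption; the same pattern applies to any partition (by date, by strategy) whose pieces do not depend on each other.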

    Additionally, many forget about memory constraints: the Python memory model uses references. It's really easy to exhaust the heap, especially when working with multiple large Pandas DataFrames. A useful method is to use weak references (Python's weakref module; "soft references" are the Java analogue), which do not keep an object alive, so it can be garbage collected (released) when no longer required. Again, if your data is correctly structured you can load whatever data you require into memory, perform the computation and allow Python's memory management to discard it automatically when it's no longer required.
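
A minimal sketch of the weak-reference idea using the standard library's weakref.WeakValueDictionary. Dataset and DatasetCache are hypothetical stand-ins (Dataset plays the role of a large object such as a DataFrame), and the sketch assumes the cached objects support weak references: instances of ordinary classes do, plain list and dict objects do not.

```python
import gc
import weakref


class Dataset:
    """Stand-in for a large in-memory object (hypothetical)."""

    def __init__(self, key, rows):
        self.key = key
        self.rows = rows


class DatasetCache:
    """Cache that never keeps a dataset alive on its own.

    Entries vanish from the WeakValueDictionary once no other code
    holds a strong reference to the dataset.
    """

    def __init__(self, loader):
        self._loader = loader  # callable: key -> Dataset
        self._live = weakref.WeakValueDictionary()
        self.loads = 0  # how many times the loader actually ran

    def get(self, key):
        dataset = self._live.get(key)
        if dataset is None:
            dataset = self._loader(key)
            self.loads += 1
            self._live[key] = dataset
        return dataset

    def __len__(self):
        return len(self._live)


if __name__ == "__main__":
    cache = DatasetCache(lambda key: Dataset(key, list(range(1000))))
    first = cache.get("SPY")
    again = cache.get("SPY")   # served from the cache, no reload
    print(first is again, cache.loads)
    del first, again           # drop the last strong references
    gc.collect()
    print(len(cache))          # entry has been released
```

The trade-off is that a dataset is reloaded whenever nothing else was holding on to it, so this suits data that is cheap to reload relative to the memory it occupies.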

    The combination of concurrency and memory management allows high performance computation (well, possibly not as fast as Java or C++, but that is another discussion :)). These concepts are not limited to Python; they are generally good software engineering principles.

    Once you've done the above, identify low-hanging-fruit bottlenecks in your processing by using a profiler (PyCharm has one built in and it is pretty good); it can quickly identify where the CPU time is being spent.
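
Outside an IDE, the standard library's cProfile and pstats modules give a similar view. A small sketch, where slow_signal is a made-up workload and profile_call is a hypothetical helper name:

```python
import cProfile
import io
import pstats


def slow_signal(n):
    """Made-up workload standing in for a backtest step."""
    total = 0.0
    for i in range(n):
        total += (i % 7) * 0.5
    return total


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile.

    Returns the function's result and a text report of the `top`
    entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_signal, 200_000)
    print(report)
```

Sorting by cumulative time tends to surface the outer functions that dominate a run; sorting by "tottime" instead highlights where time is spent inside a function body itself.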

    Good luck!
     
    #10     Jul 23, 2018