Computing a Bayes estimator using Python/R/Matlab/Octave

blueraincap · Jul 11, 2020

Without posting equations or examples: anyone familiar with statistics knows it constantly involves differentiating and integrating pdfs, from calculating moments to deriving estimators. Multiplying pdfs (joint pdfs, conditional pdfs) produces complex formulas in several variables.

Be it moment-generating functions, MLE, or Bayes estimators, we differentiate or integrate with respect to the unknown parameter while the other parameters remain unspecified. As a simple example, the Bayes estimator for the p parameter of a binomial distribution leads to integrating a conditional pdf involving p, n (sample size) and x (number of successes in the sample): we integrate over p while n and x remain unknown constants. This kind of integration is symbolic computation, right? Is there any way to approach it using numerical integration?

Point being, playing with R/Matlab/Octave (involving sympy), it seems symbolic computation is difficult and computationally inefficient. A relatively simple textbook example, integrating a conditional pdf that can be done by hand, took Matlab over 10 minutes to complete, and I couldn't even tell whether it would succeed. I changed something to see how it would affect the result, and it took another 15 minutes. That isn't feasible. How are people doing these computations in practice?
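
For what it's worth, the binomial example in the question can be approached numerically once n and x are given concrete values. The sketch below is only an illustration, not anyone's posted code: it assumes a uniform prior on p, squared-error loss (so the Bayes estimator is the posterior mean), and a hand-rolled Simpson's rule. The names simpson and bayes_estimate_p are made up for this example.

```python
import math


def simpson(f, a, b, n_intervals=1000):
    """Composite Simpson's rule on [a, b]; n_intervals must be even."""
    if n_intervals % 2:
        raise ValueError("n_intervals must be even")
    h = (b - a) / n_intervals
    total = f(a) + f(b)
    for i in range(1, n_intervals):
        # Odd interior nodes get weight 4, even interior nodes weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def bayes_estimate_p(x, n, prior=lambda p: 1.0):
    """Posterior mean of p given x successes in n trials (numerical)."""
    def unnorm_posterior(p):
        # The binomial coefficient cancels in the ratio, so it is omitted.
        return prior(p) * p**x * (1 - p) ** (n - x)

    numerator = simpson(lambda p: p * unnorm_posterior(p), 0.0, 1.0)
    evidence = simpson(unnorm_posterior, 0.0, 1.0)
    return numerator / evidence


# Illustrative data (assumed): 7 successes in 10 trials.
x, n = 7, 10
numeric = bayes_estimate_p(x, n)
closed_form = (x + 1) / (n + 2)  # known result for a uniform prior
print(numeric, closed_form)
```

The catch is that this returns a number for one (x, n) pair, not a formula in x and n. Getting the general formula still needs the symbolic route or a hand derivation; the numerical route is what you use once the data are in hand.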

Real Money · Jul 12, 2020

I'm a total slacker, but if you integrate marginal density functions with unknown variables, the solution will be a function involving those same variables. Depending on the integrand, though, the resulting function may be very complex or even have no closed-form solution.
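
As a worked instance of "a function involving those same variables", take the binomial case from the original question. The derivation below assumes a Beta(a, b) prior and squared-error loss; neither is stated in the thread, so treat it as one illustrative case where a closed form exists.

```latex
% Assumptions (not from the thread): prior p ~ Beta(a, b), squared-error loss.
\begin{align*}
\pi(p) &\propto p^{a-1}(1-p)^{b-1}, \qquad
f(x \mid p) = \binom{n}{x} p^{x}(1-p)^{n-x}, \\
\pi(p \mid x) &\propto p^{x+a-1}(1-p)^{n-x+b-1}
  \quad\Rightarrow\quad p \mid x \sim \mathrm{Beta}(x+a,\; n-x+b), \\
\hat{p}_{\mathrm{Bayes}} &= E[p \mid x]
  = \frac{\int_0^1 p \cdot p^{x+a-1}(1-p)^{n-x+b-1}\,dp}
         {\int_0^1 p^{x+a-1}(1-p)^{n-x+b-1}\,dp}
  = \frac{B(x+a+1,\; n-x+b)}{B(x+a,\; n-x+b)}
  = \frac{x+a}{n+a+b}.
\end{align*}
```

With a = b = 1 (uniform prior) this reduces to (x + 1)/(n + 2). The result is a function of x and n, but only because the prior was chosen to be conjugate; a different prior need not give anything this tidy.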

Each case will be specific to the density function(s).

I can see situations where doing so would increase the complexity arbitrarily. Numerical integration is an entire field of study, and much of its literature is geared toward partial differential equations.

Some of the most complex problems in mathematics are integral equations of this kind. Solutions can be so complicated that they require Ph.D.-level knowledge. They can involve things like Beta distributions, hypergeometric series, Bernoulli numbers, or theory from the study of formal power series.

One thing you could do is plug the equations into Wolfram Alpha to get an idea of what might be going on behind the scenes with those computational issues.

(glad I don't have to do any of this shit anymore, just develop my scripts and algorithms)

WeToddDid2 · Jul 12, 2020

What's in the box? Are you running it on a GPU (or several)? Do you have at a minimum a GTX 1080 Ti? If you want some speed, throw in an RTX 2080 Ti at minimum. Run it on a box with 4x RTX 2080 Tis and get 'er done lickety-split.

Buy yourself one of these bad boys:

https://www.sabrepc.com/Deep-Learni...hYOF6u-fajlB2OqA3h7El8p9jI29wswxoC6soQAvD_BwE

https://www.exxactcorp.com/Deep-Learning-NVIDIA-GPU-Solutions

Have you tried AWS with a GPU-accelerated infrastructure or something similar?

https://aws.amazon.com/nvidia/

https://cloud.google.com/nvidia/

https://azure.microsoft.com/en-us/b...-workstations-with-flexible-gpu-partitioning/

Relatively cheap cloud server:
https://www.fluidstack.io/virtual_desktop/?

If you don't mind China stealing your code:
https://us.alibabacloud.com

List of Nvidia cloud partners:
https://www.nvidia.com/en-us/data-center/gpu-cloud-computing/

blueraincap · Jul 12, 2020

Real Money said:
I'm a total slacker but if you integrate marginal density functions with unknown variables the solution will be a function involving those same variables. But, depending on the integrand, the resulting function may be very complex or even have no closed form solution

The problems, such as Bayes estimators, and the component pdfs are relatively basic, so a closed form is guaranteed (they are straight from a textbook). Having understood the concepts of Bayes estimation and the like, whose reasoning is sound, we obviously want to apply them, and certainly not by hand. If it is this difficult to do a symbolic calculation for a simple textbook problem that can be done by hand, how is it done in the real world, where pdfs are more convoluted? Surely such calculations are done in finance, be it in volatility models or derivatives pricing, which involve distributions. That's my curiosity.
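
One common answer to "how is it done when pdfs are more convoluted" is to skip the symbolic integral and sample from the posterior instead. The sketch below is a hedged illustration of that idea using a basic random-walk Metropolis sampler; it is not taken from anyone's code in this thread. The function names, step size, sample counts and data are all assumptions, and the uniform prior is used only so the output can be checked against the known value (x + 1)/(n + 2).

```python
import math
import random


def log_posterior(p, x, n, log_prior):
    """Unnormalised log posterior for a binomial likelihood."""
    if not 0.0 < p < 1.0:
        return -math.inf  # outside the support of p
    return log_prior(p) + x * math.log(p) + (n - x) * math.log(1.0 - p)


def metropolis_posterior_mean(x, n, log_prior, n_samples=50_000,
                              burn_in=5_000, step=0.2, seed=0):
    """Estimate E[p | x] with a random-walk Metropolis sampler."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, x, n, log_prior)
    total = 0.0
    for i in range(n_samples + burn_in):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, x, n, log_prior)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burn_in:
            total += p
    return total / n_samples


# Uniform prior (log density 0) on illustrative data: 7 successes in 10 trials.
estimate = metropolis_posterior_mean(7, 10, log_prior=lambda p: 0.0)
print(estimate, (7 + 1) / (10 + 2))
```

Swapping in a non-conjugate log_prior requires no new algebra, which is the appeal: only the unnormalised posterior has to be evaluated, never integrated in closed form.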

Please PM me if you would like to help further. I have posted an example elsewhere.

tommcginnis · Jul 12, 2020

Three steps:
1) compute in a spreadsheet, utilizing minimal internal functions and minimal parentheses.
2) re-compute with free use of internal functions and parentheticals.
3) translate either step 1 or 2 into the language of your choice (depending on expediency).

I suspect the programmer; translation into a visual form where logic is made explicit would go a long way towards pointing out any issues with the language.

WeToddDid2 · Jul 12, 2020

@blueraincap, it is possible that tommcginnis is correct with respect to your code. Have you tried it in Python using the TensorFlow, Theano, and/or SymPy libraries? It would be interesting to see the difference in time across the various languages.

Also, try your code on FluidStack to determine if it is your machine.

blueraincap · Jul 12, 2020

tommcginnis said:
Three steps:
1) compute in a spreadsheet, utilizing minimal internal functions and minimal parentheses.
2) re-compute with free use of internal functions and parentheticals.
3) translate either step 1 or 2 into the language of your choice (depending on expediency).

I suspect the programmer; translation into a visual form where logic is made explicit would go a long way towards pointing out any issues with the language.

Since when can you do symbolic integration on a spreadsheet? I don't think so.

tommcginnis · Jul 12, 2020

blueraincap said:
Since when can you do symbolic integration on a spreadsheet? I don't think so.

Pick up an engineering text.

blueraincap · Jul 12, 2020

tommcginnis said:
Pick up an engineering text.

Nah, I simply don't think that if I have a serious speed issue doing symbolic mathematics in Matlab/Octave, doing it in a spreadsheet will help.

tommcginnis · Jul 14, 2020

blueraincap said:
nah, I simply don't think if I have a serious speed issue doing symbolic mathematics in matlab/octave, doing it in a spreadsheet will help.

Then hit a whiteboard. Use the back of an envelope. Try a burnt stick on a cave wall. But there's more than one way to skin this cat, and the more explicit you make your methods, the more computational inefficiencies will stand out (or fair computational shortcuts will expose themselves). YMMV.