(Replying to PARENT post)
I do not. Unfortunately, high performance and Python do not go hand in hand. Yes, I know the heavy lifting is done by C/C++/Rust/CUDA/BLAS/Numba and so on, but when you run simulations for millions of steps, you end up with billions of Python function calls.
Afaik only JAX actually performs any optimizations, because it constructs an analytical gradient. Zygote seems to be able to do that and more at the LLVM IR level, which I think should enable more optimizations.
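For what it's worth, here's a minimal sketch of that in Julia (toy update rule, assuming Zygote.jl is installed): a hand-written simulation loop, with no framework-specific ops, differentiated directly:

    # Minimal sketch: Zygote differentiating a plain Julia simulation loop.
    # The loop body is ordinary code; no tensor-framework primitives involved.
    using Zygote

    function simulate(x, nsteps)
        for _ in 1:nsteps              # plain control flow, compiled rather than interpreted
            x = x .+ 0.1 .* sin.(x)    # toy update rule
        end
        return sum(x)
    end

    x0 = rand(3)
    g = Zygote.gradient(x -> simulate(x, 1_000), x0)[1]   # gradient through the whole loop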
(Replying to PARENT post)
This is becoming increasingly outdated, as significant parts of scientific machine learning require code that doesn't simply compile to GPUs. Think, for example, of mixing a physics model, say for RF or some such, with deep learning to model parts of the function. You cannot write the RF model in Python because Python is vastly too slow, so you're forced to write it in something fast, like C/C++, then integrate that with Python, then integrate that with your favorite tensor framework, juggling more languages than you need with Julia, where you can do it all immediately in one.
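To make that concrete, here's a hedged sketch in Julia (a toy stand-in, not a real RF model, assuming recent Flux and Zygote): a hand-written physics function and a neural network combined in one loss, in one language:

    # Toy sketch: hand-written "physics" plus a Flux network, differentiated
    # end-to-end by Zygote. No C/C++ glue layer needed.
    using Flux, Zygote

    # Physics part: plain Julia with a loop; the equivalent pure-Python code
    # would be far too slow to sit inside a training loop.
    function physics(x::AbstractVector)
        s = zero(eltype(x))
        for xi in x
            s += sin(2xi) / (1 + xi^2)   # toy stand-in for a real model
        end
        return s
    end

    model = Chain(Dense(4 => 16, tanh), Dense(16 => 1))   # learned correction term

    # Hybrid loss: physics prediction plus a neural-network correction.
    loss(m, x, y) = (physics(x) + only(m(x)) - y)^2

    x, y = rand(Float32, 4), 1.0f0
    grads_model, grads_x, _ = Zygote.gradient(loss, model, x, y)   # gradients through both parts

The point isn't the toy math; it's that the fast part and the learned part live in the same language, and the same compiler sees both.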
Deep learning is rapidly moving beyond being simply a tensor engine and becoming a tool in much larger problems, where many high-performance pieces need to be developed. Julia is light years ahead of Python for these domains, and I cannot see Python ever catching up, because it suffers from performance and/or multiple-language problems when solving them.
If you've never learned about scientific machine learning, go read about it or watch some videos. It's fascinating and growing rapidly.
(Replying to PARENT post)
Given how divergent the current crop of ML frameworks is, is this really a realistic expectation? Having played around with Julia and Flux for ML, I find I have to do just as much rewriting when translating e.g. TF -> Flux as TF -> PyTorch. You get some limited mixing and matching with Caffe2 <-> Torch and TF <-> JAX, but that breaks down the moment you leave a company's walled garden.
> I don't think that the LLVM IR that Julia uses is the best IR for these optimizations.
I think Chris Lattner agrees, which is why he also helped start https://mlir.llvm.org/. If anything, I predict we'll see more frameworks targeting it (prototypes for NumPy, PyTorch, TF and general XLA already exist). This implies that languages that target LLVM now will actually have a leg up, because their compiled semantics can be more easily lowered to something accelerator-friendly.
(Replying to PARENT post)
Julia is like, the poster child for being able to mix and match and compose code and algorithms: Flux/Zygote can do auto-differentiation on the entire language without modification, and you can drop in quite literally arbitrary Julia code as components of your network and it works.
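For example, something like this (a sketch; clip_and_scale is just a made-up plain function) is all it takes to drop arbitrary code into a network:

    # Sketch: an ordinary Julia function used directly as a layer in a Flux Chain;
    # Zygote differentiates straight through it.
    using Flux, Zygote

    clip_and_scale(x) = clamp.(x, -1, 1) .* 2      # no special layer type, just a function

    m = Chain(Dense(3 => 8, relu), clip_and_scale, Dense(8 => 1))

    x = rand(Float32, 3)
    grads_m = Zygote.gradient(mm -> sum(mm(x)), m)[1]   # gradient w.r.t. the whole chain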
> I don't think that the LLVM IR that Julia uses is the best IR for these optimizations.
What makes you say this? The community has been able to do some pretty amazing things performance-wise, e.g. pure-Julia implementations of BLAS/LAPACK routines reaching, and in some cases exceeding, performance parity, plus there's been plenty of work on CUDA support for arbitrary Julia code, which is impressive.
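To illustrate the CUDA point (a sketch, assuming CUDA.jl and a working GPU): the same user-defined function runs on CPU arrays and GPU arrays unchanged:

    # Sketch: broadcasting an arbitrary user-defined Julia function over a GPU array;
    # CUDA.jl compiles the broadcast into a GPU kernel on the fly.
    using CUDA

    myact(x) = x < 0 ? 0.1f0 * x : x      # ordinary scalar Julia function (leaky-ReLU-ish)

    a = CUDA.rand(Float32, 1024)          # array living on the GPU
    b = myact.(a)                         # the dot-broadcast becomes a custom CUDA kernel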
(Replying to PARENT post)
In the end, the engine that compiles the mathematical expression to hardware is what matters, and I don't think that the LLVM IR that Julia uses is the best IR for these optimizations.