Highlights from UseR! 2017
The whole Renjin team had a chance to descend on Brussels for the 2017 UseR! Conference and the co-located 2017 R Implementation, Optimization and Tooling (RIOT) Workshop.
We had the chance to give a few presentations on Renjin, but we also had the opportunity to learn a lot from the other conference speakers. From the perspective of a language implementor, I wanted to highlight some of these talks:
There are a few independent efforts to apply static analysis techniques to R code for the purposes of optimization.
Renjin's JIT compiler of course combines runtime information with static analysis to compile for-loops to highly efficient machine code.
But Jianqiao Zhu also spoke about the ROSA Project, which uses some of the same static analysis techniques. Rather than applying them at runtime like Renjin does, ROSA is an Ahead-of-Time (AOT) compiler, which requires some user input as a substitute for the type information that Renjin gathers automatically at runtime.
R Byte Code Compiler
Tomas Kalibera gave a great presentation on GNU R's own Byte Code Compiler (video).
Compared with Renjin's JIT compiler, the first big difference is the unit of compilation. The GNU R Byte Code Compiler (BCC) compiles entire functions, while Renjin's JIT compiler will selectively compile loops based on the number of expected iterations.
The second difference is when compilation takes place. Renjin compiles a loop mid-execution, using all available type information to generate highly-specialized machine code. The compiled loop body is thrown away after the loop finishes.
The GNU R BCC, on the other hand, compiles a function only once, and makes no assumptions about the incoming types of the arguments.
The trade-off here is that Renjin can potentially spend more time on compilation, but can in many cases apply more optimizations because we know more about the types. We need more and better benchmarks to compare the performance of these strategies.
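To make the GNU R side of this comparison concrete, here is a minimal sketch using the `compiler` package that ships with GNU R. The function `sum_of_squares` is a made-up example, not something from the talk; the point is only that `cmpfun()` compiles the whole closure once, and the resulting byte code has to work for whatever argument types show up later.

```r
library(compiler)

# A hypothetical loop-heavy function, used purely for illustration.
sum_of_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i * i
  }
  total
}

# cmpfun() byte-compiles the entire function, once, without knowing
# which argument types callers will pass in.
compiled <- cmpfun(sum_of_squares)

# The same compiled code serves both integer and double arguments.
compiled(10L)
compiled(10)

# The compiled version must agree with the interpreted one.
identical(compiled(1000), sum_of_squares(1000))
```

A loop-level JIT of the kind described above would instead wait until the `for` loop is actually running, at which point it can observe the concrete types of `total` and `i` and specialize for them.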
One of the big questions we want to explore with Renjin is how we can take existing R and C code that was written for one context and transform it in such a way that it can run faster or in a new context. So far, we've focused on retargeting for the JVM and automatic parallelization, but we've experimented a bit with more exotic targets like GPUs, so I'm always keen to learn more about new architectures.
Helena Kotthaus spoke on the challenges of parallelization in heterogeneous embedded systems (video) where an R runtime would have access to multiple cores with different capabilities. This would be an interesting challenge to support in Renjin's vector pipeliner, which is currently "resource oblivious" when it comes to scheduling work on the available cores.
On another end of the computing spectrum, Scott Michael shared the results of R benchmarks on Intel's Knights Landing architecture (video), where an R runtime in principle has access to 64 cores with 256 threads. Though the benchmarks were mostly limited to linear algebra, it was super interesting to learn more about this architecture.
Even after working on Renjin for several years, I continue to be surprised by the depth and flexibility of the R language.
Lionel Henry presented his work on "fexprs" or symbolic computation during RIOT and the main conference track. We're all familiar with specifying statistical models in R using a formula such as y ~ 3*x + 1, but it gets far more complicated when you start writing functions that parametrize such expressions. It's something I hadn't given a lot of thought to, and so it was a real eye-opener.
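A small base-R sketch of why parametrizing formulas is trickier than it looks. The helper names `make_formula` and `build_formula` are hypothetical and are not taken from Lionel Henry's talk; the example only uses `all.vars()`, `reformulate()` and `deparse()` from base R.

```r
# A formula is an unevaluated expression: nothing is computed here.
f <- y ~ 3*x + 1
class(f)
all.vars(f)

# Naive attempt: the formula captures the argument *names*,
# not the values passed in.
make_formula <- function(response, predictor) {
  response ~ predictor
}
naive <- make_formula("y", "x")
all.vars(naive)   # the symbols `response` and `predictor`, not y and x

# One way around it: build the formula from strings explicitly.
build_formula <- function(response, predictor) {
  reformulate(predictor, response = response)
}
built <- build_formula("y", "x")
deparse(built)
```

Because `~` quotes its operands, any function that wants to assemble a formula from its arguments has to construct the expression programmatically, which is where the symbolic-computation machinery comes in.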
It was also super interesting to hear Radford Neal's proposals for extensions or changes to the R language itself. There really are some syntactical constructions that trip up R programmers, such as
x is negative or empty, and I think there is tremendous value in finding backward-compatible solutions.
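As an illustration of this kind of pitfall, here is one widely known case in base R. It is an assumption that this matches the construction meant above, since the post does not spell it out: the `:` operator counts downwards when its right-hand side is smaller than its left, so a loop over `1:length(x)` still runs when `x` is empty.

```r
x <- numeric(0)

# 1:length(x) is 1:0, which counts down to c(1, 0) instead of being empty.
visits_colon <- 0
for (i in 1:length(x)) visits_colon <- visits_colon + 1

# seq_along(x) is empty for an empty vector, so the loop body never runs.
visits_along <- 0
for (i in seq_along(x)) visits_along <- visits_along + 1

# The same surprise with a negative bound: 1:n counts down from 1 to n.
n <- -2
length(1:n)
length(seq_len(max(n, 0)))
```

Using `seq_along()` and `seq_len()` sidesteps the problem today, but only if the programmer already knows to avoid `:` here, which is why a backward-compatible language-level fix is appealing.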
Kirill Müller shared progress on an R Foundation project to improve profiling of R and native code (video). I think there's a chance to develop a common raw output format for profiling tools on which a rich set of analysis tools can be written in R. We added a very basic profiler to Renjin a while back, so we'll follow this closely to see if we can also support this common target.
The FastR team from Oracle has made some great progress on supporting graphics in their Graal-based R interpreter (video) that I think Renjin can leverage. After some experimentation, they settled on supporting the grid interface rather than the comparatively low-level grDevices interface, which apparently is very tightly coupled to the GNU R interpreter internals. Their implementation is open source and written in Java, so it might be a good starting point for adding graphics support to Renjin.