Learning from NSF DALI 13
Last week I had a chance to join the National Science Foundation Workshop on Dynamic Languages for Scalable Data Analytics, or #dali13, and share an update on Renjin's design and progress towards full compatibility with GNU R.
Besides giving me a chance to see beautiful Indianapolis for the first time, the workshop brought together an incredible group: scientists with real data scalability problems, and those of us who are trying to build solutions.
I learned a lot, on topics as diverse as concurrent garbage collection, automatic parallelization, and VM design.
Here are some links to projects I'm excited about:
Duncan Temple Lang is working on Rllvm, a package for GNU R that allows you to translate R code to LLVM IR, play with it, and then compile it to native code.
With McLab, the McGill University crew is working on an open-source toolkit for analyzing and compiling Matlab code. It sounds like they face many of the same challenges that an R (re)implementation does, as Matlab also lacks a formal specification.
Hadley Wickham gave an update on dplyr, which also uses deferred computation to improve performance on data frame operations, but is written largely in user space. I'm very curious about the C++ interpreter for R expressions!
Ryan Newton shared Data.Array.Accelerate with me, which is a DSL for array computations embedded in Haskell with pluggable backends targeting CUDA, OpenCL, etc. It is conceptually very similar to Renjin's Vector Pipeliner, and there is much to learn from the implementation. (Thanks, Ryan!)
Chandra Krintz spoke about the StochSS project, which aims to provide stochastic simulation as a service for life science researchers, using the AppScale platform, an open source clone of AppEngine. Renjin makes it MUCH easier to leverage the AppEngine/AppScale platform, so I'm excited about the potential for collaboration here.
Thanks to Jan Vitek for organizing another great opportunity for collaboration!