R "Modules"

These days I write most of my code in R. All things being equal, I’d really rather not. But I work in biology and most of my colleagues and collaborators use R, so it’s easier to do the same.

One of the things I really miss from Python is that every script I write can essentially be treated the way a full-fledged “package” is treated in R. That is, if I start writing some ad-hoc code in a script and then decide there’s some part of it I want to reuse later in another script, I can just write

import adhoc_script

bar = adhoc_script.load_function()
bar = adhoc_script.useful_processing_function(bar)

and everything works as you’d hope it would. Importantly, it doesn’t matter if I have already defined load_function in my current session, since importing from adhoc_script.py doesn’t overwrite anything (i.e., adhoc_script gets its own namespace).

Namespace handling is one of the areas where R is really awful to work with, but you can get most of this behaviour by bundling the ad-hoc script you want to reuse into an R package. Unfortunately, every time I do this I have to:

1. Add the scaffolding necessary for an R package. I’ve published multiple packages and I still find this extremely non-trivial. There are whole books written about how best to do this.
2. Install the package.
3. Load the package into my current session.

There are tools that make turning scripts into packages a bit more automated and less painful, but this is still a long way from what I really want. Conflicting names are still an issue, since a user who loads a package expects all the functions that were in adhoc_script.R to end up in the main namespace. That is, our new script would look like

library(adhoc_script)

bar = load_function()
bar = useful_processing_function(bar)

We can get around this somewhat by not calling library directly and writing adhoc_script::load_function() instead, but the syntax is clunky and it has other limitations. A far bigger restriction for my purposes is that if I want to make some minor change to adhoc_script.R and then use that change in my current interactive session or script, I need to reinstall the package and then reload it. In the end, I’ve been doing what I suspect most people do: using the source function instead. That is, I just run source("adhoc_script.R") and the code is run line by line, as if I’d copied and pasted it wherever I’m calling source from.
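Here is a minimal sketch of the name-conflict problem. The two-line stand-in for adhoc_script.R is made up for illustration and written to a temporary file. By default source() evaluates a file in the global environment, so an object you already had with the same name gets silently replaced:

```r
# Hypothetical stand-in for adhoc_script.R, written to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c(
  "load_function <- function() 1:3",
  "useful_processing_function <- function(x) x * 2"
), script)

# A function with the same name already exists in the session...
load_function <- function() "my own version"

# ...and source() evaluates the file in the global environment by default,
# so the earlier definition is overwritten without any warning.
source(script)
load_function()  # now returns 1:3, not "my own version"
```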

This sort of works: I don’t have to build any special scaffolding to reuse my code, and I can just re-run source if I change anything. But I’m still stuck with the name-conflict issue, and the whole thing is generally inelegant. After a couple of hours of googling and trying different options, I’ve come up with a compromise I’m sort of happy with.

The central idea is to use the sys.source function, which does the same thing as source except that it runs the code inside a separate “environment” that you are free to specify. Environments in R are a complicated topic in their own right, but for my purposes they’re just a way to run sourced code walled off from whatever else I’m doing, while still letting me access the bits I need. I wrote a function that hides all the business of creating an environment and calling sys.source in it, which ends up pretty close to what I want. After loading this function, I can do

import(adhoc_script)

bar = adhoc_script$load_function()
bar = adhoc_script$useful_processing_function(bar)
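The following minimal sketch strips away the convenience wrapper to show the core trick, using another made-up stand-in for adhoc_script.R. I’m assuming the new environment’s parent should be the global environment, so the script can still find functions from attached packages. If you leave out envir, sys.source evaluates into baseenv(), which is not what you want here.

```r
# Hypothetical stand-in for adhoc_script.R, written to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c(
  "load_function <- function() 1:3",
  "useful_processing_function <- function(x) x * 2"
), script)

# An existing object with a clashing name in the current session.
load_function <- function() "my own version"

# A fresh environment to hold everything the script defines. Its parent is
# the global environment, so the script can still see attached packages.
adhoc_script <- new.env(parent = globalenv())
sys.source(script, envir = adhoc_script)

bar <- adhoc_script$load_function()
bar <- adhoc_script$useful_processing_function(bar)
load_function()  # still returns "my own version"
```

Re-running new.env() and sys.source() after editing the file rebuilds the “module” from scratch, which is what makes reloading cheap.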

If I change adhoc_script.R, I just need to run import(adhoc_script) again. I don’t need to worry about naming conflicts, since everything is nicely fenced off. I can even mimic most of the other nice features of Python imports. For example,

import(adhoc_script,as='ahs')

bar = ahs$load_function()
proc_dat = ahs$useful_processing_function
bar = proc_dat(bar)

and if I really must load everything into the current environment

import(adhoc_script,all=TRUE)

bar = load_function()
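If you’d rather roll your own, here is one way such an import() wrapper might look. This is a hypothetical sketch and not the downloadable code mentioned below. The path argument, which says where to look for the .R file, is my own assumption. The as and all arguments simply mirror the calls above.

```r
# Hypothetical sketch of an import() helper built on sys.source().
# The `path` argument is an illustrative assumption.
import <- function(module, as = NULL, all = FALSE, path = ".") {
  # Accept a bare name, import(adhoc_script), or a string, "adhoc_script".
  expr <- substitute(module)
  name <- if (is.character(expr)) expr else deparse(expr)
  file <- file.path(path, paste0(name, ".R"))
  if (!file.exists(file)) stop("cannot find ", file)

  # Run the script in its own fresh environment whose parent is the
  # global environment, so it can still see attached packages.
  env <- new.env(parent = globalenv())
  sys.source(file, envir = env)

  target <- parent.frame()
  if (all) {
    # Copy everything into the caller's environment (can overwrite names).
    list2env(as.list(env, all.names = TRUE), envir = target)
  } else {
    # Bind the whole environment to one name, e.g. adhoc_script or ahs.
    assign(if (is.null(as)) name else as, env, envir = target)
  }
  invisible(env)
}

# Demo: write a throwaway adhoc_script.R into a temporary directory.
demo_dir <- tempfile()
dir.create(demo_dir)
writeLines(c(
  "load_function <- function() 1:3",
  "useful_processing_function <- function(x) x * 2"
), file.path(demo_dir, "adhoc_script.R"))

import(adhoc_script, path = demo_dir)
bar <- adhoc_script$load_function()
bar <- adhoc_script$useful_processing_function(bar)

import(adhoc_script, as = "ahs", path = demo_dir)
proc_dat <- ahs$useful_processing_function
```

Each call builds a fresh environment, so re-running import() after editing the file picks up the changes. The all = TRUE branch deliberately gives up the isolation and can overwrite existing names, just as source() does.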

It’s still far from perfect, and I’m not 100% sure I haven’t made some horrible mistake in how I handle environments. But I’m much happier doing this than bumbling along with source, which is what I was doing in practice before. You can download the source code here. I stick it in my ~/.Rprofile file so it runs every time I start R.