These days I write most of my code in R. All things being equal, I’d really rather not. But I work in biology and most of my colleagues and collaborators use R, so it’s easier to do the same.
One of the things I really miss from Python is that every script I write can be treated much like a full-fledged “package” in R. That is, if I start writing some ad-hoc code in a script and then decide there’s some part of it I want to re-use later in another script, I can just write
import adhoc_script
bar = adhoc_script.load_function()
bar = adhoc_script.useful_processing_function(bar)
and everything works as you’d hope it would. Importantly, it doesn’t matter if I have defined load_function in my current session, as importing from adhoc_script.py doesn’t overwrite anything (i.e., adhoc_script gets its own namespace).
Handling of namespaces is one of the areas where R is really awful to work with, but you can get most of this behaviour by bundling the ad-hoc script to be reused into an R package. Unfortunately every time I do this I have to:
- Add the scaffolding necessary for an R package. I’ve published multiple packages and I still find this extremely non-trivial. There are whole books written about how best to do this.
- Install the package.
- Load the package into my current session.
There are tools that make turning scripts into packages a bit more automated and less painful, but this is still a long way from what I really want. Conflicting names are still an issue, as the user expects that loading the package will put all the functions that were in adhoc_script.R into the main namespace. That is, our new script should look like
library(adhoc_script)
bar = load_function()
bar = useful_processing_function(bar)
We can get around this somewhat by not calling library directly and writing adhoc_script::load_function() instead, but the syntax is clunky and it has other limitations. Far more restrictive for my purposes is that if I want to make some minor change to adhoc_script.R and then use that change in my current interactive session or script, I need to re-install the package and then reload it. In the end, I’ve been doing what I suspect most people do and using the source command instead. That is, I just run source("adhoc_script.R") and the code is run line for line, as if I’d copied and pasted it wherever I’m running source from.
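To make the name-conflict problem concrete, here is a small sketch. The script contents and the function bodies are made up for illustration, and a temporary file stands in for adhoc_script.R.

```r
# Write a throwaway script to a temporary file (hypothetical contents).
script <- tempfile(fileext = ".R")
writeLines("load_function <- function() 'from adhoc_script'", script)

# A function with the same name already exists in the current session.
load_function <- function() "from my session"

# source() evaluates the script in the global environment by default,
# so the script's definition silently replaces the session's one.
source(script)

result <- load_function()  # now calls the version defined by the script
```

Nothing warns about the overwrite, which is why this gets risky once the sourced script and the session share common function names.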
This sort of works: I don’t have to build any special scaffolding to re-use my code, and I can just re-run source if I change anything. But I’m still stuck with the name conflict issue and the whole thing is generally inelegant. After a couple of hours of googling and trying different options, I’ve come up with a compromise I’m sort of happy with.
The central idea is to use the sys.source function, which does the same thing as source except that it runs the code inside a separate “environment” that you are free to specify. Environments in R are a complicated topic in their own right, but for my purposes they’re just a way to run the code loaded via source so that it’s walled off from whatever else I’m doing, while still allowing me to access the bits I need. I wrote a function to hide all the business of calling sys.source with a dedicated environment, which ends up pretty close to what I want. After loading this function, I can do
import(adhoc_script)
bar = adhoc_script$load_function()
bar = adhoc_script$useful_processing_function(bar)
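Stripped of any wrapper, the underlying mechanism looks roughly like the sketch below. It only uses new.env and sys.source from base R; the script contents are invented for illustration and a temporary file stands in for adhoc_script.R.

```r
# Write a throwaway script to a temporary file (hypothetical contents).
script <- tempfile(fileext = ".R")
writeLines("load_function <- function() 'from adhoc_script'", script)

# A function with the same name in the current session.
load_function <- function() "from my session"

# Evaluate the script inside its own environment instead of the session.
adhoc_script <- new.env()
sys.source(script, envir = adhoc_script)

from_script  <- adhoc_script$load_function()  # the script's version
from_session <- load_function()               # the session's version, untouched
```

Both definitions coexist because the script's names live only inside the adhoc_script environment and are reached with `$`.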
If I change adhoc_script.R I just need to run import(adhoc_script) again. I don’t need to worry about naming conflicts as they’re all nicely fenced off. I can even mimic most of the other nice features of Python imports. For example,
import(adhoc_script,as='ahs')
bar = ahs$load_function()
proc_dat = ahs$useful_processing_function
bar = proc_dat(bar)
and if I really must load everything into the current environment
import(adhoc_script,all=TRUE)
bar = load_function()
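For readers who want to see how such a helper might be put together, here is a minimal sketch that supports the three call styles shown above. It is not the downloadable code, just one plausible way to do it. The `dir` argument and the rule that the script lives at `<dir>/<name>.R` are assumptions added for this example.

```r
# Hypothetical sketch of an import() helper built on sys.source().
import <- function(module, as = NULL, all = FALSE, dir = ".") {
  # Turn the unquoted name into a string: import(adhoc_script) -> "adhoc_script".
  name <- as.character(substitute(module))
  # Assumption: the script is found at <dir>/<name>.R.
  path <- file.path(dir, paste0(name, ".R"))
  # The environment import() was called from.
  caller <- parent.frame()

  if (all) {
    # Evaluate the script straight into the caller, much like source() would.
    sys.source(path, envir = caller)
    return(invisible(caller))
  }

  # Fresh environment for the script. Its parent is the global environment,
  # so the script can still find attached packages, but its own definitions
  # stay fenced off from the session.
  env <- new.env(parent = globalenv())
  sys.source(path, envir = env)

  # Bind the environment under the script's name, or under the alias `as`.
  assign(if (is.null(as)) name else as, env, envir = caller)
  invisible(env)
}
```

Calling import again simply builds a new environment and rebinds the name, which is what makes reloading after an edit cheap. One design choice worth noting: with the global environment as parent, functions in the script can see objects defined in the session; a stricter variant could use `parent.env(globalenv())` instead.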
It’s still far from perfect and I’m not 100% sure I haven’t made some horrible mistake in how I treat environments. But I’m much happier doing this than bumbling along using source, which was in practice what I was doing previously. You can download the source code here. I stick it in my ~/.Rprofile file so it is run every time I start R.