These days I write most of my code in R. All things being equal, I’d really rather not. But I work in biology and most of my colleagues and collaborators use R, so it’s easier to do the same.
One of the things I really miss from python is that every script that I write can be essentially treated like a full fledged “package” is treated in R. That is, if I start writing some ad-hoc code in a script and then decide there’s some part of it I want to re-use later in another script I can just write
import adhoc_script bar = adhoc_script.load_function() bar = adhoc_script.useful_processing_function(bar)
and everything works as you’d hope it would. Importantly, it doesn’t matter if I have defined
load_function in my current session as importing from
adhoc_script.py doesn’t overwrite anything (i.e.,
adhoc_script gets its own namespace).
Handling of namespaces are one of the areas where R is really awful to work with, but you can get most of this behaviour by bundling the ad-hoc script to be reused into an R package. Unfortunately every time I do this I have to:
- Add the scaffolding necessary for an R package. I’ve published multiple packages and I still find this extremely non-trivial. There are whole books written about how best to do this.
- Install the package.
- Load the package into my current session.
There are tools that make turning scripts into package a bit more automated and less painful, but this is still a long way from what I really want. Conflicting names are still an issue as the expectation of the user when they load the package is that all the functions that were in
adhoc_script.R should be in the main namespace. That is, our new script should look like
library(adhoc_srcipt) bar = load_function() bar = useful_processing_function(bar)
We can get around this somewhat by not calling
library directly and writing
adhoc_script::load_function() instead, but the syntax is clunky and it has other limitations. Far more restrictive for my purposes is that if I want to make some minor change to
adhoc_script.R and then use that change in my current interactive session or script I need to re-install the package and then reload it. In the end, I’ve been doing what I suspect most people do and use the
source command instead. That is, I just run
source(adhoc_script.R) and the code is run line for line as if I’d copied and pasted it wherever I’m running
This sort of works, I don’t have to build any special scaffolding to re-use my code and I can just re-run
source if I change anything. But I’m still stuck with the name conflict issue and the whole thing is generally inelegant. After a couple of hours of googling and trying different options, I’ve come up with a compromise I’m sort of happy with.
The central idea is to use the
sys.source function which does the same thing as
source except it runs the code inside a separate “environment” which you are free to specify. Environments in R are a complicated topic in their own right, but for my purposes they’re just a way to run the code loaded via
source in a way that it’s walled off from whatever else it is I’m doing while still allowing me to access the bits I need to. I wrote a function to hide all the business of calling
sys.source from an environment , which ends up pretty close to what I want. After loading this function, I can do
import(adhoc_script) bar = adhoc_script$load_function() bar = adhoc_script$useful_processing_function(bar)
If I change
adhoc_script.R I just need to run
import(adhoc_function) again. I don’t need to worry about naming conflicts as they’re all nicely fenced off. I can even mimic most of the other nice features of python imports. For example,
import(adhoc_script,as='ahs') bar = ahs$load_function() proc_dat = ahs$useful_processing_function bar = proc_dat(bar)
and if I really must load everything into the current environment
import(adhoc_script,all=TRUE) bar = load_function()
It’s still far from perfect and I’m not 100% sure I haven’t made some horrible mistake in how I treat environments. But I’m much happier doing this than bumbling along using
source which was in practice what I was doing previously. You can download the source code here. I stick it in my
~/.Rprofile file so it is run every time I start R.