The assignment of variables in R, a statistical programming language, is accomplished by variable <- 2. In this example, the object variable takes on a value of 2.[1]
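As a tiny, self-contained illustration (the name and values here are arbitrary):

# assign the value 2 to an object named variable
variable <- 2

# the object can now be used in later expressions
variable * 10  # returns 20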
But this isn’t a story about how to name variables. It’s a reflection on how the naming of variables in my R scripts has changed over the years, and how that change mirrors the evolution of my data processing methods.
Here’s the evolution summarized in code:
# when I first started to learn R
behav_RT_data_raw <- read_csv("path/to/data.csv")
# a few years into using R
rt_data <- read_csv("path/to/data.csv")
# now
dd <- prep_data()
Phase 1: the mega script
# when I first started to learn R
behav_RT_data_raw <- read_csv("path/to/data.csv")
When I first started to learn R, I’d pack all my code into a single script. I figured that as long as my code was all in one place, all I needed to do was run the script and my results and figures would be computed with the push of a button (i.e., source("path/to/script.R")).
This worked well at first. My analyses were not super complicated and I wasn’t working with a lot of data. However, as I gained experience and started learning more complicated analyses, I realized that having 1,000+ lines of code in one file made it difficult to debug and fix things when they went wrong. By packing all analyses into a single script, I also had to be mindful of how I named my objects, hence the super long and detailed name behav_RT_data_raw.
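For flavor, here is roughly what those mega scripts looked like; this is a sketch for illustration, and the column names and cleaning rules are made up:

# mega_script.R: everything in one file
library(tidyverse)

# load raw reaction time data from a behavioral task
behav_RT_data_raw <- read_csv("path/to/data.csv")

# clean the raw data: drop missing trials, keep correct responses
behav_RT_data_clean <- behav_RT_data_raw |>
  filter(!is.na(rt), accuracy == 1)

# summarize mean reaction time per subject
behav_RT_data_summary <- behav_RT_data_clean |>
  group_by(subject) |>
  summarize(mean_rt = mean(rt))

# ...hundreds more lines of models and figures below...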
Phase 2: modularity
# a few years into using R
rt_data <- read_csv("path/to/data.csv")
The one-script idea was madness, so I started to break my analyses up into modular scripts that each did a specific task. For example, I had one script that loaded all my packages, then I had scripts that preprocessed and prepared various raw data files for analysis, then I had additional scripts that performed separate analyses on the processed data.
This procedure worked well for a long time. Splitting the analysis pipeline into modular scripts made it easier to debug. Often, it was possible to modify things in one place and have that change propagate throughout the pipeline. Entire R packages have been built around this approach; {targets}[2] is built to monitor changes to the modular scripts in analysis pipelines.
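My version of this looked roughly like the following runner script (the file names here are hypothetical):

# run_analysis.R: top-level runner for the modular pipeline
source("scripts/00_packages.R")  # load all packages in one place
source("scripts/01_prep_rt.R")   # read and preprocess the raw RT data
source("scripts/02_models.R")    # fit models on the processed data
source("scripts/03_figures.R")   # build figures from model output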
With my variables no longer all living in one script, I was free to shorten their names because there were fewer of them to tell apart. Therefore, behav_RT_data_raw became rt_data. Nice and short.
I remember reading R programming guides and learning that variables should be named clearly and explicitly so that you can easily tell them apart. I think this is a good approach for beginners, but the more you program, the lazier you get. And not the bad kind of lazy, where you stop commenting code (though that can happen too!). I’m talking about the good kind of lazy, where you stop wanting to type so much, where the difference between a capital A and a lowercase a is stark. Reaching that pinky finger over to the shift key becomes too much work.
In my opinion, laziness in programming is a good thing; it leads to more efficient solutions.
Phase 3: functions
# now
dd <- prep_data()
You are starting to see a trend here. Variables are becoming shorter and less intelligible. This is because I have started to embrace designing custom functions instead of using scripts to perform analyses.
For years I was resistant to writing my own functions. I seemed to always convince myself that the time spent writing a function to do a task would be wasted. What if the function I wrote was so specific that it only pertained to this analysis? What if the function I wrote would be too complex and difficult to debug? The what-ifs were endless.
It started slowly. First I wrote a function to make plotting easier. Then I wrote another to preprocess some data. Then I started to modify existing functions for additional functionality. I began to realize that the functions I was writing could be used across multiple projects, and often a function I wrote could be incorporated into a larger wrapper function with more functionality. Functions started to use each other to do tasks!
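Here’s a minimal sketch of where this ends up; the function body, path, and column names are illustrative, not the real thing:

library(readr)
library(dplyr)

# read a raw data file and return an analysis-ready data frame
prep_data <- function(path = "path/to/data.csv") {
  read_csv(path) |>
    filter(!is.na(rt), accuracy == 1) |>  # drop missing and incorrect trials
    mutate(log_rt = log(rt))              # add a log-transformed RT column
}

dd <- prep_data()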
By this point, my fingers are tired of typing. No more _, no more capitals. Two lowercase characters are good enough for me. I’d honestly love to use single-character variables; however, this is bad form in R because some single letters are already function names (e.g., t(), c()), and masking these very useful functions invites confusing bugs.
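A quick demo of why (the values are arbitrary); note that R can still find the function when you call it directly, which makes masking bugs sneakier:

t <- 5                      # mask base::t() with a value
m <- matrix(1:4, nrow = 2)

t(m)    # still works: for a call, R skips non-function bindings

f <- t  # but used as a value, t is just 5
f(m)    # Error: attempt to apply non-function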
One of my favorite variables to create is dd.[3] Simple. Easy. Perfectly lazy.
Summary
When you program in a language for a while, you get sick of typing so much.
[1] The cool thing about this is the directionality; you could also write 2 -> variable, or even x <- 2 -> y, in which the value 2 is assigned to both x and y.
[2] https://books.ropensci.org/targets/
[3] To be perfectly honest, I was also heavily inspired by Simon N. Wood’s book Generalized Additive Models: An Introduction with R, Second Edition. Wood’s code is also very custom-function heavy, with one- to two-character variables that I found so refreshing.