R Helper Functions

If you do a lot of R programming, you probably have a list of R helper functions set aside in a script that you include on R startup or at the top of your code. In some cases helper functions add capabilities that aren’t otherwise available. In other cases, they replicate functionality that is available elsewhere without loading unnecessary components. Below I present two of my most frequently used data manipulation helper functions as examples.

### Descriptives R Helpler Function
# Display some basic descriptives
descs <- function (x) {
  if(!hidetables) {
    if(length(unique(x))>30) {
      print('Summary results:')
      print(summary(x))
      print('')
      print('Number of categories is greater than 30, table not produced')
    } else {
      print('Summary results:')
      print(summary(x))
      print('')
      print('Table results:')
      print(table(x, useNA='always'))
    }
   
  } else {
    print('Tables are hidden')
  }
}

# Set hide tables to true to hide tables
hidetables <- FALSE


### Dummy Variable R Helper Function
# Create dummy variables for each level of a categorical variable
createDummies <- function(x, df, keepNAs = TRUE) {
  for (i in seq(1, length(unique(df[, x])))) {
    if(keepNAs) {
      df[, paste(x,'.', i, sep = '')] <- ifelse(df[, x] != i, 0, 1)
    } else {
      df[, paste(x,'.', i, sep = '')] <- ifelse(df[, x] != i | is.na(df[, x]) , 0, 1)     
    }
  }
  df
}

The Descriptives R Helper Function produces a summary or table of the passed variable/object; it uses the number of unique values to determine whether to call just the summary() or summary() and table() functions. It also includes NAs by default in the tables (one of table()‘s biggest annoyances). Once the exploratory and data manipulation work is done, all output from this function can be suppressed by setting the hidetables object to TRUE.

The Dummy Variable R Helper Function creates indicator variables from all values of a variable. Based on experience, I avoid the factor object as much as possible and this approach allows me to quickly create indicators that can be used in any way I want.

If you don’t have a programming background or are just beginning with R, you might not have had time to realize the benefit of helper functions or identify the tasks you do repetitively, but it’s worthwhile to give the issue some consideration. Helper functions can be exceptionally useful for saving time on repetitive tasks or facilitating your work. They’re so useful in fact, that there is a special ProgrammingR event planned around helper functions coming soon. For the more experienced R programmers out there, make a mental note of the most useful helper functions you’ve written in the past. That list will come in handy in the near future!

Progress bars in R using winProgressBar

Using progress bars in R scripts can provide valuable timing feedback during development and additional polish to final products. winProgressBar and setWinProgressBar are the primary functions for creating progress bars in R.

Progress bars, and progress indicators in general, are relatively uncommon in R programming. This makes sense, as they can add bloat and, being design elements, they generally fall into the classification of “nice but not necessary”. However, during development, especially when using loops, progress bars can a cleaner way of tracking loop progress than, for example, printing iteration numbers. And for programmers who prepare scripts or packages for non-programmers, they add feedback that users have come to expect from other software.

To add progress bars in R scripts use the winProgressBar and setWinProgressBar functions. For non-Windows users there’s also a very similar Tcl/Tk version (tkProgressBar and settkProgressbar); depending on your current set up, you may need to install the tcltk library to use it.

Setting up a progress indicator to track the progress of a loop is very straightforward. First, initialize the display:

pb <- winProgressBar(title="Example progress bar", label="0% done", min=0, max=100, initial=0)

Use the title and label options to set the style of the display. The min and max options should use whatever values are most applicable to your task, but in most cases this will be displaying a percentage so a range of 0 to 100, starting at 0, makes sense.

After initializing the progress indicator, add your loop code and update the progress bar by calling the setWinProgressBar function at the end of each loop:

for(i in 1:100) {
  Sys.sleep(0.1) # slow down the code for illustration purposes
  info <- sprintf("%d%% done", round((i/100)*100))
  setWinProgressBar(pb, i/(100)*100, label=info)
}

# Once the loop is exited, close the progress bar window:

close(pb)

Here is a snapshot of the progress indicator in action:

Windows Progress Bar Example

To track the progress of an entire script or program, you just need to place the initializing function (winProgressBar) at the beginning of the code and do updates via setWinProgressBar at key points within the flow of your program. For example, if the first task your code performs usually takes about 10% of the total time required, set the value to 10% after that task completes.

If a popup progress bar doesn’t work for your task, there is a minimalist option, txtProgressBar, that by default draws a line in the console. It also allows for some fun customizations:

Text Progress Bar Example