How to Recode Data in R
This page will show you how to recode data in R by either replacing data in an existing field or recoding into a new field based on criteria you specify. This page first addresses how to recode in base R. If you’re looking for information on the recode() command in the package car, scroll to the bottom.
Replace Data in an Existing Field in R
The first example shows how to replace the data in an existing field when you want to replace the data for every row (no criteria). This code replaces any data that is already in the field Grade in the data frame SchoolData with the number 5, the text string five, or NA.
# Replace all the data in a field with a number
SchoolData$Grade <- 5
# Replace all the data in a field with text
SchoolData$Grade <- "Five"
# Replace all the data in a field with NA (missing data)
SchoolData$Grade <- NA
The second example shows how to apply criteria so that only data in specific rows is replaced. Note that if you want to replace NA with some value you cannot use ==NA. You must use is.na(). See below for an example.
# Replace the data in a field based on equal to some value
SchoolData$Grade[SchoolData$Grade==5] <- "Grade Five"
# Or replace based on less than or equal to some value
SchoolData$Grade[SchoolData$Grade<=5] <- "Grade Five or Less"
# Or replace based on equal to some text
SchoolData$Grade[SchoolData$Grade=="Five"] <- "Grade Five"
# Or replace only missing data
# Note that ==NA does not work!
SchoolData$Grade[is.na(SchoolData$Grade)] <- "Missing Grade"
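To see why is.na() is needed, here is a minimal sketch using a small made-up data frame (the name demo and its values are assumptions for illustration, not part of SchoolData). Comparing anything to NA with == returns NA rather than TRUE or FALSE, so it cannot pick out the missing rows.

```r
# Hypothetical practice data; not from the original SchoolData
demo <- data.frame(Grade = c(5, NA, 3, 5))

# == NA returns NA for every row, so it cannot identify the missing rows
na_compare <- demo$Grade == NA

# is.na() returns TRUE only where the value is missing
na_check <- is.na(demo$Grade)

# Replace only the missing grades with 0
demo$Grade[na_check] <- 0
```

Storing the logical vector first (na_check) is optional, but it makes it easy to inspect which rows will be changed before you overwrite anything.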
The third example shows how to replace data based on more than one criterion. This code creates a new field SchoolType and enters “Elementary School” into it for all rows where Grade is less than or equal to 5 and SchoolStatus is OPEN.
# Replace data based on the values in more than one field
# Both fields must be referenced through the data frame (SchoolData$)
SchoolData$SchoolType[SchoolData$Grade<=5 & SchoolData$SchoolStatus=="OPEN"] <- "Elementary School"
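The following is a small self-contained sketch of the same idea, assuming a made-up data frame called demo with invented Grade and SchoolStatus values. It creates the new column first so that rows that fail the criteria clearly hold NA.

```r
# Hypothetical practice data for illustration
demo <- data.frame(
  Grade = c(3, 5, 8, 4),
  SchoolStatus = c("OPEN", "CLOSED", "OPEN", "OPEN"),
  stringsAsFactors = FALSE
)

# Rows must satisfy both conditions; & combines them row by row
is_open_elementary <- demo$Grade <= 5 & demo$SchoolStatus == "OPEN"

# Create the column first so every row starts as NA
demo$SchoolType <- NA

# Fill in only the rows that meet both criteria
demo$SchoolType[is_open_elementary] <- "Elementary School"
```

Only the first and last rows meet both criteria here; the other two rows keep NA in SchoolType.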
Recode into a New Field
The fourth example shows how to make a copy of an existing field. Sometimes you don’t want to recode data but instead just want another column containing the same data. This example makes a new column called CopyOfGrade and fills it with the data from Grade. This isn’t exactly recoding but is related and comes up a lot since it is usually a good idea to make a copy of a field and then do the recoding on the copy rather than on the original.
# Copy a column in R
# First create the new column
SchoolData$CopyOfGrade <- NA
# Then copy the data from the existing column into the new one.
SchoolData$CopyOfGrade <- SchoolData$Grade
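As a quick illustration of why recoding a copy is safer, here is a sketch with a made-up data frame (demo and its values are assumptions). The assignment on its own is enough to create the new column, and changes to the copy leave the original untouched.

```r
# Hypothetical practice data for illustration
demo <- data.frame(Grade = c(4, 5, 6))

# Copy the column; the assignment creates CopyOfGrade if it does not exist
demo$CopyOfGrade <- demo$Grade

# Recode the copy only
demo$CopyOfGrade[demo$CopyOfGrade == 5] <- 50

# Grade still holds 4, 5, 6; CopyOfGrade now holds 4, 50, 6
```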
The fifth example shows how to recode data into a new numeric field based on criteria from a numeric field. Note that with numeric fields you do not surround the value with quotation marks. With a character field you do surround the value with quotation marks (next example).
This example creates a new field called NewGrade based on the field Grade. Note that, as with the above examples, you can again use & or any of the other operators to produce the criteria you want.
# Recode into a new field in R
# First create the new field
SchoolData$NewGrade <- NA
# Then recode the old field into the new one for the specified rows
SchoolData$NewGrade[SchoolData$Grade==5] <- 5
The sixth example shows how to recode data into a new character field based on criteria from a numeric field. This example again creates a new field called NewGrade based on the field Grade.
# Recode into a new field in R
# First create the new field
SchoolData$NewGrade <- NA
# Then recode the old field into the new one for the specified rows
SchoolData$NewGrade[SchoolData$Grade==5] <- "Grade Five"
The seventh example shows how to recode data into a new character field based on criteria from a character field. This example creates a new field called NewGrade based on the field Grade.
# Recode into a new field in R
# First create the new field
SchoolData$NewGrade <- NA
# Then recode the old field into the new one for the specified rows
SchoolData$NewGrade[SchoolData$Grade=="Grade Five"] <- "Grade Five"
The eighth example shows how to recode data into a new numeric field based on criteria from a character field. This example again creates a new field called NewGrade based on the field Grade.
# Recode into a new field in R
# First create the new field
SchoolData$NewGrade <- NA
# Then recode the old field into the new one for the specified rows
SchoolData$NewGrade[SchoolData$Grade=="Grade Five"] <- 5
Recode into a New Field Using Data from an Existing Field and Criteria from Another Field
This is where things get a little weird. If you want to recode data into a field and pull that data from another field, you have to specify the criteria on both sides of the <-. If you don’t, R will still recode but you won’t get the results you’re expecting. For example, let’s say you want to copy the data from Grade into NewGrade but only where SchoolType is “Elementary”. You might think that this will work:
# Recode into a new field in R
# First create the new field
SchoolData$NewGrade <- NA
# Then recode the old field into the new one for the specified rows
SchoolData$NewGrade[SchoolData$SchoolType=="Elementary"] <- SchoolData$Grade
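And it will run! But you won’t get the results you are expecting. R won’t copy the data from Grade for only the rows where SchoolType is Elementary. Instead, it takes values from the top of Grade, in order, and drops them into the selected rows, so values end up in the wrong rows (typically with a warning that the number of items to replace is not a multiple of replacement length). To recode correctly you have to specify the criteria on both sides of the <-, as in example nine: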
# Recode into a new field in R
# First create the new field
SchoolData$NewGrade <- NA
# Then recode the old field into the new one for the specified rows
SchoolData$NewGrade[SchoolData$SchoolType=="Elementary"] <- SchoolData$Grade[SchoolData$SchoolType=="Elementary"]
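The difference between the two versions is easiest to see side by side. This sketch uses a made-up four-row data frame (demo, Wrong, and Right are hypothetical names chosen for illustration) and wraps the one-sided version in suppressWarnings() only to keep the output quiet.

```r
# Hypothetical practice data for illustration
demo <- data.frame(
  Grade = c(9, 2, 10, 4),
  SchoolType = c("High", "Elementary", "High", "Elementary"),
  stringsAsFactors = FALSE
)
is_elem <- demo$SchoolType == "Elementary"

# Criteria on the left side only: the two selected rows receive the
# first two values of Grade (9 and 2), not their own grades
demo$Wrong <- NA
suppressWarnings(demo$Wrong[is_elem] <- demo$Grade)

# Criteria on both sides: each selected row receives its own grade (2 and 4)
demo$Right <- NA
demo$Right[is_elem] <- demo$Grade[is_elem]
```

Rows 2 and 4 are the Elementary rows. In Wrong they hold 9 and 2; in Right they hold 2 and 4, which matches Grade.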
The Recode Command From the Package Car
The recode() command from the car package is another great way to recode data in R. Recode from car can be very powerful and is a good alternative to the code above.
If you want to recode from car you have to first install the car package and then load it for use.
# Install the car package
install.packages("car")
# Load the car package
library(car)
Now recode Grade from 5 to 6:
# Recode grade 5 to grade 6
SchoolData$Grade<-recode(SchoolData$Grade,"5=6")
If you want to recode based on text, put single quotes (') around the text.
Now recode Grade from the text Grade Five to the number 5:
# Recode the text 'Grade Five' to the number 5
SchoolData$Grade<-recode(SchoolData$Grade,"'Grade Five'=5")
To recode several values to the same new value, use c():
# Recode grades 1 through 5 to 'Five or Less'
SchoolData$Grade<-recode(SchoolData$Grade,"c(1,2,3,4,5)='Five or Less'")
Recode can also write its result into a new field. This code creates a new field called NewGrade based on Grade. Note that any value you don’t mention in the recode specification is simply copied into the new field unchanged.
# Create a new field called NewGrade
SchoolData$NewGrade <- recode(SchoolData$Grade,"5='Elementary'")
Of course, you can convert a value to NA, or NA to a value.
# Recode grade 3 to NA
SchoolData$Grade<-recode(SchoolData$Grade,"3=NA")
# Or recode NA to 7
SchoolData$Grade <- recode(SchoolData$Grade,"NA=7")
One advantage to recode is that it can recode multiple values in one line of code.
# Recode grade 5 to grade 6 and grade 6 to grade 7
SchoolData$Grade<-recode(SchoolData$Grade,"5=6;6=7")
Another advantage to recode is that it makes using ranges easy.
# Recode grades 1 through 5 to Elementary
SchoolData$Grade<-recode(SchoolData$Grade,"1:5='Elementary'")
One more advantage to recode is that it supports the keywords lo and hi when specifying a range. Lo tells recode to start the range at the lowest value in the data. Hi tells recode to end the range at the highest value in the data.
# Recode the lowest grade through 5 to Elementary
SchoolData$Grade<-recode(SchoolData$Grade,"lo:5='Elementary'")
# Recode grade 9 to the highest grade to High School
SchoolData$Grade<-recode(SchoolData$Grade,"9:hi='High School'")
A final advantage to recode is that it supports the keyword else, which specifies what to do with any value that was not already recoded. The following converts grades 1 through 5 to Elementary, 6 through 8 to Middle, and all other grades (including NA) to High.
# Recode grades
SchoolData$Grade<-recode(SchoolData$Grade,"1:5='Elementary';6:8='Middle';else='High'")
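For comparison, the same three-way split can be sketched with the base R indexing from earlier on this page. This is a hand-written equivalent, not how recode() works internally, and it assumes (as described above) that every unmatched value, including NA, should become High. The vector names grade and level are made up for illustration.

```r
# Hypothetical practice data for illustration
grade <- c(2, 7, 11, NA, 5)

# Start with the "else" value for every element, including NA
level <- rep("High", length(grade))

# Overwrite the two ranges; which() drops NA, so missing grades stay "High"
level[which(grade >= 1 & grade <= 5)] <- "Elementary"
level[which(grade >= 6 & grade <= 8)] <- "Middle"
```

The recode() version is shorter, but this form needs no extra package and makes the handling of NA explicit.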
There are other options that can be used with recode in car. See the car package reference manual to learn more: http://cran.r-project.org/web/packages/car/car.pdf.
Recode All Columns in a Data Frame in One Step
The examples I’ve shown you so far allow you to recode one variable at a time. Sometimes you might want to recode all of your variables. You can do this by just repeating your recode command over and over but this can be a lot of typing if you have hundreds of variables.
To show how this works, let’s create some data to practice with. The code below builds a small data frame with two columns, V1 and V2, that contain only 0s and 1s.
# Create practice data with two columns called V1 and V2
data <- as.data.frame(matrix(0, ncol = 2, nrow = 5))
data$V1 <- c(1, 0, 0, 1, 1)
data$V2 <- c(0, 0, 1, 0, 0)
Let’s recode the number 1 in the practice data frame to 777. This code uses both function() and apply() but teaching about these is beyond the scope of this article.
# Recode 1 to 777 wherever 1 appears in the practice data frame
data <- apply(data, 2, function(x) {x[x == 1] <- 777; x})
You can use the recode command from the car library to recode multiple values at the same time across all the variables in the dataset using the above method.
# Create practice data with two columns called V1 and V2
data <- as.data.frame(matrix(0, ncol = 2, nrow = 5))
data$V1 <- c(1, 0, 0, 1, 1)
data$V2 <- c(0, 0, 1, 0, 0)
# Recode 1 to 777 wherever 1 appears in the practice data frame
# Recode 0 to 888 wherever 0 appears in the practice data frame
library(car)
data <- apply(data, 2, function(x) {x <- recode(x,"1=777; 0=888"); x})
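One thing to be aware of is that apply() returns a matrix, not a data frame. If you need the result to stay a data frame, one common base R alternative (a sketch, not the method used in the examples above) is to loop over the columns with lapply() and assign back into data[].

```r
# Practice data with two columns called V1 and V2
data <- data.frame(V1 = c(1, 0, 0, 1, 1), V2 = c(0, 0, 1, 0, 0))

# data[] keeps the data frame structure while each column is replaced
data[] <- lapply(data, function(x) {
  x[x == 1] <- 777  # recode 1 to 777 within this column
  x                 # return the modified column
})
```

After this runs, data is still a data frame, so data$V1 and data$V2 continue to work.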
Practice
To practice recoding data in R, try the exercises in this introduction to R.
Thanks for reading! This website took a great deal of time to create. If it was helpful to you, please show it by sharing with friends, liking, or tweeting! If you have any thoughts regarding this R code please post in the comments.
This was greatly useful to me. Thanks for your efforts!
Your example 10 helped out [Recode into A New Field Using Data From An Existing field And Criteria from Another Field]! Very frustrating when R does not perform as ‘expected’
Happy to help!
I have multiple variables that represent Adverse Event severity that are currently integer variables and can have values 0-6. A score of 0 should actually be NA (which I have been able to do). I need to create a new factor variable where any of the variables scored 3 or greater will be one level and variables scored 1 or 2 will be another level. Do you have any suggestions?
See the examples above. You should be able to do something like:
library(car)
data$adverse_event_new <- recode(data$adverse_event_original,"0=NA; 1:2='level_one'; 3:6='level_two'")
If adverse_event_new doesn't come out as a factor you can use something like:
data$adverse_event_new <- as.factor(data$adverse_event_new)
Good luck!
Hello! I am trying to use the sapply function to rename my variables 0=”false” 1=”positive” in R. However, I am receiving an error message, and I assume it is because I have values that have “NA” for missing. What do you suggest I use when writing my code? Currently, I am using an else and if else statement, but I am still receiving an error message.
Current Code:
tvdlm2 <- data$tvdlm
tvdlm2 <- sapply(tvdlm2, function(y) {if (y!=1) "false" if (y=1) "positive" else "NA"})
I think the problem is that you have y=1. You need y==1.
I tried my above example with one of the values replaced with an NA and I didn’t get an error message:
data <- as.data.frame(matrix(0, ncol = 2, nrow = 5))
data$V1 <- c(1, 0, 0, 1, 1)
data$V2 <- c(0, 0, 1, 0, 0)
data[1,1] <- NA
data <- apply(data, 2, function(x) {x[x == 1] <- 777; x})
data
      V1  V2
[1,]  NA   0
[2,]   0   0
[3,]   0 777
[4,] 777   0
[5,] 777   0
I’m just now learning how to use R after years of using STATA. I was struggling to figure out how to do something as simple as recode a variable, and all of the explanations depended on “ifelse,” which was not exactly what I needed.
Thank you for doing this write up and showing the broader range of options for how we could recode our data! Especially this bit:
# Replace data based on the values in more than one field
SchoolData$SchoolType[SchoolData$Grade<=5 & SchoolStatus=="OPEN"] <- "Elementary School"
Thanks again.
Happy to help!