Subset Data in R

How to Subset Data in R

This page shows how to subset data in R. The subset function filters the records in a data frame based on the criteria you set. The examples below show how subset works.

The first example selects the records in the data frame StudentData where Grade is 3 and copies those records back into the same data frame. Note that the Grade field is numeric.

# Keep only the Grade 3 records, overwriting StudentData
StudentData <- subset(StudentData, Grade == 3)

The problem with the above is that all the records where Grade is not 3 have been lost. In the next example, R selects the records from the data frame StudentData where Grade is 3 and copies those records to a new data frame, Grade3Data, leaving StudentData unchanged so that all of its records are preserved for later use.

# Copy the Grade 3 records to a new data frame
Grade3Data <- subset(StudentData, Grade == 3)
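To try this without your own data, here is a minimal sketch. The sample StudentData data frame, its StudentID column, and all of its values are made up for illustration; only the column names SchoolName and Grade come from the examples on this page.

```r
# Hypothetical sample data; all values are invented for illustration
StudentData <- data.frame(
  StudentID  = 1:6,
  SchoolName = c("Pine Tree Elementary", "Pine Tree Elementary",
                 "Oak Hill Elementary", "Oak Hill Elementary",
                 "Pine Tree Elementary", "Oak Hill Elementary"),
  Grade      = c(3, 4, 3, 5, 2, 3),
  stringsAsFactors = FALSE
)

# Copy the Grade 3 records to a new data frame
Grade3Data <- subset(StudentData, Grade == 3)

nrow(Grade3Data)   # number of Grade 3 records
nrow(StudentData)  # the original data frame is unchanged
```

With this sample data, Grade3Data holds 3 records while StudentData still holds all 6.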

The next example shows that the value in the criteria must be surrounded by quotes if the subset is based on a text field. This example selects the records in the data frame StudentData where SchoolName is Pine Tree Elementary and copies those records to a new data frame, PineTreeData.

# Subset on a text field; the value must be quoted
PineTreeData <- subset(StudentData, SchoolName == "Pine Tree Elementary")
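A small sketch of the text comparison follows, again using made-up sample data (the StudentID column and all values are assumptions). It also shows that == on text is an exact, case-sensitive match, so a differently capitalized school name selects nothing.

```r
# Hypothetical sample data; all values are invented for illustration
StudentData <- data.frame(
  StudentID  = 1:6,
  SchoolName = c("Pine Tree Elementary", "Pine Tree Elementary",
                 "Oak Hill Elementary", "Oak Hill Elementary",
                 "Pine Tree Elementary", "Oak Hill Elementary"),
  Grade      = c(3, 4, 3, 5, 2, 3),
  stringsAsFactors = FALSE
)

# Exact match on the quoted text value
PineTreeData <- subset(StudentData, SchoolName == "Pine Tree Elementary")

# Same name in lower case: no records match, because == is case sensitive
LowerCaseData <- subset(StudentData, SchoolName == "pine tree elementary")

nrow(PineTreeData)
nrow(LowerCaseData)
```

With this sample data, PineTreeData holds 3 records and LowerCaseData holds none.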

The next subset example shows how subset works with multiple criteria. In this case, records from StudentData where SchoolName is Pine Tree Elementary and Grade is 3 are copied to PineTreeGrade3Data. Note that AND in R is the ampersand &.

# Both criteria must be true (AND)
PineTreeGrade3Data <- subset(StudentData, SchoolName == "Pine Tree Elementary" & Grade == 3)

Of course, you can combine the two criteria with OR instead of AND. Note that OR in R is the vertical bar |.

# Either criterion may be true (OR)
PineTreeOrGrade3Data <- subset(StudentData, SchoolName == "Pine Tree Elementary" | Grade == 3)
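To see the difference between AND and OR side by side, here is a rough sketch on made-up sample data. The StudentID column, the data values, and the result names AndData and OrData are assumptions chosen for illustration.

```r
# Hypothetical sample data; all values are invented for illustration
StudentData <- data.frame(
  StudentID  = 1:6,
  SchoolName = c("Pine Tree Elementary", "Pine Tree Elementary",
                 "Oak Hill Elementary", "Oak Hill Elementary",
                 "Pine Tree Elementary", "Oak Hill Elementary"),
  Grade      = c(3, 4, 3, 5, 2, 3),
  stringsAsFactors = FALSE
)

# AND: a record must satisfy both criteria
AndData <- subset(StudentData, SchoolName == "Pine Tree Elementary" & Grade == 3)

# OR: a record needs to satisfy only one of the criteria
OrData <- subset(StudentData, SchoolName == "Pine Tree Elementary" | Grade == 3)

nrow(AndData)
nrow(OrData)
```

With this sample data, AND keeps 1 record and OR keeps 5, since OR also keeps Pine Tree records from other grades and Grade 3 records from other schools.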

And you can use comparison operators such as greater than or less than:

# Keep records where Grade is 3 or above
Grade3orAboveData <- subset(StudentData, Grade >= 3)

Or you could keep only the records with missing Grade by using the is.na function. Note that you cannot use Grade==NA, because any comparison with NA returns NA rather than TRUE or FALSE, and subset treats NA as FALSE, so no records would be selected.

# Keep records where Grade is missing
MissingGradeData <- subset(StudentData, is.na(Grade))

Or you could keep only the records without missing Grade by using the is.na function and the exclamation point, which means NOT in R. Note that you cannot use Grade!=NA for the same reason: the comparison returns NA for every record.

# Keep records where Grade is not missing
NonMissingGradeData <- subset(StudentData, !is.na(Grade))
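The behavior of NA in comparisons can be checked with a short sketch. The sample data below is made up (the StudentID column, the values, and the result names are assumptions), with two Grade values deliberately set to NA.

```r
# Hypothetical sample data with two missing grades
StudentData <- data.frame(
  StudentID = 1:6,
  Grade     = c(3, 4, NA, 5, NA, 2)
)

# Comparing with NA yields NA for every record, so nothing is selected
WrongData <- subset(StudentData, Grade == NA)

# is.na() tests for missing values directly
MissingGradeData    <- subset(StudentData, is.na(Grade))
NonMissingGradeData <- subset(StudentData, !is.na(Grade))

# Ordinary comparisons also drop records where Grade is NA
Grade3orAboveData <- subset(StudentData, Grade >= 3)

nrow(WrongData)
nrow(MissingGradeData)
nrow(NonMissingGradeData)
nrow(Grade3orAboveData)
```

With this sample data, the Grade == NA version returns no records, is.na keeps 2, !is.na keeps 4, and Grade >= 3 keeps 3 because the records with missing Grade are dropped along with those below 3.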

There are other options that can be used with subset. See the official R manual page on subset to learn more: https://stat.ethz.ch/R-manual/R-devel/library/base/html/subset.html.
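One of those options is the select argument, which chooses the columns to keep in addition to filtering the records. The sketch below uses made-up sample data (the StudentID column, the values, and the result names are assumptions for illustration).

```r
# Hypothetical sample data; all values are invented for illustration
StudentData <- data.frame(
  StudentID  = 1:6,
  SchoolName = c("Pine Tree Elementary", "Pine Tree Elementary",
                 "Oak Hill Elementary", "Oak Hill Elementary",
                 "Pine Tree Elementary", "Oak Hill Elementary"),
  Grade      = c(3, 4, 3, 5, 2, 3),
  stringsAsFactors = FALSE
)

# Keep records where Grade is 3 or above, and only two of the columns
GradeOnlyData <- subset(StudentData, Grade >= 3, select = c(StudentID, Grade))

# A minus sign drops a column instead of keeping it
NoSchoolData <- subset(StudentData, Grade >= 3, select = -SchoolName)

names(GradeOnlyData)
names(NoSchoolData)
```

Both results here contain the StudentID and Grade columns for the 5 records with Grade of 3 or above.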

Practice

To practice subsetting data in R, try the exercises in this data manipulation tutorial.

Thanks for reading! This website took a great deal of time to create. If it was helpful to you, please show it by sharing with friends, liking, or tweeting! If you have any thoughts regarding this R code please post in the comments.

JM
