How to Connect to Database in R
This page will show you how to connect to database in R and return data. This requires the package RODBC.
Notes:
All of the examples on this page show how to connect to an Oracle database using SQL. The SQL is within the double quotes and the R code is outside of the double quotes.
You do not need to establish the connection to the database each time you query it. You can connect once and then continue to use that connection every time you want to query the database. I repeat creating the connection in each example below so that the examples stand on their own.
I’ve found that using the argument believeNRows=FALSE when creating the connection is necessary when using Windows 7 but not Windows XP. I haven’t been able to test using other operating systems. You may not need this. Without it I am able to connect but get the following error when querying the database.
Error in .Call(C_RODBCFetchRows, attr(channel, "handle_ptr"), max, buffsize, : negative length vectors are not allowed
First Example: Return All Data from One Table
The first example shows how to connect to database in R and queries the database DATABASE and returns all of the data (this is specified using the * in SQL) from the table DATATABLE. The table is preceded by the database schema SCHEMA and separated by a period. Each of the words in all caps needs within the query needs to be replaced so that the query applies to your database.
# Load RODBC package library(RODBC) # Create a connection to the database called "channel" # If you are using operating system authentication (the computer already knows who you # are because you are logged into it) you can leave out the uid="USERNAME", part. channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE) # Check that connection is working (Optional) odbcGetInfo(channel) # Find out what tables are available (Optional) Tables <- sqlTables(channel, schema="SCHEMA") # Query the database and put the results into the data frame "dataframe" dataframe <- sqlQuery(channel, " SELECT * FROM SCHEMA.DATATABLE")
Second Example: Return Only Specific Fields
The second example shows how to connect to database in R and query the database DATABASE and pull only the specified fields from the table DATATABLE.
Note that loading the RODBC package and creating a connection does not have to be repeated if they were done in the first example. I include them in later examples so that each example is complete.
# Load RODBC package library(RODBC) # Create a connection to the database called "channel" channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE) # Find out what fields are available in the table (Optional) # as.data.frame coerces the data into a data frame for easy viewing Columns <- as.data.frame(colnames(sqlFetch(channel, "SCHEMA.DATATABLE"))) # Query the database and put the results into the data frame "dataframe" dataframe <- sqlQuery(channel, " SELECT SCHOOL, STUDENT_NAME FROM SCHEMA.DATATABLE")
Third Example: Return Only Specific Fields and Records
The third example shows how to connect to database in R and query the database DATABASE and pull only the specified fields from the table DATATABLE, excluding records that don’t meet the criteria specified (SCHOOL_YEAR=’2011-12′).
Note that the field that is being used to filter records does not have to be included in the returned results.
# Load RODBC package library(RODBC) # Create a connection to the database called "channel" channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE) # Query the database and put the results into the data frame "dataframe" dataframe <- sqlQuery(channel, " SELECT SCHOOL, STUDENT_NAME FROM SCHEMA.DATATABLE WHERE SCHOOL_YEAR='2011-12'")
Fourth Example: Joining Two Tables and Returning Only Specific Fields and Records
The fourth example queries the database DATABASE and pulls only the specified fields from the tables DATATABLE and DATATABLE_TWO, excluding records that don’t meet the criteria specified (SCHOOL_YEAR=’2011-12′ and PARENT_AGENCY_NAME=’Pine Tree District’).
Note that this example also renames DATATABLE to DT and DATATABLE_TWO to DTTWO. Renaming isn’t necessary but is often used in SQL queries.
Note also that this example demonstrates renaming of fields within the SQL query using “as”.
# Load RODBC package library(RODBC) # Create a connection to the database called "channel" channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE) # Query the database and put the results into the data frame "dataframe" dataframe <- sqlQuery(channel, " SELECT DT.SCHOOL_YEAR, DTTWO.DISTRICT_NAME AS DISTRICT, DTTWO.SCHOOL_NAME AS SCHOOL, DT.GRADE_LEVEL AS GRADE, DT.ACTL_ATT_DAYS AS ACTUAL_DAYS, DT.POSS_ATT_DAYS AS POSSIBLE_DAYS FROM (SCHEMA.DATATABLE DT INNER JOIN SCHEMA.DATATABLE_TWO DTTWO ON (DT.SCHOOL_YEAR = DTTWO.SCHOOL_YEAR AND DT.SCHOOL_NUMBER = DTTWO.SCHOOL_CODE)) WHERE DT.SCHOOL_YEAR = '2011-12' AND DTTWO.SCHOOL_NAME = 'Pine Tree Elementary School'")
Fifth Example: Using a Parameter from R to Return Only Specific Records
The fifth example queries the database DATABASE and pulls only the specified fields from the table DATATABLE, excluding records that don’t meet the criteria specified using the parameter YEAR.
Note that the below must specify that sep=”” or it will return nothing. That is because the default separator for paste is a space, meaning that your SQL query would be looking for records where SCHOOL_YEAR=” 2010-11 “.
# Load RODBC package library(RODBC) # Create a connection to the database called "channel" channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE) # Parameter YEAR <- "2010-11" # Query the database and put the results into the data frame "dataframe" dataframe <- sqlQuery(channel, paste(" SELECT YEAR, SCHOOL_YEAR, DISTRICT_CODE, GRADE_LEVEL FROM SCHEMA.DATATABLE WHERE ((SCHEMA.DATATABLE.SCHOOL_YEAR='", YEAR, "')) ", sep="" ))
Sixth Example: Using a Parameter from R with More Than One Value
The sixth example shows how to use R to query a database when you want to return rows that match any one of multiple values in a column. For example, if you have a table that contains rows from many years and want to return all rows that are from 2012, 2013, and 2014.
# Load RODBC package library(RODBC) # Create a connection to the database called "channel" channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE) # Parameter YEARS <- c("2012", "2013", "2014") # Query the database and put the results into the data frame "dataframe" dataframe <- sqlQuery(channel, paste("SELECT YEAR, SCHOOL_YEAR, DISTRICT_CODE, GRADE_LEVEL FROM SCHEMA.DATATABLE WHERE SCHEMA.DATATABLE.SCHOOL_YEAR IN ('", paste(YEARS, collapse = "', '"), "') ", sep=""))
There are other options that can be used with RODBC. See official R-manual page on read.csv to learn more: http://cran.r-project.org/web/packages/RODBC/index.html.
Practice
To practice getting data from a database with R, try the exercises in this importing data into R tutorial.
Thanks for reading! This website took a great deal of time to create. If it was helpful to you, please show it by sharing with friends, liking, or tweeting! If you have any thoughts regarding this R code please post in the comments.
I get below error when I execute the statement you gave to connect. I have also created DSN in User DSN and System DSN on my PC. I am using Windows 7
channel <- odbcConnect("connection_name", uid="username", pwd="password", believeNRows=FALSE)
Warning messages:
1: In odbcDriverConnect("DSN=connection_name;UID=username;PWD=password", :
[RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
2: In odbcDriverConnect("DSN=connection_name;UID=username;PWD=password", :
ODBC connection failed
I’m sorry, I don’t have a good answer for this one. I suspect there is still a problem with the DSN.
Note that I replaced the example connection, username and password. I couldn’t tell if those were real or not.
FYI, you need to remove the “,” after STUDENT_NAME in your second example. Thanks for the documentation though.
You’re right! I fixed it. Thanks!
Thanks!
You’re welcome!
How do you connect to RSTUDIO by using OS Authentication with no user name or password using RODBC?
Thanks
If you are using operating system authentication (the computer already knows who you are because you are logged into it) you can leave out the ‘uid=”USERNAME”,’ part. I’m haven’t worked on a system where a password wasn’t required though. Have you tried leaving out both the uid and pwd arguments?
I am running this R code with my data which has 4 columns, but only two are used. I am getting some errors that cannot fix, I will appreciate any help.
Thanks
countToTpm <- function(counts, effLen)
{
rate <- log(counts) – log(effLen)
denom <- log(sum(exp(rate)))
exp(rate – denom + log(1e6))
}
countToFpkm <- function(counts, effLen)
{
N <- sum(counts)
exp( log(counts) + log(1e9) – log(effLen) – log(N) )
}
fpkmToTpm <- function(fpkm)
{
exp(log(fpkm) – log(sum(fpkm)) + log(1e6))
}
countToEffCounts data1 cnts lens countDf
> # assume a mean(FLD) = 170.71
> countDf$effLength <- countDf$length – 170.71 + 1
Error in `$ countDf$tpm countDf$fpkm with(countDf, all.equal(tpm, fpkmToTpm(fpkm)))
Error in all.equal(tpm, fpkmToTpm(fpkm)) : object ‘tpm’ not found
> countDf$effCounts <- with(countDf, countToEffCounts(count, length, effLength))
Error in countToEffCounts(count, length, effLength) :
object 'count' not found
I’m sorry, I’m not sure how to help with this. You may want to try Stack Overflow