Connect to Database in R

How to Connect to Database in R

This page will show you how to connect to database in R and return data. This requires the package RODBC.

Notes:

All of the examples on this page show how to connect to an Oracle database using SQL. The SQL is within the double quotes and the R code is outside of the double quotes.

You do not need to establish the connection to the database each time you query it. You can connect once and then continue to use that connection every time you want to query the database. I repeat creating the connection in each example below so that the examples stand on their own.

I’ve found that using the argument believeNRows=FALSE when creating the connection is necessary when using Windows 7 but not Windows XP. I haven’t been able to test using other operating systems. You may not need this. Without it I am able to connect but get the following error when querying the database.

Error in .Call(C_RODBCFetchRows, attr(channel, "handle_ptr"), max, buffsize, : 
 negative length vectors are not allowed

First Example: Return All Data from One Table

The first example shows how to connect to database in R and queries the database DATABASE and returns all of the data (this is specified using the * in SQL) from the table DATATABLE. The table is preceded by the database schema SCHEMA and separated by a period. Each of the words in all caps needs within the query needs to be replaced so that the query applies to your database.

# Load RODBC package
library(RODBC)

# Create a connection to the database called "channel"
# If you are using operating system authentication (the computer already knows who you
# are because you are logged into it) you can leave out the uid="USERNAME", part.
channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE)

# Check that connection is working (Optional)
odbcGetInfo(channel)

# Find out what tables are available (Optional)
Tables <- sqlTables(channel, schema="SCHEMA")

# Query the database and put the results into the data frame "dataframe"
 dataframe <- sqlQuery(channel, "
 SELECT *
 FROM
 SCHEMA.DATATABLE")

Second Example: Return Only Specific Fields

The second example shows how to connect to database in R and query the database DATABASE and pull only the specified fields from the table DATATABLE.

Note that loading the RODBC package and creating a connection does not have to be repeated if they were done in the first example. I include them in later examples so that each example is complete.

# Load RODBC package
library(RODBC)

# Create a connection to the database called "channel"
channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE)

# Find out what fields are available in the table (Optional)
# as.data.frame coerces the data into a data frame for easy viewing
Columns <- as.data.frame(colnames(sqlFetch(channel, "SCHEMA.DATATABLE")))

# Query the database and put the results into the data frame "dataframe"
 dataframe <- sqlQuery(channel, "
 SELECT SCHOOL,
 STUDENT_NAME
 FROM
 SCHEMA.DATATABLE")

Third Example: Return Only Specific Fields and Records

The third example shows how to connect to database in R and query the database DATABASE and pull only the specified fields from the table DATATABLE, excluding records that don’t meet the criteria specified (SCHOOL_YEAR=’2011-12′).

Note that the field that is being used to filter records does not have to be included in the returned results.

# Load RODBC package
library(RODBC)

# Create a connection to the database called "channel"
channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE)

# Query the database and put the results into the data frame "dataframe"
dataframe <- sqlQuery(channel, "
 SELECT SCHOOL,
 STUDENT_NAME
 FROM
 SCHEMA.DATATABLE
 WHERE
 SCHOOL_YEAR='2011-12'")

Fourth Example: Joining Two Tables and Returning Only Specific Fields and Records

The fourth example queries the database DATABASE and pulls only the specified fields from the tables DATATABLE and DATATABLE_TWO, excluding records that don’t meet the criteria specified (SCHOOL_YEAR=’2011-12′ and PARENT_AGENCY_NAME=’Pine Tree District’).

Note that this example also renames DATATABLE to DT and DATATABLE_TWO to DTTWO. Renaming isn’t necessary but is often used in SQL queries.

Note also that this example demonstrates renaming of fields within the SQL query using “as”.

# Load RODBC package
library(RODBC)

# Create a connection to the database called "channel"
channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE)

# Query the database and put the results into the data frame "dataframe"
 dataframe <- sqlQuery(channel, "
 SELECT
 DT.SCHOOL_YEAR,
 DTTWO.DISTRICT_NAME AS DISTRICT,
 DTTWO.SCHOOL_NAME AS SCHOOL,
 DT.GRADE_LEVEL AS GRADE,
 DT.ACTL_ATT_DAYS AS ACTUAL_DAYS,
 DT.POSS_ATT_DAYS AS POSSIBLE_DAYS
 FROM
 (SCHEMA.DATATABLE DT INNER JOIN SCHEMA.DATATABLE_TWO DTTWO
 ON (DT.SCHOOL_YEAR = DTTWO.SCHOOL_YEAR AND
 DT.SCHOOL_NUMBER = DTTWO.SCHOOL_CODE))
 WHERE
 DT.SCHOOL_YEAR = '2011-12' AND
 DTTWO.SCHOOL_NAME = 'Pine Tree Elementary School'")

Fifth Example: Using a Parameter from R to Return Only Specific Records

The fifth example queries the database DATABASE and pulls only the specified fields from the table DATATABLE, excluding records that don’t meet the criteria specified using the parameter YEAR.

Note that the below must specify that sep=”” or it will return nothing. That is because the default separator for paste is a space, meaning that your SQL query would be looking for records where SCHOOL_YEAR=” 2010-11 “.

# Load RODBC package
library(RODBC)

# Create a connection to the database called "channel"
channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE)

# Parameter
YEAR <- "2010-11"

# Query the database and put the results into the data frame "dataframe"
 dataframe <- sqlQuery(channel, paste("
 SELECT
 YEAR,
 SCHOOL_YEAR,
 DISTRICT_CODE,
 GRADE_LEVEL
 FROM SCHEMA.DATATABLE
 WHERE
 ((SCHEMA.DATATABLE.SCHOOL_YEAR='", YEAR, "'))
 ", sep=""
 ))

Sixth Example: Using a Parameter from R with More Than One Value

The sixth example shows how to use R to query a database when you want to return rows that match any one of multiple values in a column. For example, if you have a table that contains rows from many years and want to return all rows that are from 2012, 2013, and 2014.

# Load RODBC package
library(RODBC)

# Create a connection to the database called "channel"
channel <- odbcConnect("DATABASE", uid="USERNAME", pwd="PASSWORD", believeNRows=FALSE)

# Parameter
YEARS <- c("2012", "2013", "2014")

# Query the database and put the results into the data frame "dataframe"
dataframe <- sqlQuery(channel, paste("SELECT 
 YEAR,
 SCHOOL_YEAR,
 DISTRICT_CODE,
 GRADE_LEVEL
 FROM SCHEMA.DATATABLE 
 WHERE SCHEMA.DATATABLE.SCHOOL_YEAR IN ('", paste(YEARS, collapse = "', '"), "') 
 ", sep=""))

There are other options that can be used with RODBC. See official R-manual page on read.csv to learn more: http://cran.r-project.org/web/packages/RODBC/index.html.

Thanks for reading! This website took a great deal of time to create. If it was helpful to you, please show it by sharing with friends, liking, or tweeting! If you have any thoughts regarding this R code please post in the comments.

JM

10 thoughts on “Connect to Database in R

  1. Sham

    I get below error when I execute the statement you gave to connect. I have also created DSN in User DSN and System DSN on my PC. I am using Windows 7

    channel <- odbcConnect("connection_name", uid="username", pwd="password", believeNRows=FALSE)

    Warning messages:
    1: In odbcDriverConnect("DSN=connection_name;UID=username;PWD=password", :
    [RODBC] ERROR: state IM002, code 0, message [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
    2: In odbcDriverConnect("DSN=connection_name;UID=username;PWD=password", :
    ODBC connection failed

    Reply
    1. Justin@RProgramming.net Post author

      I’m sorry, I don’t have a good answer for this one. I suspect there is still a problem with the DSN.

      Note that I replaced the example connection, username and password. I couldn’t tell if those were real or not.

      Reply
  2. Bosede

    How do you connect to RSTUDIO by using OS Authentication with no user name or password using RODBC?
    Thanks

    Reply
    1. Justin@RProgramming.net Post author

      If you are using operating system authentication (the computer already knows who you are because you are logged into it) you can leave out the ‘uid=”USERNAME”,’ part. I’m haven’t worked on a system where a password wasn’t required though. Have you tried leaving out both the uid and pwd arguments?

      Reply
  3. Humberto Munoz Barona

    I am running this R code with my data which has 4 columns, but only two are used. I am getting some errors that cannot fix, I will appreciate any help.

    Thanks

    countToTpm <- function(counts, effLen)
    {
    rate <- log(counts) – log(effLen)
    denom <- log(sum(exp(rate)))
    exp(rate – denom + log(1e6))
    }

    countToFpkm <- function(counts, effLen)
    {
    N <- sum(counts)
    exp( log(counts) + log(1e9) – log(effLen) – log(N) )
    }

    fpkmToTpm <- function(fpkm)
    {
    exp(log(fpkm) – log(sum(fpkm)) + log(1e6))
    }

    countToEffCounts data1 cnts lens countDf
    > # assume a mean(FLD) = 170.71
    > countDf$effLength <- countDf$length – 170.71 + 1
    Error in `$ countDf$tpm countDf$fpkm with(countDf, all.equal(tpm, fpkmToTpm(fpkm)))
    Error in all.equal(tpm, fpkmToTpm(fpkm)) : object ‘tpm’ not found
    > countDf$effCounts <- with(countDf, countToEffCounts(count, length, effLength))
    Error in countToEffCounts(count, length, effLength) :
    object 'count' not found

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *