How to select and merge R data frames with SQL

R provides many features for selecting data from data frames: the "[" operator, logical functions, and utility functions like "subset". But if you know SQL (the query language ubiquitous in database systems), you don't need to learn any of them. With the sqldf package, you can treat your data frames as database tables and query them with SQL directly.

The sqldf function brings much of the richness of SQL to data frames held in R's memory. This includes:

- SELECT ... WHERE statements, to select rows and columns according to logical criteria
- CASE clauses, for queries with special cases
- ORDER BY statements, to sort the resulting data according to specified columns
- LEFT JOIN and INNER JOIN statements, for merging data frames

The sqldf package uses its own internal database engine (SQLite by default), so there's no special database configuration you need to do. Just enter the following in R:

    install.packages("sqldf")
    library(sqldf)

and you should be good to go. The SQLDF FAQ is a good resource for getting started, and this sqldf video tutorial from Keystone Solutions shows the sqldf package in action with examples of SQL queries from simple to complex.

Google code: sqldf: SQL select on R data frames
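As a rough illustration of the query types listed above, here is a minimal sketch. The data frames `employees` and `departments`, their columns, and the salary threshold are hypothetical and made up for this example; they do not come from the sqldf documentation or the video tutorial. It assumes the sqldf package is already installed and uses its default database engine.

```r
# Assumes sqldf is installed: install.packages("sqldf")
library(sqldf)

# Hypothetical example data, invented for illustration only.
# Column names avoid dots, which are awkward to reference in SQL.
employees <- data.frame(
  id      = c(1, 2, 3, 4),
  name    = c("Ann", "Bob", "Cara", "Dev"),
  dept_id = c(10, 20, 10, 30),
  salary  = c(52000, 48000, 61000, 45000),
  stringsAsFactors = FALSE
)
departments <- data.frame(
  dept_id = c(10, 20),
  dept    = c("Research", "Sales"),
  stringsAsFactors = FALSE
)

# SELECT ... WHERE with ORDER BY: pick rows and columns, then sort
high_paid <- sqldf("
  SELECT name, salary
  FROM employees
  WHERE salary > 50000
  ORDER BY salary DESC
")

# CASE: derive a new column that handles special cases
banded <- sqldf("
  SELECT name,
         CASE WHEN salary >= 50000 THEN 'high' ELSE 'standard' END AS band
  FROM employees
  ORDER BY name
")

# LEFT JOIN: merge two data frames, keeping every employee
# even when no matching department row exists
merged <- sqldf("
  SELECT e.name, d.dept
  FROM employees e
  LEFT JOIN departments d ON e.dept_id = d.dept_id
  ORDER BY e.id
")

print(high_paid)
print(banded)
print(merged)
```

Each call returns an ordinary data frame. The table names in the SQL are simply the names of data frames in the calling environment, so no explicit import step is needed. In the LEFT JOIN result, an employee whose dept_id has no match in `departments` gets a missing value (NA) for `dept`.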

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.