Reading data from the new version of Google Spreadsheets

Spreadsheets remain an important way for people to share and work with data. Among other providers, Google lets you create online spreadsheets and other documents. Back in 2009, David Smith posted a blog entry on how to use R, and specifically the XML package, to import data from a Google Spreadsheet. Once you had marked your Google sheet as exported, it took about two lines of code to import your data into a data frame.

But things have changed

More recently, Google appears to have changed and improved its Spreadsheet product. Google's own overview lists several of the changes, but one change is missing from that list. In the previous version, you could publish a sheet as a CSV file. In the new version you can still publish a sheet, but no longer as CSV. On April 5, 2014, somebody asked on StackOverflow how to deal with this. Because I needed to import data from a spreadsheet shared in our team, I set out to find an answer.

Quick overview of publishing a sheet in the new version of Google Docs

Publishing a Google Docs spreadsheet takes three simple steps:

1. Create a Google sheet.
2. Publish it to the web.
3. Copy the document link.

R code to read the Google data

Here is the code. It needs the XML package, which the first line loads. The function readGoogleSheet() returns a list of data frames, one for each table found on the Google sheet:

    library(XML)

    # Read a published Google sheet and return a list of data frames,
    # one for each HTML table on the page
    readGoogleSheet <- function(url, na.string = "", header = TRUE) {
      # Read the page as a single string (warn = FALSE silences the
      # "incomplete final line" warning)
      doc <- paste(readLines(url, warn = FALSE), collapse = " ")
      if (nchar(doc) == 0) stop("No content found")
      # Keep only the markup from the first <table to the last </table>
      htmlTable <- gsub("^.*?(<table.*</table).*$", "\\1>", doc, perl = TRUE)
      ret <- readHTMLTable(htmlTable, header = header,
                           stringsAsFactors = FALSE, as.data.frame = TRUE)
      # Replace cells equal to na.string with NA (leaving existing NAs alone)
      lapply(ret, function(x) {
        x[!is.na(x) & x == na.string] <- NA
        x
      })
    }

The original listing also included a second helper that tidies each returned table: it skips a given number of leading rows (skip), drops the second row when it is entirely NA, promotes the first row to column names when header is TRUE, and keeps at most nrows rows.
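To try that tidying logic without a network connection, here is a minimal base-R sketch. The helper name tidyGoogleTable(), its defaults, and the toy table are hypothetical, chosen for illustration; the steps mirror those described for the original listing (empty cells to NA, skipping rows, dropping an all-NA second row, promoting the first row to names, limiting rows), but this is not necessarily the author's exact code.

```r
# Hypothetical helper: the name and defaults are assumptions, modelled on
# the cleanup steps described for the original listing.
tidyGoogleTable <- function(dat, skip = 0, header = TRUE, nrows = -1) {
  # Skip the first `skip` rows
  if (skip > 0) dat <- dat[-seq_len(skip), , drop = FALSE]
  # Empty or single-row tables need no further cleanup
  if (nrow(dat) <= 1) return(dat)
  # Drop the second row when every cell in it is NA
  if (all(is.na(dat[2, ]))) dat <- dat[-2, , drop = FALSE]
  # Promote the first row to column names
  if (header && nrow(dat) > 1) {
    names(dat) <- unlist(dat[1, ], use.names = FALSE)
    dat <- dat[-1, , drop = FALSE]
  }
  # Keep at most `nrows` rows when nrows is positive
  if (nrows > 0) dat <- dat[seq_len(min(nrows, nrow(dat))), , drop = FALSE]
  # Renumber rows from 1
  rownames(dat) <- NULL
  dat
}

# Hypothetical toy table standing in for one element of the list
# returned by readGoogleSheet(url, header = FALSE)
raw <- data.frame(
  V1 = c("name", "", "alpha", "beta", "gamma"),
  V2 = c("value", "", "1", "2", "3"),
  stringsAsFactors = FALSE
)
# The same NA conversion readGoogleSheet() applies, with na.string = ""
raw[!is.na(raw) & raw == ""] <- NA
tidy <- tidyGoogleTable(raw, nrows = 2)
print(tidy)
```

Here the result has columns name and value and two rows. The columns are still character vectors, so numeric columns would need converting afterwards, for example with type.convert().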

More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire) overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.