By David Smith | February 20, 2013 07:17 PM EST
This guest post is by Tammer Kamel, founder of Quandl.
Finding and formatting numerical data for analysis in R, Excel, or indeed any application is a pain that real-world data analysts know all too well. In aggregate I have probably spent weeks of my life trying to find data on the web, and several more weeks validating, formatting and cleaning it. Analysis offers data scientists interesting, intellectually stimulating problems. But data acquisition, the necessary precursor, offers only tedium and pain. It's a time vampire.
The solution to this problem is conceptually obvious: one site with all the world’s data, nicely formatted and documented; an omni-platform. Platforms aspiring to this objective keep appearing and disappearing. They appear because they are great ideas. They disappear because they demand publishers upload and maintain data on an external site. Publishers don’t comply because they have enough work just maintaining the data in their own database, let alone someone else’s.
So if the data won’t come to the platform, the only alternative is for the platform to come to the data. What does that mean? It means that to succeed in building a truly comprehensive data platform, you must ask nothing of data publishers. You have to create a solution that feeds off whatever the publisher is spitting out, regardless of how absurdly the data might be published.
That’s what we're doing at Quandl. We've built a sort of "universal data parser" which has thus far parsed about 2.8 million datasets. We've asked nothing of any data publisher. As long as they spit out data somehow (Excel, text file, blog post, XML, API, etc.), the "Q-bot" can slurp it up.
The result is www.quandl.com, a sort of "search engine" for numerical data. The idea with Quandl is that you can find data fast. And more importantly, once you find it, it is ready to use. This is because Quandl's bot returns data in a totally standard format, which means we can then translate it to any format a user wants.
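To make the "standard format" idea concrete, here is a minimal Python sketch. It is not Quandl's actual parser or internal representation, which this post does not describe. The dictionary layout and the function names (`parse_csv_text`, `parse_json_text`, `to_csv`, `to_json`) are assumptions made up for illustration. The point is only that once every source is normalized to one shape, each export format needs to be written once.

```python
import csv
import io
import json

# Hypothetical "standard format": column names plus rows of (ISO date, value).
# The post does not describe Quandl's real internal representation.

def parse_csv_text(text):
    """Parse 'date,value' CSV text (with a header row) into the standard format."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [(date, float(value)) for date, value in reader]
    return {"columns": header, "rows": rows}

def parse_json_text(text):
    """Parse a JSON list of {"date": ..., "value": ...} objects into the same format."""
    records = json.loads(text)
    rows = [(r["date"], float(r["value"])) for r in records]
    return {"columns": ["date", "value"], "rows": rows}

def to_csv(dataset):
    """Export a standard-format dataset as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(dataset["columns"])
    writer.writerows(dataset["rows"])
    return out.getvalue()

def to_json(dataset):
    """Export a standard-format dataset as a JSON list of records."""
    records = [dict(zip(dataset["columns"], row)) for row in dataset["rows"]]
    return json.dumps(records)

# Two differently published sources end up in the same shape.
from_csv = parse_csv_text("date,value\n2013-01-31,1.5\n2013-02-28,2.0\n")
from_json = parse_json_text(
    '[{"date": "2013-01-31", "value": 1.5}, {"date": "2013-02-28", "value": 2.0}]'
)
```

With this layout, supporting a new output format means adding one exporter, and supporting a new publisher means adding one parser; neither side needs to know about the other.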
Quandl is rich in financial, economic and sociological time series data. The data is easy to find. Every dataset is transparent about its source. Datasets can be easily merged with each other. They can be visualized and shared. It is all open. It is all free. There's much more about our vision on our about page.
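As a rough illustration of why a common format makes merging easy, the Python sketch below joins two time series on their dates. The series names and values are invented for the example, and `merge_series` is a hypothetical helper, not part of any Quandl tool.

```python
# Two hypothetical series keyed by ISO date string; the values are made up.
gdp = {"2012-12-31": 100.0, "2013-01-31": 101.0}
cpi = {"2013-01-31": 2.1, "2013-02-28": 2.3}

def merge_series(named_series):
    """Outer-join several {date: value} series into rows sorted by date.

    Each row is (date, value_1, value_2, ...) in the order the series
    were given; a missing observation is filled with None.
    """
    names = list(named_series)
    all_dates = sorted(set().union(*(s.keys() for s in named_series.values())))
    return [
        (date, *(named_series[name].get(date) for name in names))
        for date in all_dates
    ]

merged = merge_series({"gdp": gdp, "cpi": cpi})
for row in merged:
    print(row)
```

Because ISO dates sort correctly as plain strings, no date parsing is needed here; real data with mixed date formats would need normalizing first.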
From the start, Quandl delivered data in all the standard formats (Excel, CSV, XML, JSON). We're now moving on to deliver data to applications in the exact format those apps demand. We're starting with R, and we've done something simple to begin with. The next step for us is to complete an R package to be made available on CRAN.
In the near future we will be inviting (and indeed encouraging) Quandl users to "drive" the Quandl-bot themselves so that Quandl has the data they personally need. We're working towards building a sort of Wikipedia of numerical data. In the long term we hope to do to certain "closed data dinosaurs" what Jimmy Wales did to Britannica. In the short term, I would be very pleased if we could make Quandl a valuable resource for the R community.
Copyright © 2013 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By David Smith
David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products. David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.