By Jnan Dash | October 10, 2012 05:16 PM EDT | Reads: 1,881
Hadoop traces its origins to Google, where two early projects, GFS (Google File System) and GMR (Google MapReduce), were written alongside BigTable to manage large volumes of data. These systems are great at crunching large volumes of data in batch mode in a distributed computing environment built from commodity servers. Any change to the data requires streaming over the entire data set, which means high latency. So this approach is good for “Data at Rest,” or static data.
Now Google finds itself limited by its own inventions: GFS, GMR, and BigTable. Hence it has been working on a post-Hadoop set of data-crunching tools – Percolator, Dremel, and Pregel. Here is a brief description of each of these tools.
Percolator is a system for incrementally processing updates to a large data set. Replacing a batch-based indexing system with one based on incremental processing with Percolator significantly speeds up the process and reduces analysis time. Percolator’s architecture provides horizontal scalability and resilience. The best candidates are large indexes, where the performance improvement factor can be as high as 100. The big advantage of Percolator is that indexing time is now proportional to the size of the page being updated, not to the size of the whole index.
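To make the "proportional to the size of the page" point concrete, here is a minimal sketch of an incrementally updated inverted index. It is only an illustration of the idea, not Percolator's actual design: the class `IncrementalIndex` and its methods are hypothetical names, everything lives in one process, and the distributed transactions a real system would need are left out.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that is updated one page at a time (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of page URLs
        self.page_terms = {}              # url -> terms currently indexed for it

    def update_page(self, url, text):
        """Re-index a single page; the work depends on this page only."""
        new_terms = set(text.lower().split())
        old_terms = self.page_terms.get(url, set())
        # Remove postings for terms that disappeared from the page.
        for term in old_terms - new_terms:
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]
        # Add postings for terms that are new on the page.
        for term in new_terms - old_terms:
            self.postings[term].add(url)
        self.page_terms[url] = new_terms

    def search(self, term):
        """Return the URLs whose current text contains the term."""
        return sorted(self.postings.get(term.lower(), set()))


index = IncrementalIndex()
index.update_page("a.html", "hadoop batch processing")
index.update_page("b.html", "incremental processing")
# Only a.html changed, so only a.html is re-indexed.
index.update_page("a.html", "hadoop incremental updates")
print(index.search("incremental"))  # ['a.html', 'b.html']
print(index.search("batch"))        # []
```

A batch system would rebuild every posting list after the third call; here only the terms of `a.html` are touched.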
Dremel is a scalable, interactive ad-hoc query system for the analysis of read-only nested data. By combining multi-level execution trees with a columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. Dremel claims to be about 100 times faster than MapReduce. Its architecture is similar to that of Pig and Hive, but instead of MapReduce, its engine is based on aggregator trees.
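The combination of columnar layout and an aggregator tree can be sketched in a few lines. This is a simplified, single-process illustration under stated assumptions: the function names, the `fanout` parameter, and the sample `latency_ms` column are all invented for the example, and nested records are not modeled. Each shard stores its data column by column, leaves compute partial results from one column only, and each level of the tree merges the partials from the level below.

```python
def leaf_aggregate(shard, column):
    """Leaf node: scan a single column of one shard and return (sum, count)."""
    values = shard[column]
    return (sum(values), len(values))


def merge(partials):
    """Inner node: combine the partial (sum, count) pairs of its children."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return (total, count)


def tree_aggregate(shards, column, fanout=2):
    """Aggregate one column across all shards through a merge tree."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not shards:
        return (0, 0)
    level = [leaf_aggregate(shard, column) for shard in shards]
    while len(level) > 1:
        # Each pass corresponds to one level of the tree.
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


# Hypothetical data: every shard keeps its rows as separate columns.
shards = [
    {"country": ["US", "IN"], "latency_ms": [120, 80]},
    {"country": ["DE"], "latency_ms": [100]},
    {"country": ["US", "US"], "latency_ms": [60, 140]},
]
total, count = tree_aggregate(shards, "latency_ms")
print(total / count)  # 100.0
```

Because the query reads only `latency_ms`, the `country` column is never touched, which is the main saving a columnar layout offers for aggregation queries.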
Pregel is a system for large-scale graph processing and graph data analysis. It is designed to execute graph algorithms faster, and its API is easy to use. As expected, Pregel is architected for efficient, scalable, and fault-tolerant execution on clusters of thousands of commodity computers. Graphs are everywhere: social networks, computer network topologies, games among soccer teams, citations among scientific papers, and, most pervasive of all, the web itself. Pregel is a scalable infrastructure for mining a wide range of graphs, with programs expressed as a sequence of iterations. Google has been using Pregel internally for some time now.
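The "sequence of iterations" style can be illustrated with a small single-process simulation. This is a sketch under assumptions, not Pregel's API: the function `propagate_max` and the sample graph are hypothetical. In every iteration each vertex reads the messages sent to it in the previous one, updates its own value, and sends messages to its neighbors only if that value changed. The computation ends when no messages are in flight. The task here is to spread the largest value in the graph to every vertex.

```python
def propagate_max(graph, values):
    """Spread the maximum value through a graph, one iteration at a time.

    graph:  vertex -> list of out-neighbors (every neighbor must be a key)
    values: vertex -> initial number
    """
    values = dict(values)
    # First iteration: every vertex sends its value to its neighbors.
    inbox = {v: [] for v in graph}
    for v, neighbors in graph.items():
        for n in neighbors:
            inbox[n].append(values[v])

    # Later iterations: run until no vertex has incoming messages.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            if not messages:
                continue  # no input, so this vertex stays idle
            best = max(messages)
            if best > values[v]:
                values[v] = best
                for n in graph[v]:
                    outbox[n].append(best)
            # A vertex whose value did not change sends nothing.
        inbox = outbox
    return values


# Hypothetical four-vertex chain with edges in both directions.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(propagate_max(graph, {"a": 3, "b": 6, "c": 2, "d": 1}))
# {'a': 6, 'b': 6, 'c': 6, 'd': 6}
```

Each vertex only looks at its own value and its own messages, which is what would let a real system spread the vertices across many machines.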
Besides Google, Facebook and Twitter are also working on new innovations. Twitter recently released its Storm project as open source. One key trend is “Data in Motion,” or how to deal with data that is moving. This is the velocity aspect of Big Data.
Copyright © 2012 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Jnan Dash
Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.