By Dave Graham | January 28, 2009 09:15 AM EST | Reads: 12,193
In the previous three Cloud Optimized Storage Solution (COSS) articles in this series, I’ve discussed the content being stored, the method of storage, and principles derived from data tiering. Today, I want to jump ahead a bit and discuss how neural networks and heuristics can affect the processing of object and file data for the cloud.
One of the more recent advancements within computing has been the application of heuristics and neural networking. Heuristics is defined as “…an educational method in which learning takes place through discoveries that result from investigations…” While heuristics has historically been used in products such as anti-virus software, it offers a wealth of capability to COSS. Similarly, neural networks provide layers of processing that learn patterns from underlying statistical data and optimize their behavior accordingly. How do these two technologies apply to COSS?
The basic ideology of COSS, as noted in the previous parts of this series, is to ingest significant amounts of both structured and unstructured content and, operating within the confines of SLAs and tiering, provide this data back to users with acceptable performance. While this description is fairly reductionistic, it is how data is allocated to storage that offers the greatest insight into the impact neural nets and heuristics can have. To illustrate this point, here is a graphical example of file placement within COSS without heuristics.
As seen below, data is submitted to COSS through an API or other integration point, metadata is calculated for the object based on pre-defined content categories (e.g., “Movies”), and the content is placed in Tier 1 for faster access and greater availability. Policy is enacted on this movie object so that it is automatically moved from Tier 1 to Tier 2 after a fixed period of time, and again to Tier 3 under similar time constraints. Globally, policies are also set for compression, encryption, deduplication, and other optimizations, and these apply to content at rest as well as to incoming data. Once data has moved from tier to tier, there is no real process for retrieving it and promoting it to a different tier based on access or usage patterns.
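To make the administrator-enforced model concrete, here is a minimal Python sketch of purely time-based tiering. The 30- and 90-day limits, the policy names, and the `ingest`/`age_one_day` helpers are hypothetical; the text specifies only that objects move down a tier after a fixed period. Nothing in the sketch reads access patterns, so an object can only ever move down.

```python
from dataclasses import dataclass

# Hypothetical age limits; the article only specifies "a fixed period of time".
TIER_AGE_LIMITS = {1: 30, 2: 90}   # days allowed in tier N before moving to N+1

# Administrator-set global policies, applied regardless of content.
GLOBAL_POLICIES = ("compress", "encrypt", "dedupe")


@dataclass
class StoredObject:
    name: str
    category: str                  # pre-defined category, e.g. "Movies"
    tier: int = 1                  # new content always lands in Tier 1
    days_in_tier: int = 0
    policies: tuple = GLOBAL_POLICIES


def ingest(name: str, category: str) -> StoredObject:
    """Place new content in Tier 1 with the global policy set."""
    return StoredObject(name=name, category=category)


def age_one_day(obj: StoredObject) -> None:
    """Advance the clock one day and demote when the tier's limit is reached.

    Access patterns are never consulted, so there is no promotion path.
    """
    obj.days_in_tier += 1
    limit = TIER_AGE_LIMITS.get(obj.tier)
    if limit is not None and obj.days_in_tier >= limit:
        obj.tier += 1
        obj.days_in_tier = 0


if __name__ == "__main__":
    movie = ingest("trailer.mp4", "Movies")
    for _ in range(120):
        age_one_day(movie)
    print(movie.tier)  # 3: demoted after 30 days, then after 90 more
```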

While this example is extremely reductionistic, it highlights the areas where neural nets and heuristics can be applied to improve both the way data is ingested and the way it is maintained across its lifespan (i.e., until deletion). In essence, COSS under this model is administrator-enforced. Here, then, is an example of data ingest to COSS with neural nets and heuristics enabled:

Almost immediately, it becomes apparent that COSS is taking a more active role in the ingest and storage allocation of the file data. Instead of relying on a global category (e.g., “Movies”), COSS applies bit-pattern analysis and packet inspection to the incoming data to determine its composition. Such inspection has several significant implications: less time is spent applying processor-intensive policies such as deduplication and encryption, and more time is spent optimizing content layout and placement within tiers (the default becomes Tier 2, balancing accessibility and performance). Once the data is inspected, it is determined to be of a certain type (e.g., application/octet-stream) and placed in the default tier (Tier 2). COSS recognizes that this data is already compressed, rules out compression and deduplication policies, and, depending on the source/API mapping, may also rule out encryption policies. Once the data is at rest on Tier 2, COSS watches file access patterns to determine when and how it is being accessed. If statistical trending for that file shows increasing access, COSS promotes the file to a higher tier for better performance and access. If the trending shows declining traffic to that file, COSS can demote it to Tier 2, Tier 3, and so on, without affecting surrounding data.
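A minimal Python sketch of this heuristic ingest path follows, under stated assumptions. Content type is detected from a handful of well-known file signatures; when no signature matches, a Shannon-entropy estimate (a common stand-in for bit-pattern analysis, not necessarily what COSS would use) flags data that already looks compressed. Access trending is modeled as an exponentially weighted moving average of per-interval access counts. The thresholds, the smoothing factor, and every function name here are illustrative assumptions.

```python
import gzip
import math
from collections import Counter
from dataclasses import dataclass

# A few well-known file signatures ("magic numbers") and whether the format
# is already compressed. A real inspector would carry a much larger table.
MAGIC = [
    (b"\x1f\x8b", "application/gzip", True),
    (b"PK\x03\x04", "application/zip", True),
    (b"\x89PNG\r\n\x1a\n", "image/png", True),
    (b"\xff\xd8\xff", "image/jpeg", True),
    (b"%PDF", "application/pdf", False),
]
ENTROPY_THRESHOLD = 7.5              # bits/byte; assumed cut-off
ALPHA = 0.5                          # EWMA smoothing factor (assumed)
PROMOTE_AT, DEMOTE_AT = 10.0, 1.0    # hypothetical accesses-per-interval thresholds


def shannon_entropy(sample: bytes) -> float:
    """Bits per byte; values near 8.0 suggest compressed or encrypted data."""
    n = len(sample)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(sample).values())


def inspect(data: bytes, sample_size: int = 4096) -> tuple:
    """Return (mime_type, already_compressed) from the bytes alone."""
    for signature, mime, compressed in MAGIC:
        if data.startswith(signature):
            return mime, compressed
    # Unknown signature: fall back to an entropy heuristic on a sample.
    return "application/octet-stream", shannon_entropy(data[:sample_size]) > ENTROPY_THRESHOLD


def choose_policies(already_compressed: bool, source_encrypts: bool = False) -> set:
    """Skip processor-intensive work the content does not need."""
    policies = set()
    if not already_compressed:
        policies |= {"compress", "dedupe"}
    if not source_encrypts:
        policies.add("encrypt")
    return policies


@dataclass
class TrackedObject:
    mime: str
    policies: set
    tier: int = 2          # heuristic default tier
    rate: float = 0.0      # exponentially weighted accesses per interval


def ingest(data: bytes, source_encrypts: bool = False) -> TrackedObject:
    mime, compressed = inspect(data)
    return TrackedObject(mime, choose_policies(compressed, source_encrypts))


def record_interval(obj: TrackedObject, accesses: int) -> None:
    """Fold one interval's access count into the trend; move at most one tier."""
    obj.rate = ALPHA * accesses + (1 - ALPHA) * obj.rate
    if obj.rate >= PROMOTE_AT and obj.tier > 1:
        obj.tier -= 1      # rising trend: promote toward Tier 1
    elif obj.rate <= DEMOTE_AT and obj.tier < 3:
        obj.tier += 1      # falling trend: demote toward Tier 3


if __name__ == "__main__":
    doc = ingest(b"quarterly report " * 200)
    print(doc.mime, sorted(doc.policies))    # application/octet-stream ['compress', 'dedupe', 'encrypt']
    blob = ingest(gzip.compress(b"frame" * 10_000))
    print(blob.mime, sorted(blob.policies))  # application/gzip ['encrypt']
    record_interval(blob, 20)
    print(blob.tier)                         # 1
```

Moving at most one tier per interval, and only for the object whose trend changed, mirrors the point that promotion and demotion need not disturb surrounding data.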
Implications for Global Implementations
The examples above highlight policies and actions on a single file or object, but when the approach is extrapolated to the COSS system on a global level, it becomes a much more powerful tool. In essence, the heuristic database and neural network capabilities can be applied to linked COSS systems for global replication and file/object processing. As the engine completes pattern analysis against file types and creates or refines categories, the resulting database can be asynchronously replicated to the other members of the larger COSS network. This replication would use recursive heuristic database updates to keep the COSS members consistent and to ensure that data residing across all members is categorized and tagged appropriately. Additionally, since one of the data protection mechanisms in COSS is to keep multiple replicas for redundancy, replicating the database also protects the database itself.
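The text does not specify how “recursive heuristic database updates” achieve consistency. One well-known technique that fits asynchronous replication is a last-writer-wins map ordered by Lamport clocks, sketched below in Python. This is a swapped-in technique, not the author’s design; the `HeuristicDB` class, the signature strings, and the node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeuristicDB:
    """One COSS member's copy of learned signature -> category mappings."""
    node_id: str
    clock: int = 0
    # signature -> (category, version, origin node)
    entries: dict = field(default_factory=dict)

    def learn(self, signature: str, category: str) -> None:
        """Record a locally learned categorization under a new version."""
        self.clock += 1
        self.entries[signature] = (category, self.clock, self.node_id)

    def merge(self, other: "HeuristicDB") -> None:
        """Absorb a peer's entries; the higher (version, origin) pair wins.

        The merge is commutative, associative and idempotent, so members that
        exchange updates asynchronously and in any order still converge.
        """
        for signature, (category, version, origin) in other.entries.items():
            mine = self.entries.get(signature)
            if mine is None or (version, origin) > (mine[1], mine[2]):
                self.entries[signature] = (category, version, origin)
        # Lamport-style clock: later local writes outrank everything seen so far.
        self.clock = max(self.clock, other.clock)


if __name__ == "__main__":
    a, b, c = HeuristicDB("us-east"), HeuristicDB("eu-west"), HeuristicDB("ap-south")
    a.learn("ftyp-isom", "video")
    b.learn("ftyp-isom", "video/mp4")   # concurrent, conflicting categorization
    b.learn("%PDF", "document")
    # Deliver updates in different orders to different members.
    c.merge(b)
    c.merge(a)
    a.merge(b)
    b.merge(a)
    print(a.entries == b.entries == c.entries)  # True
```

The trade-off is that when two members categorize the same signature concurrently, one categorization is silently discarded; a production design might instead keep both and let the engine reconcile them.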
Implications for Heuristic Processing and Control
The processing overhead of heuristic analysis adds a layer of complexity to implementation and design. Given that COSS is designed to use commodity hardware, with the differentiating feature being the software “brains,” the added performance burden of a heuristic model might seem untenable for basic implementations. However, as recent research has shown, simply adding a general-purpose graphics processing unit (GPGPU) to the COSS hardware to offload these more complex routines would fit within the commodity hardware paradigm. By coding specific GPGPU routines against NVIDIA’s CUDA specifications (as demonstrated, for example, by research into WPA key recovery), the heuristic branch paths could be removed from the general storage operation paths handled by the storage system processor. Since each GPU typically has its own local, high-bandwidth memory (e.g., GDDR4) and multiple programmable vector units, it is well suited to processing large data sets.
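The separation between the storage path and the heuristic path can be sketched in Python. Here a thread pool stands in for the GPGPU and a byte-entropy function stands in for the heuristic routine; both are assumptions, and Python threads illustrate the decoupling rather than real GPU acceleration. The write call returns as soon as the object is stored, while analysis completes asynchronously.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def shannon_entropy(data: bytes) -> float:
    """Stand-in heuristic: bits of entropy per byte."""
    n = len(data)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


class HeuristicOffload:
    """Keep heuristic analysis off the storage processor's I/O path.

    The thread pool stands in for a GPGPU; a real design would hand batches
    of objects to device kernels (for example, CUDA kernels) instead.
    """

    def __init__(self, analyze, workers: int = 2):
        self._analyze = analyze
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self.store = {}     # object name -> bytes (the storage path)
        self.results = {}   # object name -> Future holding the heuristic result

    def write(self, name: str, data: bytes) -> None:
        # Fast path: persist the object and return without waiting.
        self.store[name] = data
        # Slow path: queue the heuristic work on the offload engine.
        self.results[name] = self._pool.submit(self._analyze, data)

    def close(self) -> None:
        """Wait for queued analyses to finish and release the workers."""
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    engine = HeuristicOffload(shannon_entropy)
    engine.write("notes.txt", b"ab" * 500)
    engine.close()
    print(engine.results["notes.txt"].result())  # 1.0
```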
One area that would need to be addressed with the use of GPGPUs for heuristic processing is redundancy. Given that no methodology currently exists to maintain GPGPU functionality across two discrete units in a single system, either the programming path would need to account for multiple GPGPU engines within the general I/O complex, or redundancy would need to be designed into the heuristic path itself. In a clustered front-end I/O stack (à la EMC’s Atmos), it would be a simple matter of placing a GPGPU in each node, with the overall software stack processing the heuristic path in parallel.
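For the option of designing redundancy into the heuristic path itself, a minimal Python sketch might dispatch work to the first healthy engine, mark faulted units, and fall back to the CPU when no accelerator remains. `EngineFailure`, `RedundantHeuristicPath`, and the toy heuristic are hypothetical names used only for illustration.

```python
from typing import Callable, List


class EngineFailure(Exception):
    """Raised by a heuristic engine (hypothetical GPGPU) that has faulted."""


class RedundantHeuristicPath:
    """Run heuristic work on the first healthy engine, failing over in order.

    Each engine models one GPGPU (for example, one per cluster node); the
    CPU fallback keeps results flowing, more slowly, if every unit is lost.
    """

    def __init__(self, engines: List[Callable[[bytes], float]],
                 cpu_fallback: Callable[[bytes], float]):
        self.engines = list(engines)
        self.healthy = [True] * len(self.engines)
        self.cpu_fallback = cpu_fallback

    def run(self, data: bytes) -> float:
        for i, engine in enumerate(self.engines):
            if not self.healthy[i]:
                continue
            try:
                return engine(data)
            except EngineFailure:
                self.healthy[i] = False   # take the faulted unit out of rotation
        return self.cpu_fallback(data)


def distinct_bytes(data: bytes) -> float:
    """Toy stand-in heuristic: number of distinct byte values."""
    return float(len(set(data)))


def faulted_gpu(_data: bytes) -> float:
    raise EngineFailure("device lost")


if __name__ == "__main__":
    path = RedundantHeuristicPath([faulted_gpu, distinct_bytes],
                                  cpu_fallback=distinct_bytes)
    print(path.run(b"abcabc"))  # 3.0
    print(path.healthy)         # [False, True]
```

In a clustered front end, the same logic could run per node, with the engine list holding the local GPGPU and the fallback handing work to that node’s CPU.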
Copyright © 2009 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Dave Graham
Dave Graham is a Technical Consultant with EMC Corporation, where he focuses on designing and architecting private cloud solutions for commercial customers.