Click here to close now.

Welcome!

Search Authors: Kelly Murphy, Carmen Gonzalez, Elizabeth White, AppDynamics Blog, Pat Romanski

Related Topics: Cloud Expo, Web 2.0

Cloud Expo: Article

100 Minutes Without Gmail: Google Explains Outage

"We've turned our full attention to helping ensure this kind of event doesn't happen again"

"We've turned our full attention to helping ensure this kind of event doesn't happen again," wrote Ben Treynor, self-described 'VP Engineering and Site Reliability Czar' at Google, in an official blogged explanation last night of the 100-minute Gmail outage yesterday, which Treynor conceded "was a Big Deal, and we're treating it as such."

He then described in detail the events that conspired to bring about the outage:

"Here's what happened: This morning (Pacific Time) we took a small fraction of Gmail's servers offline to perform routine upgrades. This isn't in itself a problem — we do this all the time, and Gmail's web interface runs in many locations and just sends traffic to other locations when one is offline.

However, as we now know, we had slightly underestimated the load which some recent changes (ironically, some designed to improve service availability) placed on the request routers — servers which direct web queries to the appropriate Gmail server for response. At about 12:30 pm Pacific a few of the request routers became overloaded and in effect told the rest of the system "stop sending us traffic, we're too slow!". This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded. As a result, people couldn't access Gmail via the web interface because their requests couldn't be routed to a Gmail server. IMAP/POP access and mail processing continued to work normally because these requests don't use the same routers.

The Gmail engineering team was alerted to the failures within seconds (we take monitoring very seriously). After establishing that the core problem was insufficient available capacity, the team brought a LOT of additional request routers online (flexible capacity is one of the advantages of Google's architecture), distributed the traffic across the request routers, and the Gmail web interface came back online."

Treynor's post ended with a detailed explanation of Google's plans to prevent a repeat of the same problem:

"What's next ... Some of the actions are straightforward and are already done — for example, increasing request router capacity well beyond peak demand to provide headroom. Some of the actions are more subtle — for example, we have concluded that request routers don't have sufficient failure isolation (i.e. if there's a problem in one datacenter, it shouldn't affect servers in another datacenter) and do not degrade gracefully (e.g. if many request routers are overloaded simultaneously, they all should just get slower instead of refusing to accept traffic and shifting their load). We'll be hard at work over the next few weeks implementing these and other Gmail reliability improvements."

He ends, "Gmail remains more than 99.9% available to all users, and we're committed to keeping events like today's notable for their rarity."

More Stories By Jeremy Geelan

Jeremy Geelan is Chairman & CEO of the 21st Century Internet Group, Inc. and an Executive Academy Member of the International Academy of Digital Arts & Sciences. Formerly he was President & COO at Cloud Expo, Inc. and Conference Chair of the worldwide Cloud Expo series. He appears regularly at conferences and trade shows, speaking to technology audiences across six continents. You can follow him on twitter: @jg21.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@ThingsExpo Stories
The recent trends like cloud computing, social, mobile and Internet of Things are forcing enterprises to modernize in order to compete in the competitive globalized markets. However, enterprises are approaching newer technologies with a more silo-ed way, gaining only sub optimal benefits. The Modern Enterprise model is presented as a newer way to think of enterprise IT, which takes a more holistic approach to embracing modern technologies.
The true value of the Internet of Things (IoT) lies not just in the data, but through the services that protect the data, perform the analysis and present findings in a usable way. With many IoT elements rooted in traditional IT components, Big Data and IoT isn’t just a play for enterprise. In fact, the IoT presents SMBs with the prospect of launching entirely new activities and exploring innovative areas. CompTIA research identifies several areas where IoT is expected to have the greatest impact.
Every day we read jaw-dropping stats on the explosion of data. We allocate significant resources to harness and better understand it. We build businesses around it. But we’ve only just begun. For big payoffs in Big Data, CIOs are turning to cognitive computing. Cognitive computing’s ability to securely extract insights, understand natural language, and get smarter each time it’s used is the next, logical step for Big Data.
The 4th International Internet of @ThingsExpo, co-located with the 17th International Cloud Expo - to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA - announces that its Call for Papers is open. The Internet of Things (IoT) is the biggest idea since the creation of the Worldwide Web more than 20 years ago.
There's no doubt that the Internet of Things is driving the next wave of innovation. Google has spent billions over the past few months vacuuming up companies that specialize in smart appliances and machine learning. Already, Philips light bulbs, Audi automobiles, and Samsung washers and dryers can communicate with and be controlled from mobile devices. To take advantage of the opportunities the Internet of Things brings to your business, you'll want to start preparing now.
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo in Silicon Valley. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 17th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal an...
P2P RTC will impact the landscape of communications, shifting from traditional telephony style communications models to OTT (Over-The-Top) cloud assisted & PaaS (Platform as a Service) communication services. The P2P shift will impact many areas of our lives, from mobile communication, human interactive web services, RTC and telephony infrastructure, user federation, security and privacy implications, business costs, and scalability. In his session at @ThingsExpo, Robin Raymond, Chief Architect at Hookflash, will walk through the shifting landscape of traditional telephone and voice services ...
The 17th International Cloud Expo has announced that its Call for Papers is open. 17th International Cloud Expo, to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, APM, APIs, Microservices, Security, Big Data, Internet of Things, DevOps and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal today!
Explosive growth in connected devices. Enormous amounts of data for collection and analysis. Critical use of data for split-second decision making and actionable information. All three are factors in making the Internet of Things a reality. Yet, any one factor would have an IT organization pondering its infrastructure strategy. How should your organization enhance its IT framework to enable an Internet of Things implementation? In his session at Internet of @ThingsExpo, James Kirkland, Chief Architect for the Internet of Things and Intelligent Systems at Red Hat, described how to revolutioniz...
All major researchers estimate there will be tens of billions devices - computers, smartphones, tablets, and sensors - connected to the Internet by 2020. This number will continue to grow at a rapid pace for the next several decades. With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo, June 9-11, 2015, at the Javits Center in New York City. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be
The security devil is always in the details of the attack: the ones you've endured, the ones you prepare yourself to fend off, and the ones that, you fear, will catch you completely unaware and defenseless. The Internet of Things (IoT) is nothing if not an endless proliferation of details. It's the vision of a world in which continuous Internet connectivity and addressability is embedded into a growing range of human artifacts, into the natural world, and even into our smartphones, appliances, and physical persons. In the IoT vision, every new "thing" - sensor, actuator, data source, data con...
Container frameworks, such as Docker, provide a variety of benefits, including density of deployment across infrastructure, convenience for application developers to push updates with low operational hand-holding, and a fairly well-defined deployment workflow that can be orchestrated. Container frameworks also enable a DevOps approach to application development by cleanly separating concerns between operations and development teams. But running multi-container, multi-server apps with containers is very hard. You have to learn five new and different technologies and best practices (libswarm, sy...
SYS-CON Events announced today that DragonGlass, an enterprise search platform, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. After eleven years of designing and building custom applications, OpenCrowd has launched DragonGlass, a cloud-based platform that enables the development of search-based applications. These are a new breed of applications that utilize a search index as their backbone for data retrieval. They can easily adapt to new data sets and provide access to both structured and unstruc...
There's Big Data, then there's really Big Data from the Internet of Things. IoT is evolving to include many data possibilities like new types of event, log and network data. The volumes are enormous, generating tens of billions of logs per day, which raise data challenges. Early IoT deployments are relying heavily on both the cloud and managed service providers to navigate these challenges. In her session at Big Data Expo®, Hannah Smalltree, Director at Treasure Data, discussed how IoT, Big Data and deployments are processing massive data volumes from wearables, utilities and other machines...
Buzzword alert: Microservices and IoT at a DevOps conference? What could possibly go wrong? In this Power Panel at DevOps Summit, moderated by Jason Bloomberg, the leading expert on architecting agility for the enterprise and president of Intellyx, panelists will peel away the buzz and discuss the important architectural principles behind implementing IoT solutions for the enterprise. As remote IoT devices and sensors become increasingly intelligent, they become part of our distributed cloud environment, and we must architect and code accordingly. At the very least, you'll have no problem fil...
SYS-CON Events announced today that MetraTech, now part of Ericsson, has been named “Silver Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. Ericsson is the driving force behind the Networked Society- a world leader in communications infrastructure, software and services. Some 40% of the world’s mobile traffic runs through networks Ericsson has supplied, serving more than 2.5 billion subscribers.
The worldwide cellular network will be the backbone of the future IoT, and the telecom industry is clamoring to get on board as more than just a data pipe. In his session at @ThingsExpo, Evan McGee, CTO of Ring Plus, Inc., discussed what service operators can offer that would benefit IoT entrepreneurs, inventors, and consumers. Evan McGee is the CTO of RingPlus, a leading innovative U.S. MVNO and wireless enabler. His focus is on combining web technologies with traditional telecom to create a new breed of unified communication that is easily accessible to the general consumer. With over a de...
Disruptive macro trends in technology are impacting and dramatically changing the "art of the possible" relative to supply chain management practices through the innovative use of IoT, cloud, machine learning and Big Data to enable connected ecosystems of engagement. Enterprise informatics can now move beyond point solutions that merely monitor the past and implement integrated enterprise fabrics that enable end-to-end supply chain visibility to improve customer service delivery and optimize supplier management. Learn about enterprise architecture strategies for designing connected systems tha...
Cloud is not a commodity. And no matter what you call it, computing doesn’t come out of the sky. It comes from physical hardware inside brick and mortar facilities connected by hundreds of miles of networking cable. And no two clouds are built the same way. SoftLayer gives you the highest performing cloud infrastructure available. One platform that takes data centers around the world that are full of the widest range of cloud computing options, and then integrates and automates everything. Join SoftLayer on June 9 at 16th Cloud Expo to learn about IBM Cloud's SoftLayer platform, explore se...
SYS-CON Media announced today that 9 out of 10 " most read" DevOps articles are published by @DevOpsSummit Blog. Launched in October 2014, @DevOpsSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development cycles that produce softw...