Click here to close now.

Welcome!

Search Authors: Carmen Gonzalez, Elizabeth White, Jackie Kahle, Lori MacVittie, Blue Box Blog

Related Topics: Cloud Expo, Web 2.0

Cloud Expo: Article

100 Minutes Without Gmail: Google Explains Outage

"We've turned our full attention to helping ensure this kind of event doesn't happen again"

"We've turned our full attention to helping ensure this kind of event doesn't happen again," wrote Ben Treynor, self-described 'VP Engineering and Site Reliability Czar' at Google, in an official blogged explanation last night of the 100-minute Gmail outage yesterday, which Treynor conceded "was a Big Deal, and we're treating it as such."

He then described in detail the events that conspired to bring about the outage:

"Here's what happened: This morning (Pacific Time) we took a small fraction of Gmail's servers offline to perform routine upgrades. This isn't in itself a problem — we do this all the time, and Gmail's web interface runs in many locations and just sends traffic to other locations when one is offline.

However, as we now know, we had slightly underestimated the load which some recent changes (ironically, some designed to improve service availability) placed on the request routers — servers which direct web queries to the appropriate Gmail server for response. At about 12:30 pm Pacific a few of the request routers became overloaded and in effect told the rest of the system "stop sending us traffic, we're too slow!". This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded. As a result, people couldn't access Gmail via the web interface because their requests couldn't be routed to a Gmail server. IMAP/POP access and mail processing continued to work normally because these requests don't use the same routers.

The Gmail engineering team was alerted to the failures within seconds (we take monitoring very seriously). After establishing that the core problem was insufficient available capacity, the team brought a LOT of additional request routers online (flexible capacity is one of the advantages of Google's architecture), distributed the traffic across the request routers, and the Gmail web interface came back online."

Treynor's post ended with a detailed explanation of Google's plans to prevent a repeat of the same problem:

"What's next ... Some of the actions are straightforward and are already done — for example, increasing request router capacity well beyond peak demand to provide headroom. Some of the actions are more subtle — for example, we have concluded that request routers don't have sufficient failure isolation (i.e. if there's a problem in one datacenter, it shouldn't affect servers in another datacenter) and do not degrade gracefully (e.g. if many request routers are overloaded simultaneously, they all should just get slower instead of refusing to accept traffic and shifting their load). We'll be hard at work over the next few weeks implementing these and other Gmail reliability improvements."

He ends, "Gmail remains more than 99.9% available to all users, and we're committed to keeping events like today's notable for their rarity."

More Stories By Jeremy Geelan

Jeremy Geelan is Chairman & CEO of the 21st Century Internet Group, Inc. and an Executive Academy Member of the International Academy of Digital Arts & Sciences. Formerly he was President & COO at Cloud Expo, Inc. and Conference Chair of the worldwide Cloud Expo series. He appears regularly at conferences and trade shows, speaking to technology audiences across six continents. You can follow him on twitter: @jg21.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@ThingsExpo Stories
SYS-CON Events announced today that CodeFutures, a leading supplier of database performance tools, has been named a “Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. CodeFutures is an independent software vendor focused on providing tools that deliver database performance tools that increase productivity during database development and increase database performance and scalability during production.
SYS-CON Events announced today the IoT Bootcamp – Jumpstart Your IoT Strategy, being held June 9–10, 2015, in conjunction with 16th Cloud Expo and Internet of @ThingsExpo at the Javits Center in New York City. This is your chance to jumpstart your IoT strategy. Combined with real-world scenarios and use cases, the IoT Bootcamp is not just based on presentations but includes hands-on demos and walkthroughs. We will introduce you to a variety of Do-It-Yourself IoT platforms including Arduino, Raspberry Pi, BeagleBone, Spark and Intel Edison. You will also get an overview of cloud technologies s...
The only place to be June 9-11 is Cloud Expo & @ThingsExpo 2015 East at the Javits Center in New York City. Join us there as delegates from all over the world come to listen to and engage with speakers & sponsors from the leading Cloud Computing, IoT & Big Data companies. Cloud Expo & @ThingsExpo are the leading events covering the booming market of Cloud Computing, IoT & Big Data for the enterprise. Speakers from all over the world will be hand-picked for their ability to explore the economic strategies that utility/cloud computing provides. Whether public, private, or in a hybrid form, clo...
Internet of Things (IoT) will be a hybrid ecosystem of diverse devices and sensors collaborating with operational and enterprise systems to create the next big application. In their session at @ThingsExpo, Bramh Gupta, founder and CEO of robomq.io, and Fred Yatzeck, principal architect leading product development at robomq.io, will discuss how choosing the right middleware and integration strategy from the get-go will enable IoT solution developers to adapt and grow with the industry, while at the same time reduce Time to Market (TTM) by using plug and play capabilities offered by a robust I...
IoT is still a vague buzzword for many people. In his session at @ThingsExpo, Mike Kavis, Vice President & Principal Cloud Architect at Cloud Technology Partners, discussed the business value of IoT that goes far beyond the general public's perception that IoT is all about wearables and home consumer services. He also discussed how IoT is perceived by investors and how venture capitalist access this space. Other topics discussed were barriers to success, what is new, what is old, and what the future may hold. Mike Kavis is Vice President & Principal Cloud Architect at Cloud Technology Pa...
Buzzword alert: Microservices and IoT at a DevOps conference? What could possibly go wrong? Join this panel of experts as they peel away the buzz and discuss the important architectural principles behind implementing IoT solutions for the enterprise. As remote IoT devices and sensors become increasingly intelligent, they become part of our distributed cloud environment, and we must architect and code accordingly. At the very least, you’ll have no problem filling in your buzzword bingo cards.
SYS-CON Events announced today that Creative Business Solutions will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Creative Business Solutions is the top stocking authorized HP Renew Distributor in the U.S. Based out of Long Island, NY, Creative Business Solutions offers a one-stop shop for a diverse range of products including Proliant, Blade and Industry Standard Servers, Networking, Server Options and Care Packs. As a trusted supplier, CBS guarantees quality controlled stock levels thanks to an Auto...
SOA Software has changed its name to Akana. With roots in Web Services and SOA Governance, Akana has established itself as a leader in API Management and is expanding into cloud integration as an alternative to the traditional heavyweight enterprise service bus (ESB). The company recently announced that it achieved more than 90% year-over-year growth. As Akana, the company now addresses the evolution and diversification of SOA, unifying security, management, and DevOps across SOA, APIs, microservices, and more.
There are lots of challenges in IoT around secure, scalable and business friendly infrastructure for enterprises. For large corporations, IoT implementations are one of the top priorities of the decade. All industries are seeing a competitive need to sustain by investing in IoT initiatives. The value addition comes from improved customer service, innovative product and additional revenue streams. The data from these IP-connected devices can be leveraged for a variety of business applications as well as responsive action controls. The various architectural building blocks of an IoT ...
GENBAND introduced its Real Time Communications (RTC) Client for Lync* to seamlessly combine real-time communications with Lync Instant Messaging (IM) and Presence. “We’re shaking up the economics of delivering Unified Communications (UC) and offering a compelling way to integrate previously bespoke communications technologies,” said Carl Baptiste, GENBAND’s Senior Vice President, Enterprise Solutions. “We’re offering enterprises the best of both worlds by combining our own high availability voice, video and collaboration with Lync’s IM and Presence; creating a single, web centric, client. O...
The list of ‘new paradigm’ technologies that now surrounds us appears to be at an all time high. From cloud computing and Big Data analytics to Bring Your Own Device (BYOD) and the Internet of Things (IoT), today we have to deal with what the industry likes to call ‘paradigm shifts’ at every level of IT. This is disruption; of course, we understand that – change is almost always disruptive.
After making a doctor’s appointment via your mobile device, you receive a calendar invite. The day of your appointment, you get a reminder with the doctor’s location and contact information. As you enter the doctor’s exam room, the medical team is equipped with the latest tablet containing your medical history – he or she makes real time updates to your medical file. At the end of your visit, you receive an electronic prescription to your preferred pharmacy and can schedule your next appointment.
Can call centers hang up the phones for good? Intuitive Solutions did. WebRTC enabled this contact center provider to eliminate antiquated telephony and desktop phone infrastructure with a pure web-based solution, allowing them to expand beyond brick-and-mortar confines to a home-based agent model. It also ensured scalability and better service for customers, including MUY! Companies, one of the country's largest franchise restaurant companies with 232 Pizza Hut locations. This is one example of WebRTC adoption today, but the potential is limitless when powered by IoT.
SYS-CON Events announced today that Optimal Design, an Internet of Things solution provider, will exhibit at SYS-CON's Internet of @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Optimal Design is an award winning product development firm offering industrial design and engineering services to the consumer, medical, and defense markets.
Chuck Piluso will present a study of cloud adoption trends and the power and flexibility of IBM Power and Pureflex cloud solutions. Speaker Bio: Prior to Data Storage Corporation (DSC), Mr. Piluso founded North American Telecommunication Corporation, a facilities-based Competitive Local Exchange Carrier licensed by the Public Service Commission in 10 states, serving as the company's chairman and president from 1997 to 2000. Between 1990 and 1997, Mr. Piluso served as chairman & founder of International Telecommunications Corporation, a facilities-based international carrier licensed by t...
The WebRTC Summit 2015 New York, to be held June 9-11, 2015, at the Javits Center in New York, NY, announces that its Call for Papers is open. Topics include all aspects of improving IT delivery by eliminating waste through automated business models leveraging cloud technologies. WebRTC Summit is co-located with 16th International Cloud Expo, @ThingsExpo, Big Data Expo, and DevOps Summit.
@ThingsExpo has been named the Top 5 Most Influential M2M Brand by Onalytica in the ‘Machine to Machine: Top 100 Influencers and Brands.' Onalytica analyzed the online debate on M2M by looking at over 85,000 tweets to provide the most influential individuals and brands that drive the discussion. According to Onalytica the "analysis showed a very engaged community with a lot of interactive tweets. The M2M discussion seems to be more fragmented and driven by some of the major brands present in the M2M space. This really allows some room for influential individuals to create more high value inter...
SYS-CON Events announced today that Liaison Technologies, a leading provider of data management and integration cloud services and solutions, has been named "Silver Sponsor" of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York, NY. Liaison Technologies is a recognized market leader in providing cloud-enabled data integration and data management solutions to break down complex information barriers, enabling enterprises to make smarter decisions, faster.
Participants will reach the final if their IoT solution is liked. A community vote will determine the best solutions submitted in each country, after which an expert jury will select the national winners and the best international IoT solution. Each country's best solution can win a national marketing campaign worth up to €30,000 and become a partner in Deutsche Telekom's participating markets. The winning international solution can become partner of Deutsche Telekom Group across all eight countries and reach out to a potential of 10,8 million business customers. Deutsche Telekom Group has a...
Recent technology advances in miniaturization has positioned the wearables as the pinnacle of technology convergence with the human body. We inquire if wearables are mere standard miniaturized devices extended with the connectivity and present our views on considerations like design, applications, performance, efficiency, interoperability, usage scenarios, human device interaction and consequent trade-offs enabling wearables to impart optimal value.