Welcome!

Cognitive Computing Authors: Yeshim Deniz, Elizabeth White, Pat Romanski, Liz McMillan, Zakia Bouachraoui

Related Topics: @DXWorldExpo, Cognitive Computing , Agile Computing, @CloudExpo

@DXWorldExpo: Blog Feed Post

Data Landscape at Facebook By @JnanDash | @CloudExpo [#BigData]

What does the data landscape look like at Facebook with its 1.3 billion users across the globe?

Data Landscape at Facebook

What does the data landscape look like at Facebook with its 1.3 billion users across the globe? They classify small data referring to OLTP-like queries that process and retrieve a small amount of data, usually 1-1000 objects requested by their id. Indexes limit the amount of data accessed during a single query, regardless of the total volume. Big data refers to queries that process large amounts of data, usually for analysis: trouble-shooting, identifying trends, and making decisions. The total volume of data is massive for both small and big data, ranging from hundreds of terabytes to hundreds of petabytes on disk.

The heart of Facebook’s core data is TAO (The Association of Objects) – distributed data store for the social graph. The workload on this is extremely demanding. Every time any one of over a billion active users visits Facebook through a desktop browser or on a mobile device, they are presented with hundreds of pieces of information from the social graph. Users see News Feed stories; comments, likes, and shares for those stories; photos and check-ins from their friends — the list goes on. The high degree of output customization, combined with a high update rate of a typical user’s News Feed, makes it impossible to generate the views presented to users ahead of time. Thus, the data set must be retrieved and rendered on the fly in a few hundred milliseconds. TAO provides access to tens of petabytes of data, but answers most queries by checking a single page in a single machine. The challenge here is how to optimize for mobile devices which have intermittent connectivity and higher network latencies than most web clients.

Big data stores are the workhorses for data analysis and they grow by millions of events (inserts) per second and process tens of petabytes and hundreds of thousands of queries per day. The three data stores are: 1) ODS (Operational Data Store) has 2 billion time series of counters (used mostly in alerts and dashboards) and processes 40000 queries per second. 2) Scuba is the fast slice-and-dice data store with 100 terabytes in memory. It ingests millions of new rows per second and deletes just as many. Throughput peaks around 100 queries per second scanning 100 billion rows per second. 3) Hive is the data warehouse with 300 petabytes of data in 800,000 tables. Facebook generates 4 new petabytes of data and runs 600,000 queries and 1 million map-reduce jobs per day. Presto, HiveQL, Hadoop, and Giraph are the common query engines over Hive.

What is interesting to note here is that ODS, Scuba, and Hive are not traditional relational databases. They process data for analysis, not to serve users, so they do not need ACID guarantees for data storage and retrieval. Instead, challenge arises from high data insertion rates and massive data quantities.

Facebook represents the new generation Big Data user demanding innovative solutions not seen in the traditional database world. A big challenge is how to accommodate the seismic shift to mobile devices with their peculiar intermittent network connectivity.

No wonder, Facebook  hosted a data faculty summit last September with many known academic researchers from around the country to find answers to its future data challenges.

Read the original blog entry...

More Stories By Jnan Dash

Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.

IoT & Smart Cities Stories
Discussions of cloud computing have evolved in recent years from a focus on specific types of cloud, to a world of hybrid cloud, and to a world dominated by the APIs that make today's multi-cloud environments and hybrid clouds possible. In this Power Panel at 17th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the importance of customers being able to use the specific technologies they need, through environments and ecosystems that expose their APIs to make true ...
"Space Monkey by Vivent Smart Home is a product that is a distributed cloud-based edge storage network. Vivent Smart Home, our parent company, is a smart home provider that places a lot of hard drives across homes in North America," explained JT Olds, Director of Engineering, and Brandon Crowfeather, Product Manager, at Vivint Smart Home, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
We are seeing a major migration of enterprises applications to the cloud. As cloud and business use of real time applications accelerate, legacy networks are no longer able to architecturally support cloud adoption and deliver the performance and security required by highly distributed enterprises. These outdated solutions have become more costly and complicated to implement, install, manage, and maintain.SD-WAN offers unlimited capabilities for accessing the benefits of the cloud and Internet. ...
In an era of historic innovation fueled by unprecedented access to data and technology, the low cost and risk of entering new markets has leveled the playing field for business. Today, any ambitious innovator can easily introduce a new application or product that can reinvent business models and transform the client experience. In their Day 2 Keynote at 19th Cloud Expo, Mercer Rowe, IBM Vice President of Strategic Alliances, and Raejeanne Skillern, Intel Vice President of Data Center Group and G...
Business professionals no longer wonder if they'll migrate to the cloud; it's now a matter of when. The cloud environment has proved to be a major force in transitioning to an agile business model that enables quick decisions and fast implementation that solidify customer relationships. And when the cloud is combined with the power of cognitive computing, it drives innovation and transformation that achieves astounding competitive advantage.
DXWorldEXPO LLC announced today that "IoT Now" was named media sponsor of CloudEXPO | DXWorldEXPO 2018 New York, which will take place on November 11-13, 2018 in New York City, NY. IoT Now explores the evolving opportunities and challenges facing CSPs, and it passes on some lessons learned from those who have taken the first steps in next-gen IoT services.
The current age of digital transformation means that IT organizations must adapt their toolset to cover all digital experiences, beyond just the end users’. Today’s businesses can no longer focus solely on the digital interactions they manage with employees or customers; they must now contend with non-traditional factors. Whether it's the power of brand to make or break a company, the need to monitor across all locations 24/7, or the ability to proactively resolve issues, companies must adapt to...
"IBM is really all in on blockchain. We take a look at sort of the history of blockchain ledger technologies. It started out with bitcoin, Ethereum, and IBM evaluated these particular blockchain technologies and found they were anonymous and permissionless and that many companies were looking for permissioned blockchain," stated René Bostic, Technical VP of the IBM Cloud Unit in North America, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventi...
Founded in 2000, Chetu Inc. is a global provider of customized software development solutions and IT staff augmentation services for software technology providers. By providing clients with unparalleled niche technology expertise and industry experience, Chetu has become the premiere long-term, back-end software development partner for start-ups, SMBs, and Fortune 500 companies. Chetu is headquartered in Plantation, Florida, with thirteen offices throughout the U.S. and abroad.
René Bostic is the Technical VP of the IBM Cloud Unit in North America. Enjoying her career with IBM during the modern millennial technological era, she is an expert in cloud computing, DevOps and emerging cloud technologies such as Blockchain. Her strengths and core competencies include a proven record of accomplishments in consensus building at all levels to assess, plan, and implement enterprise and cloud computing solutions. René is a member of the Society of Women Engineers (SWE) and a m...