Incidental R

by Joseph Rickert

Last week, I posted a list of sessions at the Joint Statistical Meetings related to R. As it turned out, that list was only the tip of the iceberg. In some areas of statistics, such as graphics, simulation, and computational statistics, the use of R is so prevalent that people working in the field often don't think to mention it. For example, in the session New Approaches to Data Exploration and Discovery, which included the presentation on the Glassbox package that figured in my original list, R was important to the analyses underlying nearly all of the talks in one way or another. The following are synopses of the talks in that session, along with some pointers to relevant R resources.

Exploring Huge Collections of Scatterplots

Statistics and visualization legend Leland Wilkinson of Skytree showed off ScagExplorer, a tool he built with Tuan Dang of the University of Illinois at Chicago to explore scagnostics (a contraction of “Scatter Plot Diagnostics” coined by John and Paul Tukey in the 1980s). ScagExplorer makes it possible to look for anomalies and search for similar distributions in a huge collection of scatter plots. (The example Leland showed contained 124K plots.) The ideas and many of the visuals for the talk can be found in the paper ScagExplorer: Exploring Scatterplots by Their Scagnostics. ScagExplorer is a Java-based tool, but R users can work with the scagnostics package written by Lee Wilkinson and Anushka Anand in 2007.

Glassbox: An R Package for Visualizing Algorithmic Models

Google’s Max Ghenis presented work he did with fellow Googlers Ben Ogorek and Estevan Flores. Glassbox is an R application that attempts to provide transparency to “blackbox” algorithmic models such as Random Forests. Among other things, it calculates and plots the collective importance of groups of variables in such a model. The slides for the presentation are available, as is the package itself.

Google is using predictive modeling and tools such as Glassbox to better understand the characteristics of its workforce and to ask important, reflective questions such as “How can we better understand diversity?” The company also does HR modeling to see if what they know about people can give them a competitive edge in hiring. For example, Google uses data collected from people who have interviewed at the company in the past, but who have not received offers from Google, to try to understand Google's future hiring needs. The coolest thing about this presentation was that these guys work for the Human Resources Department! If you think that you work for a tech company, go down to HR and see if you can get some help with Random Forests.

A Web Application for Efficient Analysis of Peptide Libraries

Eric Hare of Iowa State University introduced PeLica, work he did with colleagues Timo Sieber of University Medical Center Hamburg-Eppendorf and Heike Hofmann of Iowa State University. PeLica is an interactive Shiny application that helps assess the statistical properties of peptide libraries. PeLica’s creators refer to it as a Peptide Library Calculator that acts as a front end to the R package peptider, which contains functions for evaluating the diversity of peptide libraries. The authors have done an exceptional job of using the documentation features available in Shiny to make their app a teaching tool.

To Merge or Not to Merge: An Interactive Visualization Tool for Local Merges of Mixture Model Components

Elizabeth Lorenzi of Carnegie Mellon showed the prototype for an interactive visualization tool that she is working on with Rebecca Nugent of Carnegie Mellon and Nema Dean of the University of Glasgow. The software calculates inter-component similarities of mixture model component trees and displays them as hierarchical dendrograms. Elizabeth and her colleagues are implementing this tool as an R package.

An Interactive Visualization Platform for Interpreting Topic Models

Carson Sievert of Iowa State University presented LDAvis, a general framework for visualizing topic models that he is building with Kenny Shirley of AT&T Labs. LDAvis is interactive R software that enables users to interpret and compare topics by highlighting keywords. The theory is nicely described in a recent paper, and the examples on Carson’s GitHub page are instructive and fun to play with. In the plot that accompanied the original post, circle 26, representing a topic, has been selected. The bar chart on the right displays the 30 most relevant terms for this topic. The red bars represent the frequency of a term in a given topic (proportional to p(term | topic)), and the gray bars represent a term's frequency across the entire corpus (proportional to p(term)).

Gravicom: A Web-Based Tool for Community Detection in Networks

Andrea Kaplan showed off an interactive application that she and her Iowa State University team members, Heike Hofmann and Daniel Nordman, are building. Gravicom is an interactive web application based on Shiny and the D3 JavaScript library that lets a user manually collect nodes into clusters in a social network graph and then save this grouping information for subsequent processing. The idea is that eyeballing a large social network and selecting “obvious” groups may be an efficient way to initialize a machine learning algorithm. Have a look at the live demo.

Human Factors Influencing Visual Statistical Inference

Mahbubul Majumder of the University of Nebraska presented joint work done with Heike Hofmann and Dianne Cook, both of Iowa State University, on identifying key factors, such as demographics, experience, training, or even the placement of figures in an array of plots, that may be important for the human analysis of visual data.
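
For readers who want to see what the two sets of bars in the LDAvis chart measure, here is a minimal sketch in plain Python. It is not LDAvis code: the toy counts, the term list, and the function names are all hypothetical, and the ranking assumes the relevance weighting as I understand it from the LDAvis paper, a mix of log p(term | topic) and the log of p(term | topic) / p(term) controlled by a weight lam. The value lam = 0.6 is only an illustrative default.

```python
import math

# Hypothetical toy data: rows are topics, columns are terms.
# Every count is nonzero so the logarithms below are defined.
terms = ["model", "data", "plot", "gene"]
topic_term_counts = [
    [30, 10, 5, 5],   # topic 0
    [5, 10, 5, 30],   # topic 1
]

def term_given_topic(counts, topic):
    """p(term | topic): one topic's counts divided by that topic's total."""
    row = counts[topic]
    total = sum(row)
    return [c / total for c in row]

def term_marginal(counts):
    """p(term): each term's share of all tokens across every topic."""
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(col_totals)
    return [c / grand_total for c in col_totals]

def relevance(counts, topic, lam=0.6):
    """lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w)) for each term w."""
    p_wt = term_given_topic(counts, topic)
    p_w = term_marginal(counts)
    return [
        lam * math.log(a) + (1 - lam) * math.log(a / b)
        for a, b in zip(p_wt, p_w)
    ]

def top_terms(counts, topic, n=30, lam=0.6):
    """Names of the n terms with the highest relevance for one topic."""
    scores = relevance(counts, topic, lam)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [terms[i] for i in order[:n]]

print(top_terms(topic_term_counts, topic=0, n=2))
```

With lam = 1 the ranking depends only on p(term | topic); smaller values give more weight to terms that are unusually concentrated in the selected topic relative to the corpus as a whole.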


More Stories By David Smith

David Smith is Vice President of Marketing and Community at Revolution Analytics. He has a long history with the R and statistics communities. After graduating with a degree in Statistics from the University of Adelaide, South Australia, he spent four years researching statistical methodology at Lancaster University in the United Kingdom, where he also developed a number of packages for the S-PLUS statistical modeling environment. He continued his association with S-PLUS at Insightful (now TIBCO Spotfire), overseeing the product management of S-PLUS and other statistical and data mining products.

David Smith is the co-author (with Bill Venables) of the popular tutorial manual, An Introduction to R, and one of the originating developers of the ESS: Emacs Speaks Statistics project. Today, he leads marketing for REvolution R, supports R communities worldwide, and is responsible for the Revolutions blog. Prior to joining Revolution Analytics, he served as vice president of product management at Zynchros, Inc. Follow him on Twitter at @RevoDavid.
