The Importance of Data Connectivity to Virtualization

Don't let inefficient data access undermine your virtualization goals

Bear in mind why you decided to virtualize your server environment in the first place. Virtualization took hold in large part because, as widely reported in the industry media, dedicated single-application servers typically ran at very low utilization: 10% to 15% of capacity, according to one IDC analyst. Data centers were paying for hardware resources of which they used only a fraction. Virtualization lets you use those resources more efficiently and cost-effectively, so overprovisioning hardware to accommodate virtualization would be self-defeating. It follows that, to realize the full benefit of higher resource utilization, the applications running on VMs must also use resources efficiently, and data connectivity components are no exception. If your database connectivity components are not efficient in their use of CPU, memory, storage, and network I/O, they can undermine your virtualization efforts.

Data Connectivity Is Not All the Same
The differences among data connectivity components such as ODBC and JDBC drivers and ADO.NET data providers are poorly understood, even by many data specialists. A key reason is that the most widely used commercial databases all include data connectivity components at no additional charge, and these are often simply used by default to connect a particular database to various applications. The open source community also offers data connectivity software. However, insisting on such “free” but often substandard components can end up costing organizations more than they anticipate in lost performance.

In a virtualized environment, if the data connectivity middleware is not designed for lean, efficient operation (if it relies on client libraries, disk caching, or verbose database communications, for example), its overall consumption of hardware resources can be considerable.
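Verbose communication can take several forms. One is exchanging many small messages with the server instead of fewer, larger ones. The sketch below is a hypothetical illustration using Python's DB-API and the in-process sqlite3 module, so no network is actually involved: each fetchmany() call merely stands in for one request/response exchange that a network driver would make. The helper names count_fetch_batches and expected_round_trips are invented for this example. Real drivers expose comparable settings (such as the DB-API cursor.arraysize attribute or JDBC's Statement.setFetchSize), but how each driver maps them onto network traffic varies.

```python
import math
import sqlite3


def count_fetch_batches(cursor: sqlite3.Cursor, batch_size: int) -> tuple[int, int]:
    """Drain a cursor in batches; return (rows_read, non_empty_batches).

    Hypothetical helper: for a network driver, each non-empty batch would
    correspond roughly to one request/response exchange with the server.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rows_read = 0
    batches = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        rows_read += len(batch)
        batches += 1
    return rows_read, batches


def expected_round_trips(total_rows: int, batch_size: int) -> int:
    """Ceiling division: a partial final batch still costs a full exchange."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(total_rows / batch_size)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in; no network round trips here
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO t (payload) VALUES (?)",
                     [("row %d" % i,) for i in range(10_000)])
    for size in (1, 100, 1000):
        cur = conn.execute("SELECT id, payload FROM t")
        rows, batches = count_fetch_batches(cur, size)
        print(f"batch size {size:>5}: {rows} rows in {batches} batches")
```

Under this simple model, reading 10,000 rows one at a time costs 10,000 exchanges, versus 10 with a batch size of 1,000.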

Figures 1-3 show the results of tests comparing resource usage on a single server for two components: the standard data connectivity component included with a major vendor's relational database, and a commercially available third-party component designed specifically for high performance. Figure 1 compares raw throughput, measured as database rows per second read by the accessing application, for the third-party component versus the vendor-provided component.

As expected, the high-performance component delivers a 25% to 50% advantage in throughput. For anyone considering a virtualized server environment, however, the more interesting question is how efficiently each component uses hardware resources to deliver that performance. Figure 2 measures the database rows read by the same application per second of CPU time, again for the third-party component versus the vendor-provided one.

By this measure, the third-party high-performance component is about twice as CPU-efficient as the component included with the major relational database. The next graph, which measures total memory usage, shows an even more notable difference.

Figure 3 compares memory consumption: the third-party component uses as little as one-sixth of the memory consumed by the vendor-provided component, while delivering the higher throughput shown in Figure 1.
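The three metrics behind Figures 1-3 (rows per second, rows per second of CPU time, and memory consumed) can be collected with a small harness. What follows is a minimal Python sketch under stated assumptions, not the methodology used in the tests above: it drains a query through any DB-API 2.0 connection and records wall-clock time, process CPU time, and peak traced memory. Two caveats matter. First, time.process_time() counts CPU for the whole process, so a fair comparison keeps the application fixed and swaps only the connectivity component. Second, tracemalloc sees only allocations made through Python's allocator, not memory held inside a native driver or client library, so a real comparison would also watch process resident memory. The names ReadStats and measure_read are hypothetical, and the in-memory sqlite3 database is only a stand-in for a real server.

```python
import sqlite3
import time
import tracemalloc
from dataclasses import dataclass


@dataclass
class ReadStats:
    rows: int
    wall_seconds: float
    cpu_seconds: float
    peak_bytes: int

    @property
    def rows_per_second(self) -> float:
        # Throughput, the kind of metric plotted in Figure 1.
        return self.rows / self.wall_seconds if self.wall_seconds > 0 else float("inf")

    @property
    def rows_per_cpu_second(self) -> float:
        # CPU efficiency, the kind of metric plotted in Figure 2. process_time()
        # can have coarse resolution, so a tiny run may report zero CPU time.
        return self.rows / self.cpu_seconds if self.cpu_seconds > 0 else float("inf")


def measure_read(conn, query: str, batch_size: int = 500) -> ReadStats:
    """Drain `query` through a DB-API 2.0 connection and record its cost."""
    tracemalloc.start()
    try:
        wall_start = time.perf_counter()
        cpu_start = time.process_time()  # CPU time of the whole process
        cur = conn.cursor()
        cur.execute(query)
        rows = 0
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            rows += len(batch)
        cpu = time.process_time() - cpu_start
        wall = time.perf_counter() - wall_start
        # Peak of Python-level allocations only; native driver memory is not traced.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return ReadStats(rows, wall, cpu, peak)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real database server
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO t (payload) VALUES (?)",
                     [("x" * 100,) for _ in range(50_000)])
    stats = measure_read(conn, "SELECT id, payload FROM t")
    print(f"{stats.rows} rows, {stats.rows_per_second:,.0f} rows/s, "
          f"{stats.rows_per_cpu_second:,.0f} rows per CPU-second, "
          f"peak {stats.peak_bytes / 1024:.0f} KiB traced")
```

To compare two components, you could run measure_read() with the same query and data against connections opened through each one (for example, pyodbc connections to two different ODBC data sources).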

A difference of this size, measured on a traditional non-virtualized server, becomes highly significant in a virtualized scenario, where additional VMs on the same heavily utilized physical machine contend for its resources. In that case, resource contention could sharply reduce the number of VMs you can feasibly run on that machine. Since the goal is to get the most out of your hardware, it pays to understand the differences between data connectivity components.
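To see why per-VM overhead matters, consider a simplified capacity model in which identical VMs are packed onto one host until some resource runs out. The sketch below is purely illustrative: the function name max_vms and every input figure are hypothetical, not taken from the tests, except that the lighter connectivity footprint is set to one-sixth of the heavier one to mirror the best case reported for Figure 3. Real hypervisors can overcommit and share memory, so treat this only as a back-of-the-envelope model.

```python
from math import floor


def max_vms(host_capacity: dict[str, float],
            per_vm_demand: dict[str, float]) -> tuple[int, str]:
    """Return (VM count, limiting resource) for identical VMs on one host.

    Hypothetical model: each resource caps the count at
    floor(capacity / per-VM demand), and the smallest cap wins.
    Units must match per resource (MiB with MiB, cores with cores).
    """
    caps = {
        name: floor(host_capacity[name] / demand)
        for name, demand in per_vm_demand.items()
        if demand > 0
    }
    if not caps:
        raise ValueError("at least one per-VM demand must be positive")
    limiting = min(caps, key=caps.get)
    return caps[limiting], limiting


if __name__ == "__main__":
    # All figures below are hypothetical, chosen only to illustrate the model.
    host = {"memory_mib": 65536, "cpu_cores": 32}
    app_mib = 1536                            # guest OS + application, per VM
    heavy_driver_mib = 768                    # hypothetical connectivity footprint
    light_driver_mib = heavy_driver_mib // 6  # one-sixth of the heavy footprint

    for label, driver_mib in (("heavy", heavy_driver_mib), ("light", light_driver_mib)):
        demand = {"memory_mib": app_mib + driver_mib, "cpu_cores": 0.75}
        print(label, max_vms(host, demand))
    # heavy (28, 'memory_mib')
    # light (39, 'memory_mib')
```

With these made-up numbers, memory is the binding resource in both cases, and trimming the connectivity component's footprint raises the ceiling from 28 to 39 VMs on the same host.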

The Impact of Architecture
The first thing to consider is the general architecture of the data connectivity component. Many data connectivity components are complicated and slowed down by their reliance on database vendor client libraries, additional software that must be installed on the same server as the application. Client libraries are very general-purpose, designed to cover the widest possible range of connectivity scenarios. They add steps to the connectivity process and thus reduce performance and scalability.

Wire protocol data connectivity components work differently, as the diagrams in Figure 4 illustrate for ODBC drivers. They use the same officially supported protocols as the database's native clients, so they communicate with the database directly at the network level and require no client libraries.
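To make "communicating directly at the network level" concrete, here is a small sketch that uses PostgreSQL's publicly documented frontend/backend protocol as an assumed example; the article does not name a database vendor, and the helper build_startup_message and the parameter values are hypothetical. It only encodes the StartupMessage, the first packet a protocol 3.0 client sends, showing that a wire protocol component can assemble these bytes itself rather than calling into a vendor client library. A real driver would send them over a TCP socket (PostgreSQL listens on port 5432 by default) and then handle optional SSL negotiation, authentication, and the query protocol, all of which is omitted here.

```python
import struct

# Protocol version 3.0: major version in the high 16 bits, minor in the low 16.
PROTOCOL_V3 = 3 << 16  # 196608


def build_startup_message(params: dict[str, str]) -> bytes:
    """Encode a PostgreSQL protocol 3.0 StartupMessage.

    Layout, with integers big-endian (network byte order):
      Int32  total length in bytes, including this field
      Int32  protocol version
      NUL-terminated strings in pairs: parameter name, parameter value
      a single NUL byte terminating the list
    """
    if "user" not in params:
        raise ValueError("the 'user' parameter is required")
    body = struct.pack("!i", PROTOCOL_V3)
    for name, value in params.items():
        body += name.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00"
    body += b"\x00"
    return struct.pack("!i", len(body) + 4) + body


if __name__ == "__main__":
    # Hypothetical user and database names, for illustration only.
    msg = build_startup_message({"user": "app", "database": "sales"})
    print(len(msg), msg[:8].hex())  # 33 0000002100030000
```

The point of the design is that every byte on the wire is produced by the driver itself, so nothing beyond the driver needs to be installed alongside the application.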

This streamlined architecture improves performance and scalability by reducing complexity. It also matters for administration: client libraries run counter to a primary reason for virtualizing in the first place, which is to reduce staff involvement through streamlined administration. As mentioned earlier, the libraries must be deployed on each server, and in a virtualized environment every VM counts as a server. For a machine running four virtual machines, the libraries must be installed, deployed, and configured four separate times! Wire protocol components require no client libraries, and because there are no additional components to set up, configuration is considerably simpler. That simplicity serves one of the primary goals of virtualization: reducing the cost of administration.

About the Author

Mike Johnson is program manager for DataDirect Technologies' Connect for ODBC and Connect64 for SSIS product lines, responsible for defining the future direction and functionality of DataDirect's pace-setting ODBC and SSIS product development initiatives.
