Introducing "Generation S" - for Search

Generations X and Y are being followed by one that will rely on search to accomplish almost any task

This indexing process flow involves numerous steps: capturing the new incoming customer communication, creating dynamic joins with other tables and applications, running a procedure to aggregate the related case records, structuring and transforming the message into the indexing format required by the search engine, passing it to the search engine for re-indexing, and deleting the prior record.

Vendors have taken different approaches to transactional data indexing:

  • Crawling databases: Web search engines have adopted an approach to transactional indexing similar to document indexing - they crawl tables in databases using SQL select statements. Crawling is an acceptable choice for slowly changing tables, but not for large volumes of frequently changing data that needs to be available for search in near-real time. It is also not very effective for applications and highly normalized operational data stores.

  • Passing the search query to the application: This solution relies on some intelligence to determine how to match search terms with applications. It then relies on the application for data extraction and aggregation. This approach works well for simple queries, such as stock price information. Implementation becomes more daunting if users can run multiple queries against the same application. In those cases, a self-service application will likely offer more robust querying capabilities and be less confusing to the user.

  • Pushing application data to the index: Instead of letting the engine crawl the records, an application pushes data into the index using a search engine-provided indexing API. The application makes all connections into the underlying data store and has complete control over scheduling, interfacing protocols, and data structures. The scope of effort to configure and use this method depends on the extraction and transformation complexity and the available application tools for it.

  • Integrating data through SOA and process flows: These same APIs can let integration tools broaden the scope of the index. This approach requires integration capabilities, including transformation tools, process flow capabilities, and adapters, to define and execute the process that captures and enriches transaction data in real time.

The first three methods are application-specific and would work in projects with limited scope. The fourth method is generic and will address all present and emerging search integration needs, but very few traditional BI companies have the expertise in modern integration architecture to implement it.
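
The indexing flow and the push approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: InMemoryIndex stands in for a search engine's indexing API, and the CUSTOMERS and CASES tables, the field names, and the on_new_message hook are all hypothetical. A real engine will have its own API and document format.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for a search engine's indexing API."""
    docs: dict = field(default_factory=dict)

    def upsert(self, doc_id, doc):
        # Storing under the same id replaces (deletes) the prior record.
        self.docs[doc_id] = doc


# Hypothetical operational data: customers and their case records.
CUSTOMERS = {"c1": {"name": "Acme Corp", "region": "East"}}
CASES = [
    {"case_id": "k1", "customer_id": "c1", "status": "open"},
    {"case_id": "k2", "customer_id": "c1", "status": "closed"},
]


def build_index_document(message):
    """Join, aggregate, and transform one incoming message."""
    # Join: look up the customer the message belongs to.
    customer = CUSTOMERS[message["customer_id"]]
    # Aggregate: count the customer's open cases.
    related = [c for c in CASES if c["customer_id"] == message["customer_id"]]
    open_cases = sum(1 for c in related if c["status"] == "open")
    # Transform: shape the record into the (assumed) indexing format.
    return {
        "id": message["message_id"],
        "text": message["body"],
        "customer": customer["name"],
        "region": customer["region"],
        "open_cases": open_cases,
    }


def on_new_message(index, message):
    """Called by the application when a communication arrives (the push)."""
    doc = build_index_document(message)
    index.upsert(doc["id"], doc)
    return doc


index = InMemoryIndex()
on_new_message(index, {"message_id": "m1", "customer_id": "c1", "body": "Router is down"})
# An update to the same message re-indexes it and drops the earlier version.
on_new_message(index, {"message_id": "m1", "customer_id": "c1", "body": "Router is down again"})
```

Because the application decides when on_new_message runs, it keeps control over scheduling and data structures, which is the main difference from letting the engine crawl the tables.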

User Interface Augmentation
With search technologies, we're used to thinking that less is more. When a BI search returns a large number of records, however, simple interfaces displaying search hits ordered by relevancy aren't enough. Consider a bell curve, for instance: even though the right-hand tail is small, it may represent a large number of records in absolute terms. No one has the time to page through hundreds of results, so BI search results must enable interactivity to supplement relevancy. This helps users avoid information overload and easily find the exact information they need.
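
To put rough numbers on the bell curve remark, the following sketch assumes a result set of one million records whose scores follow a normal distribution, and counts the records more than two standard deviations above the mean. Both the result set size and the two-sigma cutoff are assumptions chosen for illustration.

```python
from statistics import NormalDist

total_records = 1_000_000                 # assumed size of the result set
tail_share = 1 - NormalDist().cdf(2.0)    # share more than 2 std devs above the mean
tail_records = round(total_records * tail_share)

print(f"{tail_share:.2%} of the hits is still about {tail_records:,} records")
```

Under these assumptions the tail holds a little over 2 percent of the hits, which is still tens of thousands of records, far more than anyone would page through.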

Search Results Classification and Categorization
Two methods enhance the filtering of search results: classification and categorization of the hits. Both methods appear the same to end users. The underlying data is used to group the search results, and the groups are then presented in ordinary tree controls that let the user select parameters and narrow down the hits. This interaction is referred to as guided navigation (see Figure 2).

Although they appear the same to users, categorization and classification create groups in fundamentally different ways.

Search companies, with roots in unstructured data, typically extract categories from the unstructured text using statistical methods. This automates the grouping process, but it doesn't give information architects any control over how records are grouped.

BI companies, with roots in structured data, dynamically classify records instead. Information architects define metadata about the structures they want to index; this metadata can precisely control how records are grouped.

The two methods aren't mutually exclusive. Classification offers definite advantages with parameterized searchable structured data as well as unstructured content that contains structured metatags (pre-categorized unstructured content). Given the trend of tagging every piece of structured or unstructured content, classification clustering appears to be more complementary to categorization. If the BI search solution provides both methods, the classification and categorization can be displayed simultaneously, providing the user with a robust overview of the data.
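
The difference between the two grouping methods can be sketched side by side. In this illustration, classify groups hits by a metadata field that an information architect has chosen, while categorize derives groups from term frequencies in the text. The hits, the field names, and the frequency heuristic are all hypothetical; commercial engines use far more sophisticated statistical methods.

```python
from collections import Counter, defaultdict

# Hypothetical search hits: structured metadata plus free text.
HITS = [
    {"id": 1, "brand": "Sony", "text": "wall mount plasma screen"},
    {"id": 2, "brand": "Sony", "text": "plasma screen with stand"},
    {"id": 3, "brand": "LG", "text": "lcd screen wall mount"},
]

STOPWORDS = {"with", "the", "a"}


def classify(hits, field):
    """Classification: group hits by a metadata field chosen in advance."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit[field]].append(hit["id"])
    return dict(groups)


def categorize(hits, top_n=2):
    """Crude categorization: group hits under frequent terms found in the text."""
    doc_freq = Counter()
    for hit in hits:
        doc_freq.update(set(hit["text"].split()) - STOPWORDS)
    # Rank by document frequency, breaking ties alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    # Skip terms found in every hit; they cannot narrow the results.
    top_terms = [term for term, n in ranked if n < len(hits)][:top_n]
    return {
        term: [h["id"] for h in hits if term in h["text"].split()]
        for term in top_terms
    }


by_brand = classify(HITS, "brand")   # groups controlled by metadata
by_term = categorize(HITS)           # groups extracted from the text
```

A guided navigation tree could show both results at once: the brand groups come from metadata the architect controls, and the term groups come from the text itself.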

As search emerges as the primary information access point, robust metadata will become even more important as it is used to build custom, adaptable navigation interfaces to augment or replace many current application interfaces.

Search Results Analytics
Users need to do more with search results than filter them. Search returns a data set - potentially quite large - and users will benefit from the ability to manipulate it. Expect vendors to differentiate based on this emerging requirement.

The common capability to sort results by date or relevancy provides little value on large result sets, because the first result page only shows the top or bottom hits. Sorting on metadata categories, which some vendors provide, gives users more power to explore and organize large result sets (see Figure 3).
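
As a minimal sketch of sorting on a metadata category, the following orders hits by region and, within each region, by relevancy. The hits and their field names are hypothetical.

```python
from operator import itemgetter

# Hypothetical hits carrying a metadata category and a relevancy score.
hits = [
    {"title": "Case 104", "region": "West", "relevance": 0.71},
    {"title": "Case 087", "region": "East", "relevance": 0.64},
    {"title": "Case 231", "region": "East", "relevance": 0.93},
    {"title": "Case 019", "region": "West", "relevance": 0.88},
]

# Python's sort is stable, so sort on the secondary key first
# (relevancy, highest first) and then on the metadata category.
by_relevance = sorted(hits, key=itemgetter("relevance"), reverse=True)
by_region = sorted(by_relevance, key=itemgetter("region"))

titles = [hit["title"] for hit in by_region]
```

The result lists the East cases before the West cases while keeping the most relevant hit at the top of each group.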

Some vendors have recently added the ability to convert the search results from the standard Google-like display with snippets to a tabular view (see Figure 4). This suits structured data but, as with all features, not all tabular views are equal: most tabular views provide static data and can only be sorted by date, relevancy, and other predefined categories. Also, server-based sorting operations regenerate the tabular view on each user interaction. In these cases, the user only benefits from a different display compared to the standard view.

Other vendors convert results into a dynamic tabular view that applies calculations, visualizations, charts, roll ups, and pivot tables locally in the browser. This opens a whole new perspective on search, making the result set much more useful and enabling users to do reporting and ad hoc analyses; for example, comparing data along two or more dimensions, as they're accustomed to doing with pivot tables in Excel. A user's search for an HDTV might return hundreds of results, which the user could use to compare prices by brand and monitor size (see Figure 5).
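
The HDTV comparison amounts to a small pivot over the result set. The sketch below uses Python as a stand-in for whatever runs in the browser, and the brands, sizes, and prices are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical search results for "HDTV".
RESULTS = [
    {"brand": "Sony", "size_in": 42, "price": 1200.0},
    {"brand": "Sony", "size_in": 42, "price": 1000.0},
    {"brand": "Sony", "size_in": 50, "price": 1800.0},
    {"brand": "LG", "size_in": 42, "price": 900.0},
    {"brand": "LG", "size_in": 50, "price": 1500.0},
]


def pivot(rows, row_key, col_key, value_key, agg=mean):
    """Aggregate value_key for each (row_key, col_key) combination."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[row_key], row[col_key])].append(row[value_key])
    table = defaultdict(dict)
    for (r, c), values in cells.items():
        table[r][c] = agg(values)
    return dict(table)


# Average price by brand (rows) and monitor size (columns).
table = pivot(RESULTS, "brand", "size_in", "price")
```

Passing a different agg function, such as min or len, answers other questions (lowest price, number of offers) from the same result set without another trip to the server.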

Since reporting and analysis of this type is often done using a data warehouse, it's not surprising that some vendors require the creation of an intelligent data warehouse at the time of indexing. However, some vendors provide the ability to manipulate the data directly in the browser without requiring any additional technology. Keeping the data and reports self-contained provides additional advantages, such as saving and sharing them via e-mail.

Ad hoc analytics on search results seems to be the most promising area for creating a true search-driven BI.

Search-Based Reporting
To provide BI search to the masses, you have to avoid re-creating all the complexities of traditional BI.

For example, if the chosen solution only indexes reports, how will you support a user whose needed information isn't in any indexed report? In this type of solution, the report usually acts as an entry point that takes the user to the BI world to refine her request. The user may find what she needs by drilling down from within the report; if not, however, she has to use the regular BI tools to modify the existing report or to create an ad hoc report. The user has dropped from a simple search paradigm into all the complexities of BI that search should eliminate.

A metadata-based approach provides a different user experience. The indexed records or transactions act as the entry points to BI, and dynamically constructed metadata-driven report links can take the user to any information resource. For example, a police record search application can provide, directly from each criminal offense record, links to the offense details, a summary report of all criminal records for the offender, another summary report on all criminal activities within date and geographic ranges, a crime analysis, and police activity structured ad hoc reports. Any metadata associated with the hit is passed to the report or to the structured ad hoc form. This BI search solution gives untrained users one-click access to all reporting capabilities without dropping them into any BI tool. Unless the reporting capabilities are as robust and simple as the search is, applications and tools will remain the preferred point of entry to BI.
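
One way to picture metadata-driven report links is a table of report templates, each listing the metadata fields it needs. The sketch below builds links for a hit from whatever metadata the hit carries. The report paths, labels, and field names are hypothetical and do not reflect any particular product.

```python
from urllib.parse import urlencode

# Hypothetical report catalog: label -> (path, metadata fields the report needs).
REPORT_TEMPLATES = {
    "Offense details": ("/reports/offense", ["offense_id"]),
    "Offender summary": ("/reports/offender", ["offender_id"]),
    "Activity by area and date": ("/reports/activity", ["district", "date"]),
}


def report_links(hit_metadata):
    """Build one link per report whose required fields are present on the hit."""
    links = {}
    for label, (path, params) in REPORT_TEMPLATES.items():
        if all(p in hit_metadata for p in params):
            query = urlencode({p: hit_metadata[p] for p in params})
            links[label] = f"{path}?{query}"
    return links


# A hit with no date: the area-and-date report is simply not offered.
hit = {"offense_id": "OF-1", "offender_id": "P-9", "district": "North"}
links = report_links(hit)
```

Adding a report then means adding one catalog entry; the search interface itself does not change, and the user never has to open a BI tool to reach it.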

Conclusion
Search and BI complement each other through more than just access to data, reports, and related documents. Together, they expose a rich set of information resources to ordinary users. It remains to be seen whether combined search and BI will go mainstream; however, many applications could leverage their symbiotic relationship, and if the right indexing methodology and technologies are deployed, search may help bring BI to the masses.

More Stories By Rado Kotorov

Dr. Rado Kotorov is a technical director of strategic product management at Information Builders Inc., responsible for emerging reporting, analytic and visualization technologies. Prior to joining Information Builders, he managed the implementation of BI solutions and decision-support systems, data warehouses, and custom applications. He has developed analytic models and applications for the pharmaceutical, retail, CPG, financial, and automotive industries. Rado Kotorov has a PhD in decision and game theory and economics from Bowling Green State University. He has publications on business processes, emerging technologies, CRM, KM, innovation, and entrepreneurship.

More Stories By Jake Freivald

Jake Freivald is the vice president of corporate marketing for Information Builders and iWay Software, an Information Builders company and leader in enterprise integration. In this position, he is responsible for developing and executing all of the solution marketing strategies. Jake joined Information Builders in 1999; prior to that, he held several managerial positions with Andersen Consulting and Prudential Life Insurance Company of America.