http://www.databasejournal.com/features/db2/analytics-yes-big-data-no.html
Analytics Yes; Big Data No!
Some companies have been slow to acquire big data applications. They discovered that modern hardware platforms and database management systems were more than adequate for most of their business analytics needs. Such needs share several common attributes, including analytics run against operational systems, where the analytics logic and engine were close to the object data. This meant that companies could avoid complex and high-volume data movement and extract-transform-load (ETL) strategies while executing queries against already existing, well-tuned databases.
In this article we introduce the concepts of strategic and tactical analytics, and how best to support these methods with your current IT infrastructure.
Basic data processing gave way to an explosion in information technology. Statisticians developed new mathematical methods and functions, and hardware and software tools grew in speed and power. On-line analytical processing (OLAP) came of age as analysts used these sophisticated tools to process large volumes of data in short periods of time. Eventually many of the most useful methods and practices were lumped together into a single phrase: business intelligence analytics.
As analytics advanced the most common usage was customer base segmentation. Which types of customers were buying what, in what areas, and at what times? By analyzing historical sales data and aggregating by customer type and geography, companies could now predict customer buying habits. This allowed companies to set prices and provide the best combination of products and services to maximize profitability.
In larger companies, IT organizations needed to process a growing volume of data and do a lot of machine-intensive analysis. However, accessing operational data in this way tended to either lock information or use CPU and other resources to the detriment of operational systems. To prevent analytics from negatively impacting these systems (especially customer-facing systems), data was typically copied to non-operational platforms such as data warehouses and data marts. Once copied, analytical software could then execute against these data copies, preventing performance issues on the original, critical systems.
This trend resulted in the spawning of a multitude of analytical silos within the company. Much effort was spent in copying data, sometimes several times, across the enterprise. Now, with a growing user base, a huge expansion in data volume, and a need for fast return of results, the analytics space has split into two camps: strategic analysis against large data stores, and tactical analysis in real-time.
Examples of strategic analysis include:
Implementation of tactical analysis requires speed and minimal use of resources. Tactical analytics software is usually executed as part of a transaction, against current operational data. The software uses a configuration file to determine which analytics processes to run at what times and against what categories of transactions. Management can then control which analytics are executed on a real-time basis, perhaps reacting to news of possible fraud, or even reducing the occurrence of analytics due to system overloads or resource outages.
Copying data to a separate hardware platform and running the analytics there requires too much time and processing power to meet a real-time transaction service level. The requirement to reduce data movement means hosting the analytics solutions as close to the data as is feasible, possibly even integrating it with on-line transaction processing (OLTP) within operational systems.
Initial analytics starts with the fact and dimension tables in the data warehouse. However, this data is usually not real-time due to the large amount of data transformation and cleansing required when extracting from operational systems. A nightly batch load process is typical; consequently, analytics in the warehouse is not done on real-time data.
It may be possible to do tactical analytics in the staging area, since this data is closest to the origin of the data in operational systems. This presupposes that the warehouse data staging process executes frequently and gathers internally consistent data. For example, customer purchase transactional data can be extracted hourly, as long as the transactions are complete and reference customers, products, and services that are already defined in the DW. However, it is rare that the warehouse can be custom-tuned to act on near real-time operational data.
The data warehouse of today is best used for strategic analytics, and the IT organization should prepare itself for the inevitable integration of the current enterprise data warehouse environment with one or more big data solutions. These solutions usually consist of a hybrid hardware and software solution, commonly called an appliance.
The appliance contains a large data store with high-speed access to the data using some combination of proprietary data storage, massively parallel data movement, and query parallelism. For example, customer purchase transactions may be stored on hundreds of disks within a disk array. The appliance accepts analytical queries, parses them, then splits each query into hundreds of subqueries, one for each disk. The queries then execute simultaneously, with the appliance merging the results of each query and returning the final result to the requestor.
Tactical analytics are usually implemented first. As the company gains insight into its data, it begins to store historical information in a data warehouse. Query results can now be used to both upsize the current platform or justify a separate analytical environment. As your business analytics space grows, off-load analytics that is not required in real-time to a separate platform. Plan for a separate platform for strategic analytics. As your strategic analytics needs grow, begin planning for a big data solution that will be integrated into your enterprise data warehouse environment.
See all articles by Lockwood Lyon
In this article we introduce the concepts of strategic and tactical analytics, and how best to support these methods with your current IT infrastructure.
Business Intelligence and Analytics
The phrase “business intelligence and analytics” has its roots in that most basic of analytical tools, the spreadsheet. Organizations have used these for decades to store various categories of data and execute calculations to predict trends and analyze changes in data. Popular terms such as roll-up, aggregation, and cubes found their beginnings in functions available in spreadsheet software.Basic data processing gave way to an explosion in information technology. Statisticians developed new mathematical methods and functions, and hardware and software tools grew in speed and power. On-line analytical processing (OLAP) came of age as analysts used these sophisticated tools to process large volumes of data in short periods of time. Eventually many of the most useful methods and practices were lumped together into a single phrase: business intelligence analytics.
As analytics advanced the most common usage was customer base segmentation. Which types of customers were buying what, in what areas, and at what times? By analyzing historical sales data and aggregating by customer type and geography, companies could now predict customer buying habits. This allowed companies to set prices and provide the best combination of products and services to maximize profitability.
In larger companies, IT organizations needed to process a growing volume of data and do a lot of machine-intensive analysis. However, accessing operational data in this way tended to either lock information or use CPU and other resources to the detriment of operational systems. To prevent analytics from negatively impacting these systems (especially customer-facing systems), data was typically copied to non-operational platforms such as data warehouses and data marts. Once copied, analytical software could then execute against these data copies, preventing performance issues on the original, critical systems.
This trend resulted in the spawning of a multitude of analytical silos within the company. Much effort was spent in copying data, sometimes several times, across the enterprise. Now, with a growing user base, a huge expansion in data volume, and a need for fast return of results, the analytics space has split into two camps: strategic analysis against large data stores, and tactical analysis in real-time.
Strategic Analytics
Strategic, or high-level analysis, has moved in the direction of big data. Large amounts of data arriving at high velocity necessitated implementation of large data stores (big data) and hybrid hardware and software tools implementing massive parallel data movement and processing. Analysts typically use these solutions to execute long-running, complex queries against a database partitioned by multiple dimensions, allowing for different “slicing” and aggregation to achieve the desired results. The data is typically historical, rather than real-time, and the results are examined and probed at leisure.Examples of strategic analysis include:
- Period-to-period sales analysis. Analysts compare year-over-year product sales data to determine purchasing patterns. This can be used to predict shortages of inventory, customer buying habits, and geographic areas where new warehouses or stores could be built.
- Data mining. Here, analysts display a summary of data by multiple dimensions, such as purchases of a set of similar products. Then they “drill down” into the data by requesting further aggregations such as purchases by store, purchases by date, or purchases by salesperson. The reverse operation is called “roll up”, where the analyst requests that the data be summarized at a higher level, such as by geographic region or by department.
Operational or Tactical Analytics
Contrast the above with tactical or operational analysis. Here, analysis is executed against a smaller, usually real-time data source, and the results used by the requesting application to complete a transaction. For example, consider analyzing a customer purchase in real-time. The request for the purchase triggers an analytical engine that compares that purchase against the customer’s historical purchase patterns to detect possible fraud. Such in-transaction analytics requires speed, since the transaction is real-time.Implementation of tactical analysis requires speed and minimal use of resources. Tactical analytics software is usually executed as part of a transaction, against current operational data. The software uses a configuration file to determine which analytics processes to run at what times and against what categories of transactions. Management can then control which analytics are executed on a real-time basis, perhaps reacting to news of possible fraud, or even reducing the occurrence of analytics due to system overloads or resource outages.
Copying data to a separate hardware platform and running the analytics there requires too much time and processing power to meet a real-time transaction service level. The requirement to reduce data movement means hosting the analytics solutions as close to the data as is feasible, possibly even integrating it with on-line transaction processing (OLTP) within operational systems.
Enter the Data Warehouse
Some data warehouses (DWs) are purpose-built as analytics databases. The DW most likely already contains the most useful dimension tables for warehouse processing, and usually contains data from important operational systems. The warehouse infrastructure is also home to many data movement processes such as extract, transform, load (ETL), data staging areas, operational data stores, data marts, and so forth.Initial analytics starts with the fact and dimension tables in the data warehouse. However, this data is usually not real-time due to the large amount of data transformation and cleansing required when extracting from operational systems. A nightly batch load process is typical; consequently, analytics in the warehouse is not done on real-time data.
It may be possible to do tactical analytics in the staging area, since this data is closest to the origin of the data in operational systems. This presupposes that the warehouse data staging process executes frequently and gathers internally consistent data. For example, customer purchase transactional data can be extracted hourly, as long as the transactions are complete and reference customers, products, and services that are already defined in the DW. However, it is rare that the warehouse can be custom-tuned to act on near real-time operational data.
But What about Big Data?
Regrettably, growth of the DW in scope and volume may preclude its use in tactical analytics. Data is rarely real-time as time spent in data movement, transformation and storage creates unacceptable delays.The data warehouse of today is best used for strategic analytics, and the IT organization should prepare itself for the inevitable integration of the current enterprise data warehouse environment with one or more big data solutions. These solutions usually consist of a hybrid hardware and software solution, commonly called an appliance.
The appliance contains a large data store with high-speed access to the data using some combination of proprietary data storage, massively parallel data movement, and query parallelism. For example, customer purchase transactions may be stored on hundreds of disks within a disk array. The appliance accepts analytical queries, parses them, then splits each query into hundreds of subqueries, one for each disk. The queries then execute simultaneously, with the appliance merging the results of each query and returning the final result to the requestor.
Summary
Business intelligence analytics today is seen to be a combination of tactical and strategic analytics: tactical analytics executed against real-time data for in-transaction use; strategic analysis run against large data stores and executing complex queries across multiple dimensions.Tactical analytics are usually implemented first. As the company gains insight into its data, it begins to store historical information in a data warehouse. Query results can now be used to both upsize the current platform or justify a separate analytical environment. As your business analytics space grows, off-load analytics that is not required in real-time to a separate platform. Plan for a separate platform for strategic analytics. As your strategic analytics needs grow, begin planning for a big data solution that will be integrated into your enterprise data warehouse environment.
See all articles by Lockwood Lyon
No comments:
Post a Comment