In the 1980s, many organizations began using dedicated servers for these applications, collectively known then as decision support systems (DSS), supplementing their management information systems. Decision-support queries tended to be particularly CPU- and memory-intensive, using read-only data, whereas traditional OLTP was typically I/O-intensive with a large number of updates to data. The characteristics of queries were much less predictable (e.g., more “ad hoc”) than what had been experienced in OLTP systems. This led to the development of data stores for decision support apart from those for OLTP.
When Bill Inmon (whose books are noted in Appendix B) and others popularized the term “data warehouse” in the early 1990s, a formalized common infrastructure for building a solution came into being. The topology of business intelligence solutions continued to evolve, as the next section illustrates. Today’s business intelligence solutions often include infrastructure that exposes data from both data warehouses and OLTP systems in reports. Underlying hardware has evolved such that I/O is now a more important design consideration for data warehousing hardware platforms.
A Topology for Business Intelligence
The classic data warehouse topology, serving as an enterprise-wide source of information, is represented by the multitier topology shown in Figure 10-1.
This topology developed over many years for a variety of reasons. Initial efforts at creating a single enterprise warehouse often resulted in “analysis paralysis.” Just as efforts to define an enterprise-wide OLTP model can take years (due to cross-departmental politics and the scope of the effort), similar attempts in data warehousing also took much longer than business sponsors were willing to accept.
Figure 10-1. Typical initial business intelligence topology (tiers: OLTP servers, operational data store, warehouse server, data marts, clients)

These efforts were further hampered by the continually changing analysis requirements necessitated by a changing marketplace. While the data elements and
requirements for operational systems can remain relatively stagnant over time, understanding business trends can be like trying to catch lightning in a bottle.
Consequently, attempts at building such enterprise-wide models that would satisfy everyone often satisfied no one.
Data Marts
When some large-scale, enterprise-only data warehouse efforts ended in dismal failure, frustration and impatience followed. Some reacted and built department-focused independent data marts by extracting data from the appropriate operational
source systems. Many data marts were initially quite successful because they fulfilled a specific business need relatively quickly.
However, problems began to surface. There was often no coordination between departments regarding basic definitions, such as “customer.” If a senior manager asked the same question of multiple departments, the answers provided by these independent data marts were often different, thus calling into question the validity of all of the marts. Many departments also encountered ongoing difficulty in managing these multiple data marts and in maintaining extractions from operational sources (which were often duplicated across multiple departments).
As architects took another look at their solutions, they began to realize that it was very important to have a consistent view of the detailed data at an enterprise data warehouse level. They also saw that data marts could solve business problems and provide return on investment in an incremental fashion. Today, most successful implementers simultaneously grow dependent data marts one business solution at a time while growing the enterprise warehouse server in an incremental fashion.
The currently accepted definition of a data mart is simply a subject- or application-specific data warehouse, usually implemented within a department. Typically, these data marts are built for performance and may include a large number of summary tables. Data marts were initially thought of as being small, since not all the detail data for a department or data from other departments need be loaded in the mart.
However, some marts get quite large as they incorporate data from outside sources (sometimes purchased) that isn’t relevant in other parts of the business.
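To make the notion of summary tables concrete, the sketch below shows one way a mart might precompute monthly sales by region as an Oracle materialized view. The table and column names (sales_detail, region, and so on) are illustrative assumptions, not a schema defined in this chapter.

    -- Hypothetical summary for a sales mart: precomputes monthly sales by
    -- region so ad hoc queries need not scan the detail rows.
    CREATE MATERIALIZED VIEW sales_by_region_month
      BUILD IMMEDIATE              -- populate now rather than on first use
      REFRESH COMPLETE ON DEMAND   -- rebuilt explicitly, e.g., after each load
      ENABLE QUERY REWRITE         -- lets the optimizer substitute this summary
    AS
    SELECT region,
           calendar_year,
           calendar_month,
           SUM(sales_amount) AS total_sales
    FROM   sales_detail            -- assumed detail table in the mart
    GROUP  BY region, calendar_year, calendar_month;

With query rewrite enabled, queries written against the detail table can often be answered transparently from such a summary.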
In some organizations, data marts are deployed to meet specific project goals with models optimized for performance for that particular project. Such data marts are retired when the project is completed and the hardware is reused for other projects.
As the analysis requirements for a business change, the topology of any particular data warehouse is subject to evolution over time, so developers must be aware of this possibility.
An increasing focus on cost savings, manageability, and compliance is leading many to reexamine the wisdom of having a large number of physically separate data marts. As a result, consolidation of marts into the enterprise warehouse is a common trend.
More recent versions of Oracle enable effective management of different user communities, helping to make such consolidation possible.
Operational Data Store and Enterprise Warehouse
The operational data store (ODS) concept also grew in popularity in the 1990s. The ODS may best be described as a distribution center for current data. Like the OLTP servers, the schema is highly normalized and the data is recent. The ODS serves as a consolidation point for reporting and can give the business one location for viewing current data that crosses divisions or departments. The popularity of the ODS grew
in part as a result of companies in the midst of acquisitions and mergers. These organizations often face mixed-application environments. The ODS can act as a staging location that can be used as the source for further transformations into a data warehouse or into data marts.
The warehouse server, or enterprise data warehouse, is a multisubject historical information store usually supporting multiple departments and often serving as the corporate database of record. When an ODS is established, the warehouse server often extracts data from the ODS. When an ODS isn’t present, data for the warehouse is directly extracted and transformed from operational sources. External data may also feed the warehouse server.
As noted previously, platform consolidation is popular within these tiers today. The enterprise data warehouse can be the point of consolidation for the ODS and multiple data marts. Although different logical models remain, they are consolidated to a single platform and database.
OLTP Systems and Business Intelligence
True real-time data resides in the OLTP systems. Organizations can present reporting from such transaction processing systems side by side with information from data warehouse systems in portals or dashboards. A key to providing meaningful dashboards is to provide high-quality data with consistent meaning. The quality of data in OLTP systems is directly related to controlling data input to eliminate duplicate or error-prone entries.
Consistent meaning can be resolved using master data management (MDM) solutions. MDM solutions consist of data hubs that serve as a common reference point for data supporting key business measurements such as customers, products, or finance. Oracle offers a number of data hubs for these and other business areas to enable building out of such an infrastructure.
Projects that leverage data from data warehouses, OLTP systems, and MDM solutions are called data integration projects. Most business intelligence deployments, at the time of publication of this edition, use just the data warehouse infrastructure as the primary source of historic data for business intelligence. The extraction, transformation, and loading (ETL) techniques applied to the data warehouse are designed to resolve differences in common data elements, to cleanse the data, and to provide a historical database of record.
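As a hedged sketch of the kind of conforming and cleansing logic ETL applies, the statement below standardizes customer attributes from a staging table and upserts them into a warehouse dimension; the staging and dimension table names are hypothetical.

    -- Illustrative ETL step: conform and upsert customer records from a
    -- staging area into a warehouse dimension (hypothetical names).
    MERGE INTO dw_customer_dim d
    USING (SELECT customer_id,
                  UPPER(TRIM(customer_name)) AS customer_name,  -- standardize case and spacing
                  UPPER(TRIM(country_code))  AS country_code
           FROM   stg_customers) s
    ON (d.customer_id = s.customer_id)
    WHEN MATCHED THEN
      UPDATE SET d.customer_name = s.customer_name,
                 d.country_code  = s.country_code
    WHEN NOT MATCHED THEN
      INSERT (customer_id, customer_name, country_code)
      VALUES (s.customer_id, s.customer_name, s.country_code);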
Data Warehouse Design
The database serves as the foundation of the business intelligence infrastructure: it is the place where the data is stored. But there is more to business intelligence than data—the infrastructure becomes useful only when business users use the data to
gain insight. This may seem like a trivial point, but we’ve seen numerous companies build elegant infrastructure without consulting business users to determine business needs or the key performance indicators (KPIs) to be measured. Often, such projects, once deployed, end up supporting very few users, generating little activity, and yielding little business intelligence.
Assuming that your infrastructure is well planned and there is a demand for the data, your next challenge will be to figure out how to handle the demand. You will be faced with the need to design your data warehouse and other infrastructure components to deliver appropriate performance to your users—performance that may initially seem far beyond your capabilities, since the information needed can involve comparisons of massive amounts of detailed data.
When you start your design, also remember that this infrastructure will never be considered finished. When the business needs change, so too must components in the infrastructure. Thus, the ability to track changes through metadata stored in a repository often becomes critical as part of the design work.
Various design tools can provide this capability. Oracle’s Warehouse Builder (OWB), included with Oracle Enterprise Edition, Standard Edition, and Standard Edition One databases (since 2006), provides a metadata repository as well as the capability to import metadata from operational tables and then forward-engineer new schemas and tables. A data warehouse designer creates columns for the new tables and builds constraints for the new schema. Maps are then created between source and target columns with appropriate transformations. DDL scripts for creation of the new tables, and PL/SQL or SQL*Loader scripts for ETL, are generated automatically.
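The scripts such a tool produces are ordinary DDL and PL/SQL; the fragment below is a hand-written sketch, with purely illustrative names, of what a simple source-to-target mapping amounts to.

    -- Target table (DDL) and a simple mapping (PL/SQL) of the sort a design
    -- tool might emit; table and column names are illustrative only.
    CREATE TABLE dw_orders (
      order_id     NUMBER        PRIMARY KEY,
      customer_id  NUMBER        NOT NULL,
      order_date   DATE          NOT NULL,
      order_total  NUMBER(12,2)
    );

    CREATE OR REPLACE PROCEDURE load_dw_orders IS
    BEGIN
      INSERT /*+ APPEND */ INTO dw_orders (order_id, customer_id, order_date, order_total)
      SELECT o.order_id,
             o.customer_id,
             TRUNC(o.order_date),     -- transformation: strip the time portion
             NVL(o.order_total, 0)    -- transformation: default missing totals
      FROM   src_orders o;            -- assumed extract of the operational table
      COMMIT;
    END load_dw_orders;
    /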
As noted previously, data warehouses historically have had a different set of usage characteristics from those of an OLTP database. One aspect that makes it easier to meet data warehousing performance requirements is the high percentage of read operations. Oracle’s locking model, described in detail in Chapter 8, is ideally suited for data warehouse operations. Oracle doesn’t place any locks on data that’s being read, thus reducing contention and resource requirements for situations where there are a lot of database reads. Since locks don’t escalate, Oracle is also extremely appropriate for near real-time data feeds into the warehouse in a scenario not unlike OLTP workloads.
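A hedged illustration of that behavior, using a hypothetical sales table: the long-running analytic query in the first session takes no locks on the rows it reads, so the concurrent update in the second session is not blocked.

    -- Session 1: long-running warehouse query; no row locks are taken for the read.
    SELECT product_id, SUM(sales_amount)
    FROM   sales
    GROUP  BY product_id;

    -- Session 2, running concurrently: a near real-time feed can still modify
    -- the table. The update is not blocked by the read above, and the query in
    -- session 1 sees a consistent snapshot as of the time it began.
    UPDATE sales
    SET    sales_amount = sales_amount + 10
    WHERE  sales_id = 42;
    COMMIT;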
Warehousing usage characteristics lead to deploying different types of schema. In OLTP databases, transaction data is usually stored in multiple tables and data items are stored only once. If a query requests data from more than one transaction table, the tables are joined together. Typically, the database query optimizer decides which table to use as the starting point for the join, based on the assumption that the data in the tables is essentially equally important.
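For contrast with the star schema discussed next, a query against a hypothetical normalized OLTP model typically joins several transaction tables of roughly equal importance:

    -- Normalized OLTP-style query: order data is spread across several tables
    -- and assembled by joins at query time (hypothetical schema).
    SELECT c.customer_name,
           o.order_date,
           p.product_name,
           ol.quantity
    FROM   orders o
           JOIN order_lines ol ON ol.order_id   = o.order_id
           JOIN customers   c  ON c.customer_id = o.customer_id
           JOIN products    p  ON p.product_id  = ol.product_id
    WHERE  o.order_date >= TRUNC(SYSDATE) - 30;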
Although Oracle-based data warehouses are sometimes modeled in third normal form (3NF) (described in Chapter 4), when business users need an understandable schema to formulate their own ad hoc queries or when analytical processing is required, key transaction data can be more appropriately stored in a central fact table surrounded by dimension or lookup tables, as shown in Figure 10-2. The fact table can contain summarized data for data items duplicated elsewhere in the warehouse, and dimension tables can contain multiple hierarchies. As noted previously, as organizations consolidate their data marts into enterprise data warehouses, many now deploy a variation called a hybrid schema, a mixture of third normal form and star schema.
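A minimal sketch of such a star schema, using hypothetical names that loosely follow Figure 10-2, might look like the following; real warehouses add surrogate keys, additional attributes, and partitioning that are omitted here.

    -- Hypothetical star schema: a central fact table keyed by product,
    -- geography, channel, and time dimensions.
    CREATE TABLE product_dim (
      product_id   NUMBER PRIMARY KEY,
      category     VARCHAR2(30),
      product_type VARCHAR2(30),
      brand        VARCHAR2(30),
      model        VARCHAR2(30)
    );

    CREATE TABLE geography_dim (
      geography_id NUMBER PRIMARY KEY,
      region       VARCHAR2(30),
      district     VARCHAR2(30),
      state        VARCHAR2(30),
      city         VARCHAR2(30)
    );

    CREATE TABLE channel_dim (
      channel_id   NUMBER PRIMARY KEY,
      manufacturer VARCHAR2(30),
      distributor  VARCHAR2(30),
      store_chain  VARCHAR2(30),
      store        VARCHAR2(30)
    );

    CREATE TABLE time_dim (
      time_id        NUMBER PRIMARY KEY,
      calendar_year  NUMBER,
      calendar_month NUMBER,
      day_date       DATE
    );

    CREATE TABLE sales_fact (
      product_id   NUMBER NOT NULL REFERENCES product_dim,
      geography_id NUMBER NOT NULL REFERENCES geography_dim,
      channel_id   NUMBER NOT NULL REFERENCES channel_dim,
      time_id      NUMBER NOT NULL REFERENCES time_dim,
      quantity     NUMBER,
      sales_amount NUMBER(12,2)
    );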
Ralph Kimball, author of the widely read book The Data Warehouse Toolkit (Wiley; see Appendix B for details), is largely credited with discovering that users of data warehouses typically pose their queries in such a manner that a star schema, illustrated in Figure 10-2, is an appropriate model to use. A typical query might be something such as the following:
Show me how many computers (a product type) were sold by a store chain (a channel) in Wisconsin (a geography) over the past 6 months (a time).
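Against the hypothetical star schema sketched earlier, that question translates into a query that joins the fact table to each of the dimensions involved (the literal values are, of course, made up):

    -- "How many computers were sold by a store chain in Wisconsin over the
    -- past 6 months?" expressed against the hypothetical schema above.
    SELECT SUM(f.quantity) AS units_sold
    FROM   sales_fact f
           JOIN product_dim   p ON p.product_id   = f.product_id
           JOIN channel_dim   c ON c.channel_id   = f.channel_id
           JOIN geography_dim g ON g.geography_id = f.geography_id
           JOIN time_dim      t ON t.time_id      = f.time_id
    WHERE  p.product_type = 'Computers'
    AND    c.store_chain  = 'Example Chain'
    AND    g.state        = 'Wisconsin'
    AND    t.day_date     >= ADD_MONTHS(SYSDATE, -6);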
The schema in Figure 10-2 shows a relatively large sales transactions table (called a fact table) surrounded by smaller tables (called dimensions or lookup tables). The query just described is often called multidimensional, since several dimensions are included (and time is almost always one of them). Because these queries are typical in a data warehouse, the recognition of the star schema by Oracle’s cost-based optimizer can deliver enormous performance benefits.
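One hedged sketch of how that recognition is typically enabled: Oracle’s star transformation relies on bitmap indexes on the fact table’s foreign key columns and is controlled by the STAR_TRANSFORMATION_ENABLED initialization parameter, while the /*+ STAR_TRANSFORMATION */ hint asks the optimizer to consider the rewrite for a particular query. The object names below continue the hypothetical schema used earlier.

    -- Bitmap indexes on the fact table's dimension keys support the star
    -- transformation for queries like the one above.
    CREATE BITMAP INDEX sales_fact_prod_bix ON sales_fact (product_id);
    CREATE BITMAP INDEX sales_fact_geog_bix ON sales_fact (geography_id);
    CREATE BITMAP INDEX sales_fact_chan_bix ON sales_fact (channel_id);
    CREATE BITMAP INDEX sales_fact_time_bix ON sales_fact (time_id);

    -- Enable the feature for the session (it can also be set instance-wide).
    ALTER SESSION SET star_transformation_enabled = TRUE;

    -- Ask the optimizer to consider the star transformation for this query.
    SELECT /*+ STAR_TRANSFORMATION */ SUM(f.quantity)
    FROM   sales_fact f
           JOIN product_dim   p ON p.product_id   = f.product_id
           JOIN geography_dim g ON g.geography_id = f.geography_id
    WHERE  p.product_type = 'Computers'
    AND    g.state        = 'Wisconsin';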
Figure 10-2. Typical star schema: a Sales Transactions fact table surrounded by Product (Category, Type, Brand, Model), Geography (Region, District, State, City), Channel (Manufacturer, Distributor, Store Chain, Store), and Time (Year, Quarter, Month, Week, Day) dimension tables