• Shuffle
    Toggle On
    Toggle Off
  • Alphabetize
    Toggle On
    Toggle Off
  • Front First
    Toggle On
    Toggle Off
  • Both Sides
    Toggle On
    Toggle Off
  • Read
    Toggle On
    Toggle Off
Reading...
Front

Card Range To Study

through

image

Play button

image

Play button

image

Progress

1/44

Click to flip

Use LEFT and RIGHT arrow keys to navigate between flashcards;

Use UP and DOWN arrow keys to flip the card;

H to show hint;

A reads text to speech;

44 Cards in this Set

  • Front
  • Back

What is Enterprise Data Warehousing (EDW)?

Enterprise Data Warehousing (EDW) is a process for collecting, storing, and delivering decision support data for an entire enterprise or business unit. A data warehouse is one of the artifacts created by the data warehousing process.

EDW includes people, processes, and technologies to achieve the goal of providing decision support data that is:

• Consistent across the enterprise


• Integrated


• Standardized


• Easy to access from multiple viewpoints


• Easy to understand

Challenges with data warehousing

• High development cost


• Timeliness of data


• High volume data


• Specialized application


• Small and one time data

Data Mart

A data mart may be thought of as a smaller scale data warehouse. A data mart is a database that is part of a data warehouse system where data stored for presentation and user access.




In practice, the data mart is a database that is organized into facts and dimensions that can cross subjects.

Assessing Data Warehouse BI Maturity

Organization Background


History


Executive Support


Competencies


Usage


Data Sources


Data Integration Tools


Data Warehouse Platform


Data Warehouse Size


BI Tools


Results

Operational Data vs. Data Warehouse Data

• Operational data is data that many people are familiar with, such as the current checking account balances, customer service call statuses, and today's sales amounts. Operational data emphasizes current values. Operational data is:


- Optimized for Transaction Processing


- Frequently Updated


- Designed using Entity Relationship Modeling.


------


• Data warehouse data is focused on historic and trend value –what was the data in the past and how is it changing. Data warehouse / analytic data is:


- Optimized for Analytical Processing


- Not Updated (It is loaded instead)


- Designed using multiple patterns including Entity Relationship Modeling and Multidimensional Modeling.

High Quality Data

Some characteristics of high quality data are:


• Timeliness


• Consistency
• Comparability

Enterprise Architect (EA)

The Enterprise Architect (EA) builds and communicates blueprints that model the enterprise and IT solutions. These models identify the people, processes, and technologies that are needed to enable the enterprise to succeed in its mission. Architecture domains, illustrated in Figures on next slides, include:


- business architecture,


- information architecture,


- software architecture, and


- technology architecture.

Agile Data Warehousing and Business Intelligence


(Data Warehousing Trends and Hot Topics)

agile approaches reduce time to value by delivering results at regular intervals

Big Data


(Data Warehousing Trends and Hot Topics)

multi-terabytes of typically unstructured data is critical to modern analytics and contributes to the success of companies such Google, Facebook and Amazon

Data Warehousing in the Cloud


(Data Warehousing Trends and Hot Topics)

a method to rapidly ramp up and manage data warehousing capacity without investing in fixed infrastructure

Data Warehouse Appliances (Data Warehousing Trends and Hot Topics)

vendor provided solutions that include both hardware and software greatly increase performance through Massive Parallel Processing (MPP)

Columnar Databases(Data Warehousing Trends and Hot Topics)

new database software that stores and retrieves data by columns rather than by rows is dramatically improving data warehouse performance

In Memory Databases(Data Warehousing Trends and Hot Topics)

this software boosts performance by storing data in memory rather than on hard disk

Federated and Virtual Databases (Data Warehousing Trends and Hot Topics)

data distributed in multiple data stores can be made to look like a single data store, avoiding the cost and time of copying data

R (Data Warehousing Trends and Hot Topics)

this open source analytics language with supporting software is gaining wide acceptance in both the business and academic community

Predictive and Prescriptive Analytics(Data Warehousing Trends and Hot Topics)

predicting who will be a profitable customer and prescribing the best offer to make, provides competitive advantage

Data Mining (Data Warehousing Trends and Hot Topics)

analyzing data to find patterns and gain new knowledge using statistical methods is rapidly advancing

Data Visualization(Data Warehousing Trends and Hot Topics)

improved tools are enabling data to be better understood through graphical displays

Mobile Computing (Data Warehousing Trends and Hot Topics)

the growing use of smart phones and tablet computers is making business intelligence available to new audiences and requires new ways of thinking

Data Warehousing Project Management

• Defining Scope and Objectives


• Finding the Right Sponsor


• Producing the Project Roadmap and Plans


• Organizing the Team


• Executing the Plan


• Finishing the Project


• Avoiding Major Data Warehouse Mistakes

Data Warehousing Business Requirements

•Homework (Research of Documents)


•Enterprise Goals and Objectives


•Identify User Groups


•Requirement Interviews


•Requirement Workshops


•Sizing


•Data Exploration.

Data Sources

Data sources (Inputs) are the places where data is obtained. There are typically multiple places to store data. Data sources are the origination points of data. These founts of data include:


Transactional Systems, which contain business activity such as receipts and purchase orders


Process Oriented Systems, which contain workflow information


Specification Systems, which contain plans, settings, and rules


Syndicated Data, which is obtained outside the organization, and is used to enrich internal data, such as customer profiles


Big Data is typically unstructured and consists of text, images, video, and audio. The advent of social media data has increased the importance of Big Data.

Data Integration

Data is obtained from data sources through data integration approaches such as:


Extract, Transform and Load (ETL)–a process where data is obtained from a data source, modified, and placed in a data target through batch means. This is the primary method for obtaining data for most data warehouses. A related method is “Extract, Load and Transform (ELT)” where data is transformed after it is loaded.


Enterprise Application Integration (EAI) – a method of integration between software applications that utilizes application program interfaces (APIs). This approach provides immediate data and useful for near real-time data warehousing.


Enterprise Information Integration (EII) – a method of data integration that provides access to multiple data sources without the need to move the data to a common database such as a data warehouse. This method supports a federated schema, which makes multiple data sources act like a single data source. This method can save the cost of developing new physical data warehouses and copying data.

Data Storage

Multiple databasesor areas of databaseswill be required, with each database having a particular role in the data warehousing architecture.


• Data Warehouse


• Data Mart


• Staging Areas


• MDM Hub (Master Data Management)


• ODS (Operational Data Store)


• PDS (Process Data Store)


• Data Mining Database


• Big Data

Data Warehouse

This database is organized into subjects, such as customer, product, sales transaction and so forth. The EDW contains a full set of data with an enterprise scope. The Atomic Data Warehouse area contains normalized data, meaning one fact or piece of information in one place.

Data Mart

This database is organized by business process, rather than data subject, so there may be receipts, issues, insurance policy, transactions and so forth. It is a sub-set of data needed to analyze particular events or facts, so it is smaller in scope than the data warehouse.

Staging Areas

These are database where data is staged, integrated, and stored, prior to being made available through data warehouses and data marts.

MDM Hub (Master Data Management)

Master data includes the core business and reference entities, such as customers, products, and locations.

ODS (Operational Data Store)

ODS stores the data resulting from operating a business, such as orders, deliveries, receipts, issues, purchase orders, etc. The ODS can also act as a data integration hub and data warehouse.

PDS (Process Data Store)

Data for supporting business process activity to provide near real-time business intelligence. This data is associated with BAM (Business Activity Mornitor). The PDS is a relatively new kind of database that is gaining in importance.

Data Mining Database

This Database contains data that is optimized for use by data mining software. For data mining, it is best to store the data in flat structures, which is different from the data mart.

Big Data

Data that is so voluminous (over 2TB) that it cannot be managed using traditional databases such as relational databases. This data is typically unstructured, and consists of text, images, video, and audio. This data is spread across many servers.

Types of BI/Analytics tools

• Query Tool


• Reporting Tools


• Data Mining Tools


• Statistics Tools


• Data Profiling Tools

Query Tool

Query Tools are business intelligence software tools that select data through queries and present it as reports and/or graphical displays. The business or the analyst is able to explore the data and produce reports and outputs that are desired and needed to understand that data.

Reporting Tools

Reporting Tools are software tools that produce repeatable reports and graphic outputs. Reports are output in predefined formats, often on a schedule unlike query tools, which are more ad hoc in nature.

Data Mining Tools

Data Mining Tools are software tools that find patterns that are useful for predictive analytics and optimization analytics in stores of data or databases.

Statistics Tools

Statistics Tools are software tools that enable the analysis of data. There is a substantial overlap between statistics tools and data mining tools. Both can be used to describe and understand data as well as to draw conclusions about data and to make predictions. Statistics Tools work with data samples while data mining tools work with larger volumes of data (Big Data).

Data Profiling Tools

Data Profiling Tools are software tools that support understanding data through exploration and comparison. These tools are used to access existing data and explore it. The tool looks for patterns such as typical values, outlying values, ranges, and allowed values, so that you can better understand the content and quality of the data. It is very important to not just look at the definition of the data, but to actually look at the content of the data to see what is there. Application developers are notorious for using data columns for unintended purposes, so knowing what should be in a column and knowing what's actually populated in that column is very important for accuracy in reporting

Data Governance (DG)

The overall management of data and information, including the people, processes, and technology that improve the value obtained from data and information by treating data as an asset. It is the cornerstone of the data architecture.

Metadata

Metadatais information that describes the whole process and the data. The data included in system monitoring, a function that supports all areas, tracks what is happening as information moves from one place to another

Metadata Repository

A Metadata Repository is a software tool that has both a user interface and a database component to record, maintain, and look up data that describes other data.

Data warehousing Infrastructure

Data Warehouse Infractructure architecture is the combination of hardware and software that enables the operation of the data warehouse. Infrastructure elements include servers, storage devices, communication channels and operating systems.




•Communication Channels


•Storage Devices


•Database Servers


•Database Appliances


•ETL Servers


•Query and Reporting Servers


•Analytics Servers


•Applications


•Workstations

Data Warehouse Technical Stack *

The selection of tools used to analyze, design, build, operate, and maintain the data warehouse. Tool categories in the technical stack include metadata repository, data modeling tool, data movement tools, databases, and reporting/BI tools