In recent years, enterprise e-business has no longer been limited to single-system concerns, such as whether a process runs smoothly or whether transaction records are stored completely. Attention has shifted toward integrating heterogeneous information systems and collecting and presenting data effectively, and this has an increasingly direct impact on an enterprise's operating efficiency. The term "data warehouse" borrows its name from the idea of warehouse storage.
What Is A Data Warehouse?
A data warehouse is typically used for data mining and business intelligence. It can cover an entire enterprise or focus on a single subject. Just as a physical warehouse stores raw materials and finished products, a data warehouse gathers the abstract file data scattered across information systems and consolidates it into one physical store.
The Difference Between Database, Data Warehouse and Data Warehouse System
A data warehouse is a database that stores large amounts of data, but it is not the same as an ordinary operational database. An operational database holds the data produced by day-to-day operations; a data warehouse takes that data after it has accumulated for a period of time and reorganizes it in a separate system for analysis. "Data warehouse" usually refers to the database that holds the integrated data, while "data warehouse system" refers to the entire decision-support system, including software, hardware, data, and reports.
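To make the operational/analytical split concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Both in-memory databases, the `orders` and `monthly_sales` tables, and the `load_month` helper are hypothetical: the operational side records individual orders, and a periodic batch job summarizes an accumulated month into a separate warehouse database. Real deployments would use dedicated ETL tooling and separate servers.

```python
import sqlite3

# Hypothetical operational (OLTP) database: one row per order as business happens.
ops = sqlite3.connect(":memory:")
ops.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
)
ops.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "alice", 120.0, "2024-01-05"),
     (2, "bob", 80.0, "2024-01-20"),
     (3, "alice", 45.5, "2024-02-02")],
)

# Separate analytical database: the hypothetical warehouse keeps monthly totals per customer.
dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE monthly_sales (month TEXT, customer TEXT, total REAL, PRIMARY KEY (month, customer))"
)

def load_month(month: str) -> int:
    """Batch-load one accumulated month from the operational DB into the warehouse."""
    rows = ops.execute(
        "SELECT ?, customer, SUM(amount) FROM orders "
        "WHERE substr(order_date, 1, 7) = ? GROUP BY customer",
        (month, month),
    ).fetchall()
    dw.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?)", rows)
    dw.commit()
    return len(rows)

load_month("2024-01")  # run after January has closed
load_month("2024-02")  # run after February has closed
print(dw.execute("SELECT * FROM monthly_sales ORDER BY month, customer").fetchall())
# [('2024-01', 'alice', 120.0), ('2024-01', 'bob', 80.0), ('2024-02', 'alice', 45.5)]
```

Analysts query `dw` without touching the operational tables, which is the separation the paragraph describes.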
The term "data warehouse" was coined by Bill Inmon in 1990, and he is therefore known as the father of the data warehouse. In "What Is a Data Warehouse?", he describes the data collection in a data warehouse as having four characteristics: subject-oriented, integrated, time-variant, and non-volatile. Because of these characteristics, a data warehouse can supply data to decision-support systems for processing. Ralph Kimball, another leading figure in the field, describes a data warehouse in "The Data Warehouse Toolkit" as a copy of transaction data structured specifically for query and analysis.
"Subject-oriented" means that the data warehouse concentrates information about a specific subject, such as customers or products, rather than only the company's current operational information. "Integrated" means that data from different sources is merged and kept in a consistent organization. "Time-variant" means that the stored data is identified with a specific point in time. "Non-volatile" means that data in the warehouse keeps accumulating and is not removed, which lets management observe the continuity of the business over time.
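The time-variant and non-volatile characteristics can be illustrated with a small append-only store. This is a sketch under assumed names (`Snapshot`, `AppendOnlyStore`, and `as_of` are hypothetical): a change is recorded as a new dated row rather than an overwrite, so the value in effect on any past date can still be retrieved.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class Snapshot:
    snapshot_date: date  # time-variant: each record is tied to a point in time
    customer: str
    credit_limit: int

class AppendOnlyStore:
    """Hypothetical non-volatile store: rows can be appended, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: List[Snapshot] = []

    def append(self, row: Snapshot) -> None:
        self._rows.append(row)

    def as_of(self, customer: str, when: date) -> Optional[int]:
        """Return the credit limit that was in effect for `customer` on `when`."""
        valid = [r for r in self._rows if r.customer == customer and r.snapshot_date <= when]
        if not valid:
            return None
        return max(valid, key=lambda r: r.snapshot_date).credit_limit

store = AppendOnlyStore()
store.append(Snapshot(date(2024, 1, 1), "alice", 1000))
store.append(Snapshot(date(2024, 6, 1), "alice", 2500))  # a change is a new row, not an overwrite
print(store.as_of("alice", date(2024, 3, 15)))  # 1000
print(store.as_of("alice", date(2024, 7, 1)))   # 2500
```

Because nothing is overwritten, both the old and the new credit limit remain available for trend analysis.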
Types of Data Warehouse
Data warehouses can be divided into the enterprise data warehouse (EDW), the operational data store (ODS), and the data mart. Some authors also add the virtual data warehouse and the hybrid data warehouse to this list.
- Enterprise Data Warehouse
An enterprise data warehouse contains information from across the entire enterprise and is organized into several subjects, such as customers, products, and business operations. It supports decision-making with both real-time and aggregated information.
- Operational Data Store
The word "operational" contrasts with the informational (analytical) nature of a data warehouse. An ODS provides detailed data, especially recently consolidated data, and can meet the needs of real-time reporting. However, it can only analyze very recent data, not long-term history. Bill Inmon published "The Operational Data Store" in 1995, in which he described ODS data as subject-oriented and integrated. Unlike a data warehouse, however, an ODS is volatile: its data is overwritten, and it mainly holds current values rather than historical or cumulative data. ODS data can be collected and integrated in real time, and ODS implementations are further graded by how frequently they synchronize with source systems, which determines their data transfer and storage schedule.
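The contrast between an ODS and a data warehouse can be sketched with two in-memory structures. The names `ods`, `warehouse`, and `apply_update` are hypothetical: the ODS keeps only the current value per key, while the warehouse appends every change with its timestamp.

```python
from typing import Dict, List, Tuple

# Hypothetical ODS: keyed by account; each update overwrites the previous value.
ods: Dict[str, float] = {}
# Hypothetical warehouse table: every update is appended as (account, timestamp, balance).
warehouse: List[Tuple[str, str, float]] = []

def apply_update(account: str, ts: str, balance: float) -> None:
    ods[account] = balance                    # volatile: the old value is lost
    warehouse.append((account, ts, balance))  # non-volatile: history is kept

apply_update("A-1", "2024-05-01T09:00", 500.0)
apply_update("A-1", "2024-05-01T15:30", 320.0)

print(ods)             # {'A-1': 320.0}
print(len(warehouse))  # 2
```

The ODS answers "what is the balance now?" cheaply, but only the warehouse can answer "what was it this morning?".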
- Data Mart
A data mart is defined much like a data warehouse, but whereas a data warehouse covers the data and users of the entire company, a data mart contains only a specific range of data and serves the users of a particular work group. Several data marts can be combined into an enterprise data warehouse, and conversely an enterprise data warehouse can be divided into data marts. If a company runs several data marts side by side and the same dimension is defined differently in each, the marts become data islands. Data islands are a serious problem for the enterprise as a whole: integration stays confined to individual departments and cannot extend to the organization's information as a whole. Data from different functions cannot be linked, so cross-departmental analysis is impossible, and the existing marts can only keep piling up side by side instead of being integrated.
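The data-island problem, and one common remedy, can be shown with two toy marts. The mart contents and region codes below are hypothetical. The fix used here is a shared ("conformed") dimension, a technique from Kimball-style dimensional modeling, which maps each mart's local codes to one enterprise-wide key so that the marts can be joined.

```python
# Hypothetical sales mart: revenue keyed by the sales team's region codes.
sales_mart = {"N": 1200, "S": 800}
# Hypothetical marketing mart: campaign spend keyed by full region names.
marketing_mart = {"North": 300, "South": 100}

# Data island: "region" is defined differently in each mart, so a naive join matches nothing.
naive = {k: (sales_mart[k], marketing_mart[k]) for k in sales_mart.keys() & marketing_mart.keys()}
print(naive)  # {}

# Conformed region dimension: maps each mart's local code to one enterprise-wide key.
region_dim = {
    "sales": {"N": "R1", "S": "R2"},
    "marketing": {"North": "R1", "South": "R2"},
}

def conform(mart: dict, source: str) -> dict:
    """Re-key a mart's figures using the shared region dimension."""
    return {region_dim[source][code]: value for code, value in mart.items()}

sales = conform(sales_mart, "sales")
marketing = conform(marketing_mart, "marketing")
# Cross-departmental question: revenue per unit of campaign spend, by region.
revenue_per_spend = {r: sales[r] / marketing[r] for r in sorted(sales.keys() & marketing.keys())}
print(revenue_per_spend)  # {'R1': 4.0, 'R2': 8.0}
```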
Today, most data warehouse projects still start with data marts, because the dimensional model that data marts use is easier to understand than the entity-relationship model and supports faster analysis. The right starting point, however, still depends on the needs of the enterprise and its users.
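A dimensional model is commonly laid out as a star schema: a central fact table of measurements surrounded by descriptive dimension tables. The following `sqlite3` sketch uses hypothetical tables (`fact_sales`, `dim_product`, `dim_date`) and made-up data to show why analysts find it easy to query: each question is a join from the fact table to a few dimensions, followed by a grouping.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables: descriptive attributes used for filtering and grouping.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Fact table: one row per measured sale, pointing at the dimensions.
CREATE TABLE fact_sales  (product_id INTEGER REFERENCES dim_product,
                          date_id INTEGER REFERENCES dim_date,
                          quantity INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'Tea', 'Beverage'), (2, 'Coffee', 'Beverage'), (3, 'Cookie', 'Snack');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales VALUES (1, 20240101, 10, 30.0), (2, 20240101, 5, 25.0),
                              (3, 20240201, 8, 16.0), (2, 20240201, 4, 20.0);
""")

# "Sales by category and month": one join per dimension, then group.
rows = con.execute("""
SELECT p.category, d.month, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d ON d.date_id = f.date_id
GROUP BY p.category, d.month
ORDER BY p.category, d.month
""").fetchall()
print(rows)  # [('Beverage', 1, 55.0), ('Beverage', 2, 20.0), ('Snack', 2, 16.0)]
```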
- Virtual Data Warehouse
The enterprise queries its existing operational databases directly, relying on intermediary tools for data processing instead of building a separate store. This approach is faster to build, has a high chance of success, and supports real-time data analysis.
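One simple way to picture a virtual data warehouse is an analytical view defined directly over operational tables. In this sketch the tables, data, and the `v_sales_by_city` view are hypothetical, and a database view stands in for the intermediary tools mentioned above; no data is copied, so the view always reflects the live operational rows.

```python
import sqlite3

# Hypothetical operational database with normalized OLTP tables.
ops = sqlite3.connect(":memory:")
ops.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Alice', 'Taipei'), (2, 'Bob', 'Tainan');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 70.0), (12, 2, 30.0);

-- "Virtual warehouse": an analytical view over the live operational tables.
-- No data is copied; every query reads the current operational rows.
CREATE VIEW v_sales_by_city AS
SELECT c.city, COUNT(o.id) AS n_orders, SUM(o.amount) AS revenue
FROM orders o JOIN customers c ON c.id = o.customer_id
GROUP BY c.city;
""")

query = "SELECT * FROM v_sales_by_city ORDER BY city"
print(ops.execute(query).fetchall())  # [('Tainan', 1, 30.0), ('Taipei', 2, 120.0)]

# A new operational order is visible through the view immediately.
ops.execute("INSERT INTO orders VALUES (13, 2, 25.0)")
print(ops.execute(query).fetchall())  # [('Tainan', 2, 55.0), ('Taipei', 2, 120.0)]
```

The trade-off is that every analytical query runs against the operational database itself.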
- Hybrid Data Warehouse
If data marts are implemented virtually, in the manner of a virtual data warehouse, the result is a hybrid data warehouse. It requires less storage space than an enterprise data warehouse. Because the data already sits in a standardized data environment, reorganizing it is simpler than reading operational data through application programs, and it does not affect the operational data. A hybrid data warehouse can also deal with the data island problem seen with data marts, and its virtual approach lets it respond flexibly to different needs.
Benefits of Data Warehouse
A data warehouse integrates data across sources, so that data held in different databases can be linked. Building an information system does meet the need for routine output and immediate storage of data. But once an enterprise wants to pull integrated statistics of all kinds out of its information systems, it immediately runs into data spread across different sources: the systems cannot be accessed together, so further automated processing and analysis is impossible. A data warehouse acts as a single window for extracting data, and because conversion between systems is automated, it reduces the errors that come from exchanging files by hand.
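The automated conversion described above can be sketched as a small integration step. Both source layouts (`crm`, `billing`), their key formats, and the `integrate` and `crm_key` functions are hypothetical: each source is converted to a common customer key and a common unit (cents to currency units) before the two are joined, replacing what would otherwise be a manual file exchange.

```python
# Hypothetical source 1: CRM export with keys like "C-001" and inconsistent name casing.
crm = [{"cust": "C-001", "name": "alice chen"}, {"cust": "C-002", "name": "BOB LIN"}]
# Hypothetical source 2: billing system with numeric keys and amounts in cents.
billing = [{"customer_no": 1, "amount_cents": 12050},
           {"customer_no": 2, "amount_cents": 3000},
           {"customer_no": 1, "amount_cents": 950}]

def crm_key(cust: str) -> int:
    """Convert a CRM key such as "C-001" to the numeric key used by billing."""
    return int(cust.split("-")[1])

def integrate(crm_rows, billing_rows):
    """Bring both sources to one key and one unit, then join them per customer."""
    totals = {}
    for b in billing_rows:
        totals[b["customer_no"]] = totals.get(b["customer_no"], 0) + b["amount_cents"]
    return [
        {"customer_id": crm_key(c["cust"]),
         "name": c["name"].title(),                             # normalize casing
         "total_spent": totals.get(crm_key(c["cust"]), 0) / 100}  # cents -> units
        for c in crm_rows
    ]

for row in integrate(crm, billing):
    print(row)
# {'customer_id': 1, 'name': 'Alice Chen', 'total_spent': 130.0}
# {'customer_id': 2, 'name': 'Bob Lin', 'total_spent': 30.0}
```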
Summary
Early data warehouses were used only to review aggregated data. Later, individual transaction records began to be kept in the warehouse to analyze the relationship between customer groups and products. Today, in addition to aggregated and transaction data, data warehouses also retain detailed data for analyzing customers' shopping behavior.
This history shows that companies once wanted to know only their total turnover, but now care more about how customers make choices during a transaction.
Data warehouses are often mentioned alongside data mining and business intelligence. In marketing, a data warehouse helps a company understand customer habits and predict customer behavior so that it can run appropriate promotions. Internally, it can support the evaluation of operations, allowing senior executives to pinpoint the causes of poor performance from concrete data and evidence.