In the context of computing, a data warehouse is a collection of data aimed at a specific area (company, organization, etc.), integrated, non – volatile and variable over time, which helps decision making in the entity in which it is used. It is used for reporting and data analysis 1 and is considered a fundamental component of business intelligence . 2 It is, above all, a complete file of an organization, beyond transactional and operational information, stored in a database designed to favor the efficient analysis and dissemination of data (especially OLAP , online analytical processing ). The storage of the data should not be used with current usage data. Data warehouses often contain large amounts of information that are sometimes subdivided into smaller logical units depending on the subsystem of the entity they come from or for which they are needed.
Function of a data warehouse
In a data warehouse what is wanted is to contain data that are necessary or useful for an organization, that is, that is used as a repository of data to later transform them into useful information for the user. A data warehouse must deliver the correct information to the right people at the right time and in the right format. The data warehouse responds to the needs of expert users, using Decision Support Systems ( DSS ), Executive Information Systems ( EIS ) or tools to make queries or reports. End users can easily make inquiries about their data warehouses without touching or affecting the operation of the system.
In the operation of a data warehouse the following ideas are very important:
- Integration of data from databases distributed by the different units of the organization and which will often have different structures (heterogeneous sources). A comprehensive description and a comprehensive analysis of the entire organization in the data warehouse should be provided.
- Separation of the data used in daily operations of the data used in the data warehouse for the purposes of disclosure, help in decision making, for analysis and for control operations. Both types of data must not coincide in the same database, since they obey very different objectives and could interfere with each other.
Periodically, data is imported into the data warehouse of the different resource planning systems of the entity ( ERP ) and other software systems related to the business for further processing. It is common practice to normalize the data before combining it in the data warehouse using extraction, transformation and loading ( ETL ) tools . These tools read the primary data (often OLTP databases of a business), perform the transformation process to the data warehouse (filtering, adaptation, format changes, etc.) and write to the warehouse.
Advantages and disadvantages of data warehouses
If so many companies have begun to take advantage of the data warehouses it is clear that it can not be an error or a coincidence. One of the objectives that is adopted through the introduction of this resource in the company is to allow businesses to have better access to the data that may be necessary. The information is extensive and exhaustive, allowing these data to be used in different processes adopted in the company with ease and greater simplicity than would be enjoyed without the warehouses.
Another advantage is related to the way in which the company’s applications work better thanks to the use of data warehouses. The main reason for this is that these stores have the opportunity to carry out combined work processes, so that the systems are simplified in this aspect. One of the resources most appreciated by companies is that the relationships they have with customers can be managed in a more immediate way through this technology.
Lastly, we talk about how warehouses can become the tool that helps us make decisions with more immediacy even at times when there seems to be no other option. This technology enables other very convenient processes. For example, it makes available to companies exception reports, in which a clear distinction is made that takes into account the list of forecasts and the final results that have been obtained. These data help to see what has been a tendency to learn from it and act in a different way in the future. You can also manage the reports that tell us about trends, helping us to see in an analysis exercise what services or products are best working in order to increase support in this type of element.
Advantages as a list
There are many advantages for which it is advisable to use a data warehouse. Some of them are:
- Data warehouses make access to a wide variety of data easier for end users
- Facilitate the operation of decision support systems applications such as trend reports , for example: obtain the items with the most sales in a particular area within the last two years; exception reports, reports that show the actual results against the a priori objectives.
- Data warehouses can work together and, therefore, increase the operational value of business applications, especially customer relationship management.
- Provides key information for business decision making.
- Improves the quality of decisions made.
- Especially useful for the medium and long term.
- They are relatively simple systems to install if the data sources and objectives are clear.
- Very useful for storage of analyzes and queries of historical.
- It provides a great power of information processing.
- It allows greater flexibility and speed in the access to information.
- Facilitates decision making in business.
- Companies get an increase in productivity.
- Provides reliable communication between all departments of the company.
- Improve relationships with suppliers and customers.
- It allows to know what is happening in the business, that is, to be always aware of good and bad results.
- Transforms data into information and information into knowledge
- It allows you to plan more effectively.
- Reduce response times and operating costs.
Any technology, however advanced and efficient, must have a contrast, something that makes us think that it is not a definitive or mandatory method. This helps create balance and show companies that there are exceptions and cases where this technology may not be the answer. With the data warehouses we can find varied problems that do not have to occur in all cases. One of the problems is that it is not an elastic technology and that the costs of use can grow too much. On the other hand, obsolescence is another risk, since it can come too soon. And there are cases in which the effectiveness of data warehouses does not occur as we would like. There are times when the response to a query provides little and little information, which is not very useful for a complete report.
It is also important that we assess that there are times when confusion can occur in the use of these stores, since there may be aspects that are not very specific. This leads to IT teams having to take care to personalize the experience, delimit the role of the data warehouse in a specific way and squeeze their resources so that tools are not being wasted. It requires involvement on the part of the technical teams and is something that we will have to take into account, demonstrating that the data warehouses have to be implemented effectively first and foremost and with a prior analysis of the requirements.
Disadvantages as a list
Using data warehouses also poses some drawbacks, some of them are:
- Throughout his life the data warehouses can suppose high costs. The data warehouse is not usually static. Maintenance costs are high.
- Data warehouses may become obsolete relatively soon.
- Sometimes, before a request for information, they return suboptimal information, which also represents a loss for the organization.
- There is often a thin line between data warehouses and operational systems. It is necessary to determine which functionalities of these can be exploited and which ones should be implemented in the data warehouse, it would be costly to implement unnecessary operations or to stop implementing any that will be needed.
- It is not very useful for making decisions in real time due to the long processing time it may require. In any case, the tendency of current products (together with hardware advances) is to solve this problem by converting the disadvantage into an advantage.
- It requires continuous cleaning, transformation and data integration.
- In an implementation process, difficulties may be encountered in relation to the different objectives that an organization intends.
- Once implemented, it can be difficult to add new data sources.
- They require a review of the data model, objects, transactions and in addition to storage.
- They have a complex and multidisciplinary design.
- They require a restructuring of the operational systems.