What is data mining?
Data mining (the analysis step of the “Knowledge Discovery in Databases” process, or KDD) is a field of statistics and computer science concerned with discovering patterns in large data sets. It uses methods from artificial intelligence, machine learning, statistics and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for later use. Beyond the raw analysis step, it also involves database and data management, data pre-processing, model and inference considerations, interestingness metrics, computational complexity considerations, post-processing of discovered structures, visualization and online updating.
The term is a buzzword and is frequently misused to refer to any form of large-scale data or information processing (collection, extraction, storage, analysis and statistics). It has also been generalized to any kind of computer decision support system, including artificial intelligence, machine learning and business intelligence. The key word in the term is discovery, commonly defined as “the detection of something new”. Even the popular book “Data Mining: Practical Machine Learning Tools and Techniques with Java” (which covers mostly machine learning material) was originally going to be called simply “Practical Machine Learning”, and the term “data mining” was added for marketing reasons. Often the more general terms “(large-scale) data analysis” or “analytics” are more appropriate, or, when referring to the actual methods, artificial intelligence and machine learning.
The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns, such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). This usually involves database techniques such as spatial indexes. These patterns can then be seen as a kind of summary of the input data, and can be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify several groups in the data, which a decision support system can then use to obtain more accurate predictions. Data collection, data preparation, and the interpretation and reporting of results are not part of the data mining step, but they belong to the overall KDD process as additional steps.
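The three kinds of pattern mentioned above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production implementation: the toy data, the function names (`two_means_1d`, `zscore_outliers`, `pair_rules`) and the thresholds are all assumptions made up for this example, and real projects would normally rely on a dedicated library.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy data, invented purely for illustration.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
readings = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0]
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter"},
]


def two_means_1d(values, iterations=10):
    """Cluster analysis: split 1-D values into two groups (basic k-means, k=2)."""
    centers = [min(values), max(values)]  # deterministic starting centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group ended up empty.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return groups


def zscore_outliers(values, threshold=2.0):
    """Anomaly detection: flag values more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def pair_rules(baskets, min_support=0.5, min_confidence=0.8):
    """Association rules: find rules x -> y between pairs of items."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # share of x-baskets that also hold y
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    print("clusters:", two_means_1d(points))
    print("outliers:", zscore_outliers(readings))
    print("rules:", pair_rules(transactions))
```

Each function returns a small summary of its input (groups, flagged records, or rules), which is the sense in which discovered patterns act as a summary that later analysis can build on.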
The related terms data dredging, data fishing and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used to create new hypotheses that are then tested against larger data populations.
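Why a small sample is risky can be shown with a short simulation. The sketch below is only an illustration under stated assumptions: the data is pure random noise, and the function names (`pearson`, `dredge`) and all sizes are hypothetical choices made for this example. It searches many unrelated features for the one that correlates best with a target in a tiny sample, then checks that same feature against the full data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def dredge(n_rows=1000, n_features=100, sample_size=8, seed=0):
    """Pick the noise feature that best 'predicts' the target in a tiny sample.

    Returns (feature index, correlation in the small sample,
    correlation of the same feature over all rows).
    """
    rng = random.Random(seed)
    # Target and features are independent noise: no real pattern exists.
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    features = [
        [rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)
    ]
    small = slice(0, sample_size)
    scores = [pearson(f[small], target[small]) for f in features]
    best = max(range(n_features), key=lambda i: abs(scores[i]))
    return best, scores[best], pearson(features[best], target)


if __name__ == "__main__":
    best, r_small, r_full = dredge()
    print(f"feature {best}: r on small sample = {r_small:.2f}, "
          f"r on full data = {r_full:.2f}")
```

Because the best of many candidates is chosen after looking at only a few rows, the small-sample correlation tends to look strong even though nothing real is there. Testing the resulting hypothesis against the larger population is what exposes it.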
Advantages of Data Mining
Data mining offers advantages such as:
- It helps prevent future adverse situations by showing accurate data.
- Contributes to strategic decision making by discovering key information.
- It improves the comprehension of information and knowledge, making it easier for users to read.
- Data mining uncovers information that was not expected. Because many different models are used, unexpected results tend to appear, and combining different techniques can produce unexpected effects that add value for the company.
- Huge databases can be analyzed using data mining technology.
- The results are easy to understand: people without prior knowledge of computer engineering can interpret the results using their own ideas.
- It contributes to tactical and strategic decision making by detecting key information.
- It allows you to find, attract and retain customers. It reduces the risk of losing customers, for example by offering specific promotions or special products to retain them.
- It improves the relationship with customers: the company can improve customer service based on the information obtained.
- It allows you to offer your customers the products or services they need.
- The models are reliable. They are tested and validated using statistical techniques before being used, so that the predictions obtained are reliable and valid.
- For the most part, models are generated and built quickly. Modeling is sometimes easier because many algorithms have already been tested.
- It opens new business opportunities and saves the company costs.
Disadvantages of Data Mining
Despite all these advantages, data mining also has some disadvantages, such as:
- The heavy workload may require investment in high-performance teams and staff training.
- The difficulty of collecting the data. Depending on the type of data you want to collect, this can be a lot of work.
- Although this is less and less the case, the need for a large investment can also be a drawback. Sometimes setting up the technologies needed for data collection is not easy and consumes many resources, which can mean a high cost.
- Depending on the size of the database, it may take some time to preprocess all that information.
- The lack of an appropriate security system would put users' private information at risk.
- It is not a perfect process: if the information is inaccurate, it will affect the outcome of the decision-making process.