ical processing (OLAP), decision support systems, data scrubbing/staging (transformation), and association rule algorithms (Dunham, 2003, pp. 13, 35-39; Han & Kamber, 2001, p. 3).
In the 1990s, data mining changed from an interesting new technology into a part of standard business practice. This occurred because the cost of computer disk storage went down, processing power went up, and the benefits of data mining became more apparent. Businesses began using data mining to help manage all phases of the customer life cycle, including acquiring new customers, increasing revenue from existing customers, and retaining good customers (Two Crows, 1999, p. 5).
Data mining is used by a wide variety of industries and sectors, including retail, medical, telecommunications, scientific, financial, pharmaceutical, marketing, Internet-based companies, and the government (Fayyad, et al., 1996). In a May 2004 report on Federal data mining activities, the US General Accounting Office (GAO, 2004) reported that 199 data mining operations were underway or planned in various federal agencies (p. 3), and this list doesn't include secret data mining activities such as MATRIX and the NSA's eavesdropping (Schneier, 2006).
Data mining is an area of much research and development activity. Many factors drive this activity, including online companies that wish to learn more about their customers and potential customers, governmental agents tasked with locating terrorists and optimizing services, and users' need for filtered information.
3. Theoretical Principles
The underlying principle of data mining is that there are hidden but useful patterns inside data, and that these patterns can be used to infer rules that allow for the prediction of future results (GAO, 2004, p. 4).
Data mining as a discipline has developed in response to the human need to make sense of the sea of data that engulfs us. Per Dunham (2003), data doubles each year, and yet the amount of useful information available to us is decreasing (p. xi). The goal of data mining is to identify and make use of the "golden nuggets" (Han & Kamber, 2001, p. 4) floating in the sea of data.
Prior to 1960 and the dawn of the computer age, a data analyst was an individual with expert knowledge (a domain expert) and training in statistics. The analyst's job was to sift through the raw data to find patterns, make extrapolations, and locate interesting information, which was then conveyed via written reports, graphs, and charts. Today, the task is too complicated for a single expert (Fayyad, et al., 1996, p. 37). Information is distributed across multiple platforms and stored in a wide variety of formats, some structured and some unstructured. Data repositories are often incomplete. Sometimes the data is continuous and other times discrete. But the amount of data to be analyzed is always enormous.
Data mining involves searching large databases, but it distinguishes itself from database querying in that it seeks implicit patterns in the data rather than simply extracting selections from the database. Per Benoît (2002), the database query answers the question "what company purchased over $100,000 worth of widgets last year?" (p. 270), whereas data mining answers the question "what company is likely to purchase over $100,000 worth of widgets next year and why?" (p. 270).
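The contrast Benoît draws between querying and mining can be made concrete with a minimal sketch. The company names, spending figures, and function names below are all hypothetical, and the straight-line trend is only a stand-in for the far richer models that real data mining systems use.

```python
from statistics import mean

# Hypothetical purchase history: company -> yearly widget spend in USD,
# oldest year first. These figures are invented for illustration only.
purchases = {
    "Acme": [60_000, 80_000, 95_000],
    "Bolt": [120_000, 110_000, 105_000],
    "Cogs": [30_000, 35_000, 40_000],
}

THRESHOLD = 100_000


def query_last_year(history, threshold):
    """Database-style query: which companies exceeded the threshold last year?"""
    return sorted(name for name, spend in history.items() if spend[-1] > threshold)


def fit_trend(spend):
    """Least-squares straight line through (year index, spend) points."""
    years = list(range(len(spend)))
    x_bar, y_bar = mean(years), mean(spend)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, spend)) / sum(
        (x - x_bar) ** 2 for x in years
    )
    return slope, y_bar - slope * x_bar


def predict_next_year(history, threshold):
    """Mining-style question: which companies are likely to exceed the
    threshold next year? The fitted slope is a crude answer to 'why'."""
    likely = {}
    for name, spend in history.items():
        slope, intercept = fit_trend(spend)
        forecast = slope * len(spend) + intercept  # extrapolate one year ahead
        if forecast > threshold:
            likely[name] = {"forecast": forecast, "yearly_growth": slope}
    return likely


if __name__ == "__main__":
    print("Exceeded last year:", query_last_year(purchases, THRESHOLD))
    print("Likely next year:", predict_next_year(purchases, THRESHOLD))
```

On this toy data the query returns the company that already crossed the threshold, while the trend-based prediction singles out a different company whose spending is rising, which is the kind of implicit pattern a plain selection cannot surface.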
All forms of data mining (KDD included) operate on the principle that we can learn something new from data by applying algorithms to it, either to find patterns and create models that we then use to make predictions, or to find new data relationships (Benoît, 2002; Fayyad, et al., 1996; Hearst, 2003).
Another important principle of data mining is that patterns must be presented in an understandable way. Recall that the final step in the KDD process is presentation and interpretation. Once patterns have been identified, they must be conveyed to the end user in a way that allows the user to act on them and to provide feedback to the system. Pie charts, decision trees, data cubes, crosstabs, and concept hierarchies are commonly used presentation tools that effectively convey discovered patterns to a wide variety of users (Han & Kamber, 2001, pp. 157-158).
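As an illustration of one of the presentation tools named above, the following sketch builds a small crosstab from raw records. The records, segment labels, and function names are hypothetical; a production system would typically rely on a reporting or analysis library rather than hand-rolled formatting.

```python
from collections import Counter

# Hypothetical records: (customer segment, outcome). Invented for illustration.
records = [
    ("new", "retained"), ("new", "churned"), ("new", "churned"),
    ("loyal", "retained"), ("loyal", "retained"), ("loyal", "churned"),
]


def crosstab(rows):
    """Count how often each (row label, column label) pair occurs."""
    counts = Counter(rows)
    row_keys = sorted({r for r, _ in rows})
    col_keys = sorted({c for _, c in rows})
    # Counter returns 0 for pairs that never occur, so every cell is filled.
    table = {r: {c: counts[(r, c)] for c in col_keys} for r in row_keys}
    return table, col_keys


def render(table, col_keys):
    """Format the crosstab as aligned plain text for an end user."""
    lines = [f"{'':<8}" + "".join(f"{c:>10}" for c in col_keys)]
    for row_label, cells in table.items():
        lines.append(f"{row_label:<8}" + "".join(f"{cells[c]:>10}" for c in col_keys))
    return "\n".join(lines)


if __name__ == "__main__":
    table, columns = crosstab(records)
    print(render(table, columns))
```

Summarizing the records this way lets a non-specialist compare segments at a glance, which is the purpose the presentation step serves.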
4. Technological Elements of Data Mining
Because of the inconsistent use of terminology, data mining can both be called a step in the knowledge discove...