Data Mining
Students KA-81: Yana, Yaroslav
2011
Table of Contents
Abstract
Introduction
1. What is Data Mining
2. Developmental History of Data Mining and Knowledge Discovery
3. Theoretical Principles
4. Technological Elements of Data Mining
5. Steps in Knowledge Discovery
5.1 Step 1: Task Discovery
5.2 Step 2: Data Discovery
5.3 Step 3: Data Transformation
5.4 Step 4: Data Reduction
5.5 Step 5: Discovering Patterns (aka Data Mining)
5.6 Step 6: Result Interpretation and Visualization
5.7 Step 7: Putting the Knowledge to Use
6. Data Mining Methods
6.1 Classification
6.2 Regression
6.3 Clustering
6.4 Summarization
6.5 Change and Deviation Detection
7. Related Disciplines: Information Retrieval and Text Mining
7.1 Information Retrieval (IR)
7.2 IR Contributions to Data Mining
7.3 Data Mining Contributions to IR
8. Text Mining
Abstract
Data mining, or knowledge discovery, refers to the process of finding interesting information in large repositories of data. The term data mining also refers to the step in the knowledge discovery process in which special algorithms are employed in hopes of identifying interesting patterns in the data. These patterns are then analyzed to yield knowledge. The desired outcome of data mining activities is to discover knowledge that is not explicit in the data and to put that knowledge to use.

Those involved in digital libraries are already benefiting from data mining techniques as they explore ways to classify information automatically and new approaches to subject clustering (MetaCombine Project). As the field grows, new applications for libraries are likely to evolve, and it will be important for library administrators to have a basic understanding of the technology.

A wide variety of data mining techniques are also employed by industry and government, and many of these activities pose threats to personal privacy. Because librarians are professionals ethically bound to safeguard individual privacy, data mining activities should be monitored and kept on every librarian's radar.

This paper is written for information professionals who would like a better understanding of knowledge discovery and data mining techniques. It traces the historical development of this new discipline and explains specific data mining methods. It concludes that future development should focus on tools and techniques that yield useful knowledge without invading individual privacy.
Introduction
Data mining is an ambiguous term that has been used to refer to the process of finding interesting information in large repositories of data. More precisely, the term refers to the application of special algorithms in a process built upon sound principles from numerous disciplines, including statistics, artificial intelligence, machine learning, database science, and information retrieval (Han & Kamber, 2001). Mining algorithms are used in pursuits variously called data mining, knowledge mining, data-driven discovery, and deductive learning (Dunham, 2003). Data mining techniques can be applied to a wide variety of data types, including databases, text, spatial data, temporal data, images, and other complex data (Frawley, Piatetsky-Shapiro, & Matheus, 1991; Hearst, 1999; Roddick & Spiliopoulou, 1999; Zaïane, Han, Li, & Hou, 1998).

Some areas of specialty have their own names, such as KDD (knowledge discovery in databases), text mining, and Web mining. Most of these specialties use the same basic toolset, follow the same basic process, and (hopefully) yield the same product: useful knowledge that was not explicitly part of the original data set (Benoît, 2002; Han & Kamber, 2001; Fayyad, Piatetsky-Shapiro, & Smyth, 1996).
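The basic process these specialties share is selecting data, preparing it, applying a mining algorithm and interpreting the results. The small Python sketch below illustrates it under stated assumptions.

- **Hypothetical names:** the catalog records, field names and function names (`select_data`, `prepare_data`, `mine_patterns`, `interpret`) are invented for illustration.
- **Simplified mining step:** the mining step only counts how often pairs of subject headings co-occur (a simple support count). It stands in for the far more capable algorithms discussed later and is not the method of any cited author.

```python
from collections import Counter
from itertools import combinations

# Hypothetical catalog records; ids, collections, and subject headings are invented.
RAW_RECORDS = [
    {"id": 1, "collection": "local", "subjects": "History; Genealogy"},
    {"id": 2, "collection": "local", "subjects": " history ;Genealogy; Maps"},
    {"id": 3, "collection": "local", "subjects": "Maps; Poetry"},
    {"id": 4, "collection": "local", "subjects": ""},
    {"id": 5, "collection": "general", "subjects": "History; Genealogy"},
]


def select_data(records, collection):
    """Select: keep only the records relevant to the question (one collection)."""
    return [r for r in records if r["collection"] == collection]


def prepare_data(records):
    """Prepare: split subject strings, normalize case and whitespace,
    and drop records that end up with no subject headings."""
    baskets = []
    for r in records:
        labels = {s.strip().lower() for s in r["subjects"].split(";") if s.strip()}
        if labels:
            baskets.append(labels)
    return baskets


def mine_patterns(baskets):
    """Mine: count how many records contain each pair of subject headings."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def interpret(counts, min_support):
    """Interpret: keep pairs seen at least min_support times, most frequent first."""
    frequent = [(pair, n) for pair, n in counts.items() if n >= min_support]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    baskets = prepare_data(select_data(RAW_RECORDS, collection="local"))
    for (a, b), n in interpret(mine_patterns(baskets), min_support=2):
        print(f"{a} + {b}: assigned together in {n} records")
```

On this toy data the sketch reports that "genealogy" and "history" appear together in two of the three usable records. Deciding whether such a pattern is actually interesting or useful remains the job of the interpretation step, not the counting code.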
1. What is Data Mining
Data mining refers to the process of finding interesting patterns in data that are not explicitly part of the data (Witten & Frank, 2005, p. xxiii). These patterns can be used to tell us something new and to make predictions. The process of data mining consists of several steps: selecting the data to analyze, preparing the data, applying data mining algorithms, and interpreting and evaluating the results. Sometimes the term data mini...