Essay: Data Mining

ry process or be generalized to refer to the larger process of knowledge discovery.

5. Steps in Knowledge Discovery

5.1 Step 1: Task Discovery

The goals of the data mining operation must be well understood before the process begins: the analyst must know what problem is to be solved and which questions need answers. Typically, a subject specialist works with the data analyst to refine the problem as part of the task discovery step (Benoit, 2002).

5.2 Step 2: Data Discovery

In this stage, the analyst and the end user determine what data they need to analyze in order to answer their questions, and then they explore the available data to see if what they need is available (Benoit, 2002).

5.3 Step 3: Data Selection and Cleaning

Once data has been selected, it will need to be cleaned up. Missing values must be handled in a consistent way, for example by eliminating incomplete records, manually filling them in, entering a constant for each missing value, or estimating a value. Other data records may be complete but wrong (noisy). These noisy elements must also be handled in a consistent way (Benoit, 2002; Fayyad et al., 1996).
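The strategies for missing values listed above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the cited sources: the toy records, the field names, the constant -1, and the choice of the field mean as the estimate are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},
    {"id": 3, "age": 45, "income": None},
    {"id": 4, "age": 29, "income": 61000},
]


def drop_incomplete(rows):
    """Strategy 1: eliminate every record that has a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fill_constant(rows, constant=-1):
    """Strategy 2: enter the same constant for each missing value."""
    return [
        {k: (constant if v is None else v) for k, v in r.items()}
        for r in rows
    ]


def fill_estimate(rows, fields):
    """Strategy 3: estimate a missing value.

    The estimate used here is the mean of the observed values of the
    same field (an assumption; other estimators are possible).
    """
    estimates = {
        f: mean(r[f] for r in rows if r[f] is not None) for f in fields
    }
    return [
        {k: (estimates[k] if v is None and k in estimates else v)
         for k, v in r.items()}
        for r in rows
    ]


complete = drop_incomplete(records)
with_constant = fill_constant(records)
with_estimate = fill_estimate(records, ["age", "income"])
```

Whichever strategy is chosen, applying the same function to every record is what keeps the handling consistent across the data set.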

5.4 Step 4: Data Transformation

Next, the data will be transformed into a form appropriate for mining. Per Weiss, Indurkhya, Zhang & Damerau (2005), "data mining methods expect a highly structured format for data, necessitating extensive data preparation. Either we have to transform the original data, or the data are supplied in a highly structured format" (p. 1). The process of data transformation might include smoothing (e.g., using bin means to replace data errors), aggregation (e.g., viewing monthly data rather than daily), generalization (e.g., defining people as young, middle-aged, or old instead of by their exact age), normalization (scaling the data inside a fixed range), and attribute construction (adding new attributes to the data set; Han & Kamber, 2001, p. 114).
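Four of the transformations named above can be illustrated with a short Python sketch. The function names, the equal-size binning scheme, the age cut-offs (30 and 60), and the "YYYY-MM-DD" date format are assumptions chosen for the example rather than definitions from the cited sources.

```python
from statistics import mean


def smooth_by_bin_means(values, bin_size):
    """Smoothing: sort the values, split them into bins of bin_size
    items, and replace each value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


def aggregate_monthly(daily):
    """Aggregation: sum (date, amount) pairs per month.
    Dates are assumed to be 'YYYY-MM-DD' strings."""
    totals = {}
    for date, amount in daily:
        month = date[:7]  # 'YYYY-MM'
        totals[month] = totals.get(month, 0) + amount
    return totals


def generalize_age(age, young_below=30, old_from=60):
    """Generalization: replace an exact age by a coarse label.
    The cut-offs are illustrative assumptions."""
    if age < young_below:
        return "young"
    if age < old_from:
        return "middle-aged"
    return "old"


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization: scale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values equal: avoid division by zero
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
print(min_max_normalize([10, 20, 30]))
print(generalize_age(42))
```

Note that smoothing by bin means returns the values in sorted order, so it suits attributes where the original record order does not matter.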

5.5 Step 5: Data Reduction

The data will probably need to be reduced in order to make the analysis process manageable and cost-efficient. Data reduction techniques include data cube aggregation, dimension reduction (irrelevant or redundant attributes are removed), data compression (data is encoded to reduce its size), numerosity reduction (models or samples are used instead of the actual data), and discretization and concept hierarchy generation (attributes are replaced by some kind of higher-level construct; Han & Kamber, 2001, pp. 116-117).
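Three of these reduction techniques can be sketched in simplified form. The heuristics below are assumptions chosen to keep the example short: dimension reduction is shown only as dropping attributes that never vary, numerosity reduction as simple random sampling, and discretization as equal-width binning. The toy rows and function names are hypothetical.

```python
import random

# Hypothetical toy data set.
rows = [
    {"age": 23, "country": "UA", "spend": 120.0},
    {"age": 35, "country": "UA", "spend": 80.5},
    {"age": 47, "country": "UA", "spend": 310.0},
    {"age": 61, "country": "UA", "spend": 45.0},
]


def drop_constant_attributes(rows):
    """Dimension reduction (one simple heuristic): remove attributes
    that take a single value and so carry no information."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]


def sample_rows(rows, n, seed=0):
    """Numerosity reduction: work on a random sample of n records
    instead of the full data. The seed makes the sample repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


def discretize_equal_width(values, n_bins):
    """Discretization: replace each value by the index of the
    equal-width interval it falls into (0 .. n_bins - 1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:  # all values equal: single interval
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


reduced = drop_constant_attributes(rows)
sample = sample_rows(reduced, 2)
age_bins = discretize_equal_width([r["age"] for r in rows], 2)
```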

5.6 Step 6: Discovering Patterns (aka Data Mining)

In this stage, the data is iteratively run through the data mining algorithms (see Data Mining Methods below) in an effort to find interesting and useful patterns or relationships. Often, classification and clustering algorithms are used first so that association rules can be applied (Benoit, 2002, p. 278). Some rules yield patterns that are more interesting than others. This "interestingness" is one of the measures used to determine the effectiveness of the particular algorithm (Fayyad et al., 1996; Freitas, 1999; Han & Kamber, 2001).

Fayyad et al. (1996) state that interestingness is "usually taken as an overall measure of pattern value, combining validity, novelty, usefulness, and simplicity" (p. 41). A pattern can be considered knowledge if it exceeds an interestingness threshold. That threshold is defined by the user, is domain specific, and "is determined by whatever functions and thresholds the user chooses" (p. 41).
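As a rough illustration of a user-defined interestingness threshold, the sketch below scores simple one-item association rules and keeps only those that pass the thresholds. Using support and confidence as the interestingness functions is an assumption made for this example; the sources quoted above leave the choice of functions and thresholds to the user. The transactions and function names are hypothetical.

```python
from itertools import permutations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


def interesting_rules(transactions, min_support, min_confidence):
    """Return rules a -> b whose support and confidence both reach
    the user-chosen thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        c = confidence({a}, {b}, transactions)
        if s >= min_support and c >= min_confidence:
            rules.append((a, b, s, c))
    return rules


for a, b, s, c in interesting_rules(transactions, 0.5, 0.9):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Lowering either threshold admits more rules, which is why the threshold has to be set with the domain and the end user's questions in mind.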

5.7 Step 7: Result Interpretation and Visualization

It is important that the output from the data mining step can be "readily absorbed and accepted by the people who will use the results" (Benoit, 2002, p. 272). Tools from computer graphics and graphic design are used to present and visualize the mined output.

5.8 Step 8: Putting the Knowledge to Use

Finally, the end user must make use of the output. In addition to solving the original problem, the new knowledge can also be incorporated into new models, and the entire knowledge discovery or data mining cycle can begin again.

6. Data Mining Methods

Common data mining methods include classification, regression, clustering, summarization, dependency modeling, and change and deviation detection (Fayyad et al., 1996, pp. 44-45).

6.1 Classification

Classification is composed of two steps: supervised learning of a training set of data to create a model, and the...
