Essay: Data mining

Topic: Reports

odes or true or false. In text mining, the idea is to convert the text of each document into values in one row of a spreadsheet, where each row represents a document and the columns contain words found in one or more documents. The values inside the spreadsheet can then be defined categorically as present (this word is in this document) or absent (this word is not in this document). The spreadsheet represents the entire set of documents, or corpus.

The collection of unique words found in the entire document collection represents the dictionary and will likely be a very large set. However, many of the cells in the spreadsheet will be empty (not present). An empty cell in a data mining operation might pose a problem, as it would be interpreted as an incomplete record. In text mining, however, this sparseness of data reduces the processing requirements, because only cells containing information need to be analyzed. The result is that the spreadsheet is enormous but mostly empty. This "allows text mining programs to operate in what would be considered huge dimensions for regular data-mining applications" (Weiss et al., 2005, p. 5).

According to Weiss et al. (2005), the process of getting the text ready for text mining is very much like the knowledge discovery steps described earlier. In text mining, the text is usually converted first to XML format for consistency. It is then converted to a series of tokens (sometimes punctuation is interpreted as a token, sometimes as a delimiter). Then, some form of stemming is applied to the tokens to create the standardized dictionary. Familiar IR/data mining processes such as TF-IDF can be applied to assign different weights to the tokens. Once this has been done, classification and clustering algorithms are applied.

Depending on the goal of the text mining operation, it may or may not be important to incorporate linguistic processing in the text mining process. Examples of linguistic processing include marking certain types of words (part-of-speech tagging), clarifying the meaning of words (disambiguation), and parsing sentences. Per Benoit (2002), text mining brings researchers closer to computational linguistics, as it tends to be highly focused on natural language elements in texts (Knight, 1999). This means TM applications (Church & Rau, 1995) discover knowledge through automatic content summarization (Kan & McKeown, 1999), content searching, document categorization, and lexical, grammatical, semantic, and linguistic analysis (Mattison, 1999). (p. 291)
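The preparation steps described above (tokenizing, stemming, building the dictionary, storing only the cells that are present, and weighting with TF-IDF) can be illustrated with a minimal Python sketch. It is not the procedure of Weiss et al.: the three sample documents, the function names, and the toy suffix-stripping rule are all assumptions made for illustration, and the TF-IDF formula shown (relative term frequency times the logarithm of inverse document frequency) is only one common variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and treat every non-letter as a delimiter."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Toy suffix stripper (an assumption; a real system would use
    an established stemmer such as Porter's)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_sparse_rows(documents):
    """Return (dictionary, rows). Each row is one document and stores
    only the words that are present, so absent cells cost nothing."""
    rows = [Counter(naive_stem(t) for t in tokenize(doc)) for doc in documents]
    dictionary = sorted(set().union(*rows))
    return dictionary, rows


def tf_idf(rows):
    """Weight each present cell by term frequency * log(N / document frequency)."""
    n_docs = len(rows)
    doc_freq = Counter(word for row in rows for word in row)
    weighted = []
    for row in rows:
        total = sum(row.values())
        weighted.append(
            {w: (c / total) * math.log(n_docs / doc_freq[w]) for w, c in row.items()}
        )
    return weighted


# Hypothetical three-document corpus.
documents = [
    "Text mining converts documents to rows.",
    "Data mining finds patterns in data.",
    "Clustering groups similar documents.",
]

dictionary, rows = build_sparse_rows(documents)
weights = tf_idf(rows)

filled = sum(len(row) for row in rows)
total_cells = len(rows) * len(dictionary)
print(f"dictionary size: {len(dictionary)}")
print(f"filled cells: {filled} of {total_cells}")
print("present in document 0:", sorted(rows[0]))
```

Even in this tiny corpus most cells of the full spreadsheet would be empty, which is why each row is kept as a mapping of present words only. Words that occur in several documents receive a lower TF-IDF weight than words unique to one document.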

Data mining is a synonym for knowledge discovery. Data mining also refers to a specific step in the knowledge discovery process, a step that focuses on applying specific algorithms to identify interesting patterns in the data repository. These patterns are then conveyed to an end user, who converts them into useful knowledge and makes use of that knowledge.

Data mining has evolved out of the need to make sense of huge quantities of information. Usama M. Fayyad says that stored data is doubling every nine months and the "demand for data mining and reduction tools increase exponentially" (Fayyad, Piatetsky-Shapiro, & Uthurusamy, 2003, p. 192). In 2006, $6 billion in text and data mining activities are anticipated (Zanasi, Brebbia, & Ebecken, 2005).

The US government is involved in many data mining initiatives aimed at improving services, detecting fraud and waste, and detecting terrorist activities. One such activity, the work of Able Danger, had identified one of the men who would, one year later, participate in the 9/11 attacks (Waterman, 2005). This fact emphasizes the importance of the final step of the knowledge discovery process: putting the knowledge to use.

The US government's data mining activities have helped stir concerns about data mining and its impact on privacy (Boyd, 2006). Privacy-preserving data mining has only recently caught the attention of researchers (Verykios, Bertino, Fovino, Provenza, Saygin, & Theodoridis, 2004). There is much work to be done in the area of knowledge discovery and data mining, and its future depends on developing tools and techniques that yield useful knowledge without causing undue threats to individuals' privacy.

References

1. Andrassoya, E., & Parali., J. (1999, September). Knowledge discovery in databases - a comparison of different views. Presented at the 10th International Conference on Information and Intelligent Systems, Sept. 1999, Varazdin, Croatia.
2. Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern Information Retrieval. New York: ACM Press.
3. Benoit, Gerald. (2002). Data Mining [Chapter 6, pp. 265-310]. In Cronin, B. (Ed.), Annual Review of Information Science and Technology: Vol. 36 (pp. 265-310). Silver Spring, MD: American Society for Information Scie...
