Tuesday, April 9, 2019
Data Preprocessing Essay Example for Free
Today's real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their typically huge size (often several gigabytes or more) and their likely origin from multiple, heterogeneous sources. Low-quality data will lead to low-quality mining results. How can the data be preprocessed in order to help improve the quality of the data and, consequently, of the mining results? How can the data be preprocessed so as to improve the efficiency and ease of the mining process?

There are several data preprocessing techniques. Data cleaning can be applied to remove noise and correct inconsistencies in data. Data integration merges data from multiple sources into a coherent data store such as a data warehouse. Data reduction can reduce data size by, for instance, aggregating, eliminating redundant features, or clustering. Data transformations (e.g., normalization) may be applied, where data are scaled to fall within a smaller range like 0.0 to 1.0. This can improve the accuracy and efficiency of mining algorithms involving distance measurements. These techniques are not mutually exclusive; they may work together. For example, data cleaning can involve transformations to correct wrong data, such as by transforming all entries for a date field to a common format.

In Chapter 2, we learned about the different attribute types and how to use basic statistical descriptions to study data characteristics. These can help identify erroneous values and outliers, which will be useful in the data cleaning and integration steps. Data preprocessing techniques, when applied before mining, can substantially improve the overall quality of the patterns mined and/or the time required for the actual mining.
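
As an illustration of the normalization step described above, here is a minimal Python sketch of min-max scaling, which maps values into a target range such as 0.0 to 1.0. The essay does not name a specific normalization method, so choosing min-max is an assumption, and the function name `min_max_normalize`, its parameters, and the sample `incomes` list are all hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they fall within [new_min, new_max].

    Hypothetical helper for illustration; min-max scaling is one common
    form of normalization, not necessarily the one the essay has in mind.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale;
        # map everything to the lower bound to avoid dividing by zero.
        return [new_min for _ in values]
    return [
        (v - lo) / (hi - lo) * (new_max - new_min) + new_min
        for v in values
    ]


# Made-up sample data: an attribute with a large numeric range.
incomes = [12000, 54000, 73600, 98000]
scaled = min_max_normalize(incomes)
print(scaled)
```

After scaling, the smallest value maps to 0.0 and the largest to 1.0. This keeps an attribute with a large range, such as income, from dominating a distance computation over attributes with smaller ranges.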