Saturday, September 21, 2013


DMDW Assignment II Sept. 2013 2013-14

MGM’s College of Engineering, Nanded.
Department of IT
Semester I (2013-14)
Class: BE(IT) Subject: DMDW Assignment II
________________________________________________________
1. What is data mining? Explain its characteristics.
2. Why is data preprocessing required for DM? What are the types of data preprocessing?
3. How are missing values of an attribute filled in during the DM data cleaning process?
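One common strategy is to replace each missing value with the mean of the attribute's observed values. The sketch below assumes missing entries are stored as `None`; the function name and the income column are hypothetical, for illustration only.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the non-missing values (hypothetical helper)."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("attribute has no observed values")
    fill = sum(known) / len(known)
    return [fill if v is None else v for v in values]

# Hypothetical income column with two missing entries.
incomes = [30000, None, 50000, 40000, None]
print(fill_missing_with_mean(incomes))  # [30000, 40000.0, 50000, 40000, 40000.0]
```

Other strategies (a global constant, the class-wise mean, or a predicted value) follow the same pattern with a different `fill`.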
4. What is noise in data? Explain the data smoothing techniques used in DM.
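Binning is one smoothing technique: sorted values are partitioned into bins, and each value is replaced by its bin mean or by the nearer bin boundary. A minimal sketch, assuming equal-frequency (equal-depth) bins of 3; the price list is a hypothetical sample and the helper names are illustrative.

```python
def equal_frequency_bins(sorted_values, depth):
    """Partition already-sorted values into consecutive bins of `depth` items."""
    return [sorted_values[i:i + depth] for i in range(0, len(sorted_values), depth)]

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer of the bin's min or max (ties go to the min)."""
    out = []
    for b in bins:
        lo, hi = min(b), max(b)
        out.append([lo if v - lo <= hi - v else hi for v in b])
    return out

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical, already sorted
bins = equal_frequency_bins(prices, 3)
print(smooth_by_means(bins))       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```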
5. What is data transformation? What functions are performed in data transformation?
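Normalization is one function of data transformation. Besides the min-max and z-score variants in Q.6 and Q.7, normalization by decimal scaling divides every value by 10^j, where j is the smallest integer such that max(|v'|) < 1. A minimal sketch; the sample values are hypothetical.

```python
def decimal_scaling(values):
    """Return (scaled values, j) where v' = v / 10**j and max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:  # shift the decimal point until all |v'| < 1
        j += 1
    return [v / 10 ** j for v in values], j

scaled, j = decimal_scaling([-986, 917, 45])
print(j, scaled)  # 3 [-0.986, 0.917, 0.045]
```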
6. Define min-max normalization. Suppose the minimum and maximum values for the attribute salary are Rs. 50000 and Rs. 95000, respectively. Using min-max normalization, transform the value Rs. 76100 by mapping it to the range [0.0, 1.0].
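Min-max normalization maps v to v' = (v - min) / (max - min) * (new_max - new_min) + new_min. A minimal sketch to check the arithmetic for Q.6; the function name is illustrative.

```python
def min_max(v, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map v from [old_min, old_max] onto [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

# (76100 - 50000) / (95000 - 50000) = 26100 / 45000
print(round(min_max(76100, 50000, 95000), 4))  # 0.58
```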
7. Define z-score normalization. The mean and standard deviation of the values for the attribute total_marks are 810 and 900, respectively. Using z-score normalization, transform the value 985.
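Z-score normalization maps v to v' = (v - mean) / std. A minimal sketch for checking Q.7; the function name is illustrative.

```python
def z_score(v, mean, std):
    """Number of standard deviations that v lies above (or below) the mean."""
    return (v - mean) / std

# (985 - 810) / 900 = 175 / 900
print(round(z_score(985, 810, 900), 4))  # 0.1944
```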
8. Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.).
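Hand calculations for (a) and (b) can be checked with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). A minimal sketch on the Q.8 ages:

```python
from statistics import mean, median, multimode

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

print(round(mean(ages), 2))  # 29.96  (809 / 27)
print(median(ages))          # 25     (14th of 27 sorted values)
print(multimode(ages))       # [25, 35]  each occurs 4 times, so the data is bimodal
```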
9. For the data set of Q.8 above
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
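The midrange is the average of the smallest and largest values. Quartile conventions differ between textbooks and tools; the sketch below assumes the `exclusive` method of Python's `statistics.quantiles`, which places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4. Other methods may give slightly different quartiles.

```python
from statistics import quantiles

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

midrange = (min(ages) + max(ages)) / 2
# With n = 27, positions 7 and 21 are whole numbers, so no interpolation is needed.
q1, q2, q3 = quantiles(ages, n=4, method="exclusive")
print(midrange)  # 41.5
print(q1, q3)    # 20.0 35.0
```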
10. For the data set of Q.8 above
(e) Give the five-number summary of the data.
(f) Show a boxplot of the data.
(g) How is a quantile-quantile plot different from a quantile plot?
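The five-number summary is (minimum, Q1, median, Q3, maximum), and a boxplot draws it as a box from Q1 to Q3 with a line at the median. The sketch below assumes the same `exclusive` quartile method as above and the common 1.5 x IQR convention for whiskers and outliers; `boxplot_parts` is an illustrative helper, not a plotting API.

```python
from statistics import quantiles

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

def five_number_summary(values):
    q1, med, q3 = quantiles(values, n=4, method="exclusive")
    return min(values), q1, med, q3, max(values)

def boxplot_parts(values, k=1.5):
    """Box, whisker ends, and outliers under the k * IQR fence convention."""
    _, q1, med, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    # Whiskers stop at the most extreme observations still inside the fences.
    return {"box": (q1, med, q3), "whiskers": (min(inside), max(inside)), "outliers": outliers}

print(five_number_summary(ages))  # (13, 20.0, 25.0, 35.0, 70)
print(boxplot_parts(ages))        # {'box': (20.0, 25.0, 35.0), 'whiskers': (13, 52), 'outliers': [70]}
```

Here IQR = 15, so the upper fence is 35 + 22.5 = 57.5 and the age 70 is plotted as an individual outlier.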
11. Define the following DM functionalities: characterization, discrimination, association and correlation analysis. Give examples of each DM functionality, using a real-life database with which you are familiar.
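For association analysis, a rule A => B is usually judged by its support (fraction of transactions containing both A and B) and confidence (fraction of transactions containing A that also contain B). A minimal sketch; the transactions and item names are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"computer", "software", "printer"},
    {"computer", "software"},
    {"computer", "antivirus"},
    {"printer", "paper"},
    {"computer", "software", "antivirus"},
]

def count(itemset, db):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in db if itemset <= t)

def support(itemset, db):
    return count(itemset, db) / len(db)

def confidence(antecedent, consequent, db):
    """Estimate of P(consequent | antecedent) from transaction counts."""
    return count(antecedent | consequent, db) / count(antecedent, db)

a, c = {"computer"}, {"software"}
print(support(a | c, transactions))    # 0.6   (3 of 5 transactions)
print(confidence(a, c, transactions))  # 0.75  (3 of the 4 containing "computer")
```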
12. Define classification, prediction, clustering, and evolution analysis. Give examples of each using a real-life database with which you are familiar.
13. List and describe the five primitives for specifying a data mining task.
14. Describe the differences between the following approaches for the integration of a data mining system with a database or DW system: no coupling, loose coupling, semitight coupling, and tight coupling.
15. Write a data mining query in DMQL for the following case study:
Suppose, as a marketing manager of AllElectronics, you would like to classify customers based on their buying patterns. You are especially interested in those customers whose salary is no less than $40,000, and who have bought more than $1,000 worth of items, each of which is priced at no less than $100. In particular, you are interested in the customer’s age, income, the types of items purchased, the purchase location, and where the items were made. You would like to view the resulting classification in the form of rules.
16. What is KDD? List and explain the stages of KDD.



Faculty In-charge: Hashmi S A