MGM’s College of Engineering, Nanded.
Department of IT
Semester I (2014-15)
Class: BE(IT) Subject: DMDW Assignment II
________________________________________________________
1. Define data mining. Elaborate the characteristics of DM output.
2. Discuss the different forms of data preprocessing in DM.
3. Imagine that you need to analyze MGM_Student exam data. You find that many tuples have no recorded value for several attributes, such as student percentage. How can you go about filling in the missing values for this attribute?
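As a hint for Q.3, one common option is to replace each missing value with the attribute mean. The sketch below is only an illustration: the `percentages` list is hypothetical data, not the MGM_Student data set, and `None` is assumed to mark a missing value.

```python
from statistics import mean

# Hypothetical percentages; None marks a missing value (an assumption).
percentages = [62.0, None, 71.5, 58.0, None, 80.5]

# Attribute mean computed from the recorded values only.
known = [p for p in percentages if p is not None]
fill_value = mean(known)

# Replace every missing entry with the attribute mean.
filled = [fill_value if p is None else p for p in percentages]
print(filled)
```

Other strategies (a global constant, the class-wise mean, or the most probable value) follow the same pattern with a different `fill_value`.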
4. Given a numerical attribute such as percentage in the above data set, how can we smooth out the data to remove the noise?
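As a hint for Q.4, smoothing by bin means is one possible technique. The sketch below assumes equal-frequency bins; the function name `smooth_by_bin_means` and the sample values are hypothetical, chosen only for illustration.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins, and
    replace every value in a bin with that bin's mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Hypothetical sample data, three values per bin.
sample = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(sample, 3))
```

Smoothing by bin medians or bin boundaries differs only in the value that replaces each bin member.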
5. Suppose that the data for analysis includes the attribute age. Which distributive, algebraic and holistic measures will be used for data analysis?
6. What is data transformation? What functions are performed in data transformation?
7. (a) Define min-max normalization. Suppose the minimum and maximum values for the attribute percentage are 40 and 85, respectively. Using min-max normalization, map the value 76 to the range [0.0, 1.0].
(b) Define z-score normalization. The mean and standard deviation of the values for the attribute salary are Rs. 44000 and Rs. 8900, respectively. Using z-score normalization, transform a value of Rs. 55000.
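The two normalizations in Q.7 can be checked with a short sketch. The function names `min_max` and `z_score` are hypothetical helpers, and the formulas are the standard textbook definitions, assumed here rather than taken from this sheet.

```python
def min_max(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Map value from [old_min, old_max] linearly onto [new_min, new_max]."""
    scaled = (value - old_min) / (old_max - old_min)
    return scaled * (new_max - new_min) + new_min


def z_score(value, mean, std_dev):
    """Express value as a number of standard deviations from the mean."""
    return (value - mean) / std_dev


# Q.7(a): percentage 76 with min 40 and max 85, target range [0.0, 1.0].
print(min_max(76, 40, 85))

# Q.7(b): salary Rs. 55000 with mean Rs. 44000 and standard deviation Rs. 8900.
print(round(z_score(55000, 44000, 8900), 3))
```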
8. Suppose that the data for analysis includes the attribute percentage. The percentage values for the data tuples are (in increasing order) 11, 13, 14, 14, 17, 18, 18, 19, 20, 20, 23, 23, 23, 23, 28, 31, 31, 33, 33, 33, 33, 34, 38, 43, 44, 50, 68.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data's modality (i.e., bimodal, trimodal, etc.).
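Hand calculations for Q.8 can be cross-checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative only.

```python
from statistics import mean, median, multimode

# The 27 percentage values listed in Q.8.
percentages = [11, 13, 14, 14, 17, 18, 18, 19, 20, 20, 23, 23, 23, 23,
               28, 31, 31, 33, 33, 33, 33, 34, 38, 43, 44, 50, 68]

data_mean = mean(percentages)
data_median = median(percentages)
# multimode returns every value that shares the highest frequency,
# so the length of the list indicates the modality.
data_modes = multimode(percentages)

print(round(data_mean, 2), data_median, data_modes)
```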
9. For the data set of Q.8 above
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
10. For the data set of Q.8 above
(e) Give the five-number summary of the data.
(f) Show a boxplot of the data.
(g) How is a quantile-quantile plot different from a quantile plot?
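For Q.9 and Q.10(e), the midrange, quartiles and five-number summary of the Q.8 data can be sketched as below. Note that several quartile conventions exist; this sketch assumes the default "exclusive" method of `statistics.quantiles`, which may differ slightly from a rough hand estimate on other data sets.

```python
from statistics import quantiles

# The 27 percentage values listed in Q.8.
percentages = [11, 13, 14, 14, 17, 18, 18, 19, 20, 20, 23, 23, 23, 23,
               28, 31, 31, 33, 33, 33, 33, 34, 38, 43, 44, 50, 68]

# Midrange: the average of the smallest and largest values.
midrange = (min(percentages) + max(percentages)) / 2

# Three cut points that split the sorted data into four equal parts.
q1, q2, q3 = quantiles(percentages, n=4)

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number_summary = (min(percentages), q1, q2, q3, max(percentages))

print(midrange)
print(five_number_summary)
```

The five values in `five_number_summary` are exactly the quantities a boxplot for Q.10(f) would draw.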
11. Define the following DM functionalities: characterization, discrimination, association and correlation analysis. Give examples of each DM functionality, using a real-life database with which you are familiar.
12. Define classification, prediction, clustering, and evolution analysis. Give examples of each using a real-life database with which you are familiar.
13. List and describe the five primitives for specifying a data mining task.
14. Describe the differences between the following approaches for the integration of a data mining system with a database or DW system: no coupling, loose coupling, semitight coupling, and tight coupling.
15. What is KDD? List and explain the stages of KDD.
16. Use the two methods below to normalize the following group of data:
200, 300, 400, 600, 1000
(a) min-max normalization by setting min=0 and max=1
(b) z-score normalization
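Results for Q.16 can be verified with the sketch below. It assumes the population standard deviation (`statistics.pstdev`) for the z-score; if the sample standard deviation is intended, `statistics.stdev` would be used instead and the z-scores would be smaller in magnitude.

```python
from statistics import mean, pstdev

data = [200, 300, 400, 600, 1000]

# (a) Min-max normalization onto [0, 1].
low, high = min(data), max(data)
min_max_scaled = [(x - low) / (high - low) for x in data]

# (b) Z-score normalization (population standard deviation assumed).
mu = mean(data)
sigma = pstdev(data)
z_scores = [(x - mu) / sigma for x in data]

print(min_max_scaled)
print([round(z, 4) for z in z_scores])
```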
Faculty Incharge: Hashmi S A