Wednesday, September 16, 2015

Semester I (2015-16) Assignment II Class: BE(IT) Subject: DMDW

MGM’s College of Engineering, Nanded.
Department of IT
Semester I (2015-16)
Class: BE(IT)       Subject: DMDW         Assignment II
___________________________________________________________________________
1. What is data mining? What features are expected of DM output?
2. Why is data preprocessing needed in DM? Explain the different forms of data preprocessing.
3. Consider that you have to analyze EMart_Customer sales data. You find that many tuples have no recorded value for several attributes, such as customer total_amount. How can you fill in the missing values for this attribute?
4. Given a numerical attribute such as total_amount in the above data set, how can we smooth the data to remove the noise?
5. Suppose that the data for analysis includes the attribute total_marks. Which distributive, algebraic and holistic measures will be used for data analysis?
6. (a) Define min-max normalization. Suppose the minimum and maximum values for the attribute total_sales are Rs. 40000 and Rs. 85000, respectively. Using min-max normalization, map the value Rs. 76000 to the range [0.0, 1.0].
    (b) Define z-score normalization. The mean and standard deviation of the values for the attribute total_marks are 4400 and 890, respectively. Use z-score normalization to transform the value 5500.
7. Suppose that the data for analysis includes the attribute marks. The marks values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
    (a) What is the mean of the data? What is the median?
    (b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.).
8. For the data set of Q.7 above:
    (c) What is the midrange of the data?
    (d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
9. For the data set of Q.7 above:
   (e) Give the five-number summary of the data.
   (f) Show a boxplot of the data.
   (g) How is a quantile-quantile plot different from a quantile plot?
10. Define the following DM functionalities: characterization, discrimination, association and correlation analysis.
11. Explain five primitives of data mining with an appropriate example.
12. Describe the differences between the following approaches for the integration of a data mining system with a database or DW system: no coupling, loose coupling, semitight coupling, and tight coupling.
13. What is KDD? Explain the stages of KDD with an appropriate example.
14. Use the two methods below to normalize the following group of data: 2000, 3000, 4000, 6000, 10000
       (a) min-max normalization by setting min=0 and max=1
       (b) z-score normalization
15. Define (i) Association rule mining (ii) Support (iii) Confidence.
16. List applications of Data Mining and Association rule mining.
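
As a rough check on the arithmetic in Q.6, Q.7 and Q.14, the formulas can be sketched in Python. This is a minimal sketch, not a model answer: the helper names min_max and z_score are hypothetical, and using the population standard deviation (pstdev) in Q.14(b) is an assumption, since the question does not say whether the population or the sample form is intended.

```python
from statistics import mean, median, multimode, pstdev


def min_max(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    return (value - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


def z_score(value, mu, sigma):
    """Express value as a number of standard deviations from the mean."""
    return (value - mu) / sigma


# Q.6(a): map Rs. 76000 from [40000, 85000] to [0.0, 1.0]
q6a = min_max(76000, 40000, 85000)

# Q.6(b): mean 4400, standard deviation 890, value 5500
q6b = z_score(5500, 4400, 890)

# Q.14: normalize the whole group of data
data = [2000, 3000, 4000, 6000, 10000]
q14a = [min_max(x, min(data), max(data)) for x in data]
mu = mean(data)
sigma = pstdev(data)  # assumption: population standard deviation
q14b = [z_score(x, mu, sigma) for x in data]

# Q.7 and Q.8(c): central tendency of the marks data
marks = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
         33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
midrange = (min(marks) + max(marks)) / 2

print(q6a)                     # 0.8
print(round(q6b, 3))           # 1.236
print(q14a)                    # [0.0, 0.125, 0.25, 0.5, 1.0]
print([round(z, 3) for z in q14b])
print(round(mean(marks), 2))   # 29.96
print(median(marks))           # 25
print(multimode(marks))        # [25, 35]
print(midrange)                # 41.5
```

Switching pstdev to stdev gives the sample form of the z-scores for Q.14(b). Quartiles are left out on purpose, because the result depends on which interpolation convention is used.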


Faculty Incharge: Hashmi S A