Monday, September 19, 2016

Semester I (2016-17) Assignment II Subject: DMDW Class: BE(IT)


MGM’s College of Engineering, Nanded.
Department of IT
Semester I (2016-17)
Class: BE(IT)       Subject: DMDW         Assignment II
___________________________________________________________________________
1.      What is data mining? What features are expected of DM output?
2.      What are the DM primitives? List and explain them.
3.      What is KDD? Explain the stages of KDD with an appropriate example.
4.      List applications of data mining and of association rule mining.
5.      Why is data preprocessing needed in DM? Explain the different forms of data preprocessing.
6.      Suppose that you have to analyze the Marks_of_students data. You find that many tuples have no recorded value for several attributes, such as total_marks. How can you fill in the missing values for this attribute?
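One common answer to Q.6 is to replace each missing total_marks with the mean of the recorded values. Other options include ignoring the tuple, using a global constant, using the mean of tuples in the same class, or predicting the most probable value. A minimal Python sketch of mean imputation, assuming the column is a plain list where None marks a missing value; the sample marks and the name fill_missing_with_mean are hypothetical:

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Return a copy of values with every None replaced by the mean of the recorded values."""
    recorded = [v for v in values if v is not None]
    fill_value = mean(recorded)
    return [fill_value if v is None else v for v in values]


# Hypothetical total_marks column; None stands for a missing value.
total_marks = [620, None, 710, 540, None, 690]
print(fill_missing_with_mean(total_marks))  # [620, 640, 710, 540, 640, 690]
```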
7.      Given a numerical attribute such as total_marks in the above data set, how can we smooth out the data to remove the noise?
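Binning is one standard way to smooth a numeric attribute; regression and outlier analysis by clustering are others. A minimal Python sketch of equal-frequency binning with smoothing by bin means and by bin boundaries; the sample values, the bin size of 3, and the function names are hypothetical:

```python
def equal_frequency_bins(values, bin_size):
    """Sort the values and split them into consecutive bins of bin_size items (the last bin may be smaller)."""
    data = sorted(values)
    return [data[i:i + bin_size] for i in range(0, len(data), bin_size)]


def smooth_by_bin_means(values, bin_size):
    """Replace every value by the mean of its bin."""
    smoothed = []
    for b in equal_frequency_bins(values, bin_size):
        bin_mean = sum(b) / len(b)
        smoothed.extend([bin_mean] * len(b))
    return smoothed


def smooth_by_bin_boundaries(values, bin_size):
    """Replace every value by the closer of its bin's min and max (ties go to the min)."""
    smoothed = []
    for b in equal_frequency_bins(values, bin_size):
        low, high = b[0], b[-1]
        smoothed.extend(low if v - low <= high - v else high for v in b)
    return smoothed


# Hypothetical total_marks values.
marks = [560, 405, 725, 455, 600, 430, 810, 520, 700]
print(smooth_by_bin_means(marks, 3))       # [430.0, 430.0, 430.0, 560.0, 560.0, 560.0, 745.0, 745.0, 745.0]
print(smooth_by_bin_boundaries(marks, 3))  # [405, 405, 455, 520, 520, 600, 700, 700, 810]
```

Larger bins smooth more aggressively; the bin size is a tuning choice, not a fixed rule.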
8.      Suppose that the data for analysis includes the attribute total_amount. Which distributive, algebraic and holistic measures will be used for data analysis?
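The three categories differ in how a measure can be assembled from partitions of the data: a distributive measure (sum, count, min, max) combines directly from per-partition results, an algebraic measure (mean, standard deviation) is computed from a bounded number of distributive ones, and a holistic measure (median, mode) generally needs the whole data set. A minimal Python sketch, assuming a hypothetical total_amount column split into two partitions:

```python
from statistics import median

# Hypothetical total_amount values, split into two partitions (e.g. two data chunks).
part_a = [120, 340, 560]
part_b = [200, 410]

# Distributive: whole-set sum, count and max follow from the per-partition results.
total_sum = sum(part_a) + sum(part_b)
total_count = len(part_a) + len(part_b)
overall_max = max(max(part_a), max(part_b))

# Algebraic: the mean is computed from two distributive measures (sum / count).
overall_mean = total_sum / total_count

# Holistic: the median of per-partition medians is not the true median in general,
# so the whole data set is needed.
overall_median = median(part_a + part_b)
median_of_medians = median([median(part_a), median(part_b)])

print(total_sum, total_count, overall_max, overall_mean)  # 1630 5 560 326.0
print(overall_median, median_of_medians)                  # 340 322.5
```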
9.      (a) Define min-max normalization. Suppose the minimum and maximum values for the attribute total_marks are 400 and 850, respectively. Using min-max normalization, transform and map the value 760 to the range [0.0, 1.0].
(b) Define z-score normalization. The mean and standard deviation of the values for the attribute total_marks are 440 and 89, respectively. Use z-score normalization to transform the value 550.
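Both normalizations in Q.9 are single formulas: min-max uses v' = (v - min_A) / (max_A - min_A) × (new_max_A - new_min_A) + new_min_A, and z-score uses v' = (v - mean_A) / std_A. So 760 maps to 360 / 450 = 0.8, and 550 maps to 110 / 89 ≈ 1.236. A minimal Python sketch of both; the function names are hypothetical:

```python
def min_max_normalize(v, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map v from [old_min, old_max] onto [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


def z_score_normalize(v, mean, std_dev):
    """Express v as the number of standard deviations it lies from the mean."""
    return (v - mean) / std_dev


print(min_max_normalize(760, 400, 850))           # 0.8
print(round(z_score_normalize(550, 440, 89), 3))  # 1.236
```

Min-max needs the true minimum and maximum, so a later value outside [400, 850] would fall outside [0.0, 1.0]; z-score avoids that dependence but needs the mean and standard deviation.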
10.  Suppose that the data for analysis includes the attribute marks. The marks values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.).
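Python's standard statistics module can check the hand calculation for Q.10. A minimal sketch, assuming Python 3.8 or later (needed for statistics.multimode); the data are the 27 marks values listed in Q.10:

```python
from statistics import mean, median, multimode

# The marks values from Q.10, already in increasing order.
marks = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
         35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

print(round(mean(marks), 2))  # 29.96  (809 / 27)
print(median(marks))          # 25     (the 14th of 27 values)
modes = multimode(marks)      # every value with the highest frequency
print(modes)                  # [25, 35]  (each occurs 4 times)
print({1: "unimodal", 2: "bimodal", 3: "trimodal"}.get(len(modes), "multimodal"))  # bimodal
```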
11.  For the data set of Q.10 above:
     (c) What is the midrange of the data?
     (d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
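The midrange is the average of the smallest and largest values, (13 + 70) / 2 = 41.5. Quartile conventions differ between textbooks and tools; the sketch below assumes the default "exclusive" method of Python's statistics.quantiles, which uses position (n + 1)p and, for n = 27, lands exactly on the 7th and 21st values:

```python
from statistics import quantiles

# The marks values from Q.10.
marks = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
         35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

midrange = (min(marks) + max(marks)) / 2
q1, q2, q3 = quantiles(marks, n=4)  # the three cut points Q1, median, Q3

print(midrange)  # 41.5
print(q1, q3)    # 20.0 35.0
```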
12. For the data set of Q.10 above:
     (e) Give the five-number summary of the data.
     (f) Show a boxplot of the data.
     (g) How is a quantile-quantile plot different from a quantile plot?
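The five-number summary is (minimum, Q1, median, Q3, maximum), and a boxplot draws a box from Q1 to Q3 with a line at the median plus whiskers. A minimal Python sketch computing the numbers needed to draw it, assuming the same quartile method as above and the common convention that whiskers reach the most extreme values within 1.5 × IQR of the box, with points beyond that plotted individually as outliers:

```python
from statistics import quantiles

# The marks values from Q.10.
marks = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
         35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

q1, med, q3 = quantiles(marks, n=4)
five_number = (min(marks), q1, med, q3, max(marks))
print(five_number)  # (13, 20.0, 25.0, 35.0, 70)

# Boxplot quantities under the 1.5 x IQR whisker rule (an assumed convention).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [m for m in marks if m < lower_fence or m > upper_fence]
whisker_low = min(m for m in marks if m >= lower_fence)
whisker_high = max(m for m in marks if m <= upper_fence)
print(iqr, outliers, whisker_low, whisker_high)  # 15.0 [70] 13 52
```

Under this convention the box spans 20 to 35 with the median line at 25, the whiskers run to 13 and 52, and 70 is drawn as a separate outlier point.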
13. Describe the differences between the following approaches for the integration of a data mining system with a database or DW system: no coupling, loose coupling, semitight coupling, and tight coupling.
14. Define (i) association rule mining, (ii) support, and (iii) confidence.
15. Discuss the essential steps of the Apriori algorithm.
16. For the following transaction database, find all frequent itemsets with support ≥ 50% and the association rules among them with confidence ≥ 60%.
Tid    Items
T1     M, O, N, L
T2     D, O, N
T3     M, A, L, D
T4     N, T, B, X
T5     C, O, N, M
T6     O, N
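The Apriori steps asked about in Q.15 (count candidate 1-itemsets, keep those meeting minimum support, join frequent (k-1)-itemsets into k-candidates, prune candidates with an infrequent subset, repeat, then derive rules meeting minimum confidence) can be sketched in Python on the Q.16 database. This is a minimal, unoptimized illustration, assuming support(X) is the fraction of transactions containing X and confidence(A → B) = support_count(A ∪ B) / support_count(A); the function names are hypothetical:

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: support count} using level-wise candidate generation."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support_count(itemset):
        return sum(1 for b in baskets if itemset <= b)

    frequent = {}
    candidates = {frozenset([item]) for b in baskets for item in b}  # C1
    k = 1
    while candidates:
        # Scan the database and keep candidates that meet minimum support (this is Lk).
        level = {c: support_count(c) for c in candidates}
        level = {c: s for c, s in level.items() if s / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: drop any candidate that has an infrequent (k-1)-subset.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, n, min_confidence):
    """Yield (antecedent, consequent, support, confidence) for rules built from frequent itemsets."""
    for itemset, itemset_count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = itemset_count / frequent[lhs]  # every subset of a frequent set is frequent
                if confidence >= min_confidence:
                    yield lhs, itemset - lhs, itemset_count / n, confidence


transactions = [
    {"M", "O", "N", "L"},  # T1
    {"D", "O", "N"},       # T2
    {"M", "A", "L", "D"},  # T3
    {"N", "T", "B", "X"},  # T4
    {"C", "O", "N", "M"},  # T5
    {"O", "N"},            # T6
]

frequent = apriori(transactions, min_support=0.5)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
for lhs, rhs, support, confidence in association_rules(frequent, len(transactions), 0.6):
    print(sorted(lhs), "->", sorted(rhs), f"support={support:.0%} confidence={confidence:.0%}")
```

With 6 transactions, 50% support means a count of at least 3. By hand, the frequent itemsets are {M} (3), {O} (4), {N} (5) and {O, N} (4); {M, O} and {M, N} occur only twice. The rules are O → N (confidence 4/4 = 100%) and N → O (confidence 4/5 = 80%), both with support 4/6 ≈ 67%.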

    
Faculty In-charge: Hashmi S A