MGM’s College of Engineering, Nanded.
Department of IT
Semester I (2016-17)
Class: BE(IT) Subject: DMDW Assignment II
___________________________________________________________________________
1. What is data mining? What features are expected of DM output?
2. What are the DM primitives? List and explain them.
3. What is KDD? Explain the stages of KDD with an appropriate example.
4. List the applications of data mining and association rule mining.
5. Why is data preprocessing needed in DM? Explain the different forms of data preprocessing.
6. Suppose that you have to analyze the Marks_of_students data. You find that many tuples have no recorded value for several attributes, such as total_marks. How can you fill in the missing values for this attribute?
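One common answer to Q.6 is to replace each missing value with the mean of the recorded values of the attribute. The sketch below assumes the column is a plain Python list in which a missing entry is stored as None; the function name fill_with_mean and the sample marks are hypothetical, chosen only for illustration. Other strategies (ignoring the tuple, using a global constant, using the mean of tuples in the same class, or predicting the value) would replace the single line that computes the fill value.

```python
from statistics import mean

def fill_with_mean(values):
    """Replace missing entries (None) with the mean of the recorded entries."""
    known = [v for v in values if v is not None]
    fill = mean(known)  # attribute mean over the non-missing tuples only
    return [fill if v is None else v for v in values]

# Hypothetical total_marks column with two missing entries
total_marks = [610, None, 720, 540, None, 650]
print(fill_with_mean(total_marks))  # [610, 630, 720, 540, 630, 650]
```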
7. Given a numerical attribute such as total_marks in the above data set, how can we smooth out the data to remove the noise?
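Smoothing by binning is one technique that fits Q.7. The sketch below sorts the values, splits them into equal-frequency bins, and then replaces each value either by its bin mean or by the nearer bin boundary. Because the total_marks values of Q.6 are not listed, it reuses the marks values given in Q.10 purely as sample input; the bin depth of 9 and the function names are assumptions.

```python
def equal_frequency_bins(values, depth):
    """Sort the values and split them into consecutive bins of `depth` items."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's minimum and maximum (ties go to the minimum)."""
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

marks = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
         33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
bins = equal_frequency_bins(marks, depth=9)  # three bins of nine values each
print(smooth_by_means(bins)[0])       # first bin: nine copies of 18.0
print(smooth_by_boundaries(bins)[0])  # [13, 13, 13, 13, 22, 22, 22, 22, 22]
```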
8. Suppose that the data for analysis includes the attribute total_amount. Which distributive, algebraic, and holistic measures can be used to analyze it?
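In the usual classification, sum, count, minimum and maximum are distributive, the mean is algebraic, and the median and mode are holistic. The sketch below illustrates the difference on a hypothetical total_amount column split into partitions (the values and the partitioning are invented for illustration): distributive measures combine per-partition results directly, the mean needs only the distributive pair (sum, count), while the median cannot be recovered from per-partition medians.

```python
from statistics import median

# Hypothetical total_amount values, split into partitions
# (for example, one partition per branch or per data block)
partitions = [[120, 80, 200], [50, 300], [90, 160, 40]]

# Distributive: combine the per-partition results with the same function
total = sum(sum(p) for p in partitions)      # sum of sums
count = sum(len(p) for p in partitions)      # sum of counts
maximum = max(max(p) for p in partitions)    # max of maxes

# Algebraic: computed from a fixed number of distributive values
mean = total / count

# Holistic: no fixed-size per-partition summary is enough in general;
# here the median of the partition medians differs from the true median
all_values = [v for p in partitions for v in p]
true_median = median(all_values)
median_of_medians = median(median(p) for p in partitions)

print(total, count, maximum, mean)        # 1040 8 300 130.0
print(true_median, median_of_medians)     # 105.0 120
```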
9. (a) Define min-max normalization. Suppose the minimum and maximum values for the attribute total_marks are 400 and 850, respectively. Using min-max normalization, transform the value 760 to the range [0.0, 1.0].
(b) Define z-score normalization. The mean and standard deviation of the values for the attribute total_marks are 440 and 89, respectively. Use z-score normalization to transform the value 550.
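Both parts of Q.9 reduce to one formula each: v' = (v - min)/(max - min) * (new_max - new_min) + new_min for min-max normalization, and v' = (v - mean)/std for z-score normalization. The sketch below applies them to the values given in the question; the function names are illustrative only.

```python
def min_max(v, old_min, old_max, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [old_min, old_max] linearly onto [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

def z_score(v, mean, std):
    """Z-score normalization: how many standard deviations v lies from the mean."""
    return (v - mean) / std

# Values from Q.9
print(min_max(760, 400, 850))            # (760 - 400) / (850 - 400) = 0.8
print(round(z_score(550, 440, 89), 3))   # (550 - 440) / 89, about 1.236
```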
10. Suppose that the data for analysis includes the attribute marks. The marks values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.).
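The answers to Q.10 can be checked with Python's standard statistics module (Python 3.8 or later, for multimode). The sketch below uses the 27 marks values listed in the question; the mapping from the number of modes to a label is an illustrative helper, not part of the module.

```python
from statistics import mean, median, multimode

marks = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
         33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

print("mean:", round(mean(marks), 2))   # 809 / 27, about 29.96
print("median:", median(marks))         # 14th of the 27 sorted values: 25

modes = multimode(marks)                # every value tied for the highest count
labels = {1: "unimodal", 2: "bimodal", 3: "trimodal"}
print("modes:", modes, labels.get(len(modes), "multimodal"))  # [25, 35] bimodal
```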
11. For the data set of Q.10 above:
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
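A minimal check of the midrange and quartiles for the marks data, assuming Python 3.8 or later for statistics.quantiles. Quartile conventions differ between textbooks and tools; the default "exclusive" method used here gives Q1 = 20 and Q3 = 35, which for these 27 values coincides with taking the median of the lower and upper 13 values. Other methods (for example linear interpolation) may shift Q1 slightly.

```python
from statistics import quantiles

marks = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
         33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# Midrange: average of the smallest and largest values
midrange = (min(marks) + max(marks)) / 2

# Quartiles with the statistics module's default "exclusive" method
q1, q2, q3 = quantiles(marks, n=4)

print("midrange:", midrange)   # (13 + 70) / 2 = 41.5
print("Q1:", q1, "Q3:", q3)    # Q1: 20.0 Q3: 35.0
```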
12. For the data set of Q.10 above:
(e) Give the five-number summary of the data.
(f) Show a boxplot of the data.
(g) How is a quantile-quantile plot different from a quantile plot?
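The five-number summary (minimum, Q1, median, Q3, maximum) is exactly what a boxplot draws. The sketch below computes it for the marks data and applies the common 1.5 x IQR rule, under which values beyond the fences are plotted as individual outliers and the whiskers stop at the most extreme remaining values. The quartile method is an assumption (the default of statistics.quantiles); a different convention may move Q1 slightly.

```python
from statistics import quantiles

marks = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
         33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# Five-number summary: minimum, Q1, median, Q3, maximum
q1, med, q3 = quantiles(marks, n=4)  # default "exclusive" method
summary = (min(marks), q1, med, q3, max(marks))

# Boxplot whiskers under the 1.5 x IQR rule; values outside the fences
# are drawn as separate outlier points
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in marks if v < low_fence or v > high_fence]
inside = [v for v in marks if low_fence <= v <= high_fence]
whiskers = (min(inside), max(inside))

print("five-number summary:", summary)               # (13, 20.0, 25.0, 35.0, 70)
print("whiskers:", whiskers, "outliers:", outliers)  # (13, 52) [70]
```

If matplotlib is available, matplotlib.pyplot.boxplot(marks) draws the plot itself and uses the same 1.5 x IQR whisker rule by default, although its quartile computation may differ slightly from the one above.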
13. Describe the differences between the following approaches for the integration of a data mining system with a database or DW system: no coupling, loose coupling, semitight coupling, and tight coupling.
14. Define (i) association rule mining, (ii) support, and (iii) confidence.
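For reference, the usual definitions of the two measures in Q.14, for a rule A ⇒ B over a database of N transactions, where count(X) is the number of transactions that contain every item of itemset X:

```latex
\[
\operatorname{support}(A \Rightarrow B) = P(A \cup B) = \frac{\operatorname{count}(A \cup B)}{N}
\]
\[
\operatorname{confidence}(A \Rightarrow B) = P(B \mid A)
  = \frac{\operatorname{support}(A \cup B)}{\operatorname{support}(A)}
  = \frac{\operatorname{count}(A \cup B)}{\operatorname{count}(A)}
\]
```

A rule is usually called strong when both values meet user-specified minimum support and minimum confidence thresholds.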
15. Discuss the essential steps of the Apriori algorithm.
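The steps usually listed for Apriori (count the 1-itemsets, then repeatedly join, prune using the property that every subset of a frequent itemset is frequent, and count the surviving candidates against the database) can be sketched as below. This is a minimal, unoptimized version under stated assumptions: the function name apriori, the use of an absolute support count, and the tiny transaction list are hypothetical, and the join step simply unions pairs of frequent (k-1)-itemsets instead of using the prefix-based join of the original algorithm. After pruning, both joins yield the same candidate set.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    min_support is an absolute transaction count in this sketch.
    """
    transactions = [frozenset(t) for t in transactions]

    # Step 1: one scan to count 1-itemsets; keep the frequent ones (L1)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Step 2 (join): union pairs of frequent (k-1)-itemsets into size-k candidates
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)
        # Step 3 (prune): drop candidates that have an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Step 4 (count): scan the database and keep candidates meeting min_support
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions, minimum support of 2 transactions
db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
result = apriori(db, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this example the candidate {a, b, c} is removed in the prune step because its subset {b, c} occurs in only one transaction, so the database is never scanned for it.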
16. For the following transaction database, find the frequent itemsets with support ≥ 50% and the association rules with confidence ≥ 60%.
Tid | Items
T1  | M, O, N, L
T2  | D, O, N
T3  | M, A, L, D
T4  | N, T, B, X
T5  | C, O, N, M
T6  | O, N
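A hand-worked answer to Q.16 can be checked with the short script below. It deliberately does not use Apriori: it enumerates every itemset over the 10 distinct items (1,023 candidates), which is only practical for a database this small. Support is taken as the fraction of the 6 transactions that contain the itemset, and a rule A => B is kept when support(A ∪ B) / support(A) ≥ 60%. The variable names are illustrative.

```python
from itertools import combinations

# Transaction database from Q.16
db = {
    "T1": {"M", "O", "N", "L"},
    "T2": {"D", "O", "N"},
    "T3": {"M", "A", "L", "D"},
    "T4": {"N", "T", "B", "X"},
    "T5": {"C", "O", "N", "M"},
    "T6": {"O", "N"},
}
MIN_SUP, MIN_CONF = 0.50, 0.60
n = len(db)

def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items) / n

# Brute force: test every itemset of every size (not Apriori)
items = sorted(set().union(*db.values()))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        s = support(frozenset(combo))
        if s >= MIN_SUP:
            frequent[frozenset(combo)] = s

# Rules lhs => rhs from each frequent itemset with two or more items
rules = []
for itemset, s in frequent.items():
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            conf = s / frequent[lhs]  # every subset of a frequent itemset is frequent
            if conf >= MIN_CONF:
                rules.append((sorted(lhs), sorted(itemset - lhs), s, conf))

for itemset, s in frequent.items():
    print(sorted(itemset), f"support={s:.0%}")
for lhs, rhs, s, conf in rules:
    print(lhs, "=>", rhs, f"support={s:.0%}", f"confidence={conf:.0%}")
```

Worked by hand, only M (50%), O (67%), N (83%) and {N, O} (67%) reach 50% support, giving the rules N => O (confidence 80%) and O => N (confidence 100%); the script prints the same itemsets and rules.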
Faculty In-charge: Hashmi S A