MGM’s College of Engineering, Nanded.
Department of IT
Semester I (2015-16)
Class: BE(IT) Subject: DMDW Assignment II
___________________________________________________________________________
1. What is data mining? What features are expected of DM output?
2. Why is data preprocessing needed in DM? Explain the different forms of data preprocessing.
3. Suppose that you have to analyze EMart_Customer sales data. You find that many tuples have no recorded value for several attributes, such as the customer's total_amount. How can you fill in the missing values for this attribute?
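One common approach is to replace each missing value with the mean (or, for skewed data, the median) of the recorded values of the attribute. A minimal Python sketch, assuming the total_amount values are held in a plain list with None marking a missing entry; the sample values and the helper name fill_missing are hypothetical:

```python
from statistics import mean, median

def fill_missing(values, strategy="mean"):
    """Replace None entries with the mean or median of the recorded values.

    fill_missing is a hypothetical helper written for this sketch.
    """
    known = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(known)
    elif strategy == "median":
        fill = median(known)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    return [fill if v is None else v for v in values]

# Hypothetical total_amount values (Rs.); None marks a missing entry.
total_amount = [1200.0, None, 800.0, 1500.0, None, 900.0]

print(fill_missing(total_amount))            # [1200.0, 1100.0, 800.0, 1500.0, 1100.0, 900.0]
print(fill_missing(total_amount, "median"))  # [1200.0, 1050.0, 800.0, 1500.0, 1050.0, 900.0]
```

Other options, such as using a global constant or the mean of tuples in the same class, follow the same pattern with a different fill value.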
4. Given a numerical attribute such as total_amount in the above data set, how can we smooth out the data to remove noise?
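Binning is one standard smoothing technique: sort the values, partition them into equal-frequency bins, and replace each value by its bin mean. A minimal sketch, assuming equal-frequency bins of a fixed size; the sample total_amount values and the function name smooth_by_bin_means are hypothetical:

```python
def smooth_by_bin_means(values, bin_size):
    """Equal-frequency binning: replace every value by the mean of its bin."""
    data = sorted(values)
    smoothed = []
    for start in range(0, len(data), bin_size):
        bin_values = data[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

# Hypothetical total_amount values (in hundreds of Rs.).
total_amount = [21, 4, 28, 15, 34, 21, 8, 25, 24]
print(smooth_by_bin_means(total_amount, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin medians or by bin boundaries changes only the value written back for each bin.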
5. Suppose that the data for analysis include the attribute total_marks. Which distributive, algebraic, and holistic measures can be used to analyze it?
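The three categories differ in how a measure combines across partitions of the data. A minimal sketch with hypothetical total_marks values split into two partitions (for example, two data files):

```python
from statistics import median

# Hypothetical total_marks values split across two partitions.
part_a = [410, 385, 460]
part_b = [398, 472, 430, 405]

# Distributive: sum() and count() of the whole data set are obtained by
# combining the per-partition sums and counts.
total_sum = sum(part_a) + sum(part_b)
total_count = len(part_a) + len(part_b)

# Algebraic: the mean is computed from two distributive measures.
mean_marks = total_sum / total_count

# Holistic: the median cannot be derived from the partition medians;
# it needs all of the values.
median_marks = median(part_a + part_b)
median_of_medians = median([median(part_a), median(part_b)])

print(total_sum, total_count, round(mean_marks, 2))  # 2960 7 422.86
print(median_marks, median_of_medians)               # 410 413.75
```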
6. (a) Define min-max normalization. Suppose the minimum and maximum values for the attribute total_sales are Rs. 40000 and Rs. 85000, respectively. Using min-max normalization, map the value Rs. 76000 to the range [0.0, 1.0].
(b) Define z-score normalization. The mean and standard deviation of the values for the attribute total_marks are 4400 and 890, respectively. Use z-score normalization to transform the value 5500.
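Min-max normalization maps v to v' = (v - min_A) / (max_A - min_A) * (new_max_A - new_min_A) + new_min_A, and z-score normalization maps v to v' = (v - mean_A) / std_A. A short Python sketch that applies both formulas to the values given in the question (the function names are illustrative):

```python
def min_max(v, old_min, old_max, new_min=0.0, new_max=1.0):
    """Min-max normalization of v from [old_min, old_max] to [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

def z_score(v, mean, std):
    """z-score normalization of v given the attribute mean and standard deviation."""
    return (v - mean) / std

print(min_max(76000, 40000, 85000))        # 0.8
print(round(z_score(5500, 4400, 890), 3))  # 1.236
```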
7. Suppose that the data for analysis include the attribute marks. The marks values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.).
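The answers to (a) and (b) can be checked with Python's standard statistics module (multimode needs Python 3.8 or later):

```python
from statistics import mean, median, multimode

marks = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
         35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

print(round(mean(marks), 2))  # 29.96
print(median(marks))          # 25 (the 14th of 27 sorted values)
print(multimode(marks))       # [25, 35]: two values tie for most frequent, so the data are bimodal
```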
8. For the data set of Q.7 above:
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
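The midrange is the average of the smallest and largest values. For the quartiles, this sketch uses statistics.quantiles with its default 'exclusive' method; other quartile conventions (for example, method='inclusive') can give slightly different values, which is why the question asks for rough answers:

```python
from statistics import quantiles

marks = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
         35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

midrange = (min(marks) + max(marks)) / 2
q1, q2, q3 = quantiles(marks, n=4)  # three cut points: Q1, median, Q3

print(midrange)  # 41.5
print(q1, q3)    # 20.0 35.0
```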
9. For the data set of Q.7 above:
(e) Give the five-number summary of the data.
(f) Show a boxplot of the data.
(g) How is a quantile-quantile plot different from a quantile plot?
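The five-number summary is (Minimum, Q1, Median, Q3, Maximum), and a boxplot draws exactly these values. A minimal sketch for the marks data listed above, assuming the default 'exclusive' quartile method of statistics.quantiles and the common 1.5 x IQR rule for whiskers and outliers:

```python
from statistics import median, quantiles

marks = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
         35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

q1, _, q3 = quantiles(marks, n=4)
five_number = (min(marks), q1, median(marks), q3, max(marks))
print(five_number)  # (13, 20.0, 25, 35.0, 70)

# Whiskers extend to the most extreme values within 1.5 * IQR of the box;
# values beyond those fences are plotted individually as outliers.
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [m for m in marks if m < low_fence or m > high_fence]
whiskers = (min(m for m in marks if m >= low_fence),
            max(m for m in marks if m <= high_fence))
print(whiskers, outliers)  # (13, 52) [70]
```

With these numbers the box spans 20 to 35 with the median line at 25, the whiskers reach 13 and 52, and 70 is marked as an outlier.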
10. Define the following DM functionalities: characterization, discrimination, and association and correlation analysis.
11. Explain the five data mining primitives with an appropriate example.
12. Describe the differences between the following approaches for integrating a data mining system with a database or DW system: no coupling, loose coupling, semitight coupling, and tight coupling.
13. What is KDD? Explain the stages of KDD with an appropriate example.
14. Use the two methods below to normalize the following group of data: 2000, 3000, 4000, 6000, 10000
(a) min-max normalization, setting min = 0 and max = 1
(b) z-score normalization
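Both methods can be applied to the whole group at once. This sketch assumes the population standard deviation (statistics.pstdev) for the z-scores; using the sample standard deviation (statistics.stdev) instead would give somewhat smaller z-values:

```python
from statistics import mean, pstdev

data = [2000, 3000, 4000, 6000, 10000]

# (a) Min-max normalization to [0, 1].
lo, hi = min(data), max(data)
scaled = [(v - lo) / (hi - lo) for v in data]
print(scaled)  # [0.0, 0.125, 0.25, 0.5, 1.0]

# (b) z-score normalization with the mean and population standard deviation.
mu, sigma = mean(data), pstdev(data)
z = [round((v - mu) / sigma, 4) for v in data]
print(mu, round(sigma, 2))  # 5000 2828.43
print(z)                    # [-1.0607, -0.7071, -0.3536, 0.3536, 1.7678]
```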
15. Define (i) association rule mining, (ii) support, (iii) confidence.
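For a rule A => B, support is the fraction of transactions containing both A and B, and confidence is support(A and B) / support(A). A minimal sketch over a small set of hypothetical market-basket transactions (the items and function names are illustrative):

```python
# Hypothetical market-basket transactions, each a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent => consequent: support(A u B) / support(A)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

print(support({"bread", "milk"}, transactions))                 # 0.5
print(round(confidence({"bread"}, {"milk"}, transactions), 3))  # 0.667
```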
16. List applications of data mining and of association rule mining.
Faculty In-charge: Hashmi S A