MGM’s College of Engineering, Nanded.
Department of IT
Semester I (2018-19)
Class: BE(IT) Subject: DMDW Assignment II
1. List and describe the types of attributes used for data mining.
2. Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70. Calculate the mean, median, mode, and midrange of the data.
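As a way to check hand calculations for Question 2, the four measures can be sketched with Python's standard `statistics` module. This is only an illustrative sketch: the variable names are assumptions, and `multimode` is used because the data may have more than one mode.

```python
from statistics import mean, median, multimode

# Age values copied from Question 2 (already in increasing order).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

age_mean = mean(ages)                       # sum of the values / number of values
age_median = median(ages)                   # middle value of the sorted list
age_modes = multimode(ages)                 # every value tied for the highest frequency
age_midrange = (min(ages) + max(ages)) / 2  # average of the smallest and largest value

print(f"mean={age_mean:.2f} median={age_median} "
      f"modes={age_modes} midrange={age_midrange}")
# prints: mean=29.96 median=25 modes=[25, 35] midrange=41.5
```

Two values share the highest frequency, so the data are bimodal.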
3. Suppose we have the following values for salary (in thousands of dollars), shown in increasing order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. Calculate the variance and standard deviation.
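The question does not say whether the salaries are to be treated as a whole population (divide by N) or as a sample (divide by N - 1). The sketch below, using Python's `statistics` module, computes both so that either convention can be checked. The variable names are illustrative assumptions.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Salary values (in thousands of dollars) copied from Question 3.
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

salary_mean = mean(salaries)

# Population convention: sum of squared deviations divided by N.
pop_variance = pvariance(salaries)
pop_std = pstdev(salaries)

# Sample convention: sum of squared deviations divided by N - 1.
sample_variance = variance(salaries)
sample_std = stdev(salaries)

print(f"mean={salary_mean}")
print(f"population: variance={pop_variance:.2f} std={pop_std:.2f}")
print(f"sample:     variance={sample_variance:.2f} std={sample_std:.2f}")
```

The sum of squared deviations from the mean of 58 is 4550, so the population variance is 4550/12 and the sample variance is 4550/11.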
4. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
(a) Compute the Euclidean distance between the two objects.
(b) Compute the Manhattan distance between the two objects.
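Both distances in Question 4 follow directly from the attribute-wise differences of the two tuples. A minimal sketch, with illustrative function names that are not part of the assignment:

```python
import math

def euclidean_distance(x, y):
    """Square root of the sum of squared attribute differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan_distance(x, y):
    """Sum of absolute attribute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

# The two objects from Question 4.
obj1 = (22, 1, 42, 10)
obj2 = (20, 0, 36, 8)

d_euclid = euclidean_distance(obj1, obj2)
d_manhattan = manhattan_distance(obj1, obj2)

print(f"Euclidean={d_euclid:.3f} Manhattan={d_manhattan}")
# prints: Euclidean=6.708 Manhattan=11
```

The attribute differences are 2, 1, 6, and 2, so the Euclidean distance is the square root of 45 and the Manhattan distance is 11.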
5. Compare:
(i) Discrete and continuous attributes.
(ii) Interval-scaled and ratio-scaled attributes.
6. Suppose that a hospital tested the age and body fat data for 18 randomly selected adults with the following results:
Calculate the mean, median, and standard deviation of age and %fat.
7. Draw the boxplots for age and %fat using the data from Question 6 above.
8. Describe the major steps involved in data preprocessing.
9. Imagine that you need to analyze AllElectronics sales and customer data. You note that many tuples have no recorded value for several attributes such as customer income. Which methods will be employed to fill the missing values?
10. Define noise in data. For the following data: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70. Use smoothing by bin means to smooth these data, using a bin depth of 3.
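Smoothing by bin means with a bin depth of 3 can be sketched as follows: sort the data, partition it into equal-frequency bins of three values each, and replace every value with the mean of its bin. The function name `smooth_by_bin_means` is an illustrative assumption.

```python
def smooth_by_bin_means(values, depth):
    """Equal-frequency binning: replace each value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]     # one bin of `depth` values
        bin_mean = sum(bin_values) / len(bin_values)  # mean of that bin
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed

# Data copied from Question 10.
data = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

smoothed = smooth_by_bin_means(data, depth=3)

# Show each original bin next to its smoothed value.
for start in range(0, len(data), 3):
    print(data[start:start + 3], "->", round(smoothed[start], 2))
```

With 27 values and a depth of 3 there are 9 bins. For example, the bin (20, 21, 22) is replaced by 21 and the last bin (46, 52, 70) by 56.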
11. Compare min-max and z-score normalization.
12. Use these methods to normalize the following group of data: 200, 300, 400, 600, 1000
(a) min-max normalization by setting min = 0 and max = 1
(b) z-score normalization
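The two normalizations in Question 12 can be sketched as below. The z-score version here assumes the population standard deviation (dividing by N). If the sample standard deviation is intended, replace `pstdev` with `stdev`. The function names are illustrative.

```python
from statistics import mean, pstdev

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score_normalize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]

# Data copied from Question 12.
data = [200, 300, 400, 600, 1000]

min_max = min_max_normalize(data)
z_scores = z_score_normalize(data)

print("min-max:", min_max)
print("z-score:", [round(z, 3) for z in z_scores])
```

Min-max normalization always lands inside the chosen range, while z-score values are unbounded and express how many standard deviations each value lies from the mean of 500.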
13. Define: (i) Support (ii) Confidence (iii) Closed frequent itemset (iv) Maximal frequent itemset.
14. Discuss the steps of the Apriori algorithm.
15. For the following transaction database, calculate the frequent itemsets using the Apriori algorithm, where the minimum support count is 2.
16. List the drawbacks of the Apriori algorithm.
Faculty Incharge: Hashmi S A