Wednesday, October 3, 2018

Class: BE(IT) Subject: DMDW Assignment II Semester I (2018-19)


MGM’s College of Engineering, Nanded.
Department of IT
Semester I (2018-19)

Class: BE(IT)       Subject: DMDW         Assignment II

1. List and describe the types of attributes used for data mining.
2. Suppose that the data for analysis includes the attribute age. The age values for the data
    tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
    33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70. Calculate the mean, median, mode, and
    midrange of the data.
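The hand calculation for Q.No.2 can be checked with the short Python sketch below. It uses only the standard `statistics` module (`multimode` needs Python 3.8 or later). The helper name `central_tendency` is hypothetical.

```python
from statistics import mean, median, multimode

# Age values from Q.No.2, already sorted in increasing order
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

def central_tendency(values):
    """Return the mean, median, all modes, and midrange of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        # multimode returns every value tied for the highest frequency
        "mode": multimode(values),
        # midrange is the average of the smallest and largest values
        "midrange": (min(values) + max(values)) / 2,
    }

stats = central_tendency(ages)
print(f"mean = {stats['mean']:.2f}")      # 29.96
print(f"median = {stats['median']}")      # 25
print(f"mode(s) = {stats['mode']}")       # [25, 35]
print(f"midrange = {stats['midrange']}")  # 41.5
```

Both 25 and 35 occur four times, so the data are bimodal; `multimode` reports both values rather than picking one.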
3. Suppose we have the following values for salary (in thousands of dollars), shown in
    increasing order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. Calculate the variance
    and standard deviation.
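A minimal Python sketch that follows the definition step by step. It assumes the population formula (divide by N), which is the convention commonly used in data-mining textbooks; the sample formula (divide by N - 1) is printed alongside in case that is the one expected.

```python
import math

# Salary values (in thousands of dollars) from Q.No.3
salaries = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

n = len(salaries)
mu = sum(salaries) / n                          # mean
sq_dev = sum((x - mu) ** 2 for x in salaries)   # sum of squared deviations from the mean

# Population variance: divide by N
pop_var = sq_dev / n
pop_sd = math.sqrt(pop_var)

# Sample variance: divide by N - 1 (shown for comparison only)
samp_var = sq_dev / (n - 1)
samp_sd = math.sqrt(samp_var)

print(f"mean = {mu:.2f}")                                                 # 58.00
print(f"population: variance = {pop_var:.2f}, sd = {pop_sd:.2f}")       # 379.17, 19.47
print(f"sample:     variance = {samp_var:.2f}, sd = {samp_sd:.2f}")     # 413.64, 20.34
```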
4. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
     (a) Compute the Euclidean distance between the two objects.
     (b) Compute the Manhattan distance between the two objects.
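Both distances can be checked with a few lines of Python. The function names below are illustrative. The Euclidean distance is the square root of the sum of squared coordinate differences; the Manhattan distance is the sum of absolute differences.

```python
import math

x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

def euclidean(a, b):
    # square root of the sum of squared coordinate differences
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def manhattan(a, b):
    # sum of absolute coordinate differences ("city block" distance)
    return sum(abs(p - q) for p, q in zip(a, b))

print(round(euclidean(x, y), 2))  # 6.71  (sqrt of 4 + 1 + 36 + 4 = 45)
print(manhattan(x, y))            # 11    (2 + 1 + 6 + 2)
```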
5. Compare:
     (i) Discrete and continuous attributes.
     (ii) Interval-scaled and ratio-scaled attributes.
6. Suppose that a hospital tested the age and body fat data for 18 randomly selected adults
    with the following results:


   Calculate the mean, median, and standard deviation of age and %fat.

7. Draw the boxplots for age and %fat for the data of Q.No.6 above.
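The table for Q.No.6 is not reproduced in this sheet, so the sketch below takes one column of values as input and returns the numbers a boxplot is drawn from: the five-number summary, the whisker ends, and any outliers beyond 1.5 × IQR from the quartiles. `boxplot_stats` is a hypothetical helper name, and the quartiles use Python's `statistics.quantiles(..., method="inclusive")` (Python 3.8+); a hand calculation may use a slightly different quartile convention.

```python
from statistics import quantiles

def boxplot_stats(values):
    """Numbers needed to draw one boxplot (a sketch, not a plotting routine)."""
    data = sorted(values)
    # Quartiles via the "inclusive" method; textbooks differ slightly here
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "min": data[0], "Q1": q1, "median": med, "Q3": q3, "max": data[-1],
        # whiskers stop at the most extreme values inside the 1.5 * IQR fences
        "whiskers": (inside[0], inside[-1]),
        # values outside the fences are plotted individually as outliers
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

Call it once for the age column and once for the %fat column. As a quick check on a list from this sheet, the Q.No.2 age values give Q1 = 20.5, median = 25, Q3 = 35, whiskers at 13 and 52, and 70 as an outlier.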
8. Describe the major steps involved in data preprocessing.
9. Imagine that you need to analyze AllElectronics sales and customer data. You note that
    many tuples have no recorded value for several attributes such as customer income. Which
    methods can be used to fill in the missing values?
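Three commonly used methods (ignoring the tuple, filling in the attribute mean, and filling in the mean of the tuple's class) can be sketched in Python as follows. Missing values are assumed to be stored as `None`; the function names and the `income`/`risk` values are hypothetical, not AllElectronics data.

```python
from statistics import mean

def drop_incomplete(rows):
    """Ignore (delete) every tuple that has a missing value."""
    return [r for r in rows if None not in r]

def fill_with_mean(values):
    """Replace each None with the mean of the recorded values of the attribute."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

def fill_with_class_mean(values, labels):
    """Replace each None with the mean of recorded values in the same class."""
    class_means = {}
    for c in set(labels):
        known = [v for v, lab in zip(values, labels) if lab == c and v is not None]
        # a class with no recorded values gets None (left missing)
        class_means[c] = mean(known) if known else None
    return [class_means[lab] if v is None else v for v, lab in zip(values, labels)]

# Hypothetical example: customer income (in thousands) with two missing entries
income = [40, None, 60, 80, None]
risk = ["low", "low", "high", "high", "high"]   # hypothetical class labels
print(fill_with_mean(income))              # [40, 60, 60, 80, 60]
print(fill_with_class_mean(income, risk))  # [40, 40, 60, 80, 70]
```

The class-wise version usually gives a better estimate than the global mean when the attribute differs strongly between classes, as `income` does between the two hypothetical `risk` groups here.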
10. Define noise in data. Given the following data: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25,
    25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70, use smoothing by bin means
    to smooth these data, using a bin depth of 3.
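Smoothing by bin means first partitions the sorted data into equal-frequency (equal-depth) bins and then replaces every value in a bin by that bin's mean. A minimal Python sketch; the function name is hypothetical, and if the number of values is not a multiple of the depth, the last bin here is simply smaller.

```python
def smooth_by_bin_means(values, depth):
    """Equal-depth binning followed by smoothing by bin means."""
    data = sorted(values)
    bins = [data[i:i + depth] for i in range(0, len(data), depth)]
    # every value in a bin is replaced by the mean of that bin
    return [[sum(b) / len(b)] * len(b) for b in bins]

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

for i, b in enumerate(smooth_by_bin_means(ages, 3), start=1):
    print(f"Bin {i}: " + ", ".join(f"{v:.2f}" for v in b))
```

With a depth of 3 the 27 values form 9 bins whose means are 14.67, 18.33, 21.00, 24.00, 26.67, 33.67, 35.00, 40.33, and 56.00.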
11. Compare min-max and z-score normalization.
12. Use these methods to normalize the following group of data:
     200, 300, 400, 600, 1000
     (a) min-max normalization by setting min = 0 and max = 1
     (b) z-score normalization
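Both normalizations are one-line formulas: min-max maps v to (v - min) / (max - min) × (new_max - new_min) + new_min, and z-score maps v to (v - mean) / σ. The sketch below assumes σ is the population standard deviation (`statistics.pstdev`); with the sample standard deviation the z-scores would be slightly smaller in magnitude. Function names are illustrative.

```python
from statistics import mean, pstdev

values = [200, 300, 400, 600, 1000]

def min_max(vals, new_min=0.0, new_max=1.0):
    """Min-max normalization: map [min, max] linearly onto [new_min, new_max]."""
    lo, hi = min(vals), max(vals)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in vals]

def z_score(vals):
    """Z-score normalization using the population standard deviation."""
    mu, sigma = mean(vals), pstdev(vals)
    return [(v - mu) / sigma for v in vals]

print([round(v, 3) for v in min_max(values)])  # [0.0, 0.125, 0.25, 0.5, 1.0]
print([round(v, 3) for v in z_score(values)])  # [-1.061, -0.707, -0.354, 0.354, 1.768]
```

This also illustrates the comparison in Q.No.11: min-max depends on the observed minimum and maximum, so a later value outside [200, 1000] would fall outside [0, 1], while z-score needs only the mean (500 here) and standard deviation (about 282.84).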
13. Define: (i) Support (ii) Confidence (iii) Closed frequent itemset (iv) Maximal frequent
      itemset.
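To make the four definitions concrete, the Python sketch below computes support and confidence directly from their definitions and then classifies frequent itemsets as closed or maximal. The five transactions and all function names are hypothetical, and the brute-force enumeration in `frequent_itemsets` is only meant for tiny examples.

```python
from itertools import combinations

# Hypothetical transactions, for illustration only
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread"},
]

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def support(itemset, db):
    """Relative support: fraction of transactions containing `itemset`."""
    return support_count(itemset, db) / len(db)

def confidence(antecedent, consequent, db):
    """conf(A => B) = support_count(A U B) / support_count(A)."""
    return support_count(antecedent | consequent, db) / support_count(antecedent, db)

def frequent_itemsets(db, min_count):
    """Brute force: count every possible itemset (fine for tiny examples only)."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            c = support_count(s, db)
            if c >= min_count:
                result[s] = c
    return result

def closed_and_maximal(frequent):
    """Split frequent itemsets (dict itemset -> count) into closed and maximal ones."""
    closed, maximal = [], []
    for x, cnt in frequent.items():
        supersets = [y for y in frequent if x < y]   # proper frequent supersets
        # Closed: no proper superset has the same support count. A superset
        # with an equal count would itself be frequent, so this check suffices.
        if all(frequent[y] != cnt for y in supersets):
            closed.append(x)
        # Maximal: no proper superset is frequent at all.
        if not supersets:
            maximal.append(x)
    return closed, maximal

bread, milk, butter = frozenset({"bread"}), frozenset({"milk"}), frozenset({"butter"})
print(support(bread | milk, transactions))       # 0.4
print(confidence(bread, milk, transactions))     # 0.5
print(confidence(butter, bread, transactions))   # 1.0
closed, maximal = closed_and_maximal(frequent_itemsets(transactions, 2))
```

With a minimum support count of 2, `{butter}` is frequent but not closed, because its superset `{bread, butter}` has the same count (2). The maximal frequent itemsets are `{bread, milk}` and `{bread, butter}`; every maximal itemset is also closed, but not the other way round (`{bread}` is closed but not maximal).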
14. Discuss the steps of the Apriori algorithm.
15. For the following transaction database, calculate the frequent itemsets using the Apriori algorithm, with a minimum support count of 2.
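The level-wise steps of Apriori (find frequent 1-itemsets, then repeatedly join, prune, and count) can be checked with a small Python sketch. The transaction table for Q.No.15 is not reproduced in this sheet, so `apriori` (a hypothetical helper name) takes any list of transactions. Candidate generation here uses a simplified join (any two frequent (k-1)-itemsets whose union has exactly k items) instead of the prefix-based join usually described; after the prune step both yield the same candidate set.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    db = [frozenset(t) for t in transactions]

    # Step 1: scan the database once to find the frequent 1-itemsets (L1)
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support_count}
    frequent = dict(current)

    k = 2
    while current:
        # Step 2 (join): merge frequent (k-1)-itemsets whose union has k items
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)
        # Step 3 (prune): drop candidates that have an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Step 4 (count): scan the database to count the surviving candidates
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent
```

For Q.No.15, call `apriori(rows, 2)` with each row of the transaction table written as a set of item IDs; the result maps every frequent itemset to its support count, and the loop stops as soon as a level produces no frequent itemsets.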

16. List the drawbacks of the Apriori algorithm.



Faculty Incharge: Hashmi S A