Wednesday, October 30, 2019



MGM’s College of Engineering, Nanded.
Department of IT
Semester I (2019-20)

Class: BE(IT)       Subject: DMDW         Assignment II

1. Define Data Mining. What types of attributes are used for data mining?
2. Suppose that the data for analysis includes the attribute Marks. The Marks values for the data
    tuples are (in increasing order) 23, 25, 26, 26, 29, 30, 30, 31, 32, 32, 35, 35, 35, 35, 40,
    43, 43, 45, 45, 45, 45, 46, 50, 55, 56, 62, 80. Calculate Mean, Median, Mode and Midrange of     
    the data.
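The hand computation can be checked with the Python standard library; note that 35 and 45 each occur four times, so the data is bimodal (a quick sketch, stdlib only):

```python
from statistics import mean, median, multimode

marks = [23, 25, 26, 26, 29, 30, 30, 31, 32, 32, 35, 35, 35, 35,
         40, 43, 43, 45, 45, 45, 45, 46, 50, 55, 56, 62, 80]

print(mean(marks))                    # 1079 / 27 ≈ 39.96
print(median(marks))                  # 35 (middle of the 27 sorted values)
print(multimode(marks))               # [35, 45] -- the data is bimodal
print((min(marks) + max(marks)) / 2)  # midrange: (23 + 80) / 2 = 51.5
```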
3. Suppose we have the following values for sales (in thousands of rupees), shown in   
    increasing order: 40, 46, 57, 60, 62, 62, 66, 70, 73, 80, 80, 120. Calculate Variance and
    Standard Deviation.
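The usual textbook formula treats the twelve values as the full population (divide by n, not n−1), so the population functions apply (a sketch for checking the hand computation):

```python
from statistics import mean, pvariance, pstdev

sales = [40, 46, 57, 60, 62, 62, 66, 70, 73, 80, 80, 120]

print(mean(sales))       # 816 / 12 = 68
print(pvariance(sales))  # population variance ≈ 379.17
print(pstdev(sales))     # population standard deviation ≈ 19.47
```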
4. Given two objects represented by the tuples (12, 5) and (25, 20):
     (a) Compute the Euclidean distance between the two objects.
     (b) Compute the Manhattan distance between the two objects.
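Both distances follow directly from the coordinate differences (13, 15); a quick check in Python:

```python
import math

p, q = (12, 5), (25, 20)

euclidean = math.dist(p, q)                        # sqrt(13**2 + 15**2) = sqrt(394) ≈ 19.85
manhattan = sum(abs(a - b) for a, b in zip(p, q))  # |13| + |15| = 28
print(euclidean, manhattan)
```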
5. Compare:
     (i) Discrete and continuous attributes.
     (ii) Interval-scaled and ratio-scaled attributes.
6. Describe the major steps involved in data preprocessing.
7. Define Noise in data. For the following data: 23, 25, 26, 26, 29, 30, 30, 31, 32, 32, 35, 35,
    35, 35, 40, 43, 43, 45, 45, 45, 45, 46, 50, 55, 56, 62, 80. Use smoothing by bin means to
    smooth these data, using a bin depth of 3.
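With 27 sorted values and a bin depth of 3, equal-depth partitioning gives exactly nine bins; smoothing by bin means replaces each value by its bin's mean (a minimal sketch):

```python
from statistics import mean

data = [23, 25, 26, 26, 29, 30, 30, 31, 32, 32, 35, 35, 35, 35,
        40, 43, 43, 45, 45, 45, 45, 46, 50, 55, 56, 62, 80]
depth = 3

# partition the sorted data into consecutive bins of the given depth
bins = [data[i:i + depth] for i in range(0, len(data), depth)]
# replace every value in a bin by that bin's mean
smoothed = [[round(mean(b), 2)] * len(b) for b in bins]
print(smoothed[0])  # first bin [23, 25, 26] -> [24.67, 24.67, 24.67]
```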
8. Compare min-max and z-score normalization.
9. Use these methods to normalize the following group of data:
                        100, 200, 300, 500, 900
     (a) min-max normalization by setting min = 0 and max = 1
     (b) z-score normalization
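Both normalizations are one-line transformations; a sketch for checking the results (the z-score here uses the population standard deviation, as in the usual textbook formula — with the sample formula the values differ slightly):

```python
from statistics import mean, pstdev

vals = [100, 200, 300, 500, 900]

# (a) min-max normalization to the range [0, 1]
mn, mx = min(vals), max(vals)
minmax = [(v - mn) / (mx - mn) for v in vals]
print(minmax)   # [0.0, 0.125, 0.25, 0.5, 1.0]

# (b) z-score normalization: (v - mean) / std
mu, sigma = mean(vals), pstdev(vals)
zscores = [round((v - mu) / sigma, 3) for v in vals]
print(zscores)
```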
10. Define: (i) Support (ii) Confidence (iii) Closed frequent itemset (iv) Maximal frequent itemset.
11. Discuss the steps of the Apriori algorithm.
12. Apply the Apriori algorithm to the following transaction database and find the frequent itemsets using a minimum support count of 2.
                                     
                      
                      TID     List of Items
                      101     Lux, Tide, Colgate
                      102     Tide, Ponds
                      103     Tide, Vim
                      104     Lux, Tide, Ponds
                      105     Lux, Vim
                      106     Tide, Vim
                      107     Lux, Vim
                      108     Lux, Tide, Vim, Ponds
                      109     Lux, Tide, Vim
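The level-wise candidate generation, Apriori pruning, and support counting asked for in Q11–Q12 can be sketched as follows (a minimal brute-force implementation for this small database, not an optimised one):

```python
from itertools import combinations

transactions = [
    {"Lux", "Tide", "Colgate"}, {"Tide", "Ponds"}, {"Tide", "Vim"},
    {"Lux", "Tide", "Ponds"}, {"Lux", "Vim"}, {"Tide", "Vim"},
    {"Lux", "Vim"}, {"Lux", "Tide", "Vim", "Ponds"}, {"Lux", "Tide", "Vim"},
]
min_sup = 2

def support(itemset):
    # number of transactions containing the whole itemset
    return sum(itemset <= t for t in transactions)

# L1: frequent 1-itemsets
items = sorted({i for t in transactions for i in t})
frequent = {1: [frozenset({i}) for i in items if support({i}) >= min_sup]}

k = 2
while frequent[k - 1]:
    # candidate generation: join frequent (k-1)-itemsets
    cands = {a | b for a in frequent[k - 1] for b in frequent[k - 1] if len(a | b) == k}
    # Apriori pruning: every (k-1)-subset must itself be frequent
    cands = {c for c in cands
             if all(frozenset(s) in frequent[k - 1] for s in combinations(c, k - 1))}
    # support counting against the database
    frequent[k] = [c for c in cands if support(c) >= min_sup]
    k += 1
```

With a minimum support count of 2, the loop stops once no candidate of the next size survives; `frequent` then holds every frequent itemset by size.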

13. Identify the drawbacks of Apriori algorithm.
14. Evaluate the Bayesian classifier with an appropriate example.
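For Q14, the prior-times-likelihood computation at the heart of a naïve Bayes classifier can be sketched on a toy single-attribute dataset (the weather tuples below are hypothetical, chosen only to make the arithmetic visible):

```python
from collections import Counter, defaultdict

# Hypothetical training tuples: (outlook, play?) -- for illustration only.
train = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rain", "yes"),
         ("rain", "yes"), ("rain", "no"), ("overcast", "yes"), ("sunny", "yes")]

classes = Counter(label for _, label in train)   # class frequencies
cond = defaultdict(Counter)                      # cond[label][value] counts
for value, label in train:
    cond[label][value] += 1

def posterior(value, label):
    # unnormalised posterior: P(label) * P(value | label)
    prior = classes[label] / len(train)
    likelihood = cond[label][value] / classes[label]
    return prior * likelihood

# classify a new tuple with outlook = "sunny": pick the class maximising the posterior
prediction = max(classes, key=lambda c: posterior("sunny", c))
print(prediction)  # "no": 3/8 * 2/3 = 0.25 beats 5/8 * 1/5 = 0.125
```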
15. Evaluate k-means clustering algorithm with an appropriate example.
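The assign-then-update loop of k-means (Q15) can be sketched on toy one-dimensional data; the points and the two initial centroids below are assumed purely for illustration:

```python
from statistics import mean

# Toy 1-D data and initial centroids, assumed for illustration (k = 2)
points = [2, 4, 10, 12, 3, 20, 30, 11, 25]
centroids = [2.0, 4.0]

for _ in range(100):
    # assignment step: each point joins the cluster of its nearest centroid
    clusters = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # update step: recompute each centroid as its cluster's mean
    new = [mean(c) for c in clusters]
    if new == centroids:  # converged: assignments no longer change
        break
    centroids = new

print(centroids)  # final cluster means: [7, 25]
```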
16. Classify web mining techniques.



Faculty Incharge: Hashmi S A