Friday, October 6, 2017

Class: BE(IT)-CGPA Subject: DMDW Assignment II

MGM’s College of Engineering, Nanded.
Department of IT
Semester I (2017-18)

1. Suppose we have the following values for salary (in thousands of dollars), shown in increasing order: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110. (a) Calculate the mean, median, mode, midrange, and standard deviation of the salary data. (b) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
2. For the salary values in Question 1: (a) Give the five-number summary of the data. (b) Show a boxplot of the data.
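As a way to check hand calculations for Questions 1 and 2, the following minimal Python sketch computes the summary statistics with the standard `statistics` module. It assumes the population form of the standard deviation and takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data; textbooks use several quartile conventions, so a rough answer may differ slightly.

```python
from statistics import mean, median, multimode, pstdev

# Salary values (in thousands of dollars) from Question 1, already sorted.
salary = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# Midrange is the average of the smallest and largest values.
midrange = (min(salary) + max(salary)) / 2

# Assumption: quartiles are the medians of the lower and upper halves.
half = len(salary) // 2
q1 = median(salary[:half])
q3 = median(salary[-half:])

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number_summary = (min(salary), q1, median(salary), q3, max(salary))

print("mean:", mean(salary))
print("median:", median(salary))
print("modes:", multimode(salary))  # the data may have more than one mode
print("midrange:", midrange)
print("population std dev:", round(pstdev(salary), 2))
print("five-number summary:", five_number_summary)
```

The five-number summary is exactly what a boxplot draws: the box spans Q1 to Q3, the line inside it marks the median, and the whiskers reach toward the minimum and maximum.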
3. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
(a) Compute the Euclidean distance between the two objects.
(b) Compute the Manhattan distance between the two objects.
(c) Compute the Minkowski distance between the two objects, using q = 3.
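The three distances in Question 3 are all special cases of the Minkowski distance, so one small function is enough to check them. This is a sketch only; the function name `minkowski_distance` is chosen here for illustration.

```python
def minkowski_distance(x, y, q):
    """Minkowski distance of order q between two equal-length tuples."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

# The two objects from Question 3.
x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

euclidean = minkowski_distance(x, y, 2)    # q = 2 gives the Euclidean distance
manhattan = minkowski_distance(x, y, 1)    # q = 1 gives the Manhattan distance
minkowski_3 = minkowski_distance(x, y, 3)  # q = 3 as asked in part (c)

print("Euclidean:", round(euclidean, 3))
print("Manhattan:", manhattan)
print("Minkowski (q = 3):", round(minkowski_3, 3))
```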
4. Explain min-max and z-score normalization with an example.
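For Question 4, one possible illustration is sketched below. It reuses the salary values from Question 1 purely as example data (an assumption, since the question does not name a data set), maps them to the range [0, 1] with min-max normalization, and computes z-scores with the population standard deviation.

```python
from statistics import mean, pstdev

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score_normalize(values):
    """Express each value as its number of standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Example data only: the salary values from Question 1.
salary = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

scaled = min_max_normalize(salary)
z_scores = z_score_normalize(salary)

print([round(v, 3) for v in scaled])
print([round(z, 3) for z in z_scores])
```

Min-max normalization depends on the observed minimum and maximum, so a single extreme value such as 110 compresses the rest of the range; z-score normalization is often preferred when the true bounds are unknown or outliers dominate.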
5. Imagine that you need to analyze AllElectronics sales and customer data. You note that many tuples have no recorded value for several attributes such as customer income. How can you go about filling in the missing values for this attribute?
6. The prices (in dollars) for a product are: 8, 4, 21, 15, 21, 24, 34, 25, 28. Apply binning methods for smoothing the price data.
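A short sketch of smoothing by binning for Question 6 follows. It assumes equal-frequency bins of size 3, which the question does not specify, and shows smoothing by bin means and by bin boundaries; the helper names are illustrative.

```python
from statistics import mean

def equal_frequency_bins(values, bin_size):
    """Sort the values and split them into consecutive bins of bin_size."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]

def smooth_by_means(bins):
    """Replace every value in a bin with the mean of that bin."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's min or max."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are the boundaries
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

prices = [8, 4, 21, 15, 21, 24, 34, 25, 28]

bins = equal_frequency_bins(prices, 3)  # assumed bin size of 3
print("bins:", bins)
print("by means:", smooth_by_means(bins))
print("by boundaries:", smooth_by_boundaries(bins))
```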
7. Write an overview of data transformation strategies.
8. Discuss issues to consider during data integration.
9. A database has five transactions. Let min_sup = 60% and min_conf = 80%.
TID                 Items Bought
T100                M, O, N, K, E, Y
T200                D, O, N, K, E, Y
T300                M, A, K, E
T400                M, U, C, K, Y
T500                C, O, O, K, I, E
Find all frequent itemsets using Apriori.
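The level-wise search asked for in Question 9 can be sketched as below. This is a minimal, unoptimized illustration of the Apriori join, prune, and count steps, not a reference implementation; min_sup = 60% of five transactions is taken as a minimum support count of 3.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frequent itemset: support count} using level-wise search."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items
               if support(frozenset([i])) >= min_count]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        prev = set(current)
        # Join step: combine frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        # Count step: keep candidates that meet the minimum support count.
        current = [c for c in candidates if support(c) >= min_count]
        k += 1
    return frequent

# Transactions T100..T500; the repeated O in T500 collapses inside a set.
transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"),
                set("MUCKY"), set("COOKIE")]

frequent = apriori(transactions, min_count=3)  # 60% of 5 transactions
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```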
10. Define: (i) Support (ii) Confidence (iii) Frequent Itemset
11. State and explain the steps of the Apriori algorithm.
12. For the following transaction database, generate association rules from the frequent itemsets (take minimum support count = 2).
TID                 List of item IDs
T100                I1, I2, I5
T200                I2, I4
T300                I2, I3
T400                I1, I2, I4
T500                I1, I3
T600                I2, I3
T700                I1, I3
T800                I1, I2, I3, I5
T900                I1, I2, I3
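Rule generation for Question 12 can be sketched as follows: for a frequent itemset, every non-empty proper subset is tried as the antecedent, and the rule's confidence is the support count of the whole itemset divided by that of the antecedent. The question gives no confidence threshold, so the 70% used here is an assumption for illustration, as is the choice of {I1, I2, I5} as the example itemset.

```python
from itertools import combinations

# Transactions T100..T900 from the table above.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def rules_from(itemset, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            a = frozenset(antecedent)
            confidence = support_count(itemset) / support_count(a)
            if confidence >= min_conf:
                rules.append((a, itemset - a, confidence))
    return rules

# Assumed threshold: min_conf = 70% (not given in the question).
rules = rules_from({"I1", "I2", "I5"}, min_conf=0.7)
for a, c, conf in rules:
    print(sorted(a), "=>", sorted(c), f"confidence = {conf:.0%}")
```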
13. How can we improve the efficiency of Apriori-based mining?
14. Create an FP-tree for the above transaction database (Q.12).
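A minimal sketch of FP-tree construction for Question 14 is given below. It assumes the same minimum support count of 2 as Question 12, breaks ties in item frequency by item name, and omits the header table and node links that a full FP-growth implementation would keep; the class and function names are illustrative.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, its count, and its child nodes."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # First scan: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = FPNode()
    # Second scan: insert each transaction along a shared prefix path.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root

def show(node, depth=0):
    """Print the tree with indentation, one 'item:count' per line."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

# Transactions T100..T900 from Question 12.
transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]

root = build_fp_tree(transactions, min_count=2)
show(root)
```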
15. What is classification? List applications of classification.
16. Write a short note on pattern evaluation methods.


Faculty Incharge: Hashmi S A