Conceptually, the mysteryLinkage distance metric takes a vote for each data point and assigns the point to the cluster chosen by the majority vote.
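The voting idea can be sketched as follows. This is a minimal illustration, not the actual Problem Set 6 code: it assumes each of the three standard linkage metrics "votes" for the cluster it finds closest to the point, and the point joins the cluster with the most votes. The function names, the point-to-cluster distance definitions, and the tie-breaking are all assumptions.

```python
# Hypothetical sketch of mysteryLinkage's majority vote.
# Each base metric picks the nearest cluster; the point is
# assigned to the cluster with the most votes.
import math
from collections import Counter

def dist(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single(cluster, pt):
    return min(dist(p, pt) for p in cluster)

def maxl(cluster, pt):
    return max(dist(p, pt) for p in cluster)

def average(cluster, pt):
    return sum(dist(p, pt) for p in cluster) / len(cluster)

def mystery_assign(point, clusters):
    """Return the index of the cluster chosen by majority vote."""
    votes = [min(range(len(clusters)),
                 key=lambda i: metric(clusters[i], point))
             for metric in (single, maxl, average)]
    return Counter(votes).most_common(1)[0][0]
```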
Now you run the hierarchical clustering algorithm from Problem Set 6 with the mysteryLinkage distance metric and look at the results at a cutoff of 3 clusters. The final clusters will be:
A. C0:a ||| C1:b,c,d,e,g ||| C2:f
B. C0:a,f ||| C1:b,c,d ||| C2:e,g
C. C0:a,f,e,g ||| C1:b,c ||| C2:d,e
D. C0:a,e,g ||| C1:b,c,d ||| C2:f
Remember that in Problem Set 6, we used different linkage distance measures to calculate the distances between clusters and to decide which cluster a point should belong to. Consider this new method of finding linkage distances, which makes use of the linkage distance methods from the problem set:

You are given the following data points with the following feature values:

Answer the following 3 questions based on the above code.

You are asked to run the hierarchical clustering algorithm from Problem Set 6 with the singleLinkage, maxLinkage, averageLinkage, and mysteryLinkage distance metrics and to report the results at a cutoff of 4 clusters. The final clusters will be the same, no matter which linkage we use.
A. True
B. False
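For reference, the three standard linkage metrics can be sketched as below. This is an illustrative reconstruction, assuming clusters are lists of numeric feature tuples; the names match the metrics discussed, but the signatures are not necessarily those of the actual problem-set code.

```python
# Sketch of the standard linkage metrics between two clusters,
# where each cluster is a list of feature tuples.
import math

def dist(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def singleLinkage(c1, c2):
    """Smallest pairwise distance between the clusters."""
    return min(dist(p, q) for p in c1 for q in c2)

def maxLinkage(c1, c2):
    """Largest pairwise distance between the clusters."""
    return max(dist(p, q) for p in c1 for q in c2)

def averageLinkage(c1, c2):
    """Mean of all pairwise distances between the clusters."""
    ds = [dist(p, q) for p in c1 for q in c2]
    return sum(ds) / len(ds)
```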
Your boss comes back one last time with new information. He can now tell you the topic of each document. However, he found some more documents for which the topic is still unknown. Given this information, can we use a supervised learning algorithm to classify the new documents?
A. Yes
B. No
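With topic labels available for the existing documents, they become a labeled training set, which is exactly what supervised learning needs. A minimal sketch of one such approach is a nearest-centroid classifier over keyword-count vectors; the data, topic names, and feature encoding here are invented for illustration.

```python
# Hypothetical nearest-centroid topic classifier:
# train on labeled keyword-count vectors, then classify a
# new document by the closest topic centroid.
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n
                 for i in range(len(vectors[0])))

def train(labeled):
    """labeled: dict mapping topic -> list of feature vectors."""
    return {topic: centroid(vs) for topic, vs in labeled.items()}

def classify(model, vec):
    """Return the topic whose centroid is nearest to vec."""
    return min(model, key=lambda t: math.dist(model[t], vec))
```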
Your boss comes back with a list of 60 specific keywords as well as 5 specific topics that each keyword is best associated with. Which of the following is true, given this additional information?
A. We can switch to a supervised learning algorithm.
B. We can use the k-means clustering algorithm with k = 60.
C. We can use the k-means clustering algorithm with k = 5.
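The key point is that k sets the number of clusters the documents are split into, so it should match the number of topics (5), not the number of keywords (60). A minimal k-means sketch (Lloyd's algorithm) illustrates the role of k; the deterministic initialization from the first k points is a simplification for illustration, not a recommended strategy.

```python
# Minimal k-means sketch: k is the number of clusters produced,
# so for 5 topics you would call kmeans(points, k=5).
import math

def kmeans(points, k, iters=10):
    # Illustrative simplification: seed centers with the first k points.
    centers = list(points[:k])
    for _ in range(iters):
        # Assign each point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[i].append(p)
        # Recompute each center as the mean of its group
        # (keep the old center if a group is empty).
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g
                   else centers[i]
                   for i, g in enumerate(groups)]
    return groups
```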