A COMPARISON OF SIMILARITY METRICS FOR HIERARCHICAL CLUSTERING OF MEDICAL INTERVENTIONS
The primary aim of this study is to compare the performance of different similarity metrics for clustering medical interventions. A total of 458 MEDLINE articles were retrieved using the medical concepts extracted from 10 therapy questions. UMLS semantic types and natural language processing techniques were used together to extract medical interventions from five different fields of the articles. The interventions were weighted using four weighting schemes, and the similarity of interventions between two articles was calculated using 42 similarity/distance metrics. In terms of the mean difference between paired and unpaired similarities, the binary similarity metric Yule performed better than the other metrics. The “Titles + Full abstracts” and “Full abstracts” fields provided useful intervention (I) and comparison (C) elements for separating paired and unpaired interventions. Visual inspection of histograms and boxplots showed that the Yule and Yule2 metrics gave a better separation of paired and unpaired interventions. Lastly, a comparison of ROC curves showed that Yule and Yule2 performed slightly better than cosine and correlation, with higher sensitivity and specificity. The results of the present study confirm the main findings of the previous study.
Keywords: medical intervention, PICO elements, question answering, similarity metric.
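
As a rough illustration of the evaluation criterion described in the abstract, the sketch below computes a Yule-type similarity between two articles represented as binary intervention vectors, then the mean difference between paired and unpaired similarities. The abstract does not define the metrics, so several things here are assumptions: "Yule" is taken to be Yule's Q, "Yule2" is taken to be Yule's Y (coefficient of colligation), and all function names and vectors are hypothetical. Weighting schemes are omitted.

```python
from math import sqrt


def contingency(x, y):
    """Return the 2x2 counts (a, b, c, d) for two equal-length binary vectors.

    a: intervention present in both articles
    b: present in x only
    c: present in y only
    d: absent from both
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d


def yule_q(x, y):
    """Yule's Q: (ad - bc) / (ad + bc). Assumed to correspond to 'Yule'."""
    a, b, c, d = contingency(x, y)
    denom = a * d + b * c
    # Returning 0.0 for an undefined ratio is a convention chosen for this sketch.
    return (a * d - b * c) / denom if denom else 0.0


def yule_y(x, y):
    """Yule's Y: (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc)).

    Assumed to correspond to 'Yule2'; the abstract does not confirm this.
    """
    a, b, c, d = contingency(x, y)
    denom = sqrt(a * d) + sqrt(b * c)
    return (sqrt(a * d) - sqrt(b * c)) / denom if denom else 0.0


def mean_difference(paired, unpaired):
    """Mean paired similarity minus mean unpaired similarity."""
    return sum(paired) / len(paired) - sum(unpaired) / len(unpaired)


# Hypothetical binary vectors over an 8-term intervention vocabulary.
article_x = [1, 1, 1, 0, 0, 0, 0, 0]
article_y = [1, 1, 0, 1, 0, 0, 0, 0]
similarity_q = yule_q(article_x, article_y)
similarity_y = yule_y(article_x, article_y)
```

A larger mean difference indicates that a metric scores paired articles (however the study defines the pairing) more distinctly from unpaired ones, which is the sense in which the abstract ranks the metrics.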