JP Journal of Heat and Mass Transfer
Special Issue, Advances in ICT-Convergence, Pages 161 - 172
(September 2020) http://dx.doi.org/10.17654/HMSI20161
THE ENHANCED VERSION OF TF-IDF FEATURE VECTOR FOR MALWARE DETECTION
Myung-Jae Lim, So-Hee Jun, Won-Mo Gal and Young-Man Kwon
Abstract: In this paper, we propose two enhanced versions of the traditional TF-IDF method. The first method considers term frequency over the entire document collection and enlarges the influence of feature-vector components with large frequency. The second method considers relative term frequency within the same document: it uses a scaled TF that divides the logarithm of the term frequency by the maximum term frequency, together with a related IDF that uses a logarithm. We verified both methods with three machine learning algorithms: a multi-layer perceptron, a decision tree and a k-nearest neighbor classifier. The proposed methods performed better than the existing methods. In addition, by considering the accuracy score and running time, we recommend the best combination of proposed method and classifier.
Keywords and phrases: feature extraction, machine learning, malware detection, natural language processing, TF-IDF.
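
To make the second method in the abstract concrete, here is a minimal sketch of a scaled TF-IDF weighting. The abstract does not state the exact formulas, so every detail below is an assumption rather than the authors' definition: the scaled TF is taken as log(1 + tf) divided by the largest log(1 + tf) in the same document, the IDF is taken as log(N / df), and the function name `scaled_tfidf` and the opcode-like tokens are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def scaled_tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. The formulas are assumptions made for
    illustration; the abstract does not give the authors' exact definitions.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        if not counts:
            vectors.append({})
            continue
        # Largest log-scaled term frequency within this document (assumed scaling).
        max_log_tf = math.log(1 + max(counts.values()))
        vec = {}
        for term, tf in counts.items():
            scaled_tf = math.log(1 + tf) / max_log_tf  # relative TF, in (0, 1]
            idf = math.log(n_docs / df[term])          # logarithmic IDF
            vec[term] = scaled_tf * idf
        vectors.append(vec)
    return vectors


# Hypothetical token sequences standing in for features extracted from samples.
docs = [
    ["mov", "push", "mov", "call"],
    ["push", "ret"],
    ["mov", "mov", "jmp"],
]
vectors = scaled_tfidf(docs)
```

Under these assumptions, the most frequent term in each document gets a scaled TF of exactly 1, so document length does not inflate the weights, and a term that appears in every document gets a weight of 0 from the IDF factor. The resulting dictionaries could then be turned into fixed-length vectors and passed to any of the classifiers named in the abstract.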