Keywords and phrases: term weighting, multinomial Naive Bayes, support vector machine, text analysis, research thesis.
Received: April 7, 2022; Accepted: May 26, 2022; Published: June 27, 2022
How to cite this article: Afra Al Manei, Iman Al Hasani and Ronald Wesonga, Investigating term weighting schemes on the classification performance for the imbalanced text data, Advances and Applications in Statistics 78 (2022), 63-82. http://dx.doi.org/10.17654/0972361722050
This open access article is licensed under a Creative Commons Attribution 4.0 International License.