Minireviews
Copyright © The Author(s) 2021.
Artif Intell Gastrointest Endosc. Aug 28, 2021; 2(4): 127-135
Published online Aug 28, 2021. doi: 10.37126/aige.v2.i4.127
Table 3 Best machine learning algorithms for classification[36]
| Algorithm | Pros | Cons |
| --- | --- | --- |
| Naïve Bayes Classifier | Simple, easy and fast. Not sensitive to irrelevant features. Works well in practice. Needs less training data. Suitable for both multi-class and binary classification. Works with continuous and discrete data | Treats every feature as independent, which is not always true |
| Decision Trees | Easy to understand. Easy to generate rules. There are almost no hyperparameters to be tuned. Complex decision tree models can be significantly simplified by visualization | Might suffer from overfitting. Does not easily work with nonnumerical data. Low prediction accuracy in comparison with other algorithms. When there are many class labels, calculations can be complex |
| Support Vector Machines | Fast algorithm. Effective in high-dimensional spaces. Great accuracy. Power and flexibility from kernels. Works very well with a clear margin of separation. Many applications | Does not perform well with large data sets. Not so simple to program. Does not perform so well when the data are noisy, i.e., when the target classes overlap |
| Random Forest Classifier | Much less prone to overfitting than a single decision tree. Can be used for feature engineering, i.e., for identifying the most important features among all available features in the training dataset. Runs very well on large databases. Extremely flexible and has very high accuracy. No need for preparation of the input data | Complex. Requires a lot of computational resources. Time-consuming. The number of trees must be chosen |
| KNN Algorithm | Simple to understand and easy to implement. Zero to little training time. Works easily with multi-class data sets. Has good predictive power. Does well in practice | Computationally expensive testing phase. Sensitive to skewed class distributions. Accuracy can decrease with high-dimensional data. A value for the parameter k must be defined |
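
The KNN trade-offs listed in the table (almost no training time, a costly testing phase, and the need to choose k) can be illustrated with a minimal sketch. This is not taken from the cited source: the function name `knn_predict`, the toy data, and the use of Euclidean distance with a simple majority vote are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Assumption: Euclidean distance and an unweighted vote; other choices exist.
    """
    # "Training" is only storing the data. The cost is paid at prediction
    # time: each query computes its distance to every stored point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # k must be chosen by the user; the k closest labels cast the votes.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per sample, two well-separated classes.
train_points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.1, 1.0), k=3))  # A
print(knn_predict(train_points, train_labels, (5.1, 5.0), k=3))  # B
```

Because every prediction scans the whole training set, the cost of this sketch grows with the number of stored samples, which is the "computationally expensive testing phase" noted in the table. The majority vote also explains the sensitivity to skewed class distributions: a heavily over-represented class can dominate the k nearest neighbours.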