#### We also tried plotting the decision boundary of features against each other, but due to the categorical nature of our data, the **datapoints mainly overlap** and it isn't very illustrative:
This isn't great either, so we'll also tune the parameters.<br>
We tuned the `max_depth` parameter by plotting the ROC curve (the **true positive rate** against the **false positive rate**) and the accuracy for different values of `max_depth`. We tuned exclusively on the (subdivided) training dataset using sklearn's GridSearchCV. <br>
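The tuning step can be sketched roughly as follows. This is a minimal illustration on synthetic data (our actual features and parameter grid may differ):

```python
# Sketch: tuning max_depth with GridSearchCV on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic 6-feature dataset standing in for our training set.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# Cross-validated grid search over candidate depths (illustrative grid).
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    scoring="accuracy",
    cv=5,
)
grid.fit(X, y)

best_depth = grid.best_params_["max_depth"]
best_score = grid.best_score_
```

GridSearchCV only ever sees the training split, mirroring how we kept the dev set untouched during tuning.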
Unfortunately, the SVM confusion matrix doesn't stray far from the previous confusion matrices. Because we use the RBF kernel, a feature importance plot is not available. Therefore, let's skip to the learning curve:
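Computing such a learning curve could look roughly like this; a minimal sketch using sklearn's `learning_curve` on synthetic stand-in data (the training-size grid is illustrative):

```python
# Sketch: learning curve for an RBF-kernel SVM on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

# Synthetic 6-feature dataset standing in for our training set.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

# Train/validation scores at increasing fractions of the training data.
sizes, train_scores, val_scores = learning_curve(
    SVC(kernel="rbf"),
    X, y,
    cv=5,
    train_sizes=[0.2, 0.4, 0.6, 0.8, 1.0],
)
```

Plotting the mean of `train_scores` and `val_scores` against `sizes` gives the learning curve shown below.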
## 7. Conclusion :space_invader:
Our results consistently beat our baselines, and every approach reached an accuracy of around **0.57**:
1. Random Forest: 0.579203
2. NB and SVM: 0.577259
3. Decision Tree: 0.575316
Most surprisingly, the results were close together and better than expected, while admittedly far from great. Fine-tuning improved them, if only a little. **Undersampling** and **SMOTE** made the class distribution more balanced but didn't improve the accuracy; in fact, both performed worse.<br>
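For illustration, the undersampling idea can be sketched with plain NumPy (SMOTE itself comes from the imbalanced-learn package; the data here is synthetic and the 80/20 split is an assumption, not our actual class ratio):

```python
# Sketch: random undersampling of the majority class on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = rng.choice([0, 1], size=1000, p=[0.8, 0.2])  # imbalanced labels

# Keep all minority samples; draw an equal-sized subset of the majority.
minority = np.flatnonzero(y == 1)
majority = np.flatnonzero(y == 0)
keep = rng.choice(majority, size=minority.size, replace=False)

idx = np.concatenate([minority, keep])
X_bal, y_bal = X[idx], y[idx]
```

The resampled set is perfectly balanced, but at the cost of discarding most majority-class samples, which is one plausible reason undersampling hurt our accuracy.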
The results are overall quite disappointing. So what are possible reasons for the rather mediocre results: the data, the algorithms, or an incorrect assumption? The algorithms are fairly standard and work well for classification tasks, and as seen above their results are almost identical. The data is fine considering the circumstances; after all, archaeological data isn't as ordered and complete as other data. So what about our assumption? Is there a connection between the materials used and the preservation of a motif? Maybe. It's not an easy assumption to prove or disprove, but with our results in mind, the two appear to be rather unrelated, or at most circumstantial.