Update project/README.md (a74f2044) · Commits · chrysanthopoulou / exp-ml-2

project/README.md

+14 −7

		@@ -412,30 +412,37 @@ Our data has immense differences in its data distribution these methods can help

		## 7. Conclusion :space_invader:

		Our feature importance analysis showed interesting results:
		- Decision Tree: 1. Ritzlinienform 2. Farbverteilung 3. Körnung der Magerung
		- Random Forest: 1. Ritzlinienform 2. Farbverteilung and Menge der Magerung
		- Naive Bayes: 1. Ritzlinienform 2. Menge der Magerung
		- SVM: 1. Ritzlinienform 2. Menge der Magerung 3. Farbverteilung

		<br>

		For all four algorithms "Ritzlinienform" (the shape of the incision lines) was the most important feature, which makes sense since the visibility of the incision lines is directly linked to the degree of preservation. "Farbverteilung" (color distribution) came second in the Decision Tree and the Random Forest, in both cases with very low importance. The reason for this could be aesthetics: an already incised vessel would be of higher value, not necessarily in a financial sense, and adding color would increase this value further. At the same time color is special and not always available, which makes its use rare. The data also does not record which color was used; the possibilities range from black to reddish tones. This explanation is primarily based on a modern-day understanding of aesthetics and needs to be taken with a grain of salt. Naive Bayes ignores this feature. The Decision Tree ranked the grain size of the temper ("Körnung der Magerung") not far behind color distribution, while the other three gave more weight to the amount of temper ("Menge der Magerung"), which was almost equal to color distribution in the Random Forest. The temper could indicate a stronger material that was preferred for incision lines, or these materials simply survived the firing process better. The most interesting question here is why the Decision Tree considers grain size more important than the amount of temper. Since the other three algorithms mostly agree on the features and scored best regarding accuracy (see below), we assume that "Ritzlinienform", "Menge der Magerung" and "Farbverteilung" were well-chosen features, while the remaining ones could be swapped for others to improve our classification.
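The README does not state here how the importances were computed. One model-agnostic way to rank features for all four classifiers is permutation importance: shuffle one feature column at a time and measure how much the accuracy drops. The following is a minimal sketch under that assumption; `predict`, the toy rows and the column order are hypothetical and not taken from the project code.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return correct / len(labels)


def permutation_importance(predict, rows, labels, n_features, n_repeats=10, seed=0):
    """Mean accuracy drop per feature after shuffling that feature's column.

    rows must be a list of lists; predict is any callable mapping one row
    to a predicted label (hypothetical stand-in for a fitted classifier).
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other columns untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [value] + row[j + 1:]
                        for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy usage: column 0 determines the label, column 1 is noise.
toy_rows = [[i % 2, (i * 7) % 3] for i in range(20)]
toy_labels = [row[0] for row in toy_rows]
scores = permutation_importance(lambda row: row[0], toy_rows, toy_labels, n_features=2)
print(scores)
```

A feature the model never uses gets an importance of zero, which would match the observation that Naive Bayes ignores "Farbverteilung". For fitted scikit-learn estimators, `sklearn.inspection.permutation_importance` provides the same idea out of the box.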

		<br>

		Our results were always better than our baselines, and every approach reached an accuracy between 0.575 and 0.580.
		1. Random Forest: 0.579203
		2. NB and SVM: 0.577259
		3. Decision Tree: 0.575316

		Most surprisingly, the results were close together and better than expected, while admittedly far from great. Fine-tuning improved them, even if only by a bit. Undersampling and SMOTE made the class distribution more even but did not improve the accuracy; in fact, both produced worse results.
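For reference, random undersampling can be sketched in a few lines: every class is cut down to the size of the rarest class. This is a minimal illustration, not the project's actual implementation; the function name and the label values are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop samples so each class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # The rarest class sets the number of samples kept per class.
    target = min(len(members) for members in by_class.values())

    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            balanced.append((row, label))
    rng.shuffle(balanced)  # avoid blocks of identical labels

    new_rows = [row for row, _ in balanced]
    new_labels = [label for _, label in balanced]
    return new_rows, new_labels


# Hypothetical, imbalanced toy data: 6 "good", 2 "medium", 2 "poor".
toy_rows = [[i] for i in range(10)]
toy_labels = ["good"] * 6 + ["medium"] * 2 + ["poor"] * 2
balanced_rows, balanced_labels = random_undersample(toy_rows, toy_labels)
print(Counter(balanced_labels))
```

Resampling of this kind should only be applied to the training split, so that dev and test sets keep the original class distribution. SMOTE works in the opposite direction: instead of dropping majority samples, it synthesizes new minority samples.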

		<br>

		So what could be possible reasons for the rather mediocre results? Is it the data, the algorithms, or was our assumption incorrect? The algorithms are fairly standard but work very well for classification tasks, and as seen above their results are almost identical. The data is fine as well considering the circumstances; after all, archaeological data isn't as ordered and complete as other data. So what about our assumption? Is there a connection between the materials used and the preservation of a motif? Maybe. It's not an easy assumption to prove or to refute. Our results, though, suggest that the two are rather unrelated or that the link is circumstantial. Experimental archaeology can only falsify. Take the assumption that clay houses burn down easily: experiments showed that it takes quite a lot to burn them to the ground. Could results like ours also help falsify certain assumptions? Here it comes down to data again. If it were possible to 3D scan every object and use that data with standardized metrics, this could potentially help. Just like in the study that motivated us, using "neutrally" generated data could impact archaeology in fundamental ways. Unfortunately, this won't happen in the near future; even coming up with a standardized metric system would prove a challenge. A powerful tool is only as good as its data.

		<br>

		There is always this ultimate boss encounter called deadline. :scream: If we had more time though, what else could be done? For starters, we could try different random seeds for the train, dev and test sets, maybe even try to optimize them manually. We could also try out different features, especially in light of the feature importance results. As usual there is always more to try and do. Hopefully there will be more opportunities to try out these ideas.
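Trying different random seeds could look roughly like the sketch below: the data is split with several seeds, and the mean and spread of the resulting scores are reported. The split fractions and the `train_and_score` callback are assumptions for illustration, not values from this project.

```python
import random
import statistics


def split_indices(n_samples, seed, dev_frac=0.1, test_frac=0.1):
    """Shuffle the indices 0..n_samples-1 with a seed and cut them into train/dev/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG, global random state stays untouched
    n_test = int(n_samples * test_frac)
    n_dev = int(n_samples * dev_frac)
    test = indices[:n_test]
    dev = indices[n_test:n_test + n_dev]
    train = indices[n_test + n_dev:]
    return train, dev, test


def evaluate_over_seeds(n_samples, seeds, train_and_score):
    """Run one experiment per seed and return (mean, sample standard deviation).

    train_and_score is a hypothetical callback that receives the train, dev
    and test index lists and returns a score such as the test accuracy.
    At least two seeds are needed for the standard deviation.
    """
    scores = []
    for seed in seeds:
        train, dev, test = split_indices(n_samples, seed)
        scores.append(train_and_score(train, dev, test))
    return statistics.mean(scores), statistics.stdev(scores)
```

With accuracies as close together as 0.575 and 0.579, the spread across seeds would help to judge whether the ranking of the four algorithms is stable or merely an effect of one particular split.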

		<br>

		For future experiments data is going to be a more important factor. Everything stands and falls with it. Cleaning and preparing the data, even fine-tuning it, are things we want to do in future projects. Feature selection proved impactful as well: trying different variations can improve results and give interesting insights. Maybe it helps to shift the focus or to review one's own assumptions about the influence of individual features. The capabilities of each algorithm also need to be explored further; there are various parameters to understand, and not all of them made sense at the very beginning or are adequate for the intended project. Knowing how to approach a project using these powerful tools will prove very useful in the future! So thanks, this was actually really fun. :smile: