Commit 02724dea authored by wernicke's avatar wernicke

push

parent 3192438a
@@ -8,44 +8,24 @@


<!-- code_chunk_output -->


- [__Prediction of alcohol consumption among teenagers__ 🍷🍻](#__prediction-of-alcohol-consumption-among-teenagers__)
  - [__Why does the world need this project?__ ✨🔥💥](#__why-does-the-world-need-this-project__)
  - [__Table of contents__ <!-- omit in toc -->](#__table-of-contents__-omit-in-toc-)
  - [__What Data is used? What features are offered? How is the data distributed?__ 📑](#__what-data-is-used-what-features-are-offered-how-is-the-data-distributed__)
      - [__Amount of data points:__ ✨](#__amount-of-data-points__)
    - [__Features__ 👩‍🎓👨‍🎓](#__features__)
      - [__More details about the features__: 👓](#__more-details-about-the-features__)
      - [__Modified data example with integer values:__ 👀](#__modified-data-example-with-integer-values__)
      - [__Categories for Classification:__ 🍻](#__categories-for-classification__)
    - [__Imbalance of Dataset__ ⚖](#__imbalance-of-dataset__)
      - [__Different groups of features:__ 🤹‍♂️](#__different-groups-of-features__-️)
    - [__Data splits__ ✂](#__data-splits__)
  - [__Baselines__ ✏](#__baselines__)
    - [__Baseline Algorithms__](#__baseline-algorithms__)
    - [__Baseline Evaluation__ 🥉](#__baseline-evaluation__)
  - [__Classification Task__](#__classification-task__)
    - [__Decision Tree:__](#__decision-tree__)
      - [__Feature Importance (DT)__](#__feature-importance-dt__)
      - [__Decision Boundary (DT)__](#decision-boundary-dt)
      - [__Validation Curve (DT)__](#validation-curve-dt)
    - [__Random Forest__](#random-forest)
      - [__Feature Importance (RF)__](#feature-importance-rf)
      - [__Decision Boundary (RF)__](#decision-boundary-rf)
      - [__Validation Curve (RF)__](#validation-curve-rf)
    - [__Naive Bayes__](#naive-bayes)
      - [__Feature Importance (NB)__](#feature-importance-nb)
      - [__Smoothing (NB)__](#smoothing-nb)
    - [__Support Vector Machines Kernel__](#support-vector-machines-kernel)
      - [__Decision Boundary (SVM)__](#decision-boundary-svm)
    - [__Multilayer Perceptron__](#multilayer-perceptron)
      - [__Decision Boundary (MLP)__](#decision-boundary-mlp)
  - [__Evaluation__](#evaluation)
    - [__Accuracies__](#accuracies)
    - [__Confusion Matrix__](#confusion-matrix)
    - [__Learning Curves__](#learning-curves)
      - [__Decision Tree and RandomForest__](#decision-tree-and-randomforest)
      - [__Support Vector Machine and Multilayer Perceptron__](#support-vector-machine-and-multilayer-perceptron)
  - [Authors  🧍‍♀️](#authors--️)
  - [Contact](#contact)


<!-- /code_chunk_output -->


@@ -283,6 +263,9 @@ In order to get an idea about the performance of our baseline algorithms, that n
| **Precision (macro)**    | 0.22       | 0.22    | 0.2            |
| **Precision (weighted)** | 0.28       | 0.22    | 1              |


**Accuracy** is highest for the majority baseline at 0.33, followed by the random baseline with a stratified draw. The random baseline drawing from the uniform distribution performs worst. These results can be attributed to the unbalanced nature of the data, i.e. the fact that a large proportion of the students drink little or no alcohol.<br>
However, if we choose the **macro recall** as the evaluation criterion, the majority baseline performs the worst.

<p><div align="center">
    ![eva_bl_barplot](/uploads/4afad6c0bf73e80b0b8c3543291bb097/eva_bl_barplot.png)
</div></p>
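The baseline behaviour described above can be reproduced with scikit-learn's `DummyClassifier`. The snippet below is a sketch on synthetic, imbalanced labels — the class proportions and feature values are illustrative assumptions, not the project's student data:

```python
# Sketch: the three baselines (majority, stratified, uniform) on imbalanced labels.
# Class proportions are illustrative assumptions, not the project's dataset.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(900, 5))                            # dummy features
y = rng.choice([0, 1, 2], size=900, p=[0.6, 0.3, 0.1])   # imbalanced classes

for strategy in ("most_frequent", "stratified", "uniform"):
    clf = DummyClassifier(strategy=strategy, random_state=0).fit(X, y)
    pred = clf.predict(X)
    print(f"{strategy:13s}  acc={accuracy_score(y, pred):.2f}  "
          f"macro-recall={recall_score(y, pred, average='macro'):.2f}")
```

On such a skewed distribution the majority baseline wins on accuracy (it always predicts the largest class) but loses on macro recall, since its recall on every minority class is zero — the same pattern as in the table above.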
@@ -426,6 +409,7 @@ As our data is obviously not linearly separable we expected the __SVM Kernel__ a


### __Accuracies__



### __Confusion Matrix__


The SVMs trained on the most important features achieve the highest accuracy. This can be explained, among other things, by the fact that SVMs need only a few features; too many features usually degrade the quality of the prediction.
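The effect can be sketched as follows: an SVM restricted to the k most informative features can match or beat one trained on all features. The data here is synthetic (`make_classification` stands in for the student dataset) and the feature-selection step (`SelectKBest`) is an illustrative choice, not necessarily the project's exact pipeline:

```python
# Sketch: RBF-SVM on all features vs. only the most informative ones.
# Synthetic data stands in for the student dataset (an assumption).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=30, n_informative=5,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

all_feats = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, y_tr)
top_feats = make_pipeline(StandardScaler(),
                          SelectKBest(f_classif, k=5),
                          SVC(kernel="rbf")).fit(X_tr, y_tr)

print("all 30 features:", all_feats.score(X_te, y_te))
print("top-5 features :", top_feats.score(X_te, y_te))
```

With only 5 of the 30 features informative, the reduced model has less noise to fit, which mirrors why the feature-reduced SVMs score highest in the table.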