Commit c5e5aae4 authored by engel

Fix merge conflict

parents 2b692232 9410dbcb
+31 −14
-# expml-2 01_übung {ignore=true}
+# expml-2 01_übung <!-- omit in toc -->

-## Table of contents {ignore=true}
+## Table of contents <!-- omit in toc -->
<!-- @import "[TOC]" {cmd="toc" depthFrom=1 depthTo=6 orderedList=false} -->
<!-- code_chunk_output -->
-- [expml-2 01_übung {ignore=true}](#expml-2-01_übung-ignoretrue)
-- [Goal](#goal)
-- [What is this special task about?](#what-is-this-special-task-about)
-- [It did't work. Why?](#it-didt-work-why)
-- [About this folder 🤓](#about-this-folder-)
+- [expml-2 01_übung <!-- omit in toc -->](#expml-2-01_übung-omit-in-toc-)
+  - [Table of contents <!-- omit in toc -->](#table-of-contents-omit-in-toc-)
+  - [About this folder 🤓](#about-this-folder)
+    - [Structure](#structure)
+    - [Goals 🏆](#goals)
+  - [What is this special task about? 🤭](#what-is-this-special-task-about)
+  - [It didn't work. Why? 🤯](#it-didnt-work-why)

<!-- /code_chunk_output -->

-## Goal
+## About this folder 🤓

### Structure
This folder contains all our work for the first exercise.
This includes the following materials:
- exercise sheet (pdf)
@@ -18,11 +24,22 @@ This includes the following materials:
- slides from the lecture 2 (pdf)
- README (md)

-## What is this special task about?
-We should try to push an empty folder as a change to gitlab. This of course didn't work. 
+### Goals 🏆
+Exercise 2 has several educational purposes:
+- [X] getting to know the lecture's concept
+- [X] taking the first steps in git 👶🚶‍♀️
+  - [X] wondering whether our wifi, git, or we ourselves are the problem
+  - [X] creating our first repository 💥
+  - [X] pushing code for the first time 🎉🥳
+- [X] celebrating having the best *worksheet partner* ever 💁‍♀️
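The "first steps in git" above (creating a repository, committing, pushing for the first time) can be sketched roughly as follows; the repository name and identity are placeholders, and the GitLab URL is deliberately left out:

```shell
# Hypothetical first steps in git (repo name and identity are placeholders)
mkdir my-first-repo && cd my-first-repo
git init                                  # create the repository 💥
git config user.name "Jane Doe"           # tell git who we are
git config user.email "jane@example.com"
echo "# my-first-repo" > README.md
git add README.md                         # stage the file
git commit -m "Initial commit"            # commit it
# pushing for the first time 🎉 would then be:
#   git remote add origin <gitlab-url>
#   git push -u origin main
```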



+## What is this special task about? 🤭
+We should try to push an empty folder as a change to GitLab. This of course didn't work. <br>
+The reason for this is written down below.

-## It did't work. Why?
-The folder was empty. Thus, there was nothing to commit.
-Due to this README the folder can now be commited.
-BOAM!!💥💥💥💥💥💥
\ No newline at end of file
+## It didn't work. Why? 🤯
+The folder was empty. Thus, there was nothing to commit.<br>
+Due to this README the folder can now be committed.<br>
+**BOOOM!!** 💥💥💥💥💥💥
\ No newline at end of file
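The "empty folder" effect described in this README can be reproduced in a few commands; this is an illustrative sketch (folder names are made up), not part of the exercise:

```shell
# git tracks files, not directories: an empty folder is invisible to it
cd "$(mktemp -d)"
git init -q demo && cd demo
mkdir 01_uebung
git status --porcelain          # prints nothing: there is nothing to commit
touch 01_uebung/README.md       # drop any file into the folder...
git status --porcelain          # ...and now it shows up as untracked
```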
+58 −28
-# expml-2 02_übung {ignore=true}
+# expml-2 02_übung <!-- omit in toc -->

-## Table of contents {ignore=true}
+## Table of contents <!-- omit in toc -->
<!-- @import "[TOC]" {cmd="toc" depthFrom=1 depthTo=6 orderedList=false} -->

<!-- code_chunk_output -->

-- [About this folder 🤓](#about-this-folder)
+- [About this folder 🤓](#about-this-folder-)
   - [Structure](#structure)
-  - [Goals 🏆](#goals)
-- [Example code: classifier 🤯](#example-code-classifier)
-  - [What is the example code about? 🔍](#what-is-the-example-code-about)
-  - [Run time analysis ⏳](#run-time-analysis)
+  - [Goals 🏆](#goals-)
+- [Example code: classifier 🤯](#example-code-classifier-)
+  - [What is the example code about? 🔍](#what-is-the-example-code-about-)
+  - [Run time analysis ⏳](#run-time-analysis-)
     - [Time consumption](#time-consumption)
-    - [Hardware facts 💻](#hardware-facts)
-    - [Why are there different time values?](#why-are-there-different-time-values)
+    - [Hardware facts 💻](#hardware-facts-)
+    - [Why are there different time values? 📈](#why-are-there-different-time-values-)
     - [What does that mean for us and our knowledge about testing of machine learning algorithms?](#what-does-that-mean-for-us-and-our-knowledge-about-testing-of-machine-learning-algorithms)
-  - [The datasets 📊](#the-datasets)
+  - [The datasets 📊](#the-datasets-)
       - [Dataset A](#dataset-a)
       - [Dataset B](#dataset-b)
       - [Dataset C](#dataset-c)
-  - [The classifiers 📶](#the-classifiers)
-  - [Linear separability - What is it about?](#linear-separability-emwhat-is-it-aboutem)
+  - [The classifiers 📶](#the-classifiers-)
+  - [Linear separability - <em>What is it about?</em>](#linear-separability---what-is-it-about)

<!-- /code_chunk_output -->

@@ -39,12 +39,12 @@ This includes the following materials:

### Goals 🏆
Exercise 2 has several educational purposes:
-- [X] get familiar with git
-- [X] practice your code reading ability
-- [X] learn the basic structure of a typical machine learning workflow
-- [X] measure run times and make sense of their differences and variation 
-- [X] describe and compare datasets as well as classifiers
-- [ ] find the perfect project topic
+- [X] get familiar with **git**
+- [X] practice our **code reading ability**
+- [X] learn the basic structure of a typical **machine learning workflow**
+- [X] measure **run times** and make sense of their differences and variation
+- [X] describe and compare **datasets** as well as classifiers
+- [X] find the perfect **project topic** 🍻

## Example code: classifier 🤯
### What is the example code about? 🔍
@@ -96,10 +96,10 @@ The difference between these time values is not big and the average is 3.35. To
| RAM | 8 GB | 32.0 GB (31.6 GB usable) |
| Device type | Harddrive (HDD) | Solid state device (SSD) |

-#### Why are there different time values?
-As you can see above the amount of time differs with every execution of code. We first believed that the reason
-for that was the use of random parameters right at the start. That is not the case because there is a seed that is set for the random initialization. The seed is the reason, why the data is not random every time but is generated the same way in every run (pseudo random numbers).
-The real reason for the difference in time is (*drumroll*) the CPU and all the other tasks that the computer runs in the background while we execute the code. The more background processes, the longer it might take to process everything. Nobody can say in advance how long it will take.
+#### Why are there different time values? 📈
+As you can see above, the amount of time differs with every execution of the code. 🪀 <br>
+We *first believed* that the reason for this was the use of **random parameters** right at the start. That is not the case, because a seed is set for the random initialization. The seed is the reason why the data is not random every time but is generated the same way in every run (pseudo-random numbers). 🔢
+The *real reason* for the difference in time is (🥁*drumroll*🥁) the **CPU** and all the other **tasks** that the **computer runs in the background** while we execute the code. The more background processes, the longer it might take to process everything. Nobody can say in advance how long it will take. 🤯
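This effect is easy to reproduce with a stdlib-only sketch (not the exercise's actual code): the seeded data is bit-for-bit identical in every run, while the wall-clock time is not guaranteed to be.

```python
import random
import time

def generate_data(seed=42, n=200_000):
    # pseudo-random numbers: the seed makes every run produce the same data
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

durations = []
for _ in range(3):
    start = time.perf_counter()
    data = generate_data()
    durations.append(time.perf_counter() - start)

# identical data in every run (thanks to the seed), yet the timings
# still vary with CPU load and background processes
assert generate_data() == generate_data()
print([round(d, 4) for d in durations])
```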

#### What does that mean for us and our knowledge about testing of machine learning algorithms?
- **First lesson:** code takes a lot longer to run if you don't have the most modern hardware.
@@ -114,13 +114,43 @@ Dataset B has the shape of an inner and an outer circle.
In C, the data can be split in the left and the right side.
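The three shapes described here sound like the classic "moons / circles / linearly separable" trio (as in scikit-learn's classifier-comparison example). Assuming that, the datasets can be sketched with the standard library alone; the function names are made up for illustration:

```python
import math
import random

def moons(n=100):
    # dataset A: two crescents interlocked with each other
    pts = []
    for i in range(n):
        t = math.pi * i / (n - 1)
        pts.append(((math.cos(t), math.sin(t)), 0))            # upper moon
        pts.append(((1 - math.cos(t), 0.5 - math.sin(t)), 1))  # lower moon
    return pts

def circles(n=100, factor=0.5):
    # dataset B: an inner and an outer circle
    pts = []
    for i in range(n):
        t = 2 * math.pi * i / n
        pts.append(((math.cos(t), math.sin(t)), 0))                    # outer
        pts.append(((factor * math.cos(t), factor * math.sin(t)), 1))  # inner
    return pts

def left_right(n=100, gap=1.5, seed=0):
    # dataset C: two clouds split into a left and a right side
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        pts.append(((rng.gauss(-gap, 0.3), rng.gauss(0, 0.3)), 0))  # left
        pts.append(((rng.gauss(+gap, 0.3), rng.gauss(0, 0.3)), 1))  # right
    return pts
```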

### The classifiers 📶
-- **Nearest Neighbors**: Dataset 1: two crescents interlocked with each other; dataset 2: shape of an inner and an outer circle, circles are not completely closed; dataset 3: split into left and right side; many corners
+- **Nearest Neighbors**:
+  - Dataset 1:
+    - Two crescents interlocked with each other 🥐🌙
+  - Dataset 2:
+    - Shape of an inner and an outer circle ⭕
+    - Circles are not completely closed 🔓
+  - Dataset 3:
+    - ⬅️ Split into left and right side ➡️
+  - Many corners 🏪
 - **Linear SVM**: All decision boundaries are linear (straight lines without edges)
-- **Kernel SVM**: Dataset 1: two crescents interlocked with each other; dataset 2: shape of an inner and an outer circle, circles are not completely closed; dataset 3: split into left and right oval; all shapes are smoothed no corners
-- **Decision Tree**: Only consists of squares; dichotomy of categories (yes or no)
-- **Random Forest**: Only consists of squares; more than two categories / colors
-- **Neural Net**: Dataset 1: curve; dataset 2: circle, dataset 3: line; with 7 categories each; each line is nearly equally wight at every point
-- **Naive Bayes**: Dataset 1: curve; dataset 2: circle, dataset 3: line; with 7 categories each; all corners are smoothed
+- **Kernel SVM**:
+  - Dataset 1:
+    - Two crescents interlocked with each other 🥐🌙
+  - Dataset 2:
+    - Shape of an inner and an outer circle ⭕
+    - Circles are not completely closed 🔓
+  - Dataset 3:
+    - 💫 Split into left and right oval 💫
+  - All shapes are smoothed, no corners
+- **Decision Tree**:
+  - Only consists of squares ⏹️
+  - Dichotomy of categories (yes 👍 or no 👎)
+- **Random Forest**:
+  - Only consists of squares ⏹️
+  - More than two categories / colors 🌈
+- **Neural Net**:
+  - Dataset 1: curve ⤵️
+  - Dataset 2: circle ⭕
+  - Dataset 3: line 📏
+  - With 7 categories each
+  - Each line is nearly equally wide at every point
+- **Naive Bayes**:
+  - Dataset 1: curve ⤵️
+  - Dataset 2: circle ⭕
+  - Dataset 3: line 📏
+  - With 7 categories each
+  - All corners are smoothed 🌊
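The exercise compares ready-made classifiers, but the "many corners" of the nearest-neighbour plots are easy to explain with a toy 1-NN written from scratch (an illustration, not the exercise's code): the prediction flips wherever the closest training point changes, so the decision boundary is stitched together from many straight segments.

```python
import math

def nn_predict(train, point):
    # 1-nearest-neighbour: the label of the closest training point wins;
    # `train` is a list of ((x, y), label) pairs
    closest = min(train, key=lambda item: math.dist(item[0], point))
    return closest[1]

train = [((0.0, 0.0), 0), ((1.0, 1.0), 1)]
print(nn_predict(train, (0.1, 0.2)))  # closest to (0, 0) -> label 0
print(nn_predict(train, (0.9, 0.7)))  # closest to (1, 1) -> label 1
```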

### Linear separability - <em>What is it about?</em> 
<em>Two sets are linearly separable if there exists at least one line in the plane with all of the blue points on one side of the line and all the red points on the other side.</em>[^1]
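That definition can be checked mechanically: the perceptron algorithm finds such a line whenever one exists, and keeps making mistakes forever when none does. A minimal sketch with made-up data (illustrative, not part of the exercise):

```python
def find_separating_line(points, epochs=100):
    # perceptron: returns (w, b) such that w.x + b > 0 exactly for the
    # label-1 points if the data is linearly separable, otherwise None;
    # `points` is a list of ((x, y), label) pairs with labels 0 and 1
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x, y), label in points:
            predicted = 1 if w[0] * x + w[1] * y + b > 0 else -1
            target = 1 if label == 1 else -1
            if predicted != target:
                w[0] += target * x
                w[1] += target * y
                b += target
                mistakes += 1
        if mistakes == 0:
            return w, b          # every point is on its correct side
    return None                  # no separating line found within `epochs`

blue_and_red = [((-1.0, 0.0), 0), ((-2.0, 0.5), 0), ((1.0, 0.0), 1), ((2.0, -0.5), 1)]
print(find_separating_line(blue_and_red) is not None)  # True: separable
xor_like = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]
print(find_separating_line(xor_like) is None)          # True: not separable
```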
+229 KiB: file added (binary, no diff preview for this file type)
+3.66 MiB: file added (binary, no diff preview for this file type)
+3.58 MiB: file added (binary, no diff preview for this file type)
