This folder contains the work for our final project of the EML Proseminar. The project is about classifying different types of fruit based on their images (:arrow_right: **multi-class image classification**)
:arrow_right: Given any picture, we want to make the correct prediction: *Is it a banana, an apple, a strawberry, ...?* :banana: :apple: :strawberry:
:construction: We will add the most important information from the project proposal to this README file.
## Data
Run the `fruit_dataset_splitter.py` script found [here](data_preprocessing/fruit
Your data is ready!
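A minimal sketch of what the splitting step does, assuming a stratified train/dev/test split; the 70/15/15 proportions and the file names here are illustrative assumptions, not taken from the actual script:

```python
from sklearn.model_selection import train_test_split

# Hypothetical list of image paths and labels; the real script walks the fruit folders.
paths = [f"img_{i:05d}.jpg" for i in range(26500)]
labels = [i % 30 for i in range(26500)]  # 30 balanced classes

# First carve off the test set, then split the remainder into train/dev.
train_paths, test_paths, train_y, test_y = train_test_split(
    paths, labels, test_size=0.15, stratify=labels, random_state=42
)
train_paths, dev_paths, train_y, dev_y = train_test_split(
    train_paths, train_y, test_size=0.15 / 0.85, stratify=train_y, random_state=42
)
```

Stratifying keeps all 30 classes equally represented in every split, which matters for the majority baseline discussed below.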
## Baseline Model Evaluation Results
### Overview
We have implemented two baseline models, Random and Majority, both as custom models and via scikit-learn's `DummyClassifier`. Our task is to classify each image into one of 30 classes; the dataset is balanced and contains about 26,500 data points.
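The scikit-learn variants of the two baselines can be sketched as follows; the features and labels are toy stand-ins for our data:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)
# Toy stand-in for our features/labels: 30 balanced classes.
X = rng.normal(size=(3000, 8))
y = np.repeat(np.arange(30), 100)

# Random baseline: predicts a uniformly random class.
random_baseline = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)
# Majority baseline: always predicts the most frequent class.
majority_baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
```

On a balanced 30-class set both baselines land near 1/30 ≈ 0.033 accuracy.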
The following table summarizes the performance of different baseline models on the dataset.
- **Performance lower than the majority baseline:** This scenario is more alarming because the majority baseline is extremely naive. A model that performs worse than it is doing worse than always guessing the most frequent class, which points to problems such as an incorrect model architecture, data-preprocessing errors, or other significant issues in the training pipeline.
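For our balanced 30-class setting, the two naive baselines coincide at roughly 3.3% accuracy, which is the sanity threshold the point above refers to:

```python
# With a balanced dataset, both naive baselines land at the same expected accuracy.
n_classes = 30
n_samples = 26500
majority_count = n_samples // n_classes        # samples covered by the most frequent class
majority_accuracy = majority_count / n_samples # ~0.033
random_accuracy = 1 / n_classes                # ~0.033
# Any trained model scoring clearly below ~0.033 therefore signals a pipeline bug.
```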
## Experiments
| Classifier | Best Features | Best Parameters | Best Accuracy | Training Time |
| --- | --- | --- | --- | --- |
**Poor results** for all experiments with a Naive Bayes classifier :thumbsdown:. The best results are achieved with the HSV + Sobel features; even that accuracy (0.178) is low, though still better than our baseline.
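A sketch of feeding HSV + Sobel features into a Gaussian Naive Bayes model; `hsv_sobel_features` is a hypothetical helper, and our actual feature-extraction code may differ:

```python
import numpy as np
from scipy.ndimage import sobel
from matplotlib.colors import rgb_to_hsv
from sklearn.naive_bayes import GaussianNB

def hsv_sobel_features(img):
    """Flatten HSV channels plus Sobel edge magnitudes into one feature vector.
    (Hypothetical helper for illustration only.)"""
    hsv = rgb_to_hsv(img)                      # img: float RGB in [0, 1]
    gray = img.mean(axis=2)
    edges = np.hypot(sobel(gray, axis=0), sobel(gray, axis=1))
    return np.concatenate([hsv.reshape(-1), edges.reshape(-1)])

rng = np.random.default_rng(0)
imgs = rng.random((60, 8, 8, 3))               # toy stand-ins for fruit images
y = np.arange(60) % 3
X = np.stack([hsv_sobel_features(im) for im in imgs])
clf = GaussianNB().fit(X, y)
```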
| Resized | Features | Accuracy (Dev) | Best Parameters | Comments |
| --- | --- | --- | --- | --- |
- for some classes the diagonal is quite bright (e.g. apricots and passion fruits) :arrow_right: the classifier predicts these classes quite well
- however, the classifier also shows a **strong bias** towards some classes (e.g. apricots, jostaberries, passion fruits, and figs)
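The pattern described above can be reproduced on a toy example with scikit-learn's `confusion_matrix`; the labels here are made up for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy predictions illustrating the pattern: a classifier biased towards
# class 0 still produces a bright diagonal cell for that class.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 0, 2])
cm = confusion_matrix(y_true, y_pred)
# Rows are true classes, columns are predictions;
# a bright column 0 exposes the bias towards class 0.
```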
### Decision Tree
### Random Forest
**Feature Combinations:**
Results for the `RandomForestClassifier` on 100x100_standard images:
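A hedged sketch of how such a random-forest run can be set up with `GridSearchCV`; the data and the parameter grid are illustrative, not our project's exact configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))         # toy features
y = rng.integers(0, 3, size=300)       # toy labels

# Illustrative grid; the real experiments sweep more values.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, None]},
    cv=3,
)
grid.fit(X, y)
```

`grid.best_params_` and `grid.best_score_` then feed the results table.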
## :computer: Usage of the `classify_images.py` script
### Set-Up
Create a virtual environment and install the required packages:

### CNN

### Final results
Having tested different feature combinations, hyperparameters, and picture sizes on the development set, we have found our optimal parameters for the final runs on the **test set**.
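The final protocol can be sketched as follows; the model type and `best_params` are placeholders, and the real values come from the dev-set search:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Toy stand-ins for the real train and test splits.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 10)), rng.integers(0, 3, 200)
X_test, y_test = rng.normal(size=(60, 10)), rng.integers(0, 3, 60)

# Freeze the hyperparameters chosen on the dev set,
# retrain, and touch the test set exactly once.
best_params = {"n_estimators": 100, "max_depth": None}  # placeholder values
final_model = RandomForestClassifier(random_state=0, **best_params).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, final_model.predict(X_test))
```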