Commit e706431a authored by luetzel's avatar luetzel

updated HOWTO-ZENITH.md

parent eef2239f

## 1. Preparing the environment for Zenith

1. Our ensembling shell script requires a Python 3.6 virtual environment named `zenithenv`, placed in `/swp-metaphors/zenith/`.
	- If you are interested in running an ensemble model, execute these commands in `/swp-metaphors/zenith/`:
      ```
      virtualenv -p python3.6 --system-site-packages zenithenv
      ```
To prepare the data, run `python data_preparation.py vua`. It creates the following:
    * `model.py` contains the model classes and defines the architecture for the corresponding model
    * `programm.sh` trains seven seeds with varying parameters for the corresponding model
    * `util.py` utility module with all helper functions for the corresponding model
    
    **TODO**: forgot to cover demo/
   
2.
    - For ensembling: Execute the shell script with the following command:  
      `./programm.sh MODELNAME`
    - For a single seed model: Execute the `main_vua.py` script.

      Example: 
      ```
      python main_vua.py --epochs 7 --dropout1 0 --dropout2 0.1 --dropout3 0.5 --losslit 1.2 --lossmet 1.8
      ```
      **TODO**: maybe explain the losslit/lossmet parameters?

      Running the script for the first time creates several files that are reused by subsequent executions:
      
    Model(s) | Files needed |
    --- | --- |
    **zenith-baseline** | `tokens.txt`, `metaphor.txt`, `pos.txt`, `biasup.txt`, `biasdown.txt`, `biasupdown.txt`, `corp.txt`, `topic.txt`, `verbnet.txt`, `wordnet.txt` |
    **zenith-concat** | all from zenith-baseline plus `numberbatch.txt` |
    **zenith-nb-only** | `tokens.txt`, `metaphor.txt` plus `train_numberbatch_embeddings.txt` (obtained by downloading, unzipping and renaming [numberbatch-en-19.08.txt.gz](https://github.com/commonsense/conceptnet-numberbatch#downloads)) |
    **zenith-cn-features** | all from zenith-baseline plus `cn-features.txt` (generated from [illinimet-cn-preparation.py](illinimet/scripts/illinimet-cn-preparation.py)) |
    **zenith-cn-features-only** | `tokens.txt`, `metaphor.txt` plus `cn-features.txt` (obtained by moving the files from the desired subfolder in the `illinimet/data/` directory of the external data download) |

    File | Directory | Used by | Content |
    --- | --- | --- | --- |
    `embeddings_glove_vua.pkl` | `../../data/` | **baseline**, **cn-features**, **concat**, **glove-only** | Python dictionary of GloVe vectors for words in the VUA corpus vocabulary. |
    `embeddings_numberbatch_vua.pkl` | `../../data/` | **concat**, **nb-only** | Python dictionary of [ConceptNet Numberbatch](https://github.com/commonsense/conceptnet-numberbatch) vectors for words in the VUA corpus vocabulary. |
    `numberbatch_embeddings_dict.pkl` | `../../data/` | **concat**, **nb-only**, **demo** | Python dictionary of all Numberbatch embeddings in [numberbatch-en-19.08.txt.gz](https://github.com/commonsense/conceptnet-numberbatch#downloads). |
    `char_vocab.pkl` | `../../data/vua/` | **baseline**, **cn-features**, **concat**, **nb-only**, **glove-only** | Python set of the vocabulary of characters in the VUA dataset. |

    **TODO**: cn-features-only too?
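    The cached `.pkl` files above follow a build-once, reload-later pattern: the dictionary is built on the first run, pickled, and simply reloaded by subsequent executions. A minimal sketch of that pattern (the file name and toy vectors are illustrative, not the real VUA vocabulary):

    ```python
    import os
    import pickle

    def load_or_build(path, build_fn):
        """Reload a cached object if present, otherwise build and cache it."""
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)
        obj = build_fn()
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        return obj

    # Hypothetical build step; the real scripts parse GloVe/Numberbatch text files.
    toy = load_or_build(
        "toy_embeddings.pkl",
        lambda: {"metaphor": [0.1, 0.2], "literal": [0.3, 0.4]},
    )
    print(sorted(toy))  # ['literal', 'metaphor']
    ```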
    
3. After training and testing are complete, the model is saved in `zenith/metaphor-detection/models/MODELNAME/`, and VUA predictions in the shared-task format can be found in `zenith/metaphor-detection/predictions/MODELNAME/`.

## 4. Evaluating Zenith

1. When using ensembling, run the `majority_vote.py` script in `/zenith/metaphor-detection/predictions/MODELNAME/`.
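    The idea behind this step can be sketched as follows: each token receives the label that most of the seed models predicted for it. This is an illustrative helper, not the actual `majority_vote.py`:

    ```python
    from collections import Counter

    def majority_vote(predictions_per_seed):
        """Combine per-seed label lists: each token gets its most frequent label."""
        ensembled = []
        for labels in zip(*predictions_per_seed):  # one tuple of labels per token
            ensembled.append(Counter(labels).most_common(1)[0][0])
        return ensembled

    # Three seeds, four tokens; 1 = metaphor, 0 = literal.
    seeds = [
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 1],
    ]
    print(majority_vote(seeds))  # [1, 0, 1, 0]
    ```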

2. Run [automatic_evaluation.py](analysis/scripts/automatic_evaluation.py) with the following arguments:
    - `--pred_label_file`: the file created in the previous step.
    - `--gold_label_file`: the VUA test gold labels ([all_pos_tokens.csv](data/vua/test_gold_labels/all_pos_tokens.csv) for evaluation
      on VUA All-POS, [verb_tokens.csv](data/vua/test_gold_labels/verb_tokens.csv) for evaluation on VUA Verbs).

3. Stats and scores will be printed to the console.
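    Metaphor-detection shared tasks are typically scored with token-level precision, recall, and F1 for the metaphor class; a minimal sketch of those metrics (illustrative, not the actual `automatic_evaluation.py` logic):

    ```python
    def precision_recall_f1(pred, gold, positive=1):
        """Token-level scores for the positive (metaphor) class."""
        tp = sum(p == positive and g == positive for p, g in zip(pred, gold))
        fp = sum(p == positive and g != positive for p, g in zip(pred, gold))
        fn = sum(p != positive and g == positive for p, g in zip(pred, gold))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Toy labels: 2 true positives, 1 false positive, 1 false negative.
    pred = [1, 0, 1, 1, 0]
    gold = [1, 0, 0, 1, 1]
    p, r, f = precision_recall_f1(pred, gold)
    ```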

The ensemble prediction files created in the first step can be found in `/zenith/prediction_ensemble/` for each of the models.