Commit e706431a authored by luetzel's avatar luetzel

updated HOWTO-ZENITH.md

parent eef2239f

## 1. Preparing the environment for Zenith

1. Our ensembling shell script requires a Python 3.6 virtual environment named `zenithenv`, placed in `/swp-metaphors/zenith/`.
	- If you are interested in running an ensemble model, execute these commands in `/swp-metaphors/zenith/`:
      ```
      virtualenv -p python3.6 --system-site-packages zenithenv
      ```
To prepare the data, run `python data_preparation.py vua`. It creates the following:
    * `model.py` contains the model classes and defines the architecture for the corresponding model
    * `programm.sh` trains seven seeds with varying parameters for the corresponding model
    * `util.py` utility module with all helper functions for the corresponding model
    
    **TODO**: forgot to cover demo/
   
2.
    - For ensembling: Execute the shell script with the following command:  
      `./programm.sh MODELNAME`
    - For a single seed model: Execute the `main_vua.py` script.

      Example: 
      ```
      python main_vua.py --epochs 7 --dropout1 0 --dropout2 0.1 --dropout3 0.5 --losslit 1.2 --lossmet 1.8
      ```
      **TODO**: maybe explain the losslit/lossmet parameters?

      Running the script for the first time creates several files that are reused by subsequent executions:
      
    Model(s) | Files needed |
    --- | --- |
    **zenith-baseline** | `tokens.txt`, `metaphor.txt`, `pos.txt`, `biasup.txt`, `biasdown.txt`, `biasupdown.txt`, `corp.txt`, `topic.txt`, `verbnet.txt`, `wordnet.txt` |
    **zenith-concat** | all from zenith-baseline plus `numberbatch.txt` |
    **zenith-nb-only** | `tokens.txt`, `metaphor.txt` plus `train_numberbatch_embeddings.txt` (obtained by downloading, unzipping and renaming [numberbatch-en-19.08.txt.gz](https://github.com/commonsense/conceptnet-numberbatch#downloads)) |
    **zenith-cn-features** | all from zenith-baseline plus `cn-features.txt` (generated from [illinimet-cn-preparation.py](illinimet/scripts/illinimet-cn-preparation.py)) |
    **zenith-cn-features-only** | `tokens.txt`, `metaphor.txt` plus `cn-features.txt` (obtained by moving the files from the desired subfolder in the `illinimet/data/` directory of the external data download) |

    File | Directory | Used by | Content |
    --- | --- | --- | --- |
    `embeddings_glove_vua.pkl` | `../../data/` | **baseline**, **cn-features**, **concat**, **glove-only** | Python dictionary of GloVe vectors for words in the VUA corpus vocabulary. |
    `embeddings_numberbatch_vua.pkl` | `../../data/` | **concat**, **nb-only** | Python dictionary of [ConceptNet Numberbatch](https://github.com/commonsense/conceptnet-numberbatch) vectors for words in the VUA corpus vocabulary. |
    `numberbatch_embeddings_dict.pkl` | `../../data/` | **concat**, **nb-only**, **demo** | Python dictionary of all Numberbatch embeddings in [numberbatch-en-19.08.txt.gz](https://github.com/commonsense/conceptnet-numberbatch#downloads). |
    `char_vocab.pkl` | `../../data/vua/` | **baseline**, **cn-features**, **concat**, **nb-only**, **glove-only** | Python set of the vocabulary of characters in the VUA dataset. |

    **TODO**: cn-features-only too?
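    The cached `.pkl` files above follow a build-once, reload-later pattern: the dictionary is built on the first run, pickled, and simply reloaded by subsequent executions. A minimal sketch of that pattern (the file name and toy vectors are illustrative, not the real VUA vocabulary):

    ```python
    import os
    import pickle

    def load_or_build(path, build_fn):
        """Reload a cached object if present, otherwise build and cache it."""
        if os.path.exists(path):
            with open(path, "rb") as f:
                return pickle.load(f)
        obj = build_fn()
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        return obj

    # Hypothetical build step; the real scripts parse GloVe/Numberbatch text files.
    toy = load_or_build(
        "toy_embeddings.pkl",
        lambda: {"metaphor": [0.1, 0.2], "literal": [0.3, 0.4]},
    )
    print(sorted(toy))  # ['literal', 'metaphor']
    ```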
    
3. After training and testing are complete, the model is saved in `zenith/metaphor-detection/models/MODELNAME/`, and VUA predictions in the shared-task format can be found in `zenith/metaphor-detection/predictions/MODELNAME/`.

## 4. Evaluating Zenith

1. When using ensembling, run the `majority_vote.py` script in `/zenith/metaphor-detection/predictions/MODELNAME/`.
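    The idea behind this step can be sketched as follows: each token receives the label that most of the seed models predicted for it. This is an illustrative helper, not the actual `majority_vote.py`:

    ```python
    from collections import Counter

    def majority_vote(predictions_per_seed):
        """Combine per-seed label lists: each token gets its most frequent label."""
        ensembled = []
        for labels in zip(*predictions_per_seed):  # one tuple of labels per token
            ensembled.append(Counter(labels).most_common(1)[0][0])
        return ensembled

    # Three seeds, four tokens; 1 = metaphor, 0 = literal.
    seeds = [
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 1],
    ]
    print(majority_vote(seeds))  # [1, 0, 1, 0]
    ```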

2. Run [automatic_evaluation.py](analysis/scripts/automatic_evaluation.py) with the following arguments:
    - `--pred_label_file`: the file created in the previous step.
    - `--gold_label_file`: the VUA test gold labels ([all_pos_tokens.csv](data/vua/test_gold_labels/all_pos_tokens.csv) for evaluation
      on VUA All-POS, [verb_tokens.csv](data/vua/test_gold_labels/verb_tokens.csv) for evaluation on VUA Verbs).

3. Stats and scores will be printed to the console.
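    Metaphor-detection shared tasks are typically scored with token-level precision, recall, and F1 for the metaphor class; a minimal sketch of those metrics (illustrative, not the actual `automatic_evaluation.py` logic):

    ```python
    def precision_recall_f1(pred, gold, positive=1):
        """Token-level scores for the positive (metaphor) class."""
        tp = sum(p == positive and g == positive for p, g in zip(pred, gold))
        fp = sum(p == positive and g != positive for p, g in zip(pred, gold))
        fn = sum(p != positive and g == positive for p, g in zip(pred, gold))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Toy labels: 2 true positives, 1 false positive, 1 false negative.
    pred = [1, 0, 1, 1, 0]
    gold = [1, 0, 0, 1, 1]
    p, r, f = precision_recall_f1(pred, gold)
    ```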

The ensemble prediction files created in the first step can be found in `/zenith/prediction_ensemble/` for each of the models.