Delete unnecessary files and directories (5a42fa5b) · Commits · hillengass / SynDRA

grade_text_preprocessing.txt

deleted100644 → 0

+0 −0

Empty file deleted.

human_ctx.txt

deleted100644 → 0

+0 −44207

File deleted.

Preview size limit exceeded, changes collapsed.

human_hyp.txt

deleted100644 → 0

+0 −44207

File deleted.

Preview size limit exceeded, changes collapsed.

metrics/grade/GRADE/.gitignore

deleted100644 → 0

+0 −5

Original line number	Diff line number	Diff line
		output/
		*.pyc
		__pycache__
		tools/
		.vscode
		No newline at end of file

metrics/grade/GRADE/README.md

deleted100644 → 0

+0 −108

Original line number	Diff line number	Diff line
		# GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems


		This repository contains the source code for the following paper:


		[GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems](https://arxiv.org/abs/2010.03994)
		Lishan Huang, Zheng Ye, Jinghui Qin, Xiaodan Liang; EMNLP 2020

		## Model Overview
		![GRADE](images/GRADE.png)

		## Prerequisites
		Create virtural environment (recommended):
		```
		conda create -n GRADE python=3.6
		source activate GRADE
		```
		Install the required packages:
		```
		pip install -r requirements.txt
		```

		Install Texar locally:
		```
		cd texar-pytorch
		pip install .
		```

		Note: Make sure that your environment has installed cuda 10.1.

		## Data Preparation
		GRADE is trained on the DailyDialog Dataset proposed by ([Li et al.,2017](https://arxiv.org/abs/1710.03957)).

		For convenience, we provide the [processed data](https://drive.google.com/file/d/1sj3Z_GZfYzrhmleWazA-QawhUEhlNmJd/view?usp=sharing) of DailyDialog. And you should also download it and unzip into the `data` directory. And you should also download [tools](https://drive.google.com/file/d/1CaRhHnO0YsQHOnJsmMUJuL4w9HXJZQYw/view?usp=sharing) and unzip it into the root directory of this repo.

		If you wanna prepare the training data from scratch, please follow the steps:
		1. Install [Lucene](https://lucene.apache.org/);
		2. Run the preprocessing script:
		```
		cd ./script
		bash preprocess_training_dataset.sh
		```


		## Training
		To train GRADE, please run the following script:
		```
		cd ./script
		bash train.sh
		```

		Note that the [checkpoint](https://drive.google.com/file/d/1v9o-fSohFDegicakrSEnKNcKliOqhYfH/view?usp=sharing) of our final GRADE is provided. You could download it and unzip into the root directory.

		## Evaluation
		We evaluate GRADE and other baseline metrics on three chit-chat datasets (DailyDialog, ConvAI2 and EmpatheticDialogues). The corresponding evaluation data in the `evaluation` directory has the following file structure:
		```
		.
		└── evaluation
		└── eval_data
		\| └── DIALOG_DATASET_NAME
		\| └── DIALOG_MODEL_NAME
		\| └── human_ctx.txt
		\| └── human_hyp.txt
		└── human_score
		└── DIALOG_DATASET_NAME
		\| └── DIALOG_MODEL_NAME
		\| └── human_score.txt
		└── human_judgement.json
		```
		Note: the entire human judgement data we proposed for metric evaluation is in `human_judgement.json`.


		To evaluate GRADE, please run the following script:
		```
		cd ./script
		bash eval.sh
		```

		## Using GRADE
		To use GRADE on your own dialog dataset:
		1. Put the whole dataset (raw data) into `./preprocess/dataset`;
		2. Update the function load_dataset in `./preprocess/extract_keywords.py` for loading the dataset;
		3. Prepare the context-response data that you want to evaluate and convert it into the following format:
		```
		.
		└── evaluation
		└── eval_data
		└── YOUR_DIALOG_DATASET_NAME
		└── YOUR_DIALOG_MODEL_NAME
		├── human_ctx.txt
		└── human_hyp.txt
		```
		4. Run the following script to evaluate the context-response data with GRADE:
		```
		cd ./script
		bash inference.sh
		```
		5. Lastly, the scores given by GRADE can be found as below:
		```
		.
		└── evaluation
		└── infer_result
		└── YOUR_DIALOG_DATASET_NAME
		└── YOUR_DIALOG_MODEL_NAME
		├── non_reduced_results.json
		└── reduced_results.json
		```