Add grade repository as subfolder (642e6f73) · Commits · hillengass / SynDRA

.gitignore

+6 −0

Original line number	Diff line number	Diff line
		@@ -8,6 +8,12 @@
		.local
		.npm

		# Ignore virtual environments
		grade_venv/

		# Ignore older python / other program versions
		python=3.6

		# Ignore models-folder except readme
		models/*
		!models/README.md

metrics/grade/GRADE/.gitignore

0 → 100644

+5 −0

		output/
		*.pyc
		__pycache__
		tools/
		.vscode
		No newline at end of file

metrics/grade/GRADE/README.md

0 → 100644

+108 −0

		# GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems


		This repository contains the source code for the following paper:


		[GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems](https://arxiv.org/abs/2010.03994)
		Lishan Huang, Zheng Ye, Jinghui Qin, Xiaodan Liang; EMNLP 2020

		## Model Overview
		![GRADE](images/GRADE.png)

		## Prerequisites
		Create a virtual environment (recommended):
		```
		conda create -n GRADE python=3.6
		source activate GRADE
		```
		Install the required packages:
		```
		pip install -r requirements.txt
		```

		Install Texar locally:
		```
		cd texar-pytorch
		pip install .
		```

		Note: Make sure that CUDA 10.1 is installed in your environment.

		## Data Preparation
		GRADE is trained on the DailyDialog dataset proposed by [Li et al., 2017](https://arxiv.org/abs/1710.03957).

		For convenience, we provide the [processed data](https://drive.google.com/file/d/1sj3Z_GZfYzrhmleWazA-QawhUEhlNmJd/view?usp=sharing) of DailyDialog; download it and unzip it into the `data` directory. You should also download the [tools](https://drive.google.com/file/d/1CaRhHnO0YsQHOnJsmMUJuL4w9HXJZQYw/view?usp=sharing) and unzip them into the root directory of this repo.

		If you want to prepare the training data from scratch, follow these steps:
		1. Install [Lucene](https://lucene.apache.org/);
		2. Run the preprocessing script:
		```
		cd ./script
		bash preprocess_training_dataset.sh
		```


		## Training
		To train GRADE, please run the following script:
		```
		cd ./script
		bash train.sh
		```

		Note that the [checkpoint](https://drive.google.com/file/d/1v9o-fSohFDegicakrSEnKNcKliOqhYfH/view?usp=sharing) of our final GRADE model is provided. You can download it and unzip it into the root directory.

		## Evaluation
		We evaluate GRADE and other baseline metrics on three chit-chat datasets (DailyDialog, ConvAI2 and EmpatheticDialogues). The corresponding evaluation data in the `evaluation` directory has the following file structure:
		```
		.
		└── evaluation
		    └── eval_data
		    |   └── DIALOG_DATASET_NAME
		    |       └── DIALOG_MODEL_NAME
		    |           └── human_ctx.txt
		    |           └── human_hyp.txt
		    └── human_score
		        └── DIALOG_DATASET_NAME
		        |   └── DIALOG_MODEL_NAME
		        |       └── human_score.txt
		        └── human_judgement.json
		```
		Note: the complete human judgement data that we provide for metric evaluation is in `human_judgement.json`.


		To evaluate GRADE, please run the following script:
		```
		cd ./script
		bash eval.sh
		```

		## Using GRADE
		To use GRADE on your own dialog dataset:
		1. Put the whole dataset (raw data) into `./preprocess/dataset`;
		2. Update the function `load_dataset` in `./preprocess/extract_keywords.py` so that it loads your dataset;
		3. Prepare the context-response data that you want to evaluate and convert it into the following format:
		```
		.
		└── evaluation
		    └── eval_data
		        └── YOUR_DIALOG_DATASET_NAME
		            └── YOUR_DIALOG_MODEL_NAME
		                ├── human_ctx.txt
		                └── human_hyp.txt
		```
		4. Run the following script to evaluate the context-response data with GRADE:
		```
		cd ./script
		bash inference.sh
		```
		5. Finally, the scores given by GRADE can be found in the following location:
		```
		.
		└── evaluation
		    └── infer_result
		        └── YOUR_DIALOG_DATASET_NAME
		            └── YOUR_DIALOG_MODEL_NAME
		                ├── non_reduced_results.json
		                └── reduced_results.json
		```
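
Steps 3 and 5 above can be sketched in Python as follows. This is a minimal sketch, not part of the GRADE repository: the helper names `write_eval_data` and `result_paths` are hypothetical, and it assumes that `human_ctx.txt` and `human_hyp.txt` hold one example per line, aligned by line number, which the README does not state explicitly. Check the provided evaluation data for the exact line format (for example, how multiple context turns are joined) before relying on it.

```python
from pathlib import Path
from typing import Iterable, Tuple


def write_eval_data(root: str, dataset: str, model: str,
                    pairs: Iterable[Tuple[str, str]]) -> Path:
    """Write (context, response) pairs into the eval_data layout.

    Assumption: one example per line, with line i of human_ctx.txt
    matching line i of human_hyp.txt.
    """
    out_dir = Path(root) / "evaluation" / "eval_data" / dataset / model
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "human_ctx.txt", "w", encoding="utf-8") as ctx_f, \
            open(out_dir / "human_hyp.txt", "w", encoding="utf-8") as hyp_f:
        for context, response in pairs:
            # Collapse whitespace: a newline inside an example would
            # break the line-by-line alignment of the two files.
            ctx_f.write(" ".join(context.split()) + "\n")
            hyp_f.write(" ".join(response.split()) + "\n")
    return out_dir


def result_paths(root: str, dataset: str, model: str) -> Tuple[Path, Path]:
    """Return the documented locations of the two result files."""
    base = Path(root) / "evaluation" / "infer_result" / dataset / model
    return base / "non_reduced_results.json", base / "reduced_results.json"
```

Calling `write_eval_data(".", "YOUR_DIALOG_DATASET_NAME", "YOUR_DIALOG_MODEL_NAME", pairs)` from the repository root prepares the input for `inference.sh`; `result_paths` with the same arguments points at the two JSON files listed in step 5.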

metrics/grade/GRADE/config/config_data_for_metric.py

0 → 100644

+80 −0

		import copy
		init_embd_file = './tools/numberbatch-en-19.08.txt'
		pickle_data_dir = './data/convai2'
		max_keyword_length = 16
		max_seq_length = 128
		num_classes = 2
		num_test_data = 150

		vocab_file = './data/DailyDialog/keyword.vocab'
		train_batch_size = 8
		max_train_epoch = 20
		pretrained_epoch = -1
		display_steps = 50 # Print training loss every display_steps; -1 to disable


		eval_steps = 100 # Eval on the dev set every eval_steps; -1 to disable
		# Proportion of training to perform linear learning rate warmup for.
		# E.g., 0.1 = 10% of training.
		warmup_proportion = 0.1
		eval_batch_size = 32
		test_batch_size = 32


		feature_types = {
		    # Reading features from the pickled data file.
		    # E.g., reading feature "input_ids_raw_text" as dtype `int64`;
		    # "stacked_tensor" indicates its length is fixed for all data instances,
		    # so the instances of a batch can be stacked into one tensor;
		    # and the sequence length is limited by `max_seq_length`.
		    "input_ids_raw_text": ["int64", "stacked_tensor", max_seq_length],
		    "input_mask_raw_text": ["int64", "stacked_tensor", max_seq_length],
		    "segment_ids_raw_text": ["int64", "stacked_tensor", max_seq_length],

		    "input_ids_raw_context": ["int64", "stacked_tensor", max_seq_length],
		    "input_mask_raw_context": ["int64", "stacked_tensor", max_seq_length],
		    "segment_ids_raw_context": ["int64", "stacked_tensor", max_seq_length],

		    "input_ids_raw_response": ["int64", "stacked_tensor", max_seq_length],
		    "input_mask_raw_response": ["int64", "stacked_tensor", max_seq_length],
		    "segment_ids_raw_response": ["int64", "stacked_tensor", max_seq_length],
		}


		test_hparam = {
		    "allow_smaller_final_batch": True,
		    "batch_size": test_batch_size,
		    "datasets": [
		        {
		            "files": "{}/test/pair-1/test_text.pkl".format(pickle_data_dir),
		            'data_name': 'pair_1',
		            'data_type': 'record',
		            "feature_types": feature_types,
		        },
		        {
		            "files": "{}/test/pair-1/original_dialog_merge.keyword".format(pickle_data_dir),
		            'data_name': 'keyword_pair_1',
		            'vocab_file': vocab_file,
		            "embedding_init": {
		                "file": init_embd_file,
		                'dim': 300,
		                'read_fn': "load_glove"
		            },
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        },
		        {
		            "files": "{}/test/pair-1/original_dialog_merge.ctx_keyword".format(pickle_data_dir),
		            'data_name': 'ctx_keyword_pair_1',
		            "vocab_share_with": 1,
		            "embedding_init_share_with": 1,
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        },
		        {
		            "files": "{}/test/pair-1/original_dialog_merge.rep_keyword".format(pickle_data_dir),
		            'data_name': 'rep_keyword_pair_1',
		            "vocab_share_with": 1,
		            "embedding_init_share_with": 1,
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        }
		    ],
		    "shuffle": False,
		}
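
Every entry under `"datasets"` points at a file below `pickle_data_dir`, and the tools archive supplies `init_embd_file`, so a missing download only surfaces when the data is loaded. A small pre-flight check can list the missing paths first. This is a sketch under assumptions: `collect_files` and `missing_files` are hypothetical helpers, not part of GRADE or Texar, and the demo dict is a trimmed stand-in for `test_hparam`.

```python
import os
from typing import Dict, List


def collect_files(hparam: Dict) -> List[str]:
    """Return every path listed under a "files" key of hparam["datasets"]."""
    paths: List[str] = []
    for dataset in hparam.get("datasets", []):
        files = dataset.get("files", [])
        # Handle both a single path and a list of paths defensively.
        if isinstance(files, str):
            files = [files]
        paths.extend(files)
    return paths


def missing_files(hparam: Dict) -> List[str]:
    """Return the listed paths that do not exist on disk."""
    return [p for p in collect_files(hparam) if not os.path.exists(p)]


if __name__ == "__main__":
    # Trimmed-down stand-in for test_hparam, for illustration only.
    pickle_data_dir = "./data/convai2"
    demo_hparam = {
        "datasets": [
            {"files": "{}/test/pair-1/test_text.pkl".format(pickle_data_dir)},
            {"files": "{}/test/pair-1/original_dialog_merge.keyword".format(pickle_data_dir)},
        ],
    }
    for path in missing_files(demo_hparam):
        print("missing:", path)
```

The same helpers work on any dict that follows the `"datasets"`/`"files"` shape used in these config files.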

metrics/grade/GRADE/config/config_data_grade.py

0 → 100644

+193 −0

		import copy
		max_train_bert_epoch = 5
		num_train_data = 118528
		pickle_data_dir = './data/DailyDialog'
		train_batch_size = 16
		init_embd_file = './tools/numberbatch-en-19.08.txt'
		max_keyword_length = 16
		max_seq_length = 128
		num_classes = 2

		vocab_file = '{}/keyword.vocab'.format(pickle_data_dir)
		display_steps = 1000 # Print training loss every display_steps; -1 to disable
		save_steps = -1
		eval_steps = 5000 # Eval on the dev set every eval_steps; -1 to disable
		# Proportion of training to perform linear learning rate warmup for.
		# E.g., 0.1 = 10% of training.
		warmup_proportion = 0.1
		eval_batch_size = 16
		test_batch_size = 16

		metric_pickle_data_dir = './data/DailyDialog/daily_metric'

		feature_types = {
		    "input_ids_raw_text": ["int64", "stacked_tensor", max_seq_length],
		    "input_mask_raw_text": ["int64", "stacked_tensor", max_seq_length],
		    "segment_ids_raw_text": ["int64", "stacked_tensor", max_seq_length],

		    "input_ids_raw_context": ["int64", "stacked_tensor", max_seq_length],
		    "input_mask_raw_context": ["int64", "stacked_tensor", max_seq_length],
		    "segment_ids_raw_context": ["int64", "stacked_tensor", max_seq_length],

		    "input_ids_raw_response": ["int64", "stacked_tensor", max_seq_length],
		    "input_mask_raw_response": ["int64", "stacked_tensor", max_seq_length],
		    "segment_ids_raw_response": ["int64", "stacked_tensor", max_seq_length],
		}

		metricData_feature_types = {
		    "input_ids_raw_text": ["int64", "stacked_tensor", max_seq_length],
		    "input_mask_raw_text": ["int64", "stacked_tensor", max_seq_length],
		    "segment_ids_raw_text": ["int64", "stacked_tensor", max_seq_length]
		}



		train_hparam = {
		    "allow_smaller_final_batch": False,
		    "batch_size": train_batch_size,
		    "datasets": [

		        ############################# pair-1 #############################
		        {
		            "files": "{}/train/pair-1/train_text.pkl".format(pickle_data_dir),
		            'data_name': 'pair_1',
		            'data_type': 'record',
		            "feature_types": feature_types,
		        },
		        {
		            "files": "{}/train/pair-1/original_dialog_merge.keyword".format(pickle_data_dir),
		            'data_name': 'keyword_pair_1',
		            'vocab_file': vocab_file,
		            "embedding_init": {
		                "file": init_embd_file,
		                'dim': 300,
		                'read_fn': "load_glove"
		            },
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        },
		        {
		            "files": "{}/train/pair-1/original_dialog_merge.ctx_keyword".format(pickle_data_dir),
		            'data_name': 'ctx_keyword_pair_1',
		            "vocab_share_with": 1,
		            "embedding_init_share_with": 1,
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        },
		        {
		            "files": "{}/train/pair-1/original_dialog_merge.rep_keyword".format(pickle_data_dir),
		            'data_name': 'rep_keyword_pair_1',
		            "vocab_share_with": 1,
		            "embedding_init_share_with": 1,
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        },


		        ############################# pair-2 #############################
		        {
		            "files": "{}/train/pair-2/train_text.pkl".format(pickle_data_dir),
		            'data_name': 'pair_2',
		            'data_type': 'record',
		            "feature_types": feature_types,
		        },
		        {
		            "files": "{}/train/pair-2/perturbed_dialog_merge.keyword".format(pickle_data_dir),
		            'data_name': 'keyword_pair_2',
		            "vocab_share_with": 1,
		            "embedding_init_share_with": 1,
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        },
		        {
		            "files": "{}/train/pair-2/perturbed_dialog_merge.ctx_keyword".format(pickle_data_dir),
		            'data_name': 'ctx_keyword_pair_2',
		            "vocab_share_with": 1,
		            "embedding_init_share_with": 1,
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        },
		        {
		            "files": "{}/train/pair-2/perturbed_dialog_merge.rep_keyword".format(pickle_data_dir),
		            'data_name': 'rep_keyword_pair_2',
		            "vocab_share_with": 1,
		            "embedding_init_share_with": 1,
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        },


		        {
		            "files": "{}/train/gt_preference_label.pkl".format(pickle_data_dir),
		            'data_type': 'record',
		            "feature_types": {
		                'gt_preference_label': ["int64", "stacked_tensor"]
		            },
		        }
		    ],
		    "shuffle": True
		}


		eval_hparam = copy.deepcopy(train_hparam)
		eval_hparam['allow_smaller_final_batch'] = True
		eval_hparam['batch_size'] = eval_batch_size
		eval_hparam['shuffle'] = False
		eval_hparam['datasets'][0]['files'] = "{}/validation/pair-1/validation_text.pkl".format(pickle_data_dir)
		eval_hparam['datasets'][1]['files'] = "{}/validation/pair-1/original_dialog_merge.keyword".format(pickle_data_dir)
		eval_hparam['datasets'][2]['files'] = "{}/validation/pair-1/original_dialog_merge.ctx_keyword".format(pickle_data_dir)
		eval_hparam['datasets'][3]['files'] = "{}/validation/pair-1/original_dialog_merge.rep_keyword".format(pickle_data_dir)
		eval_hparam['datasets'][4]['files'] = "{}/validation/pair-2/validation_text.pkl".format(pickle_data_dir)
		eval_hparam['datasets'][5]['files'] = "{}/validation/pair-2/perturbed_dialog_merge.keyword".format(pickle_data_dir)
		eval_hparam['datasets'][6]['files'] = "{}/validation/pair-2/perturbed_dialog_merge.ctx_keyword".format(pickle_data_dir)
		eval_hparam['datasets'][7]['files'] = "{}/validation/pair-2/perturbed_dialog_merge.rep_keyword".format(pickle_data_dir)
		eval_hparam['datasets'][8]['files'] = "{}/validation/gt_preference_label.pkl".format(pickle_data_dir)


		test_hparam = copy.deepcopy(train_hparam)
		test_hparam['allow_smaller_final_batch'] = True
		test_hparam['batch_size'] = test_batch_size
		test_hparam['shuffle'] = False
		test_hparam['datasets'][0]['files'] = "{}/test/pair-1/test_text.pkl".format(pickle_data_dir)
		test_hparam['datasets'][1]['files'] = "{}/test/pair-1/original_dialog_merge.keyword".format(pickle_data_dir)
		test_hparam['datasets'][2]['files'] = "{}/test/pair-1/original_dialog_merge.ctx_keyword".format(pickle_data_dir)
		test_hparam['datasets'][3]['files'] = "{}/test/pair-1/original_dialog_merge.rep_keyword".format(pickle_data_dir)
		test_hparam['datasets'][4]['files'] = "{}/test/pair-2/test_text.pkl".format(pickle_data_dir)
		test_hparam['datasets'][5]['files'] = "{}/test/pair-2/perturbed_dialog_merge.keyword".format(pickle_data_dir)
		test_hparam['datasets'][6]['files'] = "{}/test/pair-2/perturbed_dialog_merge.ctx_keyword".format(pickle_data_dir)
		test_hparam['datasets'][7]['files'] = "{}/test/pair-2/perturbed_dialog_merge.rep_keyword".format(pickle_data_dir)
		test_hparam['datasets'][8]['files'] = "{}/test/gt_preference_label.pkl".format(pickle_data_dir)
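
The `copy.deepcopy` calls above matter because `train_hparam['datasets']` is a list of nested dicts: a shallow copy would share those inner dicts, so rewriting `'files'` for the validation split would also rewrite the training paths. The sketch below shows the same derivation as a loop. The `derive_split` helper is hypothetical and not part of this commit; it assumes that split paths differ from the training paths only in the `/train/` directory and the `train_text.pkl` file name, which holds for the assignments above.

```python
import copy


def derive_split(train_hparam, split, batch_size):
    """Build the hparams of another split from the training hparams."""
    # Deep copy so that edits to the nested dataset dicts do not leak
    # back into train_hparam.
    hparam = copy.deepcopy(train_hparam)
    hparam["allow_smaller_final_batch"] = True
    hparam["batch_size"] = batch_size
    hparam["shuffle"] = False
    for dataset in hparam["datasets"]:
        path = dataset["files"]
        # Caveat: this also rewrites a "/train/" component of the data
        # directory itself, if there is one.
        path = path.replace("/train/", "/{}/".format(split))
        path = path.replace("train_text.pkl", "{}_text.pkl".format(split))
        dataset["files"] = path
    return hparam


if __name__ == "__main__":
    # Two-entry stand-in for train_hparam, for illustration only.
    demo_train = {
        "allow_smaller_final_batch": False,
        "batch_size": 16,
        "shuffle": True,
        "datasets": [
            {"files": "./data/DailyDialog/train/pair-1/train_text.pkl"},
            {"files": "./data/DailyDialog/train/gt_preference_label.pkl"},
        ],
    }
    demo_eval = derive_split(demo_train, "validation", 16)
    print(demo_eval["datasets"][0]["files"])
    print(demo_train["datasets"][0]["files"])  # unchanged
```

The explicit per-index assignments in the config are more verbose but make every path visible in the diff; the loop trades that for less repetition.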


		metric_hparam = {
		    "allow_smaller_final_batch": True,
		    "batch_size": test_batch_size,
		    "datasets": [
		        {
		            "files": "{}/dialog.pkl".format(metric_pickle_data_dir),
		            'data_name': 'metric',
		            'data_type': 'record',
		            "feature_types": metricData_feature_types,
		        },
		        {
		            "files": "{}/dialog_merge.keyword".format(metric_pickle_data_dir),
		            'data_name': 'keyword_pair_1',
		            'vocab_file': vocab_file,
		            "embedding_init": {
		                "file": init_embd_file,
		                'dim': 300,
		                'read_fn': "load_glove"
		            },
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        },
		        {
		            "files": "{}/dialog_merge.ctx_keyword".format(metric_pickle_data_dir),
		            'data_name': 'ctx_keyword_pair_1',
		            "vocab_share_with": 1,
		            "embedding_init_share_with": 1,
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        },
		        {
		            "files": "{}/dialog_merge.rep_keyword".format(metric_pickle_data_dir),
		            'data_name': 'rep_keyword_pair_1',
		            "vocab_share_with": 1,
		            "embedding_init_share_with": 1,
		            "max_seq_length": max_keyword_length,  # The length does not include any added "bos_token" or "eos_token"
		        },
		    ],
		    "shuffle": False
		}
		No newline at end of file
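
`warmup_proportion` is only a fraction, so the training script has to turn it into a step count. The sketch below shows the usual arithmetic with the values from this config. The training script is not part of this excerpt, so the function name and the formula are assumptions: one step is taken to be one batch of `train_batch_size` examples, and the schedule is taken to span `max_train_bert_epoch` epochs.

```python
def warmup_steps(num_train_data, train_batch_size, num_epochs, warmup_proportion):
    """Return (total_steps, warmup_steps) for a linear warmup schedule."""
    # train_hparam sets allow_smaller_final_batch to False, so a trailing
    # partial batch would be dropped; integer division mirrors that.
    steps_per_epoch = num_train_data // train_batch_size
    total_steps = steps_per_epoch * num_epochs
    return total_steps, int(total_steps * warmup_proportion)


# Values copied from config_data_grade.py.
total, warmup = warmup_steps(
    num_train_data=118528,
    train_batch_size=16,
    num_epochs=5,
    warmup_proportion=0.1,
)
print(total, warmup)  # 37040 3704
```

With these settings the learning rate would ramp up over the first 3704 of 37040 steps, that is, roughly the first half of the first epoch.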