Commit 642e6f73 authored by chrysanthopoulou's avatar chrysanthopoulou

Add grade repository as subfolder

parent 4ef1d85f
+6 −0
@@ -8,6 +8,12 @@
.local
.npm

# Ignore virtual environments
grade_venv/

# Ignore older python / other program versions
python=3.6

# Ignore models-folder except readme
models/*
!models/README.md
+5 −0
output/
*.pyc
__pycache__
tools/
.vscode
 No newline at end of file
+108 −0
# **GRADE**: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems


This repository contains the source code for the following paper:


[GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems](https://arxiv.org/abs/2010.03994)   
Lishan Huang, Zheng Ye, Jinghui Qin, Xiaodan Liang; EMNLP 2020

## Model Overview
![GRADE](images/GRADE.png)

## Prerequisites
Create a virtual environment (recommended):
```
conda create -n GRADE python=3.6
source activate GRADE
```
Install the required packages:
```
pip install -r requirements.txt
```

Install Texar locally:
```
cd texar-pytorch
pip install .
```

Note: Make sure that **CUDA 10.1** is installed in your environment.

## Data Preparation
GRADE is trained on the DailyDialog dataset proposed by [Li et al., 2017](https://arxiv.org/abs/1710.03957).

For convenience, we provide the [processed data](https://drive.google.com/file/d/1sj3Z_GZfYzrhmleWazA-QawhUEhlNmJd/view?usp=sharing) of DailyDialog; download it and unzip it into the `data` directory. You should also download [tools](https://drive.google.com/file/d/1CaRhHnO0YsQHOnJsmMUJuL4w9HXJZQYw/view?usp=sharing) and unzip it into the root directory of this repo.

If you want to prepare the training data from scratch, follow these steps:
1. Install [Lucene](https://lucene.apache.org/);
2. Run the preprocessing script:
```
cd ./script
bash preprocess_training_dataset.sh
```


## Training
To train GRADE, please run the following script:
```
cd ./script
bash train.sh
```

Note that the [checkpoint](https://drive.google.com/file/d/1v9o-fSohFDegicakrSEnKNcKliOqhYfH/view?usp=sharing) of our final GRADE is also provided. You can download it and unzip it into the root directory.

## Evaluation
We evaluate GRADE and other baseline metrics on three chit-chat datasets (DailyDialog, ConvAI2 and EmpatheticDialogues). The corresponding evaluation data in the `evaluation` directory has the following file structure:
```
.
└── evaluation
    ├── eval_data
    │   └── DIALOG_DATASET_NAME
    │       └── DIALOG_MODEL_NAME
    │           ├── human_ctx.txt
    │           └── human_hyp.txt
    └── human_score
        ├── DIALOG_DATASET_NAME
        │   └── DIALOG_MODEL_NAME
        │       └── human_score.txt
        └── human_judgement.json
```
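Each `human_ctx.txt` / `human_hyp.txt` pair holds the dialogue contexts and the corresponding model responses. A minimal loading sketch, assuming the two files are line-aligned (the helper name is illustrative, not part of this repo):

```python
def load_eval_pairs(ctx_path, hyp_path):
    """Read line-aligned context/response files into (context, response) pairs."""
    with open(ctx_path, encoding="utf-8") as f_ctx, \
         open(hyp_path, encoding="utf-8") as f_hyp:
        contexts = [line.rstrip("\n") for line in f_ctx]
        responses = [line.rstrip("\n") for line in f_hyp]
    # Every context line should have exactly one response line.
    assert len(contexts) == len(responses), "ctx/hyp files must be line-aligned"
    return list(zip(contexts, responses))
```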
Note: the entire human judgement dataset we propose for metric evaluation is in `human_judgement.json`.


To evaluate GRADE, please run the following script:
```
cd ./script
bash eval.sh
```

## Using GRADE
To use GRADE on your own dialog dataset:
1. Put the whole dataset (raw data) into `./preprocess/dataset`;
2. Update the **load_dataset** function in `./preprocess/extract_keywords.py` to load your dataset;
3. Prepare the context-response data that you want to evaluate and convert it into the following format:
```
.
└── evaluation
    └── eval_data
        └── YOUR_DIALOG_DATASET_NAME
            └── YOUR_DIALOG_MODEL_NAME
                ├── human_ctx.txt
                └── human_hyp.txt
```
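Producing that layout for your own data can be scripted. The sketch below is illustrative (the helper name, and the assumption that the two files are line-aligned, are mine):

```python
import os

def write_eval_data(root, dataset_name, model_name, pairs):
    """Write (context, response) pairs as line-aligned human_ctx.txt / human_hyp.txt."""
    out_dir = os.path.join(root, "evaluation", "eval_data", dataset_name, model_name)
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "human_ctx.txt"), "w", encoding="utf-8") as f_ctx, \
         open(os.path.join(out_dir, "human_hyp.txt"), "w", encoding="utf-8") as f_hyp:
        for context, response in pairs:
            f_ctx.write(context + "\n")
            f_hyp.write(response + "\n")
```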
4. Run the following script to evaluate the context-response data with GRADE:
```
cd ./script
bash inference.sh
```
5. Lastly, the scores given by GRADE can be found as follows:
```
.
└── evaluation
    └── infer_result
        └── YOUR_DIALOG_DATASET_NAME
            └── YOUR_DIALOG_MODEL_NAME
                ├── non_reduced_results.json
                └── reduced_results.json
```
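The schema of the two result files is not documented here; assuming `non_reduced_results.json` holds a flat JSON list of per-sample scores (an assumption on my part, not confirmed by this repo), a quick summary could be computed as:

```python
import json
import statistics

def summarize_scores(path):
    """Summarize a GRADE result file, assuming it holds a JSON list of floats."""
    with open(path, encoding="utf-8") as f:
        scores = json.load(f)
    # Report the range and mean of the per-sample coherence scores.
    return min(scores), statistics.mean(scores), max(scores)
```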
+80 −0
import copy
init_embd_file = './tools/numberbatch-en-19.08.txt'
pickle_data_dir = './data/convai2'
max_keyword_length = 16
max_seq_length = 128
num_classes = 2
num_test_data = 150

vocab_file = './data/DailyDialog/keyword.vocab'
train_batch_size = 8
max_train_epoch = 20
pretrained_epoch = -1
display_steps = 50  # Print training loss every display_steps; -1 to disable


eval_steps = 100  # Eval on the dev set every eval_steps; -1 to disable
# Proportion of training to perform linear learning rate warmup for.
# E.g., 0.1 = 10% of training.
warmup_proportion = 0.1
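# In typical warmup schedulers this proportion is applied as:
#   num_warmup_steps = int(num_train_steps * warmup_proportion)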
eval_batch_size = 32
test_batch_size = 32


feature_types = {
    # Reading features from pickled data file.
    # E.g., Reading feature "input_ids" as dtype `int64`;
    # "FixedLenFeature" indicates its length is fixed for all data instances;
    # and the sequence length is limited by `max_seq_length`.
    "input_ids_raw_text": ["int64", "stacked_tensor", max_seq_length],
    "input_mask_raw_text": ["int64", "stacked_tensor", max_seq_length],
    "segment_ids_raw_text": ["int64", "stacked_tensor", max_seq_length],

    "input_ids_raw_context": ["int64", "stacked_tensor", max_seq_length],
    "input_mask_raw_context": ["int64", "stacked_tensor", max_seq_length],
    "segment_ids_raw_context": ["int64", "stacked_tensor", max_seq_length],

    "input_ids_raw_response": ["int64", "stacked_tensor", max_seq_length],
    "input_mask_raw_response": ["int64", "stacked_tensor", max_seq_length],
    "segment_ids_raw_response": ["int64", "stacked_tensor", max_seq_length],
}


test_hparam = {
    "allow_smaller_final_batch": True,
    "batch_size": test_batch_size,
    "datasets": [
        {
            "files": "{}/test/pair-1/test_text.pkl".format(pickle_data_dir),
            'data_name': 'pair_1',
            'data_type': 'record',
            "feature_types": feature_types,
        },
        {
            "files": "{}/test/pair-1/original_dialog_merge.keyword".format(pickle_data_dir),
            'data_name': 'keyword_pair_1',
            'vocab_file': vocab_file,
            "embedding_init": {
                "file": init_embd_file,
                'dim':300,
                'read_fn':"load_glove"
            },
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        },
        {
            "files": "{}/test/pair-1/original_dialog_merge.ctx_keyword".format(pickle_data_dir),
            'data_name': 'ctx_keyword_pair_1',
            "vocab_share_with":1,
            "embedding_init_share_with":1,
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        },
        {
            "files": "{}/test/pair-1/original_dialog_merge.rep_keyword".format(pickle_data_dir),
            'data_name': 'rep_keyword_pair_1',
            "vocab_share_with":1,
            "embedding_init_share_with":1,
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        }
    ],
    "shuffle": False,
}
+193 −0
import copy
max_train_bert_epoch = 5
num_train_data = 118528
pickle_data_dir = './data/DailyDialog'
train_batch_size = 16
init_embd_file = './tools/numberbatch-en-19.08.txt'
max_keyword_length = 16 
max_seq_length = 128
num_classes = 2

vocab_file = '{}/keyword.vocab'.format(pickle_data_dir)
display_steps = 1000 # Print training loss every display_steps; -1 to disable
save_steps = -1
eval_steps = 5000 # Eval on the dev set every eval_steps; -1 to disable
# Proportion of training to perform linear learning rate warmup for.
# E.g., 0.1 = 10% of training.
warmup_proportion = 0.1
eval_batch_size = 16
test_batch_size = 16

metric_pickle_data_dir = './data/DailyDialog/daily_metric'

feature_types = {
    "input_ids_raw_text": ["int64", "stacked_tensor", max_seq_length],
    "input_mask_raw_text": ["int64", "stacked_tensor", max_seq_length],
    "segment_ids_raw_text": ["int64", "stacked_tensor", max_seq_length],

    "input_ids_raw_context": ["int64", "stacked_tensor", max_seq_length],
    "input_mask_raw_context": ["int64", "stacked_tensor", max_seq_length],
    "segment_ids_raw_context": ["int64", "stacked_tensor", max_seq_length],

    "input_ids_raw_response": ["int64", "stacked_tensor", max_seq_length],
    "input_mask_raw_response": ["int64", "stacked_tensor", max_seq_length],
    "segment_ids_raw_response": ["int64", "stacked_tensor", max_seq_length],
}

metricData_feature_types = {
    "input_ids_raw_text": ["int64", "stacked_tensor", max_seq_length],
    "input_mask_raw_text": ["int64", "stacked_tensor", max_seq_length],
    "segment_ids_raw_text": ["int64", "stacked_tensor", max_seq_length]
}



train_hparam = {
    "allow_smaller_final_batch": False,
    "batch_size": train_batch_size,
    "datasets": [

        ############################# pair-1 #############################
        {
            "files": "{}/train/pair-1/train_text.pkl".format(pickle_data_dir),
            'data_name': 'pair_1',
            'data_type': 'record',
            "feature_types": feature_types,
        },
        {
            "files": "{}/train/pair-1/original_dialog_merge.keyword".format(pickle_data_dir),
            'data_name': 'keyword_pair_1',
            'vocab_file': vocab_file,
            "embedding_init": {
                "file": init_embd_file,
                'dim':300,
                'read_fn':"load_glove"
            },
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        },
        {
            "files": "{}/train/pair-1/original_dialog_merge.ctx_keyword".format(pickle_data_dir),
            'data_name': 'ctx_keyword_pair_1',
            "vocab_share_with":1,
            "embedding_init_share_with":1,
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        },
        {
            "files": "{}/train/pair-1/original_dialog_merge.rep_keyword".format(pickle_data_dir),
            'data_name': 'rep_keyword_pair_1',
            "vocab_share_with":1,
            "embedding_init_share_with":1,
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        },


        ############################# pair-2 #############################
        {
            "files": "{}/train/pair-2/train_text.pkl".format(pickle_data_dir),
            'data_name': 'pair_2',
            'data_type': 'record',
            "feature_types": feature_types,
        },
        {
            "files": "{}/train/pair-2/perturbed_dialog_merge.keyword".format(pickle_data_dir),
            'data_name': 'keyword_pair_2',
            "vocab_share_with":1,
            "embedding_init_share_with":1,
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        },
        {
            "files": "{}/train/pair-2/perturbed_dialog_merge.ctx_keyword".format(pickle_data_dir),
            'data_name': 'ctx_keyword_pair_2',
            "vocab_share_with":1,
            "embedding_init_share_with":1,
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        },
        {
            "files": "{}/train/pair-2/perturbed_dialog_merge.rep_keyword".format(pickle_data_dir),
            'data_name': 'rep_keyword_pair_2',
            "vocab_share_with":1,
            "embedding_init_share_with":1,
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        },


        {
            "files": "{}/train/gt_preference_label.pkl".format(pickle_data_dir),
            'data_type': 'record',
            "feature_types": {
                'gt_preference_label': ["int64", "stacked_tensor"]
            },
        }
    ],
    "shuffle": True
}


eval_hparam = copy.deepcopy(train_hparam)
eval_hparam['allow_smaller_final_batch'] = True
eval_hparam['batch_size'] = eval_batch_size
eval_hparam['shuffle'] = False
eval_hparam['datasets'][0]['files'] = "{}/validation/pair-1/validation_text.pkl".format(pickle_data_dir)
eval_hparam['datasets'][1]['files'] = "{}/validation/pair-1/original_dialog_merge.keyword".format(pickle_data_dir)
eval_hparam['datasets'][2]['files'] = "{}/validation/pair-1/original_dialog_merge.ctx_keyword".format(pickle_data_dir)
eval_hparam['datasets'][3]['files'] = "{}/validation/pair-1/original_dialog_merge.rep_keyword".format(pickle_data_dir)
eval_hparam['datasets'][4]['files'] = "{}/validation/pair-2/validation_text.pkl".format(pickle_data_dir)
eval_hparam['datasets'][5]['files'] = "{}/validation/pair-2/perturbed_dialog_merge.keyword".format(pickle_data_dir)
eval_hparam['datasets'][6]['files'] = "{}/validation/pair-2/perturbed_dialog_merge.ctx_keyword".format(pickle_data_dir)
eval_hparam['datasets'][7]['files'] = "{}/validation/pair-2/perturbed_dialog_merge.rep_keyword".format(pickle_data_dir)
eval_hparam['datasets'][8]['files'] = "{}/validation/gt_preference_label.pkl".format(pickle_data_dir)


test_hparam = copy.deepcopy(train_hparam)
test_hparam['allow_smaller_final_batch'] = True
test_hparam['batch_size'] = test_batch_size
test_hparam['shuffle'] = False
test_hparam['datasets'][0]['files'] = "{}/test/pair-1/test_text.pkl".format(pickle_data_dir)
test_hparam['datasets'][1]['files'] = "{}/test/pair-1/original_dialog_merge.keyword".format(pickle_data_dir)
test_hparam['datasets'][2]['files'] = "{}/test/pair-1/original_dialog_merge.ctx_keyword".format(pickle_data_dir)
test_hparam['datasets'][3]['files'] = "{}/test/pair-1/original_dialog_merge.rep_keyword".format(pickle_data_dir)
test_hparam['datasets'][4]['files'] = "{}/test/pair-2/test_text.pkl".format(pickle_data_dir)
test_hparam['datasets'][5]['files'] = "{}/test/pair-2/perturbed_dialog_merge.keyword".format(pickle_data_dir)
test_hparam['datasets'][6]['files'] = "{}/test/pair-2/perturbed_dialog_merge.ctx_keyword".format(pickle_data_dir)
test_hparam['datasets'][7]['files'] = "{}/test/pair-2/perturbed_dialog_merge.rep_keyword".format(pickle_data_dir)
test_hparam['datasets'][8]['files'] = "{}/test/gt_preference_label.pkl".format(pickle_data_dir)
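# Note: copy.deepcopy (rather than a shallow dict.copy) is essential above:
# a shallow copy would share the nested dataset dicts, so overriding the
# eval/test 'files' entries would silently mutate train_hparam as well.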


metric_hparam = {
    "allow_smaller_final_batch": True,
    "batch_size": test_batch_size,
    "datasets": [
        {
            "files": "{}/dialog.pkl".format(metric_pickle_data_dir),
            'data_name': 'metric',
            'data_type': 'record',
            "feature_types": metricData_feature_types,
        },
        {
            "files": "{}/dialog_merge.keyword".format(metric_pickle_data_dir),
            'data_name': 'keyword_pair_1',
            'vocab_file': vocab_file,
            "embedding_init": {
                "file": init_embd_file,
                'dim':300,
                'read_fn':"load_glove"
            },
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        },
        {
            "files": "{}/dialog_merge.ctx_keyword".format(metric_pickle_data_dir),
            'data_name': 'ctx_keyword_pair_1',
            "vocab_share_with":1,
            "embedding_init_share_with":1,
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        },
        {
            "files": "{}/dialog_merge.rep_keyword".format(metric_pickle_data_dir),
            'data_name': 'rep_keyword_pair_1',
            "vocab_share_with":1,
            "embedding_init_share_with":1,
            "max_seq_length": max_keyword_length, #The length does not include any added "bos_token" or "eos_token"
        },
    ],
    "shuffle": False
}
 No newline at end of file