Commit 391b6af7 authored by friebolin's avatar friebolin

Finish references

parent ddef8b4d
## 📚 Project documents <a name="documents"></a>
This README gives a rough overview of the project. The full documentation and additional information can be found in the documents listed below.

- 📝 [Research Plan](documentation/organization/research_plan.pdf)
- 🧭 [Specification Presentation](documentation/organization/specification_presentation.pdf)
- 📖 [Project Report](LINK) <!-- TODO: add link -->
- 🎤 [Final Presentation](LINK) <!-- TODO: add link -->

***

## 🔎 Metonymy Resolution <a name="metonymy"></a>
A metonymy is the replacement of the actual expression by another one that is closely associated with it [^4].

Metonymies use a contiguity relation between two domains.


**Metonymy resolution** is about determining whether a potentially metonymic word is used metonymically in a particular context. In this project we focus on `metonymic` and `literal` readings for locations and organizations. 

ℹ️ Sentences that allow for mixed readings, where both a literal and a metonymic sense are evoked, are considered `non-literal` in this project. This is true of the following sentence, in which the term *Nigeria* prompts both a metonymic and a literal reading.
    
- "They arrived in **Nigeria**, hitherto a leading critic of [...]" [^8]

➡️ Hence, we use the two classes `non-literal` and `literal` for our binary classification task.

***

Consequently, it is a vital technique for evaluating the robustness of models.
***

## 💡 Methods <a name="methods"></a>
When selecting methods for our task, the main goal was to find a trade-off between label-preserving methods and diversifying our dataset. Since the language models BERT [^3] and RoBERTa [^7] have not been found to profit from very basic augmentation strategies (e.g. case changing of single characters or embedding replacements [^1]), we chose more innovative and challenging methods.

To be able to compare the influence of augmentations in different spaces, we select one method for the data space and two methods for the feature space.

### 📝 1. Backtranslation (Data Space)<a name="backtranslation"></a>
As a comparatively safe (= label-preserving) data augmentation strategy, we selected *backtranslation* using the machine translation model Fairseq [^9]. Adapting the approach of Chen et al. [^2], we use the pre-trained single models:

    - [`transformer.wmt19.en-de.single_model`](https://huggingface.co/facebook/wmt19-en-de)
    - [`transformer.wmt19.de-en.single_model`](https://huggingface.co/facebook/wmt19-de-en)
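The round trip these two models perform can be sketched as follows. This is a minimal illustration of the control flow only: `en_de` and `de_en` are hypothetical stand-in callables, not the fairseq models themselves, so the sketch runs without downloading any weights.

```python
# Sketch of the backtranslation round trip (EN -> DE -> EN).
# In the real pipeline, en_de / de_en would be the two fairseq WMT'19
# single models listed above; here they are toy stand-in callables.

def backtranslate(sentence, en_de, de_en):
    """Translate a sentence into the pivot language and back again."""
    pivot = en_de(sentence)      # English -> German
    paraphrase = de_en(pivot)    # German  -> English
    return paraphrase

# Toy stand-ins mimicking a round trip:
en_de = {"They arrived in Nigeria.": "Sie kamen in Nigeria an."}.get
de_en = {"Sie kamen in Nigeria an.": "They arrived in Nigeria."}.get

result = backtranslate("They arrived in Nigeria.", en_de, de_en)
# result -> "They arrived in Nigeria."
```

With the actual models, the two callables would be the `translate` methods of the checkpoints loaded via `torch.hub` (e.g. `torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model', tokenizer='moses', bpe='fastbpe')`).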


### 🍸 2. MixUp (Feature Space)<a name="mixup"></a>
Our method adopts the framework of the *MixUp* transformer proposed by Sun et al. [^10]. This approach involves interpolating the representations of two instances on the last hidden state of the transformer model (in our case, `BERT-base-uncased`).

To derive the interpolated hidden representation and corresponding label, we use the following formulas on the representation of two data samples:

$$\hat{x} = \lambda T(x_i) + (1- \lambda)T(x_j)$$

$$\hat{y} = \lambda T(y_i) + (1- \lambda)T(y_j)$$

Here, $T(x_i)$ and $T(x_j)$ represent the hidden representations of the two instances, and $T(y_i)$ and $T(y_j)$ represent their corresponding labels. $\lambda$ is a mixing coefficient that determines the degree of the interpolation.

We used a fixed $\lambda$ which was set for the entire training process. The derived instance $\hat{x}$, with the derived label $\hat{y}$ as its new true label, is then passed to the classifier to generate a prediction.
The *MixUp* process can be applied dynamically during training at any epoch.
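The two interpolation formulas above can be written as a few lines of NumPy. This is a toy sketch: 2-dimensional vectors stand in for BERT's last hidden states, and one-hot vectors stand in for the `literal` / `non-literal` labels.

```python
import numpy as np

def mixup(h_i, h_j, y_i, y_j, lam):
    """Linearly interpolate two hidden representations and their one-hot labels."""
    x_hat = lam * h_i + (1.0 - lam) * h_j   # x_hat = lam*T(x_i) + (1-lam)*T(x_j)
    y_hat = lam * y_i + (1.0 - lam) * y_j   # y_hat = lam*T(y_i) + (1-lam)*T(y_j)
    return x_hat, y_hat

# Toy 2-d "hidden states" with one-hot labels (literal vs. non-literal):
h_i, y_i = np.array([1.0, 0.0]), np.array([1.0, 0.0])
h_j, y_j = np.array([0.0, 1.0]), np.array([0.0, 1.0])

x_hat, y_hat = mixup(h_i, h_j, y_i, y_j, lam=0.4)
# x_hat -> [0.4, 0.6], y_hat -> [0.4, 0.6]
```

Note that the mixed label $\hat{y}$ is no longer one-hot, which is why the classifier is trained against it as a soft target.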

***

## 🗃️ Data <a name="data"></a>
The datasets used in this project are taken from Li et al. [^6] We confine ourselves to the following three:

| 1. **SemEval: Locations** [^8] | 2. **SemEval: Companies & Organizations** [^8] | 3. **ReLocar: Locations** [^5] |
| ---------------- | -------------------------------- | ----------------------------------------- |
| <img src="documentation/images/semeval_loc_metonym_ratio.png"> | <img src="documentation/images/semeval_org_metonym_ratio.png"> | <img src="documentation/images/relocar_metonym_ratio.png"> |
    
🖊️ **Data Point Example:** 
For `<COMMAND>` you must enter one of the commands you find in the list below.

## 📑 References <a name="references"></a>

[^1]: Bayer, Markus, Kaufhold, Marc-André & Reuter, Christian. ["A survey on data augmentation for text classification."](https://arxiv.org/abs/2107.03158) CoRR, 2021.

[^2]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["Semi-supervised models via data augmentation for classifying interactive affective responses."](https://arxiv.org/abs/2004.10972) 2020.

[^3]: Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton & Toutanova, Kristina. ["BERT: pre-training of deep bidirectional transformers for language understanding."](http://arxiv.org/abs/1810.04805) CoRR, 2018.

[^4]: Oxford English Dictionary. ["Metonymy."](https://www.oxfordbibliographies.com/view/document/obo-9780199772810/obo-9780199772810-0252.xml)

[^5]: Gritta, Milan, Pilehvar, Mohammad Taher, Limsopatham, Nut & Collier, Nigel. ["Vancouver welcomes you! Minimalist location metonymy resolution."](https://aclanthology.org/P17-1115) Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017.

[^6]: Li, Haonan, Vasardani, Maria, Tomko, Martin & Baldwin, Timothy. ["Target word masking for location metonymy resolution."](https://aclanthology.org/2020) Proceedings of the 28th International Conference on Computational Linguistics, December 2020.

[^7]: Liu, Yinhan, Ott, Myle, Goyal, Naman, Du, Jingfei, Joshi, Mandar, Chen, Danqi, Levy, Omer, Lewis, Mike, Zettlemoyer, Luke & Stoyanov, Veselin. ["RoBERTa: A robustly optimized BERT pretraining approach."](https://dblp.org/rec/journals/corr/abs-1907-11692.bib) CoRR, 2019.

[^8]: Markert, Katja & Nissim, Malvina. ["SemEval-2007 task 08: Metonymy resolution at SemEval-2007."](https://aclanthology.org/S07-1007) Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), 2007.

[^9]: Ott, Myle, Edunov, Sergey, Baevski, Alexei, Fan, Angela, Gross, Sam, Ng, Nathan, Grangier, David & Auli, Michael. ["fairseq: A fast, extensible toolkit for sequence modeling."](https://aclanthology.org/N19-4009) Proceedings of NAACL-HLT 2019: Demonstrations, 2019.

[^10]: Sun, Lichao, Xia, Congying, Yin, Wenpeng, Liang, Tingting, Yu, Philip S. & He, Lifang. ["Mixup-transformer: dynamic data augmentation for NLP tasks."](https://arxiv.org/abs/2010.02394) 2020.