Commit 391b6af7 authored by friebolin's avatar friebolin

Finish references

parent ddef8b4d
## 📚 Project documents <a name="documents"></a>
This README gives a rough overview of the project. The full documentation and additional information can be found in the documents listed below.

- 📝 [Research Plan](documentation/organization/research_plan.pdf)
- 🧭 [Specification Presentation](documentation/organization/specification_presentation.pdf)
- 📖 [Project Report](LINK) <!-- TODO: add link -->
- 🎤 [Final Presentation](LINK) <!-- TODO: add link -->

***

## 🔎 Metonymy Resolution <a name="metonymy"></a>
A metonymy is the replacement of the actual expression by another one that is closely associated with it [^4].

Metonymies use a contiguity relation between two domains.


**Metonymy resolution** is about determining whether a potentially metonymic word is used metonymically in a particular context. In this project we focus on `metonymic` and `literal` readings for locations and organizations. 

ℹ️ Sentences that allow for mixed readings, where both a literal and a metonymic sense are evoked, are considered `non-literal` in this project. This is true of the following sentence, in which the term *Nigeria* prompts both a metonymic and a literal reading.
    
- "They arrived in **Nigeria**, hitherto a leading critic of [...]" [^8]

➡️ Hence, we use the two classes `non-literal` and `literal` for our binary classification task.

***

Consequently, it is a vital technique for evaluating the robustness of models.
***

## 💡 Methods <a name="methods"></a>
When selecting methods for our task, the main goal was to find a trade-off between label-preserving methods and diversifying our dataset. Since the language models BERT [^3] and RoBERTa [^7] have not been found to profit from very basic augmentation strategies (e.g. case changing of single characters or embedding replacements [^1]), we chose more innovative and challenging methods.

To be able to compare the influence of augmentations in different spaces, we select one method for the data space and two methods for the feature space.

### 📝 1. Backtranslation (Data Space)<a name="backtranslation"></a>
As a comparatively safe (= label-preserving) data augmentation strategy, we selected *backtranslation* using the machine translation model Fairseq [^9]. Adapting the approach of Chen et al. [^2], we use the pre-trained single models:

    - [`transformer.wmt19.en-de.single_model`](https://huggingface.co/facebook/wmt19-en-de)
    - [`transformer.wmt19.de-en.single_model`](https://huggingface.co/facebook/wmt19-de-en)
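The round trip these two models perform can be sketched as follows. This is a minimal illustration of the control flow only: `en_de` and `de_en` are hypothetical stand-in callables, not the fairseq models themselves, so the sketch runs without downloading any weights.

```python
# Sketch of the backtranslation round trip (EN -> DE -> EN).
# In the real pipeline, en_de / de_en would be the two fairseq WMT'19
# single models listed above; here they are toy stand-in callables.

def backtranslate(sentence, en_de, de_en):
    """Translate a sentence into the pivot language and back again."""
    pivot = en_de(sentence)      # English -> German
    paraphrase = de_en(pivot)    # German  -> English
    return paraphrase

# Toy stand-ins mimicking a round trip:
en_de = {"They arrived in Nigeria.": "Sie kamen in Nigeria an."}.get
de_en = {"Sie kamen in Nigeria an.": "They arrived in Nigeria."}.get

result = backtranslate("They arrived in Nigeria.", en_de, de_en)
# result -> "They arrived in Nigeria."
```

With the actual models, the two callables would be the `translate` methods of the checkpoints loaded via `torch.hub` (e.g. `torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model', tokenizer='moses', bpe='fastbpe')`).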


### 🍸 2. MixUp (Feature Space)<a name="mixup"></a>
Our method adopts the framework of the *MixUp* transformer proposed by Sun et al. [^10]. This approach involves interpolating the representations of two instances on the last hidden state of the transformer model (in our case, `BERT-base-uncased`).

To derive the interpolated hidden representation and corresponding label, we use the following formulas on the representation of two data samples:

$$\hat{x} = \lambda T(x_i) + (1- \lambda)T(x_j)$$

$$\hat{y} = \lambda T(y_i) + (1- \lambda)T(y_j)$$

Here, $T(x_i)$ and $T(x_j)$ represent the hidden representations of the two instances, and $T(y_i)$ and $T(y_j)$ represent their corresponding labels. $\lambda$ is a mixing coefficient that determines the degree of the interpolation.

We used a fixed $\lambda$ which was set for the entire training process. The derived instance $\hat{x}$, with the derived label $\hat{y}$ as its new true label, is then passed to the classifier to generate a prediction.
The *MixUp* process can be applied dynamically during training at any epoch.
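The two interpolation formulas above can be written as a few lines of NumPy. This is a toy sketch: 2-dimensional vectors stand in for BERT's last hidden states, and one-hot vectors stand in for the `literal` / `non-literal` labels.

```python
import numpy as np

def mixup(h_i, h_j, y_i, y_j, lam):
    """Linearly interpolate two hidden representations and their one-hot labels."""
    x_hat = lam * h_i + (1.0 - lam) * h_j   # x_hat = lam*T(x_i) + (1-lam)*T(x_j)
    y_hat = lam * y_i + (1.0 - lam) * y_j   # y_hat = lam*T(y_i) + (1-lam)*T(y_j)
    return x_hat, y_hat

# Toy 2-d "hidden states" with one-hot labels (literal vs. non-literal):
h_i, y_i = np.array([1.0, 0.0]), np.array([1.0, 0.0])
h_j, y_j = np.array([0.0, 1.0]), np.array([0.0, 1.0])

x_hat, y_hat = mixup(h_i, h_j, y_i, y_j, lam=0.4)
# x_hat -> [0.4, 0.6], y_hat -> [0.4, 0.6]
```

Note that the mixed label $\hat{y}$ is no longer one-hot, which is why the classifier is trained against it as a soft target.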

***

## 🗃️ Data <a name="data"></a>
The datasets used in this project are taken from Li et al. [^6] We confine ourselves to the following three:

| 1. **SemEval: Locations** [^8] | 2. **SemEval: Companies & Organizations** [^8] | 3. **ReLocar: Locations** [^5] |
| ---------------- | -------------------------------- | ----------------------------------------- |
| <img src="documentation/images/semeval_loc_metonym_ratio.png"> | <img src="documentation/images/semeval_org_metonym_ratio.png"> | <img src="documentation/images/relocar_metonym_ratio.png"> |
    
🖊️ **Data Point Example:** 
For `<COMMAND>` you must enter one of the commands you find in the list below.

## 📑 References <a name="references"></a>

[^1]: Bayer, Markus, Kaufhold, Marc-André & Reuter, Christian. ["A survey on data augmentation for text classification."](https://arxiv.org/abs/2107.03158) CoRR, 2021.

[^2]: Chen, Jiaao, Wu, Yuwei & Yang, Diyi. ["Semi-supervised models via data augmentation for classifying interactive affective responses."](https://arxiv.org/abs/2004.10972) 2020.

[^3]: Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton & Toutanova, Kristina. ["BERT: pre-training of deep bidirectional transformers for language understanding."](http://arxiv.org/abs/1810.04805) CoRR, 2018.

[^4]: Oxford English Dictionary. ["Metonymy."](https://www.oxfordbibliographies.com/view/document/obo-9780199772810/obo-9780199772810-0252.xml)

[^5]: Gritta, Milan, Pilehvar, Mohammad Taher, Limsopatham, Nut & Collier, Nigel. ["Vancouver welcomes you! Minimalist location metonymy resolution."](https://aclanthology.org/P17-1115) Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017.

[^6]: Li, Haonan, Vasardani, Maria, Tomko, Martin & Baldwin, Timothy. ["Target word masking for location metonymy resolution."](https://aclanthology.org/2020) Proceedings of the 28th International Conference on Computational Linguistics, December 2020.

[^7]: Liu, Yinhan, Ott, Myle, Goyal, Naman, Du, Jingfei, Joshi, Mandar, Chen, Danqi, Levy, Omer, Lewis, Mike, Zettlemoyer, Luke & Stoyanov, Veselin. ["RoBERTa: A robustly optimized BERT pretraining approach."](https://dblp.org/rec/journals/corr/abs-1907-11692.bib) CoRR, 2019.

[^8]: Markert, Katja & Nissim, Malvina. ["SemEval-2007 task 08: Metonymy resolution at SemEval-2007."](https://aclanthology.org/S07-1007) Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), 2007.

[^9]: Ott, Myle, Edunov, Sergey, Baevski, Alexei, Fan, Angela, Gross, Sam, Ng, Nathan, Grangier, David & Auli, Michael. ["fairseq: A fast, extensible toolkit for sequence modeling."](https://aclanthology.org/N19-4009) Proceedings of NAACL-HLT 2019: Demonstrations, 2019.

[^10]: Sun, Lichao, Xia, Congying, Yin, Wenpeng, Liang, Tingting, Yu, Philip S. & He, Lifang. ["Mixup-transformer: dynamic data augmentation for NLP tasks."](https://arxiv.org/abs/2010.02394) 2020.