@@ -215,11 +215,11 @@ Clustering the verbs for a specific relation. In `all_guesses_cluster` all resul
### Baselines
The baselines used were a majority baseline, a random prediction, and a random classification weighted by category frequency. The **majority baseline** was implemented as the verb that occurred most frequently in the gold standard: affect (relation: OBJECTIVE). The **random prediction** was randomly drawn from the set of gold verbs - implemented both as a draw from a **uniform distribution** as well as a draw **weighted by occurrence** in the gold standard. It was drawn only from the set of gold verbs and not from the sets of all, respectively, the most frequent English verbs; not to mention a random draw from all English words. Thus, these random baselines are still far above what BERT would predict by a completely naive, semantics-illiterate prediction.
The baselines used were a majority baseline, a random prediction, and a random classification weighted by category frequency. The **majority baseline** was implemented as the verb that occurred most frequently in the gold standard: *affect* for the fine-grained relations (relation: *OBJECTIVE*) and *done* for the coarse-grained relations (relation: *CAUSAL*). The **random prediction** was drawn from the set of gold verbs, implemented both as a draw from a **uniform distribution** and as a draw **weighted by occurrence** in the gold standard. The draw was restricted to the gold verbs; it was not taken from the set of all English verbs or of the most frequent English verbs, let alone from all English words. Thus, these random baselines are still considerably stronger than a completely naive, semantics-illiterate prediction would be.
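The three baselines described above can be sketched as follows. This is only a minimal illustration, not the code used in the project: the function names, the `toy_gold` list, and the fixed seed are assumptions made for the example, and the toy data does not reflect the real gold standard.

```python
import random
from collections import Counter


def majority_baseline(gold_verbs):
    """Predict the most frequent gold verb for every item."""
    most_common_verb, _ = Counter(gold_verbs).most_common(1)[0]
    return [most_common_verb] * len(gold_verbs)


def uniform_random_baseline(gold_verbs, rng):
    """Draw each prediction uniformly from the set of distinct gold verbs."""
    vocabulary = sorted(set(gold_verbs))
    return [rng.choice(vocabulary) for _ in gold_verbs]


def weighted_random_baseline(gold_verbs, rng):
    """Draw each prediction from the gold verbs, weighted by their frequency."""
    counts = Counter(gold_verbs)
    verbs = list(counts)
    weights = [counts[verb] for verb in verbs]
    return rng.choices(verbs, weights=weights, k=len(gold_verbs))


def accuracy(predictions, gold_verbs):
    """Fraction of items where the prediction equals the gold verb."""
    matches = sum(p == g for p, g in zip(predictions, gold_verbs))
    return matches / len(gold_verbs)


# Hypothetical toy gold standard, for illustration only.
toy_gold = ["affect", "affect", "affect", "done", "done", "use"]
rng = random.Random(0)  # fixed seed so the random baselines are repeatable

majority_predictions = majority_baseline(toy_gold)
uniform_predictions = uniform_random_baseline(toy_gold, rng)
weighted_predictions = weighted_random_baseline(toy_gold, rng)
```

Because all three baselines draw only from the gold verbs, calling `accuracy` on each prediction list gives scores that are directly comparable with a model evaluated against the same gold standard.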
#### Fine Grained
The results below indicate that BERT's prediction for the fine-grained relations is significantly better than a uniform random prediction as well as a stratified random prediction. However, the prediction is below the majority baseline. Even though, the most frequent category accounts for only 6,6% of the data.
The results below indicate that BERT's prediction for the fine-grained relations is significantly better than both a uniform random prediction and a stratified random prediction. However, the prediction is below the majority baseline, even though *OBJECTIVE*, the most frequent relation, represented by the gold verb *affect*, accounts for only 6.6% of the data.
<div align="center">
@@ -235,7 +235,7 @@ The results below indicate that BERT's prediction for the fine-grained relations
#### Coarse Grained
Consistent with the results of the significance tests for the fine-grained relations, the coarse-grained prediction is significantly better than the two randomized baselines, but is below the majority baseline. Even though, the most frequent relation CAUSAL with the gold verb x accounts for only 12.6% of the data.
Consistent with the results of the significance tests for the fine-grained relations, the coarse-grained prediction is significantly better than the two randomized baselines but below the majority baseline, even though the most frequent relation, *CAUSAL*, represented by the gold verb *done*, accounts for only 12.6% of the data.