Commit 22ccde5f authored by wernicke

Explain plots in README

parent d1a051d4
+3 −3
@@ -215,11 +215,11 @@ Clustering the verbs for a specific relation. In `all_guesses_cluster` all resul

### Baselines

-The baselines used were a majority baseline, a random prediction, and a random classification weighted by category frequency. The **majority baseline** was implemented as the verb that occurred most frequently in the gold standard: affect (relation: OBJECTIVE). The **random prediction** was randomly drawn from the set of gold verbs - implemented both as a draw from a **uniform distribution** as well as a draw **weighted by occurrence** in the gold standard. It was drawn only from the set of gold verbs and not from the sets of all, respectively, the most frequent English verbs; not to mention a random draw from all English words. Thus, these random baselines are still far above what BERT would predict by a completely naive, semantics-illiterate prediction.
+The baselines used were a majority baseline, a random prediction, and a random classification weighted by category frequency. The **majority baseline** was implemented as the verb that occurred most frequently in the gold standard: *affect* and *done* (relations: *OBJECTIVE* and *CAUSAL*). The **random prediction** was randomly drawn from the set of gold verbs - implemented both as a draw from a **uniform distribution** as well as a draw **weighted by occurrence** in the gold standard. It was drawn only from the set of gold verbs and not from the sets of all, respectively, the most frequent English verbs; not to mention a random draw from all English words. Thus, these random baselines are still far above what BERT would predict by a completely naive, semantics-illiterate prediction.

#### Fine Grained

-The results below indicate that BERT's prediction for the fine-grained relations is significantly better than a uniform random prediction as well as a stratified random prediction. However, the prediction is below the majority baseline. Even though, the most frequent category accounts for only 6,6% of the data.
+The results below indicate that BERT's prediction for the fine-grained relations is significantly better than a uniform random prediction as well as a stratified random prediction. However, the prediction is below the majority baseline. Even though, *OBJECTIVE*, the most frequent relation represented by the gold verb *affect*, accounts for only 6,6% of the data.

<div align="center">

@@ -235,7 +235,7 @@ The results below indicate that BERT's prediction for the fine-grained relations


#### Coarse Grained
-Consistent with the results of the significance tests for the fine-grained relations, the coarse-grained prediction is significantly better than the two randomized baselines, but is below the majority baseline. Even though, the most frequent relation CAUSAL with the gold verb x accounts for only 12.6% of the data.
+Consistent with the results of the significance tests for the fine-grained relations, the coarse-grained prediction is significantly better than the two randomized baselines, but is below the majority baseline. Even though, the most frequent relation *CAUSAL* with the gold verb *done* accounts for only 12.6% of the data.

<div align="center">