Verified Commit b64d6f5f authored by Jakob Moser's avatar Jakob Moser

Add links to Huggingface impls

parent f0db648d

1. [BERT](https://arxiv.org/abs/1810.04805)
   * [Original TensorFlow implementation](https://github.com/google-research/bert/blob/master/run_pretraining.py)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/bert)
   * > Training BERT_BASE on 4 cloud TPU (16 TPU chips total) took 4 days, at an estimated cost of 500 USD. Training BERT_LARGE on 16 cloud TPU (64 TPU chips total) took 4 days. (says [Wikipedia](https://en.wikipedia.org/wiki/BERT_(language_model)))
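   * A minimal sketch of using the Huggingface implementation linked above for BERT's masked-language-modelling objective (the checkpoint name `bert-base-uncased` and the `transformers`/`torch` dependencies are assumptions, not part of this list):

     ```python
     import torch
     from transformers import AutoTokenizer, AutoModelForMaskedLM

     # Assumed checkpoint; any BERT-style model on the Hub should work the same way.
     tok = AutoTokenizer.from_pretrained("bert-base-uncased")
     model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

     # Score candidate fillers for the [MASK] position.
     inputs = tok("The capital of France is [MASK].", return_tensors="pt")
     with torch.no_grad():
         logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

     mask_pos = (inputs["input_ids"][0] == tok.mask_token_id).nonzero().item()
     prediction = tok.decode(logits[0, mask_pos].argmax())
     print(prediction)
     ```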

## Encoder-Decoder

1. [T5](https://arxiv.org/abs/1910.10683)
   * [Original TensorFlow implementation](https://github.com/google-research/text-to-text-transfer-transformer)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/t5)
   * [JAX/Flax implementation](https://github.com/google-research/t5x)
   * How expensive is training?
2. [BART](https://aclanthology.org/2020.acl-main.703/)
   * [Original PyTorch implementation](https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/bart/model.py)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/bart)
   * How expensive is training?
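
Both encoder-decoder models above can be driven through the same Huggingface seq2seq interface; a minimal sketch using T5's text-to-text format (the checkpoint name `t5-small` and the prompt prefix are assumptions taken from the T5 docs, not from this list):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed checkpoint; swapping in a BART checkpoint works the same way
# (BART does not use T5's task prefixes, though).
tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# T5 frames every task as text-to-text via a task prefix.
inputs = tok("translate English to German: How are you?", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
text = tok.decode(out[0], skip_special_tokens=True)
print(text)
```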

## Decoder-only
1. [GPT2](https://arxiv.org/abs/1908.09203)
   * [Original TensorFlow implementation](https://github.com/openai/gpt-2)
   * [Re-implementation in JAX/Equinox by Becky and Jakob](https://gitlab.cl.uni-heidelberg.de/moser/emotion)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/gpt2)
   * How expensive is training?
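
A minimal sketch of greedy text generation with the Huggingface GPT-2 implementation linked above (the checkpoint name `gpt2` and the prompt are assumptions for illustration):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed checkpoint: the smallest (124M-parameter) GPT-2 on the Hub.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Hello, I'm a language model,"
inputs = tok(prompt, return_tensors="pt")

# Greedy decoding; generate() returns the prompt followed by new tokens.
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
text = tok.decode(out[0])
print(text)
```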