Preferably, the models should also be popular. This is not a hard requirement, b…

1. [BERT](https://arxiv.org/abs/1810.04805)
   * [Original TensorFlow implementation](https://github.com/google-research/bert/blob/master/run_pretraining.py)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/bert)
   * > Training BERT_BASE on 4 cloud TPU (16 TPU chips total) took 4 days, at an estimated cost of 500 USD. Training BERT_LARGE on 16 cloud TPU (64 TPU chips total) took 4 days. (says [Wikipedia](https://en.wikipedia.org/wiki/BERT_(language_model)))

## Encoder-Decoder

1. [T5](https://arxiv.org/abs/1910.10683)
   * [Original TensorFlow implementation](https://github.com/google-research/text-to-text-transfer-transformer)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/t5)
   * [JAX/Flax implementation](https://github.com/google-research/t5x)
   * How expensive is training?
2. [BART](https://aclanthology.org/2020.acl-main.703/)
   * [Original PyTorch implementation](https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/bart/model.py)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/bart)
   * How expensive is training?

## Decoder-only

1. [GPT2](https://arxiv.org/abs/1908.09203)
   * [Original TensorFlow implementation](https://github.com/openai/gpt-2)
   * [Re-implementation in JAX/Equinox by Becky and Jakob](https://gitlab.cl.uni-heidelberg.de/moser/emotion)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/gpt2)
   * How expensive is training?
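A quick way to poke at any of the Huggingface implementations linked above without downloading pretrained weights is to instantiate a randomly initialized model from a small config. A minimal sketch, using GPT2 as the example; the tiny hyperparameters here are arbitrary choices for fast experimentation, not the published GPT2 sizes:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny, arbitrary hyperparameters -- NOT the published GPT2 configuration.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=1000)
model = GPT2LMHeadModel(config)  # randomly initialized, no weight download

# A dummy batch of 8 token ids; logits come back as (batch, sequence, vocab).
input_ids = torch.randint(0, config.vocab_size, (1, 8))
with torch.no_grad():
    logits = model(input_ids).logits
print(logits.shape)
```

The same pattern works for the other models (e.g. `BertConfig`/`BertModel`, `T5Config`/`T5ForConditionalGeneration`, `BartConfig`/`BartForConditionalGeneration`), which is handy for comparing implementations architecturally before committing to a full training run.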
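The Wikipedia figures quoted above for BERT also give a back-of-envelope answer to the "how expensive is training?" question for BERT_LARGE. This is a naive extrapolation, not a reported number; it assumes the same cost per cloud-TPU-day for both runs:

```python
# Back-of-envelope extrapolation from the quoted Wikipedia figures.
base_cost_usd = 500        # BERT_BASE: 4 cloud TPUs for 4 days, ~500 USD
base_tpu_days = 4 * 4      # 16 cloud-TPU-days
rate = base_cost_usd / base_tpu_days   # USD per cloud-TPU-day

large_tpu_days = 16 * 4    # BERT_LARGE: 16 cloud TPUs for 4 days
large_cost_usd = rate * large_tpu_days
print(rate, large_cost_usd)  # 31.25, 2000.0
```

Under that assumption, BERT_LARGE would land around 2,000 USD; actual cloud pricing varies with TPU generation and region.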