Preferably, the models should also be popular. This is not a hard requirement, b…

1. [BERT](https://arxiv.org/abs/1810.04805)
   * [Original TensorFlow implementation](https://github.com/google-research/bert/blob/master/run_pretraining.py)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/bert)
   * > Training BERT_BASE on 4 cloud TPU (16 TPU chips total) took 4 days, at an estimated cost of 500 USD. Training BERT_LARGE on 16 cloud TPU (64 TPU chips total) took 4 days. (says [Wikipedia](https://en.wikipedia.org/wiki/BERT_(language_model)))

## Encoder-Decoder

1. [T5](https://arxiv.org/abs/1910.10683)
   * [Original TensorFlow implementation](https://github.com/google-research/text-to-text-transfer-transformer)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/t5)
   * [JAX/Flax implementation](https://github.com/google-research/t5x)
   * How expensive is training?
2. [BART](https://aclanthology.org/2020.acl-main.703/)
   * [Original PyTorch implementation](https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/bart/model.py)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/bart)
   * How expensive is training?

## Decoder-only

1. [GPT2](https://arxiv.org/abs/1908.09203)
   * [Original TensorFlow implementation](https://github.com/openai/gpt-2)
   * [Re-implementation in JAX/Equinox by Becky and Jakob](https://gitlab.cl.uni-heidelberg.de/moser/emotion)
   * [Huggingface PyTorch implementation](https://huggingface.co/docs/transformers/v5.3.0/en/model_doc/gpt2)
   * How expensive is training?
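A quick way to poke at any of the Huggingface implementations linked above without downloading pretrained weights is to instantiate a randomly initialized model from a small config. A minimal sketch, using GPT2 as the example; the tiny hyperparameters here are arbitrary choices for fast experimentation, not the published GPT2 sizes:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny, arbitrary hyperparameters -- NOT the published GPT2 configuration.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=1000)
model = GPT2LMHeadModel(config)  # randomly initialized, no weight download

# A dummy batch of 8 token ids; logits come back as (batch, sequence, vocab).
input_ids = torch.randint(0, config.vocab_size, (1, 8))
with torch.no_grad():
    logits = model(input_ids).logits
print(logits.shape)
```

The same pattern works for the other models (e.g. `BertConfig`/`BertModel`, `T5Config`/`T5ForConditionalGeneration`, `BartConfig`/`BartForConditionalGeneration`), which is handy for comparing implementations architecturally before committing to a full training run.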
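The Wikipedia figures quoted above for BERT also give a back-of-envelope answer to the "how expensive is training?" question for BERT_LARGE. This is a naive extrapolation, not a reported number; it assumes the same cost per cloud-TPU-day for both runs:

```python
# Back-of-envelope extrapolation from the quoted Wikipedia figures.
base_cost_usd = 500        # BERT_BASE: 4 cloud TPUs for 4 days, ~500 USD
base_tpu_days = 4 * 4      # 16 cloud-TPU-days
rate = base_cost_usd / base_tpu_days   # USD per cloud-TPU-day

large_tpu_days = 16 * 4    # BERT_LARGE: 16 cloud TPUs for 4 days
large_cost_usd = rate * large_tpu_days
print(rate, large_cost_usd)  # 31.25, 2000.0
```

Under that assumption, BERT_LARGE would land around 2,000 USD; actual cloud pricing varies with TPU generation and region.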