Fine-tune GPT (without LoRA) on some fine-tuning dataset
We already have GPT-2, and we already have the classification head.
@laewen found a training loop in the Equinox documentation; maybe I can take inspiration from that: Examples → Advanced → BERT language model
Here we should apply the usual optimizations (e.g. @jax.jit) so the training step is compiled rather than re-traced in Python on every batch.
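A minimal sketch of what such a jitted training step could look like, in plain JAX. Everything here is a hypothetical stand-in: the random "features" play the role of GPT-2 hidden states, and the dict of `w`/`b` stands in for our classification head; the actual Equinox version would use `eqx.filter_jit` and `eqx.filter_value_and_grad` on the model instead, as in the BERT example.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, feats, labels):
    # Linear classification head over (frozen) features, softmax cross-entropy.
    logits = feats @ params["w"] + params["b"]
    logp = jax.nn.log_softmax(logits)
    return -jnp.mean(jnp.take_along_axis(logp, labels[:, None], axis=1))

@jax.jit  # the optimization mentioned above: compile the whole training step
def train_step(params, feats, labels, lr=0.1):
    loss, grads = jax.value_and_grad(loss_fn)(params, feats, labels)
    # Plain SGD update; a real run would use an optax optimizer instead.
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

key_f, key_l = jax.random.split(jax.random.PRNGKey(0))
feats = jax.random.normal(key_f, (32, 8))          # stand-in for GPT-2 hidden states
labels = jax.random.randint(key_l, (32,), 0, 2)    # stand-in fine-tuning labels
params = {"w": jnp.zeros((8, 2)), "b": jnp.zeros(2)}

for _ in range(50):
    params, loss = train_step(params, feats, labels)
```

Only the first call pays the tracing/compilation cost; the remaining 49 iterations reuse the compiled step, which is where @jax.jit pays off in a real fine-tuning loop.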
Edited by Jakob Moser