LLM Laboratory

18.08.2025 · general

TensorFlow-friendly causal LMs in Transformers 4.x

Overview

The table below lists decoder-only (causal LM) model families that have TensorFlow classes in Transformers 4.x and whose listed checkpoints publish TensorFlow weights (tf_model.h5) on the Hugging Face Hub.
If tf_model.h5 is present for a checkpoint, you can load it directly with TFAutoModelForCausalLM.from_pretrained(...), with no conversion from PyTorch.

| Model family | Hugging Face repo | TF weights present? | Conversion needed? |
| --- | --- | --- | --- |
| OpenAI GPT (GPT‑1) | openai-community/openai-gpt | Yes (tf_model.h5 available) | No |
| GPT‑2 (family) | openai-community/gpt2 | Yes (tf_model.h5 available) | No |
| DistilGPT‑2 | distilbert/distilgpt2 | Yes (tf_model.h5 available) | No |
| DialoGPT (GPT‑2 based) | microsoft/DialoGPT-medium | Yes (tf_model.h5 available) | No |
| CTRL | Salesforce/ctrl | Yes (tf_model.h5 available) | No |
| Transformer‑XL | transfo-xl/transfo-xl-wt103 | Yes (tf_model.h5 available) | No |
| XLNet | xlnet/xlnet-base-cased | Yes (tf_model.h5 available) | No |
| XLM (CLM variant) | FacebookAI/xlm-clm-ende-1024 | Yes (tf_model.h5 available) | No |
| OPT (small) | facebook/opt-125m | Yes (tf_model.h5 available) | No |
| OPT (2.7B) | facebook/opt-2.7b | Yes (tf_model.h5 available) | No |
| GPT‑J‑6B | EleutherAI/gpt-j-6b | Yes (tf_model.h5 available) | No |
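
As a minimal sketch of what "no conversion needed" looks like in practice, the snippet below loads one of the listed checkpoints with TFAutoModelForCausalLM and greedily continues a prompt. The helper name generate_text and its default of 20 new tokens are illustrative choices, not part of the library; it assumes transformers<5 and TensorFlow are installed.

```python
def generate_text(repo_id: str, prompt: str, max_new_tokens: int = 20) -> str:
    """Load a TF causal LM plus its tokenizer and greedily continue `prompt`.

    `generate_text` is a hypothetical helper written for this example.
    """
    # Imported lazily so the module can be imported without TensorFlow present.
    from transformers import AutoTokenizer, TFAutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    # No from_pt flag: the checkpoint is expected to ship TensorFlow weights.
    model = TFAutoModelForCausalLM.from_pretrained(repo_id)

    inputs = tokenizer(prompt, return_tensors="tf")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding, so repeated runs give the same text
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the checkpoint on first use):
# print(generate_text("distilbert/distilgpt2", "TensorFlow is"))
```

Any repo from the table can be passed as repo_id. Larger checkpoints such as GPT‑J‑6B need correspondingly more memory and download time.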

Notes

  • Transformers v4.x only. TensorFlow classes (e.g., TFAutoModelForCausalLM, TFGPT2LMHeadModel) are part of the 4.x line. In Transformers v5, TF support has been removed. Pin transformers<5 for TF usage.
  • Per‑checkpoint variation. Some orgs publish TF weights for certain sizes but not others. Always check the target checkpoint’s Files & versions tab.
    • GPT‑2 medium/large/xl also provide TF weights (tf_model.h5).
  • If there is no tf_model.h5 but the family has a TF class, you can often load the PyTorch weights with from_pt=True (PyTorch must be installed for this) and then call save_pretrained(...) to create a local tf_model.h5.
  • Modern LLMs (e.g., LLaMA, Mistral, Mixtral, Gemma) generally do not ship TF classes in Transformers; they are PyTorch‑only in the library.
  • Keras 3 compatibility: with TF ≥ 2.16, install tf-keras~=2.16 and set TF_USE_LEGACY_KERAS=1 before importing TensorFlow, so that Transformers 4.x runs against the legacy Keras 2 API rather than Keras 3.
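
The notes above can be combined into one small loading routine. This is a sketch under stated assumptions rather than a definitive recipe: pick_load_kwargs and load_tf_causal_lm are hypothetical helpers, and the file check is simplified to the single-file names tf_model.h5 and pytorch_model.bin (sharded and safetensors-only checkpoints are not handled). It relies on list_repo_files from huggingface_hub to read the same listing shown in the Files & versions tab.

```python
import os

# Must be set before TensorFlow is imported anywhere in the process.
# Relevant for TF >= 2.16 with the tf-keras package installed.
os.environ.setdefault("TF_USE_LEGACY_KERAS", "1")


def pick_load_kwargs(repo_files):
    """Choose from_pretrained kwargs from a checkpoint's file listing.

    Simplifying assumption: only single-file weights are recognised.
    """
    names = set(repo_files)
    if "tf_model.h5" in names:
        return {}  # TensorFlow weights published: load directly
    if "pytorch_model.bin" in names:
        return {"from_pt": True}  # convert from PyTorch on load
    raise ValueError("no tf_model.h5 or pytorch_model.bin in this checkpoint")


def load_tf_causal_lm(repo_id, save_dir=None):
    """Load a TF causal LM, converting from PyTorch only when necessary."""
    # Lazy imports keep the environment variable above ahead of TensorFlow.
    from huggingface_hub import list_repo_files
    from transformers import TFAutoModelForCausalLM

    kwargs = pick_load_kwargs(list_repo_files(repo_id))
    model = TFAutoModelForCausalLM.from_pretrained(repo_id, **kwargs)
    if kwargs.get("from_pt") and save_dir is not None:
        # Persist the converted TensorFlow weights so later loads skip PyTorch.
        model.save_pretrained(save_dir)
    return model


# Example call (network access required):
# model = load_tf_causal_lm("openai-community/gpt2")
```

The from_pt=True path only works for families that have a TF class and requires PyTorch to be installed alongside TensorFlow.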