Date: 18.08.2025
TensorFlow-friendly causal LMs in 🤗 Transformers 4.x
This table lists decoder-only (causal LM) model families that have TensorFlow classes in Transformers 4.x and publish TensorFlow weights (`tf_model.h5`) on the Hugging Face Hub.
If `tf_model.h5` is present for the checkpoint, you can load it with `TFAutoModelForCausalLM.from_pretrained(...)` without converting from PyTorch.
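For example, a minimal sketch loading GPT‑2 (one of the checkpoints in the table below) directly into its TF class and generating a few tokens:

```python
# Sketch: load TF weights directly (no from_pt conversion), then generate.
# Assumes transformers<5 and TensorFlow are installed.
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = TFAutoModelForCausalLM.from_pretrained("openai-community/gpt2")

inputs = tokenizer("TensorFlow still works here:", return_tensors="tf")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```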
| Model family | Hugging Face repo | TF weights present? (Conversion needed?) |
|---|---|---|
| OpenAI GPT (GPT‑1) | openai-community/openai-gpt | Yes — tf_model.h5 available → no conversion needed |
| GPT‑2 (family) | openai-community/gpt2 | Yes — tf_model.h5 available → no conversion needed |
| DistilGPT‑2 | distilbert/distilgpt2 | Yes — tf_model.h5 available → no conversion needed |
| DialoGPT (GPT‑2 based) | microsoft/DialoGPT-medium | Yes — tf_model.h5 available → no conversion needed |
| CTRL | Salesforce/ctrl | Yes — tf_model.h5 available → no conversion needed |
| Transformer‑XL | transfo-xl/transfo-xl-wt103 | Yes — tf_model.h5 available → no conversion needed |
| XLNet | xlnet/xlnet-base-cased | Yes — tf_model.h5 available → no conversion needed |
| XLM (CLM variant) | FacebookAI/xlm-clm-ende-1024 | Yes — tf_model.h5 available → no conversion needed |
| OPT (small) | facebook/opt-125m | Yes — tf_model.h5 available → no conversion needed |
| OPT (2.7B) | facebook/opt-2.7b | Yes — tf_model.h5 available → no conversion needed |
| GPT‑J‑6B | EleutherAI/gpt-j-6b | Yes — tf_model.h5 available → no conversion needed |
Notes
- Transformers v4.x only. TensorFlow classes (e.g., `TFAutoModelForCausalLM`, `TFGPT2LMHeadModel`) are part of the 4.x line. In Transformers v5, TF support has been removed, so pin `transformers<5` for TF usage.
- Per‑checkpoint variation. Some orgs publish TF weights for certain sizes but not others. Always check the target checkpoint's Files & versions tab, or query it programmatically (first sketch below).
- GPT‑2 `medium`/`large`/`xl` also provide TF weights (`tf_model.h5`).
- If there is no `tf_model.h5` but the family has a TF class, you can often load from PyTorch with `from_pt=True` and then call `save_pretrained(...)` to create a local `tf_model.h5` (second sketch below).
- Modern LLMs (e.g., LLaMA, Mistral, Mixtral, Gemma) generally do not ship TF classes in Transformers; they are PyTorch‑only in the library.
- Keras 3 compatibility: with TF ≥ 2.16, install `tf-keras~=2.16` and set `TF_USE_LEGACY_KERAS=1` before importing TensorFlow so `transformers` 4.x works smoothly (third sketch below).
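To check for TF weights without opening the Hub UI, you can list a repo's files with `huggingface_hub`. A minimal sketch (the repo id is just an example taken from the table):

```python
# Sketch: check whether a checkpoint publishes tf_model.h5,
# mirroring what the Files & versions tab shows.
from huggingface_hub import list_repo_files

repo_id = "facebook/opt-125m"  # example repo from the table above
has_tf_weights = "tf_model.h5" in list_repo_files(repo_id)
print(f"{repo_id}: tf_model.h5 present = {has_tf_weights}")
```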
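Next, a sketch of the `from_pt=True` conversion route. The repo id here is a hypothetical placeholder standing in for any checkpoint that has a TF class but ships only PyTorch weights:

```python
# Sketch: create a local tf_model.h5 from PyTorch-only weights.
# Works only for families that have a TF class in transformers 4.x.
from transformers import TFAutoModelForCausalLM

# "some-org/pytorch-only-gpt2" is a hypothetical placeholder repo id.
model = TFAutoModelForCausalLM.from_pretrained("some-org/pytorch-only-gpt2", from_pt=True)
model.save_pretrained("./local-tf-checkpoint")  # writes tf_model.h5 next to config.json
```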
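Finally, a sketch of the Keras 3 workaround. The environment variable must be set before TensorFlow (or anything that imports it) is loaded:

```python
# Sketch: force legacy Keras 2 behavior on TF >= 2.16
# (requires `pip install tf-keras`).
import os
os.environ["TF_USE_LEGACY_KERAS"] = "1"  # must happen before importing TensorFlow

import tensorflow as tf
from transformers import TFAutoModelForCausalLM  # now runs against legacy Keras
```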