requirements : update transformers to 5.5.1 (#21617)
* requirements : update transformers to 5.5.0

  This commit updates the transformers dependency to version 5.5.0. The
  motivation for this is that transformers 5.5.0 includes support for
  Gemma4 and is required to be able to convert Gemma4 models. This is
  also causing issues for users of gguf-my-repo.

  Refs: https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/202

* fix huggingface_hub version

* set version of transformers to 5.5.0

* convert : add ty ignore directives to convert_hf_to_gguf.py

  This commit adds `ty: ignore` directives to transformers tokenizer
  fields/methods to avoid type check errors. There might be better ways
  to handle this, and perhaps this can be done in a follow-up commit.

  The motivation for this is that in transformers 5.5.0,
  AutoTokenizer.from_pretrained can return generic tokenizer types or
  None, and the type checker now produces an error when the conversion
  script accesses fields like tokenizer.vocab.

* convert : add ty ignore to suppress type check errors

* convert : remove incorrect type ignores

* convert : fix remaining python checks

  I was running a newer version of ty locally, but I've switched to
  version 0.0.26, which is what CI uses, and I was then able to
  reproduce the errors. Sorry about the noise.

* update transformers version to 5.5.1
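For context, a minimal sketch of the failure mode and the suppression pattern the diff below applies; the model id is a placeholder, not one used by the script, and the exact diagnostics depend on the ty version:

    from transformers import AutoTokenizer

    # Under transformers 5.5.x the static return type of from_pretrained is a
    # broad/generic tokenizer union (possibly None), so a checker like ty can
    # no longer resolve fast-tokenizer attributes on the result.
    tokenizer = AutoTokenizer.from_pretrained("org/model")  # placeholder model id

    # Per-line suppression, as used in this commit:
    assert tokenizer.is_fast  # ty: ignore[unresolved-attribute]
    print(tokenizer.vocab_size)  # ty: ignore[unresolved-attribute]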
@@ -543,7 +543,7 @@ class LlamaHfVocab(Vocab):
             cache_dir=base_path,
             local_files_only=True,
         )
-        assert self.tokenizer.is_fast  # assume tokenizer.json is used
+        assert self.tokenizer.is_fast  # assume tokenizer.json is used # ty: ignore[unresolved-attribute]
 
         # Initialize lists and dictionaries for added tokens
         self.added_tokens_list = []
@@ -552,30 +552,30 @@ class LlamaHfVocab(Vocab):
 
         # Process added tokens
         for tok, tokidx in sorted(
-            self.tokenizer.get_added_vocab().items(), key=lambda x: x[1]
+            self.tokenizer.get_added_vocab().items(), key=lambda x: x[1]  # ty: ignore[unresolved-attribute]
         ):
             # Only consider added tokens that are not in the base vocabulary
-            if tokidx >= self.tokenizer.vocab_size:
+            if tokidx >= self.tokenizer.vocab_size:  # ty: ignore[unresolved-attribute]
                 self.added_tokens_list.append(tok)
                 self.added_tokens_dict[tok] = tokidx
                 self.added_tokens_ids.add(tokidx)
 
         # Store special tokens and their IDs
         self.specials = {
-            tok: self.tokenizer.get_vocab()[tok]
-            for tok in self.tokenizer.all_special_tokens
+            tok: self.tokenizer.get_vocab()[tok]  # ty: ignore[unresolved-attribute]
+            for tok in self.tokenizer.all_special_tokens  # ty: ignore[unresolved-attribute]
         }
-        self.special_ids = set(self.tokenizer.all_special_ids)
+        self.special_ids = set(self.tokenizer.all_special_ids)  # ty: ignore[unresolved-attribute]
 
         # Set vocabulary sizes
-        self.vocab_size_base = self.tokenizer.vocab_size
+        self.vocab_size_base = self.tokenizer.vocab_size  # ty: ignore[unresolved-attribute]
         self.vocab_size = self.vocab_size_base + len(self.added_tokens_list)
 
         self.fname_tokenizer = fname_tokenizer
 
     def hf_tokens(self) -> Iterable[tuple[bytes, float, gguf.TokenType]]:
         reverse_vocab = {
-            id: encoded_tok for encoded_tok, id in self.tokenizer.get_vocab().items()
+            id: encoded_tok for encoded_tok, id in self.tokenizer.get_vocab().items()  # ty: ignore[unresolved-attribute]
         }
 
         for token_id in range(self.vocab_size_base):
@@ -616,7 +616,7 @@ class LlamaHfVocab(Vocab):
             yield text.encode("utf-8"), score, toktype
 
     def has_newline_token(self):
-        return "<0x0A>" in self.tokenizer.vocab or "\n" in self.tokenizer.vocab
+        return "<0x0A>" in self.tokenizer.vocab or "\n" in self.tokenizer.vocab  # ty: ignore[unresolved-attribute]
 
     def all_tokens(self) -> Iterable[tuple[bytes, float, gguf.TokenType]]:
         yield from self.hf_tokens()
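As the commit message notes, there might be better ways to handle this than per-line ignores. One possible follow-up, sketched here under the assumption that the conversion path only ever needs a fast tokenizer (the code already asserts is_fast), is to narrow the type once at construction time instead:

    from transformers import AutoTokenizer, PreTrainedTokenizerFast

    tokenizer = AutoTokenizer.from_pretrained("org/model")  # placeholder model id
    if not isinstance(tokenizer, PreTrainedTokenizerFast):
        raise TypeError("expected a fast tokenizer backed by tokenizer.json")
    # After the isinstance check the type is concrete, so accesses like
    # tokenizer.get_vocab() or tokenizer.vocab_size type-check without ignores.
    vocab = tokenizer.get_vocab()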