TokenizeVocabulary
2025-07-04
Applies to Linux
Constructor
A vocabulary object used to tokenize input text.

Parameters
    text : cudf string series
        The strings to be tokenized.
    delimiter : str
        Delimiter to identify tokens. Default is whitespace.
    default_id : int
        Value to use for tokens not found in the vocabulary. Default is -1.
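The parameters above can be illustrated with a minimal pure-Python sketch of vocabulary lookup. This is not the library's implementation and does not run on the GPU. The function name `tokenize_with_vocabulary` is hypothetical, and two behaviors are assumptions made only for illustration: a token's id is taken to be its position in the vocabulary, and an empty `delimiter` is taken to mean "split on whitespace".

```python
def tokenize_with_vocabulary(vocabulary, text, delimiter="", default_id=-1):
    """Map each token in each input string to an integer id (illustrative only)."""
    # Assumption: a token's id is its index in the vocabulary list.
    token_to_id = {token: idx for idx, token in enumerate(vocabulary)}

    result = []
    for s in text:
        # Assumption: an empty delimiter means "split on whitespace".
        tokens = s.split() if delimiter == "" else s.split(delimiter)
        # Tokens missing from the vocabulary receive default_id.
        result.append([token_to_id.get(tok, default_id) for tok in tokens])
    return result


vocab = ["brown", "fox", "quick", "the"]
ids = tokenize_with_vocabulary(vocab, ["the quick brown fox", "the lazy fox"])
print(ids)  # [[3, 2, 0, 1], [3, -1, 1]]
```

In the second string, "lazy" is not in the vocabulary, so it is mapped to the default id of -1. Passing a different `default_id` (for example, the id of an unknown-token entry) changes only the value used for such tokens.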