TokenizeVocabulary

2025-07-04

Applies to Linux

`TokenizeVocabulary`(vocabulary)	A vocabulary object used to tokenize input text.
`TokenizeVocabulary.tokenize`(text[, ...])
    Parameters
    text (cudf string series): The strings to be tokenized.
    delimiter (str): Delimiter to identify tokens. Default is whitespace.
    default_id (int): Value to use for tokens not found in the vocabulary. Default is -1.
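
The behavior described by these parameters can be sketched with a small pure-Python model. This is not the cuDF implementation and does not run on the GPU. The function name `tokenize_with_vocabulary` is hypothetical, and the sketch assumes that a token's id is its position in the vocabulary, which the summary above does not state.

```python
def tokenize_with_vocabulary(texts, vocabulary, delimiter=None, default_id=-1):
    """Pure-Python model of vocabulary tokenization (illustrative only)."""
    # Assumption: a token's id is its position in the vocabulary.
    token_ids = {token: index for index, token in enumerate(vocabulary)}
    result = []
    for text in texts:
        # With delimiter=None, str.split splits on runs of whitespace,
        # mirroring the documented whitespace default.
        tokens = text.split(delimiter)
        # Tokens missing from the vocabulary map to default_id.
        result.append([token_ids.get(token, default_id) for token in tokens])
    return result


vocabulary = ["brown", "fox", "quick", "the"]
texts = ["the quick brown fox", "the lazy fox"]
ids = tokenize_with_vocabulary(texts, vocabulary)
print(ids)  # [[3, 2, 0, 1], [3, -1, 1]]
```

In the second string, "lazy" is not in the vocabulary, so it receives the `default_id` value of -1.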