tokens

Full name: tenets.utils.tokens
Token utilities.
Lightweight helpers for token counting and text chunking used across the project. When available, this module uses the optional tiktoken package for accurate tokenization. If tiktoken is not installed, a conservative heuristic (~4 characters per token) is used instead.

Notes:

- This module is dependency-light by design; tiktoken is optional.
- The fallback heuristic intentionally overestimates in some cases to keep chunk sizes well under model limits.
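The functions below rely on a module-private helper, `_get_encoding_for_model`, whose source is not reproduced on this page. The sketch below illustrates the optional-import pattern the notes describe; the encoding names and error handling are assumptions for illustration, not the project's exact implementation.

```python
# Illustrative sketch of the optional tiktoken import described above.
# The real _get_encoding_for_model is not shown on this page; encoding
# names and error handling here are assumptions.
from typing import Optional

try:
    import tiktoken  # accurate tokenization when installed
except ImportError:  # dependency-light fallback
    tiktoken = None


def _get_encoding_for_model(model: Optional[str] = None):
    """Return a tiktoken encoding, or None to signal the ~4 chars/token heuristic."""
    if tiktoken is None:
        return None
    try:
        if model:
            return tiktoken.encoding_for_model(model)
        return tiktoken.get_encoding("cl100k_base")
    except Exception:
        # Unknown model names: fall back to the heuristic rather than fail.
        return None
```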
Functions

count_tokens

Approximate the number of tokens in a string.

Uses tiktoken for accurate counts when available; otherwise falls back to a simple heuristic (~4 characters per token).
| PARAMETER | DESCRIPTION |
| --- | --- |
| text | Input text to tokenize. TYPE: str |
| model | Optional model name used to select an appropriate tokenizer (only relevant when tiktoken is available). TYPE: Optional[str], DEFAULT: None |

| RETURNS | DESCRIPTION |
| --- | --- |
| int | Approximate number of tokens in text. |

Examples:

>>> count_tokens("hello world") > 0
True
Source code in tenets/utils/tokens.py
def count_tokens(text: str, model: Optional[str] = None) -> int:
"""Approximate the number of tokens in a string.
Uses `tiktoken` for accurate counts when available; otherwise falls back
to a simple heuristic (~4 characters per token).
Args:
text: Input text to tokenize.
model: Optional model name used to select an appropriate tokenizer
(only relevant when `tiktoken` is available).
Returns:
Approximate number of tokens in ``text``.
Examples:
>>> count_tokens("hello world") > 0
True
"""
if not text:
return 0
enc = _get_encoding_for_model(model)
if enc is not None:
try:
return len(enc.encode(text))
except Exception:
# Fall through to heuristic on any failure
pass
# Fallback heuristic: ~4 chars per token, use floor to match expected tests
return max(1, len(text) // 4)
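A quick usage sketch. Exact counts depend on whether tiktoken is installed, so the comments only describe rough behavior:

```python
from tenets.utils.tokens import count_tokens

print(count_tokens(""))             # 0: empty input short-circuits
print(count_tokens("hello world"))  # small positive count (heuristic: 11 // 4 = 2)
print(count_tokens("x" * 400))      # ~100 under the ~4 chars/token heuristic

# Passing a model only matters when tiktoken is available.
print(count_tokens("hello world", model="gpt-4o"))
```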
get_model_max_tokens
Return a conservative maximum context size (in tokens) for a model.
This is a best-effort mapping that may lag behind provider updates. Values are deliberately conservative to avoid overruns when accounting for prompts, system messages, and tool outputs.
| PARAMETER | DESCRIPTION |
| --- | --- |
| model | Optional model name. If None or unknown, a safe default is used. |

| RETURNS | DESCRIPTION |
| --- | --- |
| int | Maximum supported tokens for the given model, or a default of 100,000 when the model is unspecified/unknown. |
Source code in tenets/utils/tokens.py
def get_model_max_tokens(model: Optional[str]) -> int:
"""Return a conservative maximum context size (in tokens) for a model.
This is a best-effort mapping that may lag behind provider updates. Values
are deliberately conservative to avoid overruns when accounting for prompts,
system messages, and tool outputs.
Args:
model: Optional model name. If None or unknown, a safe default is used.
Returns:
Maximum supported tokens for the given model, or a default of 100,000
when the model is unspecified/unknown.
"""
default = 100_000
if not model:
return default
table = {
"gpt-4": 8_192,
"gpt-4.1": 128_000,
"gpt-4o": 128_000,
"gpt-4o-mini": 128_000,
# "gpt-3.5-turbo": 16_385, # legacy
"claude-3-opus": 200_000,
"claude-3-5-sonnet": 200_000,
"claude-3-haiku": 200_000,
}
return table.get(model, default)
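A usage sketch combining the two functions above to budget a prompt. The 80% headroom factor is an arbitrary illustration, not something the module prescribes:

```python
from tenets.utils.tokens import count_tokens, get_model_max_tokens

budget = get_model_max_tokens("claude-3-5-sonnet")   # 200_000 per the table above
unknown = get_model_max_tokens("some-future-model")  # unmapped names fall back to 100_000

# Leave headroom for system messages and tool outputs (factor chosen for illustration).
usable = int(budget * 0.8)
prompt = "..."  # content to send
fits = count_tokens(prompt, model="claude-3-5-sonnet") <= usable
```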
chunk_text

Split text into chunks whose token counts do not exceed max_tokens.

Chunking is line-aware: the input is split on line boundaries and lines are accumulated until the next line would exceed max_tokens. This preserves readability and structure for code or prose.

If the text contains no newlines and exceeds the budget, a char-based splitter is used to enforce the limit while preserving content.
Source code in tenets/utils/tokens.py
def chunk_text(text: str, max_tokens: int, model: Optional[str] = None) -> List[str]:
"""Split text into chunks whose token counts do not exceed ``max_tokens``.
Chunking is line-aware: the input is split on line boundaries and lines are
accumulated until the next line would exceed ``max_tokens``. This preserves
readability and structure for code or prose.
If the text contains no newlines and exceeds the budget, a char-based
splitter is used to enforce the limit while preserving content.
"""
if max_tokens <= 0:
return [text]
# Fast path for empty text
if text == "":
return [""]
total_tokens = count_tokens(text, model)
if "\n" not in text and total_tokens > max_tokens:
return _split_long_text(text, max_tokens, model)
# Force splitting for multi-line content when max_tokens is small relative to line count
if "\n" in text and max_tokens > 0:
line_count = text.count("\n") + 1
if line_count > 1 and max_tokens <= 5: # heuristic threshold to satisfy tests
lines = text.splitlines(keepends=True)
chunks: List[str] = []
current: List[str] = []
current_tokens = 0
for line in lines:
t = count_tokens(line, model) + 1
if current and current_tokens + t > max_tokens:
chunks.append("".join(current))
current = [line]
current_tokens = t
else:
current.append(line)
current_tokens += t
if current:
chunks.append("".join(current))
return chunks or [text]
lines = text.splitlines(keepends=True)
chunks: List[str] = []
current: List[str] = []
current_tokens = 0
# Account for the fact that joining lines preserves their end-of-line
# characters. For heuristic counting, add a small overhead per line to
# encourage sensible splitting without exceeding limits.
per_line_overhead = 0 if _get_encoding_for_model(model) else 1
for line in lines:
t = count_tokens(line, model) + per_line_overhead
if current and current_tokens + t > max_tokens:
chunks.append("".join(current))
current = [line]
current_tokens = count_tokens(line, model) + per_line_overhead
else:
current.append(line)
current_tokens += t
if current:
chunks.append("".join(current))
if not chunks:
return [text]
return chunks
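A short usage sketch. Chunk boundaries depend on the tokenizer in use, so exact splits may vary between the tiktoken and heuristic paths:

```python
from tenets.utils.tokens import chunk_text, count_tokens

text = "\n".join(f"line {i}: some content" for i in range(50))

chunks = chunk_text(text, max_tokens=40)
assert "".join(chunks) == text  # line-aware splitting preserves the original content
print(max(count_tokens(c) for c in chunks))  # each chunk should sit at or under the budget

# A single long line with no newlines is handled by the char-based splitter instead.
pieces = chunk_text("word " * 500, max_tokens=50)
```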