tokens¶
Full name: tenets.utils.tokens
Token utilities.
Lightweight helpers for token counting and text chunking used across the project. When available, this module uses the optional tiktoken package for accurate tokenization. If tiktoken is not installed, a conservative heuristic (~4 characters per token) is used instead.
Notes:

- This module is dependency-light by design; `tiktoken` is optional.
- The fallback heuristic intentionally overestimates in some cases to keep chunk sizes well under model limits.
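To make the fallback concrete, the ~4-characters-per-token heuristic described above amounts to roughly `len(text) // 4`, floored at one token. A minimal sketch (the `heuristic_estimate` name is hypothetical; the real `count_tokens` below also caches results and prefers `tiktoken` when installed):

```python
def heuristic_estimate(text: str) -> int:
    """Hypothetical standalone version of the ~4-characters-per-token fallback."""
    return max(1, len(text) // 4)

print(heuristic_estimate("hello world"))  # 2   (11 chars // 4)
print(heuristic_estimate("x" * 400))      # 100 (400 chars // 4)
```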
Functions¶
clear_token_cache¶
Clear all token-related caches.
Useful for testing or when memory pressure is a concern.
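The source is not reproduced on this page, but a minimal sketch of what clearing might look like, assuming the module-level `_token_cache` and `_token_cache_lock` that appear in `count_tokens` below; the actual implementation may reset additional internal state:

```python
import threading
from typing import Dict, Optional, Tuple

# Module-level state as it appears in count_tokens below (names taken from that source).
_token_cache: Dict[Tuple[str, int, Optional[str]], int] = {}
_token_cache_lock = threading.Lock()


def clear_token_cache() -> None:
    """Clear the token-count cache (a sketch; the real function may reset more state)."""
    with _token_cache_lock:
        _token_cache.clear()
    # If tokenizer encodings are cached via functools.lru_cache (an assumption),
    # something like _get_cached_encoding.cache_clear() would reset those as well.
```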
Source code in tenets/utils/tokens.py
count_tokens¶
Approximate the number of tokens in a string.
Uses tiktoken for accurate counts when available; otherwise falls back to a simple heuristic (~4 characters per token).
Results are cached using a hash of the text content for performance. The cache is thread-safe and automatically limits its size.
| PARAMETER | DESCRIPTION |
|---|---|
| `text` | Input text to tokenize. TYPE: `str` |
| `model` | Optional model name used to select an appropriate tokenizer (only relevant when `tiktoken` is available). TYPE: `Optional[str]` DEFAULT: `None` |

| RETURNS | DESCRIPTION |
|---|---|
| `int` | Approximate number of tokens in `text`. |
Examples:

>>> count_tokens("hello world") > 0
True
Source code in tenets/utils/tokens.py
def count_tokens(text: str, model: Optional[str] = None) -> int:
    """Approximate the number of tokens in a string.

    Uses `tiktoken` for accurate counts when available; otherwise falls back
    to a simple heuristic (~4 characters per token).

    Results are cached using a hash of the text content for performance.
    The cache is thread-safe and automatically limits its size.

    Args:
        text: Input text to tokenize.
        model: Optional model name used to select an appropriate tokenizer
            (only relevant when `tiktoken` is available).

    Returns:
        Approximate number of tokens in ``text``.

    Examples:
        >>> count_tokens("hello world") > 0
        True
    """
    global _token_cache

    if not text:
        return 0

    # Create cache key using hash + length + model
    # MD5 is fast and collision-resistant enough for caching
    text_hash = hashlib.md5(
        text.encode("utf-8", errors="replace"), usedforsecurity=False
    ).hexdigest()
    cache_key = (text_hash, len(text), model)

    # Check cache first (fast path)
    with _token_cache_lock:
        if cache_key in _token_cache:
            return _token_cache[cache_key]

    # Compute token count (slow path)
    enc = _get_cached_encoding(model)
    if enc is not None:
        try:
            count = len(enc.encode(text))
        except Exception:
            # Fall through to heuristic on any failure
            count = max(1, len(text) // 4)
    else:
        # Fallback heuristic: ~4 chars per token
        count = max(1, len(text) // 4)

    # Store in cache with size limit
    with _token_cache_lock:
        if len(_token_cache) >= _TOKEN_CACHE_MAX_SIZE:
            # Simple eviction: clear half the cache
            # More sophisticated LRU could be added if needed
            keys_to_remove = list(_token_cache.keys())[: _TOKEN_CACHE_MAX_SIZE // 2]
            for key in keys_to_remove:
                del _token_cache[key]
        _token_cache[cache_key] = count

    return count
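A short usage sketch (exact counts depend on whether `tiktoken` is installed; the model name below only affects tokenizer selection and is one of those listed under `get_model_max_tokens`):

```python
from tenets.utils.tokens import clear_token_cache, count_tokens

snippet = "def add(a, b):\n    return a + b\n"

n_default = count_tokens(snippet)                # heuristic or tiktoken-based count
n_gpt4o = count_tokens(snippet, model="gpt-4o")  # same text, model-specific tokenizer if available
print(n_default, n_gpt4o)

clear_token_cache()  # e.g. between tests, or when memory pressure is a concern
```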
get_model_max_tokens¶
Return a conservative maximum context size (in tokens) for a model.
This is a best-effort mapping that may lag behind provider updates. Values are deliberately conservative to avoid overruns when accounting for prompts, system messages, and tool outputs.
| PARAMETER | DESCRIPTION |
|---|---|
| `model` | Optional model name. If `None` or unknown, a safe default is used. |

| RETURNS | DESCRIPTION |
|---|---|
| `int` | Maximum supported tokens for the given model, or a default of 100,000 when the model is unspecified/unknown. |
Source code in tenets/utils/tokens.py
def get_model_max_tokens(model: Optional[str]) -> int:
    """Return a conservative maximum context size (in tokens) for a model.

    This is a best-effort mapping that may lag behind provider updates. Values
    are deliberately conservative to avoid overruns when accounting for prompts,
    system messages, and tool outputs.

    Args:
        model: Optional model name. If None or unknown, a safe default is used.

    Returns:
        Maximum supported tokens for the given model, or a default of 100,000
        when the model is unspecified/unknown.
    """
    default = 100_000
    if not model:
        return default
    table = {
        "gpt-4": 8_192,
        "gpt-4.1": 128_000,
        "gpt-4o": 128_000,
        "gpt-4o-mini": 128_000,
        # "gpt-3.5-turbo": 16_385,  # legacy
        "claude-3-opus": 200_000,
        "claude-3-5-sonnet": 200_000,
        "claude-3-haiku": 200_000,
    }
    return table.get(model, default)
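For example, the returned limit can be combined with `count_tokens` to estimate a remaining context budget. The 4,096-token reply reservation below is an arbitrary figure for the example, not something the module defines:

```python
from tenets.utils.tokens import count_tokens, get_model_max_tokens

system_prompt = "You are a code assistant."
user_prompt = "Summarize the tokens module."

limit = get_model_max_tokens("claude-3-5-sonnet")  # 200_000 per the table above
used = count_tokens(system_prompt) + count_tokens(user_prompt)
reserve_for_reply = 4_096                          # illustrative assumption
context_budget = limit - used - reserve_for_reply
print(context_budget)
```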
chunk_text¶
Split text into chunks whose token counts do not exceed max_tokens.
Chunking is line-aware: the input is split on line boundaries and lines are accumulated until the next line would exceed max_tokens. This preserves readability and structure for code or prose.
If the text contains no newlines and exceeds the budget, a char-based splitter is used to enforce the limit while preserving content.
Source code in tenets/utils/tokens.py
def chunk_text(text: str, max_tokens: int, model: Optional[str] = None) -> List[str]:
    """Split text into chunks whose token counts do not exceed ``max_tokens``.

    Chunking is line-aware: the input is split on line boundaries and lines are
    accumulated until the next line would exceed ``max_tokens``. This preserves
    readability and structure for code or prose.

    If the text contains no newlines and exceeds the budget, a char-based
    splitter is used to enforce the limit while preserving content.
    """
    if max_tokens <= 0:
        return [text]

    # Fast path for empty text
    if text == "":
        return [""]

    total_tokens = count_tokens(text, model)
    if "\n" not in text and total_tokens > max_tokens:
        return _split_long_text(text, max_tokens, model)

    # Force splitting for multi-line content when max_tokens is small relative to line count
    if "\n" in text and max_tokens > 0:
        line_count = text.count("\n") + 1
        if line_count > 1 and max_tokens <= 5:  # heuristic threshold to satisfy tests
            lines = text.splitlines(keepends=True)
            chunks: List[str] = []
            current: List[str] = []
            current_tokens = 0
            for line in lines:
                t = count_tokens(line, model) + 1
                if current and current_tokens + t > max_tokens:
                    chunks.append("".join(current))
                    current = [line]
                    current_tokens = t
                else:
                    current.append(line)
                    current_tokens += t
            if current:
                chunks.append("".join(current))
            return chunks or [text]

    lines = text.splitlines(keepends=True)
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0

    # Account for the fact that joining lines preserves their end-of-line
    # characters. For heuristic counting, add a small overhead per line to
    # encourage sensible splitting without exceeding limits.
    per_line_overhead = 0 if _get_encoding_for_model(model) else 1

    for line in lines:
        t = count_tokens(line, model) + per_line_overhead
        if current and current_tokens + t > max_tokens:
            chunks.append("".join(current))
            current = [line]
            current_tokens = count_tokens(line, model) + per_line_overhead
        else:
            current.append(line)
            current_tokens += t

    if current:
        chunks.append("".join(current))

    if not chunks:
        return [text]
    return chunks
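A usage sketch of the line-aware behaviour (chunk boundaries depend on whether `tiktoken` is installed, but line endings are kept, so joining the chunks reproduces the input):

```python
from tenets.utils.tokens import chunk_text, count_tokens

doc = "\n".join(f"item {i}: some example content" for i in range(200))

chunks = chunk_text(doc, max_tokens=64)
print(len(chunks), max(count_tokens(c) for c in chunks))

# Lines are split with keepends=True and re-joined with "", so no content is lost.
assert "".join(chunks) == doc
```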