stopwords
Full name: tenets.core.nlp.stopwords
Stopword management for different contexts.
This module manages multiple stopword sets for different purposes:
- Minimal set for code search (preserve accuracy)
- Aggressive set for prompt parsing (extract intent)
- Custom sets for specific domains
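To illustrate why separate sets are useful, here is a minimal sketch using two invented word lists. `MINIMAL`, `AGGRESSIVE`, and `remove_stopwords` are hypothetical names for illustration only; they are not the sets or helpers shipped with the module. A small set leaves most query terms intact for code search, while a larger one strips filler words so that only intent-bearing terms remain.

```python
# Hypothetical word lists for illustration only; the real sets are
# managed by the module and are not reproduced here.
MINIMAL = {"the", "a", "an", "of"}
AGGRESSIVE = MINIMAL | {"how", "do", "i", "in", "to", "please", "can", "you"}


def remove_stopwords(text: str, stopwords: set[str]) -> list[str]:
    """Lowercase, split on whitespace, and drop tokens found in stopwords."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


query = "How do I parse the config in Python"

# Code search: keep almost everything so matches stay accurate.
code_terms = remove_stopwords(query, MINIMAL)

# Prompt parsing: strip filler so only the intent remains.
prompt_terms = remove_stopwords(query, AGGRESSIVE)
```

With these invented lists, the minimal set removes only "the", whereas the aggressive set reduces the query to its three content words.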
Classes
StopwordSet (dataclass)
A set of stopwords with metadata.
ATTRIBUTE | DESCRIPTION |
---|---|
name | Name of this stopword set |
words | Set of stopword strings |
description | What this set is used for |
source_file | Path to source file |
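The attribute table above could correspond to a dataclass along the following lines. This is a sketch, not the library's definition: the class name `StopwordSetSketch`, the field types, and the defaults are all assumptions inferred from the attribute descriptions.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional, Set


@dataclass
class StopwordSetSketch:
    """Stand-in mirroring the documented attributes (types and defaults assumed)."""

    name: str
    words: Set[str] = field(default_factory=set)
    description: str = ""
    source_file: Optional[Path] = None  # assumed to be optional


# Example words are invented for illustration.
code_set = StopwordSetSketch(
    name="code",
    words={"the", "a"},
    description="Minimal set for code search",
)

is_stopword = "the" in code_set.words
```

Using `default_factory=set` gives each instance its own set, which avoids sharing one mutable default between instances.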
StopwordManager
Manages multiple stopword sets for different contexts.
Initialize stopword manager.
PARAMETER | DESCRIPTION |
---|---|
data_dir | Directory containing stopword files |
Source code in tenets/core/nlp/stopwords.py
    def __init__(self, data_dir: Optional[Path] = None):
        """Initialize stopword manager.

        Args:
            data_dir: Directory containing stopword files
        """
        self.logger = get_logger(__name__)
        self.data_dir = data_dir or self.DEFAULT_DATA_DIR
        self._sets: dict[str, StopwordSet] = {}

        # Load default sets
        self._load_default_sets()
Functions
get_set
Get a stopword set by name.
PARAMETER | DESCRIPTION |
---|---|
name | Name of stopword set ('code', 'prompt', etc.) |
RETURNS | DESCRIPTION |
---|---|
Optional[StopwordSet] | StopwordSet or None if not found |
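Because `get_set` returns `None` for an unknown name, callers should check the result before using it. The sketch below shows one way to do that. `ManagerSketch` and `filter_tokens` are hypothetical stand-ins, and the sketch assumes `get_set` is a plain dictionary lookup, which the page does not confirm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class StopwordSet:  # stand-in for the documented dataclass
    name: str
    words: Set[str] = field(default_factory=set)


class ManagerSketch:
    """Stand-in manager; assumes get_set is a plain dictionary lookup."""

    def __init__(self) -> None:
        self._sets: Dict[str, StopwordSet] = {
            "code": StopwordSet("code", {"the", "a"}),  # invented words
        }

    def get_set(self, name: str) -> Optional[StopwordSet]:
        return self._sets.get(name)


def filter_tokens(manager: ManagerSketch, set_name: str, tokens: List[str]) -> List[str]:
    """Drop stopwords, or return the tokens unchanged when the set is unknown."""
    stopword_set = manager.get_set(set_name)
    if stopword_set is None:
        return list(tokens)
    return [t for t in tokens if t.lower() not in stopword_set.words]


manager = ManagerSketch()
known = filter_tokens(manager, "code", ["The", "parser", "a", "token"])
unknown = filter_tokens(manager, "missing", ["The", "parser"])
```

Falling back to the unfiltered tokens is one possible policy; raising an error for an unknown set name would be equally reasonable.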
add_custom_set
Add a custom stopword set.
PARAMETER | DESCRIPTION |
---|---|
name | Name for the set TYPE: str |
words | Set of stopword strings |
description | What this set is for TYPE: str |
RETURNS | DESCRIPTION |
---|---|
StopwordSet | Created StopwordSet |
Source code in tenets/core/nlp/stopwords.py
    def add_custom_set(self, name: str, words: Set[str], description: str = "") -> StopwordSet:
        """Add a custom stopword set.

        Args:
            name: Name for the set
            words: Set of stopword strings
            description: What this set is for

        Returns:
            Created StopwordSet
        """
        stopword_set = StopwordSet(
            name=name, words={w.lower() for w in words}, description=description
        )
        self._sets[name] = stopword_set
        return stopword_set
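The source above lowercases every word before storing it and registers the set under `name`, replacing any set already stored under that name. The following sketch copies only that logic into a stand-in class (`ManagerSketch` and the simplified `StopwordSet` are assumptions, as are the example words) to show both effects.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class StopwordSet:  # stand-in; field defaults are assumed
    name: str
    words: Set[str] = field(default_factory=set)
    description: str = ""


class ManagerSketch:
    """Stand-in that reproduces only the add_custom_set logic from the source."""

    def __init__(self) -> None:
        self._sets: Dict[str, StopwordSet] = {}

    def add_custom_set(self, name: str, words: Set[str], description: str = "") -> StopwordSet:
        stopword_set = StopwordSet(
            name=name, words={w.lower() for w in words}, description=description
        )
        self._sets[name] = stopword_set
        return stopword_set


manager = ManagerSketch()
sql = manager.add_custom_set("sql", {"SELECT", "From", "where"}, "SQL keywords")

# Words are stored lowercased, so lowercase a token before testing membership.
raw_hit = "SELECT" in sql.words
lowered_hit = "SELECT".lower() in sql.words

# Adding a set under an existing name replaces the earlier entry.
replacement = manager.add_custom_set("sql", {"JOIN"})
```

Since stored words are always lowercase, a case-sensitive membership test on raw tokens will miss matches such as "SELECT".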
combine_sets
Combine multiple stopword sets.
PARAMETER | DESCRIPTION |
---|---|
sets | Names of sets to combine |
name | Name for combined set TYPE: str |
RETURNS | DESCRIPTION |
---|---|
StopwordSet | Combined StopwordSet |
Source code in tenets/core/nlp/stopwords.py
    def combine_sets(self, sets: List[str], name: str = "combined") -> StopwordSet:
        """Combine multiple stopword sets.

        Args:
            sets: Names of sets to combine
            name: Name for combined set

        Returns:
            Combined StopwordSet
        """
        combined_words = set()
        for set_name in sets:
            if set_name in self._sets:
                combined_words |= self._sets[set_name].words

        return StopwordSet(
            name=name, words=combined_words, description=f"Combined from: {', '.join(sets)}"
        )
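Three behaviours follow from the source above: names that are not registered are skipped silently, the description still lists every requested name, and the combined set is returned without being stored in the manager. The sketch below reproduces only the `combine_sets` logic in a stand-in class; `ManagerSketch`, the simplified `StopwordSet`, and the example words are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class StopwordSet:  # stand-in; field defaults are assumed
    name: str
    words: Set[str] = field(default_factory=set)
    description: str = ""


class ManagerSketch:
    """Stand-in that reproduces only the combine_sets logic from the source."""

    def __init__(self) -> None:
        # Invented example sets.
        self._sets: Dict[str, StopwordSet] = {
            "code": StopwordSet("code", {"the", "a"}),
            "prompt": StopwordSet("prompt", {"the", "please", "how"}),
        }

    def combine_sets(self, sets: List[str], name: str = "combined") -> StopwordSet:
        combined_words: Set[str] = set()
        for set_name in sets:
            if set_name in self._sets:
                combined_words |= self._sets[set_name].words
        return StopwordSet(
            name=name, words=combined_words, description=f"Combined from: {', '.join(sets)}"
        )


manager = ManagerSketch()
combined = manager.combine_sets(["code", "prompt", "missing"], name="all")

# The result is not registered; store it explicitly if it should be reusable.
registered = "all" in manager._sets
```

If the combined set needs to be retrievable later, passing its words to `add_custom_set` would register it under the chosen name.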