stopwords

Full name: tenets.core.nlp.stopwords

Stopword management for different contexts.

This module manages multiple stopword sets for different purposes:

- Minimal set for code search (preserve accuracy)
- Aggressive set for prompt parsing (extract intent)
- Custom sets for specific domains

Classes

StopwordSet dataclass

Python
StopwordSet(name: str, words: Set[str], description: str, source_file: Optional[Path] = None)

A set of stopwords with metadata.

ATTRIBUTE      DESCRIPTION
name           Name of this stopword set. TYPE: str
words          Set of stopword strings. TYPE: Set[str]
description    What this set is used for. TYPE: str
source_file    Path to source file. TYPE: Optional[Path]

Functions
filter
Python
filter(words: List[str]) -> List[str]

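Remove stopwords from a word list. Each word is lowercased for the lookup only, so matching is case-insensitive as long as the set itself stores lowercase words; surviving words keep their original casing and order.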

PARAMETER      DESCRIPTION
words          List of words to filter. TYPE: List[str]

RETURNS        DESCRIPTION
List[str]      Filtered list without stopwords

Source code in tenets/core/nlp/stopwords.py
Python
def filter(self, words: List[str]) -> List[str]:
    """Filter stopwords from word list.

    Args:
        words: List of words to filter

    Returns:
        Filtered list without stopwords
    """
    return [w for w in words if w.lower() not in self.words]
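
The behavior of `filter` can be tried without installing the package. The sketch below is a standalone stand-in that copies the fields and the one-line `filter` body shown above; the word list and tokens are made-up example data, not the library's real stopwords.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Set


@dataclass
class StopwordSet:
    """Standalone stand-in mirroring the fields and filter() shown above."""

    name: str
    words: Set[str]
    description: str
    source_file: Optional[Path] = None

    def filter(self, words: List[str]) -> List[str]:
        # Each token is lowercased for the lookup only; output keeps its casing.
        return [w for w in words if w.lower() not in self.words]


# Hypothetical example data, not the library's real word lists.
demo = StopwordSet(name="demo", words={"the", "a", "of"}, description="Tiny demo set")
tokens = ["The", "parser", "of", "A", "Config", "file"]
kept = demo.filter(tokens)
print(kept)  # ['parser', 'Config', 'file']

# A set built directly with mixed-case entries never matches, because only
# the input side is lowercased.
mixed = StopwordSet(name="mixed", words={"The"}, description="Mixed-case entry")
unmatched = mixed.filter(["The", "the"])
print(unmatched)  # ['The', 'the']
```

The second case is why sets constructed by hand should hold lowercase words.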

StopwordManager

Python
StopwordManager(data_dir: Optional[Path] = None)

Manages multiple stopword sets for different contexts.

Initialize stopword manager.

PARAMETER      DESCRIPTION
data_dir       Directory containing stopword files. TYPE: Optional[Path]  DEFAULT: None

Source code in tenets/core/nlp/stopwords.py
Python
def __init__(self, data_dir: Optional[Path] = None):
    """Initialize stopword manager.

    Args:
        data_dir: Directory containing stopword files
    """
    self.logger = get_logger(__name__)
    self.data_dir = data_dir or self.DEFAULT_DATA_DIR
    self._sets: dict[str, StopwordSet] = {}

    # Load default sets
    self._load_default_sets()
Functions
get_set
Python
get_set(name: str) -> Optional[StopwordSet]

Get a stopword set by name.

PARAMETER             DESCRIPTION
name                  Name of stopword set ('code', 'prompt', etc.). TYPE: str

RETURNS               DESCRIPTION
Optional[StopwordSet] StopwordSet or None if not found

Source code in tenets/core/nlp/stopwords.py
Python
def get_set(self, name: str) -> Optional[StopwordSet]:
    """Get a stopword set by name.

    Args:
        name: Name of stopword set ('code', 'prompt', etc.)

    Returns:
        StopwordSet or None if not found
    """
    return self._sets.get(name)
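
Because `get_set` returns `None` for an unknown name instead of raising, callers need a guard before calling `filter`. A minimal sketch of that pattern follows. `MiniSet`, `MiniManager`, and `filter_with` are hypothetical stand-ins written for this example, not part of tenets, and the set contents are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class MiniSet:
    """Hypothetical stand-in for StopwordSet (name and words only)."""

    name: str
    words: Set[str]

    def filter(self, words: List[str]) -> List[str]:
        return [w for w in words if w.lower() not in self.words]


class MiniManager:
    """Hypothetical stand-in exposing only the get_set lookup."""

    def __init__(self) -> None:
        # Made-up contents; the real manager fills this in _load_default_sets().
        self._sets: Dict[str, MiniSet] = {"code": MiniSet("code", {"the", "a"})}

    def get_set(self, name: str) -> Optional[MiniSet]:
        return self._sets.get(name)


def filter_with(manager: MiniManager, set_name: str, tokens: List[str]) -> List[str]:
    """Filter tokens with the named set, or return them unchanged if it is missing."""
    stopwords = manager.get_set(set_name)
    if stopwords is None:
        return list(tokens)
    return stopwords.filter(tokens)


manager = MiniManager()
found = filter_with(manager, "code", ["the", "cache", "key"])
missing = filter_with(manager, "missing", ["the", "cache"])
print(found)    # ['cache', 'key']
print(missing)  # ['the', 'cache']
```

Returning the tokens unchanged is one possible fallback; raising an error may suit callers that treat a missing set as a configuration bug.
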
add_custom_set
Python
add_custom_set(name: str, words: Set[str], description: str = '') -> StopwordSet

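Add a custom stopword set. Words are lowercased before they are stored, and an existing set registered under the same name is replaced.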

PARAMETER      DESCRIPTION
name           Name for the set. TYPE: str
words          Set of stopword strings. TYPE: Set[str]
description    What this set is for. TYPE: str  DEFAULT: ''

RETURNS        DESCRIPTION
StopwordSet    Created StopwordSet

Source code in tenets/core/nlp/stopwords.py
Python
def add_custom_set(self, name: str, words: Set[str], description: str = "") -> StopwordSet:
    """Add a custom stopword set.

    Args:
        name: Name for the set
        words: Set of stopword strings
        description: What this set is for

    Returns:
        Created StopwordSet
    """
    stopword_set = StopwordSet(
        name=name, words={w.lower() for w in words}, description=description
    )
    self._sets[name] = stopword_set
    return stopword_set
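
The two side effects visible in the source above, lowercasing and overwrite-by-name, are easy to miss. The sketch below reproduces them with hypothetical stand-in classes (`MiniSet`, `MiniManager`) so it runs without the package; the `"sql"` set and its words are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class MiniSet:
    """Hypothetical stand-in for StopwordSet."""

    name: str
    words: Set[str]
    description: str = ""


class MiniManager:
    """Hypothetical stand-in exposing only add_custom_set."""

    def __init__(self) -> None:
        self._sets: Dict[str, MiniSet] = {}

    def add_custom_set(self, name: str, words: Set[str], description: str = "") -> MiniSet:
        # Same steps as the source above: lowercase, store under name, return.
        new_set = MiniSet(name=name, words={w.lower() for w in words}, description=description)
        self._sets[name] = new_set
        return new_set


manager = MiniManager()
first = manager.add_custom_set("sql", {"SELECT", "From", "where"}, "SQL keywords")
print(sorted(first.words))  # ['from', 'select', 'where']

# Re-using a name replaces the earlier set instead of merging into it.
second = manager.add_custom_set("sql", {"JOIN"})
print(sorted(manager._sets["sql"].words))  # ['join']
```
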
combine_sets
Python
combine_sets(sets: List[str], name: str = 'combined') -> StopwordSet

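Combine multiple stopword sets into a new set holding the union of their words. Names that are not registered are skipped silently, and the returned set is not itself registered with the manager.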

PARAMETER      DESCRIPTION
sets           Names of sets to combine. TYPE: List[str]
name           Name for combined set. TYPE: str  DEFAULT: 'combined'

RETURNS        DESCRIPTION
StopwordSet    Combined StopwordSet

Source code in tenets/core/nlp/stopwords.py
Python
def combine_sets(self, sets: List[str], name: str = "combined") -> StopwordSet:
    """Combine multiple stopword sets.

    Args:
        sets: Names of sets to combine
        name: Name for combined set

    Returns:
        Combined StopwordSet
    """
    combined_words = set()

    for set_name in sets:
        if set_name in self._sets:
            combined_words |= self._sets[set_name].words

    return StopwordSet(
        name=name, words=combined_words, description=f"Combined from: {', '.join(sets)}"
    )
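
A short sketch of the union behavior, again using hypothetical stand-ins (`MiniSet`, `MiniManager`) that copy the loop from the source above. The set contents and the misspelled name `"typo"` are made up to show what happens with an unregistered name.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class MiniSet:
    """Hypothetical stand-in for StopwordSet."""

    name: str
    words: Set[str]
    description: str = ""


class MiniManager:
    """Hypothetical stand-in exposing only combine_sets."""

    def __init__(self, sets: Dict[str, MiniSet]) -> None:
        self._sets = sets

    def combine_sets(self, sets: List[str], name: str = "combined") -> MiniSet:
        combined_words: Set[str] = set()
        for set_name in sets:
            if set_name in self._sets:  # unknown names are skipped, not reported
                combined_words |= self._sets[set_name].words
        return MiniSet(
            name=name, words=combined_words, description=f"Combined from: {', '.join(sets)}"
        )


manager = MiniManager({
    "code": MiniSet("code", {"the", "a"}),
    "prompt": MiniSet("prompt", {"a", "please", "could"}),
})
merged = manager.combine_sets(["code", "prompt", "typo"])
print(sorted(merged.words))          # ['a', 'could', 'please', 'the']
print(merged.description)            # Combined from: code, prompt, typo
print("combined" in manager._sets)   # False
```

Note that the description lists every requested name, including the skipped one, so it cannot be used to tell which sets actually contributed words.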

Functions