stats

Full name: tenets.core.git.stats

Git statistics module.

This module provides comprehensive statistical analysis of git repositories, including commit patterns, contributor metrics, file statistics, and repository growth analysis. It helps understand repository health, development patterns, and team dynamics through data-driven insights.

The statistics module aggregates various git metrics to provide actionable insights for project management and technical decision-making.

Classes

CommitStats dataclass

Python
CommitStats(total_commits: int = 0, commits_per_day: float = 0.0, commits_per_week: float = 0.0, commits_per_month: float = 0.0, commit_size_avg: float = 0.0, commit_size_median: float = 0.0, commit_size_std: float = 0.0, largest_commit: Dict[str, Any] = dict(), smallest_commit: Dict[str, Any] = dict(), merge_commits: int = 0, revert_commits: int = 0, fix_commits: int = 0, feature_commits: int = 0, hourly_distribution: List[int] = (lambda: [0] * 24)(), daily_distribution: List[int] = (lambda: [0] * 7)(), monthly_distribution: List[int] = (lambda: [0] * 12)())

Statistics for commits.

Provides detailed statistical analysis of commit patterns including frequency, size, timing, and distribution metrics.

ATTRIBUTE DESCRIPTION
total_commits

Total number of commits

TYPE:int

commits_per_day

Average commits per day

TYPE:float

commits_per_week

Average commits per week

TYPE:float

commits_per_month

Average commits per month

TYPE:float

commit_size_avg

Average commit size (lines changed)

TYPE:float

commit_size_median

Median commit size

TYPE:float

commit_size_std

Standard deviation of commit size

TYPE:float

largest_commit

Largest single commit

TYPE:Dict[str, Any]

smallest_commit

Smallest single commit

TYPE:Dict[str, Any]

merge_commits

Number of merge commits

TYPE:int

revert_commits

Number of revert commits

TYPE:int

fix_commits

Number of fix commits

TYPE:int

feature_commits

Number of feature commits

TYPE:int

hourly_distribution

Commits by hour of day

TYPE:List[int]

daily_distribution

Commits by day of week

TYPE:List[int]

monthly_distribution

Commits by month

TYPE:List[int]
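
The defaults shown in the signature as `dict()` and `(lambda: [0] * 24)()` look like the rendered form of per-instance default factories. The following is a minimal sketch of that pattern, assuming the fields are declared with `dataclasses.field(default_factory=...)`; `CommitStatsSketch` is a hypothetical stand-in, not the library class.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class CommitStatsSketch:
    """Hypothetical stand-in mirroring a few CommitStats fields."""

    total_commits: int = 0
    # Mutable defaults go through a factory so every instance gets its own object.
    largest_commit: Dict[str, Any] = field(default_factory=dict)
    hourly_distribution: List[int] = field(default_factory=lambda: [0] * 24)
    daily_distribution: List[int] = field(default_factory=lambda: [0] * 7)
    monthly_distribution: List[int] = field(default_factory=lambda: [0] * 12)


a = CommitStatsSketch()
b = CommitStatsSketch()
a.hourly_distribution[9] += 1  # record one commit at 09:00 on `a` only

print(a.hourly_distribution[9], b.hourly_distribution[9])
```

Because each instance builds its own lists, updating the histogram on `a` leaves `b` untouched.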

Attributes
merge_ratio property
Python
merge_ratio: float

Calculate merge commit ratio.

RETURNS DESCRIPTION
float

Ratio of merge commits to total

TYPE:float

fix_ratio property
Python
fix_ratio: float

Calculate fix commit ratio.

RETURNS DESCRIPTION
float

Ratio of fix commits to total

TYPE:float

peak_hour property
Python
peak_hour: int

Find peak commit hour.

RETURNS DESCRIPTION
int

Hour with most commits (0-23)

TYPE:int

peak_day property
Python
peak_day: str

Find peak commit day.

RETURNS DESCRIPTION
str

Day with most commits

TYPE:str
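
The four properties above can be derived from the stored counters and histograms. The sketch below shows one plausible way to do it and is not the library's implementation: the helper names `ratio` and `peak_index`, the zero-commit guard, and the Monday-first day ordering are all assumptions. The ratios are kept as fractions between 0 and 1, which is consistent with `to_dict` multiplying them by 100.

```python
from typing import List

# Assumption: index 0 is Monday, matching datetime.weekday().
DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def ratio(part: int, total: int) -> float:
    """Return part / total as a fraction, or 0.0 when there are no commits."""
    return part / total if total else 0.0


def peak_index(distribution: List[int]) -> int:
    """Return the index of the largest bucket (the first one on ties)."""
    return max(range(len(distribution)), key=distribution.__getitem__)


hourly = [0] * 24
hourly[9] = 3
hourly[14] = 7
daily = [5, 9, 4, 6, 2, 0, 1]

merge_ratio = ratio(12, 80)              # 12 merge commits out of 80
peak_hour = peak_index(hourly)           # bucket index in the range 0-23
peak_day = DAY_NAMES[peak_index(daily)]  # day name for the busiest bucket

print(merge_ratio, peak_hour, peak_day)
```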

ContributorStats dataclass

Python
ContributorStats(total_contributors: int = 0, active_contributors: int = 0, new_contributors: int = 0, contributor_commits: Dict[str, int] = dict(), contributor_lines: Dict[str, int] = dict(), contributor_files: Dict[str, Set[str]] = dict(), top_contributors: List[Tuple[str, int]] = list(), contribution_inequality: float = 0.0, collaboration_graph: Dict[Tuple[str, str], int] = dict(), timezone_distribution: Dict[str, int] = dict(), retention_rate: float = 0.0, churn_rate: float = 0.0)

Statistics for contributors.

Provides analysis of contributor patterns, productivity metrics, and team dynamics based on git history.

ATTRIBUTE DESCRIPTION
total_contributors

Total unique contributors

TYPE:int

active_contributors

Contributors active in last 30 days

TYPE:int

new_contributors

New contributors in period

TYPE:int

contributor_commits

Commits per contributor

TYPE:Dict[str, int]

contributor_lines

Lines changed per contributor

TYPE:Dict[str, int]

contributor_files

Files touched per contributor

TYPE:Dict[str, Set[str]]

top_contributors

Most active contributors

TYPE:List[Tuple[str, int]]

contribution_inequality

Gini coefficient of contributions

TYPE:float

collaboration_graph

Who works with whom

TYPE:Dict[Tuple[str, str], int]

timezone_distribution

Contributors by timezone

TYPE:Dict[str, int]

retention_rate

Contributor retention rate

TYPE:float

churn_rate

Contributor churn rate

TYPE:float
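
`contribution_inequality` is documented as a Gini coefficient. As an illustration of what that number means, here is a minimal sketch of the standard sorted-values formula applied to per-contributor commit counts. The function name `gini` and the choice of commit counts as input are assumptions; the source does not show how the library computes the value.

```python
from typing import Iterable


def gini(values: Iterable[float]) -> float:
    """Gini coefficient of non-negative values (0.0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with x sorted ascending
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


even_team = gini([10, 10, 10, 10])  # everyone contributes equally
solo_team = gini([0, 0, 0, 100])    # one person does all the work

print(even_team, solo_team)
```

With this formula an evenly split team scores 0, and a team of n people where one person makes every commit scores (n - 1) / n.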

Attributes
avg_commits_per_contributor property
Python
avg_commits_per_contributor: float

Calculate average commits per contributor.

RETURNS DESCRIPTION
float

Average commits

TYPE:float

bus_factor property
Python
bus_factor: int

Calculate bus factor.

RETURNS DESCRIPTION
int

Number of key contributors

TYPE:int

collaboration_score property
Python
collaboration_score: float

Calculate collaboration score.

RETURNS DESCRIPTION
float

Collaboration score (0-100)

TYPE:float
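
The page describes `bus_factor` only as the "number of key contributors". One common definition is the smallest group of contributors whose commits cover at least half of the total. The sketch below uses that definition; the 50% threshold and the function itself are assumptions, and the library may compute the value differently.

```python
from typing import Dict


def bus_factor(commits_by_author: Dict[str, int], threshold: float = 0.5) -> int:
    """Smallest number of top authors covering `threshold` of all commits."""
    total = sum(commits_by_author.values())
    if total == 0:
        return 0
    covered = 0
    # Walk authors from most to least active until the threshold is reached.
    ranked = sorted(commits_by_author.values(), reverse=True)
    for rank, count in enumerate(ranked, start=1):
        covered += count
        if covered >= threshold * total:
            return rank
    return len(ranked)


concentrated = bus_factor({"alice": 60, "bob": 25, "carol": 15})
balanced = bus_factor({"alice": 30, "bob": 30, "carol": 40})

print(concentrated, balanced)
```

A low value means that losing one or two people would remove most of the project's commit history knowledge.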

FileStats dataclass

Python
FileStats(total_files: int = 0, active_files: int = 0, new_files: int = 0, deleted_files: int = 0, file_changes: Dict[str, int] = dict(), file_sizes: Dict[str, int] = dict(), largest_files: List[Tuple[str, int]] = list(), most_changed: List[Tuple[str, int]] = list(), file_age: Dict[str, int] = dict(), file_churn: Dict[str, float] = dict(), hot_files: List[str] = list(), stable_files: List[str] = list(), file_types: Dict[str, int] = dict())

Statistics for files.

Provides analysis of file-level metrics including change frequency, size distribution, and file lifecycle patterns.

ATTRIBUTE DESCRIPTION
total_files

Total files in repository

TYPE:int

active_files

Files changed in period

TYPE:int

new_files

Files added in period

TYPE:int

deleted_files

Files deleted in period

TYPE:int

file_changes

Number of changes per file

TYPE:Dict[str, int]

file_sizes

Size distribution of files

TYPE:Dict[str, int]

largest_files

Largest files by line count

TYPE:List[Tuple[str, int]]

most_changed

Most frequently changed files

TYPE:List[Tuple[str, int]]

file_age

Age distribution of files

TYPE:Dict[str, int]

file_churn

Churn rate per file

TYPE:Dict[str, float]

hot_files

Files with high activity

TYPE:List[str]

stable_files

Files with low activity

TYPE:List[str]

file_types

Distribution by file type

TYPE:Dict[str, int]

Attributes
avg_file_size property
Python
avg_file_size: float

Calculate average file size.

RETURNS DESCRIPTION
float

Average size in lines

TYPE:float

file_stability property
Python
file_stability: float

Calculate overall file stability.

RETURNS DESCRIPTION
float

Stability score (0-100)

TYPE:float

churn_rate property
Python
churn_rate: float

Calculate overall churn rate.

RETURNS DESCRIPTION
float

Average churn rate

TYPE:float
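
To make the shapes of `file_changes`, `most_changed`, and `avg_file_size` concrete, here is a small sketch with made-up data. It assumes `most_changed` is a top-N ranking of `file_changes` and that `avg_file_size` is the mean of `file_sizes`; the file names and numbers are illustrative only.

```python
from collections import Counter
from statistics import fmean

# Hypothetical per-file data in the shapes documented above.
file_changes = Counter({"src/app.py": 42, "README.md": 3, "src/utils.py": 17})
file_sizes = {"src/app.py": 310, "README.md": 40, "src/utils.py": 120}

# Top-N ranking as (path, change_count) tuples, most changed first.
most_changed = file_changes.most_common(2)

# Mean size in lines, guarding against an empty repository.
avg_file_size = fmean(file_sizes.values()) if file_sizes else 0.0

print(most_changed, round(avg_file_size, 1))
```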

RepositoryStats dataclass

Python
RepositoryStats(repo_age_days: int = 0, total_commits: int = 0, total_contributors: int = 0, total_files: int = 0, total_lines: int = 0, languages: Dict[str, int] = dict(), commit_stats: CommitStats = CommitStats(), contributor_stats: ContributorStats = ContributorStats(), file_stats: FileStats = FileStats(), growth_rate: float = 0.0, activity_trend: str = 'stable', health_score: float = 0.0, risk_factors: List[str] = list(), strengths: List[str] = list())

Overall repository statistics.

Aggregates various statistical analyses to provide comprehensive insights into repository health and development patterns.

ATTRIBUTE DESCRIPTION
repo_age_days

Age of repository in days

TYPE:int

total_commits

Total commits

TYPE:int

total_contributors

Total contributors

TYPE:int

total_files

Total files

TYPE:int

total_lines

Total lines of code

TYPE:int

languages

Programming languages used

TYPE:Dict[str, int]

commit_stats

Commit statistics

TYPE:CommitStats

contributor_stats

Contributor statistics

TYPE:ContributorStats

file_stats

File statistics

TYPE:FileStats

growth_rate

Repository growth rate

TYPE:float

activity_trend

Recent activity trend

TYPE:str

health_score

Overall health score

TYPE:float

risk_factors

Identified risk factors

TYPE:List[str]

strengths

Identified strengths

TYPE:List[str]

Functions
to_dict
Python
to_dict() -> Dict[str, Any]

Convert to dictionary representation.

RETURNS DESCRIPTION
Dict[str, Any]

Dictionary representation

Source code in tenets/core/git/stats.py
Python
def to_dict(self) -> Dict[str, Any]:
    """Convert to dictionary representation.

    Returns:
        Dict[str, Any]: Dictionary representation
    """
    return {
        "overview": {
            "repo_age_days": self.repo_age_days,
            "total_commits": self.total_commits,
            "total_contributors": self.total_contributors,
            "total_files": self.total_files,
            "total_lines": self.total_lines,
            "health_score": round(self.health_score, 1),
        },
        "languages": dict(
            sorted(self.languages.items(), key=lambda x: x[1], reverse=True)[:10]
        ),
        "commit_metrics": {
            "total": self.commit_stats.total_commits,
            "per_day": round(self.commit_stats.commits_per_day, 2),
            "merge_ratio": round(self.commit_stats.merge_ratio * 100, 1),
            "fix_ratio": round(self.commit_stats.fix_ratio * 100, 1),
            "peak_hour": self.commit_stats.peak_hour,
            "peak_day": self.commit_stats.peak_day,
        },
        "contributor_metrics": {
            "total": self.contributor_stats.total_contributors,
            "active": self.contributor_stats.active_contributors,
            "bus_factor": self.contributor_stats.bus_factor,
            "collaboration_score": round(self.contributor_stats.collaboration_score, 1),
            "top_contributors": self.contributor_stats.top_contributors[:5],
        },
        "file_metrics": {
            "total": self.file_stats.total_files,
            "active": self.file_stats.active_files,
            "stability": round(self.file_stats.file_stability, 1),
            "churn_rate": round(self.file_stats.churn_rate, 2),
            "hot_files": len(self.file_stats.hot_files),
        },
        "trends": {
            "growth_rate": round(self.growth_rate, 2),
            "activity_trend": self.activity_trend,
        },
        "risk_factors": self.risk_factors,
        "strengths": self.strengths,
    }
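
Two details of the dictionary built by `to_dict` matter when serializing it: `languages` is truncated to the ten largest entries in descending order, and `top_contributors` holds tuples, which JSON encodes as arrays. The sketch below reproduces both on a hand-written dictionary. The data is made up, and the cut-off is 2 rather than 10 to keep the example short.

```python
import json

languages = {"Python": 120, "YAML": 8, "Markdown": 15}

# Same idiom as in to_dict: sort by count, keep the largest entries, rebuild a dict.
top_languages = dict(
    sorted(languages.items(), key=lambda kv: kv[1], reverse=True)[:2]
)

# Hypothetical fragment shaped like part of the to_dict() result.
report = {
    "languages": top_languages,
    "contributor_metrics": {"top_contributors": [("alice", 60), ("bob", 25)]},
}

encoded = json.dumps(report, indent=2)
decoded = json.loads(encoded)

print(list(decoded["languages"]))
print(decoded["contributor_metrics"]["top_contributors"])
```

After a round trip through JSON the contributor tuples come back as lists, so code that compares against tuples needs to convert them.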

GitStatsAnalyzer

Python
GitStatsAnalyzer(config: TenetsConfig)

Analyzer for git repository statistics.

Provides comprehensive statistical analysis of git repositories to understand development patterns, team dynamics, and code health.

ATTRIBUTE DESCRIPTION
config

Configuration object

logger

Logger instance

git_analyzer

Git analyzer instance

TYPE:Optional[GitAnalyzer]

Initialize statistics analyzer.

PARAMETER DESCRIPTION
config

TenetsConfig instance

TYPE:TenetsConfig

Source code in tenets/core/git/stats.py
Python
def __init__(self, config: TenetsConfig):
    """Initialize statistics analyzer.

    Args:
        config: TenetsConfig instance
    """
    self.config = config
    self.logger = get_logger(__name__)
    self.git_analyzer: Optional[GitAnalyzer] = None
Functions
analyze
Python
analyze(repo_path: Path, since: Optional[str] = None, until: Optional[str] = None, branch: Optional[str] = None, include_files: bool = True, include_languages: bool = True, max_commits: int = 10000) -> RepositoryStats

Analyze repository statistics.

Performs comprehensive statistical analysis of a git repository to provide insights into development patterns and health.

PARAMETER DESCRIPTION
repo_path

Path to git repository

TYPE:Path

since

Start date or relative time

TYPE:Optional[str] DEFAULT:None

until

End date or relative time

TYPE:Optional[str] DEFAULT:None

branch

Specific branch to analyze

TYPE:Optional[str] DEFAULT:None

include_files

Whether to include file statistics

TYPE:bool DEFAULT:True

include_languages

Whether to analyze languages

TYPE:bool DEFAULT:True

max_commits

Maximum commits to analyze

TYPE:int DEFAULT:10000

RETURNS DESCRIPTION
RepositoryStats

Comprehensive statistics

TYPE:RepositoryStats

Example

analyzer = GitStatsAnalyzer(config)
stats = analyzer.analyze(Path("."))
print(f"Health score: {stats.health_score}")

Source code in tenets/core/git/stats.py
Python
def analyze(
    self,
    repo_path: Path,
    since: Optional[str] = None,
    until: Optional[str] = None,
    branch: Optional[str] = None,
    include_files: bool = True,
    include_languages: bool = True,
    max_commits: int = 10000,
) -> RepositoryStats:
    """Analyze repository statistics.

    Performs comprehensive statistical analysis of a git repository
    to provide insights into development patterns and health.

    Args:
        repo_path: Path to git repository
        since: Start date or relative time
        until: End date or relative time
        branch: Specific branch to analyze
        include_files: Whether to include file statistics
        include_languages: Whether to analyze languages
        max_commits: Maximum commits to analyze

    Returns:
        RepositoryStats: Comprehensive statistics

    Example:
        >>> analyzer = GitStatsAnalyzer(config)
        >>> stats = analyzer.analyze(Path("."))
        >>> print(f"Health score: {stats.health_score}")
    """
    self.logger.debug(f"Analyzing statistics for {repo_path}")

    # Initialize git analyzer
    self.git_analyzer = GitAnalyzer(repo_path)

    if not self.git_analyzer.is_repo():
        self.logger.warning(f"Not a git repository: {repo_path}")
        return RepositoryStats()

    # Initialize stats
    stats = RepositoryStats()

    # Get time period
    start_date, end_date = self._parse_time_period(since, until)

    # Get commits
    commits = self._get_commits(start_date, end_date, branch, max_commits)

    if not commits:
        self.logger.info("No commits found in specified period")
        return stats

    # Calculate basic metrics
    stats.total_commits = len(commits)
    stats.repo_age_days = (end_date - start_date).days

    # Analyze commits
    stats.commit_stats = self._analyze_commits(commits, start_date, end_date)

    # Analyze contributors
    stats.contributor_stats = self._analyze_contributors(commits, end_date)
    stats.total_contributors = stats.contributor_stats.total_contributors

    # Analyze files if requested
    if include_files:
        stats.file_stats = self._analyze_files(commits, repo_path)
        stats.total_files = stats.file_stats.total_files

        # Get total lines
        stats.total_lines = sum(stats.file_stats.file_sizes.values())

    # Analyze languages if requested
    if include_languages:
        stats.languages = self._analyze_languages(repo_path)

    # Calculate trends
    stats.growth_rate = self._calculate_growth_rate(commits)
    stats.activity_trend = self._determine_activity_trend(commits)

    # Calculate health score
    stats.health_score = self._calculate_health_score(stats)

    # Identify risks and strengths
    stats.risk_factors = self._identify_risks(stats)
    stats.strengths = self._identify_strengths(stats)

    self.logger.debug(
        f"Statistics analysis complete: {stats.total_commits} commits, "
        f"{stats.total_contributors} contributors"
    )

    return stats
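
`analyze` passes the period bounds to `_analyze_commits`, whose body is not shown on this page. As a rough illustration of how `commits_per_day`, `commits_per_week`, and `commits_per_month` could follow from a commit count and a period, here is a sketch under stated assumptions: the function `commit_rates`, the 7-day week, the 30-day month, and the one-day minimum period are all hypothetical choices, not the library's.

```python
from datetime import datetime
from typing import Dict


def commit_rates(total_commits: int, start: datetime, end: datetime) -> Dict[str, float]:
    """Average commit rates over the period [start, end]."""
    # Clamp to one day so a same-day period does not divide by zero.
    days = max((end - start).days, 1)
    per_day = total_commits / days
    return {
        "per_day": per_day,
        "per_week": per_day * 7,    # assumption: 7-day week
        "per_month": per_day * 30,  # assumption: 30-day month
    }


rates = commit_rates(90, datetime(2024, 1, 1), datetime(2024, 3, 31))
print(rates)
```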

Functions

analyze_git_stats

Python
analyze_git_stats(repo_path: Path, config: Optional[TenetsConfig] = None, **kwargs: Any) -> RepositoryStats

Convenience function to analyze git statistics.

PARAMETER DESCRIPTION
repo_path

Path to repository

TYPE:Path

config

Optional configuration

TYPE:Optional[TenetsConfig] DEFAULT:None

**kwargs

Additional arguments

TYPE:Any DEFAULT:{}

RETURNS DESCRIPTION
RepositoryStats

Repository statistics

TYPE:RepositoryStats

Example

from pathlib import Path
from tenets.core.git.stats import analyze_git_stats
stats = analyze_git_stats(Path("."))
print(f"Health score: {stats.health_score}")

Source code in tenets/core/git/stats.py
Python
def analyze_git_stats(
    repo_path: Path, config: Optional[TenetsConfig] = None, **kwargs: Any
) -> RepositoryStats:
    """Convenience function to analyze git statistics.

    Args:
        repo_path: Path to repository
        config: Optional configuration
        **kwargs: Additional arguments

    Returns:
        RepositoryStats: Repository statistics

    Example:
        >>> from pathlib import Path
        >>> from tenets.core.git.stats import analyze_git_stats
        >>> stats = analyze_git_stats(Path("."))
        >>> print(f"Health score: {stats.health_score}")
    """
    if config is None:
        config = TenetsConfig()

    analyzer = GitStatsAnalyzer(config)
    return analyzer.analyze(repo_path, **kwargs)