Conflict Collection¶
Data collection toolkit for Git merge conflicts: structural types, social signals, and similarity metrics.
conflict-collection builds on conflict-parser to let you:
- Enumerate and classify raw merge conflict cases (modify-modify, delete-modify, add-add, etc.)
- Capture rich 5‑tuple representations (A / B / O / M / R) suitable for ML datasets
- Extract social / ownership signals (recency, blame composition, prior integrator behaviour)
- Compute research-oriented metrics like the 3‑way anchored similarity ratio
Why this exists¶
Academic & engineering studies of merge conflicts often re‑implement similar collection logic scattered across scripts. This project offers a typed, test‑covered, reusable core to standardise merge conflict data pipelines.
Core Concepts¶
| Concept | Description |
|---|---|
| Conflict Case | A single logical conflict instance (may aggregate multiple Git index stages) |
| 5‑Tuple | Ordered versions: ours (A), theirs (B), base/virtual base (O), conflicted (M), resolved (R) |
| Social Signals | Ownership & recency meta-data helpful for predicting / understanding resolution difficulty |
| Anchored Ratio | A 3‑way normalised similarity score R vs R̂ relative to base O |
Install¶
pip install conflict-collection
# or with docs extras
pip install conflict-collection[docs]
Quick Links¶
- Quick Start
- Conflict Type Collector
- Societal Signals Collector
- Anchored Ratio Metric
- Data Models
- API Reference
Status¶
Beta. Interfaces may refine prior to 0.1. Feedback & PRs welcome.