feat(examples): Add customer churn prediction ML example#561

Open
Drowser2430 wants to merge 2 commits into promptdriven:main from Drowser2430:main

@Drowser2430

Adds a complete customer churn prediction ML example using sklearn LogisticRegression.

Files added:

  • customer_churn.py — main ML module (train + predict functions)
  • test_customer_churn.py — 18 pytest unit tests (all passing ✅)
  • example_customer_churn.py — runnable demo script
  • customer_churn_python.prompt — PDD prompt (source of truth)
  • README.md — setup and usage docs

Note: Files should be organized under examples/customer_churn/ — happy to restructure if needed.

This adds PDD's first data science/ML example, demonstrating the full PDD workflow on a real-world use case. Related to my application for the AI Engineer role.

feat(examples): Add customer churn prediction ML example
feat(examples): Add customer churn prediction ML example
Contributor

Copilot AI left a comment

Pull request overview

Adds a new ML/data-science example demonstrating Prompt-Driven Development (PDD) end-to-end for customer churn prediction using a scikit-learn LogisticRegression pipeline.

Changes:

  • Introduces a churn prediction module (train + predict) plus a runnable demo script.
  • Adds a pytest-based unit test suite for the example.
  • Adds accompanying PDD prompt and updates examples documentation.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| pdd-contribution-Drowser2430.zip | Adds a zipped contribution bundle (currently includes build/test artifacts and duplicates). |
| examples/customer_churn.py | New churn training/prediction module using sklearn Pipeline + ColumnTransformer. |
| examples/example_customer_churn.py | New runnable demo generating synthetic data and printing evaluation + predictions. |
| examples/test_customer_churn.py | New pytest suite validating train/predict behavior and edge cases. |
| examples/customer_churn_python.prompt | New PDD prompt describing the churn module requirements. |
| examples/README.md | Replaces the examples index with churn-specific documentation (needs restructuring). |

Comment on lines +11 to +15
| PDD Concept | Implementation |
|---|---|
| Prompt as source of truth | `prompts/customer_churn_python.prompt` |
| Code generated from prompt | `customer_churn.py` |
| Usage example | `example_customer_churn.py` |

Copilot AI Feb 24, 2026

This README points to prompts/customer_churn_python.prompt, but the prompt file added by this PR is examples/customer_churn_python.prompt (no examples/prompts/ directory). Please fix the documented path (or move the prompt file) so the README reflects the actual layout.

Copilot uses AI. Check for mistakes.
import pytest
import pandas as pd
import numpy as np
from customer_churn import train, predict

Copilot AI Feb 24, 2026

Tests import from customer_churn import train, predict, which depends on how pytest is invoked and the working directory/PYTHONPATH. Given the intended layout under examples/customer_churn/, please ensure the test import path matches the final structure so pytest can be run as documented (and without relying on implicit cwd behavior).

Suggested change
- from customer_churn import train, predict
+ from examples.customer_churn import train, predict

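A minimal sketch of one way to make the test import independent of the working directory, assuming the tests and `customer_churn.py` end up in the same folder. The helper name `ensure_importable` is hypothetical and not part of this PR; the idea is a `conftest.py` next to the tests that puts its own directory on `sys.path` explicitly.

```python
# conftest.py (hypothetical), placed next to test_customer_churn.py
import sys
from pathlib import Path


def ensure_importable(directory: Path) -> str:
    """Put `directory` at the front of sys.path once; return the resolved path."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Inside conftest.py one would call:
#     ensure_importable(Path(__file__).parent)
# pytest loads a directory's conftest.py before collecting the tests in it,
# so `from customer_churn import train, predict` no longer depends on cwd.
```

With this in place the plain `from customer_churn import train, predict` import can stay as it is, whichever directory pytest is started from.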
Comment on lines +23 to +27
examples/customer_churn/
├── prompts/
│   └── customer_churn_python.prompt   # PDD prompt (source of truth)
├── customer_churn.py                  # Generated module
├── example_customer_churn.py          # Runnable demo

Copilot AI Feb 24, 2026

The documented file tree assumes an examples/customer_churn/ folder, but this PR currently adds the churn files directly under examples/. Please either move the files into the documented directory structure or update the tree and commands accordingly.

Create a Python module that trains a binary classification model to predict
customer churn. The module should:

1. Accept a dataset (as a pandas DataFrame or CSV path) with customer features

Copilot AI Feb 24, 2026

The prompt says the module should accept a dataset as a DataFrame or CSV path, but train() in this PR only accepts pd.DataFrame. Since the prompt is treated as source-of-truth, either update this line or implement CSV-path support.

Suggested change
- 1. Accept a dataset (as a pandas DataFrame or CSV path) with customer features
+ 1. Accept a dataset as a pandas DataFrame with customer features

Comment on lines +1 to +5
# Customer Churn Prediction — PDD Example

This directory contains examples that demonstrate comparisons between using Cursor and Prompt-Driven Development (PDD) for various programming tasks. These examples serve as practical illustrations of how PDD can be used to generate and modify code, via the pdd sync command, and how it compares to traditional development approaches.
This example demonstrates a complete **Prompt-Driven Development** workflow for a real-world machine learning use case: **predicting customer churn** using logistic regression.

## Getting Started
It is a companion to the core `hello` and `factorial_calculator` examples, showing PDD applied to a **data science / ML context** — a domain not previously covered in the official examples.

Copilot AI Feb 24, 2026

examples/README.md has been replaced with churn-specific documentation, which removes the overview/index for all other example projects under examples/. Please restore the examples index README and move the churn docs into a dedicated examples/customer_churn/README.md (then link to it from the main examples README).

"""
Customer Churn Prediction Module
Generated via PDD (Prompt-Driven Development) workflow.
Prompt: prompts/customer_churn_python.prompt

Copilot AI Feb 24, 2026

The module docstring says Prompt: prompts/customer_churn_python.prompt, but the prompt file added in this PR is examples/customer_churn_python.prompt (and there is no examples/prompts/ folder). Update the reference so the source-of-truth prompt path is correct after the final directory layout is decided.

Suggested change
- Prompt: prompts/customer_churn_python.prompt
+ Prompt: examples/customer_churn_python.prompt


categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore", sparse_output=False))

Copilot AI Feb 24, 2026

OneHotEncoder(..., sparse_output=False) requires scikit-learn >= 1.2; the README currently installs scikit-learn without a minimum version. Either document the minimum required scikit-learn version for this example or use an encoder argument compatible with older versions to avoid runtime failures for users.

Suggested change
-     ("onehot", OneHotEncoder(handle_unknown="ignore", sparse_output=False))
+     ("onehot", OneHotEncoder(handle_unknown="ignore", sparse=False))

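Worth noting when weighing the two spellings: `sparse_output` was introduced in scikit-learn 1.2, while the older `sparse` argument was deprecated in 1.2 and removed in 1.4, so neither works on every release. A minimal sketch of a version-aware alternative follows; the helper name `onehot_dense_kwargs` is hypothetical, and it assumes version strings that begin with numeric `major.minor` components, as scikit-learn's do.

```python
def onehot_dense_kwargs(sklearn_version: str) -> dict:
    """Return OneHotEncoder keyword arguments that request dense output.

    Uses `sparse_output` on scikit-learn >= 1.2 and the legacy `sparse`
    argument on older releases.
    """
    major, minor = (int(part) for part in sklearn_version.split(".")[:2])
    kwargs = {"handle_unknown": "ignore"}
    if (major, minor) >= (1, 2):
        kwargs["sparse_output"] = False
    else:
        kwargs["sparse"] = False
    return kwargs


# Intended use inside the pipeline (requires scikit-learn to be installed):
#     import sklearn
#     from sklearn.preprocessing import OneHotEncoder
#     encoder = OneHotEncoder(**onehot_dense_kwargs(sklearn.__version__))
```

The simpler option is still to keep `sparse_output=False` and document `scikit-learn>=1.2` as the minimum version in the README.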
Comment on lines +8 to +10
from customer_churn import train, predict


Copilot AI Feb 24, 2026

This example script imports from customer_churn import train, predict, which only works if customer_churn.py is on the Python path (e.g., running from the same directory). This conflicts with the README’s cd examples/customer_churn instructions (directory doesn’t exist in this PR). Please align the import with the final folder structure (e.g., move files under examples/customer_churn/ and keep relative execution consistent, or adjust the import/package layout accordingly).

Suggested change
- from customer_churn import train, predict
+ import sys
+ from pathlib import Path
+ try:
+     from customer_churn import train, predict
+ except ImportError:
+     # Allow running this example from the `examples/` directory by
+     # adding the repository root (parent of `examples/`) to sys.path.
+     repo_root = Path(__file__).resolve().parents[1]
+     if str(repo_root) not in sys.path:
+         sys.path.insert(0, str(repo_root))
+     from customer_churn import train, predict

Contributor

@gltanaka left a comment

Hey @Drowser2430 — thanks so much for putting this together! It's awesome to see someone take the initiative to build a full ML example for PDD, and the customer churn use case is a great choice. The code itself is clean, well-tested, and clearly demonstrates the train → evaluate → predict workflow. Really appreciate the effort here. 🙌

I have some feedback that should help get this ready to merge. Most of it is structural, and a few items relate to how PDD prompts are typically written (we have a prompting guide that covers the conventions).


Structural Issues (please fix before merge)

1. examples/README.md was overwritten

The existing README documents all the other examples (agentic fallback, edit file tool, handpaint, hello world, hello you, pi calc, QR code sandwich). This PR replaces it entirely with content only about customer churn. Could you restore the original and instead place your README at examples/customer_churn/README.md? You could also add a short entry for the new example in the top-level README.

2. Binary zip file committed

pdd-contribution-Drowser2430.zip was included at the repo root — this should be removed from the PR.

3. Files should live in a subdirectory

Your PR description and README both describe an examples/customer_churn/ directory structure (which is the right idea!), but the files are currently placed flat in examples/. Moving them into examples/customer_churn/ (with the prompt under prompts/) would match the description and keep things tidy.


Prompt Guide Alignment

These aren't blockers, but aligning with the project's prompting guide would make this a stronger example of the PDD workflow.

4. Prompt-to-code ratio is a bit high (~46%)

The guide recommends 10–30% of expected code size. Right now the prompt is 76 lines for ~165 lines of code. A lot of what's in the "Technical Requirements" section (pipeline structure, max_iter=1000, random_state=42, test_size=0.2) is specifying how to implement rather than what the module should do. Trimming those implementation details would bring the ratio down nicely.

The guide puts it well: "Focus on Interfaces, Invariants, and Outcomes. Let grounding handle implementation patterns."
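The arithmetic behind the ~46% figure can be checked with a short sketch, assuming line counts are the size measure (as the numbers above do). The function names are hypothetical; the 10–30% bounds are the ones quoted from the guide.

```python
def prompt_to_code_ratio(prompt_lines: int, code_lines: int) -> float:
    """Ratio of prompt size to generated code size, both measured in lines."""
    if code_lines <= 0:
        raise ValueError("code_lines must be positive")
    return prompt_lines / code_lines


def within_guideline(ratio: float, low: float = 0.10, high: float = 0.30) -> bool:
    """True when the ratio falls inside the recommended 10-30% band."""
    return low <= ratio <= high


ratio = prompt_to_code_ratio(76, 165)
print(f"{ratio:.0%}")            # 46%
print(within_guideline(ratio))   # False
```

For ~165 lines of code, the 10–30% band works out to a prompt of roughly 17 to 49 lines.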

5. Missing shared preamble <include>

PDD prompts typically start with something like <include>context/project_preamble.prompt</include> for shared style rules. Things like "All functions must include type hints" and "Include docstrings" are great conventions — they just belong in a preamble rather than the individual prompt.

6. Consider adding PDD metadata tags

Tags like <pdd-reason>, <pdd-interface>, and <pdd-dependency> help with architecture sync. Not required, but they'd make this example more complete as a PDD reference.

7. Prompt format

PDD prompts typically use % section markers or XML-style tags rather than markdown ## headings. Check the example in the prompting guide for the conventional structure.

8. Example Usage section in the prompt

The ## Example Usage block in the prompt largely duplicates example_customer_churn.py. Consider removing it from the prompt and using <include> to reference the example file if needed — this keeps the prompt focused on requirements.


Minor Code/Test Suggestions

9. predict() returning 0.0 for None model

Returning 0.0 (meaning "no churn risk") when the model is None could silently hide bugs. A ValueError or a logged warning might be safer — happy to hear your thinking on this though!
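A minimal sketch of the fail-loudly option, assuming a guard called at the top of `predict()`. The helper name `require_trained_model` and the error message are hypothetical, not taken from the PR.

```python
from typing import Any, Optional


def require_trained_model(model: Optional[Any]) -> Any:
    """Return the model unchanged, or raise if no trained model was supplied."""
    if model is None:
        # Fail loudly instead of returning 0.0, which reads as "no churn risk".
        raise ValueError("predict() received model=None; call train() first.")
    return model
```

A caller that really does want a fallback can still catch the `ValueError` and choose its own default explicitly.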

10. Missing features silently filled with NaN

In predict(), missing keys in the customer dict get silently filled with np.nan. A warning or validation step would help callers catch mistakes.
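One possible shape for that validation step, sketched with the standard `warnings` module. The function name `check_customer` and the feature names in the usage lines are hypothetical; the real feature list would come from the training data.

```python
import warnings
from typing import Iterable, List, Mapping


def check_customer(
    customer: Mapping[str, object],
    expected: Iterable[str],
    strict: bool = False,
) -> List[str]:
    """Report expected feature names that are absent from the customer record.

    Warns by default; raises ValueError when strict=True.
    """
    missing = [name for name in expected if name not in customer]
    if missing:
        message = f"customer record is missing features: {missing}"
        if strict:
            raise ValueError(message)
        warnings.warn(message, UserWarning, stacklevel=2)
    return missing


# Hypothetical feature names, for illustration only.
expected_features = ["tenure_months", "monthly_charges", "contract_type"]
print(check_customer({"tenure_months": 12, "monthly_charges": 70.0,
                      "contract_type": "monthly"}, expected_features))  # []
```

Warning by default keeps the existing NaN-imputation behaviour available, while `strict=True` lets callers opt into hard failures.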

11. test_low_risk_customer_has_lower_prob name vs. assertion

This test name suggests it checks that low-risk < high-risk, but the assertion only checks both are in [0, 1]. Totally understandable with small synthetic data — maybe just rename it to test_valid_probabilities_for_different_risk_profiles or similar so the name matches what's being asserted.


TL;DR

The core contribution is solid — well-structured code, good test coverage, and a practical use case. The main things to address are:

  1. Restore the original examples/README.md (and add yours in examples/customer_churn/)
  2. Remove the zip file from the repo
  3. Move files into examples/customer_churn/ subdirectory
  4. Trim the prompt to focus on WHAT, not HOW (optional but recommended)

Thanks again for contributing — looking forward to the next revision! 🎉

@Drowser2430
Author

Thanks Greg — really appreciate the detailed review.

I’m going to push a cleanup revision that:

  1. restores examples/README.md (and moves my churn README into examples/customer_churn/README.md),
  2. removes the committed zip file,
  3. moves all churn files into examples/customer_churn/ with the prompt under examples/customer_churn/prompts/.

I’ll update the PR shortly — thanks again!
