feat(examples): Add customer churn prediction ML example by Drowser2430 · Pull Request #561 · promptdriven/pdd

Drowser2430 · 2026-02-23T21:03:07Z

Adds a complete customer churn prediction ML example using sklearn LogisticRegression.

Files added:

- `customer_churn.py`: main ML module (train + predict functions)
- `test_customer_churn.py`: 18 pytest unit tests (all passing ✅)
- `example_customer_churn.py`: runnable demo script
- `customer_churn_python.prompt`: PDD prompt (source of truth)
- `README.md`: setup and usage docs

Note: Files should be organized under examples/customer_churn/ — happy to restructure if needed.

This adds PDD's first data science/ML example, demonstrating the full PDD workflow on a real-world use case. Related to my application for the AI Engineer role.

feat(examples): Add customer churn prediction ML example

Copilot

Pull request overview

Adds a new ML/data-science example demonstrating Prompt-Driven Development (PDD) end-to-end for customer churn prediction using a scikit-learn LogisticRegression pipeline.

Changes:

- Introduces a churn prediction module (train + predict) plus a runnable demo script.
- Adds a pytest-based unit test suite for the example.
- Adds accompanying PDD prompt and updates examples documentation.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| `pdd-contribution-Drowser2430.zip` | Adds a zipped contribution bundle (currently includes build/test artifacts and duplicates). |
| `examples/customer_churn.py` | New churn training/prediction module using sklearn Pipeline + ColumnTransformer. |
| `examples/example_customer_churn.py` | New runnable demo generating synthetic data and printing evaluation + predictions. |
| `examples/test_customer_churn.py` | New pytest suite validating train/predict behavior and edge cases. |
| `examples/customer_churn_python.prompt` | New PDD prompt describing the churn module requirements. |
| `examples/README.md` | Replaces the examples index with churn-specific documentation (needs restructuring). |

Copilot · 2026-02-24T21:18:00Z

examples/README.md

+| PDD Concept | Implementation |
+|---|---|
+| Prompt as source of truth | `prompts/customer_churn_python.prompt` |
+| Code generated from prompt | `customer_churn.py` |
+| Usage example | `example_customer_churn.py` |


This README points to prompts/customer_churn_python.prompt, but the prompt file added by this PR is examples/customer_churn_python.prompt (no examples/prompts/ directory). Please fix the documented path (or move the prompt file) so the README reflects the actual layout.

Copilot · 2026-02-24T21:18:01Z

examples/test_customer_churn.py

+import pytest
+import pandas as pd
+import numpy as np
+from customer_churn import train, predict


Tests import from customer_churn import train, predict, which depends on how pytest is invoked and the working directory/PYTHONPATH. Given the intended layout under examples/customer_churn/, please ensure the test import path matches the final structure so pytest can be run as documented (and without relying on implicit cwd behavior).

Suggested change

-from customer_churn import train, predict
+from examples.customer_churn import train, predict

Copilot · 2026-02-24T21:18:01Z

examples/README.md

+examples/customer_churn/
+├── prompts/
+│   └── customer_churn_python.prompt   # PDD prompt (source of truth)
+├── customer_churn.py                  # Generated module
+├── example_customer_churn.py          # Runnable demo


The documented file tree assumes an examples/customer_churn/ folder, but this PR currently adds the churn files directly under examples/. Please either move the files into the documented directory structure or update the tree and commands accordingly.

Copilot · 2026-02-24T21:18:01Z

examples/customer_churn_python.prompt

+Create a Python module that trains a binary classification model to predict
+customer churn. The module should:
+
+1. Accept a dataset (as a pandas DataFrame or CSV path) with customer features


The prompt says the module should accept a dataset as a DataFrame or CSV path, but train() in this PR only accepts pd.DataFrame. Since the prompt is treated as source-of-truth, either update this line or implement CSV-path support.

Suggested change

-1. Accept a dataset (as a pandas DataFrame or CSV path) with customer features
+1. Accept a dataset as a pandas DataFrame with customer features
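
If the prompt line is kept rather than narrowed, the other option the comment mentions is CSV-path support. A minimal sketch of that input handling, assuming a hypothetical helper named `load_dataset` that `train()` could call first (neither the helper nor its name comes from the PR):

```python
import os
from typing import Union

import pandas as pd


def load_dataset(data: Union[pd.DataFrame, str, os.PathLike]) -> pd.DataFrame:
    """Return a DataFrame from either a DataFrame or a CSV file path.

    Hypothetical helper for illustration; not part of the PR.
    """
    if isinstance(data, pd.DataFrame):
        # Copy so later preprocessing cannot mutate the caller's frame.
        return data.copy()
    if isinstance(data, (str, os.PathLike)):
        # pathlib.Path objects are os.PathLike, so they are accepted too.
        return pd.read_csv(data)
    raise TypeError(
        f"expected a pandas DataFrame or a CSV path, got {type(data).__name__}"
    )
```

Rejecting other input types with a `TypeError` keeps the accepted inputs aligned with what the prompt states.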

Copilot · 2026-02-24T21:18:02Z

examples/README.md

+# Customer Churn Prediction — PDD Example

-This directory contains examples that demonstrate comparisons between using Cursor and Prompt-Driven Development (PDD) for various programming tasks. These examples serve as practical illustrations of how PDD can be used to generate and modify code, via the pdd sync command, and how it compares to traditional development approaches.
+This example demonstrates a complete **Prompt-Driven Development** workflow for a real-world machine learning use case: **predicting customer churn** using logistic regression.

-## Getting Started
+It is a companion to the core `hello` and `factorial_calculator` examples, showing PDD applied to a **data science / ML context** — a domain not previously covered in the official examples.


examples/README.md has been replaced with churn-specific documentation, which removes the overview/index for all other example projects under examples/. Please restore the examples index README and move the churn docs into a dedicated examples/customer_churn/README.md (then link to it from the main examples README).

Copilot · 2026-02-24T21:18:02Z

examples/customer_churn.py

+"""
+Customer Churn Prediction Module
+Generated via PDD (Prompt-Driven Development) workflow.
+Prompt: prompts/customer_churn_python.prompt


The module docstring says Prompt: prompts/customer_churn_python.prompt, but the prompt file added in this PR is examples/customer_churn_python.prompt (and there is no examples/prompts/ folder). Update the reference so the source-of-truth prompt path is correct after the final directory layout is decided.

Suggested change

-Prompt: prompts/customer_churn_python.prompt
+Prompt: examples/customer_churn_python.prompt

Copilot · 2026-02-24T21:18:02Z

examples/customer_churn.py

+
+    categorical_transformer = Pipeline(steps=[
+        ("imputer", SimpleImputer(strategy="most_frequent")),
+        ("onehot", OneHotEncoder(handle_unknown="ignore", sparse_output=False))


OneHotEncoder(..., sparse_output=False) requires scikit-learn >= 1.2; the README currently installs scikit-learn without a minimum version. Either document the minimum required scikit-learn version for this example or use an encoder argument compatible with older versions to avoid runtime failures for users.

Suggested change

-("onehot", OneHotEncoder(handle_unknown="ignore", sparse_output=False))
+("onehot", OneHotEncoder(handle_unknown="ignore", sparse=False))
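
One caveat on the suggested change: to my knowledge `sparse` was deprecated in scikit-learn 1.2 and removed in 1.4, so `sparse=False` would likely trade a failure on old versions for one on new versions. A minimal sketch of picking the keyword by version, assuming a hypothetical helper `dense_onehot_kwargs` (not part of the PR or of scikit-learn):

```python
import re


def dense_onehot_kwargs(sklearn_version: str) -> dict:
    """Return the OneHotEncoder keyword that requests dense output.

    Hypothetical helper: uses `sparse_output` from 1.2 onward, `sparse` before.
    """
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {sklearn_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor >= (1, 2):
        return {"sparse_output": False}
    return {"sparse": False}


# Intended use (needs scikit-learn installed, so it is left as a comment):
#   import sklearn
#   from sklearn.preprocessing import OneHotEncoder
#   encoder = OneHotEncoder(handle_unknown="ignore",
#                           **dense_onehot_kwargs(sklearn.__version__))
```

Documenting a minimum of `scikit-learn>=1.2` in the README, the other option the comment mentions, is simpler and avoids the version branch entirely.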

Copilot · 2026-02-24T21:18:02Z

examples/example_customer_churn.py

+from customer_churn import train, predict
+
+


This example script imports from customer_churn import train, predict, which only works if customer_churn.py is on the Python path (e.g., running from the same directory). This conflicts with the README’s cd examples/customer_churn instructions (directory doesn’t exist in this PR). Please align the import with the final folder structure (e.g., move files under examples/customer_churn/ and keep relative execution consistent, or adjust the import/package layout accordingly).

Suggested change

-from customer_churn import train, predict
+import sys
+from pathlib import Path
+try:
+    from customer_churn import train, predict
+except ImportError:
+    # Allow running this example from the `examples/` directory by
+    # adding the repository root (parent of `examples/`) to sys.path.
+    repo_root = Path(__file__).resolve().parents[1]
+    if str(repo_root) not in sys.path:
+        sys.path.insert(0, str(repo_root))
+    from customer_churn import train, predict
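
For `from customer_churn import ...` to resolve, the directory that contains `customer_churn.py` has to be on `sys.path`. As far as I can tell, the fallback in the suggestion above adds `parents[1]`, the parent of the script's directory, which would only help if `customer_churn.py` lived there. A minimal sketch of the same idea, assuming the module sits next to the demo script; `ensure_on_sys_path` is a hypothetical helper, not part of the PR:

```python
import sys
from pathlib import Path


def ensure_on_sys_path(directory: Path) -> bool:
    """Prepend `directory` to sys.path if absent; return True if it was added."""
    resolved = str(directory.resolve())
    if resolved in sys.path:
        return False
    sys.path.insert(0, resolved)
    return True


# Intended use inside example_customer_churn.py (assumes customer_churn.py
# is in the same directory as the script):
#   ensure_on_sys_path(Path(__file__).resolve().parent)
#   from customer_churn import train, predict
```

When a script is run directly with `python path/to/script.py`, Python already places the script's directory first on `sys.path`, so this mainly matters for other entry points such as test runners.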

gltanaka

Hey @Drowser2430 — thanks so much for putting this together! It's awesome to see someone take the initiative to build a full ML example for PDD, and the customer churn use case is a great choice. The code itself is clean, well-tested, and clearly demonstrates the train → evaluate → predict workflow. Really appreciate the effort here. 🙌

I have some feedback that should help get this ready to merge. Most of it is structural, and a few items relate to how PDD prompts are typically written (we have a prompting guide that covers the conventions).

Structural Issues (please fix before merge)

1. `examples/README.md` was overwritten

The existing README documents all the other examples (agentic fallback, edit file tool, handpaint, hello world, hello you, pi calc, QR code sandwich). This PR replaces it entirely with content only about customer churn. Could you restore the original and instead place your README at examples/customer_churn/README.md? You could also add a short entry for the new example in the top-level README.

2. Binary zip file committed

pdd-contribution-Drowser2430.zip was included at the repo root — this should be removed from the PR.

3. Files should live in a subdirectory

Your PR description and README both describe an examples/customer_churn/ directory structure (which is the right idea!), but the files are currently placed flat in examples/. Moving them into examples/customer_churn/ (with the prompt under prompts/) would match the description and keep things tidy.

Prompt Guide Alignment

These aren't blockers, but aligning with the project's prompting guide would make this a stronger example of the PDD workflow.

4. Prompt-to-code ratio is a bit high (~46%)

The guide recommends 10–30% of expected code size. Right now the prompt is 76 lines for ~165 lines of code. A lot of what's in the "Technical Requirements" section (pipeline structure, max_iter=1000, random_state=42, test_size=0.2) is specifying how to implement rather than what the module should do. Trimming those implementation details would bring the ratio down nicely.

The guide puts it well: "Focus on Interfaces, Invariants, and Outcomes. Let grounding handle implementation patterns."
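
The ratio quoted in item 4 is simple line arithmetic. A small sketch, assuming size is measured in lines (as the numbers in this review are) and using hypothetical helper names:

```python
def prompt_to_code_ratio(prompt_lines: int, code_lines: int) -> float:
    """Prompt size as a fraction of code size, both counted in lines."""
    if code_lines <= 0:
        raise ValueError("code_lines must be positive")
    return prompt_lines / code_lines


def recommended_prompt_range(code_lines: int, low_pct: int = 10, high_pct: int = 30):
    """Return (min_lines, max_lines) for a prompt in the 10-30% band."""
    min_lines = -(-code_lines * low_pct // 100)  # ceiling division
    max_lines = code_lines * high_pct // 100     # floor division
    return min_lines, max_lines


print(f"{prompt_to_code_ratio(76, 165):.0%}")  # 46%
print(recommended_prompt_range(165))            # (17, 49)
```

Under these assumptions, a prompt of at most 49 lines would fall inside the recommended band for about 165 lines of code.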

5. Missing shared preamble `<include>`

PDD prompts typically start with something like <include>context/project_preamble.prompt</include> for shared style rules. Things like "All functions must include type hints" and "Include docstrings" are great conventions — they just belong in a preamble rather than the individual prompt.

6. Consider adding PDD metadata tags

Tags like <pdd-reason>, <pdd-interface>, and <pdd-dependency> help with architecture sync. Not required, but they'd make this example more complete as a PDD reference.

7. Prompt format

PDD prompts typically use % section markers or XML-style tags rather than markdown ## headings. Check the example in the prompting guide for the conventional structure.

8. Example Usage section in the prompt

The ## Example Usage block in the prompt largely duplicates example_customer_churn.py. Consider removing it from the prompt and using <include> to reference the example file if needed — this keeps the prompt focused on requirements.

Minor Code/Test Suggestions

9. `predict()` returning `0.0` for `None` model

Returning 0.0 (meaning "no churn risk") when the model is None could silently hide bugs. A ValueError or a logged warning might be safer — happy to hear your thinking on this though!
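
A minimal sketch of the stricter option, assuming a hypothetical guard named `require_model` that `predict()` would call before scoring (the name and message are illustrative, not from the PR):

```python
from typing import Optional, TypeVar

ModelT = TypeVar("ModelT")


def require_model(model: Optional[ModelT]) -> ModelT:
    """Return the model unchanged, or raise if no trained model was supplied."""
    if model is None:
        # Failing loudly avoids reporting 0.0, which reads as "no churn risk".
        raise ValueError("model is None; train a model before calling predict()")
    return model
```

Raising here means a missing model surfaces at the call site instead of looking like a valid low-risk prediction.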

10. Missing features silently filled with `NaN`

In predict(), missing keys in the customer dict get silently filled with np.nan. A warning or validation step would help callers catch mistakes.
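
One way the validation step could look, as a sketch only: `check_features` and the feature names in the usage comment are hypothetical and not taken from the PR.

```python
import warnings
from typing import Iterable, List, Mapping


def check_features(
    customer: Mapping[str, object],
    expected_features: Iterable[str],
    strict: bool = False,
) -> List[str]:
    """Report expected features that are absent from the customer dict.

    Warns by default; raises ValueError when strict=True.
    """
    missing = [name for name in expected_features if name not in customer]
    if missing:
        message = f"customer is missing features: {', '.join(missing)}"
        if strict:
            raise ValueError(message)
        warnings.warn(message, UserWarning, stacklevel=2)
    return missing


# Example call with illustrative feature names:
#   check_features({"tenure_months": 3}, ["tenure_months", "monthly_charges"])
```

The returned list lets the caller decide whether to fill the gaps with `np.nan`, as the current code does, or to stop.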

11. `test_low_risk_customer_has_lower_prob` name vs. assertion

This test name suggests it checks that low-risk < high-risk, but the assertion only checks both are in [0, 1]. Totally understandable with small synthetic data — maybe just rename it to test_valid_probabilities_for_different_risk_profiles or similar so the name matches what's being asserted.
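
To make the gap between name and assertion concrete, here is a sketch with two hypothetical check functions; the probabilities passed in the usage note are made-up values, not outputs of the PR's model.

```python
def check_probabilities_are_valid(low_risk_prob: float, high_risk_prob: float) -> None:
    """What the current test asserts: both values are probabilities."""
    for prob in (low_risk_prob, high_risk_prob):
        assert 0.0 <= prob <= 1.0


def check_low_risk_is_lower(low_risk_prob: float, high_risk_prob: float) -> None:
    """What the current test name implies: an ordering between the two."""
    assert low_risk_prob < high_risk_prob
```

With a reversed pair such as `(0.7, 0.2)`, the first check passes while the second fails, which is why a name like `test_valid_probabilities_for_different_risk_profiles` describes the existing assertion more accurately.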

TL;DR

The core contribution is solid — well-structured code, good test coverage, and a practical use case. The main things to address are:

- Restore the original examples/README.md (and add yours in examples/customer_churn/)
- Remove the zip file from the repo
- Move files into examples/customer_churn/ subdirectory
- Trim the prompt to focus on WHAT, not HOW (optional but recommended)

Thanks again for contributing — looking forward to the next revision! 🎉

Drowser2430 · 2026-03-01T02:14:23Z

Thanks Greg — really appreciate the detailed review.

I’m going to push a cleanup revision that:

- restores examples/README.md (and moves my churn README into examples/customer_churn/README.md),
- removes the committed zip file,
- moves all churn files into examples/customer_churn/ with the prompt under examples/customer_churn/prompts/.

I’ll update the PR shortly — thanks again!

Drowser2430 added 2 commits February 23, 2026 14:25

Add files via upload

f3ceb5a

feat(examples): Add customer churn prediction ML example

Add files via upload

251d986

feat(examples): Add customer churn prediction ML example

gltanaka requested a review from Copilot February 24, 2026 21:13

Copilot started reviewing on behalf of gltanaka February 24, 2026 21:13

Copilot AI reviewed Feb 24, 2026

gltanaka requested changes Mar 1, 2026

Drowser2430 wants to merge 2 commits into promptdriven:main from Drowser2430:main

3 participants
