Guide · 8 min read
How to predict customer churn from an Excel file — without a data scientist
If you run operations at an Indian D2C or retail business, the data to predict who will leave is already in your spreadsheet. Here is exactly how to turn it into a ranked, explainable churn list you can act on this week.
What “predicting churn” actually means
Customer churn is when a paying customer stops buying or cancels. Predicting churn means estimating, for each customer right now, how likely they are to leave in the next period — so you can spend retention effort on the accounts that are actually at risk, instead of everyone or no one.
You do not need a machine-learning team for this. You need three things: a clean-ish spreadsheet, one column that records whether a customer churned in the past, and a tool that fits a model and explains it. That is the entire job.
The columns you need in your sheet
One row per customer. The single most important column is the outcome — a column that says whether that customer churned (for example churn with values Yes/No, or status = active/cancelled). Then add any behavioural signals you already track:
- Tenure — how many months the customer has been with you.
- Recency — days since last order or last login.
- Monetary — average or total spend.
- Support load — number of tickets or complaints.
- Plan / segment / region — categorical context.
Even five columns are enough to find a real pattern. More history (a few hundred rows) gives a more reliable model, but you can start with what you have.
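As a rough illustration of that layout, here is a minimal sketch using only Python's standard library. It assumes the sheet has been saved as CSV. The column names (customer_id, tenure_months, days_since_last_order, avg_spend, tickets, churn) and the sample rows are hypothetical, so substitute whatever your own sheet uses.

```python
import csv
import io

# Hypothetical sample sheet: column names and values are illustrative only.
SAMPLE_CSV = """customer_id,tenure_months,days_since_last_order,avg_spend,tickets,churn
C001,24,5,1200,0,No
C002,3,60,400,4,Yes
C003,12,20,800,1,No
C004,2,75,350,3,Yes
"""

def load_customers(csv_text):
    """Parse one row per customer and encode the outcome as 1 (churned) or 0."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "customer_id": raw["customer_id"],
            "tenure_months": float(raw["tenure_months"]),
            "days_since_last_order": float(raw["days_since_last_order"]),
            "avg_spend": float(raw["avg_spend"]),
            "tickets": float(raw["tickets"]),
            # Assumes the outcome column is literally "Yes"/"No".
            "churned": 1 if raw["churn"].strip().lower() == "yes" else 0,
        })
    return rows

customers = load_customers(SAMPLE_CSV)
churn_rate = sum(c["churned"] for c in customers) / len(customers)
print(f"{len(customers)} customers, churn rate {churn_rate:.0%}")
```

If your outcome column uses other labels (for example active/cancelled), only the line that builds "churned" needs to change.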
How the prediction works (in plain terms)
Under the hood, a churn model learns the relationship between your input columns and the past churn outcome. A transparent approach — the one SheetSense uses — is logistic regression. It assigns each factor a weight: some push the probability of churn up, some pull it down. The result is a probability between 0% and 100% for every customer, plus a ranked list of which factors mattered most.
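To make the idea of "weights" concrete, the sketch below fits a tiny logistic regression by plain gradient descent in pure Python. It is an illustration of the general technique, not SheetSense's actual implementation. The three input columns, their already-scaled values, and the learning rate and epoch count are all assumptions chosen for the example.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit one weight per column plus a bias by gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this customer: predicted probability minus outcome.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Churn probability for one customer's (scaled) inputs."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Illustrative, already-scaled inputs per customer: [recency, tickets, tenure]
X = [[1.0, 1.0, -1.0], [0.8, 0.5, -0.5], [-1.0, -1.0, 1.0],
     [-0.7, -0.4, 0.9], [0.9, -0.2, -0.8], [-0.9, 0.3, 0.6]]
y = [1, 1, 0, 0, 1, 0]  # 1 = churned in the past, 0 = stayed

w, b = fit_logistic(X, y)
for name, weight in zip(["recency", "tickets", "tenure"], w):
    print(f"{name}: weight {weight:+.2f}")
print(f"churn probability for first customer: {predict(w, b, X[0]):.0%}")
```

A positive weight pushes the probability of churn up and a negative weight pulls it down. In practice a tested library would replace this hand-written loop, but the model being fitted is the same.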
The advantage of a transparent model over a black box is trust. When you tell your founder “these 40 accounts are high-risk, mostly because they haven’t logged in for 45+ days and have raised 3+ tickets,” that is a conversation the business can act on. A model that just outputs a score with no reason gets ignored.
Reading the drivers
A good churn tool does not just score customers — it explains the drivers. For a typical subscription business you will often see patterns like:
- Days since last login ↑ churn — disengagement is the loudest signal.
- Support tickets ↑ churn — friction predicts exit.
- Tenure ↓ churn — long-standing customers are stickier.
- Higher spend / premium plan ↓ churn — committed customers stay.
These directions are not assumptions: a fitted model measures them from your data, so you learn what holds for your business specifically.
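One way to read drivers off a logistic regression, sketched below, is to sort the weights by absolute size and take the sign as the direction. The weights here are hypothetical numbers, not output from any real file, and the comparison only makes sense if the inputs were standardised first (an assumption of this sketch).

```python
import math

# Hypothetical fitted weights on standardised inputs (illustrative numbers only).
weights = {
    "days_since_last_login": 1.10,
    "support_tickets": 0.65,
    "tenure_months": -0.80,
    "avg_spend": -0.30,
}

def rank_drivers(weights):
    """Order factors by absolute weight; the sign gives the direction."""
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [
        # exp(weight) is the odds ratio for a one-standard-deviation increase.
        (name, "raises churn" if w > 0 else "lowers churn", round(math.exp(w), 2))
        for name, w in ranked
    ]

for name, direction, odds_ratio in rank_drivers(weights):
    print(f"{name}: {direction} (odds x{odds_ratio} per 1 std dev)")
```

An odds ratio above 1 means the factor raises the odds of churn, and below 1 means it lowers them.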
How much should you trust the model?
Always check the quality metric measured on data the model has not seen, meaning a portion of your rows held back from fitting. For churn, the key number is AUC (area under the ROC curve): 0.5 is a coin flip, 1.0 is perfect. As a rule of thumb, anything above roughly 0.7 is useful for prioritising retention. If the AUC is near 0.5, your columns do not yet contain enough signal; add a behavioural column like recency and try again.
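AUC has a simple interpretation: it is the share of (churned, retained) customer pairs in which the churned customer received the higher score. The sketch below computes it that way in pure Python. The held-out labels and probabilities are made up for illustration.

```python
def auc(y_true, scores):
    """Share of (churned, retained) pairs where the churned customer scores higher.

    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one churned and one retained customer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out customers: true outcome and predicted churn probability.
y_holdout = [1, 0, 1, 0, 0, 1]
p_holdout = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]

print(f"AUC on held-out rows: {auc(y_holdout, p_holdout):.2f}")
```

This pairwise version is easy to follow but slow on large files. A library implementation gives the same number much faster.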
Turning the prediction into action
- Sort customers by churn probability and take the top 10–20%.
- Cross-reference with value: prioritise customers who are both high-probability and high-spend.
- Attack the top driver — if inactivity leads, launch a re-engagement nudge.
- Re-run monthly and track whether your at-risk rate falls.
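The first two steps above can be sketched in a few lines. This example assumes each customer already has a churn probability and an average spend. The field names and numbers are hypothetical, and multiplying probability by spend is just one reasonable way to combine risk with value.

```python
def priority_list(customers, top_share=0.2):
    """Keep the top share by churn probability, then order by spend at risk."""
    ranked = sorted(customers, key=lambda c: c["churn_prob"], reverse=True)
    keep = max(1, round(len(ranked) * top_share))  # always keep at least one
    at_risk = ranked[:keep]
    # Assumption: probability x spend approximates the revenue at risk.
    return sorted(at_risk, key=lambda c: c["churn_prob"] * c["avg_spend"], reverse=True)

# Hypothetical scored customers (illustrative values only).
customers = [
    {"id": "C001", "churn_prob": 0.82, "avg_spend": 400},
    {"id": "C002", "churn_prob": 0.75, "avg_spend": 1500},
    {"id": "C003", "churn_prob": 0.30, "avg_spend": 2000},
    {"id": "C004", "churn_prob": 0.10, "avg_spend": 900},
    {"id": "C005", "churn_prob": 0.55, "avg_spend": 700},
]

for c in priority_list(customers, top_share=0.4):
    print(c["id"], f"{c['churn_prob']:.0%}", c["avg_spend"])
```

Here C002 is listed ahead of C001: its churn probability is slightly lower, but far more spend is at stake.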
Do it now with SheetSense
SheetSense was built for exactly this workflow. Upload a CSV or Excel file, click your churn column, and get the ranked risk list, the drivers in Hindi or English, and a one-page PDF for your team. Your first file is free, up to 1,000 rows. There is a demo churn file if you want to see it work before using your own data.