Data analysis projects
The gym chain Model Fitness is developing a customer interaction strategy based on analytical data.
Analyze customer profiles and come up with a customer retention strategy.
1. Plotting distribution of customer churn

2. Correlation matrix
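Each cell of a correlation matrix holds a correlation coefficient for one pair of columns. The sketch below assumes the Pearson coefficient (the project does not state which coefficient was used) and runs on made-up toy values, not on the Model Fitness data. The helper names pearson and correlation_matrix are illustrative only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Made-up toy values (assumption), chosen only to illustrate the computation.
toy = {
    "contract_period": [1, 1, 6, 12, 12, 3],
    "month_to_end_contract": [1, 1, 5, 11, 12, 2],
    "churn": [1, 1, 0, 0, 0, 1],
}
matrix = correlation_matrix(toy)
```

With a pandas DataFrame, DataFrame.corr() returns the same kind of pairwise matrix and uses the Pearson coefficient by default.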

The correlation matrix tells us the following:
contract_period and month_to_end_contract have the strongest positive correlation of all feature pairs (0.97). This is quite logical, as they move in the same direction: the longer the contract period, the more months remain until the contract ends.
avg_class_frequency_total and avg_class_frequency_current_month are also strongly positively correlated (0.95). This is again logical: the total includes the current month, so the more often a customer visits in the current month, the higher the total.
lifetime and churn have the strongest negative correlation (-0.44). The longer a customer’s lifetime, the less likely they are to churn.

We set the X variable to the features and the y variable to the target, which is churn. We split the dataset 80/20: 80% is the training set and the remaining 20% is the validation set, on which we compare our predictions with the actual data.
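The 80/20 split can be sketched as follows. This is a minimal standard-library illustration, not the project's actual code: the function name train_valid_split, the seed, and the toy rows are assumptions.

```python
import random


def train_valid_split(X, y, valid_share=0.2, seed=0):
    """Shuffle row indices and hold out `valid_share` of the rows for validation."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_valid = round(len(X) * valid_share)
    valid_idx, train_idx = indices[:n_valid], indices[n_valid:]
    X_train = [X[i] for i in train_idx]
    X_valid = [X[i] for i in valid_idx]
    y_train = [y[i] for i in train_idx]
    y_valid = [y[i] for i in valid_idx]
    return X_train, X_valid, y_train, y_valid


# Made-up toy rows (assumption): [lifetime, contract_period]; target is churn (1 = left).
X = [[i, 1 + i % 12] for i in range(10)]
y = [1 if i < 4 else 0 for i in range(10)]
X_train, X_valid, y_train, y_valid = train_valid_split(X, y)
```

In scikit-learn, train_test_split with test_size=0.2 performs the same kind of shuffled hold-out split.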
We built binary classification models, which are designed to predict the probability of an event, in our case churn in the upcoming month, for each customer.

Accuracy is the share of correct predictions among all predictions; the closer to 1, the better. On this metric, Logistic Regression has the better score: 0.93.
Precision is the share of class 1 predictions that are actually correct, that is, the share of correct answers within the target class only; the closer to 1, the better. On this metric, Logistic Regression has the better score: 0.86.
Recall addresses the opposite risk: it shows the share of real class 1 objects that the model was able to find; the closer to 1, the better. On this metric, Logistic Regression has the better score: 0.83.
To sum up, the Logistic Regression model gave better results, and we can use it for forecasting.
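For reference, accuracy, precision and recall can all be derived from the four confusion-matrix counts. The sketch below uses made-up labels (not the project's validation set), and classification_metrics is an illustrative helper name.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels, where 1 = churn."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churn predicted, churn happened
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # stay predicted, customer stayed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # churn predicted, customer stayed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # stay predicted, churn happened
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels (assumption): 3 real churners, 3 predicted churners, 2 of them correct.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Here 6 of 8 predictions are correct (accuracy 0.75), and 2 of the 3 predicted churners and 2 of the 3 real churners are matched (precision and recall both 2/3).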
To use K-Means clustering (which groups objects step by step under the assumption that the number of clusters is already known), we must first determine how many user clusters can be identified. The distances between objects, and the agglomerative hierarchical clustering itself, can be visualized with special plots called dendrograms. We set the number of clusters to n = 5. We also calculated the silhouette score, which shows the extent to which an object is similar to its own cluster rather than to another one. The closer to 1, the better the clustering. In our case the silhouette score is 0.14, which is not high.
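To make the silhouette score concrete: for each object, a is its mean distance to the other objects in its own cluster, b is the smallest mean distance to the objects of any other cluster, and the object's score is (b - a) / max(a, b). The sketch below averages this over all objects. It assumes Euclidean distance and at least two clusters, and uses made-up 2D points rather than the project's data.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (requires at least 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for single-point clusters
            continue
        # a: mean distance to the other points of the same cluster
        # (the point's distance to itself is 0, so it does not affect the sum)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up points (assumption): two tight, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
well_separated = silhouette_score(points, [0, 0, 0, 1, 1, 1])
mixed_up = silhouette_score(points, [0, 1, 0, 1, 0, 1])
```

Labels that follow the two groups give a score close to 1, while labels that cut across the groups give a much lower one. A low value such as 0.14 therefore suggests clusters that overlap considerably.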

Calculating churn rate for each cluster

According to the table above, customers from clusters 3 and 2 are most likely to leave, with churn rates of 51.4% and 44.3% respectively.
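A per-cluster churn rate is simply the mean of the 0/1 churn flag within each cluster. A minimal sketch with made-up labels (not the project's clusters); churn_rate_by_cluster is an illustrative helper name.

```python
from collections import defaultdict


def churn_rate_by_cluster(cluster_labels, churn_flags):
    """Return {cluster: share of customers in that cluster with churn == 1}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for cluster, churn in zip(cluster_labels, churn_flags):
        totals[cluster] += 1
        churned[cluster] += churn
    return {cluster: churned[cluster] / totals[cluster] for cluster in sorted(totals)}


# Made-up cluster labels and churn flags (assumption), one entry per customer.
clusters = [0, 0, 1, 1, 1, 2, 2, 2, 2]
churn = [0, 0, 1, 0, 1, 1, 1, 0, 0]
rates = churn_rate_by_cluster(clusters, churn)
```

With a pandas DataFrame, the same result comes from grouping by the cluster column and taking the mean of the churn column.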
gender - client’s gender
near_location - whether the user lives or works in the neighborhood where the gym is located
partner - whether the user is an employee of a partner company
promo_friends - whether the user originally signed up through a “bring a friend” offer
phone - whether the user provided their phone number
contract_period - 1 month, 3 months, 6 months, or 1 year
group_visits - whether the user takes part in group sessions
age - user’s age
avg_additional_charges_total - the total amount of money spent on other gym services: cafe, athletic goods, cosmetics
month_to_end_contract - the months remaining until the contract expires
lifetime - the time (in months) since the customer first came to the gym
avg_class_frequency_total - average frequency of visits per week over the customer’s lifetime
avg_class_frequency_current_month - average frequency of visits per week over the preceding month
churn - the fact of churn for the month in question

near_location plays a vital role in whether the customer stays or leaves, which is logical: if the gym is close to their home or office, the customer is more likely to visit it.
partner is also one of the most important features for improving customer retention. Employees of partner companies tend to be more loyal to the gym than other visitors.
promo_friends also improves customer retention. Clients who come and train together, often on the same schedule, can attract other customers by inviting their friends. Socializing is one of the key factors nowadays.
group_visits: the higher the participation in group sessions, the lower the churn rate. Clients who take part in group sessions are probably more committed to their goals.
age is always one of the most important factors in finding a suitable target group. The typical age range of clients who are loyal to the gym is 27-32.
avg_class_frequency_total: clients who visit on average 2 times per week show loyalty to the gym. The more often a user comes to the gym, the higher their loyalty.