Risk of Default Prediction

With this data analysis, I was able to estimate the likelihood that a new customer will default on a loan.

The data for this project were taken from Kaggle and can be downloaded directly from here.

Data Preview

Observations

The dataset contains 252,000 rows and 13 columns
There are no NaN values
There are no duplicate rows
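The checks listed above can be reproduced with pandas. The sketch below assumes the data has been loaded into a DataFrame; the CSV file name in the comment and the small sample frame are hypothetical stand-ins, not the actual Kaggle file. It drops the Id column before looking for duplicates, because a unique Id would otherwise make every row look distinct.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame, ignore=("Id",)) -> dict:
    """Return the shape, NaN count and duplicate-row count of a DataFrame."""
    # Exclude identifier columns so duplicates are judged on the actual data
    features = df.drop(columns=[c for c in ignore if c in df.columns])
    n_rows, n_cols = df.shape
    return {
        "rows": n_rows,
        "columns": n_cols,
        # Total number of missing cells across every column
        "nan_values": int(df.isna().sum().sum()),
        # Rows that exactly repeat an earlier row (Id ignored)
        "duplicate_rows": int(features.duplicated().sum()),
    }


# On the real data this would be something like:
# df = pd.read_csv("loan_data.csv")  # hypothetical file name
sample = pd.DataFrame(
    {
        "Id": [1, 2, 3],
        "income": [5000, 6000, 6000],
        "age": [30, 45, 45],
        "risk_flag": [0, 1, 1],
    }
)
print(summarize_quality(sample))
```

On the full dataset, the observations above correspond to 252,000 rows, 13 columns, and zero for both counts.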

Data Dictionary

Variable	Definition
Id	Id of the customer
income	Income of the customer
age	Age of the customer
experience	Experience of the customer in years
profession	Profession of the customer
married	Whether the customer is married or single
house_ownership	House ownership status of the customer
car_ownership	Whether the customer owns a car or not
risk_flag	Whether the customer defaulted on the loan or not
currentjobyears	Number of years in the customer's current job
currenthouseyears	Number of years in the current residence
city	City of residence
state	State of residence
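Before modeling, the columns in the dictionary have to be turned into numbers. As a sketch, assuming the column names exactly as listed in the table and treating the text-valued columns as categorical (a reading of the definitions, not necessarily how this project did it), pandas get_dummies can one-hot encode them while Id is left out and risk_flag becomes the target. The two customers in the sample are made up.

```python
import pandas as pd

# Column groups inferred from the data dictionary (assumption)
NUMERIC = ["income", "age", "experience", "currentjobyears", "currenthouseyears"]
CATEGORICAL = ["married", "house_ownership", "car_ownership", "profession", "city", "state"]
TARGET = "risk_flag"


def build_features(df: pd.DataFrame):
    """Return a one-hot encoded feature matrix X and the target y; Id is dropped."""
    X = pd.get_dummies(df[NUMERIC + CATEGORICAL], columns=CATEGORICAL)
    y = df[TARGET]
    return X, y


# Two hypothetical customers, only to show the shape of the output
sample = pd.DataFrame(
    {
        "Id": [1, 2],
        "income": [5000, 6000],
        "age": [30, 45],
        "experience": [5, 20],
        "currentjobyears": [3, 8],
        "currenthouseyears": [10, 14],
        "married": ["single", "married"],
        "house_ownership": ["rented", "owned"],
        "car_ownership": ["no", "yes"],
        "profession": ["Engineer", "Chef"],
        "city": ["CityA", "CityB"],
        "state": ["StateA", "StateA"],
        "risk_flag": [0, 1],
    }
)
X, y = build_features(sample)
print(X.columns.tolist())
```

City and state can have many distinct values, so one-hot encoding them can add a large number of columns; dropping them or choosing a different encoding is a reasonable alternative.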

Customer review

Observations

Most of the clients:

are single
are renting
don't own a car
have been working at their current job for roughly 3 to 9 years
have been living in their current house for 10 to 14 years
have not defaulted on their loan
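Shares behind statements such as "most of the clients are single" can be computed with value_counts(normalize=True). A minimal sketch on a hypothetical four-row frame, returning the most frequent value of each column and the fraction of rows that have it:

```python
import pandas as pd


def most_common(df: pd.DataFrame, columns) -> dict:
    """Map each column to (most frequent value, share of rows with that value)."""
    summary = {}
    for col in columns:
        # normalize=True returns proportions, sorted from most to least frequent
        shares = df[col].value_counts(normalize=True)
        summary[col] = (shares.index[0], float(shares.iloc[0]))
    return summary


sample = pd.DataFrame(
    {
        "married": ["single", "single", "single", "married"],
        "house_ownership": ["rented", "rented", "owned", "rented"],
        "car_ownership": ["no", "no", "yes", "no"],
        "risk_flag": [0, 0, 0, 1],
    }
)
print(most_common(sample, ["married", "house_ownership", "car_ownership", "risk_flag"]))
```

For the year-based columns, calling describe() on a column such as currentjobyears reports its quartiles, which is one way to back up a statement about a typical range.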

Machine learning

For this project I used scikit-learn (sklearn), including its train_test_split function, which divides the data into a training set and a test set. You can find more information about sklearn, train_test_split and more here.
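train_test_split does not train a model by itself; it only divides the rows into a training set and a held-out test set, after which a model is fitted on the first and evaluated on the second. Since the model used here is not named, the sketch stops at the split. The synthetic data, test_size=0.25 (also the sklearn default when neither size is given), random_state=42, and stratify=y, which keeps the share of defaulters the same in both parts, are all assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# 20 synthetic customers: 16 non-defaulters (0) and 4 defaulters (1)
df = pd.DataFrame(
    {
        "income": [20_000 + 1_000 * i for i in range(20)],
        "age": [25 + i for i in range(20)],
        "risk_flag": [0] * 16 + [1] * 4,
    }
)
X = df.drop(columns="risk_flag")
y = df["risk_flag"]

# Hold out 25% of the rows; stratify keeps the 0/1 ratio equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(len(X_train), len(X_test), int(y_train.sum()), int(y_test.sum()))
```

With stratification, the 15 training rows contain 3 defaulters and the 5 test rows contain 1, matching the 20% default rate of the full sample.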

Conclusion

Using machine learning, I predict that 71.77% of new customers will not default on their loan payments.
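How the 71.77% figure was obtained is not stated. If it is the share of customers the model predicts as non-defaulters, it can be computed from the predicted labels as in the sketch below; that is a different quantity from the model's accuracy, which compares predictions with the true labels (for example with sklearn's accuracy_score). The labels in the example are hypothetical.

```python
from sklearn.metrics import accuracy_score


def share_not_default(predictions) -> float:
    """Percentage of predictions equal to 0 (no default), rounded to two decimals."""
    predictions = list(predictions)
    if not predictions:
        raise ValueError("predictions must not be empty")
    return round(100 * predictions.count(0) / len(predictions), 2)


y_true = [0, 0, 0, 1]  # hypothetical actual outcomes
y_pred = [0, 0, 0, 0]  # hypothetical model predictions

print(share_not_default(y_pred))       # percentage predicted as non-default
print(accuracy_score(y_true, y_pred))  # fraction of correct predictions
```

Here a model that predicts "no default" for everyone scores 100% predicted non-defaults but only 0.75 accuracy, which is why the two numbers should not be confused.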
