Titanic
Using post-accident statistics on the Titanic's passengers, I determined the likelihood of survival, categorised by gender, age, class and marital status.

For this project I used a dataset from Kaggle. The dataset can be found on this page
Data Preview



Observations
- The dataset contains 891 rows
- The dataset contains 12 columns
- There are 866 NaN values (most of them are in the Cabin column, the rest are in Age, plus 2 in Embarked)
- There are no duplicate rows
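
The figures above can be checked with a few pandas calls. The following is a minimal sketch rather than the exact code used for this project: the helper name `summarise` and the file name `train.csv` are assumptions, and the small frame is made-up stand-in data so that the snippet runs on its own.

```python
import pandas as pd


def summarise(df: pd.DataFrame) -> dict:
    """Return basic shape and data-quality figures for a DataFrame."""
    nan_by_column = df.isna().sum()
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "nan_total": int(nan_by_column.sum()),
        "nan_by_column": nan_by_column.to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# With the real data this would be something like:
#   df = pd.read_csv("train.csv")  # assumed file name
# The frame below is made up purely for illustration.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Age": [22.0, None, 35.0],
    "Cabin": [None, "C85", None],
    "Embarked": ["S", "C", None],
})

summary = summarise(sample)
print(summary)
```

Run against the full dataset, the same helper should report the row, column, NaN and duplicate counts listed above.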
Data Dictionary
Variable | Definition | Key |
---|---|---|
PassengerId | Id of the passenger | |
Survived | Survival | 0 = No, 1 = Yes |
Pclass | Ticket class | 1 = 1st Class, 2 = 2nd Class, 3 = 3rd Class |
Name | Name of the passenger | |
Sex | Sex of the passenger | |
Age | Age in years | |
SibSp | Number of siblings / spouses aboard the Titanic | |
Parch | Number of parents / children aboard the Titanic | |
Ticket | Ticket number | |
Fare | Passenger fare | |
Cabin | Cabin number | |
Embarked | Port of embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |
Passengers categorisation


Observations
- There were nearly twice as many male passengers as female passengers.
- Broken down by ticket class, the first and second classes had a roughly even split of males and females.
- The third class, by contrast, was mostly male.
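
A breakdown like this can be produced with a cross-tabulation of `Pclass` against `Sex`. This is only a sketch under assumptions: the helper name `count_by_class_and_sex` is hypothetical, and the rows are made up for illustration rather than taken from the dataset.

```python
import pandas as pd


def count_by_class_and_sex(df: pd.DataFrame) -> pd.DataFrame:
    """Count passengers per ticket class (rows) and sex (columns)."""
    return pd.crosstab(df["Pclass"], df["Sex"])


# Made-up rows, for illustration only.
sample = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3],
    "Sex": ["male", "female", "male", "female", "male", "male", "female"],
})

counts = count_by_class_and_sex(sample)
totals = counts.sum()  # overall number of passengers per sex
print(counts)
print(totals)
```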
Age classification
Based on the Age column I was able to further differentiate the population by adding the number of passengers under 16 to the bar chart. To do this I extracted all passengers who were under 16 (males and females) and categorised them as "child".

Observations
- Most of the children were in the third class
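
One way to build such a category is to copy `Sex` into a new column and overwrite it with "child" wherever `Age` is below 16. The sketch below assumes a column name (`Person`) and a helper name (`add_person_column`) that are not part of the dataset, and it runs on made-up rows. Passengers with a missing age keep their `Sex` label, which is a choice of this sketch.

```python
import pandas as pd


def add_person_column(df: pd.DataFrame, child_age: int = 16) -> pd.DataFrame:
    """Add a 'Person' column: 'child' if Age < child_age, otherwise the Sex value."""
    out = df.copy()
    # A comparison against NaN is False, so rows with a missing Age keep their Sex.
    out["Person"] = out["Sex"].mask(out["Age"] < child_age, "child")
    return out


# Made-up rows, for illustration only.
sample = pd.DataFrame({
    "Pclass": [1, 3, 3, 3],
    "Sex": ["male", "female", "female", "male"],
    "Age": [30.0, 8.0, float("nan"), 15.0],
})

labelled = add_person_column(sample)
by_class = pd.crosstab(labelled["Pclass"], labelled["Person"])
print(by_class)
```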
Age Histogram chart

kdeplot by age


Observations
- Most of the passengers were between 20 and 30 years old
- Most of the children were between 0 and 5 years old
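
The same reading can be cross-checked numerically by counting passengers per age band. This is a rough sketch: the 10-year band width, the upper limit of 90 and the helper name `age_band_counts` are assumptions, and the ages are made up.

```python
import pandas as pd


def age_band_counts(ages: pd.Series, width: int = 10, upper: int = 90) -> pd.Series:
    """Count non-missing ages in left-closed bands: [0, 10), [10, 20), ..."""
    bins = list(range(0, upper + width, width))
    bands = pd.cut(ages.dropna(), bins=bins, right=False)
    return bands.value_counts().sort_index()


# Made-up ages, for illustration only.
ages = pd.Series([2, 4, 22, 25, 28, 41, float("nan")])

counts = age_band_counts(ages)
print(counts)
```

A smaller `width` (for example 5) gives a finer view of the youngest passengers.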
Family classification
Thanks to the SibSp ("Number of siblings/spouses aboard the Titanic") and Parch ("Number of parents/children aboard the Titanic") data, I was able to determine which passengers were travelling alone and which were not.

Observations
- Most of the passengers were travelling alone
- Survival rate was significantly lower for the passengers who were travelling alone (0 = they didn't survive, 1 = they survived)
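
A passenger can be treated as travelling alone when `SibSp + Parch` equals 0. The sketch below assumes a new column called `Alone` with the labels "Alone" and "With family" (both hypothetical names) and uses made-up rows. Because `Survived` is coded 0/1, its mean within each group is that group's survival rate.

```python
import pandas as pd


def add_alone_column(df: pd.DataFrame) -> pd.DataFrame:
    """Label each passenger 'Alone' (no relatives aboard) or 'With family'."""
    out = df.copy()
    family_size = out["SibSp"] + out["Parch"]
    out["Alone"] = family_size.eq(0).map({True: "Alone", False: "With family"})
    return out


# Made-up rows, for illustration only.
sample = pd.DataFrame({
    "SibSp": [0, 1, 0, 0, 2],
    "Parch": [0, 0, 0, 2, 1],
    "Survived": [0, 1, 1, 1, 0],
})

labelled = add_alone_column(sample)
group_sizes = labelled["Alone"].value_counts()
rates = labelled.groupby("Alone")["Survived"].mean()
print(group_sizes)
print(rates)
```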
Survival rate


Observations
- Males had a significantly lower survival rate
- Passengers in the first and second classes had a higher survival rate than those in the third class
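
Survival rates like these can be computed by grouping on a column and averaging `Survived`. This is a minimal sketch with a hypothetical helper name (`survival_rate`) and made-up rows, so the numbers it prints are not results from the dataset.

```python
import pandas as pd


def survival_rate(df: pd.DataFrame, by) -> pd.Series:
    """Mean of the 0/1 Survived column per group, i.e. the share of survivors."""
    return df.groupby(by)["Survived"].mean()


# Made-up rows, for illustration only.
sample = pd.DataFrame({
    "Sex": ["male", "male", "female", "female", "male", "female"],
    "Pclass": [1, 3, 1, 3, 3, 2],
    "Survived": [1, 0, 1, 0, 0, 1],
})

by_sex = survival_rate(sample, "Sex")
by_class = survival_rate(sample, "Pclass")
by_both = survival_rate(sample, ["Pclass", "Sex"])
print(by_sex)
print(by_class)
print(by_both)
```

Passing a list of columns, as in `by_both`, gives the rate for each combination of class and sex.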