Titanic

Using post-accident statistics on the Titanic's passengers, I determined the likelihood of
survival categorised by gender, age, class and marital status.

For this project I used a dataset from Kaggle. The dataset can be found on this page.


Data Preview

Observations

  • The dataset contains 891 rows
  • The dataset contains 12 columns
  • There are 866 NaN values (most of them in the Cabin column, the rest in Age, plus 2 in Embarked)
  • There are no duplicate rows
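
The checks behind these observations can be sketched with pandas. This is a minimal illustration, not necessarily the code used in the project: the four rows below are made-up stand-ins for the Kaggle file, so the printed numbers will not match the figures above.

```python
import numpy as np
import pandas as pd

# Made-up toy rows (NOT the real data). In the project this would be
# something like pd.read_csv("train.csv"); the file name is an assumption.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", np.nan, "S"],
})

# Number of rows and columns
n_rows, n_cols = df.shape

# Missing values per column, and in total
nan_per_column = df.isna().sum()
total_nan = int(nan_per_column.sum())

# Number of fully duplicated rows
n_duplicates = int(df.duplicated().sum())

print(n_rows, n_cols, total_nan, n_duplicates)
print(nan_per_column)
```

Sorting `nan_per_column` in descending order makes it easy to see which column holds most of the missing values.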

Data Dictionary

Variable | Definition | Key
PassengerId | Id of the passenger |
Survived | Survival | 0 = No, 1 = Yes
Pclass | Ticket class | 1 = 1st Class, 2 = 2nd Class, 3 = 3rd Class
Name | Name of the passenger |
Sex | Sex of the passenger |
Age | Age in years |
SibSp | Number of siblings / spouses aboard the Titanic |
Parch | Number of parents / children aboard the Titanic |
Ticket | Ticket number |
Fare | Passenger fare |
Cabin | Cabin number |
Embarked | Port of embarkation | C = Cherbourg, Q = Queenstown, S = Southampton

Passengers categorisation

I also divided the passengers by sex within each class.

Observations

  • There were nearly twice as many male passengers as female passengers.
  • Broken down by passenger class, the first and second classes had a roughly even split of males and females.
  • In the third class, by contrast, males were a clear majority.
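
A breakdown like this can be produced with a cross-tabulation. The snippet below is a hedged sketch on made-up toy rows (not the real dataset), using the Pclass and Sex columns from the data dictionary.

```python
import pandas as pd

# Made-up toy rows (NOT the real data), for illustration only.
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3],
    "Sex": ["male", "female", "male", "female", "male", "male", "female"],
})

# Overall number of passengers per sex
sex_counts = df["Sex"].value_counts()

# Number of passengers per class (rows) and sex (columns)
class_by_sex = pd.crosstab(df["Pclass"], df["Sex"])

print(sex_counts)
print(class_by_sex)
```

The same table can be drawn as a grouped bar chart, for example with `class_by_sex.plot(kind="bar")`.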

Age classification

Using the Age column, I further differentiated the population by adding the number of passengers under 16 to the bar chart. To do this, I extracted all passengers under 16 (male and female) and categorised them as "child".
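
One way to express this categorisation in pandas is shown below. It is a sketch under assumptions: the new column name `Person` is hypothetical, the rows are made-up toy data, and passengers with a missing age are left in their original male/female group.

```python
import numpy as np
import pandas as pd

# Made-up toy rows (NOT the real data), for illustration only.
df = pd.DataFrame({
    "Pclass": [1, 2, 3, 3, 3],
    "Sex": ["male", "female", "male", "female", "male"],
    "Age": [40.0, 15.0, 4.0, 16.0, np.nan],
})

# "Person" is a hypothetical column name: it holds "child" for anyone
# under 16 and the passenger's sex otherwise. A missing Age compares as
# False, so those passengers keep their sex label.
df["Person"] = df["Sex"].mask(df["Age"] < 16, "child")

# Count of each category per class, ready for a bar chart
person_by_class = pd.crosstab(df["Pclass"], df["Person"])

print(df["Person"].tolist())
print(person_by_class)
```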

Observations

  • Most of the children were in the third class

Age Histogram chart


kdeplot by age


Observations

  • Most of the passengers were between 20 and 30 years old
  • Most of the children were between 0 and 5 years old

Family classification

Thanks to the SibSp ("Number of siblings/spouses aboard the Titanic") and Parch ("Number of parents/children aboard the Titanic") data, I was able to determine which passengers were travelling alone.
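
A plausible way to derive this flag, assuming "alone" means no siblings, spouses, parents or children aboard, is to add the two columns and test for zero. The column names `FamilySize` and `Alone` are hypothetical, and the rows are made-up toy data.

```python
import pandas as pd

# Made-up toy rows (NOT the real data), for illustration only.
df = pd.DataFrame({
    "SibSp": [0, 1, 0, 3, 0],
    "Parch": [0, 0, 2, 1, 0],
    "Survived": [0, 1, 1, 0, 1],
})

# Hypothetical helper columns: total relatives aboard, and an "alone" flag
df["FamilySize"] = df["SibSp"] + df["Parch"]
df["Alone"] = df["FamilySize"] == 0

# How many passengers travelled alone vs. with family
alone_counts = df["Alone"].value_counts()

# Mean of the 0/1 Survived column gives the survival rate per group
survival_by_alone = df.groupby("Alone")["Survived"].mean()

print(alone_counts)
print(survival_by_alone)
```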

Observations

  • Most of the passengers were travelling alone
  • Survival rate was significantly lower for the passengers who were travelling alone (0 = they didn't survive, 1 = they survived)

Survival rate


Observations

  • Males had a significantly lower survival rate
  • Passengers in the first and the second class had a higher survival rate than those in the third class
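
Because Survived is coded as 0 or 1, its mean within a group is that group's survival rate. The sketch below illustrates the idea on made-up toy rows (not the real dataset), so its output does not reproduce the results above.

```python
import pandas as pd

# Made-up toy rows (NOT the real data), for illustration only.
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 3, 3, 3],
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# Survival rate per sex and per class
rate_by_sex = df.groupby("Sex")["Survived"].mean()
rate_by_class = df.groupby("Pclass")["Survived"].mean()

# Survival rate for each class/sex combination; a combination with no
# passengers in the sample shows up as NaN
rate_by_class_and_sex = df.pivot_table(
    index="Pclass", columns="Sex", values="Survived", aggfunc="mean"
)

print(rate_by_sex)
print(rate_by_class)
print(rate_by_class_and_sex)
```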