2. Visualizing data
I started by checking for null values to drop and, as you can see, I found a lot of them.
Then I decided to have a quick look at histograms of the numeric columns to see what values they take and how they are distributed.
I also used a quick heatmap to get more information about the data I am dealing with.
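A minimal sketch of this kind of null check with pandas. The small DataFrame here is made up for illustration and only borrows a few column names from the dataset; it is not the real data.

```python
import pandas as pd

# Toy stand-in for the HR dataset; these rows are invented for illustration.
df = pd.DataFrame({
    "enrollee_id": [1, 2, 3, 4],
    "gender": ["Male", None, "Female", None],
    "education_level": ["Graduate", "Masters", None, "Graduate"],
    "training_hours": [36, 47, 83, 52],
})

# Count the missing values in each column.
null_counts = df.isna().sum()
print(null_counts)

# Drop every row that has at least one missing value.
clean = df.dropna()
print(len(df), "rows before,", len(clean), "rows after")
```

Dropping rows is the simplest option, but when a lot of values are missing it can throw away a large part of the data, so it is worth comparing the row counts before and after.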
Using these histograms, I checked the relationship between gender and education_level and found that most of the males had a higher education level than the females. Then I checked the relationship between enrolled_university and relevent_experience and found that most of them have experience in the field, and that those who are not enrolled in university have more experience.
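A relationship between two categorical columns like these can also be checked numerically with a cross-tabulation. This is only a sketch on invented rows, and the category labels ("Male", "Graduate", and so on) are assumptions, not values confirmed from the dataset.

```python
import pandas as pd

# Invented rows; the labels are assumed, not taken from the real data.
df = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "education_level": ["Masters", "Graduate", "Masters",
                        "Graduate", "Graduate", "Masters"],
})

# Share of each education level within each gender (every row sums to 1).
table = pd.crosstab(df["gender"], df["education_level"], normalize="index")
print(table)
```

Normalizing by row makes the groups comparable even when one gender has many more rows than the other, which raw counts or histograms can hide.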
3. Predicting data
I made some predictions: I used city_development_index and enrollee_id to try to predict training_hours with linear regression, but I got a bad result, as you can see.
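To show what a "bad result" from linear regression looks like in numbers, here is a sketch that fits ordinary least squares with NumPy and reports R². The data is randomly generated so that the target is unrelated to the predictors; it is an assumption made for illustration and not the project's data or exact method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Made-up data: the target is generated independently of both predictors,
# which mimics features that carry no real signal.
city_development_index = rng.uniform(0.4, 1.0, n)
enrollee_id = rng.permutation(n).astype(float)
training_hours = rng.uniform(1, 300, n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), city_development_index, enrollee_id])
coef, *_ = np.linalg.lstsq(X, training_hours, rcond=None)

# R^2 = 1 - SS_res / SS_tot; a value near 0 means the fit explains almost nothing.
pred = X @ coef
ss_res = np.sum((training_hours - pred) ** 2)
ss_tot = np.sum((training_hours - training_hours.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.3f}")
```

An identifier column such as enrollee_id is usually just a label for each row, so it is not surprising if it adds little to a prediction.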
Next I switched to other variables to try to predict education_level, but first I had to make some changes to the data: as you can see, I changed the gender and education_level columns.
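One common way to change text columns like these so a model can use them is to map each category to a number. The sketch below assumes hypothetical category labels and codes; the actual changes made in the project may differ.

```python
import pandas as pd

# Invented rows; the labels are assumed, not taken from the real data.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Other", "Male"],
    "education_level": ["Graduate", "Masters", "High School", "Phd"],
})

# Hypothetical mappings: both the labels and their codes are assumptions.
gender_codes = {"Male": 0, "Female": 1, "Other": 2}
education_order = {"High School": 0, "Graduate": 1, "Masters": 2, "Phd": 3}

# Replace each text label with its numeric code.
df["gender"] = df["gender"].map(gender_codes)
df["education_level"] = df["education_level"].map(education_order)
print(df)
```

Any label missing from a mapping becomes NaN after `map`, so it is worth re-running the null check after encoding.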
I ended up getting a slightly better result than the last time.
I finished by making a quick heatmap, which led me to conclude that the actual relationship between these variables is weak. That’s why I always end up getting weak results.
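The numbers behind such a heatmap are a correlation matrix. Here is a small sketch of computing one with pandas; the values are invented for illustration, and education_level is assumed to be already encoded as numbers.

```python
import pandas as pd

# Invented values; education_level is assumed to be numerically encoded.
df = pd.DataFrame({
    "city_development_index": [0.92, 0.62, 0.78, 0.55, 0.90, 0.70],
    "training_hours": [36, 47, 83, 52, 8, 24],
    "education_level": [1, 2, 1, 0, 2, 1],
})

# Pairwise Pearson correlations; entries close to 0 mean a weak linear relationship.
corr = df.corr()
print(corr.round(2))
```

A plotting function such as seaborn's `heatmap` can then draw `corr` as a colored grid. Pearson correlation only measures linear relationships, so a weak value here is consistent with a weak linear regression result.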
GitHub link: https://github.com/azizattia/HR-Analytics/blob/main/README.md