Keyboard and Mouse

Data Science Projects

The goal of this Kaggle competition is to detect anomalies in chest X-ray images.

Examining X-ray

The goal of this project was to predict whether a patient has heart disease based on test results, gender, and age. The dataset comes from the UCI Machine Learning Repository.


This project aims to clean up and analyze a dataset of PhD student salaries by university and department over time. I performed the analysis with R and the tidyverse libraries. For the data cleaning, I excluded the variables in which most values were missing, combined similar departments, and separated the universities by location.
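The first two cleaning steps described above can be sketched in a few lines. The project itself used R and the tidyverse; this is only a plain-Python illustration of the same idea. The rows, the column names, the 50% threshold, and the department alias table are all hypothetical, not taken from the original dataset.

```python
# Hypothetical rows; column names and values are made up for illustration.
rows = [
    {"university": "A", "department": "Biology", "stipend": 30000, "fees": None},
    {"university": "B", "department": "Biological Sciences", "stipend": 28000, "fees": None},
    {"university": "C", "department": "Physics", "stipend": None, "fees": 500},
]

def drop_mostly_missing(rows, threshold=0.5):
    """Remove columns whose share of missing (None) values exceeds threshold."""
    if not rows:
        return []
    keep = []
    for col in rows[0].keys():
        missing = sum(1 for r in rows if r.get(col) is None)
        if missing / len(rows) <= threshold:
            keep.append(col)
    return [{col: r.get(col) for col in keep} for r in rows]

# Hypothetical mapping from variant department names to one canonical name.
DEPARTMENT_ALIASES = {"Biological Sciences": "Biology"}

def merge_departments(rows, aliases):
    """Replace each department name with its canonical form when one is listed."""
    return [
        {**r, "department": aliases.get(r["department"], r["department"])}
        for r in rows
    ]

cleaned = merge_departments(drop_mostly_missing(rows), DEPARTMENT_ALIASES)
```

Here the `fees` column is dropped because two of its three values are missing, and both biology variants end up under a single department name.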




Kasey Hemington runs BrainPost with a fellow PhD friend, Leigh Christopher, as a way to keep in touch with her scientific roots while working as a data scientist!

The goal of this project is to answer the following questions:

  • What content (or types of content) is most popular, what patterns appear in popular content, and does the popular content differ between subgroups (e.g. by source/medium)?

  • Where are people visiting from (source-wise)?

Image by Lukas Blazek

The goal of this project is to use statistical learning to identify the combination of features most likely to be associated with stroke.

Brain Sketch

The goal of this analysis is to understand the Discord 66DaysofData community created by Ken Jee. Where are they coming from? What are the common questions? How many people give up as the challenge progresses? What are the most used programming languages?

Users of the A14 road can report incidents through an application. The goal of this challenge, proposed by the organizers of the Project: Hack5 hackathon and Highways England, is to perform sentiment analysis on users' comments to gain new insights and improve the user experience of the application.

Image by Roger Bradshaw

For this project, as for the NYC Airbnb analysis, I used Python and the scikit-learn library to predict house prices from the independent variables. Because this dataset had more variables and several of them were linearly correlated with the price, a multiple linear regression model performed well.
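To show what a multiple linear regression fits, here is a minimal sketch in plain Python. It does not use scikit-learn and is not the project's code: it solves the ordinary least squares normal equations directly. The toy data (area, rooms, price) and the helper names `fit_linear_regression` and `predict` are invented for illustration.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (A^T A) b = A^T y.

    Returns [intercept, coef_1, ..., coef_k].
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    A = [[1.0] + [float(x) for x in row] for row in X]
    n = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]
    v = [sum(r[i] * t for r, t in zip(A, y)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        v[col], v[pivot] = v[pivot], v[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n):
                M[row][c] -= factor * M[col][c]
            v[row] -= factor * v[col]

    # Back substitution.
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (v[i] - s) / M[i][i]
    return beta

def predict(beta, row):
    """Predicted target for one feature row."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

# Made-up data generated from price = 50 + 2 * area + 10 * rooms, with no noise.
X = [[50, 1], [80, 2], [100, 3], [120, 3], [60, 2]]
y = [50 + 2 * area + 10 * rooms for area, rooms in X]
beta = fit_linear_regression(X, y)
```

Because the toy prices are an exact linear function of the two features, the fitted coefficients recover the intercept and both slopes up to floating-point error. Real housing data is noisy, so a fit there would only approximate the underlying relationship.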
