About Me

Will (Xiaochun) Ma

A Data Scientist who loves jogging and traveling.

My Career


I am working as a Data Scientist Intern, doing search optimization, recommendation, and classification work.

Jun. 2018
Data Scientist intern


I worked as an NLP and Software Engineer, building news summarization and recommendation systems.

Jan. 2018
NLP and Software engineer intern

Columbia Law School

I worked at Columbia Law School as a Research Assistant. I parsed and analyzed the sentence structure and meaning of legal statements for win-lose predictive modeling, and built a database and query interface for court judgments scraped online.

Sep. 2017 to Jan. 2018
NLP full stack research assistant

Didi Chuxing (Chinese Uber)

I worked at Didi Chuxing as a Data Scientist Intern. I mined and modeled data on drivers' safety features, evaluated the Dangerous Driving SDK, profiled cities in China on safety aspects, and classified accidents and complaints by their main causes.

Jan. 2017 to May 2017
Data Scientist Intern

My Skills

My Projects

Key algorithms

In this project, I crawled and summarized the key algorithms.

...   ...

Key algorithms Android app

This is a child project of Key algorithms that helps learners review algorithms.

...   ...

Movie recommendation system

This project was designed to recommend movies to users based on the ratings they gave about ten movies.

Cat recognition algorithm (Deep learning)

This project tested whether an image contains a cat, using a 5-layer neural network. The algorithms were built from scratch.

Telecom company customer churn prediction

This project predicts whether a customer will churn based on historical data; seven models were used.

Kaggle House price prediction

This project predicts house prices, using extensive data visualization and several simple models.

Airline passenger prediction using time series

This project predicts an airline's passenger numbers from historical data.

Netease music annual report data analysis

This project reconstructed the database behind the Netease music annual report and analyzed its data. Pandas intensive!

NBA teams playoffs data analysis

This was a small hackathon project that calculates when a team no longer has a chance of making the playoffs.

User profile for Internet of Vehicles

This project clustered drivers based on their driving habits, then explained those habits.

Amazon review sentiment analysis

This project used word vectors and rating labels to predict the sentiment of Amazon mobile phone reviews.

Document Similarity & Topic Modelling for News

This project compares document similarity and, building on that, uses LDA for topic modeling.

Spelling Recommender

This project uses Jaccard distance and Damerau–Levenshtein distance over character 3-grams and 4-grams to recommend correct spellings.
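
To illustrate the n-gram Jaccard idea, here is a minimal sketch in plain Python. It is not the project's actual code: the function names, the toy word list, and the first-letter candidate filter are all assumptions made for the example, and the Damerau–Levenshtein variant is left out.

```python
def char_ngrams(word, n):
    """Return the set of character n-grams of `word` (the whole word if it is shorter than n)."""
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def recommend(misspelled, vocabulary, n=3):
    """Pick the vocabulary word with the smallest n-gram Jaccard distance.

    Candidates are limited to words sharing the first letter; this filter is
    an assumption of the sketch, not something stated by the project.
    """
    target = char_ngrams(misspelled, n)
    candidates = [w for w in vocabulary if w[:1] == misspelled[:1]] or list(vocabulary)
    return min(candidates, key=lambda w: jaccard_distance(target, char_ngrams(w, n)))


# Hypothetical toy vocabulary; a real recommender would use a full word list.
VOCAB = ["correct", "collect", "connect", "spelling", "recommend"]
print(recommend("corect", VOCAB))  # → correct
```

Passing n=4 switches the comparison to 4-grams. Swapping `jaccard_distance` for an edit-distance function would give the other variant mentioned above.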