
Journey So Far

Hello! I'm Wambui, an aspiring Data Scientist driven by curiosity and a commitment to continuous learning. Here, you'll find projects showcasing my journey, where I apply data science techniques to solve real-world problems. I am passionate about turning data into actionable insights and aim to contribute meaningfully to society through innovative solutions.

My Projects

Moringa School Projects

These projects showcase my ability to work on real-world data problems, leveraging tools such as Python, SQL, and Tableau, as part of my training at Moringa School.


NLP Web-Based App for Depression Detection

Objective: Developed a natural language processing (NLP) model that classifies Reddit posts as either depressive or non-depressive.

Technologies Used: Python (NLTK, Scikit-learn), Data Visualization (Matplotlib, Seaborn), HTML/CSS, Streamlit

Features:

  • Input: User submits a Reddit post or text.
  • Output: Displays whether the text is classified as depressive or non-depressive with a confidence score.
  • Deployed as a web app for real-time use.
  • View Live Demo

    Outcome: Achieved 90% accuracy, showcasing linguistic markers of depressive tendencies.
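As a rough illustration of how a label plus confidence score can be produced, here is a minimal sketch. It assumes a TF-IDF and Logistic Regression pipeline from Scikit-learn, which may differ from the model actually deployed in the app. The training sentences are made up for the example, and the `classify` helper is a hypothetical name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set, for illustration only (not the project's Reddit data).
texts = [
    "I feel hopeless and empty every day",
    "nothing matters anymore and I am so tired of everything",
    "I can't stop crying and I feel worthless",
    "had a great hike with friends this weekend",
    "excited about my new job and the team",
    "the pasta recipe turned out delicious",
]
labels = [
    "depressive", "depressive", "depressive",
    "non-depressive", "non-depressive", "non-depressive",
]

# Assumed model choice: TF-IDF features fed into Logistic Regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)


def classify(text):
    """Return (label, confidence); confidence is the highest class probability."""
    probabilities = model.predict_proba([text])[0]
    best = probabilities.argmax()
    return model.classes_[best], float(probabilities[best])


label, confidence = classify("I feel so empty and tired")
print(f"{label} (confidence: {confidence:.0%})")
```

In a Streamlit app, the same `classify` call could sit behind a text box, with the label and confidence rendered on the page.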

    View Project

    Sentiment Analysis of Social Media Posts

    Objective: Performed sentiment analysis on tweets related to Apple and Google products, classifying user sentiments into positive, negative, or neutral categories using Natural Language Processing (NLP) techniques. The goal was to assess public perception of these brands based on social media data.

    Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, NumPy, Scikit-learn (including TfidfVectorizer), NLTK, Matplotlib, Seaborn
  • Techniques: Sentiment Analysis, Logistic Regression, Support Vector Machine (SVM), F1-Score Evaluation, Cross-Validation
  • Outcome: Based on the analysis, the SVM model is recommended for production use due to its superior overall accuracy and handling of neutral sentiment.
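The model comparison described above can be sketched roughly as follows. This is a minimal example under stated assumptions, not the project's actual code: the tweets are invented, `LinearSVC` stands in for whichever SVM variant was used, and macro-averaged F1 is assumed as the scoring metric.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented example tweets, three per sentiment class (illustration only).
tweets = [
    "love my new iphone, the camera is amazing",
    "the google maps update is fantastic",
    "great battery life on this ipad",
    "my iphone keeps crashing, so annoying",
    "hate the new android layout",
    "terrible wifi on this google device",
    "apple is holding an event next week",
    "google opened a pop-up store downtown",
    "picked up an ipad charger today",
]
sentiments = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 3

# Candidate classifiers; LinearSVC is an assumed stand-in for the SVM model.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (linear)": LinearSVC(),
}

scores = {}
for name, classifier in candidates.items():
    # Vectorize inside the pipeline so each fold fits TF-IDF on its training split only.
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    fold_scores = cross_val_score(
        pipeline, tweets, sentiments, cv=3, scoring="f1_macro"
    )
    scores[name] = fold_scores.mean()
    print(f"{name}: mean macro F1 = {scores[name]:.2f}")
```

With a dataset this small the scores carry no meaning; the point is the shape of the comparison, where both models share the same features, folds, and metric.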

    View Project

    Predicting Customer Churn

    Objective: Developed models to predict customer churn, enabling proactive retention strategies.

    Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, Matplotlib, Seaborn, Scikit-learn
  • Techniques: Logistic Regression, Decision Trees, Exploratory Data Analysis (EDA), Data Cleaning

    Outcome: Achieved 82% model accuracy and identified key churn predictors, improving retention by 25%.

    View Project

    Home Value Predictions

    Objective: Predict home values in King County based on multiple features, such as size, location, and property age, to provide actionable insights into the real estate market.

    Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, Matplotlib, Seaborn, Scikit-learn
  • Techniques: Linear Regression, Exploratory Data Analysis (EDA), Data Cleaning
  • Outcome: Achieved high model accuracy, enabling practical use for real estate investors and stakeholders.

    View Project

    Microsoft Movie Studio

    Objective: Analyze the Box Office Mojo and IMDb datasets to identify movie genres that perform well at the box office, providing actionable recommendations for investment and studio collaborations to increase box office success.

    Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
  • Techniques: Data Cleaning, Exploratory Data Analysis (EDA), Data Visualization, Statistical Analysis
  • Outcome: Provided actionable recommendations for investment in high-performing genres and strategic collaboration with leading studios to maximize box office success.

    View Project

    Personal Upskilling Projects

    These self-initiated projects demonstrate my ability to independently explore and solve complex problems while sharpening my technical skills.


    Titanic Survival Prediction

    Objective: To understand the characteristics of those who survived, providing insights for improving survival rates in analogous scenarios.

    Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
  • Techniques: Data Cleaning, Exploratory Data Analysis (EDA), Data Visualization, Statistical Analysis
  • Outcome: Predicted Titanic survival correctly for 72% of passengers, according to the results of this Kaggle competition.

    View Project

    Kwanza Tukule: Sales Performance Analysis

    Objective: To analyze sales data and customer purchasing behavior to provide business insights that improve decision-making.

    Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, Plotly, Dash, Matplotlib, Scikit-learn
  • Techniques: Data Cleaning, Exploratory Data Analysis (EDA), Data Visualization, Customer Segmentation
  • Outcome: The project identified best-selling products, peak sales periods, and segmented customers based on purchasing behavior. The insights support inventory planning and targeted marketing strategies.

    View Project

    Next project in the oven! 💪

    Stay tuned for more exciting projects coming soon!

    Get more details about my experience and projects by downloading my CV or resume.

    CV Resume

    Let’s connect!

    You can reach me through email or any of these platforms.