
From Data to Decisions

Hello! I'm Wambui, a Data Scientist driven by curiosity and a commitment to continuous learning. I leverage tools such as Python, SQL, and machine learning techniques to craft powerful visualizations and solve complex business problems. I am passionate about turning data into actionable insights and aim to contribute meaningfully to society through innovative solutions.

My Projects

Moringa School Projects

These projects showcase my ability to work on real-world data problems, leveraging tools such as Python, SQL, and Tableau, as part of my training at Moringa School.


NLP Web-Based App for Depression Detection

Objective: Developed a natural language processing (NLP) model that classifies Reddit posts as either depressive or non-depressive.

Technologies Used: Python (NLTK, Scikit-learn), Data Visualization (Matplotlib, Seaborn), HTML/CSS, Streamlit

Features:

  • Input: User submits a Reddit post or text.
  • Output: Displays whether the text is classified as depressive or non-depressive with a confidence score.
  • Deployed as a web app for real-time use.
  • View Live Demo

Outcome: Achieved 90% accuracy and highlighted linguistic markers of depressive tendencies.

View Project

Sentiment Analysis of Social Media Posts

Objective: Performed sentiment analysis on tweets about Apple and Google products, classifying user sentiment as positive, negative, or neutral using Natural Language Processing (NLP) techniques. The goal was to assess public perception of these brands from social media data.

Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, NumPy, Scikit-learn (TfidfVectorizer), NLTK, Matplotlib, Seaborn
  • Techniques: Sentiment Analysis, Logistic Regression, Support Vector Machine (SVM), F1-Score Evaluation, Cross-Validation
  • Outcome: Based on the analysis, the SVM model is recommended for production use because of its higher overall accuracy and better handling of neutral sentiment.

View Project
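The outcome above weighs models on overall accuracy and on how well they handle the neutral class, which is what per-class F1 captures. The sketch below is illustrative only and is not the project's code: it computes one-vs-rest F1 for each sentiment label and their unweighted (macro) average in plain Python, on made-up labels. In practice, Scikit-learn's `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` gives the macro figure directly.

```python
# Illustrative sketch: per-class and macro-averaged F1 for a three-class
# sentiment task. The labels below are toy data, not the project's results.
LABELS = ("positive", "negative", "neutral")


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1-score}, treating each label one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example: accuracy looks acceptable, but neutral posts are often missed.
y_true = ["positive", "neutral", "negative", "neutral", "positive", "neutral"]
y_pred = ["positive", "positive", "negative", "neutral", "positive", "negative"]

print(per_class_f1(y_true, y_pred))
print(round(accuracy(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))
```

Here the neutral class scores lowest (F1 of 0.5), which pulls the macro average below the plain accuracy; a model that handles neutral sentiment better would narrow that gap.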

Predicting Customer Churn

Objective: Developed models to predict customer churn, enabling proactive retention strategies.

Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, Matplotlib, Seaborn, Scikit-learn
  • Techniques: Logistic Regression, Decision Trees, Exploratory Data Analysis (EDA), Data Cleaning

Outcome: Achieved 82% model accuracy and identified key churn predictors, improving retention by 25%.

View Project

Home Value Predictions

Objective: To predict home values in King County from features such as size, location, and property age, and to provide actionable insights into the real estate market.

Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, Matplotlib, Seaborn, Scikit-learn
  • Techniques: Linear Regression, Exploratory Data Analysis (EDA), Data Cleaning
  • Outcome: Achieved high model accuracy, enabling practical use for real estate investors and stakeholders.

View Project

Microsoft Movie Studio

Objective: To analyze the Box Office Mojo and IMDb datasets and identify movie genres that perform well at the box office, providing actionable recommendations for investment and studio collaborations that could increase box office success.

Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
  • Techniques: Data Cleaning, Exploratory Data Analysis (EDA), Data Visualization, Statistical Analysis
  • Outcome: Provided actionable recommendations for investment in high-performing genres and strategic collaboration with leading studios to maximize box office success.

View Project

Personal Upskilling Projects

These self-initiated projects demonstrate my ability to independently explore and solve complex problems while sharpening my technical skills.


Titanic Survival Prediction

Objective: To identify the characteristics of passengers who survived, providing insights that could improve survival rates in similar scenarios.

Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
  • Techniques: Data Cleaning, Exploratory Data Analysis (EDA), Data Visualization, Statistical Analysis
  • Outcome: Correctly predicted survival for 72% of passengers, according to the results of this Kaggle competition.

View Project

Kwanza Tukule: Sales Performance Analysis

Objective: To analyze sales data and customer purchasing behavior to provide business insights that improve decision-making.

Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, Plotly, Dash, Matplotlib, Scikit-learn
  • Techniques: Data Cleaning, Exploratory Data Analysis (EDA), Data Visualization, Customer Segmentation
  • Outcome: The project identified best-selling products, peak sales periods, and segmented customers based on purchasing behavior. The insights support inventory planning and targeted marketing strategies.

View Project
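Finding best-selling products and peak sales periods comes down to two revenue aggregations. The sketch below is a hedged illustration, not the project's code (the project used Pandas, where a `groupby` followed by `sum` does the same job): it assumes a hypothetical record layout of (order date, product, quantity, unit price), takes revenue as quantity times unit price, and runs on made-up figures.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (order date, product, quantity, unit price).
# Product names and figures are made up; the real dataset is not shown here.
transactions = [
    (date(2024, 1, 5), "Product A", 10, 120.0),
    (date(2024, 1, 20), "Product B", 4, 300.0),
    (date(2024, 2, 3), "Product A", 5, 120.0),
    (date(2024, 2, 14), "Product C", 8, 150.0),
    (date(2024, 3, 9), "Product B", 6, 300.0),
]


def monthly_revenue(rows):
    """Total revenue per (year, month), in chronological order."""
    totals = defaultdict(float)
    for order_date, _product, qty, price in rows:
        totals[(order_date.year, order_date.month)] += qty * price
    return dict(sorted(totals.items()))


def top_products(rows, n=10):
    """The n products with the highest total revenue, best first."""
    totals = defaultdict(float)
    for _order_date, product, qty, price in rows:
        totals[product] += qty * price
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


by_month = monthly_revenue(transactions)
peak_month = max(by_month, key=by_month.get)  # month with the highest revenue

print(by_month)
print("Peak month:", peak_month)
print(top_products(transactions, n=3))
```

The default `n=10` mirrors the "Top 10" views used in the dashboards; the same per-product totals could also feed a simple customer or product segmentation.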

Inside Airbnb: Data Analysis

Objective: To analyze Inside Airbnb data and derive business insights and compelling visualizations.

Technologies Used:

  • Programming Languages: Python
  • Libraries/Tools: Pandas, Matplotlib, Scikit-learn
  • Techniques: Data Cleaning, Exploratory Data Analysis (EDA), Data Visualization, Geospatial Mapping, Time-Series Analysis
  • Outcome: In progress.

View Project

Power BI Projects

These Power BI projects demonstrate my ability to turn data into actionable insights through interactive dashboards. Using DAX and visualization techniques, I create clear, data-driven stories for better decision-making.


Kwanza Tukule: Sales Performance Dashboard

Objective: This Power BI dashboard enhances my Python-based sales analysis by providing interactive visualizations that simplify trend tracking and product performance monitoring.

Tech Stack:

  • Python (Data Preprocessing)
  • DAX (Measures & Calculations)
  • Power BI (Visualization & Dashboarding)
  • Key Insights & Visuals:

    • Total Revenue & Quantity Sold – Overview of total sales and units sold.
    • Monthly Revenue Trends – Identifies peak sales periods.
    • Top 10 Best-Selling Products – Highlights highest revenue-generating items.
    • Category-Wise Revenue Breakdown – Analyzes sales distribution across product categories.
    • Top 10 Businesses by Revenue – Shows the highest-earning businesses.

Outcome: Power BI made data storytelling more efficient, offering a dynamic and user-friendly way for stakeholders to explore business performance.

Download Report

Inside Airbnb: Data Visualization

Objective: This Power BI dashboard provides key insights into Airbnb listings, pricing trends, and booking patterns. It enhances my Python-based data analysis by delivering interactive visualizations that make market trends and business insights easily accessible.

Tech Stack:

  • Programming Languages: Python (Data Preprocessing)
  • Measures & Calculations: DAX
  • Visualization: Power BI
  • Key Insights & Visuals:

    • Pricing Trends: Analyzes changes in average Airbnb prices over time.
    • Occupancy & Availability: Identifies peak booking periods and seasonal demand.
    • Revenue Insights: Highlights top-earning listings and high-revenue locations.
    • Time-Series Analysis: Tracks demand fluctuations and booking trends over time.

Outcome: By integrating Power BI, I enhanced my ability to present Airbnb market trends interactively. This dashboard enables users to explore pricing patterns, identify competitive locations, and make data-driven business decisions.

Download Report

Get more details about my experience and projects on my CV.

CV

Let’s connect!

You can reach me through email or any of these platforms.