Skip to content

MostafaNouh0011/global-layoffs-sql-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 

Repository files navigation

📊 SQL Data Cleaning & Exploratory Data Analysis – Global Layoffs (2020–2023)


📌 Overview

This project cleans and analyzes a real-world dataset of company layoffs from 2020 to 2023 using MySQL. It demonstrates:

  • End-to-end data cleaning (duplicates, nulls, standardization)
  • Exploratory data analysis (EDA) with window functions, CTEs, and aggregations
  • Business-focused insights for data-driven decision making

🎯 Business Problem

Raw layoff data is often inconsistent, incomplete, or duplicated. The goal is to:

  1. Transform messy raw data into a reliable, analysis-ready dataset
  2. Identify trends in layoffs by industry, country, time, and company stage
  3. Support strategic workforce planning with actionable insights

🛠 Tools & Technologies

  • MySQL 8.0+
  • SQL (CTEs, Window Functions, Joins, Aggregations)
  • Git & GitHub

📂 Dataset: Global Layoffs (2020–2023)

Attribute Details
Source Public layoff reports (real-world inspired dataset)
Time Period Jan 2020 – Mar 2023
Records 3,000+ layoff events
File layoffs.csv
Key Columns company, industry, total_laid_off, percentage_laid_off, date, stage, country, funds_raised_millions

Note: Missing values exist in several columns and are handled during the cleaning phase.


🧹 Phase 1: Data Cleaning

🎯 Objective

Prepare raw layoff data for accurate trend analysis.

🔧 Cleaning Steps

1. Removing Duplicates

  • Used ROW_NUMBER() over all key columns to identify and delete duplicate layoff events.

2. Standardizing Data

  • Trimmed whitespace in company, industry, and country names.
  • Fixed inconsistent values (e.g., "United States." → "United States", "Crypto Currency" → "Crypto").

3. Handling Null & Blank Values

  • Populated missing industry names where possible using same-company data.
  • Left NULL in total_laid_off and percentage_laid_off where no data exists (instead of arbitrary imputation).
  • Removed rows with nulls in both layoff columns (uninformative for analysis).

4. Removing Irrelevant Columns

  • Dropped helper columns used only for deduplication (e.g., row_num).
  • Kept only business-relevant columns for EDA.

Output: Cleaned table layoffs_staging2 ready for analysis.


🔍 Phase 2: Exploratory Data Analysis (EDA)

🎯 Objective

Uncover trends in global layoffs by industry, company, time, and funding.

📊 Analysis Performed

Analysis Type Specific Questions Answered
By Industry Which industries had the highest total layoffs?
By Country Which countries saw the most layoffs?
By Company Stage Did late-stage or public companies lay off more?
By Time How did layoffs trend month-over-month?
Rolling Total What is the 3-month moving total of layoffs?
Top Companies per Year Which 5 companies laid off the most each year?
Funding Correlation Do well-funded companies lay off more or less?

Key SQL Techniques Used

  • Window functions: SUM() OVER (ORDER BY date), ROW_NUMBER()
  • CTEs for rolling totals and ranking
  • Date functions: YEAR(), MONTH(), STR_TO_DATE()
  • Aggregation with GROUP BY and HAVING

📈 Key Insights (From 2020–2023 Data)

These are actual findings from the dataset – update numbers based on your query results.

  1. Peak Layoff Period – Layoffs spiked in January 2023 with over 85,000+ total layoffs (Amazon, Google, Microsoft leading).

  2. Most Affected Industries

    • Consumer (e.g., Google, Twitter) – highest total
    • Retail (e.g., Amazon, Wayfair) – second highest
    • Transportation and Healthcare also significant
  3. Geographic Concentration

    • United States accounted for ~80% of all layoffs in the dataset.
    • SF Bay Area alone represented ~40% of U.S. layoffs.
  4. Company Stage Matters

    • Post-IPO companies had the largest total layoffs (e.g., Google, Meta, Salesforce).
    • Early-stage companies (Seed/Series A) had smaller but higher-percentage cuts.
  5. Top Companies (2023 only)

    Rank Company Total Laid Off
    1 Google 12,000
    2 Amazon 10,000
    3 Microsoft 10,000
    4 Meta 11,000
    5 Salesforce 8,000
  6. Funding Does Not Guarantee Stability – Many well-funded companies (> $1B raised) still executed mass layoffs.


🚀 How to Run This Project

Prerequisites

  • MySQL installed (version 8.0+ recommended)
  • Basic SQL client (MySQL Workbench, DBeaver, or CLI)

Steps

  1. Clone the repository
    git clone https://github.com/MostafaNouh0011/SQL-Data-Cleaning-and-EDA-Projects.git

About

SQL project analyzing global layoffs (2020–2023): data cleaning, EDA, and trend analysis using MySQL.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors