This project cleans and analyzes a real-world dataset of company layoffs from 2020 to 2023 using MySQL. It demonstrates:
- End-to-end data cleaning (duplicates, nulls, standardization)
- Exploratory data analysis (EDA) with window functions, CTEs, and aggregations
- Business-focused insights for data-driven decision making
Raw layoff data is often inconsistent, incomplete, or duplicated. The goal is to:
- Transform messy raw data into a reliable, analysis-ready dataset
- Identify trends in layoffs by industry, country, time, and company stage
- Support strategic workforce planning with actionable insights
- MySQL 8.0+
- SQL (CTEs, Window Functions, Joins, Aggregations)
- Git & GitHub
| Attribute | Details |
|---|---|
| Source | Public layoff reports (real-world inspired dataset) |
| Time Period | Jan 2020 – Mar 2023 |
| Records | 3,000+ layoff events |
| File | layoffs.csv |
| Key Columns | company, industry, total_laid_off, percentage_laid_off, date, stage, country, funds_raised_millions |
Note: Missing values exist in several columns and are handled during the cleaning phase.
Prepare raw layoff data for accurate trend analysis.
- Used
ROW_NUMBER()over all key columns to identify and delete duplicate layoff events.
- Trimmed whitespace in company, industry, and country names.
- Fixed inconsistent values (e.g., "United States." → "United States", "Crypto Currency" → "Crypto").
- Populated missing industry names where possible using same-company data.
- Left
NULLintotal_laid_offandpercentage_laid_offwhere no data exists (instead of arbitrary imputation). - Removed rows with nulls in both layoff columns (uninformative for analysis).
- Dropped helper columns used only for deduplication (e.g.,
row_num). - Kept only business-relevant columns for EDA.
Output: Cleaned table
layoffs_staging2ready for analysis.
Uncover trends in global layoffs by industry, company, time, and funding.
| Analysis Type | Specific Questions Answered |
|---|---|
| By Industry | Which industries had the highest total layoffs? |
| By Country | Which countries saw the most layoffs? |
| By Company Stage | Did late-stage or public companies lay off more? |
| By Time | How did layoffs trend month-over-month? |
| Rolling Total | What is the 3-month moving total of layoffs? |
| Top Companies per Year | Which 5 companies laid off the most each year? |
| Funding Correlation | Do well-funded companies lay off more or less? |
- Window functions:
SUM() OVER (ORDER BY date),ROW_NUMBER() - CTEs for rolling totals and ranking
- Date functions:
YEAR(),MONTH(),STR_TO_DATE() - Aggregation with
GROUP BYandHAVING
These are actual findings from the dataset – update numbers based on your query results.
-
Peak Layoff Period – Layoffs spiked in January 2023 with over 85,000+ total layoffs (Amazon, Google, Microsoft leading).
-
Most Affected Industries
- Consumer (e.g., Google, Twitter) – highest total
- Retail (e.g., Amazon, Wayfair) – second highest
- Transportation and Healthcare also significant
-
Geographic Concentration
- United States accounted for ~80% of all layoffs in the dataset.
- SF Bay Area alone represented ~40% of U.S. layoffs.
-
Company Stage Matters
- Post-IPO companies had the largest total layoffs (e.g., Google, Meta, Salesforce).
- Early-stage companies (Seed/Series A) had smaller but higher-percentage cuts.
-
Top Companies (2023 only)
Rank Company Total Laid Off 1 Google 12,000 2 Amazon 10,000 3 Microsoft 10,000 4 Meta 11,000 5 Salesforce 8,000 -
Funding Does Not Guarantee Stability – Many well-funded companies (> $1B raised) still executed mass layoffs.
- MySQL installed (version 8.0+ recommended)
- Basic SQL client (MySQL Workbench, DBeaver, or CLI)
- Clone the repository
git clone https://github.com/MostafaNouh0011/SQL-Data-Cleaning-and-EDA-Projects.git