This repository contains two end-to-end machine learning projects built with Python and Multiple Linear Regression. Both follow the complete machine learning life cycle, from problem definition through model evaluation.
Each project focuses on solving a real-world business problem using data analysis, visualization, and predictive modeling.
| Project Name | Technique | Domain |
|---|---|---|
| 50 Startups Profit Prediction | Multiple Linear Regression | Business Analytics |
| Toyota Corolla Price Prediction | Multiple Linear Regression | Automobile Analytics |
Predict the profit of startups from their spending and location:
- R&D Spend
- Administration Spend
- Marketing Spend
- State
This helps stakeholders understand which investments drive profitability and supports better financial decision-making.
- Analyze the impact of different expenditures on profit
- Build and compare multiple regression models
- Improve prediction accuracy using feature transformations
- Select the best model using performance metrics
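One way to compare candidate models on performance metrics is sketched below. It is a minimal illustration, not the repository's actual code: the data is synthetic, the coefficients used to generate it are invented, and the helper names (`fit_ols`, `predict`, `r2_score`, `rmse`) are hypothetical. Each candidate uses a different subset of the spending features, and the candidate with the lowest test RMSE is selected.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply a coefficient vector from fit_ols to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in for the startup data (assumption: values are invented).
rng = np.random.default_rng(0)
n = 200
rd = rng.uniform(0, 150_000, n)
admin = rng.uniform(50_000, 150_000, n)
marketing = rng.uniform(0, 400_000, n)
profit = 50_000 + 0.8 * rd + 0.03 * marketing + rng.normal(0, 5_000, n)

X_all = np.column_stack([rd, admin, marketing])
train, test = slice(0, 160), slice(160, None)

# Candidate models, each defined by the feature columns it uses.
candidates = {
    "all_features": [0, 1, 2],
    "rd_and_marketing": [0, 2],
    "admin_only": [1],
}

results = {}
for name, cols in candidates.items():
    coef = fit_ols(X_all[train][:, cols], profit[train])
    pred = predict(coef, X_all[test][:, cols])
    results[name] = {"r2": r2_score(profit[test], pred),
                     "rmse": rmse(profit[test], pred)}

# Pick the candidate with the lowest held-out RMSE.
best = min(results, key=lambda k: results[k]["rmse"])
for name, scores in results.items():
    print(f"{name:18s} R2={scores['r2']:.3f}  RMSE={scores['rmse']:,.0f}")
print("selected:", best)
```

Scoring on a held-out split rather than on the training rows keeps the comparison from favoring a model simply because it has more features.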
- Language: Python
- Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn
- IDE: Jupyter Notebook
- Version Control: Git & GitHub
- Business Problem Understanding
- Data Collection & Understanding
- Data Cleaning & Preprocessing
- Exploratory Data Analysis (EDA)
- Feature Encoding & Transformation
- Model Building (Multiple Linear Regression)
- Model Evaluation (R², RMSE)
- Model Comparison & Selection
- Insights & Business Interpretation
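The encoding, model building, and evaluation steps above can be sketched with scikit-learn as follows. This is an illustrative outline under stated assumptions, not the notebook's actual code: the DataFrame is generated synthetically, the `Profit` column name, the state labels, and the generating coefficients are placeholders, and the 80/20 split is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the startup data (assumption: values are invented).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 150_000, n),
    "Administration Spend": rng.uniform(50_000, 150_000, n),
    "Marketing Spend": rng.uniform(0, 400_000, n),
    "State": rng.choice(["State A", "State B", "State C"], n),  # placeholder labels
})
df["Profit"] = (50_000 + 0.8 * df["R&D Spend"]
                + 0.05 * df["Marketing Spend"]
                + rng.normal(0, 5_000, n))

X = df.drop(columns="Profit")
y = df["Profit"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# One-hot encode the categorical column; drop one level to avoid the
# dummy-variable trap. Numeric columns pass through unchanged.
preprocess = ColumnTransformer(
    transformers=[("state", OneHotEncoder(drop="first"), ["State"])],
    remainder="passthrough",
)

model = Pipeline([("preprocess", preprocess), ("regress", LinearRegression())])
model.fit(X_train, y_train)

# Evaluate on the held-out rows with the two metrics named above.
pred = model.predict(X_test)
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"R2 = {r2:.3f}, RMSE = {rmse:,.0f}")
```

Wrapping the encoder and the regressor in one `Pipeline` means the encoding is learned from the training rows only and then reapplied to the test rows.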
- R&D Spend has the highest positive impact on profit
- Administration Spend has minimal influence
- Marketing Spend contributes moderately to profit
- State (location) has only a small effect on predicted profit
- The optimized models achieved a higher R² score than the initial model, indicating a better fit to the data
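A common way to compare the influence of features measured on different scales is to fit the regression on standardized (z-scored) variables and rank the coefficients by absolute size. The sketch below assumes this approach; the source does not say how the ranking was obtained. The data is synthetic, and its generating coefficients were chosen by hand to mimic the pattern described above, so the output illustrates the method rather than reproducing the project's results.

```python
import numpy as np

# Synthetic data (assumption: coefficients are hand-picked, not estimated
# from the real dataset).
rng = np.random.default_rng(1)
n = 500
rd = rng.uniform(0, 150_000, n)
admin = rng.uniform(50_000, 150_000, n)
marketing = rng.uniform(0, 400_000, n)
profit = 50_000 + 0.8 * rd + 0.0 * admin + 0.05 * marketing \
    + rng.normal(0, 5_000, n)

names = ["R&D Spend", "Administration Spend", "Marketing Spend"]
X = np.column_stack([rd, admin, marketing])

# Z-score features and target so every coefficient is expressed in
# standard-deviation units and can be compared directly.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = (profit - profit.mean()) / profit.std()

# Centered data needs no intercept column.
beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)

# Rank features by the magnitude of their standardized coefficient.
ranking = sorted(zip(names, beta), key=lambda pair: abs(pair[1]), reverse=True)
for name, b in ranking:
    print(f"{name:22s} {b:+.3f}")
```

Raw coefficients depend on each feature's units, so ranking them directly can mislead; standardizing first puts them on a common footing.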
- Helps startups prioritize R&D investments
- Supports data-driven budgeting decisions
- Enables investors to evaluate profitability drivers
Predict the resale price of Toyota Corolla cars using historical and technical features such as:
- Age of the car
- Kilometers driven
- Fuel type
- Horsepower (HP)
- Transmission type
- Additional vehicle features
This supports used-car dealers and customers in fair and accurate price estimation.
- Identify key factors affecting car resale price
- Perform detailed EDA and feature analysis
- Build regression models and improve performance
- Minimize prediction error
- Language: Python
- Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn
- IDE: Jupyter Notebook
- Version Control: Git & GitHub
- Problem Definition
- Data Understanding
- Handling Missing Values & Outliers
- Exploratory Data Analysis (EDA)
- Feature Selection & Encoding
- Model Training (Multiple Linear Regression)
- Model Evaluation (R², RMSE)
- Model Optimization
- Insights & Conclusions
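The missing-value, outlier, and encoding steps above might look like the following sketch. The column names (`Age`, `KM`, `FuelType`, `Price`) and all values are hypothetical toy data, and the specific choices (median imputation, capping at 1.5 × IQR, dropping one dummy level) are common defaults assumed here rather than the project's documented method.

```python
import numpy as np
import pandas as pd

# Toy rows (assumption: invented values, hypothetical column names).
cars = pd.DataFrame({
    "Age": [23, 30, 41, np.nan, 60, 72],                 # months; one value missing
    "KM": [46_000, 72_000, 41_000, 48_000, 900_000, 120_000],  # one extreme value
    "FuelType": ["Diesel", "Petrol", "Petrol", "CNG", "Petrol", "Diesel"],
    "Price": [13_500, 12_950, 13_750, 11_950, 9_500, 7_950],
})

# 1. Missing values: fill numeric gaps with the column median.
cars["Age"] = cars["Age"].fillna(cars["Age"].median())

# 2. Outliers: cap values outside the 1.5 * IQR fences instead of dropping rows.
q1, q3 = cars["KM"].quantile([0.25, 0.75])
iqr = q3 - q1
cars["KM"] = cars["KM"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 3. Encoding: one-hot encode the categorical column, dropping the first
#    level so the dummies are not perfectly collinear with the intercept.
encoded = pd.get_dummies(cars, columns=["FuelType"], drop_first=True)

print(encoded)
```

Capping keeps every row in a small dataset while limiting the leverage that a single extreme odometer reading would otherwise have on the fitted coefficients.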
- Car age and kilometers driven both have a strong negative impact on price
- Fuel Type significantly affects resale value
- Automatic transmission cars tend to have higher resale prices
- Feature transformations improved model accuracy
- Final model provides a good balance between interpretability and performance
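The source does not state which feature transformations were applied. As one hedged illustration, the sketch below assumes a log transform of the target, a common choice when price decays roughly exponentially with age and mileage. The data is synthetic and generated to have that shape, so the improvement it shows demonstrates the technique only and says nothing about the real dataset. The helper names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply a coefficient vector from fit_ols to new rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic data (assumption): price decays exponentially with age and
# kilometers, with multiplicative noise.
rng = np.random.default_rng(7)
n = 300
age = rng.uniform(1, 80, n)            # months
km = rng.uniform(0, 200_000, n)
price = 20_000 * np.exp(-0.025 * age - 4e-6 * km) \
    * np.exp(rng.normal(0, 0.05, n))

X = np.column_stack([age, km])
train, test = slice(0, 240), slice(240, None)

# Baseline: regress price directly on the raw features.
coef_raw = fit_ols(X[train], price[train])
pred_raw = predict(coef_raw, X[test])

# Transformed: regress log(price), then map predictions back with exp.
coef_log = fit_ols(X[train], np.log(price[train]))
pred_log = np.exp(predict(coef_log, X[test]))

# Score both on the original price scale so the numbers are comparable.
r2_raw = r2_score(price[test], pred_raw)
r2_log = r2_score(price[test], pred_log)
print(f"raw target R2 = {r2_raw:.3f}")
print(f"log target R2 = {r2_log:.3f}")
```

Both models stay linear in their coefficients, so the transformed model keeps the interpretability of linear regression: each coefficient reads as an approximate percentage change in price per unit of the feature.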
- Enables accurate pricing for used-car dealers
- Builds customer trust through transparent valuation
- Reduces losses caused by underpricing or overpricing