Skip to content

shubhamgoyal575/ECommerce-Product-Categorization

Repository files navigation

E-Commerce Product Categorization

Project Overview

This project focuses on E-Commerce Product Categorization, where products are classified into predefined categories based on their textual descriptions. The dataset consists of product descriptions and corresponding categories. Multiple machine learning and deep learning models have been implemented to solve this multi-class classification problem.


Table of Contents

  1. Dataset Loading
  2. EDA
  3. Data Preprocessing
  4. Models Implemented
  5. Evaluation Metrics
  6. Results and Comparison

Dataset Description

The dataset contains the following features:

  • Product Description: Textual information about the product.
  • Product Category: Target labels representing the category of each product.

The dataset is preprocessed to clean text by removing punctuation, stopwords, and performing lowercasing. The cleaned text is then converted into numerical features for modeling.


Project Workflow

  1. Data Preprocessing:

    • Cleaning text (lowercase, punctuation, and stopwords removal).
    • Feature extraction using CountVectorizer and TF-IDF Vectorizer.
  2. Model Training:

    • Traditional ML models: Logistic Regression, Random Forest, and Multinomial Naive Bayes.
    • Deep learning model: LSTM (Long Short-Term Memory).
  3. Model Evaluation:

    • Metrics such as Accuracy, Precision, Recall, F1-Score, and Confusion Matrix.
  4. Hyperparameter Tuning:

    • Grid Search and Random Search for optimal parameters.

Models Implemented

1. Machine Learning Models

  • Logistic Regression:
    • Suitable for linear classification tasks.
    • Used with CountVectorizer and TF-IDF Vectorizer features.
  • Random Forest Classifier:
    • Ensemble learning method based on decision trees.
    • Effective for non-linear data patterns.
  • Multinomial Naive Bayes:
    • Probabilistic model well-suited for text classification.
    • Works effectively with frequency-based vectorization.

2. Deep Learning Model

  • LSTM (Long Short-Term Memory):
    • Handles sequential data effectively.
    • Utilized embedding layers, dropout, and LSTM layers for learning.
    • Tokenized and padded sequences were used as input.

Evaluation Metrics

  • Accuracy: Measures overall correctness of predictions.
  • Precision: Focuses on correctly predicted positive cases.
  • Recall: Captures how well the model identifies true positives.
  • F1-Score: Balances Precision and Recall.
  • Confusion Matrix: Visualizes prediction performance for each class.

About

This project classifies e-commerce products into predefined categories using machine learning. It includes preprocessing steps like stopword removal, punctuation cleaning, and feature extraction. Models, including LSTM, are implemented, and evaluated for better accuracy.

Topics

Resources

Stars

Watchers

Forks

Contributors