E-Commerce Product Categorization

Project Overview

This project classifies e-commerce products into predefined categories based on their textual descriptions. The dataset consists of product descriptions and their corresponding categories. Several machine learning and deep learning models are implemented to solve this multi-class classification problem.

Dataset Description

The dataset contains the following features:

Product Description: Textual information about the product.
Product Category: Target labels representing the category of each product.

The text is cleaned by lowercasing it and removing punctuation and stopwords. The cleaned text is then converted into numerical features for modeling.
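The cleaning step described above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the function name clean_text and the short STOPWORDS set are hypothetical, and a real run would likely use a fuller stopword list.

```python
import string

# Hypothetical stopword list for illustration only; a real project would
# normally use a much fuller list.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "with", "in", "is", "to"}

# Map every ASCII punctuation character to a space so that text such as
# "cotton,blue" splits into two tokens instead of merging into one.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text: str) -> str:
    """Lowercase the text, remove punctuation, and drop stopwords."""
    text = text.lower().translate(_PUNCT_TO_SPACE)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(clean_text("Stainless Steel Water Bottle, 750ml (Pack of 2)!"))
```

Replacing punctuation with a space, rather than deleting it, is a design choice: it keeps words separated when the description has no space after a comma or slash.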

Project Workflow

Data Preprocessing:
- Cleaning text (lowercasing, punctuation removal, and stopword removal).
- Feature extraction using CountVectorizer and TF-IDF Vectorizer.
Model Training:
- Traditional ML models: Logistic Regression, Random Forest, and Multinomial Naive Bayes.
- Deep learning model: LSTM (Long Short-Term Memory).
Model Evaluation:
- Metrics such as Accuracy, Precision, Recall, F1-Score, and Confusion Matrix.
Hyperparameter Tuning:
- Grid Search and Random Search for optimal parameters.
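The workflow above can be sketched end to end with scikit-learn. This is an assumption-laden example rather than the project's own code: the toy texts and labels are invented, and the parameter grid, the cv=2 setting, and the f1_macro scoring are illustrative choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data invented for illustration; the real dataset is much larger.
texts = [
    "cotton round neck t shirt", "slim fit denim jeans",
    "hooded winter jacket", "printed summer dress",
    "wireless bluetooth headphones", "android smartphone 128gb",
    "usb c charging cable", "laptop 16gb ram ssd",
]
labels = ["clothing"] * 4 + ["electronics"] * 4

# Feature extraction and the classifier are chained so that the vectorizer
# is refit on the training part of every cross-validation fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical search space; step names prefix the parameter names.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1_macro")
search.fit(texts, labels)

print(search.best_params_)
print(search.predict(["denim jacket"]))
```

For Random Search, RandomizedSearchCV from the same module can be swapped in; it takes param_distributions and n_iter instead of an exhaustive grid.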

Models Implemented

1. Machine Learning Models

Logistic Regression:
- Linear classifier that works well on high-dimensional, sparse text features.
- Used with CountVectorizer and TF-IDF Vectorizer features.
Random Forest Classifier:
- Ensemble learning method based on decision trees.
- Can capture non-linear relationships between features.
Multinomial Naive Bayes:
- Probabilistic model well-suited for text classification.
- Works effectively with frequency-based vectorization.
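To show why Multinomial Naive Bayes pairs naturally with frequency-based features, here is a small from-scratch sketch that works directly on word counts. It is not the project's implementation: train_nb, predict_nb, the toy documents, and the smoothing value alpha=1.0 are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on whitespace-tokenized documents."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_likelihood": {}}
    for c, n_c in class_docs.items():
        model["log_prior"][c] = math.log(n_c / len(docs))
        # Additive (Laplace) smoothing keeps unseen class/word pairs
        # from getting zero probability.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_likelihood"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model


def predict_nb(model, doc):
    """Return the class with the highest log posterior for one document."""
    scores = {}
    for c, prior in model["log_prior"].items():
        ll = model["log_likelihood"][c]
        # Words outside the training vocabulary are ignored.
        scores[c] = prior + sum(ll[w] for w in doc.split() if w in ll)
    return max(scores, key=scores.get)


# Toy data invented for illustration.
docs = ["cotton shirt blue", "denim jeans blue",
        "bluetooth headphones wireless", "wireless phone charger"]
labels = ["clothing", "clothing", "electronics", "electronics"]

model = train_nb(docs, labels)
print(predict_nb(model, "blue cotton jeans"))
print(predict_nb(model, "wireless charger"))
```

Each word contributes its log-probability once per occurrence, which is why raw or weighted term counts are a natural input for this model.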

2. Deep Learning Model

LSTM (Long Short-Term Memory):
- Processes text as an ordered sequence of tokens, so word order can inform the prediction.
- Uses embedding, LSTM, and dropout layers.
- Takes tokenized and padded sequences as input.
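The tokenizing and padding step can be illustrated in plain Python. The notebook most likely relies on a deep learning library's tokenizer, so treat this only as a sketch of the idea: build_vocab, encode, pad, and the reserved indices PAD and OOV are hypothetical names, and padding at the end of the sequence is an arbitrary choice here.

```python
from collections import Counter

PAD, OOV = 0, 1  # reserved indices: padding and out-of-vocabulary words


def build_vocab(texts, max_words=None):
    """Map each word to an integer index, most frequent words first."""
    counts = Counter(tok for text in texts for tok in text.split())
    most_common = counts.most_common(max_words)
    # Indices start at 2 because 0 and 1 are reserved above.
    return {word: idx for idx, (word, _) in enumerate(most_common, start=2)}


def encode(text, vocab):
    """Turn a text into a list of word indices, using OOV for unknown words."""
    return [vocab.get(tok, OOV) for tok in text.split()]


def pad(seq, maxlen):
    """Truncate to maxlen, then pad on the right with PAD."""
    seq = seq[:maxlen]
    return seq + [PAD] * (maxlen - len(seq))


# Toy texts invented for illustration.
train_texts = ["red cotton shirt", "red denim jeans"]
vocab = build_vocab(train_texts)

encoded = encode("red silk shirt", vocab)
print(pad(encoded, maxlen=5))
```

Every sequence ends up with the same length, which is what allows a batch of descriptions to be stacked into a single integer array for the embedding layer.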

Evaluation Metrics

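Accuracy: The fraction of all predictions that are correct.
Precision: Of the products predicted to be in a category, the fraction that actually belong to it.
Recall: Of the products that actually belong to a category, the fraction the model assigns to it.
F1-Score: The harmonic mean of Precision and Recall.
Confusion Matrix: Counts of true versus predicted categories, showing which categories are confused with each other.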
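For a multi-class problem these metrics are computed per category, treating each category in turn as the positive class. The following plain-Python sketch shows the arithmetic; the function names and the toy labels are invented for illustration, and in practice a library routine would normally be used.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    pairs = confusion_counts(y_true, y_pred)
    metrics = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        # Define a metric as 0.0 when its denominator is zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = (precision, recall, f1)
    return metrics


# Toy labels invented for illustration.
y_true = ["clothing", "clothing", "clothing", "electronics", "electronics", "toys"]
y_pred = ["clothing", "clothing", "electronics", "electronics", "electronics", "clothing"]

print(accuracy(y_true, y_pred))
for label, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
    print(label, round(p, 3), round(r, 3), round(f1, 3))
```

Averaging the per-category F1 values with equal weight gives the macro F1, which is often reported alongside accuracy when categories are imbalanced.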
