This repository contains a Jupyter Notebook for fine-tuning DistilBERT on a sentiment analysis dataset. The model is trained with TensorFlow and Hugging Face's `transformers` library to classify tweets into sentiment categories.
The dataset used for training is Tweets.csv, which contains airline-related tweets labeled with sentiment categories (positive, neutral, negative).
- Load the dataset (`Tweets.csv`)
- Check for missing values and class balance
- Convert text to lowercase
- Remove unnecessary columns
- Visualize word frequency using a word cloud
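The preprocessing steps above can be sketched in pandas. The column names `text` and `airline_sentiment` are assumptions about the schema of `Tweets.csv`, and a tiny in-memory frame stands in for the real CSV here:

```python
import pandas as pd

# Stand-in for pd.read_csv("Tweets.csv"); the real file has many more columns.
df = pd.DataFrame({
    "text": ["I LOVED the flight!", "Delayed again...", "It was fine."],
    "airline_sentiment": ["positive", "negative", "neutral"],
    "tweet_id": [1, 2, 3],  # example of an unnecessary column to drop
})

# Check for missing values and class balance
print(df.isnull().sum())
print(df["airline_sentiment"].value_counts())

# Convert text to lowercase
df["text"] = df["text"].str.lower()

# Remove unnecessary columns, keeping only what the model needs
df = df[["text", "airline_sentiment"]]
```

For the word-cloud step, the `wordcloud` package (not in the dependency list below) is one common choice for rendering word frequencies from the cleaned `text` column.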
- Convert text into tokenized inputs (`input_ids`, `attention_mask`)
- Use the Hugging Face `DistilBertTokenizer`
- Ensure proper padding and truncation
- Map tokenized inputs to a TensorFlow dataset format
- Prepare training and testing sets
- Load `TFDistilBertForSequenceClassification`
- Define the loss function and optimizer
- Train the model using TensorFlow/Keras
- Predict sentiment on the test data
- Compute accuracy, precision, recall, and F1-score
- Generate a classification report
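The evaluation metrics listed above can be computed with scikit-learn once predictions are available. `y_true` and `y_pred` below are toy stand-ins for the real test labels and model outputs:

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             precision_recall_fscore_support)

label_names = ["negative", "neutral", "positive"]

# Toy stand-ins: in the notebook these come from the test split and model.predict
y_true = [0, 0, 1, 2, 2, 1]
y_pred = [0, 1, 1, 2, 0, 1]

print("accuracy:", accuracy_score(y_true, y_pred))
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"macro precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Per-class breakdown of precision, recall, and F1
print(classification_report(y_true, y_pred,
                            target_names=label_names, zero_division=0))
```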
To run this notebook, install the following dependencies:

```bash
pip install numpy pandas matplotlib seaborn nltk tensorflow transformers scikit-learn tqdm plotly
```

- Clone this repository:

```bash
git clone https://github.com/awais-124/fine-tuning-distilbert.git
cd fine-tuning-distilbert
```

- Run the Jupyter Notebook:

```bash
jupyter notebook CODE.ipynb
```