This repository contains a Jupyter Notebook for fine-tuning DistilBERT on a sentiment analysis dataset. The model is trained with TensorFlow and Hugging Face's `transformers` library to classify tweets into sentiment categories.
The dataset used for training is Tweets.csv, which contains airline-related tweets labeled with sentiment categories (positive, neutral, negative).
- Load the dataset (`Tweets.csv`)
- Check for missing values and class balance
- Convert text to lowercase
- Remove unnecessary columns
- Visualize word frequency using a word cloud
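The preprocessing steps above can be sketched in pandas. The column names `text` and `airline_sentiment` are assumptions about the schema of `Tweets.csv`, and a tiny in-memory frame stands in for the real CSV here:

```python
import pandas as pd

# Stand-in for pd.read_csv("Tweets.csv"); the real file has many more columns.
df = pd.DataFrame({
    "text": ["I LOVED the flight!", "Delayed again...", "It was fine."],
    "airline_sentiment": ["positive", "negative", "neutral"],
    "tweet_id": [1, 2, 3],  # example of an unnecessary column to drop
})

# Check for missing values and class balance
print(df.isnull().sum())
print(df["airline_sentiment"].value_counts())

# Convert text to lowercase
df["text"] = df["text"].str.lower()

# Remove unnecessary columns, keeping only what the model needs
df = df[["text", "airline_sentiment"]]
```

For the word-cloud step, the `wordcloud` package (not in the dependency list below) is one common choice for rendering word frequencies from the cleaned `text` column.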
- Convert text into tokenized inputs (`input_ids`, `attention_mask`)
- Use the Hugging Face `DistilBertTokenizer`
- Ensure proper padding and truncation
- Map tokenized inputs to a TensorFlow dataset format
- Prepare training and testing sets
- Load `TFDistilBertForSequenceClassification`
- Define the loss function and optimizer
- Train the model using TensorFlow/Keras
- Predict sentiment on the test data
- Compute accuracy, precision, recall, and F1-score
- Generate a classification report
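The evaluation metrics listed above can be computed with scikit-learn once predictions are available. `y_true` and `y_pred` below are toy stand-ins for the real test labels and model outputs:

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             precision_recall_fscore_support)

label_names = ["negative", "neutral", "positive"]

# Toy stand-ins: in the notebook these come from the test split and model.predict
y_true = [0, 0, 1, 2, 2, 1]
y_pred = [0, 1, 1, 2, 0, 1]

print("accuracy:", accuracy_score(y_true, y_pred))
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"macro precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Per-class breakdown of precision, recall, and F1
print(classification_report(y_true, y_pred,
                            target_names=label_names, zero_division=0))
```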
To run this notebook, install the following dependencies:

```bash
pip install numpy pandas matplotlib seaborn nltk tensorflow transformers scikit-learn tqdm plotly
```

- Clone this repository:

```bash
git clone https://github.com/awais-124/fine-tuning-distilbert.git
cd fine-tuning-distilbert
```

- Run the Jupyter Notebook:

```bash
jupyter notebook CODE.ipynb
```