The goal of the project is to see how well machine learning can categorize the sentiment behind textual data. To do this, we trained a Gated Recurrent Unit (GRU) model on a labeled dataset of tweets.
Built and trained a single-layer GRU model with Keras/TensorFlow on a 70k-tweet dataset for six-class sentiment prediction, using tokenization and word embeddings.
Tuned hyperparameters using learning curves, analyzed misclassifications with confusion matrices, and sanity-checked the model by visualizing the learned embedding vectors with color maps.
Achieved 93% test accuracy on a 60k/10k train/test split, a 5% improvement over other models.
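
For illustration, a single-layer GRU classifier of the kind described above might look roughly like the sketch below. The vocabulary size, sequence length, embedding width, and GRU width are placeholder assumptions, not the values used in the project, and the input is assumed to be tweets already tokenized and padded into integer IDs.

```python
import numpy as np
import tensorflow as tf

# All sizes below are illustrative assumptions, not the project's settings.
VOCAB_SIZE = 10_000   # assumed number of distinct token IDs
MAX_LEN = 40          # assumed padded tweet length, in tokens
EMBED_DIM = 64        # assumed word-embedding width
GRU_UNITS = 64        # assumed GRU hidden size
NUM_CLASSES = 6       # six sentiment classes, as stated in the summary


def build_model():
    """Embedding -> single GRU layer -> softmax over the six classes."""
    model = tf.keras.Sequential([
        # Maps each integer token ID to a trainable dense vector;
        # ID 0 is treated as padding and masked out.
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True),
        # One recurrent layer; returns only the final hidden state.
        tf.keras.layers.GRU(GRU_UNITS),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer class labels
        metrics=["accuracy"],
    )
    return model


model = build_model()

# Random token IDs stand in for a tokenized, padded batch of tweets.
dummy_batch = np.random.randint(1, VOCAB_SIZE, size=(4, MAX_LEN))
probs = model.predict(dummy_batch, verbose=0)
```

Here `probs` holds one row of six class probabilities per input tweet. Training would call `model.fit` on the 60k-tweet split with integer labels in the range 0 to 5.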
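
The confusion-matrix analysis mentioned above can be sketched in a few lines of NumPy. The labels and predictions here are made-up toy values rather than project results, and the helper name `confusion_matrix` is chosen only for this example.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 at every (true, predicted) pair, accumulating repeated pairs.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


# Toy labels for six classes (illustrative only, not project data).
y_true = np.array([0, 1, 2, 2, 5, 3, 4, 1])
y_pred = np.array([0, 1, 2, 1, 5, 3, 4, 1])

cm = confusion_matrix(y_true, y_pred, num_classes=6)

# Correct predictions lie on the diagonal.
accuracy = np.trace(cm) / cm.sum()

# Zero the diagonal to find the most frequent misclassification.
errors = cm.copy()
np.fill_diagonal(errors, 0)
true_cls, pred_cls = np.unravel_index(errors.argmax(), errors.shape)
```

Off-diagonal cells show which class pairs the model confuses. In this toy data the only error is a class-2 tweet predicted as class 1.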