Tori Grasso

A month of leveling up SoundSoar: I solidified Spotify-powered data pipelines, automated sync, ran feature-importance analysis, and improved the ML models for trend prediction. I also added auth groundwork and a curated “trending-up” playlist.

Introduction

This month was a pivotal period of growth and refinement for my capstone project, SoundSoar. I focused on solidifying the data processing pipelines and improving the machine learning models so they deliver more accurate trend predictions. Feature importance analysis was central to that work: it helped surface the Spotify attributes (tempo, valence, danceability, popularity, and more) that matter most. Getting popularity history right was essential for reliable trend signals and better overall performance.

The cover image shows how users engage with trending songs and individual track details.

Features developed

Spotify authentication and API integration

I implemented a login flow using django-allauth to prepare for Spotify SSO. The Spotify Web API is integrated to fetch playlists, track details, and audio features. This real-time data powers trend prediction and user playlists.
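
As a rough illustration of the audio-features step, here is a minimal sketch that batches track IDs for Spotify's "several audio features" endpoint, which accepts at most 100 IDs per request. The helper names (`chunked`, `fetch_audio_features`) and the injected `get_json` callable are assumptions made for this example, not SoundSoar's actual code.

```python
from typing import Callable, Dict, Iterable, Iterator, List

AUDIO_FEATURES_URL = "https://api.spotify.com/v1/audio-features"
MAX_IDS_PER_REQUEST = 100  # the endpoint accepts up to 100 IDs per call


def chunked(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    batch: List[str] = []
    for track_id in ids:
        batch.append(track_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def fetch_audio_features(
    track_ids: Iterable[str],
    get_json: Callable[[str, Dict[str, str]], dict],
) -> Dict[str, dict]:
    """Return {track_id: audio_features} for every ID the API knows about.

    `get_json(url, params)` is a hypothetical helper that performs an
    authenticated GET request and returns the decoded JSON body.
    """
    features: Dict[str, dict] = {}
    for batch in chunked(track_ids, MAX_IDS_PER_REQUEST):
        payload = get_json(AUDIO_FEATURES_URL, {"ids": ",".join(batch)})
        for item in payload.get("audio_features", []):
            if item:  # the API returns null entries for unknown IDs
                features[item["id"]] = item
    return features
```

In practice `get_json` could wrap something like `requests.get(url, params=params, headers={"Authorization": f"Bearer {token}"})`. Injecting it keeps the batching logic testable without network access.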

Data management

I built a pipeline to structure and store Spotify data, including popularity history, audio features, and retrieval frequency. The relational schema keeps track attributes dynamic and versioned over time. I added validation steps before each update to prevent duplication and inconsistency.
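
To make the "validate before update" idea concrete, here is a small sketch using the standard-library `sqlite3` module instead of the project's real database layer. The table and column names are hypothetical. A uniqueness constraint on track and date is one possible way to block duplicate snapshots, and the range check relies on Spotify popularity being a 0 to 100 score.

```python
import sqlite3

# Hypothetical schema: one row per track, one popularity snapshot per track per day.
SCHEMA = """
CREATE TABLE IF NOT EXISTS track (
    spotify_id TEXT PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS popularity_history (
    spotify_id  TEXT NOT NULL REFERENCES track(spotify_id),
    recorded_on TEXT NOT NULL,  -- ISO date string
    popularity  INTEGER NOT NULL CHECK (popularity BETWEEN 0 AND 100),
    UNIQUE (spotify_id, recorded_on)
);
"""


def init_db(conn: sqlite3.Connection) -> None:
    """Create the tables if they do not exist yet."""
    conn.executescript(SCHEMA)


def record_popularity(conn, spotify_id, name, recorded_on, popularity) -> bool:
    """Store one snapshot; return False if it is invalid or a duplicate."""
    if not 0 <= popularity <= 100:
        return False  # reject out-of-range values before touching the database
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO track (spotify_id, name) VALUES (?, ?)",
            (spotify_id, name),
        )
        cur = conn.execute(
            "INSERT OR IGNORE INTO popularity_history "
            "(spotify_id, recorded_on, popularity) VALUES (?, ?, ?)",
            (spotify_id, recorded_on, popularity),
        )
    return cur.rowcount == 1  # 0 means the UNIQUE constraint skipped the row
```

Keeping history in its own table is what lets each track accumulate a time series of popularity values rather than a single overwritten number.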

Automating data sync

To avoid manual updates, I automated data retrieval with Windows Task Scheduler calling PowerShell scripts that trigger Python functions. This keeps the dataset fresh and continuously improves trend predictions.
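
The Python side of such a scheduled job might look like the sketch below: a small command-line entry point that a scheduler or wrapper script can call, returning a non-zero exit code on failure so the scheduler can record it. The job names and the `JOBS` registry are placeholders invented for illustration.

```python
import argparse
import logging
from typing import Callable, Dict, List, Optional


def sync_popularity() -> int:
    """Placeholder job: would fetch and store popularity snapshots."""
    return 0  # number of rows written


def sync_audio_features() -> int:
    """Placeholder job: would refresh audio features for new tracks."""
    return 0


# Hypothetical registry mapping a job name to the function that runs it.
JOBS: Dict[str, Callable[[], int]] = {
    "popularity": sync_popularity,
    "audio-features": sync_audio_features,
}


def main(
    argv: Optional[List[str]] = None,
    jobs: Optional[Dict[str, Callable[[], int]]] = None,
) -> int:
    """Run one named job and return a process exit code (0 = success)."""
    jobs = JOBS if jobs is None else jobs
    parser = argparse.ArgumentParser(description="Run one data sync job.")
    parser.add_argument("--job", choices=sorted(jobs), required=True)
    args = parser.parse_args(argv)

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    try:
        count = jobs[args.job]()
    except Exception:
        logging.exception("job %s failed", args.job)
        return 1
    logging.info("job %s finished, %d rows written", args.job, count)
    return 0

# As a script, the file would end with: sys.exit(main())
```

A scheduled task could then run a command along the lines of `python sync.py --job popularity`, where `sync.py` is an assumed file name.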

Feature importance analysis

I used Random Forest and HistGradientBoosting to rank features by their contribution to predictions. The analysis confirmed which Spotify attributes influence outcomes most and helped me narrow the model's inputs to the most relevant features.

Trend prediction model

The model analyzes and forecasts song popularity over time. Using the selected feature set (valence, tempo, danceability, historical popularity metrics, and more), I trained models and tuned hyperparameters to improve accuracy. The output classifies songs into trending categories so users can discover emerging hits.

For persistence, I serialize trained models with joblib and update them on a schedule as new Spotify data arrives. This keeps predictions aligned with changing trends. The cover image includes a snapshot of active and historical models that shows performance over time.
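
The persistence step could look roughly like this sketch, which writes each trained model to a timestamped file with `joblib.dump` and reloads the newest one with `joblib.load`. The directory layout, file naming scheme, and function names are assumptions chosen for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path

import joblib


def save_model(model, model_dir, name="trend_model") -> Path:
    """Serialize `model` to a timestamped file and return its path."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    # UTC timestamp with microseconds, so file names sort chronologically.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = model_dir / f"{name}_{stamp}.joblib"
    joblib.dump(model, path)
    return path


def load_latest_model(model_dir, name="trend_model"):
    """Load the most recently saved model for `name`."""
    candidates = sorted(Path(model_dir).glob(f"{name}_*.joblib"))
    if not candidates:
        raise FileNotFoundError(f"no saved models named {name!r} in {model_dir}")
    # Only load files you created yourself: joblib files are pickle-based.
    return joblib.load(candidates[-1])
```

Keeping older files around, rather than overwriting a single one, is one simple way to retain the kind of active and historical model history mentioned above.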

Custom playlist integration

I defined a custom playlist type for charts and added the SoundSoar Suggestions playlist with the top 25 trending-up tracks, driven by the prediction pipeline.
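
Selecting the playlist contents can be sketched as a top-N pick over prediction scores. Here `scores` is a hypothetical mapping from track ID to a "trending up" score (for example, a predicted class probability); the `min_score` threshold and the function name are likewise assumptions.

```python
import heapq
from typing import Dict, List

PLAYLIST_SIZE = 25  # the playlist holds the top 25 trending-up tracks


def pick_trending_up(
    scores: Dict[str, float],
    limit: int = PLAYLIST_SIZE,
    min_score: float = 0.0,
) -> List[str]:
    """Return up to `limit` track IDs with the highest scores above `min_score`."""
    rising = (
        (score, track_id)
        for track_id, score in scores.items()
        if score > min_score
    )
    # nlargest avoids sorting the whole catalog when only a few tracks are needed.
    top = heapq.nlargest(limit, rising)
    return [track_id for _, track_id in top]
```

The returned list is ordered from strongest to weakest signal, so it can be written to the playlist as-is.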

Retrospective

What went right

Seamless integration path for Spotify auth using django-allauth.
Automated data sync reduced manual overhead and kept the dataset fresh.
Refined popularity metrics and feature importance improved model focus and accuracy.
Interactive plots (via Plotly) made exploration and insight discovery easier.

Challenges

The initial plan to include sentiment analysis slowed progress; narrowing the scope to streams helped.
AWS Lightsail configuration and environment setup required extra troubleshooting time.
Early visualization attempts with Bokeh ran into install and config friction, prompting the switch to Plotly.

What I learned and what is next

Protect scope and land core features first; add sentiment analysis later if time allows.
Start data collection earlier to de-risk model training and evaluation.
Continue sharpening pipelines, hyperparameter tuning, and validation.
Get more comfortable with Unix environments to prepare for deployment and scale.
