A month of leveling up SoundSoar: I solidified Spotify-powered data pipelines, automated sync, ran feature-importance analysis, and improved the ML models for trend prediction. I also added auth groundwork and a curated “trending-up” playlist.
Introduction
This month was a pivotal period of growth and refinement for my capstone project, SoundSoar. I focused on solidifying data processing pipelines and enhancing machine learning models to deliver more accurate trend predictions. A key part of this work was feature importance analysis, which helped surface the Spotify attributes (tempo, valence, danceability, popularity, and more) that matter most. Getting popularity history right was essential for reliable trend signals and better overall performance.
The cover image shows how users engage with trending songs and individual track details.
Features developed
Spotify authentication and API integration
I implemented a login flow using django-allauth to prepare for Spotify single sign-on (SSO). I also integrated the Spotify Web API to fetch playlists, track details, and audio features. This real-time data powers trend prediction and user playlists.
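To make the API side concrete, here is a minimal sketch using only the Python standard library. It assumes the app-level Client Credentials flow (user sign-in through django-allauth is a separate step and is not shown). The two URLs are Spotify's public Web API endpoints; the helper names and the `FEATURE_KEYS` subset are hypothetical, and the project itself may use a client library instead.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"
AUDIO_FEATURES_URL = "https://api.spotify.com/v1/audio-features"
# Hypothetical subset of audio features kept for the trend model.
FEATURE_KEYS = ("tempo", "valence", "danceability")


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Client Credentials flow: an app-level token, no user login involved."""
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def build_audio_features_request(token: str, track_ids: list[str]) -> urllib.request.Request:
    """Batch several track IDs into one comma-separated `ids` query parameter."""
    query = urllib.parse.urlencode({"ids": ",".join(track_ids)})
    return urllib.request.Request(
        f"{AUDIO_FEATURES_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def parse_audio_features(payload: dict) -> dict[str, dict[str, float]]:
    """Map track ID to the selected features; unknown IDs come back as null."""
    return {
        item["id"]: {key: item[key] for key in FEATURE_KEYS}
        for item in payload.get("audio_features", [])
        if item is not None
    }


def fetch_json(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON body (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

In use, `fetch_json(build_token_request(...))` returns a body whose `access_token` field is passed to `build_audio_features_request`, and the resulting payload goes through `parse_audio_features`. Keeping request building and parsing separate from `fetch_json` makes both testable without network access.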
Data management
I built a pipeline to structure and store Spotify data, including popularity history, audio features, and retrieval frequency. The relational schema lets track attributes change over time while keeping a versioned history of their values. Validation runs before each update to prevent duplicate or inconsistent records.
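A minimal sketch of the validate-then-insert step, assuming a SQLite table named `popularity_history` (hypothetical; the project's actual schema and database engine are not described here). The composite primary key allows one snapshot per track per day, so re-running a sync on the same day is harmless, and earlier days stay in the table as history.

```python
import sqlite3
from datetime import date

# Hypothetical table: one popularity snapshot per track per day.
SCHEMA = """
CREATE TABLE IF NOT EXISTS popularity_history (
    track_id    TEXT    NOT NULL,
    captured_on TEXT    NOT NULL,
    popularity  INTEGER NOT NULL CHECK (popularity BETWEEN 0 AND 100),
    PRIMARY KEY (track_id, captured_on)
)
"""


def is_valid(row: dict) -> bool:
    """Reject malformed rows before they reach the database."""
    track_id = row.get("track_id")
    popularity = row.get("popularity")
    return (
        isinstance(track_id, str)
        and track_id != ""
        and isinstance(popularity, int)
        and 0 <= popularity <= 100
    )


def record_snapshots(conn: sqlite3.Connection, rows: list[dict], captured_on: date) -> int:
    """Insert valid rows; a repeat snapshot for the same track and day is ignored.

    Returns the number of rows actually inserted.
    """
    values = [
        (r["track_id"], captured_on.isoformat(), r["popularity"])
        for r in rows
        if is_valid(r)
    ]
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO popularity_history (track_id, captured_on, popularity) "
            "VALUES (?, ?, ?)",
            values,
        )
    return conn.total_changes - before
```

Validating in Python gives clear control over what is rejected, while the `CHECK` constraint and primary key act as a backstop inside the database.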
Automating data sync
To avoid manual updates, I automated data retrieval: Windows Task Scheduler runs PowerShell scripts that trigger Python functions. This keeps the dataset fresh, so trend predictions reflect recent data.
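The Python end of that chain can be a small command-line entry point that the PowerShell script calls with a job name. This is a sketch: the job names and functions are hypothetical placeholders. The main design point is returning a non-zero exit code on failure so the scheduler can surface it.

```python
import argparse
import logging
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("soundsoar.sync")


def sync_popularity() -> None:
    # Placeholder: fetch current popularity from Spotify and store a snapshot.
    log.info("refreshing popularity snapshots")


def sync_audio_features() -> None:
    # Placeholder: fetch audio features for newly seen tracks.
    log.info("refreshing audio features")


# Hypothetical job names; the PowerShell script passes one as an argument.
JOBS: dict[str, Callable[[], None]] = {
    "popularity": sync_popularity,
    "audio-features": sync_audio_features,
}


def main(argv: Optional[list[str]] = None) -> int:
    """Run one named job and return a process exit code."""
    parser = argparse.ArgumentParser(description="Run one SoundSoar sync job.")
    parser.add_argument("job", choices=sorted(JOBS))
    args = parser.parse_args(argv)
    try:
        JOBS[args.job]()
    except Exception:
        log.exception("sync job %r failed", args.job)
        return 1  # non-zero exit code signals failure to the caller
    log.info("sync job %r finished", args.job)
    return 0
```

A real script would end with the usual `if __name__ == "__main__": sys.exit(main())` guard. The PowerShell wrapper could then run something like `python sync.py popularity` (file name hypothetical) followed by `exit $LASTEXITCODE`, so the exit code reaches Task Scheduler.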
Feature importance analysis
I used Random Forest and HistGradientBoosting to rank features by their contribution to predictions. The analysis confirmed which Spotify attributes influence outcomes most and let me focus the model on the most relevant inputs.
Trend prediction model
The model analyzes and forecasts song popularity over time. Using the selected feature set (valence, tempo, danceability, historical popularity metrics, and more), I trained models and tuned hyperparameters to improve accuracy. The output classifies songs into trending categories so users can discover emerging hits.
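A sketch of the labeling and tuning steps under stated assumptions. Trend categories come from a hypothetical rule on net popularity change (`label_trend`, 5-point threshold), features are a hypothetical pair of historical popularity metrics, and tuning uses scikit-learn's `GridSearchCV` over a small Random Forest grid. The source does not specify the actual labeling rule, model, or grid.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def label_trend(history, threshold=5.0):
    """Hypothetical rule: net popularity change across a window of snapshots."""
    delta = history[-1] - history[0]
    if delta > threshold:
        return "trending_up"
    if delta < -threshold:
        return "trending_down"
    return "stable"


def make_features(histories):
    """Hypothetical features: latest popularity and net change over the past window."""
    h = np.asarray(histories, dtype=float)
    return np.column_stack([h[:, -1], h[:, -1] - h[:, 0]])


def tune_trend_classifier(X, y):
    """Grid-search a small hyperparameter space with 3-fold cross-validation."""
    param_grid = {"n_estimators": [50, 100], "max_depth": [None, 4]}
    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid,
        cv=3,
        scoring="f1_macro",  # weigh all three categories equally
    )
    return search.fit(X, y)
```

For forecasting, the features should describe the past window and the label should come from the following window, so the model learns to anticipate a trend rather than restate one. Macro-averaged F1 keeps the rarer up and down classes from being drowned out by "stable".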
For persistence, I serialize trained models with joblib and update them on a schedule as new Spotify data arrives, which keeps predictions aligned with changing trends. The cover image also includes a snapshot of active and historical models, showing their performance over time.
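A minimal sketch of timestamped persistence with joblib, which keeps older artifacts on disk for the kind of active-versus-historical comparison described above. The directory layout and the `trend_model_<timestamp>.joblib` naming scheme are hypothetical.

```python
from datetime import datetime
from pathlib import Path

import joblib

PREFIX = "trend_model_"  # hypothetical naming scheme


def save_model(model, model_dir, trained_at: datetime) -> Path:
    """Write a new timestamped artifact; earlier models stay on disk for comparison."""
    directory = Path(model_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{PREFIX}{trained_at:%Y%m%dT%H%M%S}.joblib"
    joblib.dump(model, str(path))
    return path


def load_latest_model(model_dir):
    """Zero-padded timestamps sort lexicographically, so the last name is the newest."""
    paths = sorted(Path(model_dir).glob(f"{PREFIX}*.joblib"))
    if not paths:
        raise FileNotFoundError(f"no saved trend models in {model_dir}")
    return joblib.load(str(paths[-1]))
```

Because nothing is overwritten, rolling back to an earlier model only means loading an older file.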
Custom playlist integration
I defined a custom playlist type for charts and added the SoundSoar Suggestions playlist with the top 25 trending-up tracks, driven by the prediction pipeline.
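A sketch of how the top 25 could be selected, assuming the prediction pipeline emits one label and a confidence score per track. The `Prediction` record and ranking by confidence are assumptions, not the project's documented logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    track_id: str
    label: str         # e.g. "trending_up", "stable", "trending_down"
    confidence: float  # model probability for the predicted label (assumed)


def build_suggestions(predictions, size=25):
    """Return IDs of the most confident trending-up tracks, one entry per track."""
    best = {}
    for p in predictions:
        if p.label != "trending_up":
            continue
        # Keep only the highest-confidence prediction for each track.
        if p.track_id not in best or p.confidence > best[p.track_id].confidence:
            best[p.track_id] = p
    # Sort by confidence (descending), breaking ties by track ID for stable output.
    ranked = sorted(best.values(), key=lambda p: (-p.confidence, p.track_id))
    return [p.track_id for p in ranked[:size]]
```

Deduplicating by track ID matters if the pipeline scores the same track more than once per run, for example when it appears in several source playlists.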
Retrospective
What went right
- Seamless integration path for Spotify auth using django-allauth.
- Automated data sync reduced manual overhead and kept the dataset fresh.
- Refined popularity metrics and feature importance analysis improved model focus and accuracy.
- Interactive plots (via Plotly) made exploration and insight discovery easier.
Challenges
- The initial plan to include sentiment analysis slowed progress; narrowing the scope to streams helped.
- AWS Lightsail configuration and environment setup required extra troubleshooting time.
- Early visualization attempts with Bokeh ran into install and config friction, prompting the switch to Plotly.
What I learned and what is next
- Protect scope and land core features first; add sentiment analysis later if time allows.
- Start data collection earlier to de-risk model training and evaluation.
- Continue sharpening pipelines, hyperparameter tuning, and validation.
- Level up comfort with Unix environments to prepare for deployment and scale.