Anna’s Archive Scrapes 300TB of Music and Metadata from Spotify in Massive 'Preservation' Effort
Anna's Archive has scraped 300TB of data from Spotify, including 86 million audio files and extensive metadata. While the group claims it is a preservation effort, Spotify confirms illicit DRM circumvention.

A "pirate activist" group known as Anna's Archive has claimed responsibility for scraping a massive portion of Spotify’s music library, amounting to approximately 300 terabytes of data. In a blog post published on December 20, 2025, the group announced it had acquired over 86 million audio files and metadata for nearly 256 million tracks, framing the act as a necessary move to preserve human culture.
The Scale of the Breach
The sheer volume of data involved in this scrape is unprecedented for a streaming service. According to a report by PCMag, the group claims its dataset covers the tracks behind 99.6% of all Spotify listens. The metadata has already been released via BitTorrent, and the group plans to release the audio files in stages.
Here is a breakdown of the scraped data:
- Total Size: Approximately 300TB.
- Metadata: 256 million rows (including artist names, albums, and track details).
- Audio Files: 86 million songs.
- Audio Quality: Popular tracks were ripped at 160 kbps (Ogg Vorbis), while less popular "long tail" tracks were re-encoded at lower bitrates to save space.
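As a rough sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. This is a back-of-envelope estimate, not anything published by the group or by Spotify: it assumes decimal terabytes, treats the full 300TB as audio, and applies the 160 kbps rate to every file even though long-tail tracks use lower bitrates. The function names are illustrative.

```python
# Back-of-envelope check of the reported figures.
# Assumptions (not from the source): decimal terabytes, the whole 300 TB
# is audio, and every file is encoded at 160 kbps.

TOTAL_BYTES = 300 * 10**12       # approximately 300 TB
AUDIO_FILES = 86_000_000         # 86 million audio files
METADATA_ROWS = 256_000_000      # 256 million metadata rows
BITRATE_BPS = 160_000            # 160 kbps, in bits per second


def average_file_size_mb(total_bytes: int, file_count: int) -> float:
    """Mean size of one file in megabytes (10**6 bytes)."""
    return total_bytes / file_count / 10**6


def seconds_at_bitrate(size_bytes: float, bitrate_bps: int) -> float:
    """Playback length of a file of the given size at a constant bitrate."""
    return size_bytes * 8 / bitrate_bps


def audio_coverage(file_count: int, row_count: int) -> float:
    """Fraction of metadata rows that have a matching audio file."""
    return file_count / row_count


if __name__ == "__main__":
    size_mb = average_file_size_mb(TOTAL_BYTES, AUDIO_FILES)
    length_s = seconds_at_bitrate(TOTAL_BYTES / AUDIO_FILES, BITRATE_BPS)
    coverage = audio_coverage(AUDIO_FILES, METADATA_ROWS)
    print(f"Average file size: {size_mb:.2f} MB")
    print(f"Implied length at 160 kbps: {length_s / 60:.1f} minutes")
    print(f"Tracks with audio: {coverage:.1%}")
```

Under these assumptions the average file comes to roughly 3.5 MB, or just under three minutes of audio at 160 kbps, and audio exists for about a third of the tracks listed in the metadata. That is consistent with a library of ordinary song-length files, though the real averages will differ because of the lower-bitrate long-tail tracks.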
Spotify’s Response
Spotify has confirmed the incident and says it is actively investigating the breach. A spokesperson for the company told The Guardian that a third party had "scraped public metadata and used illicit tactics to circumvent DRM" to access the audio files.
The company said it has identified and disabled the accounts responsible for the scraping and has implemented new safeguards to prevent similar attacks in the future.
Anna’s Archive, which is widely known for hosting shadow libraries of academic papers and books, argues that this project is not about facilitating illegal downloads but about archival preservation. The group stated that they want to protect humanity’s musical heritage from potential threats like "natural disasters, wars, and budget cuts."
Security analysts at Bitdefender note that while the group frames this as a cultural project, legally it constitutes a large-scale violation of Spotify’s terms of service and of copyright law.
Beyond the immediate copyright infringement, experts are concerned about how this data might be used. With the rise of generative AI, a clean, organized dataset of this magnitude could be both highly valuable and highly controversial as training data for AI models. Reports from CSO Online suggest that while the metadata is public, the distribution of the audio files could reignite debates over how tech companies source data for machine learning.



