
Helping Beatpulse find the beat in their library


Building a vector database powered music library

We connected with the BeatpulseLabs team in early 2024 through Techstars and were immediately intrigued by the chance for Context Data to help scale Beatpulse’s mission: providing royalty-free audio and music to studios and companies working in audio machine learning.

Background

Over the past year, the use of copyrighted music and audio to train generative AI models has faced growing backlash and regulation. The European Union’s AI Act, for example, requires providers of general-purpose AI models to publish a sufficiently detailed summary of the content used for training, which covers copyrighted works such as music.

Given its partnerships with numerous independent artists and producers worldwide, Beatpulse was well positioned to fill this gap by offering these companies an aggregated collection of licensed, royalty-free audio.

Identifying Challenges

As we dug deeper into Beatpulse’s internal operations, it became clear that one of their major challenges was searching their large audio library efficiently enough to meet client needs. Unlike consumer streaming services such as Spotify or Tidal, Beatpulse’s clients, who are primarily enterprise users, search for audio by specific sound attributes.

Clients often seek tracks based on criteria such as instrumentation (e.g., piano or drums), genre (e.g., hip-hop, jazz, or classical), or tempo (e.g., 100+ BPM). Despite Beatpulse’s diligent efforts to generate and categorize these characteristics, the process of searching and delivering results remained cumbersome and highly manual.
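
To make these criteria concrete, here is a minimal sketch that models a track as a small record and filters the library by exact attribute matches. It is an illustration under assumptions: the `Track` fields and the `matches` helper are hypothetical and do not describe Beatpulse’s actual schema or tooling.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Track:
    # Hypothetical fields for illustration; not Beatpulse's real schema.
    track_id: str
    instruments: list[str] = field(default_factory=list)
    genre: str = ""
    bpm: int = 0


def matches(track: Track, instrument: str | None = None,
            genre: str | None = None, min_bpm: int | None = None) -> bool:
    """Return True only if the track satisfies every criterion that is set."""
    if instrument is not None and instrument not in track.instruments:
        return False
    if genre is not None and track.genre != genre:
        return False
    if min_bpm is not None and track.bpm < min_bpm:
        return False
    return True


library = [
    Track("t1", ["piano", "drums"], "jazz", 120),
    Track("t2", ["drums", "bass"], "hip-hop", 92),
    Track("t3", ["piano"], "classical", 70),
]

# "Piano tracks at 100+ BPM"
hits = [t.track_id for t in library if matches(t, instrument="piano", min_bpm=100)]
print(hits)  # ['t1']

# The same record serialized as JSON.
print(json.dumps(asdict(library[0])))
```

Exact filtering like this only works when a client’s wording matches the catalog’s vocabulary ("hip hop" will not match "hip-hop"), which is one reason a similarity-based search suits this kind of library.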

Our Approach and Solution

Beatpulse asked us to assess their platform and library. Based on that review, we proposed and built a solution to automate search: a vector search platform tailored to their music library that builds on their existing categorization framework.

Data Preparation
  • 01 - Extracted necessary characteristics from each track in their Amazon S3-hosted library and built a JSON representation of the library.
  • 02 - Converted the JSON data into vector embeddings using OpenAI’s text-embedding-ada-002 model.
  • 03 - Indexed the vector embeddings in a Pinecone database.
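
The three data-preparation steps can be sketched end to end without network access. This is a minimal, offline sketch under stated assumptions: `embed` is a toy hashed bag-of-words function standing in for OpenAI’s text-embedding-ada-002 (which returns 1536-dimensional vectors), `InMemoryIndex` is a hypothetical stand-in for a Pinecone index, and the track fields are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
import math
import re

DIM = 64  # toy dimension; text-embedding-ada-002 vectors have 1536 dimensions


def track_to_text(track: dict) -> str:
    """Step 01 output -> embedding input: the track's JSON, minus its id."""
    attrs = {k: v for k, v in track.items() if k != "id"}
    return json.dumps(attrs, sort_keys=True)


def embed(text: str) -> list[float]:
    """Step 02 stand-in: hashed bag of words, L2-normalized.

    Hypothetical replacement for a text-embedding-ada-002 call.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9-]+", text.lower()):
        # md5 gives the same bucket on every run (unlike Python's built-in hash).
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryIndex:
    """Step 03 stand-in: a toy, hypothetical substitute for a Pinecone index."""

    def __init__(self) -> None:
        self.records: dict[str, tuple[list[float], dict]] = {}

    def upsert(self, vectors: list[dict]) -> int:
        # Insert or overwrite each record by id, mirroring upsert semantics.
        for v in vectors:
            self.records[v["id"]] = (v["values"], v.get("metadata", {}))
        return len(vectors)


library = [
    {"id": "t1", "instruments": ["piano", "drums"], "genre": "jazz", "bpm": 120},
    {"id": "t2", "instruments": ["drums", "bass"], "genre": "hip-hop", "bpm": 92},
    {"id": "t3", "instruments": ["piano"], "genre": "classical", "bpm": 70},
]

index = InMemoryIndex()
index.upsert([
    {"id": t["id"], "values": embed(track_to_text(t)), "metadata": t}
    for t in library
])
print(len(index.records))  # 3
```

In a real deployment, `embed` would be replaced by the OpenAI embeddings endpoint (in the official Python SDK, `client.embeddings.create(model="text-embedding-ada-002", input=texts)`), and `InMemoryIndex.upsert` by the Pinecone client’s `index.upsert(vectors=[...])`, which accepts records with `id`, `values`, and `metadata` keys. A hashed bag of words only captures shared words, not meaning, so it is useful for exercising the pipeline but is no substitute for a learned embedding.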

Search Interface Development

With the data stored in the Pinecone index, we created a web-based search interface that:

  • 01 - Accepted free-text input and checkbox selections from users.
  • 02 - Combined these inputs into a single query and converted it into a vector embedding using OpenAI’s text-embedding-ada-002 model.
  • 03 - Queried the Pinecone index with that embedding to return the most similar tracks quickly and accurately.
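
The search steps can likewise be sketched. This hedged sketch covers the parts that need no network: `build_query` (a hypothetical helper) joins the free-text input and the ticked checkbox labels into one string, which would then be embedded with the same model used for the library, and `top_k` ranks stored vectors by cosine similarity. A Pinecone index configured with the cosine metric scores matches the same way, although Pinecone uses approximate nearest-neighbour search rather than the brute-force scan shown here. The 3-D vectors are toy values, not real embeddings.

```python
from __future__ import annotations

import math


def build_query(text: str, checked: list[str]) -> str:
    """Join the free-text box and ticked checkbox labels into one query string."""
    parts = [text.strip()] + [label.strip() for label in checked]
    return ", ".join(p for p in parts if p)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float], records: dict[str, list[float]],
          k: int = 3) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours: score every record, best first."""
    scored = [(rid, cosine(query_vec, vec)) for rid, vec in records.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


print(build_query("upbeat intro music ", ["piano", "jazz"]))
# upbeat intro music, piano, jazz

# Toy 3-D vectors standing in for real embeddings of tracks and the query.
records = {"t1": [1.0, 0.0, 0.0], "t2": [0.6, 0.8, 0.0], "t3": [0.0, 0.0, 1.0]}
for track_id, score in top_k([1.0, 0.2, 0.0], records, k=2):
    print(track_id, round(score, 3))  # t1 first, then t2
```

With the official clients, the retrieval step would be a call such as `index.query(vector=query_embedding, top_k=10, include_metadata=True)`, which returns the highest-scoring matches together with their stored metadata, so track details can be shown without a second lookup.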

Impact and Results

We are thrilled with our collaboration with Beatpulse. Their team has been very satisfied with the solution we developed, which has significantly sped up their search process and strengthened the service they offer clients.