
Exploring Non-Autoregressive Transformers for Efficient Adaptive Music Composition

Authors

Αλέξης Σπηλιωτόπουλος, Σπύρος Πολυχρονόπουλος, Ιωάννης Παναγάκης

Abstract

The adoption of artificial intelligence (AI) in music composition is revolutionizing the process of music creation, providing groundbreaking tools and concepts for composition. Since the mid-20th century, and increasingly in recent years thanks to advances in deep learning and the availability of large amounts of data, AI has enabled the automatic composition of music, assisting in the creation of new musical ideas and improving processes such as arranging and mixing. However, achieving low-latency music generation remains a major challenge even with today's computational power. Low latency is a fundamental requirement for applications involving real-time interaction, such as live performances and dynamic soundtracks for video games and virtual reality. Another significant and active challenge is the coherence and expressiveness of AI-generated music, as these qualities are essential for maintaining the emotional impact and artistic integrity that audiences expect from high-quality musical experiences. Overcoming these barriers will allow AI to propose compositional ideas that transform how humans perceive and interact with music.

Automated music composition has evolved significantly, from early rule-based systems (e.g., the Illiac Suite, 1957) that used predefined algorithms, to probabilistic models such as Markov chains, which offer far greater diversity. The emergence of neural networks, particularly Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, proved to be a game-changer, as it opened the door to longer-term dependencies and more coherent pieces. More recently, deep learning and generative models have further transformed the field. Generative Adversarial Networks (GANs) introduced a new approach, using two competing neural networks to produce realistic outputs.
The most groundbreaking developments have come from Transformer models. Originally designed for natural language processing (NLP), Transformers excel at capturing long-range dependencies and have also shown great potential in music generation. These complex models can be trained on massive musical datasets to generate intricate compositions. Recent research efforts target music with higher quality, longer-range coherence, and expressive content, and the field is evolving rapidly, driven by AI advancements, growing computational power, and the increasing availability of high-quality data.

Despite these advancements, several technical challenges remain. Among them is the need for vast amounts of high-quality data to train models effectively, especially Transformers, which require extensive datasets to learn intricate musical patterns and structures. Furthermore, the computational resources and time needed to train these models are substantial, often limiting accessibility. Ensuring that AI-generated music is coherent and expressive remains a complex task, as models must capture the nuanced elements of human composition. Achieving real-time, adaptive generation poses another significant hurdle, requiring systems to respond instantaneously to user inputs or environmental changes without sacrificing quality. These challenges are particularly relevant in live performances (where, for example, musicians play along with real-time AI-generated accompaniment) and in interactive media such as video games and virtual reality (where music must adapt dynamically to the user's actions).

This paper explores the application of a non-autoregressive Transformer encoder-based model for adaptive musical accompaniment. By utilizing a pretrained model, the study aims to dynamically generate coherent musical continuations.
Rather than generating music autoregressively, various sampling techniques are employed to ensure that the generated music remains stylistically appropriate and expressive. Achieving low-latency generation is a key objective, essential for applications requiring real-time interaction. This work focuses on the difficulties of guaranteeing the model's coherence, expressiveness, and adaptability, and aims to contribute to the development of more responsive and adaptive AI-driven music generation systems, paving the way for innovative applications in the field.
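To illustrate the latency argument: in non-autoregressive decoding, a model emits logits for every output position in one forward pass, and each position is then sampled independently, instead of sampling one token at a time and feeding it back. The abstract does not specify which sampling techniques the paper uses, so the sketch below assumes a common choice, top-k sampling with temperature; the function names and the toy logits are hypothetical, purely for illustration.

```python
import math
import random

def top_k_temperature_sample(logits, k=5, temperature=1.0):
    """Sample one token index from a list of logits.

    Keeps only the k highest-scoring tokens, rescales by temperature,
    and draws from the resulting softmax distribution. (Assumed
    technique -- the paper's actual sampling scheme is not specified.)
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)                          # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    r = random.random() * sum(weights)
    acc = 0.0
    for idx, w in zip(top, weights):
        acc += w
        if r <= acc:
            return idx
    return top[-1]

def decode_parallel(logits_per_position, k=5, temperature=1.0):
    """Non-autoregressive decoding: every position is sampled
    independently from its own logits in a single pass, so latency
    does not grow with one model call per generated token."""
    return [top_k_temperature_sample(l, k, temperature)
            for l in logits_per_position]

# Toy example: 4 output positions over a vocabulary of 8 "note tokens".
# In the real system these logits would come from the Transformer encoder.
random.seed(0)
fake_logits = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
tokens = decode_parallel(fake_logits, k=3, temperature=0.8)
print(tokens)  # one sampled token per position, produced in parallel
```

The key contrast with autoregressive generation is that `decode_parallel` needs no loop over previous outputs: all positions are filled at once, which is what makes the low-latency, real-time use cases described above plausible, at the cost of weaker inter-token dependencies that the sampling strategy must help compensate for.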