ABOUT BEAM

 

Introduction
Since the 1960s, various Brain-Computer (Musical) Interfaces (BCI or BCMI) have been developed to enrich musical performance, in recent years in remarkable ways and with novel architectures. Le Groux et al. [LMV10] introduced the Multimodal Brain Orchestra for the generation and modulation of musical material using the EEG signals of its three musicians and conductor. Mullen et al. [MWJ11] created a system for real-time, Internet-enabled manipulation of robotic instruments involving a physical audience, a virtual (Internet) audience, a composer/conductor and a brainist, a solitary performer who manipulated the instruments through his EEG signals. Hamano et al. [HRT13] developed a system for adding expressions that emulate the expressivity of traditional acoustic instruments. Tokunaga and Lyons [TL13] created the Enactive Mandala, which generates ambient music and animated visual music according to EEG signals. Levicán [LAB17] used facial expressions to control different sound synthesis algorithms. Macionis et al. [MK19] incorporated an individual’s overall state of relaxation into a mixed-media installation, motivating the individual to relax further. Ramchurn et al. [RMM19] proposed a BCI-based Live Score Performance System that generates a unique film from the brainwave data of a member of the audience.

Most of these systems either affect the musical performance in an abstract, non-explicit way or focus on the well-studied features of concentration, stress and relaxation. The aim of this project was to study specific, unexplored emotional states of musicians as they perform, so that we can detect them in real time during a live performance and exploit them to instantly reshape their sound, without the performer necessarily being aware of it.

Juslin and Laukka [JL03] conclude that emotions can be communicated on different instruments, which place relatively different acoustic cues at the performer’s disposal, largely reflecting the musician’s sound-production mechanisms. However, as emotional state is highly subjective, we assume that the self-reported states of the musicians who participated in the training procedure were valid.

System Architecture
As tree boosting has been shown to give state-of-the-art results on various classification tasks, we used the XGBoost system [CG16]. Boosted trees are a popular machine learning method whose advantages are effectiveness, scalability and partial interpretability of results. The model combines an ensemble of weak classifiers to produce the final result. We used the xgboost Python package and trained classifiers with different combinations of features and sequence lengths.
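
As a minimal sketch of this step, the snippet below fits such a boosted-tree classifier with the xgboost Python package; the array shapes, labels and hyper-parameter values are illustrative placeholders rather than the exact configuration used in this work.

# Minimal sketch: fit a boosted-tree classifier on windowed EEG features.
# X is a (samples, features) array of band-power values; y holds binary
# labels (0 = sadness, 1 = happiness). All values below are placeholders.
import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(480, 8)
y = np.random.randint(0, 2, size=480)

clf = XGBClassifier(
    n_estimators=200,          # number of weak tree learners (illustrative)
    max_depth=4,               # depth of each tree
    learning_rate=0.1,
    objective="binary:logistic",
)
clf.fit(X, y)
print(clf.predict(X[:5]))      # predicted emotional state for five windows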

Our system consists of an EEG biosensor, which detects the brainwave signals; software that analyses the biometric data and detects the musician’s happiness or sadness in real time; and a multi-effect processor that reshapes the sound.
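
To make the data flow between these components concrete, the skeleton below sketches one possible real-time loop; read_eeg_sample, classify_emotion and send_to_effects are hypothetical stand-ins for the headset SDK, the trained classifier and the output to the effects unit.

# Skeleton of the real-time loop: read brainwave features from the headset,
# classify the emotional state, and forward a control message to the
# multi-effect processor. All three callables are hypothetical stand-ins.
import time

def run_loop(read_eeg_sample, classify_emotion, send_to_effects, period=0.2):
    while True:
        sample = read_eeg_sample()        # band-power values from the biosensor
        state = classify_emotion(sample)  # "happiness" or "sadness"
        send_to_effects(state)            # e.g. a MIDI control-change message
        time.sleep(period)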

Training
We conducted an experiment to gather data on a musician’s brain activity in two different emotional states, happiness and sadness. To achieve this, we acquired the EEG signals of four musicians as they listened to music and as they performed a song, in both cases for these two emotional states, which were self-reported by the musicians.
This choice is crucial and was made because, for categorizing brain signals, we need the actual emotion, which is what the user reports and not what the track is supposed to convey. These two values are not always identical, since the feeling a listener or musician experiences when listening to or playing a piece is purely subjective [SSH13].

In particular, for each musician we kept track of their alpha, beta, theta, delta and gamma values with a non-invasive wearable biosensor (headset). The songs were selected by the musicians for each of the aforementioned emotional states and belonged to different genres of music. For each case (listening, performing) and each state (happiness, sadness) we recorded their EEG values for five minutes, with a sampling rate of 10 s, i.e., 80 minutes and 480 samples in total. For the performing case we also recorded the performance in WAV format.

In total this process took about 20 minutes per participant. Our system records the measurements every 0.2 s. We therefore stored four matrices (one for each segment of the experiment) $T \in \mathbb{R}^{c \times t}$ for each subject, with $c$ being the 8 different signals recorded and $t$ the number of measurements made for each signal (5 minutes × 60 s / 0.2 s resolution = 1500 samples).
The system uses 3 of the 4 subjects for training and the fourth for testing. In total, the dataset had 16260 training and 5520 testing samples.
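
The sketch below illustrates one way the per-subject matrices could be sliced into fixed-length windows and split leave-one-subject-out; the window length of 10 timesteps and the random placeholder data are assumptions for illustration, not the exact preprocessing used here.

# Sketch: slice each subject's (channels, timesteps) matrices into fixed-length
# windows and hold one subject out for testing. Window length and data are
# placeholders for illustration.
import numpy as np

def windows(T, length=10):
    # Split a (c, t) matrix into non-overlapping windows of `length` timesteps,
    # each flattened into a single feature vector.
    c, t = T.shape
    n = t // length
    return T[:, :n * length].reshape(c, n, length).transpose(1, 0, 2).reshape(n, -1)

# Four segments per subject (listening/performing x happy/sad); label 1 = happiness.
subjects = {s: [(np.random.rand(8, 1500), lbl) for lbl in (1, 0, 1, 0)]
            for s in range(4)}

test_subject = 3
X_train, y_train, X_test, y_test = [], [], [], []
for s, segments in subjects.items():
    for T, label in segments:
        W = windows(T)
        target_X, target_y = (X_test, y_test) if s == test_subject else (X_train, y_train)
        target_X.append(W)
        target_y.extend([label] * len(W))
X_train, X_test = np.vstack(X_train), np.vstack(X_test)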

For identifying the emotional state of each user, we used XGBoost. To find the optimal hyper-parameter values of the model, we ran a grid search, testing the model’s performance for 9 different hyper-parameters. From this run we achieved an accuracy of 98.3\% with 7-fold cross-validation, while on the test set we achieved 89\% accuracy.
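
One way to run such a search is with scikit-learn’s GridSearchCV wrapped around the XGBoost estimator, as sketched below; the parameter grid and data are placeholders, not the nine hyper-parameters or the dataset actually tuned in this work.

# Sketch: hyper-parameter search for the XGBoost model with 7-fold cross
# validation. Grid values and data below are illustrative placeholders.
import numpy as np
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X = np.random.rand(700, 80)            # placeholder windowed EEG features
y = np.random.randint(0, 2, size=700)  # placeholder happy/sad labels

param_grid = {
    "max_depth": [3, 4, 6],
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    XGBClassifier(objective="binary:logistic"),
    param_grid,
    cv=7,                 # 7-fold cross validation, as reported above
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)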

Example Configuration
A biometric sensor streamed the EEG signal data of a bass player as he improvised. Our emotion-detection algorithm sent MIDI messages to his multi-effect processor, so that detected happiness increased the chorus level and detected sadness increased the reverb level.
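
A hedged sketch of such a mapping is given below, using the mido library and the conventional General MIDI controller numbers for chorus (CC 93) and reverb (CC 91) depth; the actual controller assignments of the processor used in the performance are not documented here, so treat them as placeholders.

# Sketch: map the detected emotional state to MIDI control-change messages.
# CC 93 (chorus depth) and CC 91 (reverb depth) follow the General MIDI
# convention; a specific multi-effect processor may expect other controllers.
import mido

port = mido.open_output()  # default MIDI output port (assumed available)

def apply_emotion(state, intensity=96):
    # Raise the chorus level on happiness and the reverb level on sadness.
    if state == "happiness":
        port.send(mido.Message("control_change", control=93, value=intensity))
    elif state == "sadness":
        port.send(mido.Message("control_change", control=91, value=intensity))

apply_emotion("happiness")
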
Conclusions
We have shown that it is possible to implement a multimedia performance during which various elements are controlled by specific emotional states of the performers, with optional integration of the spectators, in order to observe and interact with the unexpected changes that occur in the performance.

Future Work
A larger-scale experiment could be conducted with more musicians, sensors, and emotional states under consideration. EEG signals could be examined separately for studying versus performing, from the musician’s point of view, as well as for solitary versus collective listening to a live performance, from the audience’s point of view. In addition, the correlation of brainwaves among audience members, and between musicians and audience, could be explored. Another research direction could be retrieving a musician’s emotional state from an audio sample. In any case, by acquiring a large enough dataset we could explore the use of deep learning for emotion recognition through musical expression or listening.

Ethical Standards
Individuals voluntarily participated in the data collection procedure, during which we recorded their EEG signals while they played and listened to songs of their own choosing, with full knowledge of the relevant risks and benefits.