Abstract
Attributes or features from different sensory modalities are often systematically associated with each other. This phenomenon is referred to as cross-modal correspondence. Unlike synaesthesia, where associations are concurrent or immediately consecutive, cross-modal correspondences can occur between attributes that occupy similar positions on sensory dimensions spanning distinct degrees of intensity (Spence, 2011).
This study aims to organise existing cross-modal associations between the auditory and the visual modalities in the context of the research project Soundsketcher. Soundsketcher seeks to create a prototype application for music visualisation in the form of graphic scores. One of the fundamental goals is to base mappings between sonic and visual structures on associations derived from existing knowledge of audio-visual correspondences.
The most extensively studied relationships to date are those between pitch or loudness and their visual correspondences. One of the most prominent associations identified for pitch is elevation, where ‘high’ corresponds to high pitch and ‘low’ to low pitch (Athanasopoulos et al., 2016; Baret, 2005; Walker, 1987), a mapping also reflected in many languages around the world (Stumpf, 1883; Evans & Treisman, 2010). The horizontal dimension, apart from indicating the passage of time in a significant number of free-drawn systems of 2D music representation (Tan & Kelly, 2004; Athanasopoulos & Moran, 2013), has also been mapped onto pitch, with ‘left’ corresponding to lower pitches and ‘right’ to higher pitches (Küssner et al., 2014; Lidji et al., 2007). Size is another visual attribute that has been related to pitch: large objects have been linked to low pitches and small objects to high pitches (Speed et al., 2021; Evans & Treisman, 2010). Such correspondences can be attributed to statistical regularities in the environment, where organisms that are smaller in size and live at higher elevations are more likely to produce higher pitches (Spence, 2022). Western music notation, or the mere layout of a piano keyboard, may also account for such correspondences. Brightness is a further visual parameter that has been associated with pitch, with high pitches matched to bright objects and low pitches to dark ones (Ward et al., 2006; Marks, 1974). Colour and the notes of the chromatic scale deserve a separate mention: the correspondences identified so far have been inconsistent, possibly due to the lack of a clear linear organisation (Spence & Di Stefano, 2022).
Several of the above-mentioned visual attributes have also been linked to loudness. For instance, large objects are typically matched to louder sounds and small objects to quieter ones (Eitan, 2013), which again may be grounded in naturally occurring statistical correlations (Spence, 2011) or in metaphorical associations (Walker, 1987; Spence, 2022). Similarly, loudness and brightness have been significantly correlated, with loud sounds matched to bright objects and quiet sounds to dark objects (e.g., Marks, 1987). Thickness has also been positively correlated with loudness (Küssner et al., 2014), as has spatial location, with louder sounds related to higher elevation (Kohn & Eitan, 2012).
Duration and rhythm are also crucial features in the perception of sound. The length of line segments has been found to be proportional to sound duration among Westerners, Japanese, and both literate and nonliterate Papua New Guineans (Athanasopoulos et al., 2016). These populations have also graphically depicted stimuli featuring high attack densities with densely arranged lines, and stimuli featuring lower attack densities with more sparsely arranged ones (Athanasopoulos et al., 2016; Athanasopoulos & Moran, 2013).
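As an illustration only, the better-established correspondences reviewed above (pitch with elevation and brightness, loudness with thickness, duration with line length) could be combined into a single mapping along the following lines. This is a minimal sketch and not the Soundsketcher implementation: the names `Glyph`, `normalise` and `event_to_glyph`, the MIDI pitch range, the decibel range and the linear scaling are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical visual parameters for one sound event."""
    y: float           # vertical position, 0 = bottom, 1 = top
    brightness: float  # 0 = dark, 1 = bright
    thickness: float   # stroke thickness, 0 = thinnest, 1 = thickest
    length: float      # horizontal extent, in seconds on the time axis


def normalise(value, low, high):
    """Clamp value to [low, high] and rescale it linearly to 0..1."""
    clamped = max(low, min(high, value))
    return (clamped - low) / (high - low)


def event_to_glyph(midi_pitch, loudness_db, duration_s,
                   pitch_range=(21.0, 108.0),      # assumed: piano range in MIDI notes
                   loudness_range=(-60.0, 0.0)):   # assumed: dB relative to full scale
    """Map one sound event to visual parameters (linear scaling is an assumption)."""
    pitch = normalise(midi_pitch, *pitch_range)
    loud = normalise(loudness_db, *loudness_range)
    return Glyph(
        y=pitch,            # higher pitch -> higher elevation
        brightness=pitch,   # higher pitch -> brighter
        thickness=loud,     # louder -> thicker line
        length=duration_s,  # longer sound -> longer line segment
    )


glyph = event_to_glyph(midi_pitch=64.5, loudness_db=-30.0, duration_s=1.0)
print(glyph)
```

Size is deliberately left out of the sketch, because the literature links it to both pitch (small with high) and loudness (large with loud), so a single glyph would need an explicit rule for resolving the two.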
Timbre is one of the sound attributes least explored for its potential visual analogues. A handful of studies have examined the relationship between timbre and shape, where instruments producing soft sounds, such as the piano or the cello, were associated with rounded shapes and instruments such as crash cymbals were associated with angular shapes (Adeli et al., 2014). In addition, listeners have linked auditory roughness with jagged and spiky 2- and 3-dimensional shapes (Liew et al., 2017, 2018).
Despite the lack of studies on direct timbral-visual associations, recent works have demonstrated that some of the salient semantic dimensions of timbre, such as brightness, roughness or mass (Zacharakis & Pastiadis, 2016), and their various nuances (Wallmark, 2018; Reymore, 2022; Noble et al., 2020) could be visually represented. However, since timbre is significantly more complex than pitch or loudness, the modelling of these semantic categories through audio features still leaves room for improvement. This is a key area of focus for the Soundsketcher project, which aims to utilise such models of timbre semantics together with existing models for pitch, loudness and rhythm to achieve perceptually informed sound visualisations.
References
Adeli, M., Rouat, J., & Molotchnikoff, S. (2014). Audiovisual correspondence between musical timbre and visual shapes. Frontiers in Human Neuroscience, 8. https://doi.org/10.3389/fnhum.2014.00352
Athanasopoulos, G., & Moran, N. (2013). Cross-cultural representations of musical shape. Empirical Musicology Review, 8(3-4), 185-199. https://doi.org/10.18061/emr.v8i3-4.3940
Athanasopoulos, G., Tan, S. L., & Moran, N. (2016). Influence of literacy on representation of time in musical stimuli: An exploratory cross-cultural study in the UK, Japan, and Papua New Guinea. Psychology of Music, 44(5), 1126-1144. https://doi.org/10.1177/0305735615613427
Eitan, Z. (2013). How pitch and loudness shape musical space and motion: New findings and persisting questions. In S.-L. Tan, A. Cohen, S. Lipscomb & R. Kendall (Eds.), The psychology of music in multimedia (pp. 161-187). Oxford: Oxford University Press.
Evans, K. K., & Treisman, A. (2010). Natural cross-modal mappings between visual and auditory features. Journal of Vision, 10(1), 1-12. https://doi.org/10.1167/10.1.6
Kohn, D., & Eitan, Z. (2012). Seeing sound moving: congruence of pitch and loudness with human movement and visual shape. In 12th International Conference on Music Perception and Cognition/8th Triennial Conference of the European Society for the Cognitive Sciences of Music (p. 541). Thessaloniki: The School of Music Studies, Aristotle University of Thessaloniki.
Küssner, M. B., Tidhar, D., Prior, H. M., & Leech-Wilkinson, D. (2014). Musicians are more consistent: Gestural cross-modal mappings of pitch, loudness and tempo in real-time. Frontiers in Psychology, 5, 99328.
Lidji, P., Kolinsky, R., Lochy, A., & Morais, J. (2007). Spatial associations for musical stimuli: A piano in the head? Journal of Experimental Psychology: Human Perception and Performance, 33(5), 1189-1207. https://doi.org/10.1037/0096-1523.33.5.1189
Marks, L. E. (1974). On associations of light and sound: The mediation of brightness, pitch, and loudness. The American Journal of Psychology, 87(1-2), 173-188. https://doi.org/10.2307/1422011
Marks, L. E. (1987). On cross-modal similarity: Auditory–visual interactions in speeded discrimination. Journal of Experimental Psychology: Human Perception and Performance, 13, 384-394.
Noble, J., Thoret, E., Henry, M., & McAdams, S. (2020). Semantic dimensions of sound mass music: mappings between perceptual and acoustic domains. Music Perception, 38(2), 214-242.
Reymore, L. (2022). Characterizing prototypical musical instrument timbres with Timbre Trait Profiles. Musicae Scientiae, 26(3), 648-674.
Speed, L. J., Croijmans, I., Dolscheid, S., & Majid, A. (2021). Crossmodal associations with olfactory, auditory, and tactile stimuli in children and adults. i-Perception, 12(6). https://doi.org/10.1177/20416695211048513
Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, & Psychophysics, 73, 971-995. https://doi.org/10.3758/s13414-010-0073-7
Spence, C. (2022). Exploring group differences in the crossmodal correspondences. Multisensory Research, 35(6), 495-536. https://doi.org/10.1163/22134808-bja10079
Spence, C., & Di Stefano, N. (2022). Coloured hearing, colour music, colour organs, and the search for perceptually meaningful correspondences between colour and sound. i-Perception, 13(3), 1-42. https://doi.org/10.1177/20416695221092802
Stumpf, K. (1883). Tonpsychologie I [Psychology of the tone]. Leipzig: Hirzel.
Tan, S. L., & Kelly, M. E. (2004). Graphic representations of short musical compositions. Psychology of Music, 32(2), 191-212.
Walker, R. (1987). The effects of culture, environment, age, and musical training on choices of visual metaphors for sound. Perception & Psychophysics, 42, 491-502. https://doi.org/10.3758/BF03209757
Wallmark, Z. (2018). A corpus analysis of timbre semantics in orchestration treatises. Psychology of Music, 47(4), 585-605.
Ward, J., Huckstep, B., & Tsakanikos, E. (2006). Sound-colour synaesthesia: To what extent does it use cross-modal mechanisms common to us all? Cortex, 42(2), 264-280.
Zacharakis, A., & Pastiadis, K. (2016). Revisiting the Luminance-Texture-Mass Model for Musical Timbre Semantics: A Confirmatory Approach and Perspectives of Extension. Journal of the Audio Engineering Society, 64, 636-645.