A Literature Review on Music Parameter Extraction and Visualization

Music visualization research is complex and dynamic. Researchers have applied a range of methods to study the aspects that make up music, including waveform, frequency, pitch, rhythm, tempo, timbre, and chords. In recent years, studies have addressed the extraction of single elements, their visualization, or cross-disciplinary work on these aspects. Current research related to music visualization is concentrated in computer science, psychology, sports science, and related disciplines. Research on the elements of music itself has focused on music visualization, music element extraction, music association, music emotion, and several important aspects of music, such as waveform, frequency, pitch, rhythm, tempo, timbre, and chords. This review finds that, with the continuous development of science and technology, music visualization increasingly intersects with computer science, artificial intelligence, and neural networks. Future research can therefore continue to engage more closely with computer science.


Vol 14, Issue 3, (2024) E-ISSN: 2222-6990
To Link this Article: http://dx.doi.org/10.6007/IJARBSS/v14-i3/21093
DOI: 10.6007/IJARBSS/v14-i3/21093
Published Date: 03 March 2024

Introduction
With the growth of research in music visualization, studies of music, visualization, element extraction, and related parameters are being conducted actively. Among music-related studies, the first strand concerns association, with some researchers conducting music-to-colour and music-to-emotion-to-colour association studies (Zamm et al., 2013). Music-emotion correlation is another ongoing topic in music research: many scholars have studied the effect of music on listeners' emotional responses by having people record their emotional reactions after listening to music (Swaminathan & Schellenberg, 2015). Many researchers have also studied the extraction or visualization of aspects of music through computer technology and datasets, and the study of music also has cross-disciplinary links with the human brain and motor nerves, among others (Roy & Dowd, 2010).
This paper reviews the methods related to music content processing in terms of element extraction and visualization. In research on music element extraction, music information retrieval algorithms have been used to extract the contours of melodies in music (Zhang, 2022). Some researchers have also used neural networks to track and extract rhythms and beats in music (Oord et al., 2018). In music visualization research, scholars have been developing neural network models to continuously improve the precision and accuracy of visualization (Miller et al., 2019; Lima et al., 2022).
Music is an extremely complex object of study in itself, and music visualization can be designed to cover all aspects of music, such as rhythm, waveform, and melody (Yu et al., 2021). The remaining elements have been studied in terms of how to extract them from other characteristics of the music and how accurate the extraction is (Pinto et al., 2021). During this review, many studies were found to have produced graphical or pictorial treatments of music that nonetheless failed to reflect the music itself with sufficient precision (Lima et al., 2022). Many visualization studies have approached music from an associative perspective, and most of these methods examine music as a whole (Fonteles et al., 2014). By contrast, studies of musical elements have mostly focused on extracting those elements, and extraction can be considered the first step of visualization (Zhang, 2022). Therefore, this paper posits that the visualization of each element of music merits further investigation on the basis of extraction. Restoring the characteristics of each element more accurately is a further direction in which visualization research can dig deeper. Extraction methods for visualization have also become more accurate as research has developed and the range of aspects studied has grown (Lima et al., 2022). With the development of artificial intelligence and neural networks, an increasing number of researchers are working on more accurate neural-network-based representations of the visual content of music (Kim et al., 2019).
The challenges in music visualization are primarily reflected in the following aspects. First, most studies focus on a specific dataset with certain limitations (Greer et al., 2019). In addition, most current research in music visualization relies on real-time animation, with less attention to the structural components of music (Lima et al., 2022). The accuracy of extracting and tracking any given musical element can also be further improved, which previous studies have often found difficult (Huang et al., 2021). Visualization studies of melody and rhythm are well-established (Salamon et al., 2012; Reddy & Rompapas, 2021), whereas visualization of other musical elements remains more difficult (Lima et al., 2022; Reddy & Rompapas, 2021).
Music technology is constantly changing, and important new techniques are emerging to visualize music content (Miller et al., 2019). Research on music visualization helps us understand music better and more intuitively (Dalton et al., 2019). Music is also often used by artists as a tool for emotional expression, and research on the extraction and visualization of musical elements can provide better insight into an artist's style (Coorevits et al., 2019). It also makes it easier for people to find music that accurately matches their preferences (Zhang, 2022). Music often affects people's psychological condition and even their physical motor performance; thus, visualizing or extracting music-related elements provides a better understanding of the mechanisms by which music affects people's minds and bodies (Karageorghis et al., 2018). Exploring music element extraction and visualization can also further inform music classification (Eghbal-Zadeh et al., 2015).
This review provides a research basis and theoretical support for future researchers in music visualization. Less visualization research has been carried out on the pitch, spacing, timbre, and harmony of music, and these topics can provide new directions for researchers. Music visualization research will provide more comprehensive theoretical support for a better understanding of music.

Music
In the field of music research, associative studies reached a peak in 2013. Zamm et al. (2013) described color-music synesthesia (CSM), the phenomenon of perceiving color when someone hears a note or sings a song. Palmer et al. (2013) provided experimental evidence for cross-modal matching between music and color mediated by emotional associations. White matter correlates of colour-music synesthesia have also been investigated (Zamm et al., 2013). Music and emotion have been the subject of keen research, and Clarke et al. (2015) extensively studied emotional, linguistic, and social motivation from a musical perspective. They present a large number of academic papers and awards from different disciplines, demonstrating that listening to music via headphones can profoundly change the cultural attitudes of highly demanding perceivers. Swaminathan and Schellenberg (2015) investigated the link between music and emotion, the communication and perception of emotion in music, the emotional consequences of listening to music, and the predictors of music preference. Music audio generally consists of three physical attributes: frequency, time, and amplitude (Koelsch et al., 2013).
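To make the three physical attributes concrete, the following minimal Python sketch synthesizes a pure tone over time and then recovers its frequency and amplitude with a naive discrete Fourier transform. The function names, the 800 Hz sample rate, and the 40 Hz test tone are illustrative assumptions and are not taken from the cited studies.

```python
import math


def synthesize_tone(freq_hz, amplitude, duration_s, sample_rate):
    """Return samples of a sine tone: amplitude and frequency unfolding over time."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def dominant_component(samples, sample_rate):
    """Return (frequency_hz, amplitude) of the strongest DFT bin.

    Naive O(n^2) transform, adequate only for short illustrative signals.
    """
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_bin, best_mag = k, mag
    # For a sinusoid centred on a bin, the peak magnitude equals amplitude * n / 2.
    return best_bin * sample_rate / n, 2 * best_mag / n


# Hypothetical test tone: 40 Hz, amplitude 0.5, a quarter of a second long.
tone = synthesize_tone(40.0, 0.5, 0.25, 800)
freq, amp = dominant_component(tone, 800)
print(f"frequency ~ {freq:.1f} Hz, amplitude ~ {amp:.2f}")
```

Real recordings contain many such components at once, which is why the extraction studies reviewed here rely on far more robust analysis than a single-bin peak search.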
Some researchers have demonstrated the processing of non-local dependencies in music (Koelsch et al., 2013). Li et al. (2018) proposed a speech analysis dataset for facilitating musical performances and showed how to build a complete dataset from a very small package. Melody extraction algorithms are used in computer science to extract pitch information about the main melody from music recordings (Salamon et al., 2014). Levitin et al. (2018) reviewed studies of the temporal and rhythmic characteristics of music that span several methodological techniques, including neurosurgery, psychophysics, and traditional behavioural experiments. They also reviewed studies of animal synchrony and compared the results with advances in human rhythm perception and cognition. Music information retrieval (MIR) has been exploring automated music genre recognition since 2002. Nanni et al. (2016) reported that a strategy based on feature sets is effective: analyzing, evaluating, comparing, and merging acoustic and visual features produces classification accuracies equivalent to or better than existing methods. Herremans et al. (2018) proposed a functional classification that reveals the interconnectedness of automatic music generation systems. Dong et al. (2018) proposed three symbolic multitrack music generation models based on MuseGAN, namely the jamming (interference), composer, and hybrid models, which differ in their assumptions and network structure.
Some researchers reviewed the research results on content-based music information retrieval involving eight denotational-related tasks, including sound/non-sound segmentation, artist identification, style classification, dance identification, sentiment identification, instrument identification, and music fragment annotation (Murthy & Koolagudi, 2019). Flexer et al. (2020) reviewed the latest breakthroughs in music structure analysis methods for audio and discussed the challenges that may arise when applying these techniques in the real world. Nieto et al. (2020) explored the latest algorithms for music structure analysis of audio and their problems in real-world use. Calvo-Zaragoza et al. (2021) examined optical music recognition (OMR), which converts images of music notation into machine-readable form. Lerch and Knees (2021) investigated new approaches in the field of music information retrieval and audio signal processing, mainly through machine learning solutions. Koelsch and Jäncke (2015) proposed a new assessment method to measure heart rate changes associated with music and other factors. Some scholars (Janata et al., 2012) have experimentally explored the relationship between sound and emotion, arguing that sound can be considered a mental structure and model system. Roy and Dowd (2010) assessed the involvement of acoustic systems in terms of musical neurochemistry. McDermott et al. (2016) studied aesthetic responses to the same music in different ethnic and regional cultures and found that exposure to musical harmony may alter tastes, demonstrating that culture dominates aesthetic responses to music. Roy and Dowd (2010) examined, from a sociological perspective, how individuals and groups use music, how the collective production of music is achieved, and how music relates to broader social distinctions, particularly class, race, and gender.

Extraction
Schedl et al. (2014) first introduced established methods for feature extraction and music indexing of music items from audio signals and background data sources, focusing on contemporary MIR achievements (e.g., automatic semantic tagging and user-centric retrieval and recommendation methods). Methods for estimating heterotopic/polyphonic music melodic sequences based on systematic MODGD (direct) and source-based MODGD (source) have also been investigated (Rajan et al., 2017). Chu (2022) analyzed the characteristics of digital music and extracted musical features, rhythm, tune, intensity, and timbre in MIDI format. To extract musical melodies, Bittner et al. (2015) trained a discriminative binary classifier to identify melodic and non-melodic contours, which outperformed the generative model in contour classification accuracy.
Oramas et al. (2016) created a music knowledge base and tested an information extraction pipeline to better interpret music data. To demonstrate that acoustic features of the signal can be used to distinguish musical genres, Shin et al. (2019) used a sound-encoded auditory spike code to extract acoustic features similar to those of the human auditory system. MFCC features have also been extracted and combined with K-NN to classify music as pop or RnB (rhythm and blues) (Ramadhana & Widiarthaa, 2021). In the study of cross-modal learning (e.g., audio and lyrics), Shin et al. (2019) proposed a cross-modal deep associative learning architecture with a two-branch deep neural network to process audio and text (lyrics). Other work used a multi-agent system that assigned extraction, classification, and service duties, thereby enabling the automatic classification of music. On the other hand, Mo & Niu (2017) used orthogonal matching pursuit, the Gabor function, and the Wigner distribution function to analyze music signals; the resulting OMPGW method extracts music sentiments for music applications such as music retrieval or recommendation.
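The K-NN classification step mentioned above can be sketched in a few lines of Python. This is a minimal illustration only: the two-dimensional vectors are invented placeholders standing in for per-track MFCC summaries, MFCC extraction itself is not shown, and the name knn_predict is hypothetical rather than taken from the cited study.

```python
import math
from collections import Counter


def knn_predict(training_set, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours."""
    nearest = sorted(training_set, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented feature vectors (assumption): stand-ins for MFCC-derived features.
training_set = [
    ((1.0, 1.2), "pop"), ((0.9, 1.0), "pop"), ((1.1, 0.8), "pop"),
    ((3.0, 3.2), "rnb"), ((3.1, 2.9), "rnb"), ((2.8, 3.0), "rnb"),
]

print(knn_predict(training_set, (1.0, 1.0)))
print(knn_predict(training_set, (3.0, 3.0)))
```

In practice the choice of k and of the distance measure would be tuned on held-out data; Euclidean distance with k = 3 is used here only to keep the sketch short.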

Visualization
In contrast to music notation (previously the most popular method for visualizing music), Smith and Williams (1997) proposed an alternative method for visualizing music using color and three-dimensional space. Cooper et al. (2006) reviewed attempts to visually represent high-level information about music content and surveyed new MIR-based methods for visualizing music as the then state of the art. Nanayakkara et al. (2007) proposed a novel scheme for visualizing music, rapidly establishing real-time music visualization with a combination of Max/MSP™ and Flash™. Puzoń and Kosugi (2011) found that visuals revealed not only the repetitions that are obvious when reading a score or listening to a piece of music but also more subtle repetition patterns. Donnelly and Sheppard (2013) explored the use of Bayesian networks to identify the timbre of musical instruments. Bayesian networks with conditional dependencies in the time and frequency dimensions achieved 98% accuracy in the instrument classification task and 97% accuracy in the instrument family identification task.
A simplified 3D particle system and a fast translation algorithm have also been implemented to generate real-time animated particles orchestrated by classical music for music visualization (Fonteles et al., 2013). To develop a support tool for music perception and composition, Fonteles et al. (2014) proposed a 3D particle system and a mapping algorithm. Lex et al. (2014) used a novel visualization technique, namely UpSet, which quantitatively analyzes sets, their intersections, and aggregates of intersections, to visualize music.
For children with hearing impairments, Kim et al. (2015) provided a prototype of a music visualization system that records and decodes musical elements into digital data and then visualizes the information. Some scholars have analyzed visualization elements that help in understanding the structural components of a piece when reading the score or even when simply listening to a live performance (Malandrino et al., 2015). Oramas et al. (2016) showed that visualization can improve the recognition of musical forms by examining the theory of isochore structure, visualization, and comments from novices and veterans. Pons & Serra (2017)

In contrast to many audio synthesis efforts, where models that generate waveforms directly perform best, state-of-the-art music source separation computes masks on the amplitude spectrum (Défossez et al., 2021). By modeling the correlation of the spectrogram along the time and frequency dimensions, Chen et al. (2022) proposed a host- and network-based time-frequency attention module and multiscale attention to effectively capture the association of music signals and explore the connection between music spectrograms and music waveforms.
To enable the model to better choose whether to operate in the spectral or the waveform domain, Défossez et al. (2021) investigated how to perform end-to-end hybrid source separation. In recent years, song separation (SVS) algorithms that deal with latent waveform representations from an encoder have improved in quantity and quality (Papantonakis et al., 2022).

Pitch
McLeod and Wyvill (2003) created software that can accurately display, in real time, the pitch of notes being played or sung by a musician. Wu et al. (2003) proposed that an effective multi-pitch tracking algorithm for noisy speech is essential for acoustic signal processing. Povey et al. (2011) proposed a new speech recognition method in which every Hidden Markov Model state is a Gaussian mixture model (GMM) with the same number of Gaussians, each state being described by a 50-dimensional vector together with a global mapping from this vector space to the GMM parameter space.
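As a rough illustration of what a pitch estimator computes, the Python sketch below finds the lag at which a signal best matches a shifted copy of itself and converts that lag to a frequency. It is a plain autocorrelation estimator written for this review, not the algorithm of any study cited above; the function name, the search range of 80 to 1000 Hz, and the 200 Hz test tone are assumptions.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of a monophonic signal.

    Searches lags corresponding to [fmin, fmax] and returns the frequency
    whose lag maximises the (unnormalised) autocorrelation.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Hypothetical input: a quarter second of a 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200.0 * i / sample_rate) for i in range(2000)]
print(f"estimated pitch ~ {estimate_pitch(signal, sample_rate):.1f} Hz")
```

Because the lag is an integer number of samples, the resolution of this sketch degrades at high pitches, and it does not handle polyphony or noise; these are the limitations that the dedicated pitch-tracking methods in this section are designed to address.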
Pitch is one of the main auditory sensations and plays a decisive role in the analysis of music, speech, and auditory scenes (Oxenham, 2012). Zatorre and Baum (2012) claimed that pitch information in speech and in musical melodies is processed differently by two pitch-related systems, one for coarse-grained approximate analysis and one for fine-grained accurate representation, the latter being unique to music. For automatic speech recognition systems, Ghahremani et al. (2014) proposed a method for estimating pitch and probability of voicing (POV). Experiments on multilingual data from the BABEL project showed considerable improvements over systems without pitch features and over systems that obtained pitch and POV information via SAcC or getf0. Kim et al. (2018)

Rhythm
Beginning with psychophysical studies of temporal rhythm and pitch perception, Krumhansl (2000) summarised psychological research on how these aspects are perceived and remembered. Patterns, beats, and rhythms are the temporal components of rhythm. Music rhythmically activates the somatic and premotor systems (Thaut et al., 2014). By studying percussion, Repp (2005) showed that sensorimotor synchronization (SMS), the rhythmic coordination of perception and action, is most evident in music and dance. In studies of rhythmically responsive motor areas, Grahn and Brett (2007) suggested that the basal ganglia and SMAs may mediate rhythmic perception outside of motor production. Some scientists have also used rhythmic features to create a model of emotion based on Thayer's model to investigate the association between emotion and rhythm (Cu et al., 2012). Böck et al. (2016) provided a then state-of-the-art method for jointly extracting beats and downbeats from audio sources. To provide music mixing with rhythmic synchronization, extraction of rhythmic patterns, and rhythm-based music retrieval, Lin et al. (2010) developed methods that automatically select similar songs from a seed song and user-defined rhythmic parameters. Quinton (2017) evaluated the reliability of rhythmic feature extraction to improve the confidence of automatic beat structure analysis and MIR systems.
The study provides two methods to automatically quantify metric modulation in audio recordings. Automatically 'capturing rhythms' and correcting annotated musical beats have long been a challenge for scientists (Driedger et al., 2019). Driedger et al. (2019) provided a novel dataset of beat annotations and described an automatic correction method mathematically, demonstrating its effectiveness. Dalton et al. (2019) explored how rhythm analysis enables DAW rhythms to synchronize with source recordings. Research by Percival and Tzanetakis inspired Renoise's basic beat extraction technique. Böck and Davies (2020) evaluated cutting-edge deep neural network techniques for computational rhythm analysis and improved system performance by 6% by disassembling, examining, and reassembling such techniques.

Chords
Music cognition has rarely been studied with EEG event-related potentials (ERPs); Koelsch et al. (2000) determined that musical context, the task relevance of unexpected chords, the degree of violation, and probability influenced music processing. Pauwels & Peeters (2013) provided a new approach to music structure segmentation based on an integrated estimate of structural segments, keys, and chords in a probabilistic framework. A priori probabilities of key changes and chord transitions define the boundaries of the structural segments. To investigate neonatal responses to Western music, Virtala et al. (2013) used ERPs to test change-related mismatch responses (MMR) and thereby the encoding of Western music chords in the neonatal brain.
Virtala et al. (2014) determined that musicianship improves brain and behavioral recognition of Western music chords. Cambouropoulos et al. (2014) found that an idiom-independent chord type representation captured tone simultaneities in every harmonic context (e.g., tonal, modal, jazz, octatonic, and atonal), leading them to focus on harmonic representation and computational analysis. To explore chords with various affective properties, Lahdelma and Eerola (2016) examined the affective nature of vertical harmonies.
To anticipate the feelings of listeners of musical parts, Greer et al. (2019) investigated a corpus of chords and lyrics matched to musical phrases, which were used to represent lyrics and chords in a shared vector space. To enable users to turn images into short chord progressions, Polo and Sevillano (2019) developed Musical Vision, an interactive tool for constructing fully variable mappings between color space and MIDI instrument and audio pitch space.

Melody
Polansky & Bassein (1992) used contour theory to assess pitch means in large-scale segmentation of waveforms, melodies, musical pieces, or other measurable features. Halpern and Zatorre (1999) used PET to examine brain activity associated with known melodies. Margulis (2005) provided an empirical technique for analyzing melodic anticipation and a model for rating the anticipatory nature of melodic events. Salamon and Gómez (2012) developed a method for extracting the main melody from polyphonic music recordings. In the same year, Salamon et al. (2012) provided a method for genre identification that directly exploits the high-level melodic qualities in the audio signal of polyphonic music. Later, Salamon et al. (2014) summarised the difficulties in the design, evaluation, and application of melody extraction methods and argued that melody extraction research faces problems in algorithm performance, development, and evaluation. Zhang (2022) proposed a LAM algorithm based on music melody contour feature extraction and oriented to music information retrieval.
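A melodic contour, in its simplest form, records only whether each note rises, falls, or repeats relative to the previous one. The Python sketch below encodes a pitch sequence in this way using Parsons-code-style symbols (u, d, r). It is offered as a minimal illustration of the idea of contour and is not the LAM algorithm or the contour features of any cited work; the function name and the MIDI example are assumptions.

```python
def parsons_code(midi_pitches):
    """Encode a melody as a contour string.

    '*' marks the first note; each later note is 'u' (up), 'd' (down),
    or 'r' (repeat) relative to the note before it.
    """
    if not midi_pitches:
        return ""
    symbols = ["*"]
    for previous, current in zip(midi_pitches, midi_pitches[1:]):
        if current > previous:
            symbols.append("u")
        elif current < previous:
            symbols.append("d")
        else:
            symbols.append("r")
    return "".join(symbols)


# Hypothetical example: C C G G A A G as MIDI note numbers.
opening_phrase = [60, 60, 67, 67, 69, 69, 67]
print(parsons_code(opening_phrase))
```

Because the encoding discards interval sizes and durations, two melodies in different keys share the same contour string, which is one reason contour is attractive as a retrieval feature and also why it is usually combined with richer descriptors.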

Beats
Ariza & Cuthbert (2010) applied the beat module of the TimeSignature-music21 Python toolbox to read Humdrum and MusicXML and output LilyPond and MusicXML. Salamon et al. (2012) found that beat perception, the recognition of a regular pulse in auditory input, may contribute to the creation of music. The findings clearly support intrinsic beat perception. Degara et al. (2011) used a novel probabilistic approach to calculate the duration between musical beat occurrences by explicitly modeling non-beat states. Holzapfel et al. (2012) proposed a technique for selecting musical samples that are difficult for beat tracking without ground truth. Böck et al. (2019) proposed a multi-task learning method for musical rhythm estimation and beat tracking trained entirely using rhythmic annotations. Böck et al. (2019) also used temporal convolutional networks to track beats in audio. In the same year, a causal technique for determining the location of beats from an audio source was designed (Richter, 2019).
Wang et al. (2020) noted that data-driven automatic drum transcription (ADT) models are unable to discriminate sounds outside a specified, small percussion vocabulary, and they addressed the open-vocabulary ADT problem by adding few-shot learning. To eliminate hand-designed spectral features, Steinmetz and Reiss (2021) developed WaveBeat, a waveform-based end-to-end joint beat and downbeat tracking method. Pinto et al. (2021) proposed a real-world beat-tracking strategy based on a relatively small temporal region of annotated beat positions and focused fine-tuning of a state-of-the-art deep neural network to extract beats from music audio signals. To improve the accuracy and relevance of beat matching, Zhu (2022) developed a data mining-based error recognition system for dance movement and music beat matching.
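Claims about beat-tracking accuracy are usually backed by comparing estimated beat times with annotated ones. The Python sketch below computes an F-measure in which an estimated beat counts as correct if it lies within a tolerance window of an unmatched annotated beat. The 70 ms window is a tolerance often used for this purpose and is treated here as an assumption, as are the function name and the example beat times; none of the cited studies is claimed to use exactly this procedure.

```python
def beat_f_measure(estimated, reference, tolerance=0.07):
    """F-measure between estimated and reference beat times (in seconds).

    Each reference beat can be matched at most once. Matching is greedy
    in the order of the estimated beats, which is enough for a sketch.
    """
    if not estimated or not reference:
        return 0.0
    unmatched = list(reference)
    hits = 0
    for beat in estimated:
        match = next((ref for ref in unmatched if abs(ref - beat) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1
    if hits == 0:
        return 0.0
    precision = hits / len(estimated)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations and tracker output.
annotated_beats = [0.5, 1.0, 1.5, 2.0]
tracked_beats = [0.52, 1.01, 1.60, 2.03]
print(f"F-measure: {beat_f_measure(tracked_beats, annotated_beats):.2f}")
```

A single score of this kind hides whether errors come from missed beats, extra beats, or tracking at double or half tempo, so evaluations typically report several complementary measures.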

Tempo
In measuring the effect of music on the assessment of happiness and depression, Bella et al. (2001) concluded that tempo mastery precedes mode when interpreting the emotional tone conveyed by music. Karageorghis et al. (2008) examined how music rhythm influences motor flow, intrinsic motivation, and music choice. To further understand the connection between music and emotion, Van Der Zwaag et al. (2011) investigated how rhythm, pattern, and tempo influence mood. To test linear and nonlinear models for predicting musical tension, Farbood (2012) examined several musical factors: harmony, pitch, melodic anticipation, dynamics, onset frequency, tempo, beat, rhythmic regularity, and syncopation.
Getz et al. (2014) assessed how stress, optimism, and musical training affect a person's desire to listen to music (for emotional control and/or cognitive stimulation) and the tempos they prefer. As research on music and emotion deepens, some researchers have emphasized the importance of using music as a pre-game technique in sports by adjusting the volume and tempo of music whilst monitoring brain activity (Bishop et al., 2014). Another approach, offered by Juslin et al. (2014), attempted to explain musical emotions in terms of a set of target mechanisms triggered by various information in musical events (e.g., tempo).
Percival & Tzanetakis (2014) proposed a streamlined tempo estimation method for music with constant or near-constant tempo that retains tempo accuracy whilst reducing steps, parameters, and modeling assumptions. Building on previous research on tempo-emotion associations, Dobrota & Reić Ercegovac (2015) investigated whether a correlation exists between listeners' preferred patterns and tempos and their individual personality attributes.
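For music with a near-constant tempo, the relationship between beat spacing and tempo is simple arithmetic: a beat period of T seconds corresponds to 60 / T beats per minute. The Python sketch below applies this to a list of beat times, using the median inter-beat interval so that a few irregular beats do not dominate. It is a deliberately simplified illustration and not the method of Percival & Tzanetakis (2014); the function name and the example times are assumptions.

```python
from statistics import median


def tempo_from_beats(beat_times):
    """Estimate tempo in beats per minute from ascending beat times (seconds)."""
    if len(beat_times) < 2:
        raise ValueError("at least two beat times are required")
    intervals = [later - earlier
                 for earlier, later in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)


# Hypothetical beat times with slight timing jitter around 0.5 s spacing.
beats = [0.0, 0.5, 1.0, 1.52, 2.0, 2.5]
print(f"tempo ~ {tempo_from_beats(beats):.0f} BPM")
```

The median is a design choice for robustness; a mean would be pulled by a single missed or doubled beat, and neither handles gradual tempo changes, which require a time-varying estimate.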
Rosemann et al. (2016) studied the effects of eye-hand coordination, performance tempo, complexity, and cognitive abilities of pianists. Neuhoff et al. (2017) explored the challenges in tempo playing methods, variations in fine-tuning and expressiveness, temporal effects, and the implications of these results for music theory. Karageorghis et al. (2018) assessed the interactive effects of musical tempo and intensity (volume) on the execution and subjective emotion of a basic motor skill, with a view to extending previous research. Coorevits et al. (2019) returned to musical performance itself, examining the many effects that changes in musical tempo have on the 'performance state' or the articulation of the performer's movements. Bittner et al. (2022) offered a way to understand musical tempo beyond listening to music by developing a Visual project and found that the number of notes per time unit and tempo also mattered.

Timbre
Pressnitzer et al. (2000) concluded from a psychoacoustic perspective that psychoacoustic roughness increases when non-tonal orchestral timbres reduce musical tension. Patil et al. (2012) studied musical timbres using a neurocomputational framework with a nonlinear classifier, 1,000 mammalian primary auditory cortical neurons, and the spectral-temporal receptive areas of simulated cortical neurons. Burger et al. (2013) simultaneously investigated the relationship between rhythm, timbre, and motion, concluding that body motion reflects, reproduces, and predicts musical quality. Town and Bizley (2013) outlined human timbre perception and the spectral and temporal acoustic features that shape timbre in speech, music, and environmental sounds, suggesting some worthwhile directions for research. Lui (2013) developed a technique for teaching musical timbre on mobile devices, in which the model was self-trained by volume-tuning streamlined spectral data. To confirm that the pitch and timbre analysis processes of music have unexpected similarities, Cousineau et al. (2014) explored music with a sequence discrimination task. Rocha et al. (2013) further investigated musical similarity, focusing on electronic dance music (EDM) and using timbre similarity as a sub-similarity. To overcome the drawback that both operations in modulation analysis may erase useful modulation information, Ren et al. (2015) proposed a two-dimensional representation of acoustic and modulation frequencies to extract joint features.
From the perspective of music retrieval and classification, Eghbal-Zadeh et al. (2015) used music timbre similarity and music i-vectors to derive song-level descriptors from frame-level temporal information for artist classification. Lu et al. (2019) used MUNIT for multimodal music style transformation and timbre enhancement, relying on unsupervised, non-parallel data describing the multimodal scattering of musical situations. Kim et al. (2019) created a neural music synthesis model with configurable timbres from sheet music and instrument data, using conditioning to learn an instrument embedding and a WaveNet vocoder for the recurrent neural network.

Hypothetical Future and Recommendation
This review of research on music visualization found that, for music waveforms, a steady stream of researchers has emerged in recent years to study the connection between music graphs and music waveforms based on WaveNet. Pitch is also an integral part of music, and in recent years researchers have been using neural networks to model and study pitch. Rhythm is often associated with movement: some scholars study the effect of rhythm on motor performance, whilst others have classified music by extracting rhythmic features. Chords, on the other hand, are more closely linked to brain science.
Research on the visualization of chords has focused more on the association with color. Research on melody has focused on melody extraction and the prediction of melodic occurrence; the latest work extracts melodic contour features for the classification of music. The study of beats has focused on the perception, tracking, and extraction of beats. The study of tempo has focused more on the relationship between tempo and motor performance. The study of timbre has focused more on areas related to music classification.
From the above studies, we found that waveforms, harmonies, and melodies are more closely related to visualization studies of music. Visualizing waveforms and melodic colors is easier. The main limitation is that the visualization of rhythm in music is more difficult, and most studies have focused on the effect of rhythm or beat on motor performance. The visualization of pitch and timbre has also received less attention, and the relevant studies are more closely related to music feature retrieval and classification.

Novelty
In the future, with the continuous development of artificial intelligence in computer science, researchers will be able to use new models and algorithms to extract the parameters of music and to investigate how these technologies can improve the study of music visualization.
[...] networks with tiny rectangular filters to classify music. A more expressive and intuitive deep learning architecture was achieved through the representational power of the first layer and the application of various filter shapes based on musical concepts within the first layer. To help people create musical compositions quickly and efficiently, Malandrino et al. (2018) built a visual tool called Visual Harmony. Khulusi et al. (2020) surveyed the special relationship between musicology and visualization. To reveal the semantic structure in classical orchestral works, Chan et al. (2009) proposed an innovative visualization solution. Ciuha et al. (2010) visualized music by interconnecting similar aspects of music and visual perception, with their research focusing on visualizing the harmonic relationship between pitch and color. To enhance the music listening experience in private spaces, Reddy and Rompapas (2021) used 'liquid hands' to bridge the distance between visualized virtual and actual concerts by utilizing alternative solutions in a virtual environment (Isaacson, n.d.). Lima et al. (2022) recommended classifying ideas by input attributes, visualization quality, InfoVis technique, interactivity (if allowed), and user assessment. This paper examines visual techniques developed by experts in the field of music analysis as well as some less successful approaches to music visualization (Ohmi, 2007). In the current work, the authors attempt to illustrate the development of music through still images of music in specific time units whilst paying attention to its structural components rather than through real-time animation. [...] fundamental concept in music theory, namely, the circle of fifths, as a model for studying visualized music. Jeong and Kim (2019) linked the DMX512 protocol through openFrameworks to create a 'dynamic lighting for music visualization' to present musical features.

Waveform
Inspired by sawtooth waves, Camacho and Harris (2008) developed SWIPE, an estimator of the pitch of speech and music. Klapuri (2008) determined the fundamental frequencies of multiple sounds in polyphonic music and multichannel speech signals by studying computer models of human auditory regions. WaveNet, a deep neural network for generating raw audio waveforms, emerged in the field of sound waveform extraction (Oord et al., 2016). Oord et al. (2017) investigated probability density distillation, a novel method for training parallel feedforward networks from a learned WaveNet to generate high-fidelity speech samples 20 times faster than real time. Shen et al. (2018) used a modified WaveNet model as a vocoder in combination with Tacotron 2 (a neural network structure for text-to-speech synthesis) to generate time-domain waveforms from spectrograms. Rethage et al. (2018) used a WaveNet-based end-to-end speech denoising method, which enabled the researchers to create a model that maintains 'in-phase' signals in the waveform graph to overcome the shortcomings of amplitude spectrograms. Lluís et al. (2019) reported that DeepConvSep, a deep learning model based on spectrograms, can be outperformed by their proposed WaveNet-based model and by Wave-U-Net. Nakamura and Saruwatari (2020) proposed a deep neural network that combines Wave-U-Net with the discrete wavelet transform (DWT); the DWT is used for time-domain music source waveform separation. Wu et al. (2021) proposed a pitch-adaptive waveform generation model called the Quasi-Periodic WaveNet (QPNet) to overcome the limited pitch controllability of vanilla WaveNets (WNs) using pitch-dependent dilated convolutional neural networks (PDCNNs).