A short review on voice transformations at IRCAM.
P. Lanchantin, S. Farner, C. Veaux, G. Degottex, A. Roebel, and X. Rodet.
In Proc. of the First International Workshop on Performative Speech and Singing Synthesis, Vancouver, Canada, Mar. 2011 [Preprint]
 
IRCAM has long experience in the analysis, synthesis and transformation of voice. Natural voice transformations are of great interest for many applications and can be combined with text-to-speech systems, leading to a powerful creation tool. We present research conducted at IRCAM on voice transformations over the last few years. Transformations can be achieved in a global way by modifying pitch, spectral envelope, durations, etc. While this sacrifices the possibility of attaining a specific target voice, the approach allows the production of new voices with a high degree of naturalness, with different gender and age, modified vocal quality, or another speech style. These transformations can be applied in real time using ircamTools TRAX. Transformation can also be done in a more specific way in order to transform a voice towards the voice of a target speaker. Finally, we present some recent research on the transformation of expressivity.
Also presented at The 14th International Conference on Digital Audio Effects (DAFx-11), Paris, France, Sept. 2011 [Preprint]

Ensemble hand-clapping experiments under the influence of delay and various acoustic environments
S. Farner, A. Solvang, A. Sæbø, and P. Svensson
Journal of the Audio Engineering Society, 57 (12), Dec. 2009, pp. 1028-1041
 
This study presents hand-clapping experiments performed to improve knowledge about distributed musical performance with an inter-musician sound delay of up to 68 ms in virtual reverberant and anechoic environments as well as in real reverberant environments. Four reactions to increasing delay were studied: tempo decrease, imprecision, leader-follower strategy, and ensemble performance quality as judged by the subjects. The results suggest that the behavior changes at two different delays, of about 20 ms and 35–50 ms, respectively. The influence of the acoustic environment was ambiguous and needs further study.

Natural transformation of type and nature of the voice for extending vocal repertoire in high-fidelity applications
S. Farner, A. Röbel, and X. Rodet
In Proc. of the 35th International AES Conference (Audio for Games), London, UK, Feb 2009
 
Natural voice transformation will reduce the need for authentic voices in many situations, ranging from vocal services via education and entertainment to artistic applications. Transformation of one voice to correspond to that of another person has been studied for decades but still suffers from limitations that we propose to overcome with an alternative approach. It consists of modifying pitch, spectral envelope, durations, etc. in a global way. While it sacrifices the possibility of attaining a specific target voice, the approach allows the production of new voices with a high degree of naturalness, with different sex and age, modified vocal quality (soft, breathy, and whisper), or another speech style (dullness and eagerness). The transformation of sex and age has been evaluated by a listening test.
[Sound examples]
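One ingredient of such a global transformation, modifying the spectral envelope, can be illustrated by warping its frequency axis with a constant factor. The following is a minimal sketch, not IRCAM's implementation: it assumes the envelope is already estimated and sampled on an ascending frequency grid, and `warp_envelope` and its factor `alpha` are hypothetical names (alpha > 1 moves formant-like peaks up, alpha < 1 moves them down). Envelope estimation, pitch modification and resynthesis are omitted.

```python
import numpy as np

def warp_envelope(env: np.ndarray, freqs: np.ndarray, alpha: float) -> np.ndarray:
    """Warp a spectral envelope so that a feature at f moves to alpha * f.

    env   -- envelope magnitudes sampled at `freqs` (Hz, ascending)
    alpha -- hypothetical warping factor; >1 raises peaks, <1 lowers them
    """
    # The new envelope at frequency f takes the old value at f / alpha.
    # Outside the grid, np.interp holds the edge values.
    return np.interp(freqs / alpha, freqs, env)

if __name__ == "__main__":
    freqs = np.linspace(0.0, 5000.0, 501)              # 10 Hz grid
    # Toy envelope with one formant-like peak at 500 Hz (illustrative only)
    env = np.exp(-((freqs - 500.0) / 100.0) ** 2)
    warped = warp_envelope(env, freqs, 1.2)
    # The peak moves from 500 Hz to 600 Hz
    print(freqs[np.argmax(env)], freqs[np.argmax(warped)])
```

A uniform scaling is the simplest choice; a piecewise or frequency-dependent warping function could be substituted in the same place.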

Quantifying the strategy taken by a pair of ensemble hand-clappers under the influence of delay.
N. Darabi, P. Svensson, and S. Farner.
In Proc. of the 125th AES Convention, San Francisco, CA, USA, Oct 2008
 
Pairs of subjects were placed in two acoustically isolated rooms and clapped together under the influence of a delay of up to 68 ms. Their trials were recorded and analyzed based on the definition of a compensation factor (CF). This parameter was calculated from the recorded observations for both performers as a discrete function of time and interpreted as a measure of the strategy taken by the subjects while clapping. With increasing delay, CF was shown to increase linearly, as it is desirable to avoid a tempo decrease at such high latencies. A theoretical critical value for CF was defined as tempo over measure (or beat) duration and was used to explain why very short latencies may lead to a tempo acceleration, in accordance with the Chafe effect.

Voice transformation and speech synthesis for video games.
S. Farner, Ch. Veaux, G. Beller, X. Rodet, and L. Ach
Presented at Paris Game Developers Conference, Paris, France, June 2008
 
Voice and expressivity transformation as well as text-to-speech synthesis with high degree of naturalness are now available. A set of tools permitting a large range of voices to be made from a single voice, whose speech may be produced from text and given a certain expressivity, is proposed. In the context of multiplayer video games, for instance, this technology allows for creation of the speech of non-player characters as well as for transforming the player’s voice into the voice of her character. The technology behind these tools will be presented. A demonstration using cartoon characters will also be provided.
[Description] [Presentation] [Sound examples]

Electrophysiological Study of Algorithmically Processed Metric/Rhythmic Variations in Language and Music
S. Ystad, C. Magne, S. Farner, G. Pallone, M. Aramaki, M. Besson, and R. Kronland-Martinet
EURASIP Journal on Audio, Speech, and Music Processing, vol. 2007, Article ID 30194, 13 pages, 2007, doi:10.1155/2007/30194
 
This work is the result of an interdisciplinary collaboration between scientists from the fields of audio signal processing, phonetics and cognitive neuroscience, aimed at studying the perception of modifications in meter, rhythm, semantics and harmony in language and music. A special time-stretching algorithm was developed to work with natural speech. In the language part, French sentences ending with trisyllabic congruous or incongruous words, metrically modified or not, were constructed. In the music part, short melodies made of triplets, rhythmically and/or harmonically modified, were built. These stimuli were presented to a group of listeners who were asked to focus their attention either on meter/rhythm or on semantics/harmony and to judge whether or not the sentences/melodies were acceptable. Language ERP analyses indicate that semantically incongruous words are processed independently of the subject's attention, thus arguing for automatic semantic processing. In addition, metric incongruities seem to influence semantic processing. Music ERP analyses show that rhythmic incongruities are processed independently of attention, revealing automatic processing of rhythm in music.

Ensemble hand-clapping experiments under the influence of delay and various acoustic environments
S. Farner, A. Solvang, A. Sæbø, and P. Svensson.
In Proc. of the 121st AES Convention, Preprint No. 6905, San Francisco, CA, USA, Oct 2006.
 
Hand-clapping experiments were performed by pairs of subjects under the influence of a delay of up to 68 ms in various acoustic environments. The mean tempo decreased almost linearly as a function of the delay. During each sequence the tempo slowed down to a degree that increased with the delay, but for delays shorter than about 15–23 ms the tempo increased during the sequence. For the timing imprecision, and for the subjects' judgements of their own ensemble performance, no effect of the delay could be observed up to 20 ms. Above 32 ms the effects were observed to increase with the delay. Virtual anechoic conditions led to higher imprecision than the reverberant conditions, and real-reverberation conditions led to a slightly lower tempo.
[Preprint © AES] [Poster]
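The near-linear decrease of mean tempo with delay can be quantified with an ordinary least-squares line. The following is a minimal sketch under stated assumptions: the tempo of a trial is estimated as 60 divided by the median inter-onset interval of the detected clap onsets, and `tempo_bpm` and `tempo_vs_delay` are hypothetical helpers, not the analysis code used in the study. No data from the study are reproduced.

```python
import numpy as np

def tempo_bpm(onsets_s) -> float:
    """Tempo in beats per minute, from clap onset times in seconds."""
    ioi = np.diff(np.sort(np.asarray(onsets_s, dtype=float)))  # inter-onset intervals
    return 60.0 / float(np.median(ioi))

def tempo_vs_delay(delays_ms, tempos_bpm):
    """Least-squares line tempo = slope * delay + intercept (slope in BPM per ms)."""
    slope, intercept = np.polyfit(np.asarray(delays_ms, dtype=float),
                                  np.asarray(tempos_bpm, dtype=float), 1)
    return float(slope), float(intercept)
```

Taking the median rather than the mean interval keeps the estimate robust to an occasional missed or doubled onset; a negative slope corresponds to the tempo decrease described above.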

Timbre variations as an attribute of naturalness in clarinet play
S. Farner, R. Kronland-Martinet, T. Voinier, and S. Ystad.
Computer Music Modeling and Retrieval. Third International Symposium CMMR 2005, Pisa, Italy, September 2005. Published in Lecture Notes in Computer Science, Vol. 3902, Springer-Verlag, May 2006, pp. 45-53, ISBN 3-540-34027-0
 
A digital clarinet played by a human and timed by a metronome was used to record two playing control parameters, the breath control and the reed displacement, for 20 repeated performances. The regular behaviour of the parameters was extracted by averaging, and the fluctuation was quantified by the standard deviation. It was concluded that the movement of the parameters seems to follow rules. When the fluctuations of the parameters were removed by averaging over the repetitions, the result sounded less expressive, although it still seemed to be played by a human. The variation in timbre during the play, in particular within a note's duration, was observed and then fixed while the natural temporal envelope was kept. The result seemed unnatural, indicating that the variation of timbre is important for naturalness.
[Preprint © Springer]
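The separation into regular behaviour (average over repetitions) and fluctuation (standard deviation over repetitions) can be sketched as follows. This is a minimal sketch, assuming the 20 recordings of a control parameter have already been time-aligned and resampled to a common length; `regular_and_fluctuation` is a hypothetical name, and the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

def regular_and_fluctuation(takes):
    """Split repeated control-parameter recordings into regular part and fluctuation.

    takes -- array of shape (n_repetitions, n_samples); each row is one
             performance of the same passage, assumed already time-aligned.
    Returns the per-sample mean over repetitions (regular behaviour) and the
    per-sample sample standard deviation over repetitions (fluctuation).
    """
    takes = np.asarray(takes, dtype=float)
    mean = takes.mean(axis=0)
    std = takes.std(axis=0, ddof=1)
    return mean, std
```

Driving the synthesis with `mean` instead of an individual take corresponds to removing the fluctuations by averaging, as in the listening comparison above.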

Contribution to harmonic balance calculations of self-sustained periodic oscillations with focus on single-reed instruments
S. Farner, C. Vergez, J. Kergomard, and A. Lizée
Journal of the Acoustical Society of America, 119 (March 2006), pp. 1794-1804
 
The harmonic balance method (HBM) was originally developed for finding periodic solutions of electronic and mechanical systems under a periodic force, but was later adapted to self-sustained musical instruments. Unlike time-domain methods, this frequency-domain method does not capture transients and is therefore not suited for sound synthesis. However, its time independence makes it very useful for studying every periodic solution of the system, whether stable or unstable, without regard to particular initial conditions. A computer program for solving general problems involving a nonlinearly coupled exciter and resonator, "Harmbal", has been developed based on the HBM. The method, as well as convergence improvements and continuation facilities, is thoroughly presented and discussed in the present paper. Application of the method is demonstrated especially on problems with severe convergence difficulties, i.e. the Helmholtz motion (square signals) of single-reed instruments when no losses are taken into account, the reed being modelled as a simple spring.
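To make the frequency-domain idea concrete, here is a minimal harmonic balance sketch on a deliberately simple self-sustained oscillator, the van der Pol equation x'' - mu(1 - x^2)x' + x = 0, swapped in for the reed/resonator model; it is not Harmbal's model or code. The unknown periodic solution is a truncated Fourier series in the scaled time tau = omega*t. The nonlinear term is evaluated on time samples and projected back onto the harmonics (an alternating frequency/time scheme), and the unknown oscillation frequency omega is solved for together with the Fourier coefficients, with the first sine coefficient fixed to zero to remove the arbitrary phase. The truncation order `H`, the sample count `N` and the plain Newton iteration with a finite-difference Jacobian are illustrative choices.

```python
import numpy as np

H = 5    # harmonics retained in the truncated Fourier series
N = 64   # time samples per period (> 2 * 3H, so the cubic term is not aliased)

def unpack(u):
    """u = [a0, a1..aH, b2..bH, omega]; b1 is fixed to 0 to remove the free phase."""
    a = u[:H + 1]
    b = np.concatenate(([0.0, 0.0], u[H + 1:2 * H]))
    return a, b, u[2 * H]

def residual(u, mu):
    """Harmonic-balance residual of x'' - mu(1 - x^2)x' + x = 0 with tau = omega*t."""
    a, b, omega = unpack(u)
    tau = 2 * np.pi * np.arange(N) / N
    k = np.arange(H + 1)
    C, S = np.cos(np.outer(k, tau)), np.sin(np.outer(k, tau))  # shape (H+1, N)
    # Frequency -> time: the signal and its tau-derivatives on the sample grid
    x = a @ C + b @ S
    dx = (-k * a) @ S + (k * b) @ C
    ddx = -(k**2 * a) @ C - (k**2 * b) @ S
    r = omega**2 * ddx - mu * omega * (1 - x**2) * dx + x
    # Time -> frequency: project the residual onto the retained harmonics
    rc, rs = 2 / N * (C @ r), 2 / N * (S @ r)
    rc[0] /= 2                            # the constant term has weight 1/N
    return np.concatenate((rc, rs[1:]))   # 2H + 1 equations, 2H + 1 unknowns

def solve_hbm(mu, u0, tol=1e-10, max_iter=50, eps=1e-7):
    """Newton iteration with a forward-difference Jacobian."""
    u = np.array(u0, dtype=float)
    for _ in range(max_iter):
        r = residual(u, mu)
        if np.linalg.norm(r) < tol:
            break
        J = np.empty((r.size, u.size))
        for j in range(u.size):
            du = np.zeros_like(u)
            du[j] = eps
            J[:, j] = (residual(u + du, mu) - r) / eps
        u = u - np.linalg.solve(J, r)
    return u

u0 = np.zeros(2 * H + 1)
u0[1], u0[2 * H] = 2.0, 1.0   # initial guess: x = 2 cos(tau), omega = 1
u = solve_hbm(0.1, u0)
print(f"omega = {u[2 * H]:.5f}, first-harmonic amplitude = {u[1]:.4f}")
```

Because only the periodic steady state is sought, no transient is simulated and no initial condition in time is needed; the same system would be solved whether the limit cycle were stable or not.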

Comparing spectral distance measures for join cost optimization in concatenative speech synthesis.
I. Bjørkan, T. Svendsen, and S. Farner
In Proc. of Interspeech 2005, Lisboa, Portugal, Sept 2005, pp. 2577-2580
 
In concatenative synthesis the join cost function can be related to the probability of a perceived discontinuity at the join. Therefore it is important that the distance measures in the cost function correlate highly with human perceived discontinuities. In this paper the results of a listening test on joins in two Norwegian long vowels, /A:/ and /e:/, are presented. Five spectral distance measures and the F0 difference are compared as predictors of the human perceived discontinuities using Receiver Operating Characteristic (ROC) curves. In addition, a linear join cost function is optimized by means of stepwise linear regression.
[Preprint]
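Comparing distance measures with ROC curves amounts to asking how well each distance separates joins that listeners heard as discontinuous from those they did not. The following is a minimal sketch, assuming one distance value and one binary listener judgement per join. It summarizes each ROC curve by its area, computed in the equivalent Mann-Whitney form: the probability that a randomly chosen discontinuous join has a larger distance than a randomly chosen smooth one, with ties counted as one half. The Euclidean cepstral distance shown is only one example of a spectral distance and is not claimed to be among the five measures compared in the paper; both function names are hypothetical.

```python
import numpy as np

def cepstral_distance(c_left, c_right) -> float:
    """Euclidean distance between cepstral vectors on either side of a join."""
    return float(np.linalg.norm(np.asarray(c_left, float) - np.asarray(c_right, float)))

def roc_auc(distances, perceived) -> float:
    """Area under the ROC curve for `distances` as a predictor of `perceived`.

    distances -- one join-cost distance per join (larger = more likely audible)
    perceived -- 1 if listeners perceived a discontinuity at the join, else 0
    """
    d = np.asarray(distances, dtype=float)
    y = np.asarray(perceived, dtype=bool)
    pos, neg = d[y], d[~y]
    # Compare every discontinuous join with every smooth join.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))
```

An area of 1.0 means the measure ranks every perceived discontinuity above every smooth join; 0.5 is chance level.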

Comparison of rhythmic processing in language and music: an interdisciplinary approach.
C. Magne, M. Aramaki, C. Astesano, R. L. Gordon, S. Ystad, S. Farner, R. Kronland-Martinet, and M. Besson.
Journal of Music and Meaning 3, Fall 2004/Winter 2005, sec. 5.1 (Online journal)
 
In this paper we describe an interdisciplinary collaboration between phoneticians, acousticians and neuroscientists that led to a study of rhythm in music and language. In the first part of the paper we discuss general aspects of rhythm, with a short overview of some earlier studies on the cultural influences of linguistic rhythm on musical rhythm. In the second part, we describe an experimental procedure aimed at comparing the perception of rhythmic and semantic violations in language with the perception of rhythmic and harmonic violations in music. Subjects listened to different sentences and melodies and were asked to focus on either rhythm or semantics/harmony to indicate whether or not the last word/arpeggio was acceptable in the context. The Event-Related Brain Potential method was used to study perceptual and cognitive processing related to the rhythmic and semantic/harmonic incongruities. The results indicated that the processing of rhythmic incongruities was associated with increased positive deflections in the brain potential in similar latency bands in both language and music. However, these positive components were present independently of the participants’ focus in the music part, while they were only present when the participants focused on semantics in the language part.

Some aspects of the harmonic balance method applied to the clarinet
C. Fritz, S. Farner, and J. Kergomard
Applied Acoustics, 65 (2004), pp. 1155-1180
 
The clarinet has been extensively studied by various theoretical and experimental techniques. In this paper, the harmonic balance method (HBM), a numerical method mainly working in the frequency domain, has been applied to solve a simple nonlinear clarinet model consisting of a linear exciter (for the reed) nonlinearly coupled to a linear resonator with visco-thermal losses (for the pipe). A recent and improved implementation of the HBM for self-sustained instruments has allowed us to study the model theoretically when including dispersion in the pipe or mass and damping terms in the reed model. The resulting periodic solutions for the internal pressure spectrum and the corresponding playing frequency are shown to align well with previous theoretical and experimental knowledge of the clarinet. Finally, we present and briefly discuss a few (probably unstable) oscillation regimes both with the HBM and with a real clarinet.

Convergence improvement of the harmonic balance method to obtain periodic solutions for self-sustained musical instruments
S. Farner, C. Vergez, and J. Kergomard
In Proc. of the International Congress on Acoustics (ICA) 2004, Kyoto, Japan, April 2004, pp. 1429-1432
 
The harmonic balance method was originally developed for finding periodic solutions of forced-oscillation electronic circuits but was later adapted to self-sustained musical instruments. A computer program for solving general problems involving a nonlinearly coupled exciter and resonator has been developed using this method, and both its convergence and its continuation capabilities have been improved. We briefly present the harmonic balance method before we address one specific problem of convergence due to sampling and describe the backtracking algorithm used to efficiently reduce the problem. This improvement facilitates continuation, and we demonstrate the method on a simple model of a clarinet and compare with analytical results. For example, in the lossless approximation, it is verified that the method converges towards the Helmholtz motion, and we apply it to draw the bifurcation diagram as the blowing pressure is increased.
[Preprint]
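The abstract does not spell out the backtracking algorithm, so the following is only a generic sketch of the standard idea, not the paper's procedure: a Newton step on the residual is halved until the residual norm decreases, which prevents the iteration from jumping away when the full step overshoots. `newton_backtracking`, its parameters and the toy problem are hypothetical. The toy problem solves arctan(x) = 0 from x0 = 3, a starting point from which the undamped Newton iteration diverges.

```python
import numpy as np

def newton_backtracking(f, jac, x0, tol=1e-12, max_iter=50, max_halvings=30):
    """Solve f(x) = 0 by Newton's method with step halving (backtracking)."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    for _ in range(max_iter):
        r = f(x)
        norm_r = np.linalg.norm(r)
        if norm_r < tol:
            break
        step = np.linalg.solve(np.atleast_2d(jac(x)), r)
        lam = 1.0
        # Backtrack: shrink the step until the residual norm actually decreases.
        for _ in range(max_halvings):
            x_new = x - lam * step
            if np.linalg.norm(f(x_new)) < norm_r:
                break
            lam *= 0.5
        x = x_new
    return x

# Toy problem: arctan(x) = 0 starting far from the root
root = newton_backtracking(np.arctan, lambda x: np.diag(1.0 / (1.0 + x**2)), 3.0)
```

Near the solution the full step is accepted and the usual fast Newton convergence returns; the halving only intervenes where the linearization is poor.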

A new method for the calculation of self-sustained oscillations: the perturbation of the Helmholtz motion
J. Kergomard, S. Divoir, S. Farner, and C. Vergez
In Proc. of Stockholm Music Acoustics Conference (SMAC) 2003, Stockholm, Sweden, August 6-9, 2003, pp. 397-400
 
When losses are ignored, elementary solutions for the classical models of self-sustained instruments, such as reed or bowed-string instruments, are pure square or "rectangular" signals, called Helmholtz motion. When losses are introduced, signals with rounded corners are obtained, and the calculation becomes delicate. Ab initio calculation is possible, but methods limited to the steady-state regime make it easier to study the influence of the parameters on the spectrum and the playing frequency: the harmonic balance method is well known, but, because losses are small, another iterative technique is suggested. Considering e.g. reed instruments, the Fourier components of the input pressure signal can be divided into two parts: the components with high input impedance, and those with low input impedance (corresponding to the missing harmonics of the rectangular signal). A perturbation method can be obtained by starting from infinite and zero impedances, respectively. A key point is that at each step the frequency is fixed in order to calculate the perturbation; a new value is then calculated using any equation of the harmonic balance system, an excellent candidate being the reactive power defined by Boutillon. In this preliminary study, results are compared for a simplified problem with those of the harmonic balance method, and they are very interesting, especially far from the oscillation thresholds.
[Preprint]
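The division into high- and low-impedance components can be illustrated in the lossless limit. The following is a minimal sketch, assuming a symmetric rectangular signal with a 50 % duty cycle, as in the idealized Helmholtz motion of a cylindrical reed instrument: its spectrum contains only odd harmonics, with amplitudes close to 4/(pi*k), so the even harmonics are the "missing" ones. The helper `harmonic_amplitudes` and the sampling choices are hypothetical.

```python
import numpy as np

def harmonic_amplitudes(signal: np.ndarray, n_harmonics: int) -> np.ndarray:
    """Amplitudes of harmonics 1..n_harmonics of one sampled period of `signal`."""
    spectrum = np.fft.rfft(signal) / signal.size
    return 2 * np.abs(spectrum[1:n_harmonics + 1])

N = 1024
square = np.where(np.arange(N) < N // 2, 1.0, -1.0)   # one period: +1 then -1
amps = harmonic_amplitudes(square, 6)
# amps[0], amps[2], amps[4] (harmonics 1, 3, 5) are close to 4/(pi*k);
# amps[1], amps[3], amps[5] (harmonics 2, 4, 6) vanish up to rounding.
```

With losses, the corners round off and the even components become small but nonzero, which is the regime the perturbation method addresses.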

Influence of rhythmic, melodic, and semantic violations in language and music on the electrical activity in the brain
S. Ystad, C. Magne, S. Farner, G. Pallone, V. Pasdeloup, R. Kronland-Martinet, and M. Besson
In Proc. of Stockholm Music Acoustics Conference (SMAC) 2003, Stockholm, Sweden, August 6-9, 2003, pp. 671-674
 
The work presented here is part of a larger multidisciplinary project associating audio signal processing, linguistics, and cognitive neurosciences. It aims at comparing and better understanding how music and language are processed in the brain. From a music and speech synthesis point of view, this is important when striving for naturalness and expressiveness in synthesized music and language. As a first experiment towards this goal we have manipulated the rhythm in music and language as well as harmony and semantics, respectively. Since we wanted to work with natural speech, we developed and used a method to extend a given part of an audio signal without altering the timbre. This allows manipulation of the syllable lengths in the language part. In the music part we used piano tones from a sampler as a first approach. The note duration and melody of the musical sequences were digitally modified by altering the MIDI codes. In the language part of the experiment, participants were presented with sentences in which the final word was either semantically congruous or incongruous (e.g., I take coffee with sugar/dog). In the musical sequences, the final part of the melody was either in or out of tonality. Moreover, the penultimate (second-to-last) syllable or note was either of natural duration or increased in duration, in order to produce rhythmic incongruities. Thus, the two factors rhythm and semantics/tonality were independently manipulated. Changes in brain electrical activity were measured from 28 electrodes on the scalp using the Event-Related Potential method. Preliminary results show that similar reactions can be observed in language and music, at least for rhythmic violations.
[Preprint]

Remelting by Continuous Feeding of Rolled Scrap into a Melt
Snorre Farner, Frede Frisvold, and Thorvald Abel Engh
Light Metals 2000, The Minerals, Metals & Materials Society, pp. 699-704
 
Metal losses during remelting are common when recycling aluminium. Reduction of these losses could give a substantial economic gain. Experiments with continuous feeding of aluminium plates into molten aluminium have been performed. A simple steady-state mathematical model has been developed that gives the temperature profile and the penetration depth into the melt as functions of the feeding velocity, superheat, and the heat-transfer coefficients from melt to solid and from a solidified shell to the plate. A criterion for shell formation is also formulated. The results can be applied to understand more complex systems where shredded scrap is fed into molten aluminium. The model presented could be of direct interest when feeding rolled scrap into molten aluminium.
[Preprint © TMS]

Evidence for unconventional superconductivity of Sr2RuO4 from specific-heat measurements.
S. Nishizaki, Y. Maeno, S. Farner, S. Ikeda, and T. Fujita.
J. Phys. Soc. Japan, 67 (1998), pp. 560-563
 
We measured the specific heat of single crystals of the non-cuprate layered perovskite superconductor Sr2RuO4. The crystals, with different Tc up to 1.2 K, exhibit a large residual electronic coefficient γ0, which decreases systematically with increasing Tc. This behavior is consistent with the presence of nodes in the superconducting gap and with the variations of Tc due to pair breaking by impurities and defects. To quantitatively account for the observed large γ0, however, we need to introduce an additional mechanism.

Pairing symmetry of superconducting Sr2RuO4 from specific heat measurements.
S. Nishizaki, Y. Maeno, S. Farner, S. Ikeda, and T. Fujita.
Physica C, 282 (1997), pp. 1413-1414
 
We report the low-temperature specific heat of single crystals of the non-cuprate layered perovskite superconductor Sr2RuO4 (Tc ~ 1 K). In this paper we focus on the relation between the residual value γ0 of the electronic specific heat and the specific-heat jump across Tc. The results provide strong evidence for an unconventional superconducting state.