DOWNLOAD
THE MONTREAL AFFECTIVE VOICES
The MAV consists of 90 nonverbal affect bursts corresponding to the emotions of anger, disgust, fear, pain, sadness, surprise, happiness and pleasure (plus a neutral expression), recorded by ten different actors (five female, five male). Ratings of valence, arousal and intensity along eight emotions were collected for each vocalization from thirty participants. Analyses reveal high recognition accuracy for most emotional categories (mean 68%). They also reveal significant effects of both actor and participant gender: the highest hit rates (75%) were obtained for female participants rating female vocalizations, and the lowest (60%) for male participants rating male vocalizations. Interestingly, the “mixed” situations, i.e., male participants rating female vocalizations or female participants rating male vocalizations, yielded similar, intermediate hit rates.

Montreal Affective Voices related paper:
PRIMAVOICE
These stimuli include four main categories of sounds: human voices, macaque vocalizations, marmoset vocalizations and non-vocal sounds, each containing 24 stimuli, for a total of 96 sound stimuli. Each main category was divided into 4 subcategories of 6 stimuli, forming 16 subcategories in total (cf. Table S1). The set of stimuli used during training differed from the one used during scanning, in order to minimize familiarization effects. Human voices comprised both speech (sentence segments from the set of stimuli used in a previous study,28 n = 12) and non-speech (vocal affect bursts selected from the Montreal Affective Voices dataset;29 n = 12), equally distributed into positive (pleasure, laugh; n = 4), neutral (n = 4) and negative (anger, fear; n = 4) vocalizations. Macaque vocalizations, kindly provided by Marc Hauser,30 included both positive (coos, n = 6; grunts, n = 6) and negative (aggressive calls, n = 6; screams, n = 6) calls. Marmoset vocalizations, kindly provided by Asif Ghazanfar,31 were divided into presumably positive (trills, n = 6), neutral (phees, n = 6; twitters, n = 6) and negative (tsiks, n = 6) calls. These three primate call categories contained an equal number of female and male callers. Non-vocal sounds included both natural sounds (living, n = 6; non-living, n = 6) and artificial sounds (human actions, n = 6; others, n = 6) from previous studies by our group3,32 or kindly provided by Christopher Petkov5 and Elia Formisano.28
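The inventory above can be summarized as a small nested table; the sketch below (subcategory labels paraphrased from the text, not the dataset's actual file names) simply checks that the stated counts add up to 24 per category and 96 in total.

```python
# PRIMAVOICE stimulus inventory, paraphrased from the description above.
# Labels are illustrative; the actual stimulus files may be named differently.
stimuli = {
    "human voices":     {"speech": 12, "non-speech": 12},
    "macaque calls":    {"coos": 6, "grunts": 6,
                         "aggressive calls": 6, "screams": 6},
    "marmoset calls":   {"trills": 6, "phees": 6, "twitters": 6, "tsiks": 6},
    "non-vocal sounds": {"natural living": 6, "natural non-living": 6,
                         "artificial human actions": 6, "artificial other": 6},
}

# Each main category contributes 24 stimuli; the full set has 96.
per_category = {name: sum(subs.values()) for name, subs in stimuli.items()}
total = sum(per_category.values())
```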

Stimuli were adjusted in duration, resampled at 48828 Hz and normalized by root mean square amplitude. Finally, a 10-ms cosine ramp was applied to the onset and offset of all stimuli. During experiments, stimuli were delivered via MRI-compatible earphones (S14, SensiMetrics, USA) at a sound pressure level of approximately 85 dB (A).
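The preprocessing steps above (RMS normalization followed by a 10-ms onset/offset cosine ramp) can be sketched as follows; this is a minimal NumPy illustration of the two operations, not the authors' actual pipeline, and the target RMS value is an arbitrary assumption.

```python
import numpy as np

FS = 48828       # sampling rate used for the stimuli (Hz)
RAMP_MS = 10     # onset/offset ramp duration (ms)

def normalize_rms(x, target_rms=0.05):
    """Scale a waveform so its root-mean-square amplitude equals target_rms."""
    rms = np.sqrt(np.mean(x ** 2))
    return x * (target_rms / rms)

def apply_cosine_ramp(x, fs=FS, ramp_ms=RAMP_MS):
    """Apply a raised-cosine ramp to the onset and offset of a waveform."""
    n = int(fs * ramp_ms / 1000)
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n) / n))  # rises 0 -> 1
    y = x.copy()
    y[:n] *= ramp          # fade in over the first 10 ms
    y[-n:] *= ramp[::-1]   # fade out over the last 10 ms
    return y

# Example: a 1-s, 440-Hz test tone run through both steps.
tone = np.sin(2 * np.pi * 440 * np.arange(FS) / FS)
processed = apply_cosine_ramp(normalize_rms(tone))
```

The ramp removes onset/offset clicks caused by abrupt amplitude steps; the RMS normalization equates average energy across stimuli so loudness differences do not confound category comparisons.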
Related papers:
VOICE LOCALIZER
The file TVA_loc.zip contains a set of stimuli for a functional localizer of the temporal voice areas (TVA) with fMRI. This functional localizer lasts 10 minutes and is based on the contrast of vocal vs. nonvocal sounds. The voice localizer contains 40 8-sec blocks of sounds (16-bit, mono, 22050 Hz sampling rate): 20 blocks (vocal_01 -> vocal_20) consist only of vocal sounds (speech as well as nonspeech), and 20 consist only of nonvocal sounds (industrial sounds, environmental sounds, as well as some animal vocalizations). All sounds have been normalized for RMS; a 1 kHz tone of similar energy is provided for calibration. The file TVA_loc.txt provides a proposed order of the sound blocks, optimized for the contrast Vocal vs. Nonvocal. Numbers 1-20 refer to the 20 vocal blocks; numbers 21-40 refer to the 20 nonvocal blocks; 99 refers to an 8-sec silence block. The localizer was designed for a TR of 10 sec (sparse sampling), with a dummy scan at the beginning (starting the sound stimulation), and each block beginning 2 sec after the start of image acquisition. In this case, and following the block order suggested in TVA_loc.txt, 61 volumes should be acquired, and the vectors of onsets for the two conditions VOCAL and NONVOCAL are (in seconds):
VOCAL = [22 62 82 112 132 162 202 222 242 262 312 352 372 402 432 462 482 512 542 572];
NONVOCAL= [12 32 52 102 122 142 182 232 282 302 322 342 382 422 442 472 502 522 552 592];
Note: it is also possible to use the localizer with a TR of 2 sec, with continuous scanner noise as background.
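Under the sparse-sampling timing described above (TR = 10 s, each block starting 2 s into a volume acquisition, volume 0 being the dummy scan), every onset should satisfy t = 2 + 10k for some volume index k. The sketch below checks the two onset vectors against that rule and recovers which volumes hold silence blocks; it assumes the standard timing stated in the text, not any inspection of TVA_loc.txt itself.

```python
TR = 10  # repetition time in seconds (sparse sampling)

# Onset vectors copied from the text above (in seconds).
VOCAL = [22, 62, 82, 112, 132, 162, 202, 222, 242, 262,
         312, 352, 372, 402, 432, 462, 482, 512, 542, 572]
NONVOCAL = [12, 32, 52, 102, 122, 142, 182, 232, 282, 302,
            322, 342, 382, 422, 442, 472, 502, 522, 552, 592]

all_onsets = sorted(VOCAL + NONVOCAL)

# Each block starts 2 s after the start of a volume acquisition.
assert all((t - 2) % TR == 0 for t in all_onsets)

# Map onsets to volume indices (volume 0 is the dummy scan).
sound_vols = {(t - 2) // TR for t in all_onsets}

# Remaining volumes before the last sound block are 8-s silence blocks.
silence_vols = sorted(set(range(1, max(sound_vols) + 1)) - sound_vols)
```

With 40 sound blocks and the last onset at 592 s (volume 59), the remaining 19 in-run volumes are silence blocks, and acquiring through volume 60 yields the 61 volumes mentioned above.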

Related papers: