Vocalizations constitute an effective way to communicate both emotional arousal (bodily activation) and valence (negative/positive). There is strong evidence that the vocal expression of emotional arousal has converged across animal species, enabling cross-species perception of arousal, but it is not clear whether the same is true for emotional valence. Here, we conducted a large online survey to test the ability of humans to perceive emotions in the contact calls of several wild and domestic ungulates produced in situations of known emotional arousal (previously validated using either heart rate or locomotion) and valence (validated based on the context of production and behavioural indicators of emotions). Participants (1024 respondents from 48 countries) rated the arousal level of vocalizations above chance levels for three of the six ungulate species, and the valence for four of them. Percentages of correct ratings varied little across species for arousal (49–59%), but varied much more for valence (33–68%). Interestingly, several factors, such as age, empathy, familiarity and specific features of the calls, enhanced these scores. These findings suggest the existence of a shared emotional system across mammalian species, which is much more pronounced for arousal than for valence.