Persuasion modeling is a key building block for conversational agents. Existing work in this direction is limited to analyzing textual dialogue corpora. We argue that visual signals also play an important role in understanding human persuasive behaviors. In this paper, we introduce the first multimodal dataset for modeling persuasion behaviors. Our dataset includes 199 dialogue transcriptions and videos captured in a multi-player social deduction game setting, 26,647 utterance-level annotations of persuasion strategy, and game-level annotations of deduction game outcomes. We provide extensive experiments to show how dialogue context and visual signals benefit persuasion strategy prediction. We also explore the generalization ability of language models for persuasion modeling and the role of persuasion strategies in predicting social deduction game outcomes.
Utterance-level persuasion strategy annotations. AUL refers to the average utterance length in terms of the number of words in an utterance and `\alpha` refers to Krippendorff’s alpha.
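As a rough illustration of the AUL statistic, the sketch below computes the average number of words per utterance for each strategy label. Splitting on whitespace, the function name `average_utterance_length`, the placeholder labels, and the toy utterances are all assumptions made for illustration; the tokenization actually used for the reported numbers is not stated here.

```python
from collections import defaultdict


def average_utterance_length(annotated):
    """Return the mean word count per strategy label.

    annotated: iterable of (utterance_text, strategy_label) pairs.
    Words are counted by whitespace splitting (an assumption).
    """
    totals = defaultdict(int)  # total word count per label
    counts = defaultdict(int)  # number of utterances per label
    for text, label in annotated:
        totals[label] += len(text.split())
        counts[label] += 1
    return {label: totals[label] / counts[label] for label in counts}


# Toy data with placeholder labels (hypothetical, not from the dataset).
toy = [
    ("I think you are lying", "strategy_a"),
    ("No", "strategy_a"),
    ("Who did you check last night", "strategy_b"),
]
aul = average_utterance_length(toy)
```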
Architecture of the independent model for each strategy. We freeze the parameters of the video encoder and train the other modules end-to-end. `\oplus` denotes the concatenation of two feature representations.
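The fusion step in this caption can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the encoders are replaced by fixed toy vectors, and the helper names `concat` and `binary_head`, the weights, and the feature sizes are hypothetical. It only shows the concatenation of a text feature with a video feature, followed by a binary prediction for a single strategy.

```python
import math


def concat(text_feat, video_feat):
    """Concatenate two feature vectors (the caption's \\oplus operation)."""
    return list(text_feat) + list(video_feat)


def binary_head(features, weights, bias):
    """Linear layer plus sigmoid: probability that one strategy is present."""
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-ins for encoder outputs (hypothetical values).
text_feat = [0.5, -1.0]   # would come from a trainable text encoder
video_feat = [2.0]        # would come from a frozen video encoder
fused = concat(text_feat, video_feat)
prob = binary_head(fused, weights=[1.0, 1.0, 0.5], bias=0.0)
```

One such head would be trained per strategy, since each strategy has its own independent model.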
Experimental results on incorporating visual features for persuasion strategy prediction. We train an independent model for each category using BERT and RoBERTa backbones. Additionally, we use the off-the-shelf Multi-Task BERT model (MT-BERT) (Chawla et al., 2021) to jointly predict all categories.
Ablation study of adopting different context lengths for persuasion strategy prediction.
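One way to picture "context length" is as the number of preceding utterances prepended to the target utterance before it is passed to the text encoder. The sketch below assumes exactly that; the function name `build_context_input` and the ` [SEP] ` separator are illustrative assumptions, and how the context is actually formatted for the models is not stated here.

```python
def build_context_input(utterances, index, context_length, sep=" [SEP] "):
    """Join the target utterance with up to `context_length` previous ones.

    utterances: list of utterance strings in dialogue order.
    index: position of the target utterance.
    context_length: number of preceding utterances to include (0 = none).
    """
    start = max(0, index - context_length)  # clip at the dialogue start
    context = utterances[start:index]
    return sep.join(context + [utterances[index]])


dialogue = ["a", "b", "c", "d"]
no_context = build_context_input(dialogue, 3, 0)
two_back = build_context_input(dialogue, 3, 2)
```

Varying `context_length` while keeping everything else fixed gives the kind of comparison an ablation over context lengths reports.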
Data domain generalization experiments. We report the testing performance on the Ego4D dataset using models trained only on YouTube data (w.o. Fine-tuning) and models trained on YouTube data and further fine-tuned with Ego4D data (w. Fine-tuning). For comparison, we also report the performance of models trained only on the Ego4D dataset (Ego4D Only).
Visualization of the logistic regression weights for persuasion strategies. A connection between a strategy and 0 means that the strategy contributes to the prediction of 0 (i.e., the voter doesn’t vote for the candidate). Likewise, a connection between a strategy and 1 means that the strategy contributes to the prediction of 1 (i.e., the voter votes for the candidate). The transparency of each line corresponds to the logistic regression weight: a less transparent line indicates a greater weight and more impact on the output.
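The mapping from weights to lines can be sketched as follows. This is a sketch under stated assumptions rather than the plotting code behind the figure: it assumes that a positive weight links a strategy to outcome 1 and a non-positive weight links it to outcome 0, and that line opacity is the weight magnitude normalized by the largest magnitude. The function name `weights_to_edges`, the placeholder strategy names, and the weight values are hypothetical.

```python
def weights_to_edges(weights):
    """Convert per-strategy weights into (strategy, outcome, opacity) edges.

    weights: dict mapping strategy name to its logistic regression weight.
    outcome: 1 if the weight is positive, else 0 (an assumption).
    opacity: |weight| scaled to [0, 1]; larger means less transparent.
    """
    max_abs = max((abs(w) for w in weights.values()), default=0.0) or 1.0
    edges = []
    for strategy, w in weights.items():
        outcome = 1 if w > 0 else 0
        opacity = round(abs(w) / max_abs, 3)
        edges.append((strategy, outcome, opacity))
    return edges


# Made-up weights for illustration only.
toy_weights = {"strategy_a": 0.8, "strategy_b": -0.4, "strategy_c": 0.2}
edges = weights_to_edges(toy_weights)
```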
@inproceedings{lai2023werewolf,
title={Werewolf among us: Multimodal resources for modeling persuasion behaviors in social deduction games},
author={Lai, Bolin and Zhang, Hongxin and Liu, Miao and Pariani, Aryan and Ryan, Fiona and Jia, Wenqi and Hayati, Shirley Anugrah and Rehg, James and Yang, Diyi},
booktitle={Findings of the Association for Computational Linguistics: ACL 2023},
pages={6570--6588},
year={2023}}