Toward AI-Mediated Avatar-Based Telecommunication: Investigating Visual Impression of Switching Between User- and AI-Controlled Avatars in Video Chat

IEEE Access
1 OMRON SINIC X Corporation
2 National Institute of Advanced Industrial Science and Technology (AIST)
* These authors contributed equally.

TL;DR Our target scenario is a video chat between two people using avatars, where one person uses an auto-switching mechanism to switch between user-controlled and AI-controlled modes. Our study shows that appropriate auto-switching improves the user experience, highlighting the importance of future research in advancing this auto-switching mechanism.

Overview

Telecommunications technology has evolved rapidly, creating opportunities to diversify the communication culture through AI mediation. We anticipate that people will interchangeably use both user-controlled avatars (driven by transferring the user's movements) and AI-controlled avatars (driven autonomously) in everyday communication. For example, users may temporarily hand over to the AI-controlled mode while they are distracted. This paper argues for the importance of investigating auto-switching between user- and AI-controlled modes to improve the user experience in upcoming AI-mediated telecommunications. As a first step, we conducted a crowdsourced user experiment in a video chat context, focusing on the visual impressions of the displayed avatars. The results show that the impression improved when an appropriate switch setting was used, underscoring the value of this research direction. To identify the appropriate switch setting for our experiment, we developed a general-purpose adaptive experimental design tool based on Bayesian optimization, which we plan to release publicly.

Target Scenario and System

This paper focuses on two-person video chat scenarios in which both individuals use 3D avatars. One person alternates between user-controlled and AI-controlled modes, with the system switching between them automatically, while the other person, unaware of the switching, remains in user-controlled mode. The paper poses the following research question and explores the visual impression (attention and naturalness) that the auto-switching avatar makes on the observing person.

RQ: Does optimizing the auto-switching method between user- and AI-controlled avatars result in a superior user experience and enhanced communication?

As a first step in this direction, our auto-switching system employs a simple strategy that determines the switch timing from the head pose (yaw and pitch) and a tolerance duration, which serve as its parameters. Determining the optimal values of these parameters is challenging, however, so we employ adaptive experimental design to search for the parameter set that provides the best user experience within our experimental setting.
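
The switching strategy can be illustrated with a minimal Python sketch. The exact rule used in the paper is not given here, so the logic below is an assumption: the avatar switches to AI control once the head pose has stayed outside the yaw and pitch limits for at least the tolerance duration, and returns to user control as soon as the head is back in range. The names `SwitchParams` and `AutoSwitcher` and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SwitchParams:
    """Hypothetical parameter set: angle limits in degrees, tolerance in seconds."""
    yaw_limit: float
    pitch_limit: float
    tolerance: float


class AutoSwitcher:
    """Tracks head pose over time and reports the current control mode."""

    def __init__(self, params: SwitchParams) -> None:
        self.params = params
        self.mode = "user"
        self._away_since: Optional[float] = None  # when the head first left the range

    def update(self, t: float, yaw: float, pitch: float) -> str:
        in_range = (abs(yaw) <= self.params.yaw_limit
                    and abs(pitch) <= self.params.pitch_limit)
        if in_range:
            # Assumption: hand control back to the user immediately.
            self._away_since = None
            self.mode = "user"
        else:
            if self._away_since is None:
                self._away_since = t
            # Switch only after the pose has been out of range long enough.
            if t - self._away_since >= self.params.tolerance:
                self.mode = "ai"
        return self.mode


switcher = AutoSwitcher(SwitchParams(yaw_limit=20.0, pitch_limit=15.0, tolerance=1.0))
# (time in seconds, yaw, pitch): the user looks away from 0.5 s to 1.5 s.
samples = [(0.0, 0.0, 0.0), (0.5, 35.0, 0.0), (1.0, 35.0, 0.0),
           (1.5, 35.0, 0.0), (2.0, 5.0, 0.0)]
modes = [switcher.update(t, yaw, pitch) for t, yaw, pitch in samples]
print(modes)
```

In this sketch the three values in `SwitchParams` are the quantities an optimizer would tune; a short tolerance makes the avatar switch eagerly, while a long one lets brief glances away pass without a switch.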

Adaptive Experimental Design

Adaptive experimental design is a technique that enhances the efficiency of experiments by adaptively sampling parameters during the experiment, allowing researchers to achieve their goals with fewer tests. Unlike traditional methods, where all test conditions are predetermined, adaptive design uses computational techniques like Bayesian Optimization (BO) to progressively build a model and select the next samples based on data collected so far.
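
To make the idea concrete, the following is a small, self-contained sketch of an adaptive loop driven by Bayesian optimization over a single parameter. It is not the authors' tool: the Gaussian-process surrogate with a squared-exponential kernel, the upper-confidence-bound acquisition, the candidate grid, and the synthetic `run_trial` function (standing in for a round of participant ratings) are all assumptions chosen for illustration.

```python
import math
import random


def rbf(a: float, b: float, length: float = 0.2) -> float:
    """Squared-exponential kernel on a 1-D parameter."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A: list[list[float]], b: list[float]) -> list[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def posterior(xs: list[float], ys: list[float], x: float, noise: float = 1e-2):
    """GP posterior mean and standard deviation at x (zero prior mean)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    mean = sum(ki * wi for ki, wi in zip(k, solve(K, ys)))
    var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, math.sqrt(max(var, 0.0))


def run_trial(x: float, rng: random.Random) -> float:
    """Synthetic stand-in for measured user ratings; the true peak is at 0.7."""
    return -(x - 0.7) ** 2 + rng.gauss(0.0, 0.01)


rng = random.Random(0)
grid = [i / 50 for i in range(51)]   # candidate parameter values in [0, 1]
xs = [0.0, 1.0]                      # initial design
ys = [run_trial(x, rng) for x in xs]
for _ in range(10):                  # ten adaptive iterations
    def ucb(x: float) -> float:
        mean, std = posterior(xs, ys, x)
        return mean + 2.0 * std      # favor high predicted value or high uncertainty
    x_next = max(grid, key=ucb)
    xs.append(x_next)
    ys.append(run_trial(x_next, rng))

best = xs[max(range(len(ys)), key=ys.__getitem__)]
print(f"best sampled parameter: {best:.2f}")
```

Each iteration refits the surrogate to all observations so far and picks the next condition to test, which is the property that distinguishes an adaptive design from one where every condition is fixed in advance.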

This approach is particularly beneficial in fields with high experimental costs, such as drug discovery, and is also well-suited for Human-Computer Interaction (HCI) research, where human participant involvement is costly. While adaptive experimental design is rarely mentioned in existing HCI literature, it has been implicitly used in several studies to optimize system parameters through iterative user experiments. This work is the first to explicitly introduce the concept within the context of HCI research.

Optimization

We searched for the optimal parameter set for switching between user-controlled and AI-controlled modes using the adaptive experimental design approach. Over 10 iterations involving a total of 100 participants, we identified the parameter set that provided the best visual impression in terms of attention and naturalness.

Results

The obtained parameter set was evaluated against three randomly generated parameter sets in a study with 24 participants. The video below labeled "Optimal" (upper left) was generated using the obtained parameter set; the other three videos were generated using the random parameter sets.

We analyzed the experimental results using a Bayesian Bradley–Terry model with random effects. The table shows, for each pair of parameter sets, the probability that the latent preference score of the set in the row exceeds that of the set in the column.
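
As a rough illustration of how such a table entry can be obtained, the sketch below shows the standard Bradley–Terry preference probability for two latent scores and a simple estimate of the probability that one score exceeds another from paired posterior draws. The posterior draws here are synthetic stand-ins rather than the study's data, the random effects are not modeled, and the function names are hypothetical.

```python
import math
import random


def win_probability(score_i: float, score_j: float) -> float:
    """Bradley-Terry probability that item i is preferred over item j."""
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


def prob_superior(samples_i: list[float], samples_j: list[float]) -> float:
    """Fraction of paired posterior draws in which score i exceeds score j."""
    wins = sum(a > b for a, b in zip(samples_i, samples_j))
    return wins / len(samples_i)


rng = random.Random(0)
# Synthetic posterior draws of latent preference scores (assumed, not measured).
draws = {
    "Optimal": [rng.gauss(1.0, 0.3) for _ in range(4000)],
    "Random A": [rng.gauss(0.0, 0.3) for _ in range(4000)],
}

print(win_probability(1.0, 0.0))
print(prob_superior(draws["Optimal"], draws["Random A"]))
```

The first function describes a single pairwise choice, while the second summarizes posterior uncertainty about which latent score is larger, which is the kind of quantity each table cell reports.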

In terms of attention, the overall probability that the preference score of Optimal was greater than those of the three random parameter sets was 99.92% (0.9992 = 0.9998 × 1.0000 × 0.9994). For naturalness, the overall probability was 86.3% (0.8632 = 0.8826 × 0.9986 × 0.9794).
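
Each overall figure is the product of the three pairwise probabilities for Optimal against the random parameter sets. The short check below recomputes both products from the factors quoted above; the variable names are hypothetical.

```python
import math

# Pairwise probabilities that Optimal beats each random parameter set.
attention_factors = [0.9998, 1.0000, 0.9994]
naturalness_factors = [0.8826, 0.9986, 0.9794]

attention = math.prod(attention_factors)
naturalness = math.prod(naturalness_factors)

print(f"attention:   {attention:.4f}")
print(f"naturalness: {naturalness:.4f}")
```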

Conclusion

Our study provides evidence that the user experience can be improved when an appropriate auto-switching setting is designed. This finding highlights the importance of future research in advancing this auto-switching mechanism.

Citation

@ARTICLE{10632136,
  author={Yoshida, Shigeo and Koyama, Yuki and Ushiku, Yoshitaka},
  journal={IEEE Access},
  title={Toward AI-Mediated Avatar-Based Telecommunication: Investigating Visual Impression of Switching Between User- and AI-Controlled Avatars in Video Chat},
  year={2024},
  volume={12},
  number={},
  pages={113372-113383},
  keywords={Avatars;Switches;Web conferencing;Communications technology;Artificial intelligence;User experience;Task analysis;Bayes methods;Human computer interaction;Telecommunications;Avatar;adaptive experimental design;Bayesian optimization;human-computer interaction;telecommunication},
  doi={10.1109/ACCESS.2024.3441233}
}