Semantics2Hands: Transferring Hand Motion Semantics between Avatars

Tsinghua University
ACM MM 2023

Given a source hand motion and a target hand model, our method retargets the motion to the target with high fidelity while preserving its intricate semantics.


Human hands, the primary means of non-verbal communication, convey intricate semantics in various scenarios. Because individuals are highly sensitive to hand motions, even minor errors in them can significantly degrade the user experience. Real applications often involve multiple avatars with varying hand shapes, making it important to maintain the intricate semantics of hand motions across avatars. This paper therefore aims to transfer hand motion semantics between diverse avatars based on their respective hand models. To address this problem, we introduce a novel anatomy-based semantic matrix (ASM) that encodes the semantics of hand motions. The ASM quantifies the positions of the palm and other joints relative to the local frame of the corresponding joint, enabling precise retargeting of hand motions. We then obtain a mapping from the source ASM to the target hand joint rotations with an anatomy-based semantics reconstruction network (ASRN), trained using a semi-supervised learning strategy on the Mixamo and InterHand2.6M datasets. We evaluate our method on intra-domain and cross-domain hand motion retargeting tasks. The qualitative and quantitative results demonstrate the significant superiority of our ASRN over state-of-the-art methods.

Although the body motions are accurate, errors introduced by directly copying finger joint rotations make the "thumb-up" gesture illegible.

Twist-bend-splay Frame Annotation

We first use our annotation tool to annotate the twist-bend-splay frames of different hand models. The tool semi-automatically derives the twist-bend-splay frame orientations of the finger joints from the model's kinematic tree and mesh information.

Left: Twist-bend-splay frames obtained from different hand models using our annotation tool.
Right: Finger movements in the twist, splay, and bend directions.
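As a concrete illustration, a twist-bend-splay frame for a finger joint can be derived from the kinematic tree alone. The following is a minimal sketch, not the released annotation tool: it assumes the twist axis points along the bone toward the child joint, the splay axis is the component of a palm-normal direction orthogonal to the bone, and the bend axis completes a right-handed frame.

```python
import numpy as np

def twist_bend_splay_frame(joint_pos, child_pos, palm_normal):
    """Sketch: derive a twist-bend-splay frame for one finger joint.

    Assumptions (illustrative, not the paper's exact tool): the twist
    axis runs along the bone toward the child joint, the splay axis is
    the palm normal projected orthogonal to the bone, and the bend
    axis completes a right-handed orthonormal frame.
    """
    twist = child_pos - joint_pos
    twist = twist / np.linalg.norm(twist)
    # Project the palm normal onto the plane orthogonal to the bone.
    splay = palm_normal - np.dot(palm_normal, twist) * twist
    splay = splay / np.linalg.norm(splay)
    # Right-handed completion: twist x bend = splay.
    bend = np.cross(splay, twist)
    return np.stack([twist, bend, splay])  # rows are the local axes
```

In practice the palm-normal direction would come from the mesh, which is where the semi-automatic part of the annotation enters.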

Anatomy-based Semantic Matrix

We then use the twist-bend-splay frames to construct the anatomy-based semantic matrix (ASM). The ASM quantifies the positions of the palm and other joints relative to the local frame of the corresponding joint, enabling precise retargeting of hand motions.

Left: The inter-finger semantic features capture the subtle semantics of finger movements.
Right: The palm-finger semantic features capture the overall hand posture.
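The ASM construction described above can be sketched in a few lines. This is a simplified reading, with illustrative names and array shapes: for each joint, the positions of the palm and every other joint are expressed in that joint's local twist-bend-splay frame.

```python
import numpy as np

def anatomy_based_semantic_matrix(joint_pos, frames, palm_pos):
    """Sketch of the ASM (simplified; shapes are illustrative).

    joint_pos: (J, 3) joint positions in world space
    frames:    (J, 3, 3) per-joint twist-bend-splay frames (rows = axes)
    palm_pos:  (3,) palm position

    Returns a (J, J+1, 3) array: entry [i, j] is the position of joint j
    (or, in the last column, the palm) relative to joint i, expressed in
    joint i's local frame.
    """
    J = joint_pos.shape[0]
    targets = np.concatenate([joint_pos, palm_pos[None]], axis=0)  # (J+1, 3)
    asm = np.empty((J, J + 1, 3))
    for i in range(J):
        rel = targets - joint_pos[i]   # offsets from joint i
        asm[i] = rel @ frames[i].T     # rotate into joint i's local frame
    return asm
```

The off-diagonal joint-joint entries correspond to the inter-finger semantic features, while the palm column corresponds to the palm-finger features.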

Semantics-Preserving Retargeting

The hand retargeting pipeline comprises two stages: semantic feature extraction and semantics-preserving reconstruction. We extract semantic matrices from the source hand motion during the first stage. In the second stage, we employ the anatomy-based semantics reconstruction network (ASRN) to reconstruct hand motion on the target hand model from the source ASM while preserving the source semantics.

The extraction stage retrieves the ASM from the source hand motion. The reconstruction stage uses the source ASM together with the target hand shape parameter and target hand anatomical parameter to reconstruct the target hand motion.
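The interface of the reconstruction stage can be sketched as follows. This is a stand-in two-layer MLP, not the paper's ASRN architecture; the dimensions and parameter names are assumptions chosen only to show the data flow from (source ASM, target shape, target anatomy) to per-joint target rotations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
J, D_SHAPE, D_ANAT, H = 15, 10, 5, 64
D_IN = J * (J + 1) * 3 + D_SHAPE + D_ANAT

# Randomly initialized stand-in parameters.
params = {
    "W1": 0.1 * rng.standard_normal((D_IN, H)), "b1": np.zeros(H),
    "W2": 0.1 * rng.standard_normal((H, J * 3)), "b2": np.zeros(J * 3),
}

def asrn_forward(src_asm, tgt_shape, tgt_anat, params):
    """Sketch of the reconstruction stage's signature.

    Maps the flattened source ASM plus the target shape and anatomical
    parameters to per-joint rotations (here axis-angle, 3 values per
    joint) via a toy MLP standing in for the ASRN.
    """
    x = np.concatenate([src_asm.ravel(), tgt_shape, tgt_anat])
    h = np.tanh(x @ params["W1"] + params["b1"])
    return (h @ params["W2"] + params["b2"]).reshape(J, 3)
```

The point of the sketch is the conditioning: the same source ASM yields different joint rotations for different target shape/anatomy parameters, which is how the semantics survive a change of hand model.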


@inproceedings{semantics2hands2023,
          title={Semantics2Hands: Transferring Hand Motion Semantics between Avatars},
          author={Ye, Zijie and Jia, Jia and Xing, Junliang},
          booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
          year={2023}
}