I am a Research Engineer at Apple, where I work on multimodal generative modeling to power Apple Intelligence's image generation experiences.
Prior to this, I received my Ph.D. from the University of California, Merced, where I was advised by Prof. Ming-Hsuan Yang in the Vision and Learning Lab. I completed my M.S. in Computer Science at The University of Texas at Austin and my B.S. in Electrical Engineering at National Taiwan University.
Move-in-2D: 2D-Conditioned Human Motion Generation Hsin-Ping Huang, Yang Zhou, Jui-Hsien Wang, Difan Liu, Feng Liu, Ming-Hsuan Yang, Zhan Xu CVPR 2025 Move-in-2D generates human motion sequences conditioned on a scene image and a text prompt, using a diffusion model trained on a large-scale dataset of annotated human motions. Project Page / Paper
Fine-grained Controllable Video Generation via Object Appearance and Context Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang WACV 2025 FACTOR is a video generation model that provides fine-grained control over object appearance, context, and location by optimizing inserted attention layers with large-scale annotations. Project Page / Paper / Media (AK)
Generating Long-take Videos via Effective Keyframes and Guidance Hsin-Ping Huang, Yu-Chuan Su, Ming-Hsuan Yang WACV 2025 We propose a framework for generating long-take videos with multiple coherent events by decoupling video generation into keyframe generation and frame interpolation. Paper / Media (AK)
Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing Hsin-Ping Huang, Deqing Sun, Yaojie Liu, Wen-Sheng Chu, Taihong Xiao, Jinwei Yuan, Hartwig Adam, Ming-Hsuan Yang ECCV 2022 We present adaptive vision transformers for face anti-spoofing, introducing ensemble adapters and feature-wise transformation layers for domain adaptation with few samples. Project Page / Paper
Learning to Stylize Novel Views Hsin-Ping Huang, Hung-Yu Tseng, Saurabh Saini, Maneesh Singh, Ming-Hsuan Yang ICCV 2021 We tackle 3D scene stylization, generating stylized images from novel views by constructing a point cloud, aggregating style statistics, and modulating features with a linear transformation. Project Page / Paper / Media (AK)
Unsupervised and Semi-Supervised Few-Shot Acoustic Event Classification Hsin-Ping Huang, Krishna C. Puvvada, Ming Sun, Chao Wang ICASSP 2021 We study semi-supervised few-shot acoustic event classification, learning audio representations from a large amount of unlabeled data and using these representations for classification. Paper
Semantic View Synthesis Hsin-Ping Huang, Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang ECCV 2020 We address semantic view synthesis: generating free-viewpoint renderings of a synthesized scene from a semantic label map by synthesizing a multiplane image (MPI) representation. Project Page / Paper / Media (AK)
Unsupervised Adversarial Domain Adaptation for Implicit Discourse Relation Classification Hsin-Ping Huang, Junyi Jessy Li CoNLL 2019 We present an unsupervised adversarial domain adaptive network with a reconstruction component that leverages explicit discourse relations to classify implicit discourse relations. Paper