Kazuhito Koishida

Principal Research Manager

Redmond, Washington
U.S.A.

Kazuhito Koishida is a principal lead scientist in Microsoft’s Applied Sciences Group in the Experiences + Devices organization. He has been with Microsoft since 2000. His research interests are signal processing and machine learning for audio, speech, computer vision, and other sensor data.

Past projects

Audio and voice compression: Bitrate/bandwidth scalable codec, MELP codec at 1.2 kbps, and Windows Media audio and voice codec
Audio matching: Voice note application and music recognition service
Microphone array processing: Beamforming and sound source localization
Audio/voice detection and recognition: Keyword spotting and speaker identification
Speech enhancement: Audiovisual fusion and bandwidth expansion

Education

B.S. degree in electrical engineering from the Tokyo Institute of Technology, Japan, 1994
M.S. degree in electrical engineering from the Tokyo Institute of Technology, Japan, 1995
Ph.D. degree in electrical engineering from the Tokyo Institute of Technology, Japan, 1998. Dissertation title: "Speech Coding Based on Mel-Generalized Cepstral Analysis"
Postdoctoral researcher at the Signal Compression Lab at the University of California, Santa Barbara, 1998-2000

Windows Agent Arena

CorrGAN: Simultaneous Learning of Speech Enhancement and Perceptual Quality Loss Functions

To appear in ICASSP 2025

By Vasily Zadorozhnyy, Saeed Amizadeh, Qiang Ye, Kazuhito Koishida
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

NeurIPS Workshop 2024 December, 2024

By Rogerio Bonatti, Danny Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Jang, Zack Hui
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

NeurIPS Workshop 2024 December, 2024

By Lawrence Jang, Yinheng Li, Charles Ding, Justin Lin, Paul Pu Liang, Danny Zhao, Rogerio Bonatti, Kazuhito Koishida
Learned Image Compression with Text Quality Enhancement

Proceedings, 2024 IEEE International Conference on Image Processing (ICIP) October, 2024

By Andrew Lai (Chih-Yu Lai), Dung Tran, Kazuhito Koishida
Automatic Disfluency Detection From Untranscribed Speech

IEEE/ACM Transactions on Audio, Speech, and Language Processing October, 2024 Vol. 32 Pages 4727-4740

By Amrit Romana, Kazuhito Koishida, Emily Mower Provost
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

Proceedings, Interspeech 2024 September, 2024

By Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, Somayeh Sojoudi
LiveSpeech: Low-Latency Zero-Shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Proceedings, Interspeech 2024 September, 2024

By Trung Dang, David Aponte, Dung Tran, Kazuhito Koishida
Data Generation using Large Language Models for Text Classification: An Empirical Case Study

DMLR Workshop in ICML 2024 July, 2024

By Yinheng Li, Rogerio Bonatti, Sara Abdali, Justin Wagle, Kazuhito Koishida
Weakly-Supervised Audio Separation via Bi-modal Semantic Similarity

Proceedings of the Twelfth International Conference on Learning Representations (ICLR) May, 2024 Vol. abs/2404.01740

By Tanvir Mahmud, Saeed Amizadeh, Kazuhito Koishida, Diana Marculescu
uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures

Proceedings, ICASSP 2024 April, 2024 Pages 5435-5439

By Afrina Tabassum, Dung Tran, Trung Dang, Ismini Lourentzou, Kazuhito Koishida
Improving Low-Latency Mono-Channel Speech Enhancement by Compensation Windows in STFT Analysis

Proceedings, International Conference on Complex Networks and Their Applications November, 2023 Pages 363-373 ISBN: 978-3-031-53468-3

By Minh Bui, Dung Tran, Kazuhito Koishida, Trac D. Tran, Peter Chin
SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks

Proceedings, Interspeech 2023 August, 2023 Pages 2463-2467

By Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida
Progressive Knowledge Distillation: Balancing Inference Latency and Accuracy at Runtime

Workshop on Efficient Systems for Foundation Models @ ICML2023 July, 2023

By Don Dennis, Abhishek Shetty, Anish Sevekari, Kazuhito Koishida, Virginia Smith
Toward A Multimodal Approach for Disfluency Detection and Categorization

Proceedings, ICASSP 2023 June, 2023 Pages 1-5

By Amrit Romana, Kazuhito Koishida
Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features

Proceedings, ICASSP 2022 May, 2022 Pages 6557-6561

By Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida
A Training Framework for Stereo-Aware Speech Enhancement Using Deep Neural Networks

Proceedings, ICASSP 2022 May, 2022 Pages 6962-6966

By Bahareh Tolooshams, Kazuhito Koishida
Single-Channel Speech Enhancement Using Learnable Loss Mixup

Proceedings, Interspeech 2021 August, 2021 Pages 2696-2700

By Oscar Chang, Dung Tran, Kazuhito Koishida
INTERSPEECH 2021 Deep Noise Suppression Challenge

Proceedings, Interspeech 2021 August, 2021 Pages 2796-2800

By Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan
Cascaded Time + Time-Frequency Unet for Speech Enhancement: Jointly Addressing Clipping, Codec Distortions, and Gaps

Proceedings, ICASSP 2021 June, 2021 Pages 7153-7157

By Arun Nair, Kazuhito Koishida
Online Directional Speech Enhancement Using Geometrically Constrained Independent Vector Analysis

Proceedings, Interspeech 2020 October, 2020 Pages 61-65

By Li Li, Kazuhito Koishida, Shoji Makino
Low-Latency Single Channel Speech Dereverberation using U-Net Convolutional Neural Networks

Proceedings, Interspeech 2020 October, 2020 Pages 2442-2446

By Ahmet Bulut, Kazuhito Koishida
Robust Pitch Regression with Voiced/Unvoiced Classification in Nonstationary Noise Environments

Proceedings, Interspeech 2020 October, 2020 Pages 175-179

By Dung Tran, Uros Batricevic, Kazuhito Koishida
Single-Channel Speech Enhancement by Subspace Affinity Minimization

Proceedings, Interspeech 2020 October, 2020 Pages 2447-2451

By Dung Tran, Kazuhito Koishida
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"

Proceedings of the 37th International Conference on Machine Learning (ICML), Vienna, Austria July, 2020

By Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida
MMTM: Multimodal Transfer Module for CNN Fusion

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) June, 2020

By Hamid Vaezi Joze, Amirreza Shaban, Michael Iuzzolino, Kazuhito Koishida
Improved Active Speaker Detection Based on Optical Flow

IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) June, 2020 Pages 4084-4090

By Chong Huang, Kazuhito Koishida
Low-Latency Single Channel Speech Enhancement Using U-Net Convolutional Neural Networks

Proceedings, ICASSP 2020 May, 2020 Pages 6214-6218

By Ahmet Bulut, Kazuhito Koishida
Geometrically Constrained Independent Vector Analysis For Directional Speech Enhancement

Proceedings, ICASSP 2020 May, 2020 Pages 846-850

By Li Li, Kazuhito Koishida
AV(SE)²: Audio-Visual Squeeze-Excite Speech Enhancement

Proceedings, ICASSP 2020 May, 2020 Pages 7539-7543

By Michael Iuzzolino, Kazuhito Koishida
Sound Event Detection in Multichannel Audio Using Convolutional Time-Frequency-Channel Squeeze and Excitation

Proceedings, Interspeech 2019 September, 2019 Pages 3629-3633

By Wei Xia, Kazuhito Koishida
Adversarial Training for Speech Super-Resolution

IEEE Journal of Selected Topics in Signal Processing May, 2019 Vol. 13, No. 2 Pages 347-358

By Sefik Emre Eskimez, Kazuhito Koishida, Zhiyao Duan
Speech Super Resolution Generative Adversarial Network

Proceedings, 2019 International Conference on Acoustics, Speech and Signal Processing (ICASSP) May, 2019 Pages 3717-3721

By Sefik Emre Eskimez, Kazuhito Koishida
Text Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings

IEEE/ACM Transactions on Audio, Speech, and Language Processing September, 2018 Vol. 26, No. 9 Pages 1633-1644

By Chunlei Zhang, Kazuhito Koishida, John H. L. Hansen
End-to-End Text-Independent Speaker Verification with Flexibility in Utterance Duration

Proceedings, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) December, 2017 Pages 584-590

By Chunlei Zhang, Kazuhito Koishida
End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances

Proceedings, Interspeech 2017 Pages 1487-1491

By Chunlei Zhang, Kazuhito Koishida
Hybrid Low Bitrate Audio Coding Using Adaptive Gain Shape Vector Quantization

Proceedings, 2008 IEEE 10th Workshop on Multimedia Signal Processing October, 2008 Pages 927-932

By Sanjeev Mehrotra, Wei-ge Chen, Kazuhito Koishida, Naveen Thumpudi
A 1200/2400 BPS Coding Suite Based on MELP

Speech Coding, 2002, IEEE Workshop Proceedings October, 2002 Pages 90-92

By Tian Wang, Kazuhito Koishida, Vladimir Cuperman, Allen Gersho, J.S. Collura
Vector Quantization of Speech Spectral Parameters Using Statistics of Static and Dynamic Features

IEICE Transactions on Information and Systems October, 2001 Vol. E84-D, No. 10 Pages 1427-1434

By Kazuhito Koishida, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi
Enhancing MPEG-4 CELP by Jointly Optimized Inter/Intra-frame LSP Predictors

2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421) September, 2000 Pages 90-92

By Kazuhito Koishida, Jan Lindén, Vladimir Cuperman, Allen Gersho
A 16-kbit/s Bandwidth Scalable Audio Coder Based on the G.729 Standard

Proceedings, 2000 International Conference on Acoustics, Speech and Signal Processing (ICASSP) June, 2000 Pages 1149-1152

By Kazuhito Koishida, Vladimir Cuperman, Allen Gersho
A 1200 BPS Speech Coder Based on MELP

2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100) June, 2000 Vol. 3 Pages 1375-1378

By Tian Wang, Kazuhito Koishida, Vladimir Cuperman, Allen Gersho, J.S. Collura
A 16 kb/s Wideband CELP-based Speech Coder Using Mel-Generalized Cepstral Analysis

IEICE Transactions on Information and Systems April, 2000 Vol. 83 Pages 876-883

By Kazuhito Koishida, Gou Hirabayashi, Keiichi Tokuda, Takao Kobayashi
A Wideband CELP Speech Coder at 16 kbit/s Based on Mel-Generalized Cepstral Analysis

Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181) May, 1998 Vol. 1 Pages 161-164

By Kazuhito Koishida, Keiichi Tokuda, Gou Hirabayashi, Takao Kobayashi
CELP Speech Coding Based on Mel-Generalized Cepstral Analysis

IEICE Transactions on Information and Systems February, 1998 Vol. J81-A, No. 2 Pages 252-260

By Kazuhito Koishida, Keiichi Tokuda, Satoshi Imai, Takao Kobayashi
A 16 kbit/s Wideband CELP Coder Using Mel-Generalized Cepstral Analysis and Its Subjective Evaluation

Proceedings, 5th International Conference on Spoken Language Processing (ICSLP '98) 1998 Vol. 6 Pages 2583-2586

By Kazuhito Koishida, Gou Hirabayashi, Keiichi Tokuda, Takao Kobayashi
Low Bit Rate Speech Coding Based on Mel-Generalized Cepstral Analysis

Tokyo Institute of Technology 1998

By Kazuhito Koishida
Spectral Representation of Speech Based on Mel-Generalized Cepstral Coefficients and Its Properties

IEICE Transactions on Information and Systems November, 1997 Vol. J80-A, No. 11 Pages 1999-2006

By Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai
Spectral Quantization Using Statistics of Static and Dynamic Features

1997 IEEE Workshop on Speech Coding for Telecommunications Proceedings. Back to Basics: Attacking Fundamental Problems in Speech Coding September, 1997 Pages 19-20

By Kazuhito Koishida, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi
Efficient Encoding of Mel-Generalized Cepstrum for CELP Coders

Proceedings, 1997 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) April, 1997 Vol. 2 Pages 1355-1358

By Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, S. Imai
CELP Coding System Based on Mel-Generalized Cepstral Analysis

Proceedings, 4th International Conference on Spoken Language Processing (ICSLP '96) October, 1996 Vol. 1 Pages 318-321

By Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, S. Imai
CELP Coding Based on Mel-Cepstral Analysis

Proceedings, 1995 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) May, 1995 Vol. 1 Pages 33-36

By Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, S. Imai
