Kazuhito Koishida
Principal Research Manager
I am a Principal Lead Scientist in the Applied Sciences Group within the Experiences + Devices organization. I have been with Microsoft since 2000. My areas of interest are signal processing and machine learning for audio, speech, computer vision, and other sensor data.
Past projects
- Audio and voice compression: Bitrate/bandwidth-scalable codec, MELP codec at 1.2 kbps, and Windows Media Audio and Voice codec
- Audio matching: Voice note application and music recognition service
- Microphone array processing: Beamforming and sound source localization
- Audio/voice detection and recognition: Keyword spotting and speaker identification
- Speech enhancement: Audio/visual fusion and bandwidth expansion
Education
- B.S. degree in Electrical Engineering from the Tokyo Institute of Technology, Japan, in 1994
- M.S. degree in Electrical Engineering from the Tokyo Institute of Technology, Japan, in 1995
- Ph.D. degree in Electrical Engineering from the Tokyo Institute of Technology, Japan, in 1998. Dissertation title: Speech Coding Based on Mel-Generalized Cepstral Analysis
- Postdoctoral researcher at the Signal Compression Lab at the University of California, Santa Barbara, 1998-2000
Publications
- Toward A Multimodal Approach for Disfluency Detection and Categorization. Proceedings, ICASSP 2023, June 2023, pages 1-5
- Proceedings, Interspeech 2023, August 2023, pages 2463-2467
- Workshop on Efficient Systems for Foundation Models @ ICML2023, July 2023
- Proceedings, ICASSP 2022, May 2022, pages 6557-6561
- Proceedings, ICASSP 2022, May 2022, pages 6962-6966
- Proceedings, ICASSP 2021, June 2021, pages 7153-7157
- Proceedings, Interspeech 2021, August 2021, pages 2696-2700
- Proceedings, Interspeech 2021, August 2021, pages 2796-2800
- Proceedings, Interspeech 2020, October 2020, pages 2447-2451
- The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020
- Proceedings of the 37th International Conference on Machine Learning (ICML), Vienna, Austria, July 2020
- Low-Latency Single Channel Speech Enhancement Using U-Net Convolutional Neural Networks. Proceedings, ICASSP 2020, May 2020, pages 6214-6218
- Proceedings, ICASSP 2020, May 2020, pages 846-850
- AV(SE)²: Audio-Visual Squeeze-Excite Speech Enhancement. Proceedings, ICASSP 2020, May 2020, pages 7539-7543
- IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2020, pages 4084-4090
- Proceedings, Interspeech 2020, October 2020, pages 61-65
- Proceedings, Interspeech 2020, October 2020, pages 2442-2446
- Proceedings, Interspeech 2020, October 2020, pages 175-179
- Adversarial Training for Speech Super-Resolution. IEEE Journal of Selected Topics in Signal Processing, May 2019, Vol. 13, No. 2, pages 347-358
- Speech Super Resolution Generative Adversarial Network. Proceedings, 2019 International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2019, pages 3717-3721
- Proceedings, Interspeech 2019, September 2019, pages 3629-3633
- Text Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, September 2018, Vol. 26, No. 9, pages 1633-1644
- End-to-End Text-Independent Speaker Verification with Flexibility in Utterance Duration. Proceedings, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2017, pages 584-590
- End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances. Proceedings, Interspeech 2017, pages 1487-1491
- Hybrid Low Bitrate Audio Coding Using Adaptive Gain Shape Vector Quantization. Proceedings, 2008 IEEE 10th Workshop on Multimedia Signal Processing, October 2008, pages 927-932
- A 1200/2400 BPS Coding Suite Based on MELP. Proceedings, 2002 IEEE Workshop on Speech Coding, October 2002, pages 90-92
- Vector Quantization of Speech Spectral Parameters Using Statistics of Static and Dynamic Features. IEICE Transactions on Information and Systems, October 2001, Vol. E84-D, No. 10, pages 1427-1434
- A 16 kb/s Wideband CELP-based Speech Coder Using Mel-Generalized Cepstral Analysis. IEICE Transactions on Information and Systems, April 2000, Vol. E83-D, No. 4, pages 876-883
- A 16-kbit/s Bandwidth Scalable Audio Coder Based on the G.729 Standard. Proceedings, 2000 International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 2000, pages 1149-1152
- A 1200 BPS Speech Coder Based on MELP. Proceedings, 2000 International Conference on Acoustics, Speech and Signal Processing (ICASSP), June 2000, pages 1375-1378
- Enhancing MPEG-4 CELP by Jointly Optimized Inter/Intra-frame LSP Predictors. Proceedings, 2000 IEEE Workshop on Speech Coding, September 2000, pages 90-92
- CELP Speech Coding Based on Mel-Generalized Cepstral Analysis. IEICE Transactions on Information and Systems, February 1998, Vol. J81-A, No. 2, pages 252-260
- A 16 kbit/s Wideband CELP Coder Using Mel-Generalized Cepstral Analysis and Its Subjective Evaluation. Proceedings, 5th International Conference on Spoken Language Processing (ICSLP '98), 1998, Vol. 6, pages 2583-2586
- A Wideband CELP Speech Coder at 16 kbit/s Based on Mel-Generalized Cepstral Analysis. Proceedings, 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 1998, Vol. 1, pages 161-164
- Low Bit Rate Speech Coding Based on Mel-Generalized Cepstral Analysis. Tokyo Institute of Technology, 1998
- Spectral Representation of Speech Based on Mel-Generalized Cepstral Coefficients and Its Properties. IEICE Transactions on Information and Systems, November 1997, Vol. J80-A, No. 11, pages 1999-2006
- Spectral Quantization Using Statistics of Static and Dynamic Features. Proceedings, 1997 IEEE Workshop on Speech Coding for Telecommunications, September 1997, pages 19-20
- Efficient Encoding of Mel-Generalized Cepstrum for CELP Coders. Proceedings, 1997 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 1997, Vol. 2, pages 1355-1358
- CELP Coding System Based on Mel-Generalized Cepstral Analysis. Proceedings, 4th International Conference on Spoken Language Processing (ICSLP '96), October 1996, Vol. 1, pages 314-317
- CELP Coding System Based on Mel-Cepstral Analysis. Proceedings, 1995 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), October 1995, Vol. 1, pages 33-36