Curriculum Vitae

General Information

Full Name Zheng Gao
Contact woshigaozheng [at] gmail [dot] com
Research Interests Large Language Models, Natural Language Processing, Graph Mining
Languages English, Chinese

Education

  • 2015 - 2020
    Ph.D. in Information Science
    Indiana University Bloomington, United States
  • 2013 - 2015
    M.S. in Information Science
    University of Pittsburgh, United States
  • 2009 - 2013
    B.M. in Information Management and System
    Shanghai International Studies University, China

Experience

  • 02/2023 - present
    Senior Algorithm Engineer
    Ant Group
    • Trained Ant Group's self-developed 10B and 65B large language models (LLMs). Supported the supervised fine-tuning stage to improve LLM reasoning capability.
    • Led the implementation and maintenance of the company-level LLM evaluation pipeline, which contained 50+ public/private datasets and 20+ evaluation metrics. It served as the default evaluation toolkit for all major Ant Group Artificial General Intelligence (AGI) teams.
    • Implemented LLM agents to support two internal Ant Group human-resources use cases: meeting arrangement, and hiring and resignation processing.
  • 06/2020 - 01/2023
    Applied Scientist
    Amazon Alexa AI
    • Built Natural Language Understanding (NLU) pipelines for Alexa to support customer utterance interpretation.
    • Expanded Alexa NLU pipelines from English-language regions to other language regions by incorporating contextual signals.
    • Developed third-party (3P) skill recommendations for fallback utterances to improve customer experience.
  • 06/2019 - 09/2019
    Data Scientist Intern
    Amazon Alexa AI
    • Applied deep language models and state-of-the-art clustering methods to extract influential text patterns from user requests.
    • Built an automated pipeline with Spark and shell scripts to train models on multiple data sources within Alexa's restricted environment, replacing manual annotation.
  • 02/2018 - 03/2019
    NLP Research Intern
    Alibaba DAMO Academy / AI Lab
    • Generated product review summaries from users' consecutive behaviors by leveraging dynamic matrix factorization, deep reinforcement learning (policy gradient), and a sequence-to-sequence model (neural machine translation) with attention.
    • Proposed an end-to-end pairwise ranking model with transfer learning techniques to detect communities in targeted sparse graphs.
    • Detected multilevel anomalies in high-dimensional dynamic usage logs via an adversarial autoencoder and attention-based hierarchical representation learning.

Services

  • Conference Reviewer
    • Annual Meeting of the Association for Computational Linguistics (ACL 2024)
    • iConference (2023, 2024)
    • ACM International Conference on Web Search and Data Mining (WSDM 2023, 2024)
    • AAAI Conference on Artificial Intelligence (AAAI 2022, 2023, 2024)
    • International Workshop on Deep Learning Practice for High-Dimensional Sparse Data (DLP-RecSys 2023; DLP-KDD 2020, 2021)
    • Workshop on Information Extraction from Scientific Publications (WIESP-AACL 2022)
    • The Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
    • China Conference on Knowledge Graph and Semantic Computing (CCKS 2022)
    • Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE 2022)
    • IEEE International Conference on Multimedia and Expo (ICME 2022)
    • The Web Conference (WWW 2019, 2020)
    • ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2018, 2022)
    • IEEE International Conference on Big Data (BigData 2020, 2022)
    • Joint Conference on Digital Libraries (JCDL 2021, 2022)
    • International Workshop on Knowledge Graph (IWKG-KDD 2020)
    • Workshop on Scholarly Document Processing (SDP-NAACL 2021, SDP-COLING 2022)
    • International Conference on Information Systems (ICIS 2021)
    • China Conference on Information Retrieval (CCIR 2021)
  • Journal Reviewer
    • Data Intelligence (2022)
    • The Social Science Journal (2022)
    • Journal of Informetrics (JOI 2021)
    • Computers in Industry (2021)
    • Journal of the Association for Information Science and Technology (JASIST 2019, 2021)
    • PeerJ Computer Science (2020)
    • PLoS ONE (2020, 2021)
    • BMC Bioinformatics (2019, 2020, 2022)
    • Social Network Analysis and Mining (SNAM 2019, 2020, 2021)
    • Medical Science Monitor (2019)
    • ACM Transactions on Computing for Healthcare (2020)
  • Funding Reviewer
    • Amazon Research Awards (ARA 2022)
  • Administrative Service
    • Chair of the Doctoral Student Association (DSA), Department of Information and Library Science, Indiana University Bloomington (2016 - 2018)

Honors and Awards

  • 2018 - 2019
    • Clayton A. Shepherd Scholarship, Indiana University Bloomington
  • 2015 - 2018
    • T’ung-li Yuan Memorial Fellowship, Indiana University Bloomington