Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Posts

publications

ICDAR 2019 Competition on Table Detection and Recognition (cTDaR)

Gao, Liangcai, Yilun Huang, Hervé Déjean, Jean-Luc Meunier, Qinqin Yan, Yu Fang, Florian Kleber, and Eva Lang. "ICDAR 2019 competition on table detection and recognition (cTDaR)." In 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1510-1515. IEEE, 2019. [link][code]

A GAN-based feature generator for table detection

Li, Yibo, Liangcai Gao, Zhi Tang, Qinqin Yan, and Yilun Huang. "A GAN-based feature generator for table detection." In 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 763-768. IEEE, 2019. [link]

A YOLO-based table detection method

Huang, Yilun, Qinqin Yan, Yibo Li, Yifan Chen, Xiong Wang, Liangcai Gao, and Zhi Tang. "A YOLO-based table detection method." In 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 813-818. IEEE, 2019. [link]

NTable: A Dataset for Camera-Based Table Detection

Zhu, Ziyi, Liangcai Gao, Yibo Li, Yilun Huang, Lin Du, Ning Lu, and Xianfeng Wang. "NTable: A Dataset for Camera-Based Table Detection." In International Conference on Document Analysis and Recognition, pp. 117-129. Springer, Cham, 2021. [link][code]

Rethinking table structure recognition using sequence labeling methods

Li, Yibo, Yilun Huang, Ziyi Zhu, Lemeng Pan, Yongshuai Huang, Lin Du, Zhi Tang, and Liangcai Gao. "Rethinking table structure recognition using sequence labeling methods." In International Conference on Document Analysis and Recognition, pp. 541-553. Springer, Cham, 2021. [link][code]

DAMO-YOLO: A Report on Real-Time Object Detection Design

Xu, Xianzhe*, Yiqi Jiang*, Weihua Chen*, Yilun Huang*, Yuan Zhang*, and Xiuyu Sun. "DAMO-YOLO: A Report on Real-Time Object Detection Design." arXiv preprint arXiv:2211.15444 (2022). [link][code]

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Shen, Xuan, Yaohua Wang, Ming Lin, Yilun Huang, Hao Tang, Xiuyu Sun, and Yanzhi Wang. "DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6163-6173. 2023. [link][code]

Data-Juicer: A One-Stop Data Processing System for Large Language Models

Chen, Daoyuan*, Yilun Huang*, Zhijian Ma*, Hesen Chen*, Xuchen Pan, Ce Ge, Dawei Gao et al. "Data-juicer: A one-stop data processing system for large language models." In Companion of the 2024 International Conference on Management of Data, pp. 120-134. 2024. [link][code]

Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study

Jiao, Qirui, Daoyuan Chen, Yilun Huang, Yaliang Li, and Ying Shen. "Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study." arXiv preprint arXiv:2401.17981 (2024). [link]

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Qin, Zhen, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, and Shuiguang Deng. "The synergy between data and multi-modal large language models: A survey from co-development perspective." IEEE Transactions on Pattern Analysis and Machine Intelligence (2025). [link][code]

Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development

Chen, Daoyuan*, Haibin Wang*, Yilun Huang*, Ce Ge, Yaliang Li, Bolin Ding, and Jingren Zhou. "Data-juicer sandbox: A feedback-driven suite for multimodal data-model co-development." In Forty-second International Conference on Machine Learning. 2025. Spotlight. [link][code]

Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

Jiao, Qirui, Daoyuan Chen, Yilun Huang, Bolin Ding, Yaliang Li, and Ying Shen. "Img-diff: Contrastive data synthesis for multimodal large language models." In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 9296-9307. 2025. [link][code]

Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for Foundation Models

Chen, Daoyuan*, Yilun Huang*, Xuchen Pan, Nana Jiang, Haibin Wang, Ce Ge, Yushuo Chen et al. "Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for Foundation Models." arXiv preprint arXiv:2501.14755 (2024). [link][code]

DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?

Jiao, Qirui, Daoyuan Chen, Yilun Huang, Xika Lin, Ying Shen, and Yaliang Li. "DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?" arXiv preprint arXiv:2505.16915 (2025). [link][code]

Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models

Pan, Xuchen, Yanxi Chen, Yushuo Chen, Yuchang Sun, Daoyuan Chen, Wenhao Zhang, Yuexiang Xie, Yilun Huang, Yilei Zhang, Dawei Gao, Yaliang Li, Bolin Ding, and Jingren Zhou. "Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models." arXiv preprint arXiv:2505.17826 (2025). [link][code]

Yilun Huang

Pages

Page Not Found

Yilun Huang's Homepage

Archive Layout with Content

Posts by Category

Posts by Collection

CV

Page not in menu

Page Archive

Portfolio

Publications

Sitemap

Posts by Tags

Talk map

Talks and presentations

Teaching

Terms and Privacy Policy

Blog posts

Jupyter notebook markdown generator

Posts

Future Blog Post

Blog Post number 4
