Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
Pages
Posts
Publications
ICDAR 2019 Competition on Table Detection and Recognition (cTDaR)
Gao, Liangcai, Yilun Huang, Hervé Déjean, Jean-Luc Meunier, Qinqin Yan, Yu Fang, Florian Kleber, and Eva Lang. "ICDAR 2019 competition on table detection and recognition (cTDaR)." In 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1510-1515. IEEE, 2019. [link][code]
A GAN-based feature generator for table detection
Li, Yibo, Liangcai Gao, Zhi Tang, Qinqin Yan, and Yilun Huang. "A GAN-based feature generator for table detection." In 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 763-768. IEEE, 2019. [link]
A YOLO-based table detection method
Huang, Yilun, Qinqin Yan, Yibo Li, Yifan Chen, Xiong Wang, Liangcai Gao, and Zhi Tang. "A YOLO-based table detection method." In 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 813-818. IEEE, 2019. [link]
NTable: A Dataset for Camera-Based Table Detection
Zhu, Ziyi, Liangcai Gao, Yibo Li, Yilun Huang, Lin Du, Ning Lu, and Xianfeng Wang. "NTable: A Dataset for Camera-Based Table Detection." In International Conference on Document Analysis and Recognition, pp. 117-129. Springer, Cham, 2021. [link][code]
Rethinking table structure recognition using sequence labeling methods
Li, Yibo, Yilun Huang, Ziyi Zhu, Lemeng Pan, Yongshuai Huang, Lin Du, Zhi Tang, and Liangcai Gao. "Rethinking table structure recognition using sequence labeling methods." In International Conference on Document Analysis and Recognition, pp. 541-553. Springer, Cham, 2021. [link][code]
DAMO-YOLO: A Report on Real-Time Object Detection Design
Xu, Xianzhe*, Yiqi Jiang*, Weihua Chen*, Yilun Huang*, Yuan Zhang*, and Xiuyu Sun. "DAMO-YOLO: A Report on Real-Time Object Detection Design." arXiv preprint arXiv:2211.15444 (2022). [link][code]
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
Shen, Xuan, Yaohua Wang, Ming Lin, Yilun Huang, Hao Tang, Xiuyu Sun, and Yanzhi Wang. "DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6163-6173. 2023. [link][code]
Data-Juicer: A One-Stop Data Processing System for Large Language Models
Chen, Daoyuan*, Yilun Huang*, Zhijian Ma*, Hesen Chen*, Xuchen Pan, Ce Ge, Dawei Gao et al. "Data-juicer: A one-stop data processing system for large language models." In Companion of the 2024 International Conference on Management of Data, pp. 120-134. 2024. [link][code]
Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study
Jiao, Qirui, Daoyuan Chen, Yilun Huang, Yaliang Li, and Ying Shen. "Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study." arXiv preprint arXiv:2401.17981 (2024). [link]
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Qin, Zhen, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, and Shuiguang Deng. "The synergy between data and multi-modal large language models: A survey from co-development perspective." IEEE Transactions on Pattern Analysis and Machine Intelligence (2025). [link][code]
Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development
Chen, Daoyuan*, Haibin Wang*, Yilun Huang*, Ce Ge, Yaliang Li, Bolin Ding, and Jingren Zhou. "Data-juicer sandbox: A feedback-driven suite for multimodal data-model co-development." In Forty-second International Conference on Machine Learning. 2025. Spotlight. [link][code]
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Jiao, Qirui, Daoyuan Chen, Yilun Huang, Bolin Ding, Yaliang Li, and Ying Shen. "Img-diff: Contrastive data synthesis for multimodal large language models." In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 9296-9307. 2025. [link][code]
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for Foundation Models
Chen, Daoyuan*, Yilun Huang*, Xuchen Pan, Nana Jiang, Haibin Wang, Ce Ge, Yushuo Chen et al. "Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for Foundation Models." arXiv preprint arXiv:2501.14755 (2024). [link][code]
DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?
Jiao, Qirui, Daoyuan Chen, Yilun Huang, Xika Lin, Ying Shen, and Yaliang Li. "DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?" arXiv preprint arXiv:2505.16915 (2025). [link][code]
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
Pan, Xuchen, Yanxi Chen, Yushuo Chen, Yuchang Sun, Daoyuan Chen, Wenhao Zhang, Yuexiang Xie, Yilun Huang, Yilei Zhang, Dawei Gao, Yaliang Li, Bolin Ding, and Jingren Zhou. "Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models." arXiv preprint arXiv:2505.17826 (2025). [link][code]