WebUIBench consists of 5 categories of websites commonly visited by users: enterprise portals, background management systems, personal blogs, news sites, and e-commerce platforms.
For webpage data collection, our dataset consists of 719 full webpages and 2,488 webpage slices from the 5 categories, covering a variety of resolution modes. We open-source the screenshots (.png files), source HTML code (.html files), and element information (.json files) for these webpages. Based on this, WebUIBench includes 21,793 question-answer pairs, an average of 10.68 per webpage screenshot.
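Each webpage is thus released as a triple of screenshot, HTML source, and element annotations. A minimal loading sketch is shown below; the directory layout and the `elements`/`tag`/`bbox` field names are illustrative assumptions, so check the released .json files for the actual schema.

```python
import json
import pathlib
import tempfile

def load_page(page_dir: pathlib.Path, page_id: str):
    """Return (html_source, element_info) for one webpage.

    Assumes the .html and .json files for a page share a basename;
    the field names inside the .json are placeholders for illustration.
    """
    html = (page_dir / f"{page_id}.html").read_text(encoding="utf-8")
    elements = json.loads((page_dir / f"{page_id}.json").read_text(encoding="utf-8"))
    return html, elements

# Demo with synthetic files shaped like the description above.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "page_0001.html").write_text("<button>OK</button>", encoding="utf-8")
(tmp / "page_0001.json").write_text(
    json.dumps({"elements": [{"tag": "button", "bbox": [10, 20, 110, 60]}]}),
    encoding="utf-8",
)

html, elements = load_page(tmp, "page_0001")
print(len(elements["elements"]))  # number of annotated elements -> 1
```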
Download the dataset from: [🤗Huggingface] or [BaiduNetDisk]
Task abbreviations — WebUI Perception: EC = Element Classification, AP = Attribute Perception, VG = Visual Grounding; HTML Programming: CEC = Code Error Correcting, CFE = Code Function Editing; WebUI-HTML Understanding: WHM = WebUI-HTML Matching, WHR = WebUI-HTML Retrieval; W2C = WebUI-to-Code.
Please follow the requirements below to submit your evaluation results: save your model's predictions as a single .json file and send that .json file as an email attachment to zyllin@bjtu.edu.cn.
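A submission file could be assembled like this. The `question_id`/`answer` keys below are assumptions for illustration, not the benchmark's official schema; follow whatever format the submission requirements specify.

```python
import json

# Hypothetical prediction records: one entry per question-answer pair.
# The key names here are placeholders, not the benchmark's official schema.
predictions = [
    {"question_id": "webui_0001", "answer": "button"},
    {"question_id": "webui_0002", "answer": "<div class='nav'>...</div>"},
]

# Write the predictions as a single .json file for email submission.
with open("results.json", "w", encoding="utf-8") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)
```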
If you prefer to run the evaluation on your own, we provide reference code and a Docker image. See the details below:
```shell
docker pull example/benchmark:latest
```

If you find WebUIBench useful, please cite:

```bibtex
@article{lin2025webuibench,
  title={WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code},
  author={Lin, Zhiyu and Zhou, Zhengda and Zhao, Zhiyuan and Wan, Tianrui and Ma, Yilun and Gao, Junyu and Li, XueLong},
  journal={Findings of ACL},
  year={2025}
}
```