WebUIBench covers 5 categories of websites commonly visited by users: enterprise portals, back-office management systems, personal blogs, news sites, and e-commerce platforms.
For webpage data collection, our dataset consists of 719 full webpages and 2,488 webpage slices drawn from these 5 categories, covering a variety of resolution modes. We open-source the screenshots (.png files), source HTML code (.html files), and element information (.json files) for these webpages. Based on this, WebUIBench includes 21,793 question-answer pairs, with an average of 10.68 question-answer pairs per webpage screenshot.
Download the dataset from: [🤗Huggingface] or [BaiduNetDisk]
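After downloading, each webpage sample comes with a screenshot (.png), its source HTML (.html), and an element-information file (.json). The snippet below is a minimal sketch of pairing these files for one sample; the directory layout and file names are assumptions, so adjust the paths to match the released archive.

```python
import json
from pathlib import Path

# Hypothetical layout: one folder per webpage sample containing
# screenshot.png, page.html, and elements.json (names are assumptions,
# not the official release structure).
def load_sample(sample_dir: str) -> dict:
    sample = Path(sample_dir)
    html = (sample / "page.html").read_text(encoding="utf-8")
    elements = json.loads((sample / "elements.json").read_text(encoding="utf-8"))
    screenshot_path = sample / "screenshot.png"
    return {"html": html, "elements": elements, "screenshot": screenshot_path}

if __name__ == "__main__":
    sample = load_sample("webuibench/enterprise_portal/0001")  # hypothetical path
    print(f"HTML length: {len(sample['html'])} chars, "
          f"{len(sample['elements'])} annotated elements")
```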
Task abbreviations:
- WebUI Perception: EC = Element Classification, AP = Attribute Perception, VG = Visual Grounding
- HTML Programming: CEC = Code Error Correcting, CFE = Code Function Editing
- WebUI-HTML Understanding: WHM = WebUI-HTML Matching, WHR = WebUI-HTML Retrieval
- WebUI-to-Code: W2C
Please follow the requirements below to submit your evaluation results:
- Save your evaluation results in a single .json file (an illustrative sketch of such a file follows below).
- Send the .json file as an email attachment to zyllin@bjtu.edu.cn.
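The exact schema of the results file is not specified here, so the snippet below is only an illustrative sketch of collecting per-question predictions into a .json file; the field names `question_id` and `prediction` are assumptions, not the official submission format.

```python
import json

# Hypothetical structure: one record per benchmark question.
# "question_id" and "prediction" are assumed field names.
results = [
    {"question_id": "webui_perception_0001", "prediction": "button"},
    {"question_id": "html_programming_0042", "prediction": "<div class=\"nav\">...</div>"},
]

with open("webuibench_results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```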
If you prefer to run the evaluation on your own, we provide reference code and a Docker image. See the details below:
```
docker pull example/benchmark:latest
```
If you find WebUIBench useful in your research, please cite:

```bibtex
@article{lin2025webuibench,
  title={WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code},
  author={Zhiyu Lin and Zhengda Zhou and Zhiyuan Zhao and Tianrui Wan and Yilun Ma and Junyu Gao and XueLong Li},
  journal={Findings of ACL},
  year={2025}
}
```