OmniParser

Published 2025-02-27 | Software Sharing


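OmniParser is an open-source screen parsing tool from Microsoft that turns a screenshot of a user interface into structured elements that are easy to process. It is written in Python and builds on models such as YOLO, BLIP2, and Florence to detect icons fairly accurately and generate descriptive text for them. It can be integrated with mainstream large language models (for example GPT-4V), which makes it a good fit for building desktop automation applications.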

Installation

Download the source code

git clone https://github.com/microsoft/OmniParser
# Set up the runtime environment
cd OmniParser
conda create -n "omni" python==3.12
conda activate omni
pip install -r requirements.txt

Download the model files

Run the following in the source directory:

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
# download the model checkpoints to local directory OmniParser/weights/
git clone https://huggingface.co/microsoft/OmniParser weights
mv weights/icon_caption weights/icon_caption_florence
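
If git-lfs is not set up correctly, the clone can still finish while leaving small text pointer files in place of the real checkpoints, and the demo then fails when loading the models. Below is a minimal sketch of a sanity check for that case. It is not part of OmniParser: the helper names `is_lfs_pointer` and `find_lfs_pointers` are made up for illustration, and the default `weights` path simply assumes the directory layout created by the commands above.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (Git LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like an unfetched Git LFS pointer stub."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(weights_dir="weights"):
    """Return sorted paths under weights_dir that are still LFS pointer stubs.

    The directory name is an assumption based on the clone command above.
    """
    root = Path(weights_dir)
    if not root.is_dir():
        return []
    stubs = []
    for p in root.rglob("*"):
        # Skip git's own metadata; only inspect regular files.
        if ".git" in p.relative_to(root).parts or not p.is_file():
            continue
        if is_lfs_pointer(p):
            stubs.append(str(p))
    return sorted(stubs)


if __name__ == "__main__":
    for stub in find_lfs_pointers():
        print("still an LFS pointer, not a real checkpoint:", stub)
```

If this prints any paths, running `git lfs pull` inside the `weights` directory should fetch the actual files.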

Run the demo

python gradio_demo.py

The results are decent. Unfortunately my GPU is too weak: parsing a single phone home-screen screenshot took about 2 minutes.