ZHANGYUXUAN-zR committed (verified) · Commit 6fc2833 · Parent: 118ebd2

Update README.md

Files changed (1): README.md (+64 −3)

README.md CHANGED:

---
license: mit
language:
- zh
base_model:
- zai-org/GLM-4.1V-9B-Base
pipeline_tag: image-text-to-text
tags:
- agent
library_name: transformers
---

# AutoGLM-Phone-9B-Multilingual

<div align="center">
<img src="https://raw.githubusercontent.com/zai-org/Open-AutoGLM/refs/heads/main/resources/logo.svg" width="20%"/>
</div>

<p align="center">
👋 Join our <a href="https://raw.githubusercontent.com/zai-org/Open-AutoGLM/refs/heads/main/resources/WECHAT.md" target="_blank">WeChat</a> community
</p>

> ⚠️ This project is intended **for research and educational purposes only**.
> Any use for illegal data access, system interference, or unlawful activities is strictly prohibited.
> Please review our [Terms of Use](https://raw.githubusercontent.com/zai-org/Open-AutoGLM/refs/heads/main/resources/privacy_policy.txt) carefully.

## Project Overview

**Phone Agent** is an intelligent mobile-assistant framework built on **AutoGLM**. It understands smartphone screens through multimodal perception and executes automated operations to complete tasks.
The system controls devices via **ADB (Android Debug Bridge)**, uses a **vision-language model** for screen understanding, and relies on **intelligent planning** to generate and execute action sequences.

Users simply describe a task in natural language, for example *“Open Xiaohongshu and search for food recommendations.”*
Phone Agent then parses the intent, understands the current UI, plans the next steps, and carries out the entire workflow.
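
The full implementation lives in the Open-AutoGLM repository; purely as an illustration of this perceive-plan-act loop, the sketch below shows what a minimal driver could look like. The `query_model` stub and its action-dictionary format are hypothetical placeholders rather than the project's actual API; only the `adb` commands are standard.

```python
import subprocess

def capture_screenshot() -> bytes:
    """Grab the current screen from the connected device as a PNG via ADB."""
    return subprocess.run(
        ["adb", "exec-out", "screencap", "-p"],
        capture_output=True, check=True,
    ).stdout

def tap(x: int, y: int) -> None:
    """Send a tap event at pixel coordinates (x, y)."""
    subprocess.run(["adb", "shell", "input", "tap", str(x), str(y)], check=True)

def query_model(task: str, screenshot_png: bytes) -> dict:
    """Hypothetical stub: send the task and screenshot to the deployed
    vision-language model and return the next action, e.g.
    {"type": "tap", "x": 540, "y": 1200} or {"type": "finish"}.
    The real prompt and action schema are defined in Open-AutoGLM."""
    raise NotImplementedError("call your deployed AutoGLM-Phone endpoint here")

def run(task: str, max_steps: int = 20) -> None:
    """Perceive -> plan -> act: screenshot, ask the model, execute, repeat."""
    for _ in range(max_steps):
        action = query_model(task, capture_screenshot())
        if action["type"] == "finish":
            break
        if action["type"] == "tap":
            tap(action["x"], action["y"])

# run("Open Xiaohongshu and search for food recommendations")
```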

The system also includes:
- **Sensitive-action confirmation mechanisms**
- **Human-in-the-loop fallback** for login and verification-code scenarios
- **Remote ADB debugging**, allowing devices to be connected over Wi-Fi or the network for flexible remote control and development (a connection sketch follows below)
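
The remote-debugging setup itself relies on standard ADB-over-TCP; as a rough, non-project-specific illustration (the IP address and port below are example values), switching a USB-attached phone to network mode and reconnecting over Wi-Fi looks like this:

```python
import subprocess

DEVICE_IP = "192.168.1.42"  # example: the phone's address on your Wi-Fi network
PORT = "5555"               # conventional ADB-over-TCP port

def adb(*args: str) -> None:
    """Run an adb command and print its output."""
    out = subprocess.run(["adb", *args], capture_output=True, text=True, check=True)
    print(out.stdout.strip())

# 1. With the phone attached over USB, switch its ADB daemon to TCP/IP mode.
adb("tcpip", PORT)

# 2. Unplug the cable and connect over the network; subsequent adb commands
#    (screencap, input, ...) now travel over Wi-Fi.
adb("connect", f"{DEVICE_IP}:{PORT}")
adb("devices")
```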

## Model Usage

We provide an open-source model usage guide to help you quickly download and deploy the model.
Please visit our **[GitHub](https://github.com/zai-org/Open-AutoGLM)** repository for detailed instructions.

- The model architecture is identical to **`GLM-4.1V-9B-Thinking`**.
  For deployment details, see the **[GLM-V](https://github.com/zai-org/GLM-V)** repository; a minimal loading sketch follows below.
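
Since the architecture matches `GLM-4.1V-9B-Thinking`, the weights should load through the standard `transformers` image-text-to-text interface. The sketch below is an assumption-laden starting point rather than the official recipe: the repo id, screenshot path, and prompt are placeholders, and the actual prompt/action format is documented in Open-AutoGLM.

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_PATH = "zai-org/AutoGLM-Phone-9B-Multilingual"  # placeholder: use this model's actual repo id

processor = AutoProcessor.from_pretrained(MODEL_PATH)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# One screenshot plus an instruction, in the standard multimodal chat format.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "path": "screenshot.png"},  # placeholder screenshot
            {"type": "text", "text": "Open Xiaohongshu and search for food recommendations."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=1024)
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```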

### Citation

If you find our work helpful, please cite the following papers:

```bibtex
@article{liu2024autoglm,
  title={AutoGLM: Autonomous Foundation Agents for GUIs},
  author={Liu, Xiao and Qin, Bo and Liang, Dongzhu and Dong, Guang and Lai, Hanyu and Zhang, Hanchen and Zhao, Hanlin and Iong, Iat Long and Sun, Jiadai and Wang, Jiaqi and others},
  journal={arXiv preprint arXiv:2411.00820},
  year={2024}
}

@article{xu2025mobilerl,
  title={MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents},
  author={Xu, Yifan and Liu, Xiao and Liu, Xinghan and Fu, Jiaqi and Zhang, Hanchen and Jing, Bohao and Zhang, Shudan and Wang, Yuting and Zhao, Wenyi and Dong, Yuxiao},
  journal={arXiv preprint arXiv:2509.18119},
  year={2025}
}
```