Configuration Parsing Warning: Invalid JSON for config file config.json
3D-Speaker-MT.axera
meeting transcription demo on Axera
- Python 示例
- C++ 示例
Convert tools links:
For those who are interested in model conversion, you can try to export axmodel through the original repo :
How to Convert from ONNX to axmodel
支持平台
- AX650N
功能
- 会议音频转录与总结
模型转换
参考模型转换
上板部署
- AX650N 的设备已预装 Ubuntu22.04
- 以 root 权限登陆 AX650N 的板卡设备
- 链接互联网,确保 AX650N 的设备能正常执行 apt install, pip install 等指令
- 已验证设备:AX650N DEMO Board
Python API 运行
在python3.10(验证)
Requirements
pip3 install -r requirements.txt
流式会议纪要 Web Demo
支持浏览器麦克风实时分段转录,会议结束后自动做说话人聚类 + ASR,并调用 OpenAI 兼容接口生成会议纪要。
启动:
python -m app.server
浏览器访问:
http://127.0.0.1:8000
环境变量(可选,用作会议纪要生成):
OPENAI_API_KEY=xxx
OPENAI_BASE_URL=http://127.0.0.1:8001/v1 # 本地 OpenAI 协议服务时设置
OPENAI_MODEL=AXERA-TECH/Qwen3-1.7B
HOST=0.0.0.0
PORT=8000
SSL_CERT=cert.pem
SSL_KEY=key.pem
依赖提示(WebSocket):
- 请确保安装了
websockets或uvicorn[standard],否则浏览器实时流式会失败
设备权限提示:
- 如果遇到
/dev/axcl_host权限错误,请用有权限的账号或sudo运行
HTTPS(推荐,便于浏览器麦克风权限):
openssl req -x509 -newkey rsa:2048 -nodes \\
-keyout key.pem -out cert.pem -days 365 \\
-subj "/CN=<你的IP>"
SSL_CERT=cert.pem SSL_KEY=key.pem python -m app.server
离线处理脚本
对单个会议音频文件执行说话人聚类 + ASR,并导出文本,可选会议总结(LLM 通过参数配置):
python app/cli_batch.py --wav_file wav/vad_example.wav --output_dir output_dir
带会议总结(可选参数覆盖 LLM 配置):
python app/cli_batch.py \\
--wav_file wav/vad_example.wav \\
--output_dir output_dir \\
--summary \\
--openai_base_url http://127.0.0.1:8001/v1 \\
--openai_model AXERA-TECH/Qwen3-1.7B \\
--openai_api_key xxx
Latency
AX650N
| model | latency(ms) |
|---|---|
| vad | 5.441 |
| cammplus | 2.907 |
| sensevoice | 25.482 |
RTF: 约为0.2
eg:
Inference time for vad_example.wav: 10.92 seconds
- VAD processing time: 2.20 seconds
- Speaker embedding extraction time: 1.88 seconds
- Speaker clustering time: 0.16 seconds
- ASR processing time: 3.75 seconds
load model + Inference time for vad_example.wav: 13.08 seconds
Audio duration: 70.47 seconds
RTF: 0.15
参考:
技术讨论
- Github issues
- QQ 群: 139953715
- Downloads last month
- 25
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support
Model tree for AXERA-TECH/3D-Speaker-MT.Axera
Base model
FunAudioLLM/SenseVoiceSmall