toolName: search_codebase status: success query: Locate existing Python code for docling and word2markdown converters, FastAPI app, and any existing front-end reference at docling/app/static/index.html. Identify project structure and existing functions for converting docx/pdf to markdown and custom conversions.
toolName: view_folder status: success /Users/fanyang/Desktop/FunMD_Convert
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/server.py
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/services/docling_adapter.py
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/services/docling_adapter.py
toolName: search_by_regex status: success query: def .*\(
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/word2markdown/convert_to_md.py
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/word2markdown/convert_to_md.py
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/static/index.html
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/static/index.html
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/static/index.html
toolName: search_by_regex status: success query: convert-folder
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/server.py
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/server.py
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/server.py
toolName: search_by_regex status: success query: Dockerfile
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/tests/test_md_to_docx.py
toolName: view_folder status: success
/Users/fanyang/Desktop/FunMD_Convert/word2markdown
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/word2markdown/app.py
toolName: todo_write status: success Todos updated: 5 items
toolName: ApplyPatch status: running
toolName: write_to_file status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/services/unified_converter.py
toolName: ApplyPatch status: running
toolName: edit_file_update status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/server.py
toolName: ApplyPatch status: running
toolName: edit_file_update status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/server.py
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/server.py
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/server.py
toolName: ApplyPatch status: running
toolName: edit_file_update status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/server.py
toolName: ApplyPatch status: running
toolName: write_to_file status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/tests/test_api_convert.py
toolName: run_command status: success command: python -m pytest -q
toolName: ApplyPatch status: running
toolName: edit_file_update status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/tests/test_api_convert.py
toolName: run_command status: success command: python -m unittest -q
toolName: run_command status: success command: python -m unittest discover -s tests -p "test_*.py" -v
toolName: todo_write status: success Todos updated: 5 items
toolName: view_files status: success filePath: /Users/fanyang/Desktop/FunMD_Convert/docling/app/services/unified_converter.py

**Refactoring Summary**
- Unified the two conversion capabilities (Docling general-purpose conversion and Word2Markdown custom conversion): added a single entry-point class and exposed standardized endpoints on the backend.
- Standardized the response structure per the documentation spec (POST, JSON, containing code/msg/data) and wrapped error handling in try/except.
- Enhanced Markdown/HTML post-processing: convert `:::` admonition blocks to `!!!`, force HTML table tags to lowercase, and remove meaningless line breaks after tags.
- Extended batch upload: ZIP/TAR archives can now be uploaded; they are extracted automatically, their assets are uploaded to MinIO, Markdown asset links are rewritten, and usable URLs are returned.
- Added sample unit tests covering the unified conversion and the API response structure; they can be run once the dependencies are installed.

**Core Changes**
- Unified converter class
  - Added `FormatConverter`, which selects either the Docling or the Word2Markdown engine in a single call and returns the target format in a uniform way.
  - File location: `docling/app/services/unified_converter.py:57`; the core method `convert` is at `docling/app/services/unified_converter.py:61`.
  - Key behaviors:
    - Engine selection: passing `engine=word2markdown|pandoc|custom` routes to `word2markdown.convert_any`; the default is `Docling.DocumentConverter`.
    - Markdown post-processing: converts `:::` admonition blocks to `!!!` (`docling/app/services/unified_converter.py:33`) and lowercases HTML table tags (`docling/app/services/unified_converter.py:22`).
- Standardized endpoint
  - Added the unified conversion endpoint `POST /api/convert`. Provide exactly one of `file` or `source_url`; `export=markdown|html|json|doctags` and `engine` are optional; `save=true` uploads the result to MinIO.
  - Location: `docling/app/server.py:1413`; the unified response structure is at `docling/app/server.py:1380` and `docling/app/server.py:1396`.
  - Response fields:
    - Not saved: `data = {encoding, content, name, media_type}`
    - Saved to MinIO: `data = {encoding, name, media_type, minio_url, minio_presigned_url}`
    - All responses follow `{"code": 0|500, "msg": "...", "data": ...}`, implemented at `docling/app/server.py:1363` and `docling/app/server.py:1368`.
- Archive upload
  - Added `POST /api/upload-archive`: upload a ZIP/TAR archive; the backend extracts it → bulk-uploads the assets to MinIO → rewrites relative asset links inside `*.md` files to MinIO URLs → returns the list.
  - Location: `docling/app/server.py:1413` (the function starts at `docling/app/server.py:1414`).
  - Response data: `data={count, files:[{source, minio_url, minio_presigned_url, mappings}]}`.
  - Reuses the existing asset upload and link rewriting helpers: `_bulk_upload_assets` at `docling/app/server.py:1079` and `_rewrite_md_assets_to_minio` at `docling/app/server.py:885`.
- MinIO configuration and management
  - The runtime configuration object `RUNTIME_CONFIG` is unchanged; dynamic loading/saving of the configuration and the bucket management endpoints are all available. Relevant locations:
    - Load/save configuration: `docling/app/server.py:349`, `docling/app/server.py:369`, `docling/app/server.py:386`
    - List/create buckets: `docling/app/server.py:287`, `docling/app/server.py:304`
- Markdown → DOCX/PDF
  - Retained the capabilities of the existing `POST /md/convert` endpoint (headers and footers, logo, cover page, CSS templates, table of contents (TOC), MinIO storage, and so on); see `docling/app/server.py:473`.
  - The actual conversion uses `md_to_docx_bytes` and `md_to_pdf_bytes_with_renderer` (`docling/app/services/docling_adapter.py:105`, `docling/app/services/docling_adapter.py:676`) and already supports rendering headers, footers, and cover metadata (the `normalize_html` pipeline spans `docling/app/services/docling_adapter.py:433` to `docling/app/services/docling_adapter.py:662`).

**Key Points Satisfying the PRD**
- Unified POST + JSON + code/msg/data: added `api_convert` and `api_upload-archive`; every response body contains `code/msg/data` (`docling/app/server.py:1363`, `docling/app/server.py:1368`, `docling/app/server.py:1413`).
- Exception handling: the unified endpoints use `try...except...` and convert errors to the standard structure (`docling/app/server.py:1408`, `docling/app/server.py:1476`); the global exception middleware already exists (`docling/app/server.py:1231`, `docling/app/server.py:1245`, `docling/app/server.py:1255`).
- Conversion features:
  - Detection and normalization of non-UTF-8 document encodings is built into `word2markdown.convert_to_md` (for example `word2markdown/convert_to_md.py:309`, `word2markdown/convert_to_md.py:326`).
  - Recognition of single-row, single-column DOC/DOCX tables as code blocks is handled at `word2markdown/convert_to_md.py:196` and `word2markdown/convert_to_md.py:200`.
  - HTML `