feat(plan): 生成 label_backend 完整实施规划文档

Phase 0:research.md(10项技术决策,无需澄清项)
Phase 1:data-model.md(11张表+Redis结构),contracts/(8个模块API契约),quickstart.md(Docker Compose启动+流水线验证)
plan.md:宪章11条全部通过,项目结构确认
This commit is contained in:
wh
2026-04-09 12:27:16 +08:00
parent 0891ae188d
commit 4054a1133b
15 changed files with 1741 additions and 22 deletions

View File

@@ -0,0 +1,148 @@
# API 契约:认证与用户管理
**统一响应格式**:
- 成功:`{"code": "SUCCESS", "data": {...}}`
- 成功(无数据):`{"code": "SUCCESS", "data": null}`
- 失败:`{"code": "ERROR_CODE", "message": "描述"}`
- 分页成功:`{"code": "SUCCESS", "data": {"items": [...], "total": 100, "page": 1, "pageSize": 20}}`
---
## POST /api/auth/login
**权限**: 匿名
**描述**: 用户登录,返回会话凭证
**请求体**:
```json
{
"companyCode": "COMPANY_A",
"username": "zhangsan",
"password": "plaintext_password"
}
```
**成功响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"token": "550e8400-e29b-41d4-a716-446655440000",
"userId": 1,
"username": "zhangsan",
"role": "ANNOTATOR",
"expiresIn": 7200
}
}
```
**失败响应**:
- `401` `USER_NOT_FOUND`: 用户名或密码错误(不区分哪个错误,防止枚举)
- `403` `USER_DISABLED`: 账号已禁用
---
## POST /api/auth/logout
**权限**: 已登录Bearer Token
**描述**: 退出登录,立即删除 Redis 会话
**请求头**: `Authorization: Bearer {token}`
**响应** `200`: `{"code": "SUCCESS", "data": null}`
---
## GET /api/auth/me
**权限**: 已登录
**描述**: 获取当前用户信息
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"id": 1,
"username": "zhangsan",
"realName": "张三",
"role": "ANNOTATOR",
"companyId": 10,
"companyName": "测试公司"
}
}
```
---
## GET /api/users
**权限**: ADMIN
**描述**: 分页查询本公司用户列表
**查询参数**: `page`(默认 1`pageSize`(默认 20最大 100`role`(可选过滤)、`status`(可选过滤)
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"items": [
{"id": 1, "username": "zhangsan", "realName": "张三", "role": "ANNOTATOR", "status": "ACTIVE"}
],
"total": 50,
"page": 1,
"pageSize": 20
}
}
```
---
## POST /api/users
**权限**: ADMIN
**描述**: 创建用户
**请求体**:
```json
{
"username": "lisi",
"password": "initial_password",
"realName": "李四",
"role": "ANNOTATOR"
}
```
**响应** `201`: `{"code": "SUCCESS", "data": {"id": 2, "username": "lisi", ...}}`
**失败**: `409` `USERNAME_EXISTS`: 用户名已存在
---
## PUT /api/users/{id}
**权限**: ADMIN
**描述**: 更新用户基本信息
**请求体**: `{"realName": "新姓名"}`
**响应** `200`: `{"code": "SUCCESS", "data": null}`
---
## PUT /api/users/{id}/status
**权限**: ADMIN
**描述**: 启用或禁用账号,立即驱逐权限缓存
**请求体**: `{"status": "DISABLED"}`
**响应** `200`: `{"code": "SUCCESS", "data": null}`
---
## PUT /api/users/{id}/role
**权限**: ADMIN
**描述**: 变更用户角色,立即驱逐权限缓存
**请求体**: `{"role": "REVIEWER"}`
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**: `400` `INVALID_ROLE`: 角色值不合法

View File

@@ -0,0 +1,53 @@
# API 契约:系统配置
*所有接口需要 ADMIN 权限*
---
## GET /api/config
**描述**: 获取所有配置项(公司级配置 + 全局默认配置合并,公司级优先)
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"items": [
{
"configKey": "prompt_extract_text",
"configValue": "请提取以下文本中的主语-谓语-宾语三元组...",
"description": "文本三元组提取 Prompt 模板",
"scope": "GLOBAL",
"updatedAt": "2026-04-09T00:00:00"
},
{
"configKey": "model_default",
"configValue": "glm-4-turbo",
"description": "默认 AI 辅助模型",
"scope": "COMPANY",
"updatedAt": "2026-04-09T09:00:00"
}
]
}
}
```
`scope` 字段:`GLOBAL`(来自全局默认)、`COMPANY`(来自公司级覆盖)
---
## PUT /api/config/{key}
**描述**: 更新单项配置(若公司级配置不存在则创建;若存在则覆盖)
**请求体**:
```json
{
"configValue": "glm-4-turbo",
"description": "升级到 GLM-4-Turbo 模型"
}
```
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**: `400` `UNKNOWN_CONFIG_KEY`: 未知的配置键(防止拼写错误创建无效配置)

View File

@@ -0,0 +1,113 @@
# API 契约:训练数据导出与微调
*所有接口需要 ADMIN 权限*
---
## GET /api/training/samples
**描述**: 分页查询已审批、可导出的训练样本
**查询参数**: `page``pageSize``sampleType`TEXT / IMAGE / VIDEO_FRAME可选`exported`true/false可选
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"items": [
{
"id": 1001,
"sampleType": "TEXT",
"status": "APPROVED",
"exportBatchId": null,
"sourceId": 50,
"createdAt": "2026-04-09T12:00:00"
}
],
"total": 500,
"page": 1,
"pageSize": 20
}
}
```
---
## POST /api/export/batch
**描述**: 创建导出批次,合并选定样本为 JSONL 并上传 RustFS
**请求体**:
```json
{
"sampleIds": [1001, 1002, 1003]
}
```
**成功响应** `201`:
```json
{
"code": "SUCCESS",
"data": {
"id": 10,
"batchUuid": "550e8400-e29b-41d4-a716-446655440000",
"sampleCount": 3,
"datasetFilePath": "export/550e8400.jsonl",
"finetuneStatus": "NOT_STARTED"
}
}
```
**失败**:
- `400` `INVALID_SAMPLES`: 部分样本不处于 APPROVED 状态
- `400` `EMPTY_SAMPLES`: sampleIds 为空
---
## POST /api/export/{batchId}/finetune
**描述**: 向 GLM AI 服务提交微调任务
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"glmJobId": "glm-finetune-abc123",
"finetuneStatus": "RUNNING"
}
}
```
**失败**: `409` `FINETUNE_ALREADY_STARTED`: 微调任务已提交
---
## GET /api/export/{batchId}/status
**描述**: 查询微调任务状态(向 AI 服务实时查询)
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"batchId": 10,
"glmJobId": "glm-finetune-abc123",
"finetuneStatus": "RUNNING",
"progress": 45,
"errorMessage": null
}
}
```
---
## GET /api/export/list
**描述**: 分页查询所有导出批次
**查询参数**: `page``pageSize`
**响应** `200`: 批次列表(含 finetuneStatus、sampleCount、createdAt 等字段)

View File

@@ -0,0 +1,97 @@
# API 契约:提取阶段标注工作台
---
## GET /api/extraction/{taskId}
**权限**: ANNOTATOR且为任务持有者
**描述**: 获取当前提取结果(含 AI 预标注候选,供人工编辑)
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"taskId": 101,
"sourceType": "TEXT",
"sourceFilePath": "text/202604/50.txt",
"isFinal": false,
"resultJson": {
"items": [
{
"subject": "北京",
"predicate": "是...首都",
"object": "中国",
"sourceText": "北京是中国的首都",
"startOffset": 0,
"endOffset": 8
}
]
}
}
}
```
---
## PUT /api/extraction/{taskId}
**权限**: ANNOTATOR且为任务持有者
**描述**: 更新提取结果(**整体 JSONB 覆盖PUT 语义,禁止局部 PATCH**
**请求体**:
```json
{
"items": [
{
"subject": "北京",
"predicate": "是...首都",
"object": "中国",
"sourceText": "北京是中国的首都",
"startOffset": 0,
"endOffset": 8
}
]
}
```
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**: `400` `INVALID_JSON`: 提交的 JSON 格式不合法
---
## POST /api/extraction/{taskId}/submit
**权限**: ANNOTATOR且为任务持有者
**描述**: 提交提取结果,任务状态 IN_PROGRESS → SUBMITTED进入审批队列
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**: `409` `INVALID_STATE`: 任务当前状态不允许提交
---
## POST /api/extraction/{taskId}/approve
**权限**: REVIEWER
**描述**: 审批通过。**两阶段操作**
1. 同步(同一事务):`annotation_result.is_final = true`,任务状态 SUBMITTED → APPROVED写任务历史
2. 异步事务提交后AI 生成候选问答对 → 写 training_dataset → 创建 QA_GENERATION 任务 → source_data 状态推进
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**:
- `403` `SELF_REVIEW_FORBIDDEN`: 不允许审批自己提交的任务
- `409` `INVALID_STATE`: 任务状态不为 SUBMITTED
---
## POST /api/extraction/{taskId}/reject
**权限**: REVIEWER
**描述**: 驳回提取结果,任务状态 SUBMITTED → REJECTED标注员可重领
**请求体**: `{"reason": "三元组边界不准确,请重新标注"}`
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**:
- `403` `SELF_REVIEW_FORBIDDEN`: 不允许驳回自己提交的任务
- `409` `INVALID_STATE`: 任务状态不为 SUBMITTED
- `400` `REASON_REQUIRED`: 驳回原因不能为空

View File

@@ -0,0 +1,83 @@
# API 契约:问答生成阶段
---
## GET /api/qa/{taskId}
**权限**: ANNOTATOR且为任务持有者
**描述**: 获取候选问答对列表(由提取阶段审批触发 AI 生成)
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"taskId": 202,
"sourceType": "TEXT",
"items": [
{
"id": 1001,
"question": "北京是哪个国家的首都?",
"answer": "中国",
"status": "PENDING_REVIEW"
}
]
}
}
```
---
## PUT /api/qa/{taskId}
**权限**: ANNOTATOR且为任务持有者
**描述**: 修改问答对(**整体覆盖PUT 语义**,每次提交包含完整 items 数组)
**请求体**:
```json
{
"items": [
{
"question": "北京是哪个国家的首都?",
"answer": "中国。北京自1949年起成为中华人民共和国的首都。"
}
]
}
```
**响应** `200`: `{"code": "SUCCESS", "data": null}`
---
## POST /api/qa/{taskId}/submit
**权限**: ANNOTATOR且为任务持有者
**描述**: 提交问答对,任务状态 IN_PROGRESS → SUBMITTED
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**: `409` `INVALID_STATE`: 任务当前状态不允许提交
---
## POST /api/qa/{taskId}/approve
**权限**: REVIEWER
**描述**: 审批通过。同一事务中:先校验任务 → training_dataset 状态 → 任务状态 SUBMITTED → APPROVED → source_data 状态 → 写任务历史
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**:
- `403` `SELF_REVIEW_FORBIDDEN`: 不允许审批自己提交的任务
- `409` `INVALID_STATE`: 任务状态不为 SUBMITTED
---
## POST /api/qa/{taskId}/reject
**权限**: REVIEWER
**描述**: 驳回问答对,删除候选记录,任务状态 SUBMITTED → REJECTED
**请求体**: `{"reason": "问题描述不准确,请修改"}`
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**:
- `403` `SELF_REVIEW_FORBIDDEN`: 不允许驳回自己提交的任务
- `400` `REASON_REQUIRED`: 驳回原因不能为空

View File

@@ -0,0 +1,96 @@
# API 契约:资料管理
---
## POST /api/source/upload
**权限**: UPLOADER
**描述**: 上传文件,创建 source_data 记录,文件字节流写入 RustFS
**请求**: `multipart/form-data`,字段:`file`(必填)、`dataType`TEXT / IMAGE / VIDEO
**响应** `201`:
```json
{
"code": "SUCCESS",
"data": {
"id": 50,
"fileName": "document.txt",
"dataType": "TEXT",
"fileSize": 204800,
"status": "PENDING",
"createdAt": "2026-04-09T10:00:00"
}
}
```
**失败**:
- `400` `INVALID_TYPE`: 不支持的资料类型
- `400` `FILE_EMPTY`: 文件为空
---
## GET /api/source/list
**权限**: UPLOADER
**描述**: 分页查询资料列表。UPLOADER 只见自己上传的资料ADMIN 见本公司全部资料
**查询参数**: `page`(默认 1`pageSize`(默认 20`dataType`(可选)、`status`(可选)
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"items": [
{
"id": 50,
"fileName": "document.txt",
"dataType": "TEXT",
"status": "PENDING",
"uploaderId": 1,
"createdAt": "2026-04-09T10:00:00"
}
],
"total": 120,
"page": 1,
"pageSize": 20
}
}
```
---
## GET /api/source/{id}
**权限**: UPLOADER
**描述**: 查看资料详情,含 RustFS 预签名临时下载链接(有效期 15 分钟)
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"id": 50,
"dataType": "TEXT",
"fileName": "document.txt",
"fileSize": 204800,
"status": "EXTRACTING",
"presignedUrl": "https://rustfs.example.com/...",
"parentSourceId": null,
"createdAt": "2026-04-09T10:00:00"
}
}
```
---
## DELETE /api/source/{id}
**权限**: ADMIN
**描述**: 删除资料(同时删除 RustFS 文件及元数据)
**前置条件**: 资料状态为 PENDING不允许删除已进入流水线的资料
**响应** `204`: 无响应体
**失败**: `409` `SOURCE_IN_PIPELINE`: 资料已进入标注流程,不可删除

View File

@@ -0,0 +1,150 @@
# API 契约:任务管理
---
## GET /api/tasks/pool
**权限**: ANNOTATOR
**描述**: 查看可领取任务池。角色过滤规则:
- ANNOTATOR仅返回 EXTRACTION 阶段、status=UNCLAIMED 的任务
- REVIEWER/ADMIN仅返回 SUBMITTED 状态(待审批队列)的任务
**查询参数**: `page`(默认 1`pageSize`(默认 20
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"items": [
{
"id": 101,
"sourceId": 50,
"sourceType": "TEXT",
"phase": "EXTRACTION",
"status": "UNCLAIMED",
"createdAt": "2026-04-09T10:00:00"
}
],
"total": 30,
"page": 1,
"pageSize": 20
}
}
```
---
## GET /api/tasks/pending-review
**权限**: REVIEWER
**描述**: REVIEWER 专属审批入口,查看 status=SUBMITTED 的任务列表
**查询参数**: `page``pageSize``phase`可选EXTRACTION / QA_GENERATION
**响应**: 同 `/api/tasks/pool` 结构
---
## POST /api/tasks/{id}/claim
**权限**: ANNOTATOR
**描述**: 领取任务双重并发保障Redis SET NX + DB 乐观约束)
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**:
- `409` `TASK_CLAIMED`: 任务已被他人领取
- `404` `TASK_NOT_FOUND`: 任务不存在
---
## POST /api/tasks/{id}/unclaim
**权限**: ANNOTATOR且为任务持有者
**描述**: 放弃任务退回任务池status: IN_PROGRESS → UNCLAIMED
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**: `403` `NOT_TASK_OWNER`: 非任务持有者
---
## GET /api/tasks/mine
**权限**: ANNOTATOR
**描述**: 查询当前用户领取的任务(含 IN_PROGRESS、SUBMITTED、REJECTED 三种状态)
**查询参数**: `page``pageSize``status`(可选过滤)
**响应**: 同任务列表结构,含 `rejectReason` 字段REJECTED 状态时非空)
---
## POST /api/tasks/{id}/reclaim
**权限**: ANNOTATOR
**描述**: 重领被驳回的任务status 必须为 REJECTED 且 claimedBy = 当前用户,流转 REJECTED → IN_PROGRESS
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**:
- `403` `NOT_TASK_OWNER`: 非原持有者
- `409` `INVALID_STATE`: 任务状态不为 REJECTED
---
## GET /api/tasks/{id}
**权限**: ANNOTATOR
**描述**: 查看任务详情(含驳回原因、历史记录摘要)
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"id": 101,
"sourceId": 50,
"phase": "EXTRACTION",
"status": "IN_PROGRESS",
"claimedBy": 1,
"claimedAt": "2026-04-09T10:05:00",
"rejectReason": null,
"historyCount": 2
}
}
```
---
## GET /api/tasks
**权限**: ADMIN
**描述**: 查询全部任务(支持过滤,分页)
**查询参数**: `page``pageSize``phase``status``claimedBy``sourceId`
---
## PUT /api/tasks/{id}/reassign
**权限**: ADMIN
**描述**: 强制转移任务归属status 保持 IN_PROGRESS仅 claimedBy 变更)
**请求体**: `{"newOwnerId": 5, "reason": "原持有者长期未操作"}`
**响应** `200`: `{"code": "SUCCESS", "data": null}`
---
## POST /api/tasks
**权限**: ADMIN
**描述**: 为指定资料创建 EXTRACTION 任务
**请求体**:
```json
{
"sourceId": 50,
"taskType": "AI_ASSISTED",
"aiModel": "glm-4"
}
```
**响应** `201`: `{"code": "SUCCESS", "data": {"id": 101, ...}}`

View File

@@ -0,0 +1,87 @@
# API 契约:视频处理
---
## POST /api/video/process
**权限**: ADMIN
**描述**: 为已上传的视频资料创建异步处理任务
**请求体**:
```json
{
"sourceId": 50,
"jobType": "FRAME_EXTRACT",
"params": {
"frameInterval": 30,
"mode": "FRAME"
}
}
```
jobType 可选值:`FRAME_EXTRACT`(帧提取)、`VIDEO_TO_TEXT`(片段转文字)
**响应** `201`:
```json
{
"code": "SUCCESS",
"data": {
"jobId": 200,
"sourceId": 50,
"jobType": "FRAME_EXTRACT",
"status": "PENDING"
}
}
```
---
## GET /api/video/jobs/{jobId}
**权限**: ADMIN
**描述**: 查询视频处理任务状态
**响应** `200`:
```json
{
"code": "SUCCESS",
"data": {
"id": 200,
"status": "RUNNING",
"processedUnits": 15,
"totalUnits": 50,
"retryCount": 0,
"errorMessage": null,
"startedAt": "2026-04-09T10:05:00"
}
}
```
---
## POST /api/video/jobs/{jobId}/reset
**权限**: ADMIN
**描述**: 手动重置 FAILED 状态的任务为 PENDING允许重新触发FAILED → PENDING 不在自动状态机中)
**响应** `200`: `{"code": "SUCCESS", "data": null}`
**失败**: `409` `INVALID_STATE`: 任务状态不为 FAILED
---
## POST /api/video/callback内部接口
**权限**: AI 服务内部调用IP 白名单 / 服务密钥)
**描述**: AI 服务回调,通知视频处理结果(幂等:重复成功回调静默忽略)
**请求体**:
```json
{
"jobId": 200,
"success": true,
"outputPath": "frames/50/",
"errorMessage": null
}
```
**响应** `200`: `{"code": "SUCCESS", "data": null}`