Files
label_backend/specs/001-label-backend-spec/data-model.md
wh 4054a1133b feat(plan): 生成 label_backend 完整实施规划文档
Phase 0:research.md(10项技术决策,无需澄清项)
Phase 1:data-model.md(11张表+Redis结构),contracts/(8个模块API契约),quickstart.md(Docker Compose启动+流水线验证)
plan.md:宪章11条全部通过,项目结构确认
2026-04-09 12:27:16 +08:00

356 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# 数据模型label_backend
**日期**: 2026-04-09
**分支**: `001-label-backend-spec`
---
## 实体关系概览
```
sys_company ─┬─ sys_user (company_id FK)
├─ source_data (company_id FK)
│ └─ source_data (parent_source_id 自引用,视频溯源链)
├─ annotation_task (company_id FK)
│ ├─ annotation_result (task_id FK)
│ └─ annotation_task_history (task_id FK)
├─ training_dataset (company_id FK)
├─ export_batch (company_id FK)
├─ sys_config (company_id FK可为 NULL 表示全局默认)
├─ sys_operation_log (company_id FK)
└─ video_process_job (company_id FK)
```
**多租户规则**:除 `sys_company` 本身外,所有业务表均包含 `company_id NOT NULL`。查询时由 `TenantLineInnerInterceptor` 自动注入 `WHERE company_id = ?`。唯一例外:`sys_config` 允许 `company_id = NULL` 表示全局默认配置。
---
## 实体详情
### 1. sys_company — 公司(租户)
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGSERIAL | PK | 自增主键 |
| company_name | VARCHAR(100) | NOT NULL UNIQUE | 公司名称 |
| company_code | VARCHAR(50) | NOT NULL UNIQUE | 公司编码 |
| status | VARCHAR(10) | NOT NULL DEFAULT 'ACTIVE' | ACTIVE / DISABLED |
| created_at | TIMESTAMP | NOT NULL DEFAULT NOW() | |
| updated_at | TIMESTAMP | NOT NULL DEFAULT NOW() | |
**状态**: 无状态机(仅 ACTIVE/DISABLED 标志)
---
### 2. sys_user — 用户
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | 租户隔离键 |
| username | VARCHAR(50) | NOT NULL | 同公司内唯一 |
| password_hash | VARCHAR(255) | NOT NULL | BCrypt 强度≥10禁止序列化到响应 |
| real_name | VARCHAR(50) | — | |
| role | VARCHAR(20) | NOT NULL | UPLOADER / ANNOTATOR / REVIEWER / ADMIN |
| status | VARCHAR(10) | NOT NULL DEFAULT 'ACTIVE' | ACTIVE / DISABLED |
| created_at / updated_at | TIMESTAMP | NOT NULL | |
**约束**: `UNIQUE(company_id, username)`
**索引**: `(company_id)`
**角色继承**: ADMIN ⊃ REVIEWER ⊃ ANNOTATOR ⊃ UPLOADER由 Shiro Realm 的 addInheritedRoles() 实现)
---
### 3. source_data — 原始资料
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| uploader_id | BIGINT | FK→sys_user | |
| data_type | VARCHAR(20) | NOT NULL | TEXT / IMAGE / VIDEO |
| file_path | VARCHAR(500) | NOT NULL | RustFS 对象路径 |
| file_name | VARCHAR(255) | NOT NULL | 原始文件名 |
| file_size | BIGINT | — | 字节数 |
| bucket_name | VARCHAR(100) | NOT NULL | RustFS 桶名 |
| parent_source_id | BIGINT | FK→source_data | 视频片段转文本时指向原视频 |
| status | VARCHAR(20) | NOT NULL DEFAULT 'PENDING' | 见状态机 |
| reject_reason | TEXT | — | 保留字段(当前无 REJECTED 状态) |
| created_at / updated_at | TIMESTAMP | NOT NULL | |
**索引**: `(company_id)``(company_id, status)``(parent_source_id)`
**状态机**:
```
PENDING → EXTRACTING直接上传的文本/图片)
PENDING → PREPROCESSING视频上传后
PREPROCESSING → PENDING视频预处理完成后进入标注流程
EXTRACTING → QA_REVIEW提取任务审批通过后
QA_REVIEW → APPROVEDQA 任务审批通过后,整条流水线完成)
```
*注source_data 无 REJECTED 状态。QA 阶段驳回作用于 annotation_task→REJECTEDsource_data 保持 QA_REVIEW 不变。*
---
### 4. annotation_task — 标注任务
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| source_id | BIGINT | NOT NULL FK→source_data | |
| phase | VARCHAR(20) | NOT NULL | EXTRACTION / QA_GENERATION |
| task_type | VARCHAR(20) | NOT NULL | AI_ASSISTED / MANUAL |
| ai_model | VARCHAR(50) | — | 使用的 AI 模型 |
| video_unit_type | VARCHAR(20) | — | FRAME视频帧模式/ NULL |
| video_unit_info | JSONB | — | `{frame_index, time_sec, frame_path}` |
| claimed_by | BIGINT | FK→sys_user | 当前持有者 |
| claimed_at | TIMESTAMP | — | |
| status | VARCHAR(20) | NOT NULL DEFAULT 'UNCLAIMED' | 见状态机 |
| reject_reason | TEXT | — | 驳回原因 |
| submitted_at | TIMESTAMP | — | |
| completed_at | TIMESTAMP | — | |
| created_at / updated_at | TIMESTAMP | NOT NULL | |
**索引**: `(company_id)``(company_id, phase, status)`(任务池查询)、`(claimed_by, status)`(我的任务)
**状态机**:
```
UNCLAIMED → IN_PROGRESS领取
IN_PROGRESS → SUBMITTED提交
IN_PROGRESS → UNCLAIMED放弃
IN_PROGRESS → IN_PROGRESSADMIN 强制转移,持有人变更,状态不变)
SUBMITTED → APPROVED审批通过
SUBMITTED → REJECTED审批驳回
REJECTED → IN_PROGRESS标注员重领
```
**并发控制**: 领取时双重保障:① Redis `SET NX task:claim:{taskId}` TTL 30s② DB `UPDATE ... WHERE status='UNCLAIMED'` 影响行数为 0 时返回错误
---
### 5. annotation_result — 标注结果(提取阶段)
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| task_id | BIGINT | NOT NULL FK→annotation_task | |
| result_json | JSONB | NOT NULL | 整体覆盖,禁止局部 PATCH |
| is_final | BOOLEAN | NOT NULL DEFAULT FALSE | 审批通过后置 TRUE |
| submitted_by | BIGINT | FK→sys_user | |
| created_at / updated_at | TIMESTAMP | NOT NULL | |
**result_json 结构**(文本三元组示例):
```json
{
"items": [
{
"subject": "北京",
"predicate": "是...首都",
"object": "中国",
"source_text": "北京是中国的首都",
"start_offset": 0,
"end_offset": 8
}
]
}
```
**result_json 结构**(图片四元组示例):
```json
{
"items": [
{
"subject": "猫",
"relation": "坐在",
"object": "椅子",
"modifier": "白色的",
"bbox": [100, 200, 300, 400],
"crop_path": "crops/123/0.jpg"
}
]
}
```
**索引**: `(task_id)``(company_id, is_final)`
---
### 6. training_dataset — 训练样本
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| task_id | BIGINT | NOT NULL FK→annotation_task | |
| source_id | BIGINT | NOT NULL FK→source_data | |
| extraction_result_id | BIGINT | NOT NULL FK→annotation_result | |
| sample_type | VARCHAR(20) | NOT NULL | TEXT / IMAGE / VIDEO_FRAME |
| glm_format_json | JSONB | NOT NULL | GLM 微调格式 |
| export_batch_id | VARCHAR(50) | — | NULL 表示未导出 |
| status | VARCHAR(20) | NOT NULL DEFAULT 'PENDING_REVIEW' | 见状态机 |
| reject_reason | TEXT | — | |
| reviewed_by | BIGINT | FK→sys_user | |
| exported_at | TIMESTAMP | — | |
| created_at / updated_at | TIMESTAMP | NOT NULL | |
**状态机**:
```
PENDING_REVIEW → APPROVEDQA 审批通过)
PENDING_REVIEW → REJECTEDQA 审批驳回)
REJECTED → PENDING_REVIEW标注员修改后重提
```
**glm_format_json 结构**:
```json
{
"conversations": [
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."}
],
"source_type": "TEXT"
}
```
**索引**: `(company_id)``(company_id, status)``(export_batch_id)`
---
### 7. export_batch — 导出批次
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| batch_uuid | VARCHAR(50) | NOT NULL UNIQUE | 批次标识符 |
| dataset_file_path | VARCHAR(500) | — | RustFS JSONL 路径 |
| sample_count | INT | NOT NULL DEFAULT 0 | |
| glm_job_id | VARCHAR(100) | — | 微调任务 ID |
| finetune_status | VARCHAR(20) | NOT NULL DEFAULT 'NOT_STARTED' | 见状态 |
| error_message | TEXT | — | |
| created_by | BIGINT | FK→sys_user | |
| created_at / updated_at | TIMESTAMP | NOT NULL | |
**finetune_status 值**: NOT_STARTED / RUNNING / SUCCESS / FAILED
**索引**: `(company_id)`
---
### 8. sys_config — 系统配置
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | FK→sys_company可 NULL | NULL = 全局默认配置 |
| config_key | VARCHAR(100) | NOT NULL | |
| config_value | TEXT | NOT NULL | |
| description | TEXT | — | |
| updated_by | BIGINT | FK→sys_user | |
| updated_at | TIMESTAMP | NOT NULL | |
**约束**: `UNIQUE(company_id, config_key)`
**查询规则**: 先按 `(companyId, configKey)` 查;未命中则按 `(NULL, configKey)` 查全局默认。
**预置全局配置键**:
- `prompt_extract_text``prompt_extract_image``prompt_video_to_text`
- `prompt_qa_gen_text``prompt_qa_gen_image`
- `model_default`(默认:`glm-4`
- `video_frame_interval`(默认:`30`
- `token_ttl_seconds`(默认:`7200`
- `glm_api_base_url`
---
### 9. sys_operation_log — 操作审计日志
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | FK→sys_company | |
| operator_id | BIGINT | FK→sys_user | 登录失败时可为 NULL |
| operator_name | VARCHAR(50) | NOT NULL | **操作时用户名快照**(不随改名变化) |
| operation_type | VARCHAR(50) | NOT NULL | 见枚举列表 |
| target_type | VARCHAR(30) | — | |
| target_id | BIGINT | — | |
| detail | JSONB | — | 补充信息 |
| ip_address | VARCHAR(50) | — | |
| result | VARCHAR(10) | NOT NULL | SUCCESS / FAIL |
| error_message | TEXT | — | |
| created_at | TIMESTAMP | NOT NULL DEFAULT NOW() | 分区键 |
**只追加**:应用层禁止 UPDATE/DELETE建议 DB 层添加触发器强制执行
**分区**:按 `created_at` Range 分区,以月为单位(`sys_operation_log_YYYY_MM`
**operation_type 枚举**:
`USER_LOGIN``USER_LOGOUT``USER_CREATE``USER_UPDATE``USER_DISABLE``USER_ROLE_CHANGE``SOURCE_UPLOAD``SOURCE_DELETE``TASK_CREATE``TASK_CLAIM``TASK_UNCLAIM``TASK_SUBMIT``EXTRACTION_APPROVE``EXTRACTION_REJECT``QA_APPROVE``QA_REJECT``TASK_REASSIGN``EXPORT_CREATE``FINETUNE_START``CONFIG_UPDATE``VIDEO_JOB_RESET`
---
### 10. annotation_task_history — 任务流转历史
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| task_id | BIGINT | NOT NULL FK→annotation_task | |
| from_status | VARCHAR(20) | — | 任务初建时为 NULL |
| to_status | VARCHAR(20) | NOT NULL | |
| operator_id | BIGINT | NOT NULL FK→sys_user | |
| operator_role | VARCHAR(20) | NOT NULL | **操作时角色快照** |
| note | TEXT | — | 驳回原因、转移说明等 |
| created_at | TIMESTAMP | NOT NULL | |
**只追加**:每次 annotation_task.status 变更时同步插入,与业务操作在同一事务中
**索引**: `(task_id)`
---
### 11. video_process_job — 视频异步处理任务
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| source_id | BIGINT | NOT NULL FK→source_data | |
| job_type | VARCHAR(20) | NOT NULL | FRAME_EXTRACT / VIDEO_TO_TEXT |
| status | VARCHAR(20) | NOT NULL DEFAULT 'PENDING' | 见状态机 |
| params | JSONB | NOT NULL | 处理参数 |
| total_units | INT | — | 总帧数/片段数 |
| processed_units | INT | NOT NULL DEFAULT 0 | |
| output_path | VARCHAR(500) | — | |
| retry_count | INT | NOT NULL DEFAULT 0 | |
| max_retries | INT | NOT NULL DEFAULT 3 | |
| error_message | TEXT | — | |
| started_at / completed_at | TIMESTAMP | — | |
| created_at / updated_at | TIMESTAMP | NOT NULL | |
**状态机**:
```
PENDING → RUNNING
RUNNING → SUCCESS处理成功
RUNNING → RETRYING失败且 retry_count < max_retries
RUNNING → FAILED失败且 retry_count >= max_retries
RETRYING → RUNNINGAI 服务自动重试)
RETRYING → FAILED超过最大重试次数
```
*FAILED → PENDING由 ADMIN 手动触发接口,不在状态机自动流转中*
**幂等规则**: 回调时若 `status == SUCCESS` 则静默忽略,不执行任何 DB 写入
**索引**: `(source_id)``(status)`
---
## Redis 数据结构
| Key 模式 | 类型 | TTL | 内容 |
|---------|------|-----|------|
| `token:{uuid}` | Hash | 2h滑动 | `{userId, role, companyId, username}` |
| `user:perm:{userId}` | String | 5min | 用户角色字符串 |
| `task:claim:{taskId}` | String | 30s | 持有者 userId |
*禁止在上述三类命名空间之外自造 Key 用于认证、权限或锁目的。*