# Data Model: label_backend

**Date**: 2026-04-09
**Branch**: `001-label-backend-spec`

---

## Entity Relationship Overview

```
sys_company ─┬─ sys_user (company_id FK)
             ├─ source_data (company_id FK)
             │    └─ source_data (parent_source_id self-reference, video provenance chain)
             ├─ annotation_task (company_id FK)
             │    ├─ annotation_result (task_id FK)
             │    └─ annotation_task_history (task_id FK)
             ├─ training_dataset (company_id FK)
             ├─ export_batch (company_id FK)
             ├─ sys_config (company_id FK; may be NULL for a global default)
             ├─ sys_operation_log (company_id FK)
             └─ video_process_job (company_id FK)
```

**Multi-tenancy rule**: Every business table except `sys_company` itself carries `company_id NOT NULL`. At query time, `TenantLineInnerInterceptor` automatically injects `WHERE company_id = ?`. The sole exception is `sys_config`, where a NULL `company_id` denotes a global default configuration.
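
The effect of the tenant predicate can be illustrated with a small Python sketch. It is not how `TenantLineInnerInterceptor` works internally (the interceptor rewrites SQL inside the persistence layer). The function name `tenant_scoped_select` and the exemption set are assumptions made for this example only.

```python
from typing import Optional, Tuple

# Hypothetical exemption list for this sketch: only sys_company has no tenant column.
TENANT_EXEMPT_TABLES = {"sys_company"}


def tenant_scoped_select(table: str, company_id: int,
                         extra_where: Optional[str] = None) -> Tuple[str, tuple]:
    """Build a SELECT whose WHERE clause always carries the tenant predicate."""
    if table in TENANT_EXEMPT_TABLES:
        where, params = extra_where, ()
    else:
        where = "company_id = ?"
        if extra_where:
            # Parenthesize the caller's condition so an OR cannot escape the tenant filter.
            where = f"{where} AND ({extra_where})"
        params = (company_id,)
    sql = f"SELECT * FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql, params
```

For example, `tenant_scoped_select("annotation_task", 7, "status = 'UNCLAIMED'")` yields a statement filtered on both `company_id` and `status`, with `7` as the bound parameter.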

---

## Entity Details

### 1. sys_company — Company (tenant)

| Field | Type | Constraints | Description |
|------|------|------|------|
| id | BIGSERIAL | PK | Auto-increment primary key |
| company_name | VARCHAR(100) | NOT NULL UNIQUE | Company name |
| company_code | VARCHAR(50) | NOT NULL UNIQUE | Company code |
| status | VARCHAR(10) | NOT NULL DEFAULT 'ACTIVE' | ACTIVE / DISABLED |
| created_at | TIMESTAMP | NOT NULL DEFAULT NOW() | |
| updated_at | TIMESTAMP | NOT NULL DEFAULT NOW() | |

**State**: No state machine (only an ACTIVE/DISABLED flag)

---

### 2. sys_user — User

| Field | Type | Constraints | Description |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | Tenant isolation key |
| username | VARCHAR(50) | NOT NULL | Unique within a company |
| password_hash | VARCHAR(255) | NOT NULL | BCrypt, strength ≥ 10; must never be serialized into responses |
| real_name | VARCHAR(50) | — | |
| role | VARCHAR(20) | NOT NULL | UPLOADER / ANNOTATOR / REVIEWER / ADMIN |
| status | VARCHAR(10) | NOT NULL DEFAULT 'ACTIVE' | ACTIVE / DISABLED |
| created_at / updated_at | TIMESTAMP | NOT NULL | |

**Constraint**: `UNIQUE(company_id, username)`
**Index**: `(company_id)`
**Role inheritance**: ADMIN ⊃ REVIEWER ⊃ ANNOTATOR ⊃ UPLOADER (implemented by `addInheritedRoles()` in the Shiro Realm)
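
The inheritance chain can be sketched as follows. This is an illustration in Python, not the actual `addInheritedRoles()` code; the names `ROLE_ORDER` and `inherited_roles` are hypothetical.

```python
# Roles ordered from least to most privileged,
# following ADMIN ⊃ REVIEWER ⊃ ANNOTATOR ⊃ UPLOADER.
ROLE_ORDER = ["UPLOADER", "ANNOTATOR", "REVIEWER", "ADMIN"]


def inherited_roles(role: str) -> set:
    """Return the given role plus every role it inherits."""
    if role not in ROLE_ORDER:
        raise ValueError(f"unknown role: {role}")
    # Everything up to and including the role's position is granted.
    return set(ROLE_ORDER[: ROLE_ORDER.index(role) + 1])
```

With this ordering, a REVIEWER passes any check that requires ANNOTATOR or UPLOADER, but not one that requires ADMIN.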

---

### 3. source_data — Source data

| Field | Type | Constraints | Description |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| uploader_id | BIGINT | FK→sys_user | |
| data_type | VARCHAR(20) | NOT NULL | TEXT / IMAGE / VIDEO |
| file_path | VARCHAR(500) | NOT NULL | RustFS object path |
| file_name | VARCHAR(255) | NOT NULL | Original file name |
| file_size | BIGINT | — | Size in bytes |
| bucket_name | VARCHAR(100) | NOT NULL | RustFS bucket name |
| parent_source_id | BIGINT | FK→source_data | Points to the original video when a video segment is converted to text |
| status | VARCHAR(20) | NOT NULL DEFAULT 'PENDING' | See state machine |
| reject_reason | TEXT | — | Reserved field (there is currently no REJECTED state) |
| created_at / updated_at | TIMESTAMP | NOT NULL | |

**Indexes**: `(company_id)`, `(company_id, status)`, `(parent_source_id)`

**State machine**:
```
PENDING → EXTRACTING (text/image uploaded directly)
PENDING → PREPROCESSING (after a video upload)
PREPROCESSING → PENDING (video preprocessing finished; enters the annotation pipeline)
EXTRACTING → QA_REVIEW (after the extraction task is approved)
QA_REVIEW → APPROVED (after the QA task is approved; the whole pipeline is complete)
```

*Note: source_data has no REJECTED state. A rejection during the QA stage applies to annotation_task (→ REJECTED); source_data stays in QA_REVIEW.*
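
One way to enforce these transitions is a lookup table consulted before every status write. The sketch below is in Python; the names `SOURCE_TRANSITIONS` and `can_transition` are assumptions for illustration and do not come from the codebase.

```python
# Allowed source_data transitions, transcribed from the state machine above.
SOURCE_TRANSITIONS = {
    "PENDING": {"EXTRACTING", "PREPROCESSING"},
    "PREPROCESSING": {"PENDING"},
    "EXTRACTING": {"QA_REVIEW"},
    "QA_REVIEW": {"APPROVED"},
    "APPROVED": set(),  # terminal state
}


def can_transition(current: str, target: str) -> bool:
    """Return True if source_data may move from `current` to `target`."""
    return target in SOURCE_TRANSITIONS.get(current, set())
```

Because there is no REJECTED entry, a QA rejection cannot change the source_data row, which matches the note above.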

---

### 4. annotation_task — Annotation task

| Field | Type | Constraints | Description |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| source_id | BIGINT | NOT NULL FK→source_data | |
| phase | VARCHAR(20) | NOT NULL | EXTRACTION / QA_GENERATION |
| task_type | VARCHAR(20) | NOT NULL | AI_ASSISTED / MANUAL |
| ai_model | VARCHAR(50) | — | AI model used |
| video_unit_type | VARCHAR(20) | — | FRAME (video-frame mode) / NULL |
| video_unit_info | JSONB | — | `{frame_index, time_sec, frame_path}` |
| claimed_by | BIGINT | FK→sys_user | Current holder |
| claimed_at | TIMESTAMP | — | |
| status | VARCHAR(20) | NOT NULL DEFAULT 'UNCLAIMED' | See state machine |
| reject_reason | TEXT | — | Rejection reason |
| submitted_at | TIMESTAMP | — | |
| completed_at | TIMESTAMP | — | |
| created_at / updated_at | TIMESTAMP | NOT NULL | |

**Indexes**: `(company_id)`, `(company_id, phase, status)` (task-pool queries), `(claimed_by, status)` ("my tasks")

**State machine**:
```
UNCLAIMED → IN_PROGRESS (claim)
IN_PROGRESS → SUBMITTED (submit)
IN_PROGRESS → UNCLAIMED (abandon)
IN_PROGRESS → IN_PROGRESS (forced reassignment by ADMIN; the holder changes, the status does not)
SUBMITTED → APPROVED (review approved)
SUBMITTED → REJECTED (review rejected)
REJECTED → IN_PROGRESS (annotator re-claims)
```

**Concurrency control**: Claiming is protected by two guards: (1) Redis `SET NX task:claim:{taskId}` with a 30s TTL; (2) DB `UPDATE ... WHERE status='UNCLAIMED'`, which returns an error when zero rows are affected.
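
The two guards can be sketched in Python as below. This is a minimal illustration under stated assumptions: `FakeRedis` is an in-memory stand-in for the Redis lock, SQLite stands in for the real database, and `claim_task` returns `False` where the service would return an error. None of these names come from the codebase.

```python
import sqlite3
import time


class FakeRedis:
    """In-memory stand-in for Redis `SET key value NX EX ttl` (illustration only)."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry on the monotonic clock)

    def set_nx(self, key, value, ttl_seconds):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return False  # key exists and has not expired yet
        self._store[key] = (value, now + ttl_seconds)
        return True


def claim_task(redis, db, task_id, user_id):
    """Try to claim a task; return True on success, False if the claim lost."""
    # Guard 1: short-lived lock, mirroring SET NX task:claim:{taskId} with a 30s TTL.
    if not redis.set_nx(f"task:claim:{task_id}", str(user_id), ttl_seconds=30):
        return False
    # Guard 2: conditional update; zero affected rows means the task was not UNCLAIMED.
    cur = db.execute(
        "UPDATE annotation_task SET status = 'IN_PROGRESS', claimed_by = ?, "
        "claimed_at = CURRENT_TIMESTAMP WHERE id = ? AND status = 'UNCLAIMED'",
        (user_id, task_id),
    )
    db.commit()
    return cur.rowcount == 1
```

The database guard is what makes the claim safe even if the lock is lost or has expired; the lock mainly spares the database from bursts of competing updates.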

---

### 5. annotation_result — Annotation result (extraction phase)

| Field | Type | Constraints | Description |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| task_id | BIGINT | NOT NULL FK→annotation_task | |
| result_json | JSONB | NOT NULL | Overwritten as a whole; partial PATCH is forbidden |
| is_final | BOOLEAN | NOT NULL DEFAULT FALSE | Set to TRUE after approval |
| submitted_by | BIGINT | FK→sys_user | |
| created_at / updated_at | TIMESTAMP | NOT NULL | |

**result_json structure** (text triple example):
```json
{
  "items": [
    {
      "subject": "北京",
      "predicate": "是...首都",
      "object": "中国",
      "source_text": "北京是中国的首都",
      "start_offset": 0,
      "end_offset": 8
    }
  ]
}
```

**result_json structure** (image quadruple example):
```json
{
  "items": [
    {
      "subject": "猫",
      "relation": "坐在",
      "object": "椅子",
      "modifier": "白色的",
      "bbox": [100, 200, 300, 400],
      "crop_path": "crops/123/0.jpg"
    }
  ]
}
```

**Indexes**: `(task_id)`, `(company_id, is_final)`
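
Since `result_json` is always replaced as a whole, a shape check before the write is a natural fit. The Python sketch below assumes that every key shown in the two examples is mandatory and that any non-TEXT payload uses the image shape. The document does not state either assumption, and `validate_result_json` is a hypothetical helper.

```python
# Key sets taken from the two examples above; treating them as required is an assumption.
TRIPLE_KEYS = {"subject", "predicate", "object", "source_text", "start_offset", "end_offset"}
QUAD_KEYS = {"subject", "relation", "object", "modifier", "bbox", "crop_path"}


def validate_result_json(result: dict, data_type: str) -> list:
    """Return a list of problems; an empty list means the payload looks well-formed."""
    required = TRIPLE_KEYS if data_type == "TEXT" else QUAD_KEYS
    items = result.get("items")
    if not isinstance(items, list):
        return ["'items' must be a list"]
    problems = []
    for i, item in enumerate(items):
        missing = required - set(item)
        if missing:
            problems.append(f"items[{i}] missing keys: {sorted(missing)}")
            continue
        if data_type == "TEXT" and not (0 <= item["start_offset"] <= item["end_offset"]):
            problems.append(f"items[{i}] has an invalid offset range")
        if data_type != "TEXT" and len(item["bbox"]) != 4:
            problems.append(f"items[{i}] bbox must have 4 numbers")
    return problems
```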

---

### 6. training_dataset — Training sample

| Field | Type | Constraints | Description |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| task_id | BIGINT | NOT NULL FK→annotation_task | |
| source_id | BIGINT | NOT NULL FK→source_data | |
| extraction_result_id | BIGINT | NOT NULL FK→annotation_result | |
| sample_type | VARCHAR(20) | NOT NULL | TEXT / IMAGE / VIDEO_FRAME |
| glm_format_json | JSONB | NOT NULL | GLM fine-tuning format |
| export_batch_id | VARCHAR(50) | — | NULL means not yet exported |
| status | VARCHAR(20) | NOT NULL DEFAULT 'PENDING_REVIEW' | See state machine |
| reject_reason | TEXT | — | |
| reviewed_by | BIGINT | FK→sys_user | |
| exported_at | TIMESTAMP | — | |
| created_at / updated_at | TIMESTAMP | NOT NULL | |

**State machine**:
```
PENDING_REVIEW → APPROVED (QA review approved)
PENDING_REVIEW → REJECTED (QA review rejected)
REJECTED → PENDING_REVIEW (resubmitted after the annotator revises it)
```

**glm_format_json structure**:
```json
{
  "conversations": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ],
  "source_type": "TEXT"
}
```

**Indexes**: `(company_id)`, `(company_id, status)`, `(export_batch_id)`
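
Turning samples into a JSONL dataset file could look like the Python sketch below. It assumes that only APPROVED samples with a NULL `export_batch_id` are eligible for export. The document implies this but does not spell it out, and `to_jsonl` is a hypothetical helper name.

```python
import json


def to_jsonl(samples: list) -> str:
    """Serialize each eligible sample's glm_format_json as one JSON object per line."""
    lines = [
        # ensure_ascii=False keeps Chinese text readable in the exported file.
        json.dumps(s["glm_format_json"], ensure_ascii=False)
        for s in samples
        if s["status"] == "APPROVED" and s["export_batch_id"] is None
    ]
    return "\n".join(lines)
```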

---

### 7. export_batch — Export batch

| Field | Type | Constraints | Description |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| batch_uuid | VARCHAR(50) | NOT NULL UNIQUE | Batch identifier |
| dataset_file_path | VARCHAR(500) | — | RustFS JSONL path |
| sample_count | INT | NOT NULL DEFAULT 0 | |
| glm_job_id | VARCHAR(100) | — | Fine-tuning job ID |
| finetune_status | VARCHAR(20) | NOT NULL DEFAULT 'NOT_STARTED' | See values below |
| error_message | TEXT | — | |
| created_by | BIGINT | FK→sys_user | |
| created_at / updated_at | TIMESTAMP | NOT NULL | |

**finetune_status values**: NOT_STARTED / RUNNING / SUCCESS / FAILED

**Index**: `(company_id)`

---

### 8. sys_config — System configuration

| Field | Type | Constraints | Description |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | FK→sys_company, nullable | NULL = global default configuration |
| config_key | VARCHAR(100) | NOT NULL | |
| config_value | TEXT | NOT NULL | |
| description | TEXT | — | |
| updated_by | BIGINT | FK→sys_user | |
| updated_at | TIMESTAMP | NOT NULL | |

**Constraint**: `UNIQUE(company_id, config_key)`
**Lookup rule**: Query by `(companyId, configKey)` first; on a miss, fall back to `(NULL, configKey)` for the global default.

**Preset global configuration keys**:
- `prompt_extract_text`, `prompt_extract_image`, `prompt_video_to_text`
- `prompt_qa_gen_text`, `prompt_qa_gen_image`
- `model_default` (default: `glm-4`)
- `video_frame_interval` (default: `30`)
- `token_ttl_seconds` (default: `7200`)
- `glm_api_base_url`
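
The two-step lookup rule for this table can be sketched in Python as follows. A dictionary keyed by `(company_id, config_key)` stands in for the table, with `None` playing the role of NULL. The function name `get_config` and the tenant override value are hypothetical.

```python
from typing import Optional


def get_config(store: dict, company_id: int, config_key: str) -> Optional[str]:
    """Return the tenant-specific value if present, else the global default, else None."""
    if (company_id, config_key) in store:
        return store[(company_id, config_key)]
    # Fall back to the global row, i.e. the one whose company_id is NULL.
    return store.get((None, config_key))


# Usage: tenant 1 overrides model_default, tenant 2 inherits the global default.
store = {
    (None, "model_default"): "glm-4",
    (1, "model_default"): "custom-model",  # hypothetical override value
}
print(get_config(store, 1, "model_default"))
print(get_config(store, 2, "model_default"))
```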

---

### 9. sys_operation_log — Operation audit log

| Field | Type | Constraints | Description |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | FK→sys_company | |
| operator_id | BIGINT | FK→sys_user | May be NULL on a failed login |
| operator_name | VARCHAR(50) | NOT NULL | **Snapshot of the username at operation time** (unaffected by later renames) |
| operation_type | VARCHAR(50) | NOT NULL | See enum list |
| target_type | VARCHAR(30) | — | |
| target_id | BIGINT | — | |
| detail | JSONB | — | Supplementary information |
| ip_address | VARCHAR(50) | — | |
| result | VARCHAR(10) | NOT NULL | SUCCESS / FAIL |
| error_message | TEXT | — | |
| created_at | TIMESTAMP | NOT NULL DEFAULT NOW() | Partition key |

**Append-only**: The application layer must not UPDATE or DELETE; a DB-level trigger is recommended to enforce this.
**Partitioning**: Range-partitioned on `created_at`, one partition per month (`sys_operation_log_YYYY_MM`)

**operation_type enum**:
`USER_LOGIN`, `USER_LOGOUT`, `USER_CREATE`, `USER_UPDATE`, `USER_DISABLE`, `USER_ROLE_CHANGE`, `SOURCE_UPLOAD`, `SOURCE_DELETE`, `TASK_CREATE`, `TASK_CLAIM`, `TASK_UNCLAIM`, `TASK_SUBMIT`, `EXTRACTION_APPROVE`, `EXTRACTION_REJECT`, `QA_APPROVE`, `QA_REJECT`, `TASK_REASSIGN`, `EXPORT_CREATE`, `FINETUNE_START`, `CONFIG_UPDATE`, `VIDEO_JOB_RESET`
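
For the monthly partitions of this table, a small Python helper can derive the partition name and its time range from a timestamp. This is a sketch: `partition_for` is a hypothetical name, and the half-open range (start inclusive, end exclusive) is an assumption chosen to match the usual convention for range partitions.

```python
from datetime import datetime


def partition_for(ts: datetime):
    """Return (partition_name, range_start, range_end_exclusive) for a timestamp."""
    start = datetime(ts.year, ts.month, 1)
    # Roll over to January of the next year after December.
    if ts.month == 12:
        end = datetime(ts.year + 1, 1, 1)
    else:
        end = datetime(ts.year, ts.month + 1, 1)
    name = f"sys_operation_log_{ts.year:04d}_{ts.month:02d}"
    return name, start, end
```

A scheduled job could call this for the upcoming month and create the partition ahead of time, so that inserts never hit a missing range.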

---

### 10. annotation_task_history — Task transition history

| Field | Type | Constraints | Description |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| task_id | BIGINT | NOT NULL FK→annotation_task | |
| from_status | VARCHAR(20) | — | NULL when the task is first created |
| to_status | VARCHAR(20) | NOT NULL | |
| operator_id | BIGINT | NOT NULL FK→sys_user | |
| operator_role | VARCHAR(20) | NOT NULL | **Snapshot of the role at operation time** |
| note | TEXT | — | Rejection reason, reassignment note, etc. |
| created_at | TIMESTAMP | NOT NULL | |

**Append-only**: A row is inserted on every change to annotation_task.status, in the same transaction as the business operation.
**Index**: `(task_id)`
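
The same-transaction requirement can be illustrated with the Python sketch below. SQLite stands in for the real database and `change_status` is a hypothetical helper. The point is only that the status update and the history insert commit or roll back together.

```python
import sqlite3


def change_status(db, task_id, to_status, operator_id, operator_role, note=None):
    """Update annotation_task.status and append a history row atomically."""
    # The connection context manager commits on success and rolls back on any exception.
    with db:
        row = db.execute(
            "SELECT company_id, status FROM annotation_task WHERE id = ?", (task_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"task {task_id} not found")
        company_id, from_status = row
        db.execute("UPDATE annotation_task SET status = ? WHERE id = ?", (to_status, task_id))
        db.execute(
            "INSERT INTO annotation_task_history "
            "(company_id, task_id, from_status, to_status, operator_id, operator_role, note) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (company_id, task_id, from_status, to_status, operator_id, operator_role, note),
        )
```

If the history insert fails, for example because `operator_role` is missing, the status update is rolled back as well, so the two tables cannot drift apart.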

---

### 11. video_process_job — Asynchronous video processing job

| Field | Type | Constraints | Description |
|------|------|------|------|
| id | BIGSERIAL | PK | |
| company_id | BIGINT | NOT NULL FK→sys_company | |
| source_id | BIGINT | NOT NULL FK→source_data | |
| job_type | VARCHAR(20) | NOT NULL | FRAME_EXTRACT / VIDEO_TO_TEXT |
| status | VARCHAR(20) | NOT NULL DEFAULT 'PENDING' | See state machine |
| params | JSONB | NOT NULL | Processing parameters |
| total_units | INT | — | Total number of frames/segments |
| processed_units | INT | NOT NULL DEFAULT 0 | |
| output_path | VARCHAR(500) | — | |
| retry_count | INT | NOT NULL DEFAULT 0 | |
| max_retries | INT | NOT NULL DEFAULT 3 | |
| error_message | TEXT | — | |
| started_at / completed_at | TIMESTAMP | — | |
| created_at / updated_at | TIMESTAMP | NOT NULL | |

**State machine**:
```
PENDING → RUNNING
RUNNING → SUCCESS (processing succeeded)
RUNNING → RETRYING (failed and retry_count < max_retries)
RUNNING → FAILED (failed and retry_count >= max_retries)
RETRYING → RUNNING (automatic retry by the AI service)
RETRYING → FAILED (maximum retries exceeded)
```
*FAILED → PENDING: triggered manually by an ADMIN through an API endpoint; not part of the automatic state-machine transitions.*

**Idempotency rule**: If `status == SUCCESS` when a callback arrives, it is silently ignored and no DB write is performed.

**Indexes**: `(source_id)`, `(status)`
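
The failure branch and the idempotency rule can be sketched together in Python. A plain dictionary stands in for the job row, and the function names are hypothetical. The sketch deliberately leaves `retry_count` untouched, because the document does not say which component increments it.

```python
def status_after_failure(retry_count: int, max_retries: int = 3) -> str:
    """RUNNING → RETRYING while retries remain, otherwise RUNNING → FAILED."""
    return "RETRYING" if retry_count < max_retries else "FAILED"


def handle_callback(job: dict, succeeded: bool) -> bool:
    """Apply a processing callback to a job; return True if anything was written."""
    if job["status"] == "SUCCESS":
        # Idempotency rule: a duplicate callback on a finished job causes no write.
        return False
    if succeeded:
        job["status"] = "SUCCESS"
    else:
        job["status"] = status_after_failure(job["retry_count"], job["max_retries"])
    return True
```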

---

## Redis Data Structures

| Key pattern | Type | TTL | Content |
|---------|------|-----|------|
| `token:{uuid}` | Hash | 2h (sliding) | `{userId, role, companyId, username}` |
| `user:perm:{userId}` | String | 5min | User role string |
| `task:claim:{taskId}` | String | 30s | Holder's userId |

*Do not invent keys outside these three namespaces for authentication, permission, or locking purposes.*
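
The sliding TTL on `token:{uuid}` means that each successful read extends the session. The Python sketch below models that behaviour in memory with an injectable clock. `TokenStore` is a hypothetical class used for illustration, not the service's Redis client code.

```python
import time

TOKEN_TTL_SECONDS = 7200  # the 2h TTL from the table, same as the token_ttl_seconds default


class TokenStore:
    """In-memory model of token:{uuid} entries with a sliding expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._tokens = {}  # key -> (session fields, expiry time)

    def put(self, uuid: str, session: dict) -> None:
        self._tokens[f"token:{uuid}"] = (dict(session), self._clock() + TOKEN_TTL_SECONDS)

    def get(self, uuid: str):
        key = f"token:{uuid}"
        entry = self._tokens.get(key)
        if entry is None or entry[1] <= self._clock():
            self._tokens.pop(key, None)  # expired or unknown token
            return None
        # Sliding expiry: every successful read pushes the deadline out again.
        self._tokens[key] = (entry[0], self._clock() + TOKEN_TTL_SECONDS)
        return entry[0]
```

A session that is read at least once every two hours therefore never expires, while an idle one disappears after 7200 seconds.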