feat(US1): text triple extraction — POST /api/v1/text/extract

- app/models/text_models.py: TripleItem, SourceOffset, TextExtract{Request,Response}
- app/services/text_service.py: TXT/PDF/DOCX parsing + LLM call + JSON parse
- app/routers/text.py: POST /text/extract handler with Depends injection
- tests/test_text_service.py: 6 unit tests (formats, errors)
- tests/test_text_router.py: 4 router tests (200, 400, 502×2)
- 10/10 tests passing
wh
2026-04-10 15:27:27 +08:00
parent e1eb5e47b1
commit dd8da386f4
18 changed files with 321 additions and 1 deletions

app/models/text_models.py Normal file

@@ -0,0 +1,25 @@
from pydantic import BaseModel


class SourceOffset(BaseModel):
    start: int
    end: int


class TripleItem(BaseModel):
    subject: str
    predicate: str
    object: str
    source_snippet: str
    source_offset: SourceOffset


class TextExtractRequest(BaseModel):
    file_path: str
    file_name: str
    model: str | None = None
    prompt_template: str | None = None


class TextExtractResponse(BaseModel):
    items: list[TripleItem]