File Service Architecture (Draft)

Mission Summary

  • Provide a permission-gated file management subsystem with drag-and-drop uploads, secure sharing, and backend-driven downloads.
  • Enforce API-only access to binary content while supporting multiple storage providers and ExecModule powered transformations.
  • Build with TBWEEP (tests before writing executable production code) so functionality is always covered by automated checks.

Domain Model

  • files – core metadata (UUID id, org_id, uploader_id, storage_key, driver, filename, size_bytes, mime_type, sha256, etag, status, virus_scan_state, created_at, updated_at, deleted_at, retention_expires_at).
  • file_acl – resource-level grants {file_id, subject_type, subject_id, permissions(read,write,share,delete), granted_by, granted_at}; mirrors but does not replace Spring Security ACL tables.
  • file_upload_sessions – multipart/resumable uploads (upload_id, file_id, storage_key, storage_driver, part_size, initiated_by, expires_at, completed_at, metadata_json).
  • spaces – logical data rooms (id, org_id, name, description, owner_id, created_at, updated_at, visibility, status).
  • space_members – memberships {space_id, subject_type(user|org|role), subject_id, role(owner|admin|editor|viewer), invited_by, invited_at}.
  • space_files – join table that allows associating files with zero or more spaces (primary context for breadcrumbs / navigation).
  • file_jobs – ExecModule requests and status (job_id, file_id, module_key, status, requested_by, payload_json, result_json, started_at, completed_at, error_reason).
  • file_tokens – short-lived download/view tokens (token_id, file_id, slug, jwt_id, expires_at, max_uses, remaining_uses, disposition, scope, created_by).
  • file_tags – normalized tagging (file_id, tag, created_by, created_at) to support filters.
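
The file_acl grants above could be evaluated roughly as follows. This is a minimal sketch, not the project's evaluator: the type names (FilePermission, SubjectType, FileAclEntry, Subject, AclEvaluator) are hypothetical, subject ids are modelled as strings, and it assumes file_acl uses the same subject types that space_members lists (user, org, role).

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Permission names come from the file_acl description.
enum FilePermission { READ, WRITE, SHARE, DELETE }

// Assumption: file_acl reuses the subject types listed for space_members.
enum SubjectType { USER, ORG, ROLE }

// One file_acl row, reduced to the columns needed for evaluation.
record FileAclEntry(UUID fileId, SubjectType subjectType, String subjectId,
                    Set<FilePermission> permissions) {}

// One identity the caller acts as: the user itself, its org, or one of its roles.
record Subject(SubjectType type, String id) {}

final class AclEvaluator {
    private AclEvaluator() {}

    // Union of the permissions granted on the file to any of the caller's subjects.
    static Set<FilePermission> effectivePermissions(
            UUID fileId, List<Subject> caller, List<FileAclEntry> entries) {
        EnumSet<FilePermission> granted = EnumSet.noneOf(FilePermission.class);
        for (FileAclEntry entry : entries) {
            if (!entry.fileId().equals(fileId)) {
                continue; // grant belongs to a different file
            }
            for (Subject subject : caller) {
                if (subject.type() == entry.subjectType()
                        && subject.id().equals(entry.subjectId())) {
                    granted.addAll(entry.permissions());
                }
            }
        }
        return granted;
    }

    // True when at least one matching grant carries the needed permission.
    static boolean isAllowed(UUID fileId, List<Subject> caller,
                             List<FileAclEntry> entries, FilePermission needed) {
        return effectivePermissions(fileId, caller, entries).contains(needed);
    }
}
```

Grants are additive in this sketch: there is no deny entry, so a caller holds the union of whatever its user, org, and role subjects were granted.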

Storage Drivers

  • Interface StorageDriver with multipart init/complete, presign (GET/PUT), head, copy, move, delete.
  • Drivers shipped in the MVP: S3StorageDriver, LocalFSStorageDriver, GCSStorageDriver, AzureBlobStorageDriver.
  • Driver selection via Spring configuration (valkyrai.file-storage.driver=s3|local|gcs|azure).
  • ExecModule copy/move operations use the driver factory to instantiate the source and target providers at runtime.
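
One possible shape for the StorageDriver contract and its factory is sketched below. Only the operation names come from the list above; every method signature, the ObjectHead record, the StorageDriverFactory registry, and the InMemoryStorageDriver stand-in are assumptions made for illustration. The real drivers would wrap provider SDKs, which are left out here.

```java
import java.net.URI;
import java.time.Duration;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical result of a head call.
record ObjectHead(String key, long sizeBytes, String etag) {}

// Operation names follow the draft; the signatures are assumptions.
interface StorageDriver {
    String initMultipart(String key);
    ObjectHead completeMultipart(String key, String uploadId, List<String> partEtags);
    URI presignGet(String key, Duration ttl);
    URI presignPut(String key, Duration ttl);
    Optional<ObjectHead> head(String key);
    void copy(String sourceKey, String targetKey);
    void move(String sourceKey, String targetKey);
    void delete(String key);
}

// Maps a configured driver name (s3|local|gcs|azure) to a driver supplier.
final class StorageDriverFactory {
    private final Map<String, Supplier<StorageDriver>> suppliers = new ConcurrentHashMap<>();

    void register(String name, Supplier<StorageDriver> supplier) {
        suppliers.put(name.toLowerCase(Locale.ROOT), supplier);
    }

    StorageDriver create(String name) {
        Supplier<StorageDriver> supplier = suppliers.get(name.toLowerCase(Locale.ROOT));
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown storage driver: " + name);
        }
        return supplier.get();
    }
}

// Stand-in driver that tracks object metadata only; it stores no bytes.
final class InMemoryStorageDriver implements StorageDriver {
    private final Map<String, ObjectHead> objects = new ConcurrentHashMap<>();

    @Override public String initMultipart(String key) { return "upload-" + key; }

    @Override public ObjectHead completeMultipart(String key, String uploadId, List<String> partEtags) {
        ObjectHead head = new ObjectHead(key, 0L, String.join("-", partEtags));
        objects.put(key, head);
        return head;
    }

    @Override public URI presignGet(String key, Duration ttl) { return URI.create("memory://get/" + key); }
    @Override public URI presignPut(String key, Duration ttl) { return URI.create("memory://put/" + key); }
    @Override public Optional<ObjectHead> head(String key) { return Optional.ofNullable(objects.get(key)); }

    @Override public void copy(String sourceKey, String targetKey) {
        ObjectHead source = objects.get(sourceKey);
        if (source == null) {
            throw new IllegalArgumentException("No such object: " + sourceKey);
        }
        objects.put(targetKey, new ObjectHead(targetKey, source.sizeBytes(), source.etag()));
    }

    @Override public void move(String sourceKey, String targetKey) {
        if (sourceKey.equals(targetKey)) {
            return; // moving onto itself is a no-op
        }
        copy(sourceKey, targetKey);
        objects.remove(sourceKey);
    }

    @Override public void delete(String key) { objects.remove(key); }
}
```

A cross-provider ExecModule copy would ask the factory for two drivers, one per configured name, and stream between them. That path is not shown because copy and move in this sketch stay within a single driver.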

UI Explorer (Current)

  • web/typescript/valkyr_labs_com/src/components/FileManager delivers a React-based tree with drag-and-drop uploads backed by the new file endpoints.
  • Folder selection persists through the directory_path column; uploads call POST /files/uploads/init followed by /files/uploads/{sessionId}/direct.
  • The Files experience now surfaces as a dedicated LCARS dashboard tab and reuses the enhanced FileUploader dropzone for manual uploads.

API Surface (OpenAPI 3.1)

  • POST /files/uploads/init – create upload session, return provider-specific multipart fields and upload_id.
  • POST /files/uploads/complete – verify parts, compute SHA-256, finalize record, trigger virus scan workflow.
  • GET /files – paginated list with filters: prefix, search, mime_type, tag, owner, space, status.
  • GET /files/{id}/meta – metadata with ACL summary and audit details.
  • PATCH /files/{id} – rename, move (prefix/space), tag updates (add/remove), update retention.
  • DELETE /files/{id} – soft delete; enqueues purge job respecting retention.
  • POST /files/{id}/actions/presign – create an API-gated download/view token; returns a single-use or TTL-bound JWT.
  • GET /files/{id}/view?token= – stream file via backend; enforces virus scan status, ACL, and token scope.
  • POST /files/{id}/acl/grant and POST /files/{id}/acl/revoke – manage resource ACL entries.
  • POST /spaces, GET /spaces, POST /spaces/{id}/share, POST /spaces/{id}/add-file – manage Data Rooms.
  • ExecModule integration: POST /files/{id}/exec/{module} (new job), GET /jobs/{id}, POST /jobs/{id}/cancel.
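
The view endpoint's token check (expiry, file match, remaining uses) might look like the sketch below. The FileToken class and its redeem method are hypothetical, and JWT signature verification is deliberately omitted. In the service the counter would live in file_tokens.remaining_uses and be decremented with an atomic database update; an in-memory AtomicInteger stands in for that here.

```java
import java.time.Instant;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for one file_tokens row (file_id, expires_at, remaining_uses).
final class FileToken {
    private final UUID fileId;
    private final Instant expiresAt;
    private final AtomicInteger remainingUses;

    FileToken(UUID fileId, Instant expiresAt, int maxUses) {
        if (maxUses < 1) {
            throw new IllegalArgumentException("maxUses must be at least 1");
        }
        this.fileId = fileId;
        this.expiresAt = expiresAt;
        this.remainingUses = new AtomicInteger(maxUses);
    }

    // Consumes one use and returns true only when the token matches the
    // requested file, has not expired at 'now', and still has uses left.
    boolean redeem(UUID requestedFileId, Instant now) {
        if (!fileId.equals(requestedFileId)) {
            return false; // token was issued for a different file
        }
        if (!now.isBefore(expiresAt)) {
            return false; // TTL elapsed
        }
        while (true) {
            int left = remainingUses.get();
            if (left <= 0) {
                return false; // all uses consumed
            }
            // Compare-and-set so two concurrent requests cannot both take the last use.
            if (remainingUses.compareAndSet(left, left - 1)) {
                return true;
            }
        }
    }

    int remainingUses() {
        return remainingUses.get();
    }
}
```

A single-use token is the max_uses = 1 case; a purely TTL-bound token would need either a large max_uses or a separate "unlimited" flag, which the draft does not specify.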

Security & Compliance

  • JWT auth reused from ValkyrAI; roles owner/admin/editor/viewer enforce baseline RBAC.
  • The resource ACL adds subject-scoped permissions; the owner gets full rights by default.
  • Virus/DLP scan hook: file states UPLOADING → SCANNING → AVAILABLE; downloads blocked until AVAILABLE.
  • Soft delete retains data until retention_expires_at; the purge worker then deletes the storage object and its metadata.
  • Tokenized links always call the /files/{id}/view route; raw provider URLs are never exposed.
  • Events emitted on upload lifecycle, sharing actions, and ExecModule job transitions to the existing event bus.
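
The scan-gated lifecycle above can be expressed as a small transition table. This sketch covers only the three states the draft names; the FileStatus and FileLifecycle names are hypothetical, and a state for files that fail the scan is not specified in the draft, so none is modelled.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// The three states named in the draft.
enum FileStatus { UPLOADING, SCANNING, AVAILABLE }

final class FileLifecycle {
    // Allowed forward transitions: UPLOADING -> SCANNING -> AVAILABLE.
    private static final Map<FileStatus, Set<FileStatus>> ALLOWED = new EnumMap<>(FileStatus.class);

    static {
        ALLOWED.put(FileStatus.UPLOADING, EnumSet.of(FileStatus.SCANNING));
        ALLOWED.put(FileStatus.SCANNING, EnumSet.of(FileStatus.AVAILABLE));
        ALLOWED.put(FileStatus.AVAILABLE, EnumSet.noneOf(FileStatus.class));
    }

    private FileLifecycle() {}

    // Returns the new status, or throws when the move skips or reverses a step.
    static FileStatus transition(FileStatus from, FileStatus to) {
        if (!ALLOWED.get(from).contains(to)) {
            throw new IllegalStateException("Illegal transition " + from + " -> " + to);
        }
        return to;
    }

    // Downloads stay blocked until the scan has marked the file AVAILABLE.
    static boolean canDownload(FileStatus status) {
        return status == FileStatus.AVAILABLE;
    }
}
```

Keeping the table in one place means the view endpoint and the upload-complete handler would share a single definition of what "blocked until AVAILABLE" means.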

TBWEEP Test Strategy

  • Unit tests first for storage driver factory, ACL evaluator, and upload session validator.
  • Service-level tests (Spring @DataJpaTest + @SpringBootTest) for upload finalize flow, ACL grant/revoke, token issuance.
  • Integration tests using LocalFS + Testcontainers MinIO (for S3 driver) verifying multipart init/complete + download gating.
  • End-to-end tests (Playwright or Cypress) for the UI package once backend endpoints land, run against a mocked API in CI.
  • Contract tests ensure OpenAPI spec stays aligned (use Schemathesis or OpenAPI snapshot validation).
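
As an illustration of the test-first step for the upload session validator, the sketch below pairs a few checks with a validator that is just large enough to satisfy them. The validation rules, the UploadSession record, and both class names are assumptions, not rules taken from the draft. Plain Java checks are used instead of JUnit only to keep the example free of dependencies.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Subset of file_upload_sessions columns used by the hypothetical rules.
record UploadSession(String uploadId, long partSize, Instant expiresAt, Instant completedAt) {}

final class UploadSessionValidator {
    private UploadSessionValidator() {}

    // Returns every violated rule; an empty list means the session can accept parts.
    static List<String> validate(UploadSession session, Instant now) {
        List<String> errors = new ArrayList<>();
        if (session.uploadId() == null || session.uploadId().isBlank()) {
            errors.add("upload_id is required");
        }
        if (session.partSize() <= 0) {
            errors.add("part_size must be positive");
        }
        if (session.expiresAt() != null && !now.isBefore(session.expiresAt())) {
            errors.add("session expired");
        }
        if (session.completedAt() != null) {
            errors.add("session already completed");
        }
        return errors;
    }
}

// Written first under TBWEEP: these checks fail until the validator exists.
final class UploadSessionValidatorChecks {
    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T00:00:00Z");

        UploadSession open = new UploadSession("u-1", 5_242_880L, now.plusSeconds(3600), null);
        check(UploadSessionValidator.validate(open, now).isEmpty(), "open session should be valid");

        UploadSession expired = new UploadSession("u-2", 5_242_880L, now.minusSeconds(1), null);
        check(UploadSessionValidator.validate(expired, now).contains("session expired"),
                "expired session should be rejected");

        UploadSession done = new UploadSession("u-3", 5_242_880L, now.plusSeconds(3600), now);
        check(UploadSessionValidator.validate(done, now).contains("session already completed"),
                "completed session should be rejected");
    }
}
```

In the real suite the same three cases would presumably become JUnit tests alongside the storage driver factory and ACL evaluator tests.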

Next Steps

  • Wire remaining Liquibase change sets for ACL/tokens and expand metadata indexes.
  • Add service/controller integration tests (MockMvc) and repository slices once DB fixtures land.
  • Flesh out advanced driver features (multipart persistence, resumable state) and ExecModule job orchestration.
  • Build UI explorer package with drag-and-drop upload, permissions-aware actions, and Storybook examples.