feat(server): near-duplicate detection (#8228)

* duplicate detection job, entity, config

* queueing

* job panel, update api

* use embedding in db instead of fetching

* disable concurrency

* only queue visible assets

* handle multiple duplicateIds

* update concurrent queue check

* add provider

* add web placeholder, server endpoint, migration, various fixes

* update sql

* select embedding by default

* rename variable

* simplify

* remove separate entity, handle re-running with different threshold, set default back to 0.02

* fix tests

* add tests

* add index to entity

* formatting

* update asset mock

* fix `upsertJobStatus` signature

* update sql

* formatting

* default to 0.03

* optimize clustering

* use asset's `duplicateId` if present

* update sql

* update tests

* expose admin setting

* refactor

* formatting

* skip if ml is disabled

* debug trash e2e

* remove from web

* remove from sidebar

* test if ml is disabled

* update sql

* separate duplicate detection from clip in config, disable by default for now

* fix doc

* lower minimum `maxDistance`

* update api

* Add and Use Duplicate Detection Feature Flag (#9364)

* Add Duplicate Detection Flag

* Use Duplicate Detection Flag

* Attempt Fixes for Failing Checks

* lower minimum `maxDistance`

* fix tests

---------

Co-authored-by: mertalev <101130780+mertalev@users.noreply.github.com>

* chore: fixes and additions after rebase

* chore: update api (remove new Role enum)

* fix: left join smart search so getAll works without machine learning

* test: trash e2e go back to checking length of assets is zero

* chore: regen api after rebase

* test: fix tests after rebase

* redundant join

---------

Co-authored-by: Nicholas Flamy <30300649+NicholasFlamy@users.noreply.github.com>
Co-authored-by: Zack Pollard <zackpollard@ymail.com>
Co-authored-by: Zack Pollard <zack@futo.org>
This commit is contained in:
Mert
2024-05-16 13:08:37 -04:00
committed by GitHub
parent 673e97e71d
commit 64636c0618
61 changed files with 1254 additions and 61 deletions
+36 -2
View File
@@ -35,6 +35,7 @@ FROM
"asset"."originalFileName" AS "asset_originalFileName",
"asset"."sidecarPath" AS "asset_sidecarPath",
"asset"."stackId" AS "asset_stackId",
"asset"."duplicateId" AS "asset_duplicateId",
"stack"."id" AS "stack_id",
"stack"."primaryAssetId" AS "stack_primaryAssetId",
"stackedAssets"."id" AS "stackedAssets_id",
@@ -64,7 +65,8 @@ FROM
"stackedAssets"."livePhotoVideoId" AS "stackedAssets_livePhotoVideoId",
"stackedAssets"."originalFileName" AS "stackedAssets_originalFileName",
"stackedAssets"."sidecarPath" AS "stackedAssets_sidecarPath",
"stackedAssets"."stackId" AS "stackedAssets_stackId"
"stackedAssets"."stackId" AS "stackedAssets_stackId",
"stackedAssets"."duplicateId" AS "stackedAssets_duplicateId"
FROM
"assets" "asset"
LEFT JOIN "exif" "exifInfo" ON "exifInfo"."assetId" = "asset"."id"
@@ -129,6 +131,7 @@ SELECT
"asset"."originalFileName" AS "asset_originalFileName",
"asset"."sidecarPath" AS "asset_sidecarPath",
"asset"."stackId" AS "asset_stackId",
"asset"."duplicateId" AS "asset_duplicateId",
"stack"."id" AS "stack_id",
"stack"."primaryAssetId" AS "stack_primaryAssetId",
"stackedAssets"."id" AS "stackedAssets_id",
@@ -158,7 +161,8 @@ SELECT
"stackedAssets"."livePhotoVideoId" AS "stackedAssets_livePhotoVideoId",
"stackedAssets"."originalFileName" AS "stackedAssets_originalFileName",
"stackedAssets"."sidecarPath" AS "stackedAssets_sidecarPath",
"stackedAssets"."stackId" AS "stackedAssets_stackId"
"stackedAssets"."stackId" AS "stackedAssets_stackId",
"stackedAssets"."duplicateId" AS "stackedAssets_duplicateId"
FROM
"assets" "asset"
LEFT JOIN "exif" "exifInfo" ON "exifInfo"."assetId" = "asset"."id"
@@ -185,6 +189,35 @@ LIMIT
101
COMMIT
-- SearchRepository.searchDuplicates
WITH
"cte" AS (
SELECT
"asset"."duplicateId" AS "duplicateId",
"search"."assetId" AS "assetId",
"search"."embedding" <= > $1 AS "distance"
FROM
"assets" "asset"
INNER JOIN "smart_search" "search" ON "search"."assetId" = "asset"."id"
WHERE
(
"asset"."ownerId" IN ($2)
AND "asset"."id" != $3
AND "asset"."isVisible" = $4
)
AND ("asset"."deletedAt" IS NULL)
ORDER BY
"search"."embedding" <= > $1 ASC
LIMIT
64
)
SELECT
res.*
FROM
"cte" "res"
WHERE
res.distance <= $5
-- SearchRepository.searchFaces
START TRANSACTION
SET
@@ -337,6 +370,7 @@ SELECT
"asset"."originalFileName" AS "asset_originalFileName",
"asset"."sidecarPath" AS "asset_sidecarPath",
"asset"."stackId" AS "asset_stackId",
"asset"."duplicateId" AS "asset_duplicateId",
"exif"."assetId" AS "exif_assetId",
"exif"."description" AS "exif_description",
"exif"."exifImageWidth" AS "exif_exifImageWidth",