feat(server): near-duplicate detection (#8228)
* duplicate detection job, entity, config * queueing * job panel, update api * use embedding in db instead of fetching * disable concurrency * only queue visible assets * handle multiple duplicateIds * update concurrent queue check * add provider * add web placeholder, server endpoint, migration, various fixes * update sql * select embedding by default * rename variable * simplify * remove separate entity, handle re-running with different threshold, set default back to 0.02 * fix tests * add tests * add index to entity * formatting * update asset mock * fix `upsertJobStatus` signature * update sql * formatting * default to 0.03 * optimize clustering * use asset's `duplicateId` if present * update sql * update tests * expose admin setting * refactor * formatting * skip if ml is disabled * debug trash e2e * remove from web * remove from sidebar * test if ml is disabled * update sql * separate duplicate detection from clip in config, disable by default for now * fix doc * lower minimum `maxDistance` * update api * Add and Use Duplicate Detection Feature Flag (#9364) * Add Duplicate Detection Flag * Use Duplicate Detection Flag * Attempt Fixes for Failing Checks * lower minimum `maxDistance` * fix tests --------- Co-authored-by: mertalev <101130780+mertalev@users.noreply.github.com> * chore: fixes and additions after rebase * chore: update api (remove new Role enum) * fix: left join smart search so getAll works without machine learning * test: trash e2e go back to checking length of assets is zero * chore: regen api after rebase * test: fix tests after rebase * redundant join --------- Co-authored-by: Nicholas Flamy <30300649+NicholasFlamy@users.noreply.github.com> Co-authored-by: Zack Pollard <zackpollard@ymail.com> Co-authored-by: Zack Pollard <zack@futo.org>
This commit is contained in:
@@ -35,6 +35,7 @@ FROM
|
||||
"asset"."originalFileName" AS "asset_originalFileName",
|
||||
"asset"."sidecarPath" AS "asset_sidecarPath",
|
||||
"asset"."stackId" AS "asset_stackId",
|
||||
"asset"."duplicateId" AS "asset_duplicateId",
|
||||
"stack"."id" AS "stack_id",
|
||||
"stack"."primaryAssetId" AS "stack_primaryAssetId",
|
||||
"stackedAssets"."id" AS "stackedAssets_id",
|
||||
@@ -64,7 +65,8 @@ FROM
|
||||
"stackedAssets"."livePhotoVideoId" AS "stackedAssets_livePhotoVideoId",
|
||||
"stackedAssets"."originalFileName" AS "stackedAssets_originalFileName",
|
||||
"stackedAssets"."sidecarPath" AS "stackedAssets_sidecarPath",
|
||||
"stackedAssets"."stackId" AS "stackedAssets_stackId"
|
||||
"stackedAssets"."stackId" AS "stackedAssets_stackId",
|
||||
"stackedAssets"."duplicateId" AS "stackedAssets_duplicateId"
|
||||
FROM
|
||||
"assets" "asset"
|
||||
LEFT JOIN "exif" "exifInfo" ON "exifInfo"."assetId" = "asset"."id"
|
||||
@@ -129,6 +131,7 @@ SELECT
|
||||
"asset"."originalFileName" AS "asset_originalFileName",
|
||||
"asset"."sidecarPath" AS "asset_sidecarPath",
|
||||
"asset"."stackId" AS "asset_stackId",
|
||||
"asset"."duplicateId" AS "asset_duplicateId",
|
||||
"stack"."id" AS "stack_id",
|
||||
"stack"."primaryAssetId" AS "stack_primaryAssetId",
|
||||
"stackedAssets"."id" AS "stackedAssets_id",
|
||||
@@ -158,7 +161,8 @@ SELECT
|
||||
"stackedAssets"."livePhotoVideoId" AS "stackedAssets_livePhotoVideoId",
|
||||
"stackedAssets"."originalFileName" AS "stackedAssets_originalFileName",
|
||||
"stackedAssets"."sidecarPath" AS "stackedAssets_sidecarPath",
|
||||
"stackedAssets"."stackId" AS "stackedAssets_stackId"
|
||||
"stackedAssets"."stackId" AS "stackedAssets_stackId",
|
||||
"stackedAssets"."duplicateId" AS "stackedAssets_duplicateId"
|
||||
FROM
|
||||
"assets" "asset"
|
||||
LEFT JOIN "exif" "exifInfo" ON "exifInfo"."assetId" = "asset"."id"
|
||||
@@ -185,6 +189,35 @@ LIMIT
|
||||
101
|
||||
COMMIT
|
||||
|
||||
-- SearchRepository.searchDuplicates
|
||||
WITH
|
||||
"cte" AS (
|
||||
SELECT
|
||||
"asset"."duplicateId" AS "duplicateId",
|
||||
"search"."assetId" AS "assetId",
|
||||
"search"."embedding" <= > $1 AS "distance"
|
||||
FROM
|
||||
"assets" "asset"
|
||||
INNER JOIN "smart_search" "search" ON "search"."assetId" = "asset"."id"
|
||||
WHERE
|
||||
(
|
||||
"asset"."ownerId" IN ($2)
|
||||
AND "asset"."id" != $3
|
||||
AND "asset"."isVisible" = $4
|
||||
)
|
||||
AND ("asset"."deletedAt" IS NULL)
|
||||
ORDER BY
|
||||
"search"."embedding" <= > $1 ASC
|
||||
LIMIT
|
||||
64
|
||||
)
|
||||
SELECT
|
||||
res.*
|
||||
FROM
|
||||
"cte" "res"
|
||||
WHERE
|
||||
res.distance <= $5
|
||||
|
||||
-- SearchRepository.searchFaces
|
||||
START TRANSACTION
|
||||
SET
|
||||
@@ -337,6 +370,7 @@ SELECT
|
||||
"asset"."originalFileName" AS "asset_originalFileName",
|
||||
"asset"."sidecarPath" AS "asset_sidecarPath",
|
||||
"asset"."stackId" AS "asset_stackId",
|
||||
"asset"."duplicateId" AS "asset_duplicateId",
|
||||
"exif"."assetId" AS "exif_assetId",
|
||||
"exif"."description" AS "exif_description",
|
||||
"exif"."exifImageWidth" AS "exif_exifImageWidth",
|
||||
|
||||
Reference in New Issue
Block a user