Duplicate keywords appear at top of hierarchy

Joanna · November 17, 2021, 9:55am

This is indeed a problem.

As software developers, we are taught the golden rule of SPOD (Single Point Of Definition). This means that important information should only be stored in one place and always referred to and searched for from that place.

Unfortunately, DxO have chosen to keep keywords in at least three places: their internal database, DOP files and, for RAW images, XMP sidecar files. That raises the question of which one holds the truth.

If an image has already been keyworded in another app, PL can read the XMP sidecar file and know what keywords come with it. In theory, that should remain the SPOD that everything depends on.

The problem is that searching through thousands of files to see which ones contain certain keywords is a slow process, which is why most “DAM” solutions resort to keeping that information in a database. And this is true not just for PL but for most Windows-based DAM solutions.
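
To illustrate why a DAM caches keywords instead of re-reading every sidecar on each search, here is a minimal Python sketch. It is an assumption for illustration only, not how PhotoLab or any other DAM actually works: it reads the flat keywords from the standard dc:subject field of each XMP packet once and builds an in-memory index, so a later search is a dictionary lookup rather than a scan of every file. The helper names and sample sidecars are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from xml.sax.saxutils import escape

# Standard XMP namespaces used for the flat keyword list (dc:subject).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def sample_xmp(*keywords):
    """Build a minimal, hypothetical XMP sidecar holding the given keywords."""
    items = "".join(f"<rdf:li>{escape(k)}</rdf:li>" for k in keywords)
    return (
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
        f'<rdf:RDF xmlns:rdf="{NS["rdf"]}">'
        f'<rdf:Description rdf:about="" xmlns:dc="{NS["dc"]}">'
        f"<dc:subject><rdf:Bag>{items}</rdf:Bag></dc:subject>"
        "</rdf:Description></rdf:RDF></x:xmpmeta>"
    )


def read_keywords(xmp_text):
    """Return the flat keywords stored in dc:subject of one XMP packet."""
    root = ET.fromstring(xmp_text)
    path = ".//dc:subject/rdf:Bag/rdf:li"
    return [li.text for li in root.iterfind(path, NS) if li.text]


def build_index(sidecars):
    """Read every sidecar once and map each keyword to the images carrying it."""
    index = defaultdict(set)
    for name, xmp_text in sidecars.items():
        for keyword in read_keywords(xmp_text):
            index[keyword].add(name)
    return dict(index)


sidecars = {
    "IMG_001.xmp": sample_xmp("Heron", "Birds"),
    "IMG_002.xmp": sample_xmp("Birds"),
    "IMG_003.xmp": sample_xmp("Sunset"),
}
index = build_index(sidecars)
print(sorted(index["Birds"]))  # ['IMG_001.xmp', 'IMG_002.xmp']
```

The catch is that once another app changes a keyword in a sidecar, this index is stale until it is rebuilt, so the index itself becomes a second copy of the keywords that has to be kept in sync.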

Why do I single out “Windows-based” instead of including Mac as well? Because macOS includes a comprehensive metadata indexing system as part of the everyday operation of the operating system. Apple even includes its own metadata “tags”, which can serve perfectly well as the basis for searching for files of any type by any of approximately 160 metadata terms. These include those automatically retrieved from image files, such as aperture, shutter speed, ISO, GPS and keywords in XMP files. All of this metadata is automatically indexed by Spotlight, and all it takes is a Finder search to find the files that contain it, with no external database required.

I am in the final stages of writing my own lightweight keywording solution for Mac and, even though I don’t maintain a database, it can do lightning-fast searches across tens of thousands of images.

Unfortunately, until relatively recently, Windows did not have such a metadata indexing system, which is why most DAM writers had to fall back on creating and maintaining their own databases as a “one size fits all” solution for both Windows and Mac, even though this isn’t necessary on the Mac.

So now we have two places where keywords have to be stored and updated when changes are made, which creates the reconciliation and synchronisation problems you are seeing when you use another DAM for some metadata changes and PL for others. If everything were stored only in the XMP sidecars, life would be simple but, as it stands, Lr holds its own database and PL holds its own, so now we have three “sources of truth”.

But it doesn’t end there! DxO use DOP sidecar files to keep a record of edits made to the image, including virtual copies, which used to be fine until they decided to put keywords in there as well. So, now you have a third record of those keywords to keep in sync in PL alone.

And there’s more! If you have more than one virtual copy of an image, the DOP file keeps a record of the keywords for each copy, and these can differ, but an XMP file is automatically created only for the master copy. The other copies have their keywords written out only when the image is exported, and then only into the exported file: yet another source of truth.

Not forgetting that Adobe provides alternative formats for storing hierarchical keywords, which may or may not match those used by other DAM solutions. This is possibly the underlying cause of Mark’s problems.
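
Lightroom-style sidecars typically record a hierarchical keyword twice: the bare name (sometimes with its parents) in dc:subject, and the full path, with levels separated by “|”, in lr:hierarchicalSubject. The hypothetical Python sketch below is not PhotoLab’s import logic, which isn’t public; it only shows one plausible way the symptom in this thread can arise. A reader that ignores the path and matches the bare name against its existing tree cannot place an ambiguous leaf, and falls back to creating a new top-level keyword. The merge rules, function names and sample keywords are all assumptions.

```python
def add_path(tree, path):
    """Add a '|'-separated keyword path and all of its ancestors to the tree."""
    parts = path.split("|")
    for depth in range(1, len(parts) + 1):
        tree.add("|".join(parts[:depth]))


def import_keywords(tree, flat, hierarchical, use_hierarchy=True):
    """Merge one image's keywords into `tree` (a set of '|'-separated paths).

    `flat` mimics dc:subject and `hierarchical` mimics lr:hierarchicalSubject
    (both hypothetical, already-parsed lists). Returns the assigned paths.
    """
    if use_hierarchy and hierarchical:
        for path in hierarchical:
            add_path(tree, path)
        return list(hierarchical)

    assigned = []
    for name in flat:
        # Without a path, the only clue is the leaf name.
        matches = sorted(p for p in tree if p.split("|")[-1] == name)
        if len(matches) == 1:
            assigned.append(matches[0])
        else:
            # No match, or an ambiguous one: fall back to a new top-level keyword.
            tree.add(name)
            assigned.append(name)
    return assigned


existing = {"Animals", "Animals|Birds", "Animals|Birds|Heron", "Art", "Art|Heron"}
flat = ["Heron"]
hierarchical = ["Animals|Birds|Heron"]

with_paths = set(existing)
print(import_keywords(with_paths, flat, hierarchical))  # ['Animals|Birds|Heron']

names_only = set(existing)
print(import_keywords(names_only, flat, hierarchical, use_hierarchy=False))  # ['Heron']
print(sorted(names_only - existing))  # ['Heron'], a new top-level duplicate
```

The point of the example is that a flat keyword list throws away context: two different “Heron” keywords look identical once the path is gone, so any app that relies on the flat list alone has to guess.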

In theory, the MWG (Metadata Working Group) provide guidelines for metadata storage, but a lot of DAM writers treat them as just that: guidelines. Even though Adobe participated in drawing them up, they then went ahead and contravened some of them, setting up their own “standards”, which most DAM writers are now afraid of breaking, even though the XMP standards are now deemed to be the more “universal”.

All in all, it’s a mess. I for one feel that DxO should not have stuck their head into this particular hornets’ nest. But, having said that, they haven’t done too bad a job for a first foray into the field.

platypus · November 17, 2021, 11:56am

PhotoLab’s primary source of information is the PhotoLab database.

The sidecar files’ purpose is information exchange, whether with other applications that understand .xmp or within the DxO universe with its .dop sidecar files.

As far as I’ve tested interoperability of .xmp sidecars between PhotoLab, PhotoMechanic, Lightroom and Capture One, the exchange mostly works, except for metadata fields that are not provided in all apps and some differences in how apps handle adding and removing metadata…

Nevertheless, users are well advised to:

use exactly one application for metadata management
test migrations carefully before changing from one metadata management app to another
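
A small check like the following can make “test migrations carefully” concrete: record the keywords per image before the migration and again after the new application has read them, then list anything lost or gained. This is a hypothetical Python sketch; how you extract the keyword sets (for example with ExifTool, or from each app’s own export) is left open, and the function name and sample data are illustrative only.

```python
def diff_keywords(before, after):
    """Compare per-image keyword sets taken before and after a migration.

    Both arguments map an image name to a set of keywords. Returns a dict of
    image name -> (missing, unexpected) for every image whose keywords changed.
    """
    report = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, set())
        new = after.get(name, set())
        missing, unexpected = old - new, new - old
        if missing or unexpected:
            report[name] = (sorted(missing), sorted(unexpected))
    return report


before = {"IMG_001": {"Animals|Birds|Heron"}, "IMG_002": {"Sunset"}}
after = {"IMG_001": {"Heron"}, "IMG_002": {"Sunset"}}
print(diff_keywords(before, after))
# {'IMG_001': (['Animals|Birds|Heron'], ['Heron'])}
```

Running this on a handful of representative images before committing to a new app is far cheaper than discovering a flattened or duplicated keyword tree after the fact.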

Joanna · November 17, 2021, 2:32pm

You could not have given better advice.

markinlcri · November 18, 2021, 1:29pm

Thanks @platypus, @Joanna, and others. I couldn’t face rebuilding the PL catalog, so I will stop using the PhotoLibrary. I purchased other DAM software that isn’t as nicely designed as the PhotoLibrary but doesn’t cause me grief! I wish I could tell you that I was able to follow your steps and found relief but, in fact, I am not sure.

The more I think about this problem, the less I feel responsible for it. I think there is a genuine bug in the software that causes it to create new top-level keywords rather than assign existing leaf keywords.

I’m really not sure how I triggered this bug, and I can’t trust myself to avoid it in the future, so I’m safer using other software.

Again, thanks for engaging in this discussion.