Keyword bugs and enhancements

Yes, I know there are a million posts on keyword problems. I’m not sure if anything I write here has already been covered—it’s possible, but I haven’t seen these exact things.

I started to use keywords for the first time in PL7. I used to use keywords when I worked with Photoshop back in 2019 and earlier. One never gets the keyword hierarchy correct the first time, and I’m doing a lot of clean-up.

Problems:

  • My keywords are hierarchical, with all parent keywords selected when a child is added. If I drag an existing keyword from an old location to a new one, the parents appear to be unaffected. So if I move Arizona > Phoenix to Locations > Arizona, the count for the first Arizona stays the same and the count for the second (Locations > Arizona) also stays the same. I might forgive the first, but the second looks like a bug.

  • There appears to be no way of merging keywords. If I have Arizona > Phoenix and Locations > Arizona > Phoenix, then I cannot move Arizona > Phoenix to Locations > Arizona. The workarounds are painful. This is likely an enhancement, but it feels like a bug.
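
To make the expected behaviour concrete, here is a minimal Python sketch of a move that also merges. It says nothing about how PhotoLab stores keywords internally; the file names, the tuple representation of a keyword path, and the helper names `move_subtree` and `counts` are all assumptions made for illustration.

```python
from collections import Counter

def move_subtree(paths, old_prefix, new_prefix):
    """Re-root every path that starts with old_prefix under new_prefix."""
    moved = set()
    for path in paths:
        if path[:len(old_prefix)] == old_prefix:
            path = new_prefix + path[len(old_prefix):]
        moved.add(path)  # using a set merges paths that become identical
    return moved

def counts(images):
    """Count images per keyword path, crediting every ancestor once per image."""
    tally = Counter()
    for paths in images.values():
        seen = set()
        for path in paths:
            for depth in range(1, len(path) + 1):
                seen.add(path[:depth])
        tally.update(seen)
    return tally

# Hypothetical library: one image under each of the two "Phoenix" keywords.
images = {
    "a.raw": {("Arizona", "Phoenix")},
    "b.raw": {("Locations", "Arizona", "Phoenix")},
}
images = {
    name: move_subtree(paths, ("Arizona", "Phoenix"),
                       ("Locations", "Arizona", "Phoenix"))
    for name, paths in images.items()
}
tally = counts(images)
print(tally[("Locations", "Arizona")], tally[("Arizona",)])
```

After the move, the parent counts are recomputed from the images themselves, so the old parent drops and the new parent grows, which is the behaviour the first bullet asks for.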

Enhancements:

  • If I’m viewing a folder, I can select images in the folder. The Keywords pane tells me which keywords are included in my selection. Often, the keyword is included in only some of the selected images. I would like an option to reduce the selection to include only the images with the keyword.

  • When I’m looking at the keyword list, I’d like to be able to right-click on a keyword and do a search for that keyword. The workaround is to switch from “Customize” to “PhotoLibrary”, type the keyword in the search box, and click one of the selections offered in order to begin the search. This is a slow way of doing what should be fast. Another potential interface for this feature would be to include a keyword pane in the PhotoLibrary tab and make it so that clicking on a keyword would launch a search.

  • This one I can’t understand: why can’t I resize the Keywords pane? I have plenty of room on my monitor, and yet I have to peer through a small porthole at my keywords. Even undocking the pane doesn’t help. This is idiotic, but probably a limitation of the framework. However, the Folders pane seems to expand as needed (although it has the alternate problem that it can’t be collapsed).

  • I think this may have been covered: it would be nice if the keyword list could be exported in some usable format (e.g. XML). It would be even nicer if a keyword list could be imported.

OK, back to trying to fix my keywords, slowly and laboriously.


@freixas I will respond properly after Christmas Day but having the second post means that you will get to see the statistics for your post.


Hi,

This is not a mistake.
There are two ways in which hierarchical keywords can be written:
an old one that DxO apparently still uses, and a new one that e.g. LR uses.

Let’s assume it’s about this hierarchy

Herkunft → Europa → Deutschland → Nordrhein-Westfalen → Köln

The “old” notation is to write:
Herkunft
Herkunft → Europa
Herkunft → Europa → Deutschland
Herkunft → Europa → Deutschland → Nordrhein-Westfalen
Herkunft → Europa → Deutschland → Nordrhein-Westfalen → Köln

The “new” notation is:
Herkunft → Europa → Deutschland → Nordrhein-Westfalen → Köln

In the old notation, every keyword in the hierarchy is assigned to your photo. In the new notation, only the last (leaf) keyword is assigned. But because the hierarchy exists, you can still find your photo with any of these keywords.
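
The relationship between the two notations can be sketched in a few lines of Python. This is only an illustration: it assumes the `|` separator seen in XMP output rather than the arrows used above, and the function names `expand` and `collapse` are made up for the example.

```python
def expand(leaf_path, sep="|"):
    """'Old' notation: one entry per level, from the root down to the leaf."""
    parts = leaf_path.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]

def collapse(paths, sep="|"):
    """'New' notation: keep only paths that are not an ancestor of another path."""
    unique = set(paths)
    return sorted(
        p for p in unique
        if not any(q.startswith(p + sep) for q in unique)
    )

leaf = "Herkunft|Europa|Deutschland|Nordrhein-Westfalen|Köln"
old_style = expand(leaf)
new_style = collapse(old_style)
print(old_style)
print(new_style)
```

Going from old to new loses nothing, because every ancestor can be regenerated from the leaf path.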

The easiest way to correct this, in my opinion, is with Lightroom or Photo Supreme :slight_smile:


Unfortunately, I don’t know which “this” you are referring to. Your note about the change in the way hierarchies are written doesn’t seem relevant to any of the problems I noted. I’m not sure how the “new” notation would fix any of them.

You can move parts of the hierarchy as you wish. This automatically changes the number of images in the higher-level part of the old and new hierarchy.

This does not work with the old notation that DxO obviously uses, because every single keyword from a hierarchy is assigned to an image as a keyword.

I can’t explain it any better. Sorry. But these are reasons why I will never change my stars, colors or keywords in programs like DxO PL. I leave that to programs that do it properly.

Got it. Thanks.

However, whether old or new notation, DxO could do it right.

For compatibility with some programs, it looks like PL doesn’t require you to enable a parent keyword when the child is selected. Adobe’s own Bridge program provides the same option as PL, so it’s surprising that LR would work differently.

I tested Adobe Bridge. I found a random file and tagged it with keywords Test and Test > Subtest. Then I used exiftool to see what Adobe Bridge wrote:

<XMP-lr:HierarchicalSubject>
<rdf:Bag>
<rdf:li>Test</rdf:li>
<rdf:li>Test|Subtest</rdf:li>
</rdf:Bag>
</XMP-lr:HierarchicalSubject>

Adobe Bridge does not appear to be using the new notation (and I just installed the latest version a few days ago). How sure are you of your information?

100% sure.
My preferred RAW editor is Capture One, and CaptureOne has the same problem :slight_smile:
I use several different programs to manage my metadata. They all work with the defined standard:

ExifTool, IMatch, Photo Supreme, Lightroom

exiftool -subject -hierarchicalSubject 'C:\DSLR\DSLR-RAW\ExtHDD 2506\OMDEM1X\2019\04\27_rr43916.xmp'

Subject                         : Bienenfresser, Calera y Chozas, Europa, Herkunft, Kastilien-La Mancha, Racke Eisvogel Hopfartige, Spanien, Tier, Umwelt, Vogel
Hierarchical Subject            : Herkunft|Europa|Spanien|Kastilien-La Mancha|Calera y Chozas, Umwelt|Tier|Vogel|Racke Eisvogel Hopfartige|Bienenfresser

notepad 'C:\DSLR\DSLR-RAW\ExtHDD 2506\OMDEM1X\2019\04\27_rr43916.xmp'

 <rdf:Description rdf:about=''
  xmlns:lr='http://ns.adobe.com/lightroom/1.0/'>
  <lr:hierarchicalSubject>
   <rdf:Bag>
    <rdf:li>Herkunft|Europa|Spanien|Kastilien-La Mancha|Calera y Chozas</rdf:li>
    <rdf:li>Umwelt|Tier|Vogel|Racke Eisvogel Hopfartige|Bienenfresser</rdf:li>
   </rdf:Bag>
  </lr:hierarchicalSubject>
  <lr:weightedFlatSubject>
   <rdf:Bag>
    <rdf:li>Calera y Chozas</rdf:li>
    <rdf:li>Kastilien-La Mancha</rdf:li>
    <rdf:li>Spanien</rdf:li>
    <rdf:li>Europa</rdf:li>
    <rdf:li>Herkunft</rdf:li>
    <rdf:li>Bienenfresser</rdf:li>
    <rdf:li>Racke Eisvogel Hopfartige</rdf:li>
    <rdf:li>Vogel</rdf:li>
    <rdf:li>Tier</rdf:li>
    <rdf:li>Umwelt</rdf:li>
   </rdf:Bag>
  </lr:weightedFlatSubject>
 </rdf:Description>

When it comes to keywords, there are standards and there are standards. The trick is to use the most universally accepted standard, in order to make the metadata accessible to the majority of management apps.

During the beta and into the release for PL5, I did a lot of work on compatibility, whilst writing my own keyword management app that gave me more flexibility in browsing and, especially, searching after keywords had been assigned.

I found it is essential to understand the exact minimum XMP tags to use for the best results.

The first, and most important, tag is xmp-dc:subject. This is also mapped to xmp:subject and mwg:keywords. Its purpose is to store a flattened list of all keywords, whether they are standalone or part of hierarchies.

This is the tag that the vast majority of software uses for searching. If you want to search for a complete hierarchy, you can simply AND all the constituent keywords in that hierarchy together in the predicate.

So I can have a file marked…

  <dc:subject>
   <rdf:Bag>
    <rdf:li>Entreprise</rdf:li>
    <rdf:li>Télécommunications</rdf:li>
    <rdf:li>Orange</rdf:li>
   </rdf:Bag>
  </dc:subject>

And that will allow me to search for either the complete hierarchy or files that contain part of the hierarchy.
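
As a rough sketch of that kind of search, the Python below treats each file's dc:subject as a flat list and ANDs the requested keywords. The file names, the second library entry, and the `find_all` helper are hypothetical and only serve the illustration.

```python
def find_all(library, *keywords):
    """Return the files whose flat subject list contains every requested keyword."""
    wanted = set(keywords)
    return sorted(
        name for name, subjects in library.items()
        if wanted <= set(subjects)
    )

# Hypothetical library: file name -> flattened dc:subject list.
library = {
    "img1.jpg": ["Entreprise", "Télécommunications", "Orange"],
    "img2.jpg": ["Fruit", "Orange"],
}

partial = find_all(library, "Orange")
complete = find_all(library, "Entreprise", "Télécommunications", "Orange")
print(partial)
print(complete)
```

Searching for one keyword finds every file that carries it, while ANDing all three narrows the result to the file with the complete hierarchy.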

Take the example of an image library that contains four different images, one with each of the following hierarchies…

Couleur|Orange
Fruit|Orange|Satsuma
Matériel|Télécommunications|Orange
Entreprise|Télécommunications|Orange

Now, imagine I am writing an article on the effect of the word Orange in advertising and I want to search for all images that contain Orange, regardless of any other keywords or hierarchical context.

See what happens in PL7 when I try to search for Orange in more than one hierarchical context…

The first time around, I am offered all four contexts with a count of one file per reference.

Because that first predicate only refers to Orange in the Couleur hierarchy, I now try to add Orange from the Fruit hierarchy…

So, it seems it is impossible, in PL7, as in previous versions, to search for a keyword in multiple hierarchical contexts at the same time.


And this is just one example of the problems that DxO have produced, because they have indexed files using hierarchies rather than just the simple dc:subject tag.

What gets written to the DOP file, and also in the database, is…

			Keywords = {
				{
					"Entreprise",
					"Télécommunications",
					"Orange",
				},
				{
					"Entreprise",
				},
				{
					"Entreprise",
					"Télécommunications",
				},
			},

… which clearly shows that Orange is only accessible through the fully qualified hierarchy, which must include both Entreprise and Télécommunications, thereby precluding any other context.
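
The difference between matching a fully qualified path and matching a keyword in any context can be sketched as follows. This is not PhotoLab's actual search code; the file names and the two helper functions are assumptions, and each image is given only its leaf path for brevity.

```python
# Hypothetical library: file name -> list of keyword paths, DOP-style.
library = {
    "couleur.jpg": [["Couleur", "Orange"]],
    "fruit.jpg": [["Fruit", "Orange", "Satsuma"]],
    "materiel.jpg": [["Matériel", "Télécommunications", "Orange"]],
    "entreprise.jpg": [["Entreprise", "Télécommunications", "Orange"]],
}

def match_exact_path(library, path):
    """Match only images that carry exactly this fully qualified path."""
    return sorted(
        name for name, groups in library.items()
        if list(path) in groups
    )

def match_any_context(library, keyword):
    """Match images where the keyword appears anywhere in any path."""
    return sorted(
        name for name, groups in library.items()
        if any(keyword in group for group in groups)
    )

by_path = match_exact_path(library, ["Couleur", "Orange"])
anywhere = match_any_context(library, "Orange")
print(by_path)
print(anywhere)
```

The path-based match returns a single image per context, whereas the context-free match returns all four, which is what the article scenario needs.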


So, to clarify what should be stored where and why…

According to the metadata guidelines, the dc:subject tag should contain all keywords including those that result from flattening any hierarchies. This is the “working” tag that is used for searching and, if correctly used, should allow any software to find files for any complex logic query.

However, the lr:hierarchicalSubject tag is only meant to be a means of “transmitting” any hierarchical contexts that may exist for the keywords in the dc:subject tag.

e.g.…

         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>Entreprise</rdf:li>
               <rdf:li>Entreprise|Télécommunications</rdf:li>
               <rdf:li>Entreprise|Télécommunications|Orange</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>
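
For anyone who wants to check their own sidecars, here is a small Python sketch that reads lr:hierarchicalSubject from an XMP fragment and derives the flat keyword list that dc:subject would be expected to hold. The `SAMPLE` wrapper and the function names are assumptions for illustration; real sidecars contain more structure around these elements.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "lr": "http://ns.adobe.com/lightroom/1.0/",
}

# Minimal stand-in for a sidecar; only the part relevant here is included.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <rdf:Description rdf:about="" xmlns:lr="http://ns.adobe.com/lightroom/1.0/">
  <lr:hierarchicalSubject>
   <rdf:Bag>
    <rdf:li>Entreprise</rdf:li>
    <rdf:li>Entreprise|Télécommunications</rdf:li>
    <rdf:li>Entreprise|Télécommunications|Orange</rdf:li>
   </rdf:Bag>
  </lr:hierarchicalSubject>
 </rdf:Description>
</rdf:RDF>"""

def hierarchical_subjects(xmp_text):
    """Return the pipe-separated paths stored in lr:hierarchicalSubject."""
    root = ET.fromstring(xmp_text)
    items = root.findall(".//lr:hierarchicalSubject/rdf:Bag/rdf:li", NS)
    return [li.text for li in items]

def flatten(paths, sep="|"):
    """Return each keyword once, in first-seen order."""
    flat = []
    for path in paths:
        for word in path.split(sep):
            if word not in flat:
                flat.append(word)
    return flat

paths = hierarchical_subjects(SAMPLE)
flat = flatten(paths)
print(flat)
```

Comparing `flat` with the actual dc:subject list of a file is a quick way to spot sidecars where the two tags have drifted apart.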

The “bug” that was introduced in PL5, still exists, and can cause incompatibility comes down to the “misuse” of hierarchical tags described above.

Oh, and not forgetting that DxO have not maintained the idea of SPOD (single point of definition), by allowing a keyword to be written to…

  1. the database
  2. the DOP file
  3. an XMP file

Which is why my best recommendation is to never use PhotoLab for keywording unless you are planning on never using any other keyword management software concurrently. And, even then don’t expect to be able to search easily for anything other than simple keywords in non-compound conditions.


See what happens using my software…

  1. search for Orange…

The search results show 5 images and selecting one of them produces…

… and another…

… etc.


Indeed…and I think the “selection of the most usable standard” means that the “largest usable deviation in relation to the standard(s)” is really what is used in all the apps dealing with such things. This way of dealing with standards restricts the interchange and shared management to what is common in the apps we use.

The more apps we use to manage metadata, the smaller the usable space gets.
Rules (and standards) are kept alive by their exceptions.
:innocent:

Thanks for the detailed explanation—it’s very useful.

At this point, I’m not using any other keyword tool, nor do I currently have any cases such as your Orange example.

However…

I deleted PL’s database, indexed everything, and then cleaned up the keyword mess with sidecar synch enabled. I then renamed the database and indexed everything again. In theory, I should have the same keyword list as before, but the operation has revealed some problems. I need to go back to my backups and try to figure out who did what, since some of my keywording was done with Adobe Bridge CS6 long ago.

Keywording inside PL is buggy. Keywording with a different tool has drawbacks. Writing my own keyword manager is not the path I want to go down. I’m still trying to figure out the best option.

Imo, you can use DPL to manage your keywords as long as you use it exclusively and all the keywords you added outside of DPL are stored in XMP, be it in the image files or affiliated sidecars.

Cleaning up keywords needs to be done in the keyword list tool and then written to files again. As of now, DPL can’t store XMP for all files, so you’ll have to do it folder by folder. Annoyingly stupid work, depending on how finely structured one’s photo archive is.

Some of the pain of re-writing XMP can possibly be relieved by searching for files without sidecars (using system tools) …or by copying all images to a project and write XMP using the project. But even then, a lot of manual effort is necessary…and I suppose that projects can only carry a limited number of files.
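
One way to do the “files without sidecars” search is a short script rather than system tools. The Python sketch below walks a folder tree and lists RAW files that have no XMP sidecar next to them. The extension list and the two sidecar naming patterns (`name.xmp` and `name.ext.xmp`) are assumptions, so adjust them to whatever your software actually writes.

```python
from pathlib import Path

# Assumption: the RAW extensions to check; extend this set as needed.
RAW_EXTENSIONS = {".orf", ".cr2", ".nef", ".arw", ".dng"}

def files_without_sidecar(root):
    """Return RAW files under root (recursively) that have no XMP sidecar."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in RAW_EXTENSIONS:
            continue
        # Assumption: a sidecar is named either name.xmp or name.ext.xmp.
        candidates = (
            path.with_suffix(".xmp"),
            path.with_name(path.name + ".xmp"),
        )
        if not any(candidate.exists() for candidate in candidates):
            missing.append(path)
    return missing
```

Calling `files_without_sidecar` on the top folder of an archive returns the files that still need a metadata write, grouped naturally by folder because the result is sorted by path.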


In which files can’t it store XMP? I thought that for RGB files, keyword changes go into the image, an associated DOP, and the database. In some cases (RGB files produced by Affinity Photo, for example, or a RAW file with an existing XMP sidecar), it will also write to an XMP sidecar.

I also don’t understand “As of now, DPL can’t store XMP for all files, so you’ll have to do it folder by folder.” What happens differently when you go folder-by-folder?

I could live with this with one caveat: If I delete the PL database and re-index the images, I want all my keywords back exactly as I left them. I don’t want my keywords trapped inside the PL database. As I mentioned above, this is not the case and I’m trying to figure out why: some keywords survived, others were messed up.

XMP sidecars are used for files that PhotoLab will not write to, e.g. out-of-camera (OOC) RAW image files. File formats that PhotoLab will write to therefore don’t need separate XMP sidecars. I’ve never bothered to store sidecar files in PhotoLab and have never tested all possible file sources and formats, so I have only learned the general rules. Also, DNG files can be treated as RAW or non-RAW files depending on how image data is stored in them. DNG was designed to be known and open, while RAW is usually undocumented and can carry encrypted content, which is why most RAW editors don’t write to those formats, except applications provided by the camera manufacturer, e.g. Canon DPP.

…at once, that is.

DPL only sees the images in the selected folder and will only write what it sees. The XMP of files in unselected folders is not updated. Lightroom, on the other hand, can be made to update all files contained within a folder and its subfolders with one command. Using DPL means that, if you have a photo archive of e.g. 10 folders that contain 10 subfolders each, you’ll have to make >100 selections and launch metadata writing >100 times.

PhotoLab’s indexing does read from all files and sub-folders of the selected folder, but there is no way yet to make DPL write in the same way.

This might be right, but I haven’t figured out the exact rule.

I made some global changes. Some were done with drag-and-drop, and some by searching for a keyword, selecting everything matched, and then adjusting the keywords.

I just spot-checked a RAW file from 2019. It started with a keyword hierarchy of

Birds|Herons, Ibis, and Allies|Great Egret

I changed this globally (not sure whether I used drag/drop or my alternate approach) to

Wildlife|Bird|Herons, Ibis, and Allies|Great Egret

and then much later to

Wildlife|Birds|Herons, Ibis, and Allies|Great Egret

The XMP file now has

               <rdf:li>Wildlife</rdf:li>
               <rdf:li>Wildlife|Bird</rdf:li>
               <rdf:li>Wildlife|Bird|Herons, Ibis, and Allies</rdf:li>
               <rdf:li>Wildlife|Bird|Herons, Ibis, and Allies|Great Egret</rdf:li>

so it missed my change of Bird to Birds. However, it caught some earlier keyword changes.

The DOP has all the edits:

{
"Wildlife",
"Birds",
"Herons, Ibis, and Allies",
"Great Egret",
}

I never went folder-by-folder. It’s possible that an XMP file got created on the first keyword edit and was then not updated on later edits. Even if it did, it was done as part of a global edit.
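
A mismatch like this can be detected mechanically. The Python sketch below compares the paths from an XMP sidecar with the keyword groups from a DOP file and reports what each side has that the other lacks. It assumes the DOP holds one group per ancestor level (only the deepest group is quoted above), and the helper names are made up for the example.

```python
def dop_groups_to_paths(groups, sep="|"):
    """Join each DOP keyword group into an XMP-style pipe-separated path."""
    return {sep.join(group) for group in groups}

def compare(xmp_paths, dop_groups):
    """Report paths present on only one side."""
    xmp = set(xmp_paths)
    dop = dop_groups_to_paths(dop_groups)
    return {
        "only_in_xmp": sorted(xmp - dop),
        "only_in_dop": sorted(dop - xmp),
    }

xmp_paths = [
    "Wildlife",
    "Wildlife|Bird",
    "Wildlife|Bird|Herons, Ibis, and Allies",
    "Wildlife|Bird|Herons, Ibis, and Allies|Great Egret",
]
# Assumption: one DOP group per level; only the last one is shown in the post.
dop_groups = [
    ["Wildlife"],
    ["Wildlife", "Birds"],
    ["Wildlife", "Birds", "Herons, Ibis, and Allies"],
    ["Wildlife", "Birds", "Herons, Ibis, and Allies", "Great Egret"],
]
report = compare(xmp_paths, dop_groups)
for side, paths in report.items():
    print(side, paths)
```

Anything that shows up on either side of the report is a keyword edit that reached one file but not the other.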

The bottom line, though, is that there are a pile of bugs even though a lot of things do work.

Hmmm. That seems a bit odd. Are these from the dc:subject tag or the xmp-lr:hierarchicalSubject tag?

This also seems odd as there is no hierarchy specified. It is just a list of three standalone keywords.

The latter.

I didn’t quote the entire thing. There was one group with just “Wildlife”, one group with “Wildlife”, “Birds”, etc. I just showed the last one to demonstrate the presence/absence of the “s” on “Birds”.

I assumed that this is how the DOP file handles hierarchy. I see the same thing in other DOP files. Here is a complete random sample from elsewhere:

Keywords = {
	{
		"Locations",
		"United States",
		"Oregon",
		"Portland",
		"Powell Butte",
	},
	{
		"Locations",
		"United States",
		"Oregon",
		"Portland",
	},
	{
		"Locations",
		"United States",
		"Oregon",
	},
	{
		"Locations",
	},
	{
		"Locations",
		"United States",
	},
}

The hierarchy should be obvious: Locations > United States > Oregon > Portland > Powell Butte.
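
To show how those groups collapse into a single branch, here is a small Python sketch that folds the groups into a nested tree and prints it with indentation. The `build_tree` and `render` helpers are hypothetical; the groups are copied from the DOP sample above, in the same order.

```python
def build_tree(groups):
    """Fold keyword groups (root-to-leaf lists) into one nested dict."""
    tree = {}
    for group in groups:
        node = tree
        for word in group:
            node = node.setdefault(word, {})
    return tree

def render(tree, indent=0):
    """Return the tree as indented lines, one keyword per line."""
    lines = []
    for word, children in tree.items():
        lines.append("  " * indent + word)
        lines.extend(render(children, indent + 1))
    return lines

groups = [
    ["Locations", "United States", "Oregon", "Portland", "Powell Butte"],
    ["Locations", "United States", "Oregon", "Portland"],
    ["Locations", "United States", "Oregon"],
    ["Locations"],
    ["Locations", "United States"],
]
tree = build_tree(groups)
print("\n".join(render(tree)))
```

All five groups land on the same branch, so the order in which the DOP lists them does not matter.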

Okay, there were two changes, the first added two parent levels, the second changed the spelling. “much later” seems to indicate that DPL was restarted before change number two.

If I read your hierarchy correctly, it is


Wildlife

  • Birds
    • Herons
    • Ibis
    • Allies
      • Great Egret

Counting peas here in order to try to reproduce what you did :wink:

If I may indulge myself…

Given a keyword hierarchy of Fruit | Orange | Satsuma.

If I use the keyword palette to assign just Satsuma…

First, the drop-down shows the hierarchy in reverse order (leaf to root).

If I select the proposed full hierarchy, I get…

… as expected.

But, if I continue typing just Satsuma…

… I get…

… where PL has assumed I want a standalone version of Satsuma and not in its hierarchical context. This also adds the standalone version to the keyword dictionary.


I can now add in the hierarchical version without its parentage…

But this now starts to make things rather confusing when the metadata gets written out.

Using my software, this mixture of standalone and hierarchical contexts gets written to the XMP file as…

  <dc:subject>
   <rdf:Bag>
    <rdf:li>Satsuma</rdf:li>
    <rdf:li>Fruit</rdf:li>
    <rdf:li>Orange</rdf:li>
   </rdf:Bag>
  </dc:subject>
  …
  <lr:hierarchicalSubject>
   <rdf:Bag>
    <rdf:li>Fruit</rdf:li>
    <rdf:li>Fruit|Orange</rdf:li>
    <rdf:li>Fruit|Orange|Satsuma</rdf:li>
   </rdf:Bag>
  </lr:hierarchicalSubject>

The standalone is simply written to the dc:subject tag and the hierarchical version is defined in the lr:hierarchicalSubject tag.

Whereas PL does something subtly different…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>Fruit</rdf:li>
               <rdf:li>Orange</rdf:li>
               <rdf:li>Satsuma</rdf:li>
            </rdf:Bag>
         </dc:subject>
         
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>Fruit|Orange|Satsuma</rdf:li>
               <rdf:li>Satsuma</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>

It defines a single level hierarchical reference to the, otherwise, standalone version of Satsuma.

According to metadata guidelines, this is totally unnecessary but, because PL insists on using hierarchies for searching in its database, it then writes this out to the DOP files as…

			Keywords = {
				{
					"Fruit",
					"Orange",
					"Satsuma",
				},
				{
					"Satsuma",
				},
			},

This might seem like an interesting way of maintaining the different contexts of a keyword but, if I then try to search for Satsuma, I get some really weird results.


I assign both the hierarchical Satsuma and the standalone Satsuma to one image in the folder…

… but just the standalone Satsuma to another image…

So, both images contain the standalone Satsuma.

And yet, when I try to search for files that contain the standalone version…

… the indication is that there is only one image.

But, when I try to add the hierarchical version to the search…

… all of a sudden, the second reference becomes visible. This seems like a timing error, where the drop-down list is not getting updated quickly enough.


So, what happens if I want to implement my scenario of finding all images that contain the keyword Satsuma regardless of where that word is found in any hierarchies?

First, I add the standalone word and get this…

Two images, as expected.

But, if I want to also include the hierarchical version, I get an unexpected result…

Only one image!!!

This is because PL doesn’t know how to create compound predicates, apart from ANDing components. There is no facility for asking for Satsuma OR Fruit|Orange|Satsuma.

In fact, the search mechanism cannot specify ORed compound predicates, regardless of the attributes being searched on.
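
The difference between the two kinds of predicate is easy to show in a few lines of Python. This is a sketch of the logic only, not of PL's search engine; the image names and the `search_all` and `search_any` helpers are assumptions for illustration.

```python
def search_all(library, *paths):
    """AND: the image must carry every requested path."""
    return sorted(
        name for name, assigned in library.items()
        if all(path in assigned for path in paths)
    )

def search_any(library, *paths):
    """OR: the image must carry at least one requested path."""
    return sorted(
        name for name, assigned in library.items()
        if any(path in assigned for path in paths)
    )

# Hypothetical library mirroring the two test images described above.
library = {
    "image1.jpg": {"Fruit|Orange|Satsuma", "Satsuma"},
    "image2.jpg": {"Satsuma"},
}

anded = search_all(library, "Satsuma", "Fruit|Orange|Satsuma")
ored = search_any(library, "Satsuma", "Fruit|Orange|Satsuma")
print(anded)
print(ored)
```

The ANDed search returns one image, matching the result described above, while the ORed search returns both.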


My strong advice is to completely avoid using PL for either metadata storage or searching, unless you only want the simplest of standalone attributes and, definitely, not for keywords, especially hierarchical ones.



Completely avoiding PL for metadata searching seems a bit extreme. Say that I’ve tagged images as potentials for a gallery exhibition and I want to export these images for discussion with others. With PL, I can do a search, select all the images, and export them to a folder. With external software, I would search for the images, but then what? Only DxO has access to the image adjustments.

Tagging images with keywords externally can also be a problem. Again, using the gallery images, I need to see the processed image in order to decide whether to include it or not. An external viewer, displaying a preview of the RAW file, is not going to enable me to make this decision. So I would have to view the image in PL and then go find it and tag it externally.

In some cases, I am tagging virtual copies. Try adding a keyword to a virtual copy in an external tool. Keep in mind that the original might be tagged “Color” and the copy might be tagged “B&W”.

I have a large hierarchy, but most of my hierarchy branches are independent. ANDing is mostly all I need. There might be some oddball case where I want to view all Great Blue Herons and all Great Egrets, but that would be rare.

Right now, my biggest problem is that I know that if I lose my database and rebuild it, I will not get my original keyword hierarchy back.

My second problem is that I’m not sure, after all the keyword editing I did, that if I search for “Birds”, I get everything that has “Bird” as a parent. There’s a tedious fix for that.

I’m tempted to write my own program to ensure that what is in the database is actually properly copied to the associated DOP/XMP files, and into RGB files, but that’s not how I really want to spend my time. It’s not a 2-hour project, it’s a 2-week project.

I haven’t bothered much with any other metadata, so I’m not sure what problems might lurk there.

I’m not happy that PL keywording is buggy, but using an external tool is not a solution that really works for me.