What is the purpose of the <dc:subject> tag?

freixas · January 24, 2024, 5:13pm

Some programmer went to some trouble to add the feature, so somewhere there is someone who knows its history and rationale. As I said, I might post the same question on an Adobe forum. Maybe someone there remembers…

There is a separate question as to whether PL’s output (or Adobe Bridge’s for that matter) violates some “standard”. @Joanna has repeatedly claimed that it does, although I haven’t seen the standards she refers to.

Of course, asking PL to follow the standard that @Joanna cites is equivalent to removing any value from the feature (except within PL, where it is just an irritation). For people who don’t want the “feature”, PL provides an incomplete solution: the options “Write keyword hierarchies in the XMP dc:subject tag” and “When assigning keywords, apply whole hierarchies of keywords” don’t do the job. The first option is fine, but the second only applies when the user assigns a keyword, so it does only part of the job. We need to apply whole hierarchies on assignment by the user, during import, and during drag-and-drop. When unassigning keywords, deleting keywords, and (again) during drag-and-drop, the keyword structure needs to be carefully reworked to preserve the correct associations.

For instance, an image contains A > B > C, A > E, and X > Y, with all items checked. If I drag the C over the Y (NOTE: updated from “move the E over the Y”), the correct end result, in my opinion, would be A > E and X > Y > C (all checked).

To clarify, if an image contains A > B > C and A > E and I unassign C, then I would unassign the parents A and B (because they were theoretically added when C was assigned) and then restore A (because it is the parent of E). Dropping the B is not always correct, but the user could add it back.
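A minimal Python sketch of the unassign-and-restore rule described above, and of the drag example built on it. It is not how PhotoLab stores keywords: it assumes each image holds a set of checked nodes, each identified by its full path (for example ("A", "B", "C")), and that assigning a keyword also checks all of its parents. All function names are hypothetical.

```python
# Hypothetical model: an image's keywords are the set of checked nodes,
# each node identified by its full path from the root, e.g. ("A", "B", "C").
Node = tuple[str, ...]


def ancestors(node: Node) -> set[Node]:
    """All proper ancestors: ("A", "B", "C") -> {("A",), ("A", "B")}."""
    return {node[:i] for i in range(1, len(node))}


def assign(checked: set[Node], node: Node) -> set[Node]:
    """Check a keyword together with its whole hierarchy."""
    return checked | ancestors(node) | {node}


def unassign(checked: set[Node], node: Node) -> set[Node]:
    """Uncheck a keyword and the parents added along with it, then restore
    any parent that another still-checked keyword needs."""
    remaining = checked - {node} - ancestors(node)
    needed = set().union(*(ancestors(n) for n in remaining))
    return remaining | needed


def move(checked: set[Node], node: Node, new_parent: Node) -> set[Node]:
    """Drag a keyword onto a new parent (children of `node` are not handled)."""
    return assign(unassign(checked, node), new_parent + (node[-1],))


def leaves(checked: set[Node]) -> list[str]:
    """Checked nodes that are not the parent of another checked node."""
    inner = set().union(*(ancestors(n) for n in checked))
    return sorted(" > ".join(n) for n in checked - inner)


image: set[Node] = set()
for kw in [("A", "B", "C"), ("A", "E"), ("X", "Y")]:
    image = assign(image, kw)
print(leaves(image))                                 # ['A > B > C', 'A > E', 'X > Y']
print(leaves(move(image, ("A", "B", "C"), ("X", "Y"))))  # ['A > E', 'X > Y > C']
```

In this model, unchecking C from A > B > C with no other keywords leaves nothing checked, and the restore step means a parent can never be dropped while one of its children is still checked.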

With this behavior as an option, the “Write keyword hierarchies in the XMP dc:subject tag” option would not be needed; it would happen automatically.

platypus · January 24, 2024, 7:49pm

@freixas, from a logical point of view, your example is not what I’d expect.

If we assume that A, B, C, E, X and Y are unique, that each “leaf” can belong to one “branch” only, and that dragging over means “attach as leaf”, the old and new keyword lists should look as shown below:

Initial Relations (A in R1 and R2 are identical)
  R1:	A > B > C
  R2:	A > E
  R3:	X > Y

Modified Relations
  R1:	A > B > C
  R3:	X > Y > E

As we can see, Relation R2 goes away because E is dragged away from it and because A is already present in R1. This is what I get in DPL7.

Update: The same happens in Lightroom Classic.
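As a sketch, platypus’s relations can be modelled with a hypothetical child-to-parent map, which encodes his assumptions that every keyword is unique and belongs to one branch only. A standalone keyword with neither parent nor children would need a separate set and is left out here. “Dragging over” then becomes a change of parent, and the relations are the root-to-leaf paths. The function names are illustrative only.

```python
# Hypothetical keyword dictionary under platypus's assumptions: every name is
# unique and has at most one parent, so a child -> parent map is enough.
def relations(parent: dict[str, str]) -> list[str]:
    """List every root-to-leaf path ("relation") in the dictionary."""
    nodes = set(parent) | set(parent.values())
    leaf_nodes = nodes - set(parent.values())  # nodes that are nobody's parent

    def path_to(node: str) -> str:
        chain = [node]
        while node in parent:
            node = parent[node]
            chain.append(node)
        return " > ".join(reversed(chain))

    return sorted(path_to(n) for n in leaf_nodes)


def attach_as_leaf(parent: dict[str, str], node: str, new_parent: str) -> dict[str, str]:
    """Drag `node` onto `new_parent`, refusing moves that would create a cycle."""
    p = new_parent
    while p is not None:
        if p == node:
            raise ValueError(f"cannot move {node!r} under its own descendant")
        p = parent.get(p)
    updated = dict(parent)
    updated[node] = new_parent
    return updated


initial = {"B": "A", "C": "B", "E": "A", "Y": "X"}
print(relations(initial))                            # ['A > B > C', 'A > E', 'X > Y']
print(relations(attach_as_leaf(initial, "E", "Y")))  # ['A > B > C', 'X > Y > E']
```

R2 disappears on its own: once E has a new parent, A is no longer the end of any path except through B.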

freixas · January 24, 2024, 8:12pm

Sorry, typo (I do try to proofread what I write). I wrote that E got moved when I pictured C moving. This is indicated in my second paragraph, where I say “To clarify…”.

I will correct my post. Thanks for catching this. I agree with your analysis of the results of moving the E onto Y.

It’s even more interesting if we start with A > B > C and A > E (all checked) and X > Y (neither checked). PL7 will wind up with A > B, A > E and X > Y > C, with A, B, and C still checked, but X and Y not checked.

Also, if I start with A > B > C (all checked, no E in this case) and uncheck C, then A and B remain checked in PL7. What the user really wants in this case is ambiguous; I was voting to uncheck A and B and then let the user re-check B if that is his intention.

platypus · January 24, 2024, 8:34pm

@freixas, please be very specific about what you write in every case. Are you talking about the keywords you see in the keywords tool in DPL, in its keyword list tool, or in the XMP sidecar file? What we see is not always exactly what we get, although things have improved regarding keywords and keyword hierarchies.

What gets added (and where) also depends on where entries are typed in and on whether the boxes for existing keywords in the keyword list are checked. Again, be precise and complete: you know what you’re doing and thinking (I presume), but we only see what you wrote.

freixas · January 24, 2024, 10:21pm

There are no checkmarks in an XMP file. There are no checkmarks in the Photolab database. Keyword checkmarks only exist in the Photolab keywords pane’s UI.

If anything is still unclear to you, feel free to ask.

Joanna · January 24, 2024, 11:46pm

I’m referring to the Metadata Working Group, which published guidelines.

More info on the group can be found here.

Of course, Adobe was a founding member but then promptly introduced its own metadata tags as well. But the principles in the MWG guidelines can easily be translated to use ‘lr:hierarchicalSubject’ instead of the MWG hierarchy structure.

The problem is, when you do these manipulations, are you altering the hierarchies in the database, or in the current file? My testing shows that other files get changed because such movements affect the database, which then affects more than just the currently selected file.

Such is the danger of the present UI.

freixas · January 25, 2024, 1:08am

Thanks, I’ll take a look.

Yes, I want it to change more than just the current file. A drag-and-drop, for example, is a request to change the hierarchy, not just to alter one file. PL does this mostly right.

Admittedly, renaming my Locations keyword, currently associated with more than 21,000 images, is a bit scary (it’s going to take some time!), but if I don’t change things globally, then I wind up with some files with the keyword named one way and some with it named the other, which is pointless. If I really want to rename it, the cost is updating more than 21,000 files.

And this is the biggest problem, in my opinion, with the whole keyword system: the keyword structure is dispersed among many files. This is insanity and creates huge performance problems when one is just trying to manipulate the structure. That’s in addition to all the inconsistencies that arise when one pulls in files from multiple sources (or even one’s new and old files, which might have been tagged at different times).

platypus · January 25, 2024, 7:56am

Keyword hierarchies are updated in the database, but not in the files. Re-writing metadata to the files is a weak feature of DPL, as it can only write the files folder by folder.

You could trial Adobe Lightroom, restructure your hierarchical keywords in it, then write to files, which Lr can do for the complete tree of folders at about 1000 writes per minute (on my Mac), and then let DPL re-index the folder structure, which it can do at about 1000 reads per minute. Subscribe on a monthly basis if you need more time… or ask @Joanna to let you test her own beta keywording app.

freixas · January 25, 2024, 2:22pm

You must be running a different PL than I am. Perhaps you can clarify what you mean by “it can only write the files folder by folder.”

I wrote a tool that reads the keywords in the database and compares them to the keywords in the DOP, XMP, and RGB files. PL does have problems. I wrote in a separate post about some of the problems I found, all of which looked as though PL hadn’t completely updated all the files.
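The XMP half of such a comparison can be sketched with Python’s standard xml.etree.ElementTree and the standard RDF, Dublin Core and Lightroom namespace URIs. This is not freixas’s tool: the DOP format is DxO’s own and is not parsed here, and `bag_items`, `compare` and the sample sidecar are hypothetical. The sample mirrors the kind of mismatch reported later in the thread (database says Y, sidecar still says X).

```python
import xml.etree.ElementTree as ET

# Standard XMP namespace URIs for RDF, Dublin Core and Lightroom.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
LR = "http://ns.adobe.com/lightroom/1.0/"


def bag_items(xmp_text: str, ns: str, prop: str) -> set[str]:
    """Return the rdf:li values of a bag property such as dc:subject."""
    root = ET.fromstring(xmp_text)
    items: set[str] = set()
    for element in root.iter(f"{{{ns}}}{prop}"):
        for li in element.iter(f"{{{RDF}}}li"):
            if li.text:
                items.add(li.text.strip())
    return items


def compare(database: set[str], sidecar: set[str]) -> dict[str, set[str]]:
    """Report keywords missing from, or extra in, the sidecar."""
    return {"missing": database - sidecar, "extra": sidecar - database}


SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:lr="http://ns.adobe.com/lightroom/1.0/">
   <dc:subject><rdf:Bag><rdf:li>X</rdf:li></rdf:Bag></dc:subject>
   <lr:hierarchicalSubject><rdf:Bag><rdf:li>X</rdf:li></rdf:Bag></lr:hierarchicalSubject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""

print(compare({"Y"}, bag_items(SAMPLE, DC, "subject")))
```

The same `bag_items` call with `LR, "hierarchicalSubject"` reads the pipe-separated hierarchy list.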

However, I have made structural changes to the database which were correctly updated in the files (including some drag-and-drop changes), so PL is certainly updating the files and not just the database. I can even hear it chugging away on my disk drive during some of the longer updates.

I don’t need LR since my tool can also repair any inconsistencies between the database and the files. I also don’t want to do keywording outside of PL, since I can’t view the processed images nor the virtual copies.

freixas · January 25, 2024, 3:18pm

Ok, I got around to checking this out.

The guidelines are from 2010 and the MWG group seems to have disbanded (their website is not responding, anyway).

The relevant section you cited is:

A Changer …

MUST write the XMP dc:subject property to store the individual keywords. Hierarchical path elements MUST be flattened, which means that each hierarchy node needs to be stored as a separate keyword entry to XMP dc:subject.

But elsewhere we find:

Categories

The perceived motivation for categories is to have nodes in the hierarchy that serve only to help organize the keywords. The applications that support categories (Adobe Lightroom and Photo Mechanic) do so by allowing any node to be called a category instead of a normal keyword. For example, “States” might be called a category. In that case, searching for “States” might not be allowed and metadata embedded in a file might only mention “Places” and “Wyoming”, leaving out “States”.

Note: This MWG guidance will not provide a specific solution for categories, as it seems not worth the effort to introduce this level of complexity for the consumer.

The standard doesn’t address what should happen when nodes are “categories” (in PL or Adobe Bridge, a “category” appears to be a parent keyword with no checkmark), but it suggests that, in that case, the category keywords might not be searchable and could be excluded from being embedded in the file.

Tools that decide to support “categories” could reasonably choose not to write the category keywords to the dc:subject tag. As I read it, the rules section you refer to (the first quote above) only applies when all nodes are keywords.

PL, I believe, gives the user the option of including or omitting the category keywords. Neither choice appears to violate the MWG guidance.
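A sketch of the two readings, using the MWG’s own Places > States > Wyoming example. It assumes a category is identified simply by name; in PL a category would be a parent keyword with no checkmark, which this simple model does not track. With no categories, every node is flattened into dc:subject as the “Changer” rule requires; with “States” marked as a category, it is left out as the Categories note allows. The function name is hypothetical.

```python
def dc_subject(hierarchies: list[tuple[str, ...]],
               categories: frozenset[str] = frozenset()) -> list[str]:
    """Flatten keyword paths into dc:subject entries, one per node,
    skipping any node that the user has marked as a category."""
    subject: list[str] = []
    for path in hierarchies:
        for node in path:
            if node not in categories and node not in subject:
                subject.append(node)
    return subject


paths = [("Places", "States", "Wyoming")]
print(dc_subject(paths))                         # ['Places', 'States', 'Wyoming']
print(dc_subject(paths, frozenset({"States"})))  # ['Places', 'Wyoming']
```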

This is why I prefer to see original sources when things are cited. You repeatedly quote one section of the standard while omitting the other relevant section.

By the way, the Categories section of the MWG 2010 standard does answer the question I started with!

platypus · January 25, 2024, 4:48pm

Assume that you change a keyword in the keyword list while DPL is showing an empty folder. The changed keyword will only appear in the XMP sidecar file, once a folder is selected and DPL has synced metadata or has been told to write metadata to files.

Note that I’m on Mac (as you can see from the line next to my name tag) and therefore use DPL on macOS. Currently, this is DPL 7.3.0 build 43, which has no option to write metadata that has been changed outside of DPL to XMP sidecars automatically and/or in the background. The changed keyword is present in the database and the relations are unchanged, but, again, nothing gets written to XMP sidecar files.

freixas · January 25, 2024, 6:28pm

Hmm…

My diagnostic tool found many problems, but this was after I had made a ton of changes to my database, so it wasn’t clear which changes caused which problems. I have been reluctant to file a bug report that said “I found some bugs but I don’t have a reproducible test case for you.”

After fixing all the problems (using my tool), I made many further changes to my keyword structure. However, as I wanted to get some actual work done, I took a cautious approach and always selected all the images involved in a rename or drag-and-drop before I made the change. The diagnostic tool found no further problems.

I did have one case where I had to drag-and-drop something where it wasn’t possible to select everything. After making the change, I ran the diagnostic tool again. It reported no problems. This led me to believe the bugs were not absolutely connected to whether the file was selected or not.

However, I used your description to create a test. Here’s what I found:

I started with PL 7.2 build 133.
I enabled the Synchronize metadata with XMP sidecar files option.
I selected two images, created a new keyword X, and tagged them both with the keyword. One image was a JPG, the other a RAW file.
I verified that the JPG had the keyword X with exiftool and I verified that the DOP and XMP files had the keyword X with a text editor.
I moved to a folder with no images.
I renamed X to Y.
I visited the original image folder with File Explorer.

The results:

Using a text editor to check, both files had DOPs with the keyword Y. This is correct.
Using exiftool to check, the JPG still had the keyword X. This is wrong.
Using a text editor to check, the XMP file for the RAW image had the keyword X. This is wrong.
Returning to PL, I changed the folder back to the original folder.
On first viewing the JPG, PL claimed its keyword was Y.
On first viewing the RAW image, I got inconsistent results. One time, it claimed the keyword was Y. I repeated the entire test from the beginning and now the first visit said that the keyword was X.
One time, I flipped back and forth between the two images and the JPG’s keyword changed to X at some point. I wasn’t able to repeat this.
Running the DOS equivalent of the Linux command “touch” (which updates the modification date on a file) on the JPG and the XMP file caused PL to immediately update the keywords back to X in the database and the DOP files.
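For anyone repeating this step on another platform, Python’s os.utime does what touch does to an existing file: with None as the times argument, it sets the access and modification times to the current time. A minimal cross-platform sketch; the `touch` helper and the temporary .xmp file are illustrative only (pathlib’s Path.touch() has the same effect on an existing file).

```python
import os
import tempfile
from pathlib import Path


def touch(path: str) -> float:
    """Set an existing file's access and modification times to now
    (what Unix `touch` does to an existing file); return the new mtime."""
    os.utime(path, None)  # times=None means "use the current time"
    return Path(path).stat().st_mtime


# Demonstration on a throwaway sidecar-like file.
fd, sidecar = tempfile.mkstemp(suffix=".xmp")
os.close(fd)
os.utime(sidecar, (0, 0))      # pretend the file dates from 1970
print(touch(sidecar) > 0)      # True: the timestamp is now current
os.remove(sidecar)
```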

Definitely buggy. Thanks for the test case. I now have a specific test case to file as a bug report.

This doesn’t account for all the problems I found with my tool. For example, sometimes it was the DOP file that was in error.

FWIW, based on my experience, you can avoid a lot of problems by first selecting all the files affected by a keyword structure change. The images can be in different folders; they just have to be selected.

freixas · January 25, 2024, 8:48pm

I finally submitted a bug about this problem, #458690! We’ll see what DxO says.

Has anyone else already reported something like this? If so, what was DxO’s response?

platypus · January 26, 2024, 6:57am

Now, imagine a structure of folders containing 20’000 images and a flat list of keywords that should be a hierarchical list instead. In DPL, the procedure would/could be as follows.

display all images with Keyword “leaf”
select all images
drag “leaf” to “branch” and “branch” to “stem” and “stem” to “root” in the keyword list tool
make sure that all sidecars have been updated
restart with the next keyword
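The bookkeeping behind that procedure can be sketched as a mapping from each flat keyword to its intended parent. Everything here is hypothetical (the PARENT map, the file names, the `plan` helper), and the map is assumed to contain no cycles. The point is that the set of files needing a rewrite can be computed up front instead of being discovered keyword by keyword.

```python
# Hypothetical mapping from each flat keyword to its intended parent.
PARENT = {"leaf": "branch", "branch": "stem", "stem": "root"}


def full_path(keyword: str, parent: dict[str, str]) -> str:
    """Expand a flat keyword into its full hierarchy, root first (no cycles assumed)."""
    chain = [keyword]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return " > ".join(reversed(chain))


def plan(images: dict[str, set[str]], parent: dict[str, str]) -> dict[str, list[str]]:
    """List the new hierarchies for every image whose flat keywords gain parents;
    only these files would need their metadata rewritten."""
    return {name: sorted(full_path(k, parent) for k in keywords)
            for name, keywords in images.items()
            if any(k in parent for k in keywords)}


images = {"IMG_0001.NEF": {"leaf", "sunset"}, "IMG_0002.NEF": {"sunset"}}
changes = plan(images, PARENT)
print(changes)  # {'IMG_0001.NEF': ['root > stem > branch > leaf', 'sunset']}
print(len(changes), "of", len(images), "files need rewriting")
```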

While this might be feasible, it certainly feels like boring, error prone work…and don’t forget that no-one has your database tweaker but you.

Even if one were to restructure keywords in the database with suitable DB tools, one would have to apply the changes folder by folder or keyword by keyword to ensure that all 20’000 files get updated.

I suppose that DPL doesn’t feature an “update all” command because of the issues that could arise from DB issues that cannot be avoided because DPL has no provisions for that either.

freixas · January 26, 2024, 3:44pm

If you have 20,000 files embedded with flat keywords that you want to change to a hierarchy, how would you go about it? The procedure you outlined does work, tedious as it is. I never claimed it was not tedious, just that it would avoid mismatches between the database and the files.

If you want it, I can get you a copy. It should run on a Mac, although I haven’t tested it there. The copy I give out is a file checker, not a tweaker. It reports problems, but makes no changes.

If you want it to fix problems, you have to edit the code, as I commented out that functionality. It’s easy to re-enable, but if the modified program causes any problems, it’s your program, not mine.

Send me a PM if you’re interested.

Remember your post about being very specific about what you write? I have no idea what you’re talking about here.

platypus · January 26, 2024, 4:22pm

When I change a keyword’s forebears in the Keyword List tool, the change only happens in PhotoLab’s database. If there were an “update all” command (with the corresponding functionality), we could tell DPL to update all the files that contain the keyword with its forebears, depending on how the keyword options are set (include all forebears, write to dc:subject).

freixas · January 26, 2024, 5:03pm

Change how? Do you rename the parent keyword or do you drag-and-drop a keyword so it has a different parent?

In the following scenarios, I will assume that, for every keyword, you’ve made sure that all its parent keywords are selected.

If you want to rename a parent keyword, search for and select all images with that parent keyword. Then rename it. PL should update all relevant files.

If you want to drag-and-drop a keyword, then search for and select all images with the keyword being dragged. Drag the keyword to its new location and wait for PL to update all the relevant files.

You will then want to enable the new parent. Hopefully, this will enable all other parents and update the files in a single pass, rather than one pass per parent.

The final, painful step is to disable the parents in the old hierarchy, if appropriate. Each change will trigger an update. Hopefully, you don’t have a lot of images associated with the keyword.

If parent keywords aren’t always selected, then the solution becomes even worse. PL’s database structure won’t let it find all the relevant files. To be absolutely sure you get everything, you have to search for all the terminal child keywords of a parent and fix them all.
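Why full paths matter here can be shown with a small sketch. Assume, hypothetically, that each image records only the full paths of its checked keywords, so an unchecked parent such as Locations appears only as the start of a longer path. A prefix match on the path still finds every image under that parent, checked or not, and the same match drives a rename. The data and function names are illustrative, not PL’s database schema.

```python
# Hypothetical per-image data: full paths of the checked keywords only.
Images = dict[str, set[tuple[str, ...]]]


def affected(images: Images, node: tuple[str, ...]) -> list[str]:
    """Images with any keyword at or below `node`, whether `node` is checked or not."""
    n = len(node)
    return sorted(name for name, paths in images.items()
                  if any(p[:n] == node for p in paths))


def rename(images: Images, node: tuple[str, ...], new_name: str) -> Images:
    """Rename the last element of `node` in every path that passes through it."""
    n = len(node)
    renamed = node[:-1] + (new_name,)
    return {name: {renamed + p[n:] if p[:n] == node else p for p in paths}
            for name, paths in images.items()}


images = {
    "a.jpg": {("Locations", "Portland", "Reed College")},  # parent unchecked
    "b.jpg": {("Locations",)},
    "c.jpg": {("People", "May")},
}
print(affected(images, ("Locations",)))  # ['a.jpg', 'b.jpg']
print(rename(images, ("Locations",), "Places")["a.jpg"])
```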

Hmm… Now I’m considering filing a bug report. If I rename a parent keyword, even with no files selected, PL should find and update all images with that keyword. Even if they fix the other bug I reported, this problem would remain, since the database doesn’t give them any way of finding all images with a parent keyword unless the parent is checked in every instance.

platypus · January 26, 2024, 5:24pm

Both

Except that it doesn’t. The database files are updated, the displayed image files might be updated, but the image and sidecar files lingering in a folder that is not shown remain as they are. They might be updated, once the respective folder is selected.

Not sure if the requested feature is a good idea. It could mean that every image and sidecar file needs to be written to, depending on what level of keyword is changed. There should at least be some kind of info about how many items will be changed plus buttons to revert (the change), cancel or accept the update.

Anyway, whatever DxO is planning or doing in that area, it will take a lot of convincing for me to drop Lightroom with its fairly comprehensive set of keyword management and database maintenance features.

Joanna · January 28, 2024, 12:05pm

If you will permit me to ramble on a bit about this…

The biggest problem I see in PL’s keyword functionality is the lack of separation between keyword management and keyword assignment.

The tiny size of the palette gives you the impression that it is just a single tool, when it is, in fact, two separate tools. One for editing keywords for the currently selected image and another for “managing” the dictionary of keywords.

From what I can see, this mixture of ideas leads to wrong thinking about how to use hierarchical keywords.

Flat keywords are easy to understand - they are just one single word.

But a hierarchical “keyword” should be regarded as the whole path that it takes to define the leaf keyword.

A single keyword is not the equivalent of a “hierarchy of one” as PL would have us believe - it is simply a standalone keyword and, as such, should only be mentioned in dc:subject but not in lr:hierarchicalSubject.

Whereas, a hierarchical keyword must be fully defined in lr:hierarchicalSubject in order to maintain and transmit the structure or context of the “word”. Then, as recommended in the MWG guidance, all members of that hierarchy should be mentioned in dc:subject so that they are easily searchable from other software.

Using a Mac means I have access to a built-in metadata indexing engine, known as Spotlight. If I don’t write all keywords in a given hierarchy to dc:subject, Spotlight can’t search for them, and neither can some other software. But Spotlight doesn’t know anything about hierarchical structure so, in order to transmit that structure to other software, we also need to record it in the lr:hierarchicalSubject tag.
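The rule described above can be sketched as follows: every level of a hierarchical keyword goes into lr:hierarchicalSubject as a pipe-separated path, every node goes into dc:subject, and a standalone keyword goes into dc:subject only. For the Fruit > Orange > Satsuma hierarchy it yields the same two lists as the exiftool output shown further down. The `xmp_keywords` function and the standalone keyword “Citrus” are hypothetical.

```python
def xmp_keywords(hierarchies: list[tuple[str, ...]],
                 standalone: tuple[str, ...] = ()) -> tuple[list[str], list[str]]:
    """Build (dc:subject, lr:hierarchicalSubject) entries for one image."""
    subject: list[str] = []
    hierarchical: list[str] = []
    for path in hierarchies:
        for depth in range(1, len(path) + 1):
            entry = "|".join(path[:depth])      # e.g. "Fruit|Orange"
            if entry not in hierarchical:
                hierarchical.append(entry)
            if path[depth - 1] not in subject:  # every node stays searchable
                subject.append(path[depth - 1])
    for keyword in standalone:                  # standalone: dc:subject only
        if keyword not in subject:
            subject.append(keyword)
    return subject, hierarchical


subject, hierarchical = xmp_keywords([("Fruit", "Orange", "Satsuma")], ("Citrus",))
print("Subject              :", ", ".join(subject))
print("Hierarchical Subject :", ", ".join(hierarchical))
```

This prints “Fruit, Orange, Satsuma, Citrus” for Subject and “Fruit, Fruit|Orange, Fruit|Orange|Satsuma” for Hierarchical Subject.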

In all of the above, it is assumed that we are not relying on a “database” or “catalogue”, since that would mean we couldn’t easily transfer files from one computer to another without tedious export and import routines. So each image can be fully described, either independently in the case of non-RAW files, or by use of an XMP sidecar in the case of RAW files.

But every good DAM also needs to maintain a dictionary of keywords, including their hierarchies, in order to validate keywords on entry and provide a lookup on searching. The PL UI conflates the act of constructing and maintaining hierarchies with the act of assigning them to images.

Especially for the inexperienced user, it is not at all evident that whatever is done, ostensibly to one selected image, can also propagate, not only to the dictionary, but also to all other images with the same keyword or hierarchy.

Maybe my intent was simply to change the spelling of one keyword for just one image because I typed it wrongly, but it is all too easy to accidentally end up applying the correction to that keyword wherever it also forms part of one or more hierarchies.

By far the best solution is to clearly separate keyword management from keyword assignment visually. If they don’t want to add a popup window, why not add such a keyword management panel to the Preferences dialog?

My app has such a floating panel, which allows me to show a QuickLook panel of images that contain a selected keyword, just by pressing the spacebar…

… or to instigate a search in the main window, just by double-clicking on the selected word…

Assume I apply a hierarchical keyword to an image…

The XMP records…

[XMP]           Subject                         : Fruit, Orange, Satsuma
[XMP]           Hierarchical Subject            : Fruit, Fruit|Orange, Fruit|Orange|Satsuma

If I change the spelling of Orange…

Then the XMP changes to…

[XMP]           Subject                         : Fruit, Orage, Satsuma

… because I am clearly editing only the currently selected image in the main window.

Meanwhile, the keyword management dialog has been updated from…

… with five references to Orange to…

… with four references to Orange and one reference to Orage, but the original hierarchy hasn’t been changed and, if we look further down the left list of standalone keywords, we find…

… Orage has been correctly added to that list.

Change the spelling for that image back to include Orange and the XMP is returned to its original state.

But, if we change the spelling in the management dialog…

… updates all references in all hierarchies…

But what it hasn’t done is update the XMP in any images, just in case the change was a mistake. I have deliberately made changing hierarchies in the manager difficult, to prevent such problems.

As you can see, you need to be especially careful when renaming keywords, just in case they participate in multiple hierarchies. By far the best way to make such changes is to create a parallel hierarchy with the new spelling in the dictionary, search for the old hierarchy, and then replace the old with the new in your images. Only after you are satisfied with the changes in your images should you remove the old hierarchy from the dictionary.
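That procedure can be sketched in Python, assuming a hypothetical dictionary (a set of pipe-separated hierarchy strings) and a per-image keyword map; the file names and the `migrate` helper are illustrative. Removing the old hierarchy is deliberately left out of the function, so it only happens after the changed images have been reviewed.

```python
def migrate(dictionary: set[str], images: dict[str, set[str]],
            old: str, new: str) -> list[str]:
    """Replace `old` with `new` in every image tagged `old`.
    Returns the changed images; `old` stays in the dictionary for now."""
    dictionary.add(new)                                       # 1. parallel hierarchy
    tagged = sorted(n for n, kws in images.items() if old in kws)  # 2. search
    for name in tagged:                                       # 3. replace old with new
        images[name] = (images[name] - {old}) | {new}
    return tagged


dictionary = {"Fruit|Orage|Satsuma", "Fruit|Apple"}
images = {"a.jpg": {"Fruit|Orage|Satsuma"}, "b.jpg": {"Fruit|Apple"}}
changed = migrate(dictionary, images, "Fruit|Orage|Satsuma", "Fruit|Orange|Satsuma")
print(changed)                 # ['a.jpg']
# 4. Only after reviewing the changed images:
dictionary.discard("Fruit|Orage|Satsuma")
print(sorted(dictionary))      # ['Fruit|Apple', 'Fruit|Orange|Satsuma']
```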

So, my app enforces the following:

use the keyword box on the main form and only the currently selected images get updated
use the keyword manager and only everything in the dictionary gets changed. Clear and logical.

freixas · January 28, 2024, 3:40pm

You and I think very differently about how keywording should work. What you see as the biggest problem doesn’t even show up on my radar.

As far as I’m concerned, flat keywords are just a hierarchy of one. Saying that A is one thing and that the A in A > B is another thing is what would seem confusing to me. I realize that keyword hierarchies can be flattened in the dc:subject tag. To me, this is a problem, not a feature.

Hmm… I’ve already mentioned that this is not what the MWG document recommends. It is only what they recommend if you don’t have “Categories”. You are deliberately dropping context and misstating what the MWG document says.

As I read it, the behavior you describe is incorrect per the MWG document. Users who use “Categories” are saying that they don’t want others to search by any keywords classified as a category (in PL, any parent keyword without a checkmark). Omitting category keywords from dc:subject is the correct behavior, and the fact that Spotlight fails to find images using category keywords is exactly what should happen.

Disagree.

Say that a user assigns “tain” to an image by typing, but meant to assign “train”. Let’s walk through some scenarios:

There is no “tain” or “train” in the keyword hierarchy and you realize your error immediately. Renaming the keyword will do the right thing.
There is no “tain” in the keyword hierarchy and you realize your error immediately, but there is a “train”. You can’t rename “tain” to “train”. You would assign “train” to your image and unassign “tain”. You could then also delete “tain” from the database.
“tain” exists in the keyword hierarchy, but not “train” and you realize you’ve been consistently mistyping the word. You rename “tain” to “train”. You want all instances corrected.
“tain” and “train” exist in the keyword hierarchy. You realize that you’ve mistakenly tagged the current image “tain” when you meant “train” and that you might have done so for other images. You find all images tagged “tain”, select those which should have been “train”, tag them with “train” and untag “tain”.

I see no case where your scenario occurs.

At this point, you introduced an example using a lot of screenshots. Let me sum it up: You have assigned Fruit > Orange > Satsuma to an image and now you want to change “Orange” to “Orage” so as to wind up with Fruit > Orage > Satsuma, but only for one image.

The example doesn’t talk about what the user is trying to do. Is “Orage” a new kind of fruit? Are there Satsumas that are Orages and Satsumas that are Oranges? Presumably.

So we have an image showing an Orage Satsuma that was mistakenly tagged as an Orange Satsuma. In addition, Fruit > Orage is not currently in the keyword hierarchy. What I would do is to add the new hierarchy, tag the image with the proper keyword (Fruit > Orage > Satsuma) and untag it from the incorrect hierarchy (Fruit > Orange > Satsuma).

I would rename “Orange” to “Orage” only if “Orange” was the wrong term for that entire hierarchy. In that case, I would want all instances to be corrected.

I don’t see this at all. Say I have keyword “Months” with children “April”, “May”, and “June” and keyword “People” with children “April”, “May”, and “June”. In PL, there is no way to rename all instances of “May” to “May Smith”, so I don’t know why I would need to be worried or cautious.

Nor would I ever want to rename all instances of “April” everywhere in the keyword structure. Your tool seems to support this and it just seems like asking for trouble.

Within PL, a search for “April” would offer two possible sets of results: April (People) and April (Months). Within Spotlight, a search for “April” would return results including both, which is, in my opinion, a limitation of Spotlight and not a problem in the keyword hierarchy. Should Spotlight become “hierarchy-aware”, it could distinguish between the two, which is the right solution.
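The difference between the two kinds of search can be sketched with hypothetical data (the file names are invented): a flat search matches the node name wherever it occurs, while a hierarchy-aware search groups the matches by full path. This illustrates the idea only; it is not how PL or Spotlight is implemented.

```python
# Hypothetical per-image data: full keyword paths.
Images = dict[str, set[tuple[str, ...]]]


def flat_search(images: Images, term: str) -> list[str]:
    """Spotlight-style: match the term against any node, ignoring its context."""
    return sorted(name for name, paths in images.items()
                  if any(term in path for path in paths))


def hierarchical_search(images: Images, term: str) -> dict[str, list[str]]:
    """Group matching images by the full path at which the term occurs."""
    groups: dict[str, set[str]] = {}
    for name, paths in images.items():
        for path in paths:
            for depth, node in enumerate(path, start=1):
                if node == term:
                    groups.setdefault(" > ".join(path[:depth]), set()).add(name)
    return {key: sorted(names) for key, names in sorted(groups.items())}


images = {
    "spring.jpg": {("Months", "April")},
    "portrait.jpg": {("People", "April")},
}
print(flat_search(images, "April"))  # ['portrait.jpg', 'spring.jpg']
print(hierarchical_search(images, "April"))
# {'Months > April': ['spring.jpg'], 'People > April': ['portrait.jpg']}
```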

If you want flat keywords, use flat keywords. If you want hierarchies, use hierarchies. Search tools that flatten a hierarchy ignore important context.

Everyone is different. If you’ve found something that works for you, great. I would have a problem with your approach only if you were to convince the people at DxO to make changes that restrict how I like to keyword. Then I might have to write my own keyword tool.

I find the whole “type in a keyword” concept crazy. I mostly ignored the top part of the Keywords palette until I had a conversation with someone who said I could type in “A > B > C” to create that hierarchy. I tried it. Cute, but then I went back to working with the keywords list. If I could turn that top part of the Keywords palette off, I would.

I don’t like your interface where the keywords are flattened on the right. I have way too many keywords for that to ever work. Also, I can’t tell if you only include terminal keywords. I sometimes tag images with a parent keyword. For example, I might tag something “Portland” rather than “Portland > Reed College”. If your right hand list only includes terminal keywords, then it would be too restrictive. If you include everything, then the length problem is even worse.

My own advice for beginners: never type in a keyword except when first creating it. Use “+” to create top-level keywords, use right-click on an existing keyword to add child keywords.