What is the purpose of the <dc:subject> tag?

freixas · January 22, 2024, 10:53pm

Let’s say I have a keyword hierarchy of A > B > C. I select an image and place a checkmark in the box next to C. Neither A nor B is checked.

My question is: given this scenario, what could I conclude about the user’s intent in placing a checkmark only next to C?

If I ask PhotoLab (PL) to write these keywords to an XMP file, I know what it will write: for the <dc:subject> tag, PL will include only C as a subject. Dublin Core says that the subject tag should contain “The topic of the resource.” My question could be restated as: what is the user’s intent in generating a subject tag only for C and not for A and B?

One might guess that the main purpose of the subject tag is for searches. If I check only C, then I am stating that C is the only term that should be searchable. This is how PL’s searches appear to work—an image tagged as above could only be located with a search for C and not by searches for A and B.

At this point, designating which keywords can be used for searching seems to me to be the most likely purpose of the <dc:subject> tag, inside or outside PL.

In reading through various posts on keywords in this forum, I see that some people believe that A, B, and C should be written as subjects, whether checked or not (and that some version of PL 5 used to work that way). If this is your belief, then I am particularly interested in your interpretation of what checking only C means—and why.

Joanna · January 22, 2024, 11:20pm

If you intend to record ‘C’ only in dc:subject, that indicates it is a standalone keyword. The “rules” state that all components of a hierarchy should be recorded so, if ‘C’ is part of ‘A > B > C’, then dc:subject should contain all three words.

For reasons best known to DxO, they have decided to make that behaviour optional, controlled by a switch in the metadata preferences.

As you have found, recording only ‘C’ in dc:subject makes it impossible to search for the full hierarchy, especially in other software that is properly compliant and looks in the XMP sidecars.

freixas · January 23, 2024, 12:05am

Your answer is not clear. My question was not whether I intended to record C only, but what it means to have A > B > C with only the C having a checkmark. The software allows me to create this scenario; the question is: why would I want to?

Let me try asking it a different way: I have the scenario above and the software has written parts of the hierarchy to the dc:subject tag. How is that different from having placed a checkmark next to A and B?

Yes, I realize that the lr:hierarchicalSubject values will differ. So, to keep at it: what value is there in generating the hierarchy A|B|C vs. the hierarchies A, A|B, and A|B|C in lr:hierarchicalSubject, other than placing a checkmark next to various keywords (which doesn’t seem particularly useful or interesting)?

Where are these “rules”?

BHAYT · January 23, 2024, 12:20am

@freixas while what @Joanna says is correct, it makes absolutely no difference to the searchability of keywords in DxPL. It makes a difference to the contents of dc:subject when a hierarchical keyword is assigned to an image, but that is an output-only issue (output as in writing back to the image or into an export).

The following test was done with option 2 unset and option 4 unselected, but the result would be the same with option 2 set and option 4 unset.

I set A|B|C|D (or A>B>C>D) in the keywords of an image, did a search on D, and this is what I got:

So only D in the hierarchy has been selected (ticked).

All keywords in the hierarchy are recorded in the ‘Keywords’ Table so it is possible to search on A or B or C or D.

D is linked to C is linked to B is linked to A in the ‘Keywords’ Table, so it is possible for DxPL to recreate the keyword going from D to A, i.e. giving A|B|C|D or A>B>C>D.
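A minimal sketch of that reconstruction, assuming a simplified ‘Keywords’ table in which each row holds a name and a parent id (the ids and column layout are assumptions; DxO’s real schema may differ). Walking the parent links from D back to A and reversing the result yields the full path.

```python
# Hypothetical rows of a 'Keywords' table: id -> (name, parent_id).
# The layout is an assumption; only the parent links matter here.
keywords = {
    1: ("A", None),
    2: ("B", 1),
    3: ("C", 2),
    4: ("D", 3),
}


def full_path(keyword_id, table, sep="|"):
    """Walk from a keyword up to the root via parent links, then reverse."""
    names = []
    current = keyword_id
    while current is not None:
        name, parent = table[current]
        names.append(name)
        current = parent
    return sep.join(reversed(names))


print(full_path(4, keywords))         # A|B|C|D
print(full_path(4, keywords, " > "))  # A > B > C > D
```

Note that the walk only needs the leaf id; the parents are recoverable even when nothing links them to an image.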

But there is only one entry in the ‘ItemsKeywords’ table, the entry corresponding to the item that was selected (ticked), i.e. D!

So a search on A, B or C will find no images, because they were not selected (ticked) and therefore have no entry that would enable a matching image to be located.

Hence, the current design is “rubbish”: the keywords that should be stored are those destined for the dc:subject field (with option 2 set, that would be A, B, C and D), but those are only generated by DxPL when required for writing back to the image or writing to an exported file.

They do not exist as searchable entries in the database at all, or rather they do but are not linked to items, i.e. images.

PS: The ‘ItemsKeywords’ table enables DxPL to reconstruct the hierarchical keywords from their simple keyword components, and if option 4 is set, it will be possible to search on A, B, C and D, giving a full house of hierarchical keyword elements.

So image 01.RW2 has A|B|C|D assigned with only D selected, and 02.RW2 has A|B|C|D assigned with A, B, C and D selected. Both have option 2 OFF/unselected.

The ‘Keywords’ Table looks the same as before but the ‘ItemsKeywords’ table now has more entries linking to the ‘Items’ (images).
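To make the two test images concrete, here is a sketch using an in-memory SQLite database. The three table names come from the post above; the columns, ids and the `search` helper are hypothetical simplifications, not DxO’s actual schema or query. With exact-name matching, a search for A only finds 02.RW2, because 01.RW2 has a single link row (to D).

```python
import sqlite3

# Hypothetical, simplified schema modelled on the 'Items', 'Keywords' and
# 'ItemsKeywords' tables named above. Column names are assumptions.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Items (id INTEGER PRIMARY KEY, file TEXT);
CREATE TABLE Keywords (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
CREATE TABLE ItemsKeywords (item_id INTEGER, keyword_id INTEGER);

INSERT INTO Items VALUES (1, '01.RW2'), (2, '02.RW2');
INSERT INTO Keywords VALUES (1, 'A', NULL), (2, 'B', 1), (3, 'C', 2), (4, 'D', 3);

-- 01.RW2: only D ticked, so one link row
INSERT INTO ItemsKeywords VALUES (1, 4);
-- 02.RW2: A, B, C and D ticked, so four link rows
INSERT INTO ItemsKeywords VALUES (2, 1), (2, 2), (2, 3), (2, 4);
""")


def search(term):
    """Return files linked to a keyword with exactly this name."""
    rows = db.execute(
        """SELECT i.file FROM Items i
           JOIN ItemsKeywords ik ON ik.item_id = i.id
           JOIN Keywords k ON k.id = ik.keyword_id
           WHERE k.name = ? ORDER BY i.file""",
        (term,),
    )
    return [file for (file,) in rows]


print(search("D"))  # ['01.RW2', '02.RW2']
print(search("A"))  # ['02.RW2']
```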

freixas · January 23, 2024, 12:37am

Brian, you are also missing the point of my question. Since I was specifically referring to the dc:subject tag, the internals of the PhotoLab database are irrelevant.

Yes, I know that within PhotoLab, a checkmark makes it possible to find an image using the checked keyword, but that wasn’t my question.

BHAYT · January 23, 2024, 12:45am

@freixas, then @Joanna has answered your question, and option 2 governs the contents of dc:subject.

But you still miss the point: the dc:subject contents that are derived from the hierarchical keyword elements are not stored anywhere in the database.

They are created as DxPL traverses and recreates the hierarchy for output, using option 2 to determine what will be placed there. So, with option 2 set, A|B|C|D with only D selected gives you the full set, exactly as A|B|C|D with all items selected does, because DxPL derives the terms by “factoring” A|B|C|D, which is how it stores them in the database.
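A sketch of that output-time derivation, under stated assumptions: the stored hierarchy is a root-first list, and the boolean `write_all_levels` is a hypothetical stand-in for the “option 2” preference. Nothing here is persisted; the dc:subject terms are computed only when writing.

```python
def dc_subject_for_output(path, checked, write_all_levels):
    """Derive dc:subject terms at write time from one stored hierarchy.

    path: levels of one hierarchical keyword, root first.
    checked: levels ticked in the keyword list.
    write_all_levels: hypothetical stand-in for the "option 2" preference.
    """
    if write_all_levels:
        return list(path)                     # "factor" the whole hierarchy
    return [t for t in path if t in checked]  # only the ticked levels


stored = ["A", "B", "C", "D"]
print(dc_subject_for_output(stored, {"D"}, True))   # ['A', 'B', 'C', 'D']
print(dc_subject_for_output(stored, {"D"}, False))  # ['D']
```

With the preference on, the ticked set no longer matters for output, which is the point made above.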

freixas · January 23, 2024, 1:19am

@Joanna did not answer my question. Neither have you.

Joanna · January 23, 2024, 11:29am

Assuming your keyword assignment panel looks like this…

And your Preferences looks like this…

… then your DOP file will contain…

			Keywords = {
				{
					"A",
					"B",
					"C",
				},
			},

… and your XMP will contain…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>A</rdf:li>
               <rdf:li>B</rdf:li>
               <rdf:li>C</rdf:li>
            </rdf:Bag>
         </dc:subject>
         …
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>A|B|C</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>
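For anyone generating such sidecars programmatically, here is a sketch that builds the two fragments above with Python’s standard `xml.etree.ElementTree`. The namespace URIs are the standard ones for RDF, Dublin Core and Lightroom; the helper names are made up, and a real sidecar also needs the enclosing `x:xmpmeta`, `rdf:RDF` and `rdf:Description` wrappers, which are omitted here.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
LR = "http://ns.adobe.com/lightroom/1.0/"
for prefix, uri in (("rdf", RDF), ("dc", DC), ("lr", LR)):
    ET.register_namespace(prefix, uri)


def bag(parent_tag, values):
    """Build <parent><rdf:Bag><rdf:li>value</rdf:li>...</rdf:Bag></parent>."""
    parent = ET.Element(parent_tag)
    rdf_bag = ET.SubElement(parent, f"{{{RDF}}}Bag")
    for value in values:
        ET.SubElement(rdf_bag, f"{{{RDF}}}li").text = value
    return parent


def keyword_elements(path):
    """dc:subject gets every level; lr:hierarchicalSubject gets the joined path."""
    return (
        bag(f"{{{DC}}}subject", path),
        bag(f"{{{LR}}}hierarchicalSubject", ["|".join(path)]),
    )


subject, hierarchical = keyword_elements(["A", "B", "C"])
print(ET.tostring(subject, encoding="unicode"))
print(ET.tostring(hierarchical, encoding="unicode"))
```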

In PL7 searching for ‘C’ produces a result…


… but searching for ‘B’, which is present in both the DOP and XMP files, says there are no results…


… and the same applies for ‘A’…


… whereas using another app produces…

But such an app also lets you search for partial hierarchies…

This is because PL uses its own database instead of relying on the metadata saved in either the image file or an XMP sidecar.
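As an illustration of what such an app can do with the sidecar alone, here is a sketch that reads a simplified XMP and answers both kinds of query: a plain term against dc:subject, and a partial path against lr:hierarchicalSubject. The sample document and function names are hypothetical; real sidecars carry more wrappers and fields.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "lr": "http://ns.adobe.com/lightroom/1.0/",
}

# Minimal sidecar modelled on the XMP excerpt above (wrappers simplified).
SIDECAR = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:lr="http://ns.adobe.com/lightroom/1.0/">
  <rdf:Description>
    <dc:subject><rdf:Bag>
      <rdf:li>A</rdf:li><rdf:li>B</rdf:li><rdf:li>C</rdf:li>
    </rdf:Bag></dc:subject>
    <lr:hierarchicalSubject><rdf:Bag>
      <rdf:li>A|B|C</rdf:li>
    </rdf:Bag></lr:hierarchicalSubject>
  </rdf:Description>
</rdf:RDF>"""


def bag_items(root, prop):
    """Return the rdf:li values inside <prop><rdf:Bag>...</rdf:Bag></prop>."""
    return [li.text for li in root.findall(f".//{prop}/rdf:Bag/rdf:li", NS)]


def has_subject(xmp_text, term):
    """Exact match against the flat dc:subject list."""
    return term in bag_items(ET.fromstring(xmp_text), "dc:subject")


def has_hierarchy(xmp_text, partial_path):
    """True if a stored path equals partial_path or extends it at a level boundary."""
    paths = bag_items(ET.fromstring(xmp_text), "lr:hierarchicalSubject")
    return any(p == partial_path or p.startswith(partial_path + "|") for p in paths)


print(has_subject(SIDECAR, "B"))      # True
print(has_hierarchy(SIDECAR, "A|B"))  # True
print(has_hierarchy(SIDECAR, "B|C"))  # False: not anchored at the root
```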

If you select the full hierarchy in PL…

… then the PL search will also find ‘A’ and ‘B’.

Which is why I always recommend using the full hierarchy option in the metadata preferences and ensuring that the full hierarchy is ticked.

freixas · January 23, 2024, 3:25pm

Thanks, @Joanna, but I can see my question remains unclear.

Both PL and Adobe Bridge (and perhaps LR) support hierarchies in which you can omit the checkmark next to a parent keyword. Why?

In my original question, the scenario was A > B > C with only C checked. Given that software resources were devoted to implementing this feature, one would assume there was some value in being able to do this.

I understand that in PL, you will only find matches for a keyword in images in which the keyword’s checkmark is enabled. That may be an issue with PL, but it isn’t what I’m asking about.

Right now, the ability to unselect a parent keyword seems worse than useless. If I were designing a keywording system, I would omit this capability. Before I did that, though, I would want to understand what value it might have. I’ve tried to find out with web searches and by asking here, and have yet to learn the answer.

Stenis · January 23, 2024, 3:34pm

Maybe you should also look into this:

Using the IPTC Subject Scene and Genre codes with your Controlled Vocabulary Keyword Catalog

Don’t miss the last, rather tedious, lines:

"Question: How do I get the IPTC Subject, Scene or Genre codes into the appropriate field? They currently appear in the Controlled Vocabulary Keyword Catalog which only allows placing the terms into the Keywords field?

Answer: Unfortunately the ability to directly place the terms from the IPTC Subject, Scene and Genre codes directly into their respective fields is something governed by the particular software you are using. The only application in which these sets of terms are separated out is iView Media Pro (now Expression Media). The Photo Mechanic Structured Keywords Panel, when accessed from the Image menu (Image >> Structured Keywords Panel) does allow you to write terms to either the Keyword field or the Caption field.

All the keywording features in the rest of the applications: Bridge, Lightroom, Breeze Browser, FotoStation, InView and METAmachine are only designed at present to place terms automatically into the keyword field. With these other applications the best option is to use their keyword catalog structure to locate the code number (or name) and then to copy and paste that term to the appropriate field. With some applications (such as Lightroom) it may be hard to tell when you have actually selected an IPTC Code. If this is a problem, I would recommend removing those three top level fields (IPTC Genre, IPTC Scene, IPTC Subject) to prevent the inadvertant addition of those terms into your Keywords field."

Also remember that the “Subject” element of the Dublin Core standard schema (13. Subject – “The topic of the resource”.) is used in a lot of other implementations on the internet. It is not always used for “keywords”.

Dublin Core Metadata Element Set

The original DCMES Version 1.1 consists of 15 metadata elements, defined this way in the original specification:[6][14]

Contributor – “An entity responsible for making contributions to the resource”.
Coverage – “The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant”.
Creator – “An entity primarily responsible for making the resource”.
Date – “A point or period of time associated with an event in the lifecycle of the resource”.
Description – “An account of the resource”.
Format – “The file format, physical medium, or dimensions of the resource”.
Identifier – “An unambiguous reference to the resource within a given context”.
Language – “A language of the resource”.
Publisher – “An entity responsible for making the resource available”.
Relation – “A related resource”.
Rights – “Information about rights held in and over the resource”.
Source – “A related resource from which the described resource is derived”.
Subject – “The topic of the resource”.
Title – “A name given to the resource”.
Type – “The nature or genre of the resource”.

Also consider: the Dublin Core namespace elements in XMP are the very backbone of the XMP metadata standard and are used in many automatic machine-to-machine communications.

When I read things like this, I realize there is still quite a mess in how these standards are interpreted and implemented, not least in our RAW converters. And if the designers of this software don’t know how to handle this, how are we supposed to know?

I’m not surprised that the conversation in this thread looks quite confused too.

platypus · January 23, 2024, 4:04pm

If this is your question, this is my answer: genuine and personal

Whatever someone does in that case could be dismissed as “so what”, but reasons can differ and whatever they are, do they correspond to ours?

We might want to convey the complete hierarchical dependency of a keyword e.g. for the sake of not losing information that is available anyway, or we might just be content with adding “bear” without the “mammal” and “animal” parents and grandparents because everyone knows what a bear is.

One important reason for keywording is finding, and if one does care for relations and structures, the whole path is useful. But again, needs vary per individual, and e.g. Lightroom caters for variances by providing means to add/show/hide keywords of the respective positions with a separate dialog window, while PhotoLab spreads this over checkboxes in the keyword list tool and application preference settings.

Nevertheless, whether one adds bits and pieces or the whole lot is a matter of personal choice, imo.

freixas · January 23, 2024, 4:33pm

Congratulations! You actually addressed the question.

A clarification: Are you saying that someone might not want to search for “mammal” or “animal”, but would like to search for “bear”?

If your answer is yes, then PL’s search behavior is correct, and the value of the checkmark is to include keywords in searches. Unfortunately, it makes life more complicated for those who want to search for every keyword.

It is also important to export the data so as to maintain the behavior externally. The exact output to produce is an open question as there are so many other keywording programs.

If the answer is no, and these users would still like to search for all keywords, then excluding things like “mammal” or “animal” is pointless, as it would not affect anything (in this case, PL’s behavior would be a bug). Remove this “feature” and life becomes simpler.

freixas · January 23, 2024, 4:58pm

The Dublin Core comment for “subject” says: “Recommended practice is to refer to the subject with a URI. If this is not possible or feasible, a literal value that identifies the subject may be provided. Both should preferably refer to a subject in a controlled vocabulary.”

I like the idea of a controlled vocabulary. Some organizations require strictly enforced vocabularies. The typical photographer at home would get upset if they couldn’t manage their own keyword structures. I’ve worked out a system that could support vocabulary-driven keywording for both kinds of users. This is not useful as I’m in no position to drive any standards. Things will remain a mess.

While there is a Dublin Core definition for “subject” (which, in my opinion, seems way too loose), there is no equivalent for “hierarchy” and no clear definition of the relationship between the two.

The current keywording systems also lack standards for things like synonyms (well, LR has them) and internationalization. We really need an ISO standard with a reference implementation for image hierarchical keywording. DxO is unlikely to drive a push for a new standard.

This is all an interesting aside from my original question. @platypus has been the only one so far to address it, although I am still dubious about the value of excluding parent keywords from an image.

Joanna · January 24, 2024, 12:17pm

That is a very good question. From my extensive research, this “breaks the rules” and, as I have mentioned, can cause some DAMs to not find images.

When I wrote my DAM software, I was careful to separate out keyword management from keyword use - something that PL doesn’t do and that can be confusing to use.

Here is my keyword management dialog…

As you can see, all possible keywords are listed in the right column, whether they belong to a hierarchy or not, whereas the left column shows any hierarchies. This is to allow them to be used as standalone keywords, or used at different levels in different hierarchies. Just drag a keyword from right to left to add to a parent.

Having created your “dictionary”, you can now choose from any keyword, in any of its contexts, and add them to the image…

Now, imagine that you apply one of each of those hierarchies to different images.

Now try to search for all images that contain Orange, no matter what the context.

In my software, or any other that allows ORing of search predicates, you get…

But at present, with PL, this is totally impossible. Choosing one of these contexts…

… makes it impossible to choose any other context, as no image contains both Orange and e.g. Couleur | Orange…

Now, the XMP gets written correctly, with all the keywords in the selected hierarchies written to the dc:subject tag in the XMP sidecar, which is what most DAMs use for searching, but PL doesn’t use the XMP for searching. Instead, it uses the PL database, with its own search algorithm and no way of ORing predicates.

So, you can keyword images successfully in PL, but you can’t use PL to do complex searches.
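The difference between ORing and ANDing predicates can be sketched as follows. The image names and the ‘Fruit|Orange’ context are hypothetical; only ‘Orange’ and ‘Couleur | Orange’ come from the example above. ANDing the two context predicates returns nothing, because no image carries both, while ORing them (or matching the leaf in any context) finds every relevant image.

```python
# Hypothetical images, each tagged with one context of "Orange".
images = {
    "img1.jpg": ["Orange"],
    "img2.jpg": ["Couleur|Orange"],
    "img3.jpg": ["Fruit|Orange"],  # hypothetical extra context
    "img4.jpg": ["Couleur|Bleu"],
}


def leaf(path):
    """Last level of a '|'-separated keyword path."""
    return path.split("|")[-1]


def search_any(images, predicates):
    """OR: keep an image if any keyword satisfies any predicate."""
    return sorted(name for name, kws in images.items()
                  if any(pred(kw) for kw in kws for pred in predicates))


def search_all(images, predicates):
    """AND: every predicate must be satisfied by some keyword of the image."""
    return sorted(name for name, kws in images.items()
                  if all(any(pred(kw) for kw in kws) for pred in predicates))


def is_plain_orange(kw):
    return kw == "Orange"


def is_couleur_orange(kw):
    return kw == "Couleur|Orange"


def is_any_orange(kw):
    return leaf(kw) == "Orange"


print(search_all(images, [is_plain_orange, is_couleur_orange]))  # []
print(search_any(images, [is_plain_orange, is_couleur_orange]))  # ['img1.jpg', 'img2.jpg']
print(search_any(images, [is_any_orange]))  # ['img1.jpg', 'img2.jpg', 'img3.jpg']
```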

According to Dublin Core and other standards, all keywords, in all hierarchies, must be written to dc:subject but, although the lr:hierarchicalSubject tag is used by most software, including PL, to store hierarchical contexts, it is not part of Dublin Core.

In conclusion, as long as you don’t untick parent keywords, PL seems to do a reasonable job of writing keywords. It’s just that, whilst other software can search the XMP files written by PL, the PL search is totally inadequate, as it only uses the database, which is incomplete if you don’t include all parents.

So:

dc:subject must contain all keywords, hierarchical or not - it is used for searching
lr:hierarchicalSubject should contain the complete path to any leaf hierarchical keyword mentioned - it is used for transmission of hierarchical contexts between different DAMs
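Those two rules can be expressed as a small function. This is a sketch of the rules as stated above, not of any particular DAM: assigned keywords are ‘|’-separated paths, every level goes to dc:subject, and only multi-level paths go to lr:hierarchicalSubject (leaving single-level keywords out of that field is a judgment call based on the wording above).

```python
def xmp_keyword_fields(assigned):
    """Split assigned keywords into the two XMP fields.

    assigned: paths such as "Couleur|Orange" or standalone keywords like "Orange".
    Returns (dc_subject, hierarchical_subject), each de-duplicated
    in first-seen order.
    """
    dc_subject, hierarchical = [], []
    for path in assigned:
        levels = path.split("|")
        for level in levels:  # every level is searchable via dc:subject
            if level not in dc_subject:
                dc_subject.append(level)
        if len(levels) > 1 and path not in hierarchical:
            hierarchical.append(path)  # full path to the leaf
    return dc_subject, hierarchical


subjects, paths = xmp_keyword_fields(["A|B|C", "Orange", "Couleur|Orange"])
print(subjects)  # ['A', 'B', 'C', 'Orange', 'Couleur']
print(paths)     # ['A|B|C', 'Couleur|Orange']
```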

PL’s search does not rely on XMP metadata for searching, which is where things can get a bit screwy.

Yes, this would be a nice feature but it has to be implemented in a keyword management system as it is mainly used by the search mechanism and such synonyms do not usually appear in file metadata.

Now, have I answered your question yet?

freixas · January 24, 2024, 2:33pm

Actually, no, but you did address it by saying that you don’t know the answer, and that, like me, you see little value in this feature. I may need to ask this question on the Adobe forum.

Could you point me to the exact place where this is stated? I have yet to find it. @Stenis pointed at the same definition of the Dublin Core subject tag that I found. The dc:subject is just “the topic of the resource” and, according to its description, its preferred value is a URI.

Given that the Dublin Core makes no references to hierarchies, it is difficult to imagine that it would make any claim about which parts of a hierarchy should be written to the dc:subject tag. If the intended use of dc:subject is for searching, and unchecking the checkmark next to a parent word means that you want to exclude it from searches, then it seems entirely reasonable to not include that hierarchical term in the dc:subject tag. PL’s behavior would seem to comply with the Dublin Core while including all terms would not.

You also mention “other standards”. What other standards are you referring to?

freixas · January 24, 2024, 3:53pm

This is a follow-up to my response to @Joanna, but intended for everyone following this thread.

My personal guess about the purpose of unchecked parent keywords matches what @platypus suggested: the checkmark indicates whether we want to use a particular term in the hierarchy for searching.

The easiest way to picture this is with hashtags. If I have an image with the keywords People > John Smith and I want to create hashtags for the image, I might want to include #John Smith but not #People.

Essentially, I am saying what terms I would like other people to use to find my images.

The correct way to express this using the dc:subject tag is to only include the hierarchy terms that I indicate should be included. So PL is somewhat behaving correctly.

I say “somewhat” because, as the creator of the keywords and the hierarchy, I might want to search for every term so that I can maintain the hierarchy and find problems (as suggested to me by @BHAYT in private conversations). Perhaps within PL, we should be able to search for every hierarchy term or we should have a choice of either searching all keywords or searching only the terms designated as subjects.
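A sketch of that proposed choice, using the People > John Smith example. The `SearchScope` enum and `searchable_terms` function are hypothetical illustrations, not existing PL features: one scope indexes every level of every assigned hierarchy, the other only the levels ticked as subjects.

```python
from enum import Enum


class SearchScope(Enum):
    ALL_TERMS = "all"         # every level of every assigned hierarchy
    SUBJECTS_ONLY = "ticked"  # only levels ticked as subjects


def searchable_terms(assignments, scope):
    """assignments: list of (path, ticked) pairs, path root-first."""
    terms = set()
    for path, ticked in assignments:
        if scope is SearchScope.ALL_TERMS:
            terms.update(path)
        else:
            terms.update(t for t in path if t in ticked)
    return terms


photo = [(("People", "John Smith"), {"John Smith"})]
print(sorted(searchable_terms(photo, SearchScope.ALL_TERMS)))      # ['John Smith', 'People']
print(sorted(searchable_terms(photo, SearchScope.SUBJECTS_ONLY)))  # ['John Smith']
```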

I also say “somewhat” because PL provides an option to ignore the checkmarks with respect to what gets written to the dc:subject tag. External searches could search images by any term. But this option does not grant the same privilege to PL users!

When I initially posted my question, I was wondering if someone would say, yes, I use this feature, it’s important to me, and here’s why. There may be people using it, but they have yet to appear on this thread.

My guess about the purpose of the feature clashes with @Joanna’s claim that the standards require the dc:subject tag to contain all the keywords in a hierarchy, so I’d like to see those standards for myself. If the standards do indeed make this requirement, then there is no value in the feature (at least, for a standards-compliant program).

platypus · January 24, 2024, 4:02pm

@freixas, you ask a lot of “why” in an area that is ruled by relaxed interpretation of more or less rigid standards or recommendations, combined with user requirements (or lack thereof) and providers who cater (or not) for such requirements. Asking this question is completely legitimate, but expecting the one true answer is most probably philosophical rather than practical…

Panta rhei, as the old folks said.

If you know what you want (and that is a good enough reason), you can try to find how to get it with the things you use. Nice if it works, but for how long? If it doesn’t, then where is the problem? In the things or in the want? The answer might just be 42.

Stenis · January 24, 2024, 4:32pm

I like that interface! It seems very flexible.

platypus · January 24, 2024, 4:39pm

Yes, @Joanna’s app does a good job with this interface. It also helps to build a structure with a lot less scrolling compared to the usual one-column keyword list. No one has cared to steal that idea yet… but that’s probably due to its user base of two figures at most, I suppose.

Stenis · January 24, 2024, 4:46pm

Now I quote my own post above but I think this is an important thing to have in mind.

" (Subject) is used in a lot of other implementations on the internet. It is not always used for “keywords”.

That really disturbs me.
Keywording is such a basic thing that it really makes me wonder why it hasn’t been sorted out properly yet by IPTC.org or Adobe (in XMP).