PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords

Can I just chip in with a test I have just made?

Delete PL5 database and any DOP and XMP files from test folder.

Open PL5 and select a RAW image with no keywords assigned.

Type A > A > A into the keywords field…

[Screenshot 2022-06-21 at 17.14.15]

Press Enter to accept…

[Screenshot 2022-06-21 at 17.14.27]

Note that all levels of the hierarchy are assigned.

Now enter A > B > C into the keywords field…

[Screenshot 2022-06-21 at 17.16.28]

Press Enter to accept…

[Screenshot 2022-06-21 at 17.16.39]

Note that only C gets assigned, in addition to the three levels of A. The middle B is ignored, whereas, with the A > A > A, the middle A was assigned.

Write the metadata to the file…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>A</rdf:li>
               <rdf:li>C</rdf:li>
            </rdf:Bag>
         </dc:subject>
         …
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>A</rdf:li>
               <rdf:li>A|A</rdf:li>
               <rdf:li>A|A|A</rdf:li>
               <rdf:li>A|B|C</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>

My question, at this point, is: why did PL5 assign all three levels when I typed A > A > A, but not when I typed A > B > C?


Using Adobe Bridge, if I add the same values, I get an XMP file that looks like this…

   <dc:subject>
    <rdf:Bag>
     <rdf:li>A</rdf:li>
     <rdf:li>C</rdf:li>
    </rdf:Bag>
   </dc:subject>
   …
   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>A|A|A</rdf:li>
     <rdf:li>A|B|C</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>

Using Capture One 12, if I type the same hierarchies into the keywords field, I get…

[Screenshot 2022-06-21 at 17.50.43]

… in the UI and…

   <dc:subject>
    <rdf:Bag>
     <rdf:li>A</rdf:li>
     <rdf:li>B</rdf:li>
     <rdf:li>C</rdf:li>
    </rdf:Bag>
   </dc:subject>
   <lightroom:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>A</rdf:li>
     <rdf:li>A|A</rdf:li>
     <rdf:li>A|A|A</rdf:li>
     <rdf:li>A|B</rdf:li>
     <rdf:li>A|B|C</rdf:li>
    </rdf:Bag>
   </lightroom:hierarchicalSubject>

So, Capture One seems to be more compliant with MWG than both PL5 and Adobe Bridge, although the UI doesn’t show me which A is the parent of which other A unless I invoke the tooltip on each token.

Reading this back into PL5, I then get…

[Screenshot 2022-06-21 at 17.53.41]

But in PL5.1.3b55, the dc:subject used to conform to the MWG guidance by doing the same as Capture One.

This is just the kind of thing that is getting so frustrating for users who want to use an external DAM in addition to PL5 and is why several of us have ended up saying “Don’t do it” to anyone thinking of doing so.


Now, in a recent discussion with @platypus we seem to be in agreement that, not only does PL5 rely on the lr:hierarchicalSubject tag, or more likely on its own internal database, to power the search, other apps may be doing the same and my guess is that DxO are following the crowd, rather than “doing the right thing”

If, as we suspect, some apps (Adobe included) are relying on their own database/catalogue instead of the actual XMP for searching and refusing to follow the MWG Guidance they help write, this would explain the missing keyword B in the dc:subject tag.

Now, this is all well and good, as long as users only play with software written by the “gang”, which seems to be led by Adobe.

But, if a user decides to use any other software, they will not be able to search for intermediate level keywords, since they’re not mentioned in the dc:subject tag, as recommended by the MWG.
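
To make that concrete, here is a minimal Python sketch of a hypothetical third-party tool that searches dc:subject only. The function names and the tiny wrapper element are made up for illustration; the two keyword lists mirror the dc:subject bags written by PL5 and Capture One earlier in this post.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def subject_keywords(xmp_fragment):
    """Return the flat keywords found in dc:subject of an XMP fragment."""
    root = ET.fromstring(xmp_fragment)
    items = root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    return {li.text for li in items if li.text}

def wrap(keywords):
    """Build a tiny rdf:Description holding only a dc:subject bag (illustrative)."""
    lis = "".join(f"<rdf:li>{k}</rdf:li>" for k in keywords)
    return (
        f'<rdf:Description xmlns:rdf="{NS["rdf"]}" xmlns:dc="{NS["dc"]}">'
        f"<dc:subject><rdf:Bag>{lis}</rdf:Bag></dc:subject>"
        "</rdf:Description>"
    )

pl5 = subject_keywords(wrap(["A", "C"]))               # as written by PL5 above
capture_one = subject_keywords(wrap(["A", "B", "C"]))  # as written by Capture One

print("B" in pl5)          # False: the intermediate keyword cannot be found
print("B" in capture_one)  # True
```

A tool like this never looks at lr:hierarchicalSubject, so whatever is left out of dc:subject is simply invisible to it.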

It’s not exactly rocket science to write the dc:subject tag correctly, as Capture One proves.

This is what I get with XnView MP if I want parent keywords to be used for ranking only:
[image]

And this is what I get if I want parent keywords to be added to my keywords:
[image]

Accessing the database should be much quicker than searching umpty files. That’s why a) search is based on the database and b) metadata changes should be taken into account, possibly without having to restart the app…

Not on Windows 10!

I get the following

Adding A>A>A:-

Please note that only the last “A” is selected

‘Items’:-

‘Keywords’:-

[Screenshot 2022-06-21_232929_]

‘ItemsKeywords’:-

[Screenshot 2022-06-21_233044_]

Adding A>B>C:-

‘Keywords’:-

[Screenshot 2022-06-21_234110_]

‘ItemsKeywords’:-

[Screenshot 2022-06-21_234131_]

The xmp looks like this because not all the boxes have been checked, i.e. assigned/selected. The assignment process (selection) drives the contents of the ‘ItemsKeywords’ structure, which determines the Keywords layout for display, the contents of the metadata in the sidecar (or embedded) and the eligibility to be found in a search. Win 10 only “assigns” the “leaf” keyword of an hierarchy, not all the keywords as seems to be the case on the Mac!?

         <dc:subject>
            <rdf:Bag>
               <rdf:li>A</rdf:li>
               <rdf:li>C</rdf:li>
            </rdf:Bag>
         </dc:subject>
         <dc:format>image/x-panasonic-rw2</dc:format>
         <exif:DateTimeOriginal>2022-05-30T16:33:30.294+00:00</exif:DateTimeOriginal>
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>A|A|A</rdf:li>
               <rdf:li>A|B|C</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>

For me to get something to compare I would need to assign the intermediate levels myself, however, we are not comparing like with like and why PL5 left “B” unassigned on the Mac I do not know.

I will try to find time over the next few days to look at @platypus’s tests but I have actually got a lot to do at the moment and it is clear that the two products are not the same with respect to keyword “assignments”, so we are in danger of trying to compare fruits that both happen to be round but there the similarity ends!

However, please continue to use this topic to investigate what is happening on the Mac and I will replicate as much as possible on Win 10 (when I have the time), while we wait for DxO to enlighten us all @sgospodarenko.

…assignment is one thing, putting keywords into XMP is another.

DPL’s way to express keywords in XMP has changed between versions 5.1 and 5.2. @Joanna and I checked that yesterday and found that the Mac version 5.1.3 build 55 is the one that closely corresponds to MWG guidelines.

I appreciate Capture One’s sensible approach of adding a hierarchy only from the top down, which means that, from A>B>C>D, I can’t add a lonely D, which again prevents ambiguities, e.g. in @Joanna’s “orange” example.

@platypus I have been “moaning” about the change between Win 10 PL5.1.4, which I still have installed on my Test machine, and PL5.2.?. I believe that it was the first PL5.2 release where it was mentioned by @Marie that hopefully things were now improved, here: PL5 completely messes up my Hierarchical Keywords when exporting - #19 by Marie and PL5 completely messes up my Hierarchical Keywords when exporting - #23 by Marie plus XMP-files gets F*cked up due hierachical mismanagement in dual management - #26 by Marie and XMP-files gets F*cked up due hierachical mismanagement in dual management - #28 by Marie.

When I tested the product I immediately commented on the ridiculous new ‘dc’ keyword(s) which I considered both wrong and wrongly executed, i.e. changing the product with no option to use the original feature which I consider to be better than the new one!

However, because the Win 10 PL5 default is to assign (select) only the “leaf” keyword, I had never used the other checkboxes. I have now discovered that ticking them will assign all the combinations that, up to that time, I had only seen with Capture One! Those assignments control the keyword combinations included in the display and also those included in the xmp, until DxO decided to “block” all but the “leaf” keyword in the ‘dc’ data!

I had been “saving” this for a new topic but now is as good a time as any @sgospodarenko, @Marie, @Musashi to publish part of my updated spreadsheet. The additional sheets then take the outputs from the various programs tested and push them through PL5.1.4 and PL5.3.0; this needs checking.

What is missing is a row for PL5 when all combinations are assigned but to do that in Windows is a real “pain” because it would need to be done for every image where the “full” set is required, hence, my requests for options @Musashi to

  1. Select 5.1.4 methodology
  2. Select 5.2.0 methodology
  3. Select all elements to be assigned automatically because we don’t get that in the Win 10 version!

Taking the following 4 scenarios, I entered the data into a number of packages to see what xmp data I got!?

  1. “animal”, “mammal”, “bear”
  2. “animal|mammal|bear” or “animal>mammal>bear” or “bear<mammal<animal” whichever you prefer!?
  3. “animals|mammals|bear|black bear” or your preferred syntax variant
  4. “animal”, “mammal”, “bear”, “animal|mammal|bear” or the alternative syntax variant of the hierarchical keyword

The first page of the spreadsheet attached below is the result from various packages when I enter each keyword with the appropriate delimiter (, or ; etc) into the package. Because the packages have preference options that can change the structure of the keywords that the package then places in the image metadata I am trying to include the options as well?

Capture One makes no direct changes to any image file always creating an xmp sidecar file regardless of the image type (I believe) so the only way of externalising keywords is via the export option!

I feel that the biggest mistake (by DxO) was the lack of communication about the DOP usage change, closely followed by a lack of realisation about the position that PL5 takes in the user’s work flow where users simply do not want their (DAM) metadata changed in any way, certainly not in the image itself but also not in the export from PL5, hence the proposal of a “DAM sandwich” from some!

PL5 is not really worse than most of the other packages which doesn’t make it right just not as wrong as it is being painted!

What is the “perfect” metadata configuration for my little collection?

I am particularly concerned about how many (if any) simple keywords should be in the ‘hr’ but with my “horror” combination (4) if there are no simple keys in the ‘hr’ field you wind up with the same combination for 2 and 3!

Copy of meta data setting _07-01.8 (first sheet only).zip (3.6 KB)

I also believe that there is a bug in the way that PL5 handles the storage of my “horror” combination internally (more tests + a new bug report if appropriate), which was why I started further investigation prompted by a comment about search issues with PL5 by @Joanna.

PS:-

The IMatch options:-

The first selected (not a good choice), versus both selected versus neither selected! Please note the similarity between IMatch options and the PL5.1.4 and PL5.2.0 alternatives!?

And this is something I have been trying to put across since before PL5 was released.

I really don’t care what DxO do in their database (since I regularly scrap it). What I do care about is compatibility with other apps.

XMP is a “universal” means of communicating metadata between apps, not just for use by a single app. In this regard, it is important that PL writes XMP in a way that is the most compatible with everything, if at all possible. Which is why the MWG Guidance was drawn up.

This paragraph is absolutely fundamental to the correct transmission of compatible metadata.

Note the phrase…

MUST write the XMP dc:subject property to store the individual keywords

The problem is that it would seem that some software authors have read this part of the paragraph and are duly storing leaf keywords selected by a user but, at the same time, ignoring any antecedents because such things are managed in their database and it may be seen as unnecessary duplication to also write them to the XMP.

However, the second part of the paragraph goes on to say…

Hierarchical path elements MUST be flattened, which means that each hierarchy node needs to be stored as a separate keyword entry to XMP dc:subject

So, whether the hierarchy contains a single entry for something like…

A|B|C|D

… or the more complete…

A
A|B
A|B|C
A|B|C|D

… the MWG guidance is absolutely clear that every keyword mentioned in lr:hierarchicalSubject MUST also be mentioned in dc:subject.
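
As a rough illustration of what “flattened” means here, this is a minimal Python sketch. The function names are hypothetical, and it assumes the usual `|` separator seen in the XMP above; it is not taken from any of the apps discussed.

```python
def flatten(hierarchical_subjects):
    """Collect every node of every 'A|B|C' path, keeping first-seen order."""
    seen = []
    for path in hierarchical_subjects:
        for node in path.split("|"):
            if node not in seen:
                seen.append(node)
    return seen

def missing_from_subject(dc_subject, hierarchical_subjects):
    """Return the hierarchy nodes that are absent from dc:subject."""
    return [k for k in flatten(hierarchical_subjects) if k not in dc_subject]

# Both ways of writing the hierarchy flatten to the same keyword list.
print(flatten(["A|B|C|D"]))                       # ['A', 'B', 'C', 'D']
print(flatten(["A", "A|B", "A|B|C", "A|B|C|D"]))  # ['A', 'B', 'C', 'D']

# The PL5 output from the first post: B is in the hierarchy but not in dc:subject.
print(missing_from_subject(["A", "C"], ["A|A|A", "A|B|C"]))  # ['B']
```

An empty result from `missing_from_subject` would be what I understand the guidance to ask for.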

As @platypus says, this used to be the case in PL5.1.3 - something that made PL perfectly compatible with all other software but, for some unknown reason, this was changed in subsequent versions - something that has now rendered more recent versions incompatible with several apps.

The MWG Guidance document includes a very clear example of how a user might not select a complete hierarchy…

Note how Animals has not been selected, even though it is the parent of Mammals. This is analogous to the hierarchy UI in PL5.

But then the document goes on to clearly state that the correct way of writing this to XMP should be to include Animals in the dc:subject tag.

[Screenshot 2022-06-22 at 09.34.15]

Note the comment in green…

flat keyword list for interoperability

I really don’t know how clear this has to be to be seen as important. Certainly even DxO thought it important enough to adopt this behaviour in PL5.1.3 - only to revert to a breach of this guidance in subsequent versions.

As @platypus says, Capture One manages to do this with no problem at all - thus preventing all sorts of problems like restricted search in other software and ambiguities in the PL5 UI.

@Musashi & @sgospodarenko can you please point me to the documentation with respect to the reversion to the “old” PL4 DOP handling method where presenting a PL5 DOP with an image with embedded metadata will always take the DOP metadata with AS(OFF)! When it was PL4 the only blocking of image metadata was ‘Rating’ and ‘Rotation’ now it is essentially all the metadata will be blocked.

Users want to be able to present the DOP for their preserved edits but now we have “old” DOPs potentially getting in the way of newer metadata all because DxO didn’t want a new option. PL5.3.0 doesn’t work the same as PL5.2 which doesn’t work the same as PL5.1.4 (Win 10 numbering) and there is no way of going back and preserving the rest of the later features/bug fixes that may be useful, this is truly …

Hi,

I need to sync with the team before answering properly to this thread.
I’ll get back to you.

Best regards

Just did a little test with Capture One on macOS Monterey on M1 MacBook Air 2020.

While C1 still writes keywords in what seem to be MWG-compliant ways, C1 has also added an information popup that appears when I try to delete an intermediate level keyword:

The more I look into this topic in C1, the better I like what C1 is doing.

I would tend to agree - although I think I’ll stick with my own app :sunglasses: :roll_eyes: :wink:

Oh, please do. We’d love to be able to sort this out rather than spending days working out what it will take to put PL at the top of the metadata stack :smiling_face_with_three_hearts:

@joanna the “reason” was the reaction to the two topics I highlighted in my post above. I think that they took one more seriously than the other and “matched” an IMatch option.

What I consider DxO failed to understand was that no amount of “fiddling” would solve the “problem” when the solution was to find a way to leave users metadata absolutely, utterly, irrevocably, completely intact @Musashi, @Marie, @sgospodarenko!

Whether that metadata met any guidelines or not it was put there by their favourite package/DAM etc. and that was what was in their images and that was what they wanted carried forward BUT what I would conjecture is that you and @platypus and I (and other users) want the ability to optionally specify that we get the “best”, most accurate, most compliant keywords possible or, optionally, very close (without all the intermediate hierarchies)!

But that leaves the issue of what simple keywords should find their way (as if the keywords have any control) into the ‘hr’ fields.

In a post a while ago @joanna we had a discussion about the nature of the PL5 database and the structures ‘Keywords’ and ‘ItemsKeywords’ and I wrongly stated that because ‘Keywords’ contained all the keywords in a “flattened” state it should be easy for a search to work only to discover more recently that the “discovery” was “gated” by the ‘ItemsKeywords’ contents (or “missing” contents) which is determined by “assigned” keywords.

I also concluded that DxPL simply took every keyword in that structure to be a candidate for the ‘hr’ data, whereas the preferred strategy is that it should be a (potential) candidate for the ‘dc’ fields and only for the ‘hr’ fields when it is the top item in an hierarchy.

The reason that ‘ItemsKeywords’ is so important is because it is the link (the only link) between the ‘Items’ structure (holding the image) and the ‘Keywords’ structure which holds the various combinations of keywords that have been entered into PL5 or detected in the incoming image metadata as a series of “flattened” keywords.

It actually provides a link from ‘Items’ to ‘Keywords’ (needed for displaying and for creating the metadata) and ‘Keywords’ to ‘Items’ (and to ‘Sources’ and thence to ‘Folders’).

Nowhere is the origin of the keywords maintained/retained, e.g.

  1. ‘dc’ or ‘hr’ or both
  2. Image or DxPL

so reconstructing the metadata for output back to the image or into an export is solely based upon the contents of the ‘ItemsKeywords’ and the “algorithm” that DxPL “chooses” to apply at any given time!?

Item 1 has two associated keywords at 3 & 5:

  1. 3 points to A which points to 2 (A) which points to 1 (A) , giving A<A<A (or A|A|A or A>A>A)
  2. 5 points to C which points to 4 (B) which points to 1 (A), giving C<B<A (or A|B|C or A>B>C)
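
The pointer-following described above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the names `KEYWORDS`, `ITEMS_KEYWORDS` and `path_for` are my assumptions, not the real DxPL schema; the ids and parent links are the ones listed for item 1.

```python
# Hypothetical stand-in for the 'Keywords' structure: id -> (value, parent id).
KEYWORDS = {
    1: ("A", None),
    2: ("A", 1),
    3: ("A", 2),
    4: ("B", 1),
    5: ("C", 4),
}

# Hypothetical stand-in for 'ItemsKeywords': item id -> linked keyword ids.
ITEMS_KEYWORDS = {1: [3, 5]}

def path_for(keyword_id):
    """Walk from a leaf up to its root, then return the path root-first."""
    parts = []
    current = keyword_id
    while current is not None:
        value, parent = KEYWORDS[current]
        parts.append(value)
        current = parent
    return "|".join(reversed(parts))

for kid in ITEMS_KEYWORDS[1]:
    print(path_for(kid))
# A|A|A
# A|B|C
```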

Once keywords are in DxPL ALL keywords are either simple or simple but part of a tree structure from the leaf upwards not top downwards.

However, once the keywords are deconstructed and stored in the ‘Keywords’ structure as simple keywords with pointers to re-establish hierarchical keywords from the bottom up (leaf back to tree), all keywords are the same, except that there will be entries for the simple ‘dc’ keywords (if any existed) from the image metadata! These entries are not actually present in the metadata entered into DxPL unless explicitly entered by the user or “assigned” from the list!

I wrote but never posted (I think never posted) that the “algorithm” for completing the metadata was

  1. For every keyword in the ‘ItemsKeywords’, potentially only 1 for Win 10, follow the pointer to the keyword in the ‘Keywords’ structure.

  2. If that keyword is not part of an hierarchical keyword chain then output the keyword to the ‘dc’ fields and this is where I feel that PL5 (and the other products) are going wrong because they output a simple keyword to both the ‘dc’ fields and the ‘hr’ fields.

  3. If the keyword is part of an hierarchical keyword chain then output the keyword to the ‘dc’ field (this action is now modified by PL5.2.0) and begin to reconstruct the hierarchy. When the hierarchical keyword has been reconstructed output to the ‘hr’ fields + add the topmost keyword of the hierarchy.

  4. Repeat for every keyword in the ‘Keywords’ structure for the given image. If this was executed as written it would automatically create the structure that Capture One creates (or very close).

  5. At this point there should be one ‘dc’ keyword for every keyword in the hierarchy and both the ‘dc’ entries and the ‘hr’ entries should then be de-duplicated (I believe).

  6. Output the “perfect” set of keywords but that will be completely ignoring the exact combination that came in from the image!!!
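
For what it is worth, the steps above could be sketched in Python roughly as follows. This is one reading of the list, not DxPL code: the table layout, the function names and the standalone “sunset” keyword are all invented for illustration, and a top-level keyword that happens to have children is treated as a simple keyword when it is assigned on its own.

```python
# Hypothetical 'Keywords' structure: id -> (value, parent id).
KEYWORDS = {
    1: ("animal", None),
    2: ("mammal", 1),
    3: ("bear", 2),
    4: ("sunset", None),  # invented simple keyword, for illustration only
}

def chain(keyword_id):
    """Follow parent pointers from a keyword up to the root; return root-first names."""
    names = []
    while keyword_id is not None:
        name, keyword_id = KEYWORDS[keyword_id]
        names.append(name)
    return names[::-1]

def build_metadata(assigned_ids):
    """Return (dc, hr) keyword lists for the assigned keyword ids, de-duplicated."""
    dc, hr = [], []

    def add(target, value):
        if value not in target:  # step 5: de-duplicate as we go
            target.append(value)

    for kid in assigned_ids:     # steps 1 and 4: every linked keyword
        names = chain(kid)
        if len(names) == 1:
            add(dc, names[0])    # step 2: simple keyword goes to 'dc' only
            continue
        for name in names:       # steps 3 and 5: every node of the chain in 'dc'
            add(dc, name)
        add(hr, names[0])        # step 3: topmost keyword of the hierarchy
        add(hr, "|".join(names)) # step 3: the reconstructed hierarchy
    return dc, hr

dc, hr = build_metadata([3, 4])
print(dc)  # ['animal', 'mammal', 'bear', 'sunset']
print(hr)  # ['animal', 'animal|mammal|bear']
```

If every level is assigned, e.g. `build_metadata([1, 2, 3])`, the ‘hr’ list becomes animal, animal|mammal, animal|mammal|bear.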

At a cost unless bought as part of a package from some U.K. camera dealers (£120 I believe), i.e. £209 or £107 for first year(?) or £24.00 monthly.

Arguably that’s what I have been trying to get DxO to do since I started Beta Testing PL5 well over a year ago (when I essentially knew nothing about keyword handling but things didn’t look “right” even then) but hey … head + brick wall!

This is a point on which I both agree and disagree at the same time :wink:

The first part is an absolute, as it confirms what the MWG guidance states, in order to maintain interoperability.

The second part is something I have argued over with myself (and others) many times. In principle, only one entry for each hierarchy needs to exist but…

It can be argued that each keyword in the “path” can be regarded as requiring full definition; thus the idea of having one definition per node in the hierarchy…

A
A|B
A|B|C
A|B|C|D

I cannot definitively state this as a requirement but I would argue it is more “complete” in that it implicitly reinforces the idea that all leaf nodes, at all levels, are mentioned in the dc:subject tag.

Thinking about Bryan’s (@BHAYT) comment about adding an “applied” column to the link table, brings me to highlight the MWG way of writing the hierarchical structure using the -xmp-mwg-kw:hierarchicalkeywords tag…

<mwg-kw:Keywords rdf:parseType="Resource">
  <mwg-kw:Hierarchy>
    <rdf:Bag>
      <rdf:li rdf:parseType="Resource">
        <mwg-kw:Keyword>Animals</mwg-kw:Keyword>
        <mwg-kw:Applied>False</mwg-kw:Applied>
        <mwg-kw:Children>
          <rdf:Bag>
            <rdf:li rdf:parseType="Resource">
              <mwg-kw:Keyword>Mammals</mwg-kw:Keyword>
              <mwg-kw:Applied>True</mwg-kw:Applied>
              <mwg-kw:Children>
                <rdf:Bag>
                  …

Note that each keyword is defined as a typical tree node with three fields, one of which holds any children. So, we have…

  • Keyword - the keyword itself
  • Applied - whether the keyword was actually selected or whether, like the Animals node, it is only there for reference and completeness, but not actually a keyword that the user was interested in recording
  • Children - speaks for itself, remembering that each child also contains an Applied tag.
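
Expressed as a data structure, the node described above might look like this minimal Python sketch. The class and function names are hypothetical; only the two nodes visible in the fragment are modelled, because the children of Mammals are elided there.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class KeywordNode:
    keyword: str                  # the keyword itself
    applied: bool = False         # whether the user actually selected it
    children: List["KeywordNode"] = field(default_factory=list)

def walk(node):
    """Yield every node in the tree, parents before children."""
    yield node
    for child in node.children:
        yield from walk(child)

tree = KeywordNode("Animals", applied=False, children=[
    KeywordNode("Mammals", applied=True),  # its children are elided in the source
])

all_keywords = [n.keyword for n in walk(tree)]
applied_only = [n.keyword for n in walk(tree) if n.applied]
print(all_keywords)  # ['Animals', 'Mammals']
print(applied_only)  # ['Mammals']
```

The point of the `applied` flag is that both lists can be recovered from one structure: the full flat list for interoperability and the user’s actual selection for the UI.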

Now, Bryan, does that ring any bells with your ideas for the database?

Just had a lightbulb moment.

Using PL5, I referenced the hierarchy Animal > Mammal > Bear > Black Bear but I only selected Animal and Black Bear

[Screenshot 2022-06-24 at 11.44.36]

Writing this to the XMP gives me…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>Animal</rdf:li>
               <rdf:li>Black Bear</rdf:li>
            </rdf:Bag>
         </dc:subject>
         …
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>Animal</rdf:li>
               <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>

… with some of the subject keywords missing. It struck me that DxO might be creating lr:hierarchicalSubject entries based on the selected keywords in dc:subject, rather than making dc:subject reference all keywords found in any of the hierarchies, which is what the MWG guidance implies.

It is a subtle distinction but, at the moment, it makes sense, at least, to me.

I remember someone from DxO explaining that they based what they wrote on the hierarchy, during the beta for PL5. This seemed wrong at the time but, looking at it from this point of view, it makes sense and it complies with MWG

Comments? Arguments?

@joanna I have just parked a response that I was writing about a revised design but to answer your query DxPL doesn’t have a clue what is ‘dc’ or ‘hr’ once it is in ‘Keywords’ or ‘ItemsKeywords’!

When it is entered into DxPL it is just a string of simple or hierarchical keywords with no notion of where those fields will wind up. When it reads keywords from the image it throws away any notion of where those keywords came from (part of the changes, proposed by me, that I am looking at).

So in both cases it simply starts with a list of keywords from which it “arbitrarily” creates whatever it creates. Hence the excess of ‘dc’ only keywords that wind up in ‘hr’ fields (not an issue exclusive to DxPL).

The assignments do determine what is considered for inclusion but not where they might reside. Hence, my writing out a set of rules because that is all DxPL can use regardless of the heritage of those keywords (entered into DxPL or obtained from the image)!!

Either DxPL requires the preservation of what was where on input (to use or not to use as it chooses) and/or a better algorithm, which is the only thing possible with the current database design!!

I will try to get back to my response later but was trying to help with another topic in my “early” morning slot, now it is back to DIY.

Take Care

EDIT:-

  1. Cleared db and discovered directory of 4 images with my standard scenario
  2. Photo 2 contains “animal|mammal|bear” with just “bear” selected on Win 10 by default but no other keywords
  3. Selected “mammal” and PL5 completed all selections
  4. The structures look like this

[Screenshot 2022-06-24_135233_]

[Screenshot 2022-06-24_135210_annotated]

and keywords are constructed by going from ‘Items’ to ‘ItemsKeywords’ and then to ‘Keywords’.

But nowhere in the database is there any indication (there are no fields to hold the indicators) of ‘dc’ versus ‘hr’; at this point no such distinction is made or maintained!

The result (with all items selected) is

         <dc:subject>
            <rdf:Bag>
               <rdf:li>animal</rdf:li>
               <rdf:li>bear</rdf:li>
               <rdf:li>mammal</rdf:li>
            </rdf:Bag>
         </dc:subject>
         <crd:CameraProfile>Camera Standard</crd:CameraProfile>
         <crd:LookName/>
         <exifEX:LensModel>OLYMPUS M.12-200mm F3.5-6.3</exifEX:LensModel>
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>animal</rdf:li>
               <rdf:li>animal|mammal</rdf:li>
               <rdf:li>animal|mammal|bear</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>

Hello Joanna

For example, somebody selected “Bear” and “Black Bear”.
The user expects to see just those 2 keywords.
“Animal” in that case is some sort of category that helps to find the required keyword.

However, there will be “Animal”, “Mammal”, “Bear” and “Black Bear” in software that uses dc:subject only (or that merges subject and hierarchicalSubject). And there is no possibility to get rid of “Animal” and “Mammal”. They can’t be unchecked in dc:subject.

@Joanna it looks as if they have reverted to PL5.1.4 ‘dc’ keywords on release 5.3.1!?

But why?

Bear and Black Bear could just as well be standalone keywords as hierarchical ones.

Not exactly. The complete “definition” of Bear, in the context of the hierarchy we are talking about here, is actually Animal|Mammal|Bear and the complete definition of Black Bear is Animal|Mammal|Bear|Black Bear.

I have previously mentioned there is also another hierarchy for Beer > Craft Beer > Black Bear.

How does the user know which hierarchy Black Bear belongs to if they can only see Black Bear?

Or, if we have another hierarchy for Material > Fur > Bear > Black Bear?
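
A small Python sketch shows why the bare leaf is ambiguous. The three paths are the hierarchies mentioned above; the function name is hypothetical.

```python
from collections import defaultdict

PATHS = [
    "Animal|Mammal|Bear|Black Bear",
    "Beer|Craft Beer|Black Bear",
    "Material|Fur|Bear|Black Bear",
]

def contexts_by_leaf(paths):
    """Group full hierarchy paths by their final (leaf) keyword."""
    index = defaultdict(list)
    for path in paths:
        leaf = path.rsplit("|", 1)[-1]
        index[leaf].append(path)
    return dict(index)

index = contexts_by_leaf(PATHS)
print(len(index["Black Bear"]))  # 3 possible contexts for the same visible token
```

With only the leaf on screen, there is nothing to tell those three contexts apart.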


Moving back to my Orange hierarchies - here is a sequence of screenshots of entering five different hierarchical cases for that keyword…

[Screenshot 2022-06-24 at 16.48.52]

[Screenshot 2022-06-24 at 16.49.09]

[Screenshot 2022-06-24 at 16.49.31]

[Screenshot 2022-06-24 at 16.49.52]

Now, the keywords token field contains…

[Screenshot 2022-06-24 at 16.48.24]

… but the XMP that gets written by my app is…

  <dc:subject>
   <rdf:Bag>
    <rdf:li>Couleur</rdf:li>
    <rdf:li>Orange</rdf:li>
    <rdf:li>Fruit</rdf:li>
    <rdf:li>Satsuma</rdf:li>
    <rdf:li>Entreprise</rdf:li>
    <rdf:li>Télécommunications</rdf:li>
    <rdf:li>Matériel</rdf:li>
   </rdf:Bag>
  </dc:subject>
  …
  <lr:hierarchicalSubject>
   <rdf:Bag>
    <rdf:li>Couleur</rdf:li>
    <rdf:li>Entreprise</rdf:li>
    <rdf:li>Fruit</rdf:li>
    <rdf:li>Matériel</rdf:li>
    <rdf:li>Couleur|Orange</rdf:li>
    <rdf:li>Entreprise|Télécommunications</rdf:li>
    <rdf:li>Fruit|Orange</rdf:li>
    <rdf:li>Matériel|Télécommunications</rdf:li>
    <rdf:li>Entreprise|Télécommunications|Orange</rdf:li>
    <rdf:li>Fruit|Orange|Satsuma</rdf:li>
    <rdf:li>Matériel|Télécommunications|Orange</rdf:li>
   </rdf:Bag>
  </lr:hierarchicalSubject>

Every keyword is recorded perfectly in its appropriate hierarchy and every hierarchical keyword is mentioned in the subject tag, as per MWG guidance.

And yet, the UI in my app has been carefully designed to show the user each keyword only once, along with any keywords in each of their contexts, again, only once.

At present, PL shows a very confusing display should it come across the XMP that my app writes…

[Screenshot 2022-06-24 at 17.01.18]

… where keywords in the field are apparently repeated and the only way to determine which is which is to hover over a token to invoke a tooltip.

In actual fact, if the hierarchy view is visible, there should be no need to duplicate the tokens in the token field.

Certainly but they can be unchecked in the hierarchy view, which is what is leading to confusion.


Not that I can see…

[Screenshot 2022-06-24 at 17.08.06]

… produces this for XMP…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>Orange</rdf:li>
            </rdf:Bag>
         </dc:subject>
         …
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>Couleur|Orange</rdf:li>
               <rdf:li>Entreprise|Télécommunications|Orange</rdf:li>
               <rdf:li>Fruit|Orange</rdf:li>
               <rdf:li>Matériel|Télécommunications|Orange</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>

Whereas, PL5.1.3 (perfectly correctly) writes…

         <dc:subject>
            <rdf:Bag>
               <rdf:li>Couleur</rdf:li>
               <rdf:li>Entreprise</rdf:li>
               <rdf:li>Fruit</rdf:li>
               <rdf:li>Matériel</rdf:li>
               <rdf:li>Orange</rdf:li>
               <rdf:li>Télécommunications</rdf:li>
            </rdf:Bag>
         </dc:subject>
         …
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>Couleur|Orange</rdf:li>
               <rdf:li>Entreprise|Télécommunications|Orange</rdf:li>
               <rdf:li>Fruit|Orange</rdf:li>
               <rdf:li>Matériel|Télécommunications|Orange</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>

@Joanna sorry, I forgot the essence of this topic, namely selection or assignment. I got the extra keys because I had selected/assigned “mammal”, which automatically selected/assigned “animal”; as a consequence the more “rounded” hierarchical keys were created and the extra ‘dc’ keys along with them.

Removing the selection yielded

         <dc:subject>
            <rdf:Bag>
               <rdf:li>bear</rdf:li>
            </rdf:Bag>
         </dc:subject>
         <crd:CameraProfile>Camera Standard</crd:CameraProfile>
         <crd:LookName/>
         <exifEX:LensModel>OLYMPUS M.12-200mm F3.5-6.3</exifEX:LensModel>
         <lr:hierarchicalSubject>
            <rdf:Bag>
               <rdf:li>animal|mammal|bear</rdf:li>
            </rdf:Bag>
         </lr:hierarchicalSubject>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>

A very slimline, non-compliant set of keywords!

@amikhnev I have real problems with this statement ““Animal” in that case is some sort of category that helps to find the required keyword.” I “hate” the fact that Photo Supreme appears to be “obsessed” with the notion of Category and inserts “Miscellaneous” as the first keyword as a “Category”.

Arguably every keyword in an hierarchy is a (sub-)category until you arrive at the final keyword and that may actually be another (sub-)category where you have just stopped with the granularity of the classification!

I understand what you say, and that could be the reason that the current search works the way that it works when an hierarchical keyword is added. But regardless of the selection/assignments made by the user, an hierarchical key exists and will be placed into the ‘hr’ fields with all the keywords in place, including the keywords not marked as selected (assigned) by the user.

This selection will influence the presentation of the keywords in the UI (including showing the selections already made), the contents of the metadata that will be written back to the image (if “allowed”/forced by the user), the metadata that will be placed in the exported file and, currently, the results of any search, which is the reason for this topic.

I am no metadata expert, but if a keyword will inevitably find its way into the metadata, either because it has been explicitly assigned or because it is part of a larger hierarchical keyword and is therefore implicitly assigned, I personally believe it should be found in a search. Otherwise the search is being selective and distorting the truth!
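To make explicit versus implicit concrete, here is a minimal Python sketch of how the implicitly assigned keywords could be derived from an `lr:hierarchicalSubject` entry such as the `animal|mammal|bear` one above. This is only my illustration, not how DxPL works internally, and the function name `implied_keywords` is hypothetical.

```python
def implied_keywords(hierarchical_subjects):
    """Return every keyword named anywhere in the given hierarchical paths.

    Each path uses '|' as the level separator, as in lr:hierarchicalSubject.
    Order of first appearance is kept and duplicates are dropped.
    """
    found = []
    for path in hierarchical_subjects:
        for level in path.split("|"):
            level = level.strip()
            if level and level not in found:
                found.append(level)
    return found


explicit = ["bear"]  # what dc:subject contained after removing the selection
implicit = implied_keywords(["animal|mammal|bear"])

print(implicit)                                    # ['animal', 'mammal', 'bear']
print([k for k in implicit if k not in explicit])  # ['animal', 'mammal']
```

The second line of output is exactly the set of keywords that I would expect a search to find but which, as far as I can tell, is currently ignored.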

However, the default for Win 10 (at least on my machine) is that only “black bear” will be assigned. If I then assign “bear”, DxPL will then automatically assign all keywords in the hierarchy. In addition, if all have been de-selected, then assigning “black bear” will actually assign all in the hierarchy automatically (which I believe is the default for the Mac version of the product)!?

So I tested the following by selecting and deselecting keywords (but my first attempt at inserting a table was crushed by the post software being used)!

animals X X X X
mammals X X - - X
bear X X X X - -
black bear X X X X - X X X

which should look like

image

and I exported for each option in the above table and the results are as follows for PL5.3.1:-

The same combinations for PL5.1.4 result in the following:

The problem is that assignment conditions not only the response to a search (wrongly in my opinion, because it should include implicit assignment as well as explicit assignment) but also the metadata (keywords) output to exported JPGs and available to be written back to the image metadata, etc.

@Musashi If the design of the ‘ItemsKeywords’ structure was ‘ItemId’, ‘KeywordId’ and ‘Assigned’, instead of just ‘ItemId’ and ‘KeywordId’, then every ‘Keyword’ for an image (which are stored in the ‘Items’ table in the database) could be entered into the ‘ItemsKeywords’ database table, where they would immediately become candidates for searching.

So we would go from 1 entry in the default case on Win 10 (for “black bear”) to 4 entries, one for each of the keywords with each of the entries for “animals”, “mammals” and “bear” having a ‘N’ or ‘0’ (or whatever) in the ‘Assigned’ field and the entry for “black bear” would have a ‘Y’ or ‘1’ (or whatever) in the assigned field.

The UI and the metadata insertion code would now use the ‘Assigned’ field rather than the physical presence of an entry in the ‘ItemsKeywords’ table to do their respective tasks.

But now a search would retrieve all the keywords associated with the image, both implicitly and explicitly assigned. If users wanted to retain the existing way of working there could be (at least) two ways of handling that:-

  1. In the search return all keywords in the count field (and pointers to the images) and add an additional entry for those explicitly assigned (selected) in the line below.

  2. Add an additional selection box in the search to return only assigned keywords.
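To show what I mean, here is a rough sketch using Python and SQLite. It is not the real DxPL schema: ‘ItemsKeywords’, ‘ItemId’ and ‘KeywordId’ are the names mentioned above, ‘Assigned’ is my proposed column, and the `Keywords` table and the `search` function are hypothetical stand-ins I made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for wherever the keyword text actually lives.
    CREATE TABLE Keywords (Id INTEGER PRIMARY KEY, Value TEXT NOT NULL);
    -- ItemId and KeywordId as described above; Assigned is the proposed addition.
    CREATE TABLE ItemsKeywords (
        ItemId    INTEGER NOT NULL,
        KeywordId INTEGER NOT NULL REFERENCES Keywords(Id),
        Assigned  INTEGER NOT NULL
    );
""")

hierarchy = ["animals", "mammals", "bear", "black bear"]
conn.executemany("INSERT INTO Keywords (Id, Value) VALUES (?, ?)",
                 list(enumerate(hierarchy, start=1)))

# Image 1: four entries instead of one; only "black bear" is explicitly assigned.
conn.executemany(
    "INSERT INTO ItemsKeywords (ItemId, KeywordId, Assigned) VALUES (1, ?, ?)",
    [(kid, 1 if name == "black bear" else 0)
     for kid, name in enumerate(hierarchy, start=1)])


def search(keyword, assigned_only=False):
    """Return ItemIds carrying the keyword; optionally explicit assignments only."""
    sql = ("SELECT DISTINCT ik.ItemId FROM ItemsKeywords ik "
           "JOIN Keywords k ON k.Id = ik.KeywordId WHERE k.Value = ?")
    if assigned_only:
        sql += " AND ik.Assigned = 1"
    sql += " ORDER BY ik.ItemId"
    return [row[0] for row in conn.execute(sql, (keyword,))]


print(search("animals"))                         # [1]
print(search("animals", assigned_only=True))     # []
print(search("black bear", assigned_only=True))  # [1]
```

The `assigned_only` flag corresponds to option 2 above: by default the search sees every keyword in the hierarchy, and ticking the box restores the existing behaviour.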

But surely that is the job of any software that subsequently encounters the image, now or in the future.

Limiting the potential for future searches is “short-sighted”. It is the job of any software that encounters that image to let you search on “animals” (broad though that category may be). You then have the option never to enter such a search because that item is “merely” a category with way too many entries, but that is where AND (and OR!?) searches come into play.

“Tailoring” keywords is a “slippery slope”, better to have too much data and software that allows you to navigate that, than to throw data away (once it has been removed it can never easily be replaced automatically - not entirely true because software with dictionaries etc. can find the missing data and restore it).

The problem with PL5, in my opinion, is that it currently restricts the search potential by virtue of the explicit assignment! There should be no such restriction, other than my “desire” to enter “animals” into a PL5 search or not, for both explicitly (currently available) and implicitly (currently missing) assigned keywords @sgospodarenko !
