Win 10 PL5.3.1 - Use Keyword Format Templates instead of just reverting to the pre-PL5.2.0 keyword format

BHAYT · July 21, 2022, 11:06am

This is a long post but hopefully a worthwhile one! It seeks to address the issue of DxPL versus other software with respect to keyword metadata formats !?

A lot of work has gone into this post and into the “research” behind it and, hopefully, it will help DxO provide users with a more compatible product with respect to formatting keywords @Musashi.

Summary:-

The change to the keyword handling made in release PL5.2.0 should be treated as a “bug” @sgospodarenko. The change was not announced prior to releasing that version, it effectively took the keyword handling of DxPL further away from the standards and from what had been the DxO standard from at least PL3, according to my testing, and the change was not optional, i.e. users were “forced” to accept it!

or, “expressed” using the proposed Keyword Format Template table

In the course of discussions in the topic PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords @joanna, others and I discussed what we felt was wrong with handling of keywords in PL5.

After a lengthy discussion, which brought these issues to the attention of DxO, @Musashi made a statement on the issue (PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords - #69 by Musashi), which prompted further discussion, and then an additional statement (PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords - #98 by Musashi), which sought to bring the discussions to a close.

Firstly, I am pleased that DxO was listening and am encouraged by their commitment to provide an option to reverse the changes made to the metadata handling back to the pre-PL5.2.0 version!

But the complaints made in the forum, which prompted the change that I am flagging as a bug, were actually made about that original release! So the proposal is to return to a more standards compliant version (as an option), but one that actually caused much consternation amongst users in the first place. That doesn’t really make a whole lot of sense to me @Musashi, @sgospodarenko!?

Is there a better way of handling keywords that might go further to help satisfy the complaints of a larger audience of DxPL users? Looking for a potentially better way is what this topic is about!

Essentially, by analysing the nature of the keywords created by a number of packages, PL5 included, I believe that it is possible to create keyword format templates (“presets”) that should offer users selectable options to determine the format of keywords output from DxPL.

These keyword format templates would control the way that DxPL formats the keywords it outputs, to better match the other packages in use by DxPL users, i.e. if my analysis is correct then DxPL output formatting can be controlled to match the output of Photo Mechanic, Capture One etc. or to implement one of @Joanna’s schemes to better match the guidelines etc.

These outputs can be “tailored” to the users’ wishes/requirements and not restricted by any DxO decision. But how useful this change actually is to users is governed by how DxO chooses to implement the design.

DxO might:-

Not utilise this strategy at all, when the current options on offer are the original strategy or the later and worse option, neither of which is a step in the right direction, I feel!
Limit the number of template options available arbitrarily (possibly O.K. but only as a first implementation and still too limiting)!
Prevent users from modifying at least some of the templates on offer (why would you have any such limitation - except to simplify the data checking required at some stage in the process)
or
Not allow the creation of new templates to allow users to “tweak” and further refine existing templates (again - why would you have any limitation - except to simplify …)

This design is intended to remove constraints from DxPL so that users are not “locked” into any design that DxO decides is the best compromise, nor one coming from a user or group of users that have “petitioned” DxO to go in a certain direction, but are free to choose from the existing formats (templates) on offer, e.g. matching their DAM, or an alternative DAM, or a scheme of their own or of another forum member’s “making”.

In fact it should be possible to define export options that would allow DxPL to act as a bridge between applications, reading keyword metadata in one “format” and exporting to one or many other formats!

Please NOTE:- DxPL does not “recognise” any specific keyword format when it encounters a new image (I believe) but treats each keyword in turn and stores them in the database as simple keywords regardless of their origins (i.e. whether from a ‘dc’ or ‘hr’ field). i.e.

in the ‘Keywords’ Table:-

and linked to an image in the ‘Items’ Table by the ‘ItemsKeywords’ Table which points to every “assigned” keyword (“assigned” is discussed later in this post) in the ‘Keywords’ Table for any given ‘Items’ (image) entry.

Nowhere are there any “clues” in the database structures as to the exact format of the ‘dc’ and ‘hr’ keywords as they were stored in the image (embedded or sidecar). Those “clues” are necessary to enable DxPL to exactly reconstruct those fields to write back to the image or to write to an export of the image i.e. that data is simply not recorded and, therefore cannot be used.

There are advantages to this approach because it means that DxPL will analyse and decompose an hierarchical keyword that it discovers in the ‘dc’ fields as well as one in the ‘hr’ fields. The disadvantage is that it will put those keywords in what its formatting rules deem the appropriate fields on output, i.e. an hierarchical keyword will be placed in the ‘hr’ fields, arguably correct but now out of reach for a non ‘hr’ aware program!
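A minimal sketch of that “flattening” step, in Python, might look like the following. The row layout, the `flatten` function and the `parent_id` field are my assumptions for illustration only; they are not the actual PL5 database schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeywordRow:
    """One row of a hypothetical flat keyword table (not the real PL5 schema)."""
    id: int
    name: str
    parent_id: Optional[int]  # None for a simple keyword or the first level of a hierarchy


def flatten(keywords, separator="|"):
    """Decompose raw keyword strings into flat rows linked by parent_id.

    The source field ('dc' or 'hr') is deliberately not recorded, mirroring
    the observation that the database keeps no clue about the original format.
    """
    rows = []
    index = {}  # (parent_id, name) -> row id, so repeated paths share rows
    for raw in keywords:
        parent_id = None
        for part in raw.split(separator):
            name = part.strip()
            key = (parent_id, name)
            if key not in index:
                index[key] = len(rows) + 1
                rows.append(KeywordRow(index[key], name, parent_id))
            parent_id = index[key]
    return rows


rows = flatten(["x", "animal|mammal|bear"])
for row in rows:
    print(row)
```

Note that the same rows result whether “animal|mammal|bear” arrived in a ‘dc’ field or an ‘hr’ field, which is exactly why the original layout cannot be reconstructed afterwards.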

Therefore, DxPL currently constructs the keyword format according to its rules, as do all the other packages according to their rules (I believe - potentially another set of tests to take all the outputs I passed through PL5.1.4 and PL5.3.1 and feed them into all the other programs and … lose one’s sanity completely in the process).

Hence, my proposal to use keyword format templates (“presets”) that can be user selectable (and changeable) to apply formatting rules when the keyword data is written back to an image or exported.

Effectively, to tailor the output to best suit a user’s needs by providing DxPL with the format templates of the other packages so that anything that is output matches the chosen package but selected by the user!?

Why were there so many complaints about PL5 keywords?:-

I am sure about only one part of what I believe to be the reason(s) for the furore, namely that the metadata update that resulted automatically with AS(ON) wrote the keywords to the image metadata (or an export) in the DxPL format, i.e. overwriting the keyword format placed there by the user’s DAM etc…

But I never understood comments from users like the following comment from @jch2103

The problem I have with this statement is that I cannot get either PL4 or PL3 to do anything other than export in the same format used by PL5 up until PL5.2.0 and I have always had the ‘RAW’ option set

and it appears to do nothing special whatsoever to “preserve” the metadata from the RAW image!

Please note that @jch2103 is a Win10 (or Win 11) user as I am and also uses IMatch which I have owned since Christmas 2021.

As shown below in this post, IMatch writes metadata in (at least) one of three formats, one of which matches PL5.1.4 (and PL3 and PL4) and Lightroom, with one particular set of two IMatch options chosen, and another matches PL5.2.0, and the third appears to match Adobe Bridge.

DxPL takes the keywords from the image metadata (embedded or sidecar), “flattens” them and then stores them in the database. If any further keywords are created in PL3/PL4 then with this “RAW” option set all keywords from the external source plus those added by the user to PL3/4 will be output, and all in the PL keyword format (essentially matching the IMatch output when both of the following two IMatch options are set)

If the ‘RAW’ option is left unset in DxPL then no keywords will be exported at all!?

I can find no option that results in PL3 or PL4 leaving the keywords in exactly the same format as it was in the RAW sidecar. Inspecting the keyword format template table below shows that both IMatch (Both options selected) and Lightroom “match” format 4 and that is the format used for the PL3 and PL4 export.

A snapshot of a test with PL3 where an image contains in turn

A single hierarchical keyword (in the sidecar)
An export from PL3 of the image (‘Raw’ ‘Preferences’ option selected)
An export from PL3 of the image with an additional hierarchical keyword added in PL3
An export with the ‘Raw’ ‘Preference’ de-selected
The same image tested with PL4

Hence, I believe that much of the furore about there being a change to PL5 so that the RAW keywords were no longer maintained is not substantiated by my tests. Until PL5.2.0 was released all exports followed the same keyword format template used since PL3, which matches IMatch (Both options selected) and Lightroom, and that continuity was only lost on the release of PL5.2.0!

What do users really want?:-

It is presumptuous of me to assume that I actually know what other users want but I believe that the key (sorry) feature is that many users do not want DxPL to do anything with the keyword format!

Users certainly want the keyword data that has been carefully crafted by their DAM software, that they have been using for years, to escape any “tampering” by DxPL and to emerge from DxPL “unscathed”, i.e. for the keyword metadata to be passed out of DxPL in the same format that it was passed into DxPL, regardless of whatever point in their workflow DxPL occupies!

What does this Keyword Format Template Table look like?

Something like this perhaps

This table was developed from two sources, one was to generate keywords for RAW images in as many packages as I could gain access to, using different options available whenever I realised that such options existed (and I might still have missed some).

These keywords were then fed into versions of PL5 to see what happened to them in a PL5 JPG export, i.e. to measure the “damage” done and to attempt to “quantify” how bad things really were! The results are documented in the attached spreadsheet.

The second source was the PL5 database and the pseudo-code I developed for my own understanding to see how a Table of simple keywords could be turned back into the familiar keyword structures I expect in an image (PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords - #103 by BHAYT).

In that post I stated

The processing cycle that DxPL “must” be using (or so I believe) became clear as did the table that I have attached above. Using the various keyword format templates will, I believe, produce the “desired” keyword output format, regardless of whether that is standards compliant or not @Joanna, either way it should be just what the users want (hopefully), i.e. representing no unexpected keyword “adjustments”.

The keyword combinations used for testing:-

The aim was to try a combination of keywords with every package that I could access and then output the metadata as part of the image or as an export as appropriate.

In fact, the final version used RAWs for everything created by the packages and then JPG exports from PL5, i.e. the various packages were “forced” (when necessary) to create an xmp sidecar file which PL5.1.4 and PL5.3.1 used as input (PL5.3.1 is shown as PL5.2.0 in the spreadsheet because that was the release when the output changed)!

The original keyword combinations were

animal, mammal, bear
animal|mammal|bear
animals|mammals|bear|black bear
animal, mammal, bear, animal|mammal|bear

Recently these were shortened, to help with pattern recognition (by eye) and space in the spreadsheet and a fifth combination added, so we now have

a, m, b (representing animal, mammal, bear)
a|m|b
as|ms|b|bb (i.e. animals|mammals|bear|black bear)
a, m, b, a|m|b
x, y, z, a|m|b

The packages tested were

ACDSee
Adobe Bridge
Capture One
ExifPro
IMatch with both of two options available selected
IMatch with the First Option selected
IMatch with neither option selected
Lightroom
PhotoLab 5.1.4 with default (only the leaf keyword) assignment
PhotoLab 5.1.4 with All elements of an hierarchical keyword assigned
PhotoLab 5.2.0 with default (only the leaf keyword) assignment
PhotoLab 5.2.0 with All elements of an hierarchical keyword assigned
Photo Mechanic (Option 1)
Photo Mechanic (Option 2)
Zoner

Please note that ACDSee, ExifPro and Zoner do not use, nor do they appear to recognise, the existence of ‘hr’ fields (I believe).

Please note that Photo Supreme is missing from this list because I have not managed to find how to input keywords without PS automatically assigning a ‘Category’, e.g. typically ‘Miscellaneous’!?

Caveat:-

The order of the “simple” keywords cannot be guaranteed, i.e. with some packages if you enter “a”,“m” and “b” in that order then they will be stored in the image keyword metadata in that order. With other packages the storage in the image will be in alphabetical order and alphabetical order will always be the case with DxPL outputs.

I appear to have shown the actual order from the package in the spreadsheet but cannot guarantee that to be the case for all cells. Certainly DxPL will not retain that order if it is not alphabetical, so the keyword format will be the same with respect to contents but not necessarily the order of the simple keys!
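For anyone repeating the tests, a comparison that ignores order avoids false mismatches caused purely by alphabetical re-sorting. This is only a sketch; `same_keywords` is a name I have made up for illustration.

```python
def same_keywords(expected, actual):
    """True when both lists hold the same keywords, ignoring their order."""
    return sorted(expected) == sorted(actual)


# Entered as a, m, b in one package, written alphabetically by another.
print(same_keywords(["a", "m", "b"], ["a", "b", "m"]))
print(same_keywords(["a", "m", "b"], ["a", "b"]))
```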

Standard Disclaimer:-

I have conducted these tests as carefully as I can but (transcription) errors have crept into the spreadsheet in the past, the last one I commented on turned out to be me not allowing sufficient space for the spreadsheet cells before taking a snapshot!

If you decide to conduct the tests with your own DAM and the results do not agree with mine then please inform me and make sure that I get a copy of the ‘preference’ settings etc. that may account for the differences.

If you decide to conduct the tests with your own DAM and different keyword combinations and they seem not to fit my “format rules” then please inform me and please make sure that I get a copy of the keywords, ‘preference’ settings etc. that may help account for the differences and/or allow me to repeat the tests.

What does the output from each package look like:-

Something like this

and with the keyword format template table we have

Please note:-

The similarity between the PL5.1.4 output and the IMatch(Both Options) outputs
The similarity between the PL5.2.0 output and the IMatch(Neither option) outputs

Classifying the components that make up a keyword:-

Effectively, all keywords are stored in the PL5db as simple keywords, some with links to other keywords, denoting (a component of) an hierarchical keyword (pointing from Leaf to First) and others without links, either designating a simple keyword or the First keyword of an hierarchical keyword.

So I have “classified” the keywords as ‘Simple’ (S), ‘Hierarchical Components’ (HC) and ‘Hierarchical’ (H). The Simple (S) and Hierarchical Keyword Components(HC) are obtained from the database and the Hierarchical Keywords are reconstructed by following the pointers from the leaf (Last) to the Head (First) keyword.
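The leaf-to-first walk can be sketched as below. The `table` dictionary is a hypothetical stand-in for the database rows (id mapped to name and parent id), not the real PL5 structure, and the sketch assumes the links never form a loop.

```python
def rebuild_path(keyword_id, table, separator="|"):
    """Follow parent links from a leaf back to the first keyword and join the names."""
    names = []
    current = keyword_id
    while current is not None:
        name, parent_id = table[current]
        names.append(name)
        current = parent_id
    # Names were collected leaf first, so reverse them before joining.
    return separator.join(reversed(names))


# Hypothetical rows: id -> (name, parent id); None marks a simple or first keyword.
table = {1: ("a", None), 2: ("m", 1), 3: ("b", 2), 4: ("x", None)}

print(rebuild_path(3, table))  # hierarchical keyword rebuilt from its leaf
print(rebuild_path(4, table))  # a simple keyword comes back unchanged
```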

Using the keyword format table:-

So for the keyword combination of “x”, “y”, “z”, “a|m|b” we have the following

1. Simple                      = "x", "y", "z"
2. Hierarchical Component (HC) = "a", "m", "b" 
3. HC First                    = "a"
4. HC Last (or Leaf)           = "b"
5. Hierarchical (H)            = "a|m|b"
6. All Combinations (AC)       = a|m|b, a|m, a

and these elements are available to populate the two keyword elements of the metadata, i.e. the ‘dc’ Subject and the ‘hr’ Hierarchical Subject’, as required.
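As a rough illustration, the six element types listed above can be derived from the keyword list with a few lines of Python. The function name and the dictionary keys (“S”, “HC”, “F”, “L”, “H”, “AC”) are simply my shorthand for this sketch.

```python
def classify(keywords, separator="|"):
    """Split a keyword list into the element types used in the format table."""
    result = {"S": [], "HC": [], "F": [], "L": [], "H": [], "AC": []}
    for keyword in keywords:
        parts = keyword.split(separator)
        if len(parts) == 1:
            result["S"].append(keyword)  # simple keyword
            continue
        result["H"].append(keyword)      # the full hierarchical keyword
        result["HC"].extend(parts)       # every component of the hierarchy
        result["F"].append(parts[0])     # first (head) component
        result["L"].append(parts[-1])    # last (leaf) component
        # Longest path first, then each shorter prefix down to the first keyword.
        for depth in range(len(parts), 0, -1):
            result["AC"].append(separator.join(parts[:depth]))
    return result


result = classify(["x", "y", "z", "a|m|b"])
for kind, values in result.items():
    print(kind, values)
```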

The various packages populate the ‘dc’ and ‘hr’ fields as they “choose”, some coming closer to the guidelines than others, but the users have a considerable amount of money and time invested in their chosen software and want that investment to continue/be protected!

Hence the spreadsheet snapshot that I have shown, which attempts to “classify” (codify) the various combinations created by the various packages (hopefully correctly analysed and recorded!); these are numbered and coloured accordingly.

So taking x, y, z, a|m|b and scheme 3, Capture One and PL5.1.4 (All assigned) and PL5.2.0 (All Assigned), we have

i.e.

By keyword "Type"

1. A('dc') = x, y, z
2. A('hr') = x, y, z
3. A('dc') = a, m, b
4. F('hr') = a   
5. NULL
6. AC('hr') = a|m|b, a|m, a

giving 

'dc' = a, m, b, x, y, z
'hr' = a, a|m, a|m|b, x, y, z

arguably the “a” from the F and the “a” from the AC need to be de-duplicated or the F column is actually “redundant” (hence the “ghosting” of certain columns in the spreadsheet).
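Written out as code, the worked example above might look like this. It is a sketch under my own assumptions: the `FORMAT_3` dictionary is a hypothetical encoding of scheme 3 (it is not the layout of the spreadsheet table), and the function names are invented for illustration.

```python
def elements(keywords, separator="|"):
    """Split keywords into the element types used by the format table."""
    out = {"S": [], "HC": [], "F": [], "AC": []}
    for keyword in keywords:
        parts = keyword.split(separator)
        if len(parts) == 1:
            out["S"].append(keyword)
        else:
            out["HC"].extend(parts)
            out["F"].append(parts[0])
            # Shortest prefix first: a, a|m, a|m|b
            out["AC"].extend(separator.join(parts[:n]) for n in range(1, len(parts) + 1))
    return out


def apply_template(keywords, template):
    """Build the 'dc' and 'hr' lists named by the template, de-duplicated in order."""
    parts = elements(keywords)
    result = {}
    for field, wanted in template.items():
        merged = [kw for kind in wanted for kw in parts[kind]]
        result[field] = list(dict.fromkeys(merged))  # keeps the first occurrence only
    return result


# Hypothetical encoding of format 3 as read from the worked example above.
FORMAT_3 = {"dc": ["HC", "S"], "hr": ["F", "AC", "S"]}

output = apply_template(["x", "y", "z", "a|m|b"], FORMAT_3)
print(output["dc"])
print(output["hr"])
```

The de-duplication step is what absorbs the repeated “a” contributed by both F and AC, which supports the view that the F column is redundant whenever AC is present.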

It should be possible to add entries in the table for all DAM/editing software that can output ‘keywords’.

It should be possible to add entries that are guidelines compliant and add additional entries as those guidelines change!

If adopted I believe this strategy should go a long way to answering the concerns users have made about Keywording in forum posts and create a “future-proofed” architecture for keyword formatting.

This is the spreadsheet as a set of outputs to a watermarked pdf file.

Keyword Format V09-01.pdf (253.7 KB)

“For what “price” can this change be achieved”:-

I cannot answer that exactly and it is made up of a number of elements

Designing a table that is provided as a starter but is then amenable to user input (that could actually be a hybrid of a DxO created/maintained table and an adjunct user table).
Managing and importing the table (or part of the table) into DxPL which may require a restart to ingest newly changed entries or a ‘Metadata’ command to (re-)load the whole table, or just the user elements
Storing the table in DxPL, in an array, in the database etc.
Format selection fields in the UI, e.g.
1- Global default keyword format
2- Export keyword format added to the export options, allowing for multiple output options to be created
3- Added to the ‘Metadata’/‘Write to image’ command to dictate the format to be used (versus the global default)
The actual coding change to implement using the user designated keyword format to construct the keywords. Going by my pseudo-code the actual process is currently very straightforward (by design I believe) and adding the use of the keyword formatting table, even accounting for the AC element discussed below, will certainly complicate it but not by much I believe and is arguably the most straightforward of the implementation elements. I feel the return on the investment should be very good value (but then I am biased)

DeDuplication:-

Typically after all keywords have been created for inclusion in the image (original and/or export) the list of candidate keywords is scanned and duplicates removed!

AC versus ‘Assign All’:-

In the spreadsheet and the format templates (“presets”) I have included an AC item; this stands for “All Combinations” and means that for “as|ms|b|bb” there will be “as|ms|b|bb”, “as|ms|b”, “as|ms” and “as” included.

This is included in the format templates for Capture One, PL5.1.4 (ALL assigned) and PL5.2.0 (ALL assigned) and, as the name for the PL5 formats indicates, it is possible to create this output by assigning ALL items in a tree in PL5.

The current Win 10 PL5 default for assignment is to select only the ‘Last’ or ‘Leaf’ keyword, i.e.

rather than ALL

But it is also possible to select

So relying on the use of assignment to accomplish the task requires all the keywords in the hierarchy to be selected for every hierarchy that exists, either manually as is currently on offer or automatically which might be part of the @Musashi “commitment” referenced at the start of this post.

A “safer” option might be to include the AC in the format definition and then to create all the elements of the keyword programmatically when there is no possibility of missing a step and the entire set of keywords is guaranteed to be generated, i.e. the “rule” is enforced.

There is a potential “clash” if AC has been selected and levels of keywords have also been assigned (assign ALL) in PL5. This can be resolved in one of two ways

De-duping (removing duplicates) before writing the metadata to the image or to an export. This might mean redundant work in PL5, generating excess keyword combinations which will then be removed in the de-duplication phase.
Programmatically checking whether the keyword is already in the ‘ItemsKeywords’ structure, which means that it is already “assigned” and a programmatic generation of the keyword is not required because it will happen automatically using the normal PL5 process.
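The second approach can be sketched as follows. Here `assigned` is a hypothetical view of the ‘ItemsKeywords’ entries expressed as full keyword paths, and `missing_levels` is a name I have invented; how PL5 actually exposes that information is unknown to me.

```python
def missing_levels(keyword, assigned, separator="|"):
    """Return the levels of a hierarchy that still need generating for AC output.

    `assigned` stands in for the keywords already linked to the image
    (a hypothetical view of the 'ItemsKeywords' entries as full paths).
    """
    parts = keyword.split(separator)
    levels = [separator.join(parts[:n]) for n in range(1, len(parts) + 1)]
    return [level for level in levels if level not in assigned]


# Only the leaf and the first level were assigned by the user.
assigned = {"as|ms|b|bb", "as"}
to_generate = missing_levels("as|ms|b|bb", assigned)
print(to_generate)
```

With this check only the unassigned levels are generated, so there is nothing left for a later de-duplication pass to remove.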

platypus · July 21, 2022, 12:44pm

Sorry for possibly sounding difficult, @BHAYT.

Reading the title of your post, I get the impression that you propose that PhotoLab should use keyword format templates. Without having read your text (too many trees for me)…

… I suppose that such templates should allow a user to make keyword records (format of how keywords are written to files) interoperable with a target app. Q: Yes or No?

uncoy · July 21, 2022, 12:45pm

You’ve made a fantastic case for avoiding hierarchical keywords altogether, Bryan. I can’t imagine the amount of work which went into this testing. Thanks for sharing.

BHAYT · July 21, 2022, 12:54pm

@platypus YES

BHAYT · July 21, 2022, 1:09pm

@uncoy or a case for actually how straightforward it is to use them?

When it can be “boiled” down to 6 simple parameters and arguably column 1 can be taken as implied (=A) and the F in column 4 is actually a remnant from my initial analysis and is encompassed within the AC (that it actually only ever exists with anyway)!

So we are down to 4 parameters controlling the formats (I believe/hope) of all the software I was able to test!

Is it really that complicated or is it just that different software adopting similar but not identical keyword formats seems to make it that way!

I believe that if DxO chose to adopt this approach they could be compatible with any package and even with any format that @joanna believes is “ideal”. To be honest it would actually be “simple” to write a program to take the keyword data in and spit it out in any format, until someone proves me wrong that is!?

To be honest the testing was the easy part; it has taken many days (not full time) to write, refine, re-write and repeat that sequence to get to the post etc… I don’t mind people finding minor holes, I just hope the principle mostly holds water!?

platypus · July 21, 2022, 1:29pm

You are absolutely right:
A program is meant to spit out whatever you want or need, depending on input

MWG et al. set up ways that allow a provider to transfer keywords to a consumer in a structure (bunch of tags). Even though such “ways” (standards) exist, they are not implemented in exactly the same way by everyone and I suppose that they never will.

Your proposal for using keyword format templates that allow a user to make keyword records interoperable with a target app sounds good, but who will a) be able and willing to write such templates and b) what will the interface to such templates be? I think we’d replace one issue with another…

Note to @BHAYT: Please post future proposals here, so that we can vote for them. No need to add win and bug flags and, oh, give us a few lines of what you propose, no need to spread PhotoLab’s innards all over the table, please.

BHAYT · July 21, 2022, 2:30pm

@platypus Thank you, I suspected there was something like that but didn’t know where it was.

But perhaps I like the sight of “innards” - actually I need to understand the innards (as much as is possible from the outside) to be sure I am on the right track and I was always told to show your “workings out”, not least because someone might actually spot where I went wrong.

Although you are partly correct with “out of the frying pan into the fire” the current templates that I have provided would (should) work for all the programs identified, i.e. they can be used tomorrow once they have been ratified and once I have my “pound of flesh”, of course.

I would estimate a fairly short amount of development is required to actually modify the current code to pick up a template entry as designated by the user. e.g. #3 or "CO " or "C1 " or “BHT1” or “JO01” or “PL01” etc. and “spit” out the keyword metadata in the desired format.

This is not so much about standards but choice, which can include the strictest adherence to standards or compatibility with the “worst” program and everything in between, as required

All the real work has already been done by DxO, by the look of things it has been functioning since PL3, i.e. the parsing of the image metadata into the database structures and then re-formatting them on output, with PL3 and PL4 only for exports but with PL5 that was extended to outputting to the image metadata as well.

There is design work to be completed to agree how the table should be maintained and, hopefully, accessible to the users (please see the “For what “price” can this change be achieved”:- item in the original post) but this is way easier than much of the coding DxPL already contains!

The difference (if it can be implemented, which I believe it can) is transformative both in what it does to the keyword data format (literally) and what it does to the product and what it does for those users who want a “compatible” keyword handler as part of their “favourite” editor.

Joanna · July 21, 2022, 2:45pm

Very well put. I know I seem somewhat OCD about the MWG Guidance document but I thought, if there were a guiding standard, that should be it.

However, since even Adobe, who helped draw up MWG, can’t be bothered to stick to it, then the idea of “brew your own” for compatibility seems an eminently sensible idea.

Not forgetting that hierarchical keywords are only normally used for transmission of such structures from one DAM to another, never for use by image libraries and agencies - all they care about is the dc:subject tag. Which means that, if all words from any hierarchies are not included, an author’s images are going to be difficult to find.

BHAYT · July 21, 2022, 2:59pm

@Joanna and this shows which will or will not be “friendly” to this principle

Update: i.e. those with an “A” (ALL) in column 3 will contain all component keywords from an hierarchical keyword.

The “yellow” rows are non-hierarchical packages and if they ever contain hierarchical keywords they will be in the ‘dc’ fields. Typically they will be “stolen” and placed in the ‘hr’ fields but to maintain compatibility with the package DxPL would need to put any hierarchical keyword back into the ‘dc’ field as per the template!

But my proposal is for the format templates to be attached to export profiles so we can have one for “return” to or compatibility with ACDSee and one that meets the standards e.g. one designated AC and one designated e.g. JO01 or BHT1 etc.!

I do hope DxO realises the potential of this approach @Musashi!

The interesting thing would be to see what the “unfriendly” programs make of format #3 (Capture One and PL5 both releases with ALL assigned).

But the “proposal” is compatible with “good”, “bad” or “ugly” and flexible enough to take “excellent” etc. as well, this should be an inclusive design as much as possible!

platypus · July 21, 2022, 4:53pm

I’m not sure about that, supposing that they use many other tags from IPTC’s set. CapOne provides a separate set for one of the larger providers, just to mention one thing.

Back to templates: Imagine having to support the templates, every little change that e.g. Adobe comes up with, would need to be dealt with quickly in order to not be buried in complaints. Multiply this by the number of apps out there and load will grow accordingly. Add issues like duplicate “Title” tags and and and…but from a technical point of view, templates might be a viable option.

BHAYT · July 21, 2022, 11:20pm

@platypus the templates are only used for keyword metadata not the rest of the metadata and if there are major changes caused by changing/evolving standards then DxO will need to keep abreast of them with respect to the input process, so adding or amending an existing template is hardly going to be a major item.

In truth if the various packages start changing keyword layouts on a regular basis their users will be as vocal as they were with DxO, it is not something that is going to happen on a routine basis; DxO has continued with the same format since PL3 and would still be using it if they hadn’t over-reacted!

Currently some users are using a “DAM sandwich” to “re-align” the keyword data of exported images, wouldn’t it be better if that data simply didn’t need any “adjustment”.

The proposed scheme applies not only to the keyword data taken from the image but also any keyword data added in DxPL, in fact if something as simple as a ‘Rating’ is changed in PL5 the user currently won’t want to write that back to the image for fear of “damaging” the keyword format!

The use of the templates, providing they work as I believe they will, removes barriers to making DxPL a fully integrated element of the work flow; the benefits far outweigh any negatives by a long, long way!

Musashi · July 28, 2022, 3:41pm

Thank you very much for your deep analysis and comparative study with other applications. This will be helpful for us.

So, if we understand correctly, you want:
1/ to control if all hierarchical levels are written in DC subject
2/ to control if the KW search uses all hierarchical levels to return results
3/ to control if all hierarchical levels should be assigned when assigning a KW from PL

That is everything we have planned to introduce in a future PL6.X version

Do we miss a use case here ?

In parallel, I just want to recall that our team has plans to improve the KW and Database management, this takes time and is done step by step (as the 3 previous proposed solutions). As already stated by @CaptainPO in the previous post, we do put efforts in this part of the app but cannot answer to all requests.

Best regards

BHAYT · July 29, 2022, 2:58pm

@musashi the answer is yes and no: yes, these are the items that you identified in your response and yes, they were the items that we responded to, but no, that is not what I am proposing here.

Item 1 returns DxPL to the pre-PL5.2.0 formatting rules that have been in use since PL3 as far as I can tell. At the time of the closing of that post I had not completed the work shown here and what was proposed was better than what is currently available by a long way!

But item 1 can and should, in my opinion, be improved upon using the keyword format tables (templates) described in this topic.

However, in spite of the copious complaints about PL5 damaging keywords etc. from users this topic and Keyword Format Templates - A more flexible way of working with keywords in DxPL have almost zero comments from any of those complainants so I can only think that they are all completely happy with what happens with DxPL keywording!?

In the meantime I have chosen to try my hand at coding, the first time since 2009, and have started writing my own keyword (re-)formatting utility in Python to make use of the work that I did!?

While I suggested incorporating the table into the database and @platypus “hinted” that selecting formats from a drop down table would be good, I considered that I had asked for a lot (maybe a few days of coding) and I left drop-down lists out of the request, nice though they would be!?

2022-07-26_104223_Original versus Format Template pseudo-code_W.pdf (4.9 MB)

The table could be embedded in DxPL initially, as shown above from my Python utility, and eventually opened up to users for additional templates e.g. “LIB1 8A-A----” which would “flatten” hierarchical keywords into the ‘dc’ fields for Library use etc.

However, with Item 3 in the above list implemented I will be able to assign “all” easily (rather than photo by photo) which produces the same keyword layout as Capture One, with both the pre and post PL5.2.0 formats, according to my table. In the meantime I will be able to use my own utility!

@Musashi and @CaptainPO what I offered here was something no other software offers but …

platypus · July 29, 2022, 3:12pm

This morning, I tested hierarchical keywords (again) and found that both Capture One and Lightroom

include the complete hierarchy flattened under the dc:subject tag, no matter if I add the complete hierarchical path or just the leaf keyword
include, in the hierarchical tag, one line per keyword that is shown in the keyword field, e.g.
root|branch|leaf…if only the leaf is visible in the keyword field of the GUI
or
root
root|branch
root|branch|leaf…if all levels are shown.

This seems to be “industry standard” notation and I propose that DPL should go along with that.
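The notation described above can be sketched in a few lines of Python. This is only an illustration of the layout, not code from any of the packages mentioned; the function name `expand_hierarchy` and its `all_levels` switch are made up for the example.

```python
def expand_hierarchy(path, all_levels=True, sep="|"):
    """Return (dc_subject, hierarchical_subject) for one hierarchical keyword.

    Hypothetical helper: 'path' is a string such as "root|branch|leaf".
    """
    parts = [p.strip() for p in path.split(sep) if p.strip()]

    # The complete hierarchy, flattened, goes to dc:subject in both cases.
    dc_subject = list(parts)

    if all_levels:
        # One line per level: root, root|branch, root|branch|leaf
        hierarchical = [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]
    else:
        # Only the full path, as when just the leaf is shown in the GUI
        hierarchical = [sep.join(parts)]

    return dc_subject, hierarchical


if __name__ == "__main__":
    print(expand_hierarchy("root|branch|leaf", all_levels=False))
    print(expand_hierarchy("root|branch|leaf", all_levels=True))
```

The `all_levels` switch corresponds to the two cases listed above: only the leaf visible in the keyword field, or all levels shown.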

BHAYT · July 29, 2022, 3:52pm

Summary:- @platypus if that is all you want then you have it already with the pre-PL5.2.0 release with all keywords in a hierarchy assigned. With post-PL5.2.0 you are “light” all but the leaf in the ‘dc’ fields.

With the proposed PL6 release the feature will be fully available alongside everything else in that release (and in the PL5 releases from PL5.2.0 onwards)!

@platypus from my table I did not see that in my tests of Lightroom but did for Capture One, i.e. my tests of Lightroom did not invoke the feature, meaning that there is more than one option in Lightroom: one that conforms to PL5.1.4 and another that conforms to Capture One and PL5.1.4 with all keywords assigned. So thank you for adding to the table; a snapshot of how you achieved that would be useful.

But the whole point of my “design” is that you are doing to hierarchical keywords what PL5.1.4 did to ‘dc’ keywords and you are not matching the user’s system in any way, shape or form. Why is that any better than anything else? There will be groans that there are “too” many hierarchical keys cluttering up their keywords, etc.

In addition, it actually requires nothing other than the option to return to the PL5.1.4 layout to populate the ‘dc’ keys. Hence, this populating of the ‘hr’ fields is currently achievable with all versions of DxPL5 with all items assigned (but with PL5.2.0 onwards the ‘dc’ keys do not conform). On Windows in particular this is a chore that must be repeated for every image until the changes highlighted by @Musashi are implemented!

Had DxO not changed the format it would be correct right now!

If the table I have suggested is embedded in DxPL with additional entries for Photo Supreme and the alternative for Lightroom that you have described, and any other possibilities that we discover, then I believe that we are talking about days of work! I have retained the identity of the package rather than distilled the formats down to #1 to #7 because users are more likely to be comfortable using the id. of their own package.

The output you are proposing would fit with your use of Capture One, and the PL5.1.4 format would fit with the Lightroom format I encountered, but how is a Photo Mechanic user going to react to more ‘dc’ entries and more ‘hr’ entries? Given the reaction to this topic there will be no reaction whatsoever, but I don’t believe that for one minute, and what happens with ACDSee users I dread to think!

EDIT:-

Getting Lightroom to include ALL combinations is achieved in much the same way as in DxPL, i.e. by assigning from the keyword tree. When I was testing I was looking for options in the ‘preferences’, i.e. the “wrong” place!

2022-07-29_183504_
2022-07-29_183523_

And I still don’t believe that “one size fits all” is the right approach when a fully “tailored” solution is a bit of coding away. Why swap one problem for another when a better solution is within reach!?

platypus · July 29, 2022, 6:57pm

As far as I’ve seen in my tests, we have to consider four aspects:

how keywords are embedded in XMP
what that embedding does to finding assets
how apps add keywords (user->gui->xmp)
how apps display keywords (xmp->gui)

CapOne does the thing I find most logical: When I add crow in the keywords panel, C1 automatically adds bird and animal. If I wanted to remove animal and bird, I’d have to act in the keyword list.

If I want DPL and LrC to behave like C1, I have to enter keywords using the keyword list panel, because adding crow in the keyword panel will not automatically add bird and animal.

The difference of what is displayed in the keyword panel (on import of updated metadata) is based on what is in the hierarchical subject tag and not based on the dc:subject tag, provided all levels turn up, which was always the case when keywords are added using the keyword list or by C1 respectively.

Other keyword managers might do different things, so some testing would be required by someone having those apps…

BHAYT · July 29, 2022, 11:31pm

@platypus let’s start at the end and work backwards! With respect to the layout of keywords in the ‘dc’ fields and ‘hr’ fields there is no real mystery, as you seem to imply!? The formats that I determined are as shown in the tables and I will add “Lightroom ALL assigned” to the spreadsheet, thank you for that.

You seem intent on casting doubt on what is actually a very straightforward process.

I originally stated that I might have missed options and that involving other users was essential but the “rules” that I have provided will work with the packages and options I have tested and “Lightroom ALL assigned” falls into category #3, I tested it a little earlier.

The feature I am suggesting does nothing about improving DxPL keyword entry, it is “simply” about transforming what would have been output into a slightly different format. If the capability existed in DxPL or any of the other packages we would be wondering what all the “fuss” was about, but it doesn’t, either you have to accept what the package provides or you…!?

and what is this statement supposed to “teach” us!? Of course it is to do with the hierarchical keywords, except that @Joanna would suggest that if the correct data is not in the ‘dc’ fields then searching is not going to work. This is the part that the format templates are designed to handle: basically stopping DxPL from using its own format and having it adopt one that “coincides” with another package or one that achieves something desired by the user.

For example, I don’t like the automatic transfer of simple keys to the ‘hr’ fields, but all the packages except Photo Mechanic do just that. With a format of ‘A-AAF-C’ (which should really be ‘A-AA--C’) I would get all the features of Capture One and other #3 formats but without the ‘dc’ fields being copied to the ‘hr’ fields!

The use of the #1 formats might break some “rules” if hierarchical keywords wind up in the ‘dc’ fields but at least ACDSee etc. would be able to work with them!

With a format of ‘A-A—’ this would convert an hierarchical set of keys to a flat set of simple keys more suited to museums and libraries etc.

While I am glad that Capture One helps you complete your keyword correctly the feature proposed by @Musashi will automatically assign all keywords in an hierarchy rather than just the ‘Leaf’ or ‘Last’ item or allow you to miss a selection in the hierarchy.

However the “C” (shortened to “C” from “AC” in the spreadsheet) in column 6 of the table will automatically do the same thing!

PS:-

@platypus I don’t mind a critique if it actually adds anything to the story. If you had made the following point then I would know that you understand what I have written, but it is up to me to make it instead, though it has only just occurred to me! I have been concentrating on the exports and I do not believe that there is an issue with them!

But I have also proposed this for the ‘Write to image’ and therein lies a “problem”. DxPL will potentially be using a transform to reformat the data going to the image which will (potentially) no longer match the database (or will it, mostly an issue with AC?) and the DOP, i.e. there is a possible need to read the reformatted image keyword data after it has been written to keep the image and database (and DOP) in line.

This poses another potential problem which I need to review on Sunday, what happens if reformatted data is passed through the reformatting process again (I don’t think there is a problem but I need to look at that more closely)!
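One way to look at that question is to check whether a transform is idempotent, i.e. whether applying it twice gives the same result as applying it once. The sketch below is purely illustrative: `reformat` is a made-up stand-in that expands every hierarchy level into ‘hr’ and copies every level name into ‘dc’. It does not implement the actual template codes from the spreadsheet, so a real check would have to be repeated for each template.

```python
def reformat(dc, hr, sep="|"):
    """Hypothetical template: every level in 'hr', every level name in 'dc'."""
    new_hr = []
    for entry in hr:
        parts = entry.split(sep)
        # Add each prefix of the path: a, a|b, a|b|c ...
        for i in range(1, len(parts) + 1):
            prefix = sep.join(parts[:i])
            if prefix not in new_hr:
                new_hr.append(prefix)

    # Keep existing 'dc' keys, then add the last component of each 'hr' entry.
    new_dc = list(dc)
    for entry in new_hr:
        leaf = entry.split(sep)[-1]
        if leaf not in new_dc:
            new_dc.append(leaf)

    return new_dc, new_hr


def is_idempotent(dc, hr):
    """True if passing already-reformatted data through again changes nothing."""
    once = reformat(dc, hr)
    twice = reformat(*once)
    return once == twice


if __name__ == "__main__":
    print(reformat(["bb"], ["as|ms|b|bb"]))
    print(is_idempotent(["bb"], ["as|ms|b|bb"]))
```

For this particular stand-in the second pass changes nothing, because it only ever adds entries that are missing. A template that removes or moves entries would need the same test before it is trusted for ‘Write to image’.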

Joanna · July 30, 2022, 10:20am

I’m sorry Bryan but all your spreadsheets do is to confuse me.

I think I have discovered something that might “throw the cat among the pigeons”.

I start by using my app to create an XMP file for an image, which then contains…

   <dc:subject>
    <rdf:Bag>
     <rdf:li>Animal</rdf:li>
     <rdf:li>Mammal</rdf:li>
     <rdf:li>Bear</rdf:li>
     <rdf:li>Black Bear</rdf:li>
    </rdf:Bag>
   </dc:subject>
   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Animal</rdf:li>
     <rdf:li>Animal|Mammal</rdf:li>
     <rdf:li>Animal|Mammal|Bear</rdf:li>
     <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>

I then manually edit that sidecar to remove the dc:subject tag (as I see in your table PhotoMechanic can write this)

   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Animal</rdf:li>
     <rdf:li>Animal|Mammal</rdf:li>
     <rdf:li>Animal|Mammal|Bear</rdf:li>
     <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>

Next, I open CaptureOne 12, create a session and add just that image. This has the side effect of writing an XMP sidecar for every image in the same folder, which is very annoying.

But, not only does it do that, it also rewrites the dc:subject tag that I had removed back again…

   <dc:subject>
    <rdf:Bag>
     <rdf:li>Animal</rdf:li>
     <rdf:li>Mammal</rdf:li>
     <rdf:li>Bear</rdf:li>
     <rdf:li>Black Bear</rdf:li>
    </rdf:Bag>
   </dc:subject>
   <lightroom:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Animal</rdf:li>
     <rdf:li>Animal|Mammal</rdf:li>
     <rdf:li>Animal|Mammal|Bear</rdf:li>
     <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
    </rdf:Bag>
   </lightroom:hierarchicalSubject>

If I strip the sidecar down to just…

   <lr:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
    </rdf:Bag>
   </lr:hierarchicalSubject>

… then CaptureOne “updates” this to…

   <dc:subject>
    <rdf:Bag>
     <rdf:li>Black Bear</rdf:li>
    </rdf:Bag>
   </dc:subject>
   …
   <lightroom:hierarchicalSubject>
    <rdf:Bag>
     <rdf:li>Animal|Mammal|Bear|Black Bear</rdf:li>
    </rdf:Bag>
   </lightroom:hierarchicalSubject>

So, it seems that CaptureOne takes it upon itself to “normalise” keyword metadata if it deems it is non-standard.

The only problem with the second example is that the dc:subject tag doesn’t contain all of the flattened hierarchical tags.
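Both results above are consistent with a rule of “take the last component of each hierarchical entry”. That is only a guess that happens to match these two observations, not Capture One’s documented behaviour. A small Python sketch (function names are invented for the example) shows that rule next to the “flatten every level” rule that would have filled the dc:subject tag completely:

```python
def leaf_keywords(hierarchical, sep="|"):
    """Last component of each hierarchical entry, in order, without duplicates.

    Assumed rule only: it reproduces the two outputs shown above.
    """
    seen = []
    for entry in hierarchical:
        leaf = entry.split(sep)[-1].strip()
        if leaf and leaf not in seen:
            seen.append(leaf)
    return seen


def all_level_keywords(hierarchical, sep="|"):
    """Every component of every hierarchical entry, flattened, without duplicates."""
    seen = []
    for entry in hierarchical:
        for part in entry.split(sep):
            part = part.strip()
            if part and part not in seen:
                seen.append(part)
    return seen


if __name__ == "__main__":
    full = ["Animal", "Animal|Mammal", "Animal|Mammal|Bear",
            "Animal|Mammal|Bear|Black Bear"]
    single = ["Animal|Mammal|Bear|Black Bear"]
    print(leaf_keywords(full))
    print(leaf_keywords(single))
    print(all_level_keywords(single))
```

With all four hierarchical entries present the two rules give the same list, which is why the difference only shows up in the second example.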

What are your thoughts on that?

platypus · July 30, 2022, 1:30pm

@Joanna
In my recent tests, I found the following:

As long as there are entries in the hierarchical subject tags, apps can (and will) add entries in dc:subject as you discovered. BTW, Lightroom does it too.

Here’s an archive of XMP files showing how keywords are written by Lightroom with an example hierarchy of AAAnimal>BBird>Crow (silly notation to make it easily visible in my keyword list) and combinations of keywords saved, you’ll find what was added by the respective names of the files. Note that I’ve relieved the files from anything not related to keywords.

Archiv.zip (5.3 KB)

BHAYT · July 31, 2022, 9:44am

You’ve found a “feature” in Capture One!

The original formats (templates) that I “discovered” were all from the use case of entering the data directly into the package and forcing the package to write that data back to the image (i.e. create an xmp sidecar for RAW).

The tests were repeated multiple times even when I was using the extended syntax of animals|mammals|bear|black bear and then yet again when I used the abbreviated syntax of as|ms|b|bb.

Throughout those tests (the original tests used 4 JPGs and 4 RAWs) the results were always consistent. I ran so many tests because I was reluctant to “publish” anything that might be “flawed” or too badly “flawed”.

I was also concerned about my lack of experience with the various packages and certainly missed the Lightroom All assigned situation that @platypus found!

But the results were consistent and appear to work for the principle of matching the output that is equivalent to the given use case, i.e. the creation of keywords afresh by entering the data directly into each package.

So I believe that all is O.K. for using the templates to create exported files that match the outputs that the respective packages would create if the data was entered into those packages, via their own UI.

My intention was to keep users “happy” because their keyword layout was intact in the exports, I was not trying to create an emulator for the other packages, with respect to keyword handling!

It is rather sad that Capture One doesn’t seem to stick to its own “rules” when it encounters the situation where there is an hierarchical field that has no data in the ‘dc’ field. I repeated a number of tests and Capture One mostly but not completely leaves the originals alone, i.e. it adds “bb” to the ‘dc’ fields.

Adding an “x” does nothing surprising: it is added to the ‘dc’ and ‘hr’ fields, as nearly all the packages do. But Capture One didn’t do that when it decided to “liberate” “bb” from the hierarchical keyword. If it was following any rules, shouldn’t the “bb” also have wound up in the ‘hr’ fields?

Deleting and re-inputting the “as|ms|b|bb” keyword into C1 results in the return of the original format!

Similar “anomalies” in behaviour might exist with other packages when confronted by data that has been externally generated. In fact on Friday I put my development of a Python “converter” to one side and started to look at the development of an xmp sidecar generator.

The “generator” would take any keyword combination and generate a “labelled” xmp sidecar file for each of the formats I have currently documented. These could then be associated with any RAW image to undertake the kind of test you undertook using manual intervention, in order to speed up the testing process! I need part of that code for the convertor anyway!
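A minimal sketch of what such a generator might look like, using only the Python standard library, is shown below. It is not the utility described above: the function names are invented, the output omits the `xpacket` wrapper and all non-keyword metadata that real sidecars carry, and the namespace URIs are the usual ones for `dc`, `rdf`, `x` and `lr`.

```python
import xml.etree.ElementTree as ET

# Namespaces commonly used for keyword metadata in XMP sidecars.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "lr": "http://ns.adobe.com/lightroom/1.0/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Build a namespace-qualified tag name, e.g. {uri}subject."""
    return f"{{{NS[prefix]}}}{tag}"


def add_bag(parent, prefix, tag, values):
    """Append <prefix:tag><rdf:Bag><rdf:li>...</rdf:li></rdf:Bag></prefix:tag>."""
    if not values:
        return  # leave the tag out entirely when there is nothing to write
    bag = ET.SubElement(ET.SubElement(parent, q(prefix, tag)), q("rdf", "Bag"))
    for value in values:
        ET.SubElement(bag, q("rdf", "li")).text = value


def build_sidecar(dc_keywords, hr_keywords):
    """Return a keyword-only XMP document as a string (hypothetical helper)."""
    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})
    add_bag(desc, "dc", "subject", dc_keywords)
    add_bag(desc, "lr", "hierarchicalSubject", hr_keywords)
    return ET.tostring(xmpmeta, encoding="unicode")


if __name__ == "__main__":
    print(build_sidecar(["bb"], ["as|ms|b|bb"]))
```

Calling `build_sidecar` once per documented format, with the ‘dc’ and ‘hr’ lists that format prescribes, and saving each result under a labelled file name would give the set of test sidecars described above.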

My coding skills are still raw (pun intended) but now that I have abandoned the IDEs for a combination of Hippo Edit and IDLE I am not trying to fix errors flagged by the IDEs that are perfectly acceptable to Python!!??

That brings me to my biggest concern, or rather biggest concern after the complete indifference of most of the users who complained in the first place, namely using this technique to format the metadata written back to the image.

With AS(OFF) such a ‘Write to image’ would have to be user initiated. The existing DOP and database would continue to generate the same formatted keywords without being updated. But if a KFT (Keyword Format Template) formatted ‘Write to Image’ is made, then that should be followed either by an ‘S’ icon or by an automatic ‘Read from Image’ as appropriate. Now we have the situation of a changed database structure, in line with the KFT “rules”, then being subjected to the KFT rules again in any subsequent export or ‘Write to image’, which bears a slight resemblance to the Capture One case you cited @Joanna.

Arguably users would only ever need to write back to the image if they had used DxPL to make metadata changes but even a simple ‘Rating’ change that the user wanted to keep will currently change the format of the image keywords, the whole crux of the problem (or so I thought)!

So to summarise

I don’t think this “sinks” the KFT “model” for exports at all.
I am concerned about the lack of consistency with Capture One and what it might tell us about how reliably software packages apply their own rules.
The issue of exporting using KFT formatting back to the image and the implications on the process either immediately or when such images are “discovered” anew by DxPL (I think I am overthinking the dangers but …)

Once again @Joanna and @platypus thank you for your testing, I still believe that KFT “has legs” and do intend to complete the sidecar generator and the format convertor because they will be useful tools should I ever bother to test keywords again and getting back into coding is interesting, but me and IDEs don’t seem compatible!?