This is a long post but hopefully a worthwhile one! It seeks to address the issue of DxPL versus other software with respect to keyword metadata formats !?
A lot of work has gone into this post and into the “research” behind it and, hopefully, it will help DxO provide users with a more compatible product with respect to formatting keywords @Musashi.
Summary:-
The change to the keyword handling made in release PL5.2.0 should be treated as a “bug” @sgospodarenko. The change was not announced prior to releasing that version, it effectively took the keyword handling of DxPL further away from the standards and from what had been the DxO standard from at least PL3, according to my testing, and the change was not optional, i.e. users were “forced” to accept it!
or, “expressed” using the proposed Keyword Format Template table
In the course of discussions in the topic PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords @joanna, others and I discussed what we felt was wrong with handling of keywords in PL5.
After a lengthy discussion, which brought these issues to the attention of DxO, the following statement was made by @Musashi the issue PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords - #69 by Musashi which prompted further discussion and an additional statement by @Musashi PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords - #98 by Musashi which sought to bring the discussions to a close.
Firstly, I am pleased that DxO was listening and am encouraged by their commitment to provide an option to reverse the changes made to the metadata handling back to the pre-PL5.2.0 version!
But the complaints made in the forum, that prompted the change that I am flagging as a bug, were actually made about that original release! So the proposal is to return to a more standards compliant version (as an option) but one that actually caused much consternation amongst users in the first place, that doesn’t really make a whole lot of sense to me @Musashi, @sgospodarenko!?
Is there a better way of handling keywords that might go further to help satisfy the complaints of a larger audience of DxPL users? Looking for a potentially better way is what this topic is about!
Essentially, by analysing the nature of the keywords created by a number of packages, PL5 included, I believe that it is possible to create keyword format templates (“presets”) that should offer users selectable options to determine the format of keywords output from DxPL.
These keyword format templates would control the way that DxPL formats the keywords it outputs, to better match the other packages in use by DxPL users, i.e. if my analysis is correct then DxPL output formatting can be controlled to match the output of Photo Mechanic, Capture One etc. or to implement one of @Joanna’s schemes to better match the guidelines etc.
These outputs can be “tailored” to the users’ wishes/requirements and not restricted by any DxO decision. But how useful this change actually is to users will be governed by how DxO chooses to implement the design.
DxO might:-
- Not utilise this strategy at all, when the current options on offer are the original strategy or the later and worse option, neither of which is a step in the right direction, I feel!
- Limit the number of template options available arbitrarily (possibly O.K. but only as a first implementation and still too limiting)!
- Prevent users from modifying at least some of the templates on offer (why would you have any such limitation, except to simplify the data checking required at some stage in the process?)
- Or not allow the creation of new templates so that users can “tweak” and further refine existing templates (again, why would you have any limitation, except to simplify …?)
This design is intended to remove constraints from DxPL so that users are not “locked” into any design that DxO decides is the best compromise, nor into one coming from a user or group of users that have “petitioned” DxO to go in a certain direction, but are free to choose from the existing formats (templates) on offer, e.g. matching their DAM, or an alternative DAM, or a scheme of their own or of another forum member’s “making”.
In fact it should be possible to define export options that would allow DxPL to act as a bridge between applications, reading keyword metadata in one “format” and exporting to one or many other formats!
Please NOTE:- DxPL does not “recognise” any specific keyword format when it encounters a new image (I believe) but treats each keyword in turn and stores them in the database as simple keywords regardless of their origins (i.e. whether from a ‘dc’ or ‘hr’ field). i.e.
in the ‘Keywords’ Table:-
and linked to an image in the ‘Items’ Table by the ‘ItemsKeywords’ Table which points to every “assigned” keyword (“assigned” is discussed later in this post) in the ‘Keywords’ Table for any given ‘Items’ (image) entry.
Nowhere are there any “clues” in the database structures as to the exact format of the ‘dc’ and ‘hr’ keywords as they were stored in the image (embedded or sidecar). Those “clues” are necessary to enable DxPL to exactly reconstruct those fields to write back to the image or to write to an export of the image i.e. that data is simply not recorded and, therefore cannot be used.
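To make the point concrete, here is a minimal sketch of the three tables using an in-memory SQLite database. The column names (`Id`, `Value`, `ParentId`, `ItemId`, `KeywordId`) and the file name are my assumptions for illustration only; the real PL5 tables have more columns and may name them differently.

```python
import sqlite3

# Hypothetical, simplified schema: column names are assumptions,
# not the actual PL5 database definition.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Keywords (Id INTEGER PRIMARY KEY, Value TEXT, ParentId INTEGER);
CREATE TABLE Items (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ItemsKeywords (ItemId INTEGER, KeywordId INTEGER);
""")

# 'animal|mammal|bear' stored as three simple keywords, each pointing at its parent.
conn.executemany(
    "INSERT INTO Keywords VALUES (?, ?, ?)",
    [(1, "animal", None), (2, "mammal", 1), (3, "bear", 2)],
)
conn.execute("INSERT INTO Items VALUES (1, 'example.raw')")
# Only the leaf keyword is "assigned" to the image.
conn.execute("INSERT INTO ItemsKeywords VALUES (1, 3)")


def assigned_keywords(item_id):
    """Return the keyword values assigned to one image, in alphabetical order."""
    rows = conn.execute(
        "SELECT k.Value FROM ItemsKeywords ik "
        "JOIN Keywords k ON k.Id = ik.KeywordId "
        "WHERE ik.ItemId = ? ORDER BY k.Value",
        (item_id,),
    )
    return [row[0] for row in rows]
```

Nothing in this sketch records whether “bear” arrived via a ‘dc’ or an ‘hr’ field, which is exactly the information that would be needed to write the metadata back unchanged.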
There are advantages to this approach because it means that DxPL will analyse and decompose an hierarchical keyword that it discovers in the ‘dc’ fields as well as one in the ‘hr’ fields. The disadvantage is that it will put those keywords in what its formatting rules deem the appropriate fields on output, i.e. an hierarchical keyword will be placed in the ‘hr’ fields, arguably correct but now out of reach for a program that is not ‘hr’ aware!
Therefore, DxPL currently constructs the keyword format according to its rules, as do all the other packages according to their rules (I believe - potentially another set of tests to take all the outputs I passed through PL5.1.4 and PL5.3.1 and feed them into all the other programs and … lose one’s sanity completely in the process).
Hence, my proposal to use keyword format templates (“presets”) that can be user selectable (and changeable) to apply formatting rules when the keyword data is written back to an image or exported.
Effectively, to tailor the output to best suit a user’s needs by providing DxPL with the format templates of the other packages, so that anything that is output matches the package chosen by the user!?
Why were there so many complaints about PL5 keywords?:-
I am sure about only one part of what I believe to be the reason(s) for the furore, namely that the metadata update that resulted automatically with AS(ON) wrote the keywords to the image metadata (or an export) in the DxPL format, i.e. overwriting the keyword format placed there by the user’s DAM etc…
But I never understood comments from users like the following comment from @jch2103
The problem I have with this statement is that I cannot get either PL4 or PL3 to do anything other than export in the same format used by PL5 up until PL5.2.0 and I have always had the ‘RAW’ option set
and it appears to do nothing special whatsoever to “preserve” the metadata from the RAW image!
Please note that @jch2103 is a Win10 (or Win 11) user as I am and also uses IMatch which I have owned since Christmas 2021.
As shown below in this post, IMatch writes metadata in (at least) one of three formats, one of which matches PL5.1.4 (and PL3 and PL4) and Lightroom, with one particular set of two IMatch options chosen, and another matches PL5.2.0, and the third appears to match Adobe Bridge.
DxPL takes the keywords from the image metadata (embedded or sidecar), “flattens” them and then stores them in the database. If any further keywords are created in PL3/PL4 then with this “RAW” option set all keywords from the external source plus those added by the user to PL3/4 will be output, and all in the PL keyword format (essentially matching the IMatch output when both of the following two IMatch options are set)
If the ‘RAW’ option is left unset in DxPL then no keywords will be exported at all!?
I can find no option that results in PL3 or PL4 leaving the keywords in exactly the same format as it was in the RAW sidecar. Inspecting the keyword format template table below shows that both IMatch (Both options selected) and Lightroom “match” format 4 and that is the format used for the PL3 and PL4 export.
A snapshot of a test with PL3 where an image contains in turn
- A single hierarchical keyword (in the sidecar)
- An export from PL3 of the image (‘Raw’ ‘Preferences’ option selected)
- An export from PL3 of the image with an additional hierarchical keyword added in PL3
- An export with the ‘Raw’ ‘Preference’ de-selected
- The same image tested with PL4
Hence, I believe that much of the furore about there being a change to PL5 so that the RAW keywords were no longer maintained is not substantiated by my tests. Until PL5.2.0 was released, all exports followed the same keyword format template used since PL3, which matches IMatch (Both options selected) and Lightroom, and that continuity was only lost on the release of PL5.2.0!
What do users really want?:-
It is presumptuous of me to assume that I actually know what other users want but I believe that the key (sorry) feature is that many users do not want DxPL to do anything with the keyword format!
Users certainly want the keyword data that has been carefully crafted by their DAM software, that they have been using for years, to escape any “tampering” by DxPL and to emerge from DxPL “unscathed”, i.e. for the keyword metadata to be passed out of DxPL in the same format that it was passed into DxPL, regardless of whatever point in their workflow DxPL occupies!
What does this Keyword Format Template Table look like?
Something like this perhaps
This table was developed from two sources, one was to generate keywords for RAW images in as many packages as I could gain access to, using different options available whenever I realised that such options existed (and I might still have missed some).
These keywords were then fed into versions of PL5 to see what happened to them in a PL5 JPG export, i.e. to measure the “damage” done and to attempt to “quantify” how bad things really were! The results are documented in the attached spreadsheet.
The second source was the PL5 database and the pseudo-code I developed for my own understanding to see how a Table of simple keywords could be turned back into the familiar keywords structures I expect in an image PL5 Searching doesn't work properly for all keywords in an hierarchy - only for "Selected" keywords - #103 by BHAYT.
In that post I stated
The processing cycle that DxPL “must” be using (or so I believe) became clear as did the table that I have attached above. Using the various keyword format templates will, I believe, produce the “desired” keyword output format, regardless of whether that is standards compliant or not @Joanna, either way it should be just what the users want (hopefully), i.e. representing no unexpected keyword “adjustments”.
The keyword combinations used for testing:-
The aim was to try a combination of keywords with every package that I could access and then output the metadata as part of the image or as an export as appropriate.
In fact, the final version used RAWs for everything created by the packages and then JPG exports from PL5, i.e. the various packages were “forced” (when necessary) to create an xmp sidecar file which PL5.1.4 and PL5.3.1 used as input (PL5.3.1 is shown as PL5.2.0 in the spreadsheet because that was the release when the output changed)!
The original keyword combinations were
- animal, mammal, bear
- animal|mammal|bear
- animals|mammals|bear|black bear
- animal, mammal, bear, animal|mammal|bear
Recently these were shortened, to help with pattern recognition (by eye) and space in the spreadsheet and a fifth combination added, so we now have
- a, m, b (representing animal, mammal, bear)
- a|m|b
- as|ms|b|bb (i.e. animals|mammals|bear|black bear)
- a, m, b, a|m|b
- x, y, z, a|m|b
The packages tested were
- ACDSee
- Adobe Bridge
- Capture One
- ExifPro
- IMatch with both of two options available selected
- IMatch with the First Option selected
- IMatch with neither option selected
- Lightroom
- PhotoLab 5.1.4 with default (only the leaf keyword) assignment
- PhotoLab 5.1.4 with All elements of an hierarchical keyword assigned
- PhotoLab 5.2.0 with default (only the leaf keyword) assignment
- PhotoLab 5.2.0 with All elements of an hierarchical keyword assigned
- Photo Mechanic (Option 1)
- Photo Mechanic (Option 2)
- Zoner
Please note that ACDSee, ExifPro and Zoner do not use, nor do they appear to recognise, the existence of ‘hr’ fields (I believe).
Please note that Photo Supreme is missing from this list because I have not managed to find how to input keywords without PS automatically assigning a ‘Category’, e.g. typically ‘Miscellaneous’!?
Caveat:-
The order of the “simple” keywords cannot be guaranteed, i.e. with some packages if you enter “a”,“m” and “b” in that order then they will be stored in the image keyword metadata in that order. With other packages the storage in the image will be in alphabetical order and alphabetical order will always be the case with DxPL outputs.
I appear to have shown the actual order from the package in the spreadsheet but cannot guarantee that to be the case for all cells. Certainly DxPL will not retain that order if it is not alphabetical, so the keyword format will be the same with respect to contents but not necessarily the order of the simple keys!
Standard Disclaimer:-
I have conducted these tests as carefully as I can but (transcription) errors have crept into the spreadsheet in the past. The last one I commented on turned out to be me not allowing sufficient space for the spreadsheet cells before taking a snapshot!
If you decide to conduct the tests with your own DAM and the results do not agree with mine then please inform me and make sure that I get a copy of the ‘preference’ settings etc. that may account for the differences.
If you decide to conduct the tests with your own DAM and different keyword combinations and they seem not to fit my “format rules” then please inform me and please make sure that I get a copy of the keywords, ‘preference’ settings etc. that may help account for the differences and/or allow me to repeat the tests.
What does the output from each package look like:-
Something like this
and with the keyword format template table we have
Please note:-
- The similarity between the PL5.1.4 output and the IMatch(Both Options) outputs
- The similarity between the PL5.2.0 output and the IMatch(Neither option) outputs
Classifying the components that make up a keyword:-
Effectively, all keywords are stored in the PL5db as simple keywords, some with links to other keywords, denoting (a component of) an hierarchical keyword (pointing from Leaf to First) and others without links, either designating a simple keyword or the First keyword of an hierarchical keyword.
So I have “classified” the keywords as ‘Simple’ (S), ‘Hierarchical Components’ (HC) and ‘Hierarchical’ (H). The Simple (S) and Hierarchical Keyword Components(HC) are obtained from the database and the Hierarchical Keywords are reconstructed by following the pointers from the leaf (Last) to the Head (First) keyword.
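The pointer-following step can be sketched as below. This is only my reading of the process, not DxO’s code; the dictionary is a hypothetical stand-in for the ‘Keywords’ table, mapping a keyword id to its value and the id of its parent (or `None` for a Simple or First keyword).

```python
# Hypothetical stand-in for the 'Keywords' table:
# keyword id -> (value, parent id or None).
KEYWORDS = {
    1: ("a", None),  # First keyword of the hierarchy
    2: ("m", 1),
    3: ("b", 2),     # Leaf (Last) keyword
    4: ("x", None),  # Simple keyword, no link
}


def rebuild_path(keyword_id, keywords=KEYWORDS, sep="|"):
    """Walk from a keyword up to its First keyword and return the full path."""
    parts = []
    current = keyword_id
    while current is not None:
        value, parent = keywords[current]
        parts.append(value)
        current = parent
    # The walk collects Leaf -> First, so reverse before joining.
    return sep.join(reversed(parts))
```

Calling `rebuild_path(3)` walks b, m, a and joins them in reverse, while a keyword with no link simply comes back as itself.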
Using the keyword format table:-
So for the keyword combination of “x”, “y”, “z”, “a|m|b” we have the following
1. Simple = "x", "y", "z"
2. Hierarchical Component (HC) = "a", "m", "b"
3. HC First = "a"
4. HC Last (or Leaf) = "b"
5. Hierarchical (H) = "a|m|b"
6. All Combinations (AC) = a|m|b, a|m, a
and these elements are available to populate the two keyword elements of the metadata, i.e. the ‘dc’ Subject and the ‘hr’ Hierarchical Subject, as required.
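The classification listed above can be sketched as a small function. The short codes `S`, `HC`, `F`, `L`, `H` and `AC` follow the names used in this post; the function itself and the choice of “|” as separator are assumptions for illustration.

```python
def classify(keywords, sep="|"):
    """Split a keyword list into the component types used by the format table."""
    result = {"S": [], "HC": [], "F": [], "L": [], "H": [], "AC": []}
    for keyword in keywords:
        parts = keyword.split(sep)
        if len(parts) == 1:
            result["S"].append(keyword)        # Simple keyword
            continue
        result["H"].append(keyword)            # full Hierarchical keyword
        result["HC"].extend(parts)             # every Hierarchical Component
        result["F"].append(parts[0])           # First component
        result["L"].append(parts[-1])          # Last (Leaf) component
        # All Combinations: the full path and each shorter path up to the First.
        result["AC"].extend(
            sep.join(parts[:n]) for n in range(len(parts), 0, -1)
        )
    return result


example = classify(["x", "y", "z", "a|m|b"])
```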
The various packages populate the ‘dc’ and ‘hr’ fields as they “choose”, some coming closer to the guidelines than others, but the users have a considerable amount of money and time invested in their chosen software and want that investment to continue/be protected!
Hence the spreadsheet snapshot that I have shown, which attempts to “classify” (codify) the various combinations created by the various packages (hopefully correctly analysed and recorded!); these are numbered and coloured accordingly.
So taking x, y, z, a|m|b and scheme 3, Capture One and PL5.1.4 (All assigned) and PL5.2.0 (All Assigned), we have
i.e.
By keyword "Type"
1. A('dc') = a, m, b
2. A('hr') = x, y, z
3. A('dc') = x, y, z
4. F('hr') = a
5. NULL
6. AC('hr') = a|m|b, a|m, a
giving
'dc' = a, m, b, x, y, z
'hr' = a, a|m, a|m|b, x, y, z
arguably the “a” from the F and the “a” from the AC need to be de-duplicated or the F column is actually “redundant” (hence the “ghosting” of certain columns in the spreadsheet).
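Reading the worked example as a list of “component goes to field” rules, applying a template might look like the sketch below. The rule encoding, the names `TEMPLATE_3` and `apply_template`, and the hard-coded component lists are all hypothetical; duplicates are dropped as each field is built, which also takes care of the repeated “a”.

```python
# Components for the combination x, y, z, a|m|b (hard-coded for the example).
COMPONENTS = {
    "S": ["x", "y", "z"],
    "HC": ["a", "m", "b"],
    "F": ["a"],
    "AC": ["a|m|b", "a|m", "a"],
}

# Hypothetical encoding of the scheme walked through above:
# each rule sends one component type to one metadata field.
TEMPLATE_3 = [("HC", "dc"), ("S", "hr"), ("S", "dc"), ("F", "hr"), ("AC", "hr")]


def apply_template(template, components):
    """Build the 'dc' and 'hr' keyword lists, skipping duplicates."""
    fields = {"dc": [], "hr": []}
    for kind, field in template:
        for keyword in components[kind]:
            if keyword not in fields[field]:
                fields[field].append(keyword)
    return fields


output = apply_template(TEMPLATE_3, COMPONENTS)
```

The order within each field here is simply the order in which the rules fire; DxPL itself writes keywords alphabetically, so a final sort would be needed to mimic that.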
It should be possible to add entries in the table for all DAM/editing software that can output ‘keywords’.
It should be possible to add entries that are guidelines compliant and add additional entries as those guidelines change!
If adopted I believe this strategy should go a long way to answering the concerns users have made about Keywording in forum posts and create a “future-proofed” architecture for keyword formatting.
This is the spreadsheet as a set of outputs to a watermarked pdf file.
Keyword Format V09-01.pdf (253.7 KB)
“For what “price” can this change be achieved”:-
I cannot answer that exactly and it is made up of a number of elements
- Designing a table that is provided as a starter but is then amenable to user input (that could actually be a hybrid of a DxO created/maintained table and an adjunct user table).
- Managing and importing the table (or part of the table) into DxPL, which may require a restart to ingest newly changed entries or a ‘Metadata’ command to (re-)load the whole table, or just the user elements.
- Storing the table in DxPL, in an array, in the database etc.
- Format selection fields in the UI, e.g.
1- Global default keyword format
2- Export keyword format added to the export options, allowing for multiple output options to be created
3- Added to the ‘Metadata’/‘Write to image’ command to dictate the format to be used (versus the global default)
- The actual coding change to implement using the user designated keyword format to construct the keywords. Going by my pseudo-code the actual process is currently very straightforward (by design I believe) and adding the use of the keyword formatting table, even accounting for the AC element discussed below, will certainly complicate it but not by much I believe and is arguably the most straightforward of the implementation elements. I feel the return on the investment should be very good value (but then I am biased).
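As a rough idea of the “hybrid” table, the sketch below overlays a user-supplied table on a built-in one. Everything here is an assumption for illustration: the JSON layout, the template names `scheme-3` and `my-scheme`, and the `allow_override` switch that stands in for DxO deciding whether built-in templates may be modified.

```python
import json

# Hypothetical built-in table: template name -> list of [component, field] rules.
BUILT_IN = {
    "scheme-3": [["HC", "dc"], ["S", "hr"], ["S", "dc"], ["F", "hr"], ["AC", "hr"]],
}


def load_templates(user_json, built_in=BUILT_IN, allow_override=False):
    """Merge user-defined templates (a JSON string) with the built-in ones."""
    templates = dict(built_in)
    for name, rules in json.loads(user_json).items():
        if name in built_in and not allow_override:
            raise ValueError(f"built-in template {name!r} cannot be modified")
        templates[name] = rules
    return templates


# A made-up user template: simple keywords to 'dc', full hierarchies to 'hr'.
USER_TABLE = '{"my-scheme": [["S", "dc"], ["H", "hr"]]}'
templates = load_templates(USER_TABLE)
```

Keeping the user table separate means a DxPL update could replace the built-in entries without touching anything the user has defined.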
DeDuplication:-
Typically after all keywords have been created for inclusion in the image (original and/or export) the list of candidate keywords is scanned and duplicates removed!
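A minimal sketch of that scan, assuming the candidates are plain strings and that the first occurrence of each should be kept in place (the function name is mine):

```python
def dedupe(candidates):
    """Remove duplicate keywords while keeping the first occurrence of each."""
    # dict preserves insertion order, so this keeps the original ordering.
    return list(dict.fromkeys(candidates))


cleaned = dedupe(["a", "a|m", "a|m|b", "a", "x", "x"])
```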
AC versus ‘Assign All’:-
In the spreadsheet and the format templates (“presets”) I have included an AC item, this stands for “All Combinations” and means that for “as|ms|b|bb” there will be “as|ms|b|bb”, “as|ms|b”, “as|ms” and “as” included.
This is included in the format templates for Capture One, PL5.1.4 (ALL assigned) and PL5.2.0 (ALL assigned) and, as the name for the PL5 formats indicates, it is possible to create this output by assigning ALL items in a tree in PL5.
The current Win 10 PL5 default for assignment is to select only the ‘Last’ or ‘Leaf’ keyword, i.e.
rather than ALL
But it is also possible to select
So relying on the use of assignment to accomplish the task requires all the keywords in the hierarchy to be selected for every hierarchy that exists, either manually as is currently on offer or automatically which might be part of the @Musashi “commitment” referenced at the start of this post.
A “safer” option might be to include the AC in the format definition and then to create all the elements of the keyword programmatically when there is no possibility of missing a step and the entire set of keywords is guaranteed to be generated, i.e. the “rule” is enforced.
There is a potential “clash” if AC has been selected and levels of keywords have also been assigned (assign ALL) in PL5. This can be resolved in one of two ways
- De-duping (removing duplicates) before writing the metadata to the image or to an export. This might mean overlapping work in PL5, generating excess keyword combinations which will then be removed in the de-duplication phase.
- Programmatically checking whether the keyword is already in the ‘ItemsKeywords’ structure, which means that it is already “assigned” and programmatic generation of the keyword is not required because it will happen automatically using the normal PL5 process.
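The second approach can be sketched as follows, assuming the “assigned” keywords for an image are available as a set of full paths (a stand-in for a lookup in ‘ItemsKeywords’). The function names are hypothetical.

```python
def all_combinations(path, sep="|"):
    """Return the full path and every shorter path up to the First keyword."""
    parts = path.split(sep)
    return [sep.join(parts[:n]) for n in range(len(parts), 0, -1)]


def paths_to_generate(leaf_path, already_assigned, sep="|"):
    """AC paths that still need generating, i.e. those not already assigned."""
    return [
        path
        for path in all_combinations(leaf_path, sep)
        if path not in already_assigned
    ]


# The user assigned the leaf and the First keyword by hand; AC fills in the rest.
missing = paths_to_generate("as|ms|b|bb", {"as|ms|b|bb", "as"})
```

Compared with de-duping at the end, this avoids generating combinations that will only be thrown away, at the cost of one lookup per level of the hierarchy.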