I’m incredibly disappointed with the DxO team. Apple announced their intent to switch to their own chips, announced the Universal App architecture, and released the first Developer Transition Kits to developers in June 2020, which was 16 months ago.
DxO PhotoLab 4 was released at the end of 2020 as an Intel app for Mac — compatible with Apple Silicon under Rosetta 2 — but not as a Universal App, which would have delivered the best performance.
The first Apple Silicon Macs were also released at the end of 2020.
Now, here we are at the end of 2021. Apple has released a number of new Macs and three different, incredibly powerful Apple Silicon chips, yet DxO has just released PhotoLab 5, still as an Intel app and not as a Universal App.
The improvements in GPU processing and optimizations for M1 are appreciated, but there is no excuse for DxO PhotoLab to not be a Universal App at this point.
During the Apple Event several days ago for the new 14-inch and 16-inch MacBook Pro, Apple showcased Adobe Photoshop and Lightroom, Serif Affinity Photo, and Capture One, all of which are now Universal Apps and offer the best performance on the new Macs.
Of course, they didn’t showcase DxO PhotoLab.
I have been a tremendous fan of DxO’s software for the past four years and have referred a multitude of photographers to PhotoLab. However, seeing the lack of progress with DxO’s software development and their seemingly clouded vision for the future of their apps, I will no longer be purchasing upgrades for my DxO software until major updates are made. There are now too many alternatives from other companies that are actually optimizing their software for new and upcoming hardware.
I totally agree. I have multiple apps that use the Qt framework, and all their developers did was flip a compiler switch and build an M1 version. So it is obviously not an issue with the framework. Are they just so obtuse that they do not want to offer an M1 installer for Apple Silicon?
What work is the lack of M1 support preventing you from doing? PhotoLab is using the new silicon in the most important areas. You should compare noise reduction in C1 and Lightroom with PL5. You call that state of the art?
The reason that there is no native support yet is that it’s likely more complicated than everyone here makes it out to be.
Apple M1, M1 Pro, and M1 Max chips include a 16-core Neural Engine. The Neural Engine is a neural processing unit (NPU) / AI accelerator. DxO PhotoLab includes a feature called DxO DeepPRIME, which DxO describes as using “deep learning artificial intelligence technology.” While that wording makes for great marketing, the feature is simply based on a convolutional neural network, which is precisely what Apple Silicon’s Neural Engine is designed for. DxO is currently handing this off to Apple GPUs as Metal calls, but using the correct methodology and APIs to run this on the Neural Engine would provide speed and efficiency improvements that are orders of magnitude higher than using the GPU.
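To make the “convolutional neural network” point concrete, here is a toy sketch (purely illustrative, not DxO’s code) of the single operation such a network repeats millions of times: a 2D convolution. This multiply-accumulate pattern is exactly the workload that NPUs like the Neural Engine are built to accelerate.

```python
# Toy illustration of the core operation behind a "deep learning" denoiser:
# a 2D convolution. A real model stacks many such layers with learned
# kernels; NPUs are designed to run this multiply-accumulate workload.
def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A 3x3 box-blur kernel: the simplest possible "denoiser".
blur = [[1 / 9] * 3 for _ in range(3)]
noisy = [[0, 0, 0, 0],
         [0, 9, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
print(conv2d(noisy, blur))  # the single noisy spike is averaged away
```

A trained network replaces the hand-written box blur with learned kernels, but the arithmetic pattern, and thus the hardware it maps to, is the same.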
Amateurs may not mind waiting for images to process, but DxO PhotoLab is targeted towards professionals. Professionals understand that time costs money. If a task can be performed faster by technical means, then it should be made to perform faster. It’s quite a simple concept.
DxO claims to have state-of-the-art features, yet why are they not taking advantage of state-of-the-art hardware by doing things the correct way?
I appreciate you pointing out the obvious that everyone here is already aware of, and you’re right; it is complicated. That’s precisely why users like myself have been purchasing DxO PhotoLab upgrades every year, at $100–$200 each time: to fund DxO’s software development and its adaptation to new technologies. The problem now is that DxO is not adapting.
You claim to be patient. How much longer than 16 months do you think is appropriate?
That’s not being patient. That’s being complacent.
You are aware that utilizing those matrix units is no easy feat and nothing that happens by itself, and that they are using Core ML, not Metal?! (What the driver does in the background is a different matter.)
Both the code and the data have to be prepared and tailored to the very specific implementation of each matrix accelerator, and every one of them is completely different in how it needs to be handled.
Be it Apple’s NPU, Google’s TPU, or Nvidia’s Tensor Cores, etc.
Just read up on how complicated it actually is to use Nvidia’s Tensor Cores via DirectML, and those have a much larger installed user base.
And just because there is inferencing going on, it doesn’t mean that those matrix units are necessarily the most efficient way to do it; that depends on the data that needs to be processed.
If FP32 datasets are used, which seems to be mostly the case for DeepPRIME (hence probably also its dependence on FP32 throughput), most matrix units are out of consideration anyway.
(The only matrix units I’m aware of that can process FP32 datasets are AMD’s Matrix Cores in their most recent CDNA professional GPGPUs.)
I still think they must have done some optimizations for at least some steps of the processing in this regard; otherwise the up-to-4x improvements on Apple Silicon can’t really be explained.
The M1 GPUs are good for GPGPU, but not THAT good, especially since they already performed quite well in PL4 relative to their theoretical raw processing power. Something else must be going on to produce such a large improvement!
Oh and by the way…blame Apple!
Unfortunately Apple isn’t giving third-party developers any guidance on how to optimize their models to take advantage of the ANE. It’s mostly a process of trial-and-error to figure out what works and what doesn’t.
From what I can see, neither PhotoLab 4 nor 5 has serious performance issues when run on hardware with a dedicated graphics card (just don’t use DeepPRIME with Intel integrated graphics: export jumps from 15 seconds per image to 15 minutes).
The only routines for which there is urgency to rewrite the code to support M1 processors directly are those where there are substantial performance gains to be made. That would include calculating previews with PRIME or DeepPRIME enabled and exporting images with PRIME or DeepPRIME enabled. As DeepPRIME outperforms PRIME every time, even limiting optimisation to the DeepPRIME routines should be enough.
Good work covering all Mac GPUs by targeting optimisation at the Neural Engine instead of the cards themselves.
I would be absolutely fine living with PhotoLab 4 DeepPRIME speeds on Mojave if it’s impossible to back-port the speed improvements that come with Catalina or Big Sur. Running faster on a more recent OS is fine. Not running at all on an OS two versions back is not okay.
Hello Steven, and thank you for correcting me. I’m tremendously glad to see DxO PhotoLab 5 is now using Apple Silicon’s Neural Engine. For M1 specifically, the speed improvement over GPU processing appears to be approximately 2x.
To answer your questions, I own several Macs with Apple Silicon chips and have been using the trial of PhotoLab 5 alongside my purchased copy of PhotoLab 4.
Do you have an estimate for when DxO PhotoLab will be a Universal App?
(Marc (macOS Ventura on MBP16" Intel))
Does it mean that when new chips with more ANE processing power come out (probably next year with the Mac Pro), PhotoLab will automatically benefit from this extra processing power?
Is this the way you get the most out of Apple chips?
Thanks for your explanations.
Yes, if Apple comes up with more powerful Neural Engine cores, or simply more cores in the ANE, we expect that you’ll see a matching speed increase for DeepPRIME.
The latest M1 Pro and M1 Max look to have the same-spec ANE as the M1, so we don’t expect a difference there for DeepPRIME. Other corrections will still benefit from more CPU cores, though.
Additionally, so far we have seen the ANE run about 5 times faster than the M1’s GPU for ML tasks, presumably at much better power efficiency. Now, with the M1 Max having up to 4 times the GPU cores of the M1, the difference in speed is likely much smaller. So, for users who don’t mind heating up their laptop, we might want to use both the ANE and the GPU at the same time and get something close to twice as fast in the best case.
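The arithmetic behind that “close to twice as fast” estimate can be sketched out with the rough relative figures quoted above, normalized so that the M1 GPU equals 1.0 (these are estimates from the discussion, not benchmarks):

```python
# Rough relative DeepPRIME throughput figures, normalized to M1 GPU = 1.0
# (estimates quoted in the discussion above, not measured benchmarks).
m1_gpu = 1.0
ane = 5.0 * m1_gpu           # ANE observed ~5x the M1 GPU for ML tasks
m1_max_gpu = 4.0 * m1_gpu    # M1 Max offers up to 4x the GPU cores of M1

# Best case, if the ANE and GPU could process independent tiles in parallel:
combined = ane + m1_max_gpu
print(combined / ane)  # 1.8, i.e. "close to twice as fast" vs. the ANE alone
```

In practice, scheduling overhead and memory bandwidth contention would eat into that ideal 1.8x figure, which is why it is given only as a best case.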
(Marc (macOS Ventura on MBP16" Intel))
Ah, thank you for the details.
It helps me understand a bit better how PhotoLab works and how Apple is increasing power from one generation to the next.
Combining the ANE and GPU to reduce export time could be great for some users.
For a more modest user, using the ANE to export and the GPU for real-time editing (or the other way around, or whatever combination works better) could be interesting.
I guess we will see some more options in the preferences panel of PL in the next few months or years.
Thank you @StevenL and @Lucas for the info on how DxO is optimising PL for M1 Macs.
I, and I’d imagine others, have read about how Adobe LR and Capture One Pro have been optimised for the M1 chips, and their optimisations appear to encompass areas much wider than just noise reduction.
For example, file import, general rendering, cropping, and exporting are all areas mentioned in reviews of the LR and C1P updates that followed their M1 optimisations.
PL is used for many editing processes other than DeepPRIME. I use DeepPRIME, so the performance enhancement is welcome. But I use PL a huge amount more for viewing, cropping, correcting horizons, applying control points, zooming in to 100% to check focus, and other general editing functions, across thousands and thousands of images every year. It is these everyday functions, applied to every image, that I’d really appreciate seeing optimised (in any possible way, including M1 chip optimisations) too.
If the time it takes PL to render a preview could be shortened through optimisations gained from the M1 chip architecture, I would be absolutely overjoyed. I cannot overemphasise this enough…
It is a bit hard in these articles to know what part of the speed-up comes from the native version, from the software update, or from the different hardware.
As far as I can see, there is no comparison of the same software version with and without Rosetta on the same hardware, so to me it rather looks like a mix of several factors that, when combined, look nice for marketing purposes.
There are opportunities for optimizations specific to Apple Silicon hardware, thanks to the Neural Engine and a powerful GPU with a large pool of memory. M1 Macs also have a faster CPU, faster memory, and faster storage. But none of these should make a difference between a native and a translated application. For now, in our tests, native code is about 20–30% faster than the translated version. We shall see how this translates at the whole-application level.
At the moment I’m not able to tell more about which Apple Silicon specific optimizations we will be able to take advantage of in PhotoLab.
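As an aside, for anyone wanting to make that native-versus-translated comparison themselves: macOS exposes a `sysctl.proc_translated` key that reports whether a process is running under Rosetta 2 (Activity Monitor’s “Kind” column shows the same information). Here is a minimal Python sketch that checks its own process; it returns None where the answer is unavailable:

```python
import ctypes
import ctypes.util
import sys

def rosetta_status():
    """Return True if this process runs under Rosetta 2, False if native,
    or None when the answer is unavailable (non-macOS, or an Intel Mac
    where the sysctl key does not exist)."""
    if sys.platform != "darwin":
        return None
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    value = ctypes.c_int(0)
    size = ctypes.c_size_t(ctypes.sizeof(value))
    # int sysctlbyname(const char *, void *, size_t *, void *, size_t)
    ret = libc.sysctlbyname(b"sysctl.proc_translated",
                            ctypes.byref(value), ctypes.byref(size),
                            None, ctypes.c_size_t(0))
    if ret != 0:
        return None  # key absent: Intel Mac, nothing is translated
    return bool(value.value)

print(rosetta_status())
```

For a GUI app like PhotoLab you would check its process in Activity Monitor rather than run code inside it, but the underlying sysctl key is the same.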