Is PhotoLab 7 (and other DxO software) being optimized for ARM processors like Adobe applications are?

The new ARM processors for Windows (especially the Snapdragon X Elite) are being aggressively adopted by Windows OEMs, and Adobe is quickly adapting its software to run well on these ARM processors without the need for x86 emulation, which can slow photo-editing tasks. Is DxO going to stay competitive here if its customers adopt the powerful new Windows ARM processors as quickly as Microsoft and the OEMs appear to expect?


There is no official word on that from DxO but I have not done an exhaustive search.

Mark

I couldn’t find it on the web, either. I really need a CPU upgrade, and I want to consider those new Qualcomm-powered ARM PCs now that they are being tested in the real world… but I spend a lot of time in PL7 and have to factor that in… Thanks for responding.

For what it’s worth, I am still using an i7-6700 Windows 10 machine from 2016 with 24 GB of RAM. The power supply has been updated, the C drive has been upgraded to a 2 TB SSD, and the graphics card is now an Nvidia RTX 4060. Response time processing my 21-megapixel NEF files is excellent. Assuming you have a graphics card that is up to the job, in what other way is your machine not meeting your requirements?

I guess what I’m saying is that newer Windows machines, even without that technology being used by DxO, should be fast enough for any normal processing requirements.

Mark

Thanks for asking. My graphics card is an MX250, better than the integrated graphics on a 5-year-old CPU, but otherwise not very powerful. Performance is mostly OK while editing, but I come to a screeching halt with each export if de-noising is involved. If I get a head of steam on editing (other than exporting), PL7 can also bog down and freeze up sometimes. I try to do exports in batch mode and then either get a meal or go to bed, but I’d like to work more interactively. I also have a huge backlog of video to edit, which is not a DxO issue but is relevant to what I buy. I’ve heard the new ARM processors are sometimes paired with an RTX, but I haven’t gotten into the details. My first priority is an adequate graphics card, even if I do it with a fixed desktop. But gee, it would be nice if I could cover all bases with a laptop offering both graphics power and battery life when used without a power cord. I plan to dig in once there’s more of a track record with the new ARM processors.

As you have found out, the MX250, which as you point out is only 5 years old, was obviously not designed for graphics-intensive applications, and its GPU is just not up to the task of handling DeepPRIME and DeepPRIME XD. When you are ready to purchase a new computer, get back to us with the graphics card options you are considering and the raw-file megapixel count of the files you will be processing. Also supply the computer’s processor specs. We can then give you a general idea of how long you can expect export processing to take with DeepPRIME or DeepPRIME XD applied.

As an example, with my current setup, DeepPRIME processing takes around 6 seconds and DeepPRIME XD takes around 9 seconds. With my older and less powerful GTX 1050 Ti card DeepPRIME processing took around 20 seconds and DeepPRIME XD processing took around 35 seconds with nothing else changed.

With no card at all selected, DeepPRIME would take between 60 and 90 seconds and DeepPRIME XD between 120 and 150 seconds. With no usable card available, the CPU and RAM do all the work and are the limiting factor on performance.
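To put those numbers in perspective, here is a small sketch that turns the approximate export times quoted above into speedup ratios. All figures are my rough measurements from the posts above, not formal benchmarks, and the CPU-only times use the midpoint of the quoted ranges:

```python
# Rough speedup estimates from the export times quoted above
# (RTX 4060 vs. GTX 1050 Ti vs. CPU-only). These are forum-post
# approximations, not controlled benchmarks.

def speedup(slow_s: float, fast_s: float) -> float:
    """How many times faster the second time is than the first."""
    return slow_s / fast_s

# DeepPRIME: ~6 s (RTX 4060), ~20 s (GTX 1050 Ti), ~60-90 s (CPU only)
print(round(speedup(20, 6), 1))    # 1050 Ti -> 4060: 3.3x
print(round(speedup(75, 6), 1))    # CPU midpoint -> 4060: 12.5x

# DeepPRIME XD: ~9 s (4060), ~35 s (1050 Ti), ~120-150 s (CPU only)
print(round(speedup(35, 9), 1))    # 1050 Ti -> 4060: 3.9x
print(round(speedup(135, 9), 1))   # CPU midpoint -> 4060: 15.0x
```

The takeaway: a capable GPU is worth an order of magnitude over CPU-only processing for these denoisers.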

Mark

A friend of mine has a Snapdragon chip (the new ARM Lenovo) and we tried DxO PhotoLab 7. It worked well, but export times were a bit slow (58 seconds with DeepPRIME XD). We did check, and the GPU was not being used for this even though we changed the setting to use the GPU. If you are looking for a laptop, some options would be to buy the ARM machine and wait a bit while using it as-is, or maybe an Asus Zephyrus G14. Those go on sale frequently and are good bang for the buck. They are pretty small for a laptop with a dedicated GPU and weigh 3.5 lb (MacBook Pro weight).

How are graphics handled in this computer? If it’s an onboard Intel graphics chip with little or no memory of its own, rather than an appropriate Nvidia or Radeon card, that might be the reason.

Mark

I tried PureRAW 4 on a brand-new Surface Pro with a Snapdragon X Elite. Currently, the performance is about 10-20 times slower than on a Mac Mini M1 when processing 40 MP images.

I’d like to switch to the Surface Pro for this task, but given the current DxO performance it’s not an option for me as of now.

I’d love for DxO to optimise PL for the new ARM processors. I remember that when the Mac app was optimised for Apple silicon, the export process was optimised first (allowing the NPU to be utilised for DeepPrime), and the rest of the code was optimised later on. Once optimised, it made a massive difference to performance.

My Windows laptop has an Nvidia GeForce RTX 3050 which is used for DeepPrime, but my M1 Mac Mini can export the same images twice as fast using its NPU, despite the laptop having an i7 CPU of similar power to the Mac’s M1 and twice as much memory as the Mac.

Additionally, the Mac’s fan can never be heard, but the fan on my laptop spins up to full speed during export and the battery takes a hammering too.

Many laptops don’t have a massively powerful GPU, but from my experience with DeepPrime and the Mac NPU, utilising any NPU in Qualcomm and future Intel/AMD machines has huge potential and benefits, especially for anyone running on battery power. :)

I’m another person who is very interested in buying an ARM Windows laptop in the next year or so, I’d love to see ARM optimisation for Windows happening sooner rather than later.


As a long-time Surface Pro user (and DxO Optics Pro/PhotoLab user), I’m really interested in the possibility of DxO compiling their software natively for ARM Copilot+ PCs…

That could make an awesome combo…

That’s not how ARM stuff is done. It’s a system on a chip. The graphics are a Qualcomm Adreno, like you get in mobile phones. It likely doesn’t support OpenCL or DirectCompute, and definitely not CUDA since it isn’t Nvidia. They generally have DirectX and relatively lousy Vulkan support, and that’s about it. Right now these are glorified cellphones.

Apple knew their entire PCIe design was screwed and that 3rd-party graphics would be unstable (from an internal source I can’t name because I don’t want anyone fired), so they paid somebody to design the GPU plus the thing they call an NPU because it sounded futuristic at the time. (The NPU would be the low-bitness integer instructions on a normal GPU; Apple just split them off so the AI work that can be quantized to 8-bit can run on its own. DxO is almost certainly running on the GPU part.) IIRC it’s something like 90% of the chip area, because Apple knows the only people buttering their bread either have too much money or are in graphic design, and ARM without fast graphics would have been an impossible sell.

From talking to people running Stable Diffusion on Mac M-series chips, which is heavily memory-bound, there was actually a slight decrease in speed running the networks from M1 to M2. There’s no M3 Ultra for comparison, but the memory bandwidths are identical, so the trend should continue. The problem is that bandwidth didn’t increase but GPU core count did, so things stall waiting for data more and more, and it actually hurts performance. This isn’t an issue for something like DxO, so there would likely be some slight speed increase, but IIRC what the Apple engineer told me is that the ARM chip is the limiter, since most memory access has to go through it to the GPU and NPU and it’s all shared. Stable Diffusion generates all of its data in memory, and there’s a ton of it, since the latent space for a 2D output image is 4-dimensional. If the OS decides to run a language model while the GPU is processing, performance will also drop. If there’s heavy CPU activity, it will drop.

This doesn’t happen with discrete GPUs unless the processor is so overloaded it can’t pass data off to them fast enough. They suffer from the limits of the PCIe bus getting data onto the card, which probably explains why the (terrible, but not that bad) 3050 someone mentioned is slower than an M1; but once the data is there, they generally have more memory speed available than the Mac GPU, and the chip dies are gigantic, with obscene amounts of cache that operates even faster. So a lot of it depends on how much data DeepPrime generates on-card versus how much it keeps from the original image. Even if it has to redo everything on the card, it’s unlikely to be operating in some funky higher-dimensional space, so you won’t get the level of data synthesis that the neural nets usually produce. The simpler denoising networks available as open source tend to be one-pass and are speed-limited by how fast you can shuffle images on and off of the graphics card. DeepPrime doesn’t sound that fast and is probably at least much more complex (in network size if not further factors) based on the data it was trained on.
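The "more cores, same bandwidth" point above is the classic roofline argument: a memory-bound kernel is capped by bandwidth times arithmetic intensity, not by peak compute. Here is a minimal sketch of that arithmetic; the chip numbers are purely illustrative placeholders, not measured specs of any Apple or Qualcomm part:

```python
# Minimal roofline sketch: a kernel is memory-bound when its
# arithmetic intensity (FLOPs per byte moved) falls below the
# machine's compute/bandwidth ratio. The figures below are
# hypothetical, chosen only to illustrate the M1->M2 anecdote.

def attainable_gflops(peak_gflops: float,
                      bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Attainable throughput = min(peak compute, bandwidth * intensity)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

# Hypothetical SoC: 10 TFLOPs peak compute, 100 GB/s shared memory bus.
peak, bw = 10_000.0, 100.0

# A low-intensity kernel (2 FLOPs/byte) is capped by bandwidth...
print(attainable_gflops(peak, bw, 2.0))      # 200.0 GFLOPs
# ...so doubling GPU cores (raising peak) changes nothing:
print(attainable_gflops(peak * 2, bw, 2.0))  # still 200.0 GFLOPs
```

That is why adding GPU cores without adding memory bandwidth can leave a memory-bound network no faster, or even slower once more cores contend for the same bus.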

Qualcomm primarily makes cellphone chips, and their approach is vastly different. The GPU is just enough to play simple cellphone games and draw graphics quickly. Then there’s another processor of a different architecture than ARM (called Hexagon), a packetized vector processor that the operating system usually can’t reprogram. They sell feature IP for it that does things like extremely rapid facial recognition, voice recognition, etc. It’s mostly for running semi-hardcoded neural networks and handling the output of the cameras in real time. It’s the equivalent of the NPU, more or less, except way faster, but with the downside that you can’t do anything with it.

Unless they stick one of them in a desktop with a PCIe bus, and somebody convinces Nvidia and AMD to write full-featured drivers (which, based on both of their track records, will be stable in about 30 years), you’re not likely to see a chip that can do reasonable-speed compute tasks on one of these. People are willing to pay the extra expense for the battery life, but that’s about it. Add a real GPU and, no matter how they design it, things begin to tank miserably.

Windows Surface Pro 11 with Snapdragon X 12-core @ 3.40 GHz and DxO PhotoLab 7: export (DeepPRIME XD) takes roughly 1 hour for 3 photos…

On a MacBook Air with an M2, it takes seconds…

Thank you for the results. The next purchase will be a Snapdragon laptop or Snapdragon Surface Pro tablet, so…

The current desktop ≈ OK with an RTX 4060 + Intel i7-11370H @ 3.30 GHz.

The current Surface Pro is an i7; it handles away-from-home travel editing.

Would I drop Intel completely on the desktop? Yes, for the power draw and fan-noise issues.

I will try to wait until there’s some market experience with Intel’s new Lunar Lake chips, which may have closed the gap with Qualcomm. I have compatibility questions about Qualcomm, but Intel leapfrogged its last generation of chips early in order to catch up … on the heels of reliability issues. However, if the current generation of chips is fast, has good battery life, AND is reliable, we may have a traditional Wintel solution later this year without compatibility issues. (I grok that not everyone can wait!)