PL5 - DeepPrime and performance gain on Windows?

@uncoy I don’t think it’s a question of whether they want to optimize for those older GPUs or not. Those older GPUs are way past their prime and simply do not have the computational power to keep up.

@Lucas Noticeable speed improvement, but sadly the program still seems to only use my RTX 3090 at about 1/2 capacity, and I’m running the latest Nvidia Studio Driver.

I hear you on the 1080Ti lacking the hardware for further enhancement; it’s just a shame to take a performance hit at a time when better GPUs aren’t widely available. The former is understandable, the latter not so much. It’s good to see things moving forward, though, in the sense that I’ll eventually get my hands on a faster GPU, even if it’s not for a year or two, and then see a really major performance boost.

As for the GPU utilization, that may in part be down to your CPU. When I moved from an Intel 6700K to an AMD 5600X, I saw around a 50% boost in DeepPRIME performance with the 1080Ti, and saw some increase in GPU utilization, although it’s not exactly running flat out all of the time. I think that really fast GPUs can sometimes end up being held back by the CPU, as I’m fairly sure not all parts of the export process are GPU accelerated.


I tested 7 Sony A7rIV images and the processing time per image went from 17.3s to 10.7s, an improvement of around 40%, which isn’t too bad.
Nvidia 2070 Studio Drivers
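For anyone checking the arithmetic in reports like this, the gain can be framed two different ways and the two numbers differ; a quick Python sketch using the times above:

```python
old, new = 17.3, 10.7  # seconds per image, PL4 vs PL5
print(f"{(old - new) / old:.1%} less time per image")  # ~38.2%
print(f"{old / new - 1:.1%} higher throughput")        # ~61.7%
```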


I’d be surprised if an Nvidia 1080Ti is unable to compete. In fact, an Nvidia 1080Ti scores 61029 on OpenCL in Geekbench, ahead of a Radeon Pro Vega 56, a Radeon RX 5700 XT and an Nvidia RTX 2060.

What’s the next excuse for the lack of support?

I didn’t read it as a lack of DxO support, just a question of whether the GPU routines used by DxO are provided in the Nvidia driver for the 1080Ti.

Different question. What DaveyWilson wrote was:

Those older GPUs are way past their prime and simply do not have the computational power to keep up.

Sadly, I’m starting to feel this is a forum of hardware junkies and not photographers.

The Nvidia GTX 1000 series is based on the Pascal microarchitecture. The RTX 2000 series (Turing) and RTX 3000 series (Ampere) use different microarchitectures. Starting with Turing, Nvidia introduced Tensor cores (dedicated hardware for AI and deep learning). If DxO makes use of them to speed up DeepPrime processing, there is no chance of getting that speed-up on the GTX series, since the hardware for it is simply missing.

@MouseAT I’m running an AMD Ryzen 9 3900XT, which is no slouch, and had no issues with my Radeon Pro W5700, which the Nvidia RTX 3090 Founders Edition replaced.

That’s a far cry from

Those older GPUs are way past their prime and simply do not have the computational power to keep up.

If DxO is basing everything they do on the very latest libraries, using their software is going to quickly become very expensive.

DeepPrime from PhotoLab 4 runs plenty fast on good graphics cards in my opinion, so the improved speed is roughly irrelevant. It’s a pity that DeepPrime is a bit slower in PhotoLab 5 for the older cards, but the difference is no big deal; it just makes updating for speed improvements in DeepPrime pointless. I’m sold on the update for the improvement to local adjustments with variable chroma/luma and Control Lines (instead of the existing Control Circles).

Unfortunately, due to DxO’s draconian OS-minus-one support policy on Mac, I can’t buy or use PhotoLab 5. Mojave 10.14 is apparently not enough, even though CaptureOne manages to support High Sierra 10.13.

MouseAT I am very late to this “party”, but while I welcomed the improvements to Prime processing in PL4 with the introduction of DeepPrime, I “resented” having to pay for the privilege of the speed improvements, particularly at the beginning of 2021, just as graphics cards had become scarce, prices had risen and even bottom-end cards had become expensive!

I “resented” it because I believe, rightly or wrongly, that while developing RAW photos with “big” skies, the various PL4 (and PL3 before it) settings were creating noise in the clouds, which Prime (and then DeepPrime) managed to remove; i.e. if I wanted those effects, I had to buy a card to make render (export) times more palatable!

So I managed to purchase a 4GB 1050Ti for my beta test machine (£160) and a 2GB 1050 (£110 second-hand) for my main machine. I just ran a test on the main machine: PL4 took 27 seconds and PL5 took 24 seconds to render the same photo, essentially maintaining the existing speed.

However, the current marketplace appears to have none of the low-end cards at all and mostly starts at £430 for an RX 6600, rising to £1,700 for an RTX 3080 and £2,400 for an RTX 3090; here’s hoping the Bitcoin miners have a “rock fall/roof collapse”, to bring prices back to where they should be and help save the planet from the energy being consumed!!!

If DxO are using such high-end cards to make the numbers look good, that is one thing, but if they are completely ignoring the fact that many of their users are not using such “exotic” hardware, then that is another!!

Why is so little GPU being used from my “Weedy” Graphics card?

One problem that I do have is how little of the graphics processor PL actually consumes while exporting. The reason for this might have been discussed elsewhere on the forum, but I would be interested to know it, and whether the PL4 testing of graphics cards is going to be rerun for PL5 with a comparison of results between PL5 and PL4.

Make the Noise Option an Export parameter & enable “Contact Sheet” output:-

Possibly because of my physical (eyes) or mental (photographic retention) make-up, I cannot successfully compare images unless the transition time between them is essentially zero, i.e. the time it takes to move from one image to the next in PL is too long for me to effectively judge which of my alternative option choices is better, worse or indifferent!!

This time difference is too great even between images and virtual copies. Hence the only alternative is to export the photos and compare them in a browser/editor that is not attempting to render at the same time as presenting; products like FastStone Image Viewer and FastRawViewer can also provide multiple-image comparisons if required. Currently that means changing the noise reduction from DeepPrime to HQ to speed up the export process and then changing it back when I want to export the final JPG!

If the noise reduction process to be applied could be defined in the export settings, life would be so much easier. I could create an export profile with HQ selected, in addition to one with either DeepPrime selected or the field left blank (defaulting to the settings for the photo), and so export for review (my “contact sheet”, if you like) before my production exports, once I have decided which selection of options is “best”.

An alternative would be a smart snapshot function in PL5 itself but that is probably going too far?


AI has progressed a lot in the past few years, and hardware has followed. Unfortunately, this means that GPUs that predate these improvements are not as well suited to AI. We optimized for the latest GPUs because we saw that there were opportunities thanks to specific hardware improvements. To improve speed on older GPUs, we would need to change the neural network behind DeepPRIME, with a high probability of reducing image quality at the same time.

I might be a bit biased as I like Apple products, but at the moment one of the cheapest options looks to be the Apple M1 Mac mini, thanks to its Neural Engine. The base price is $700 and, as you can see in DxO DEEPPrime Processing Times - Google Sheets, it performs quite well (we’re still lacking updated PL5 times for high-end GPUs, though).

Assuming that your GPU is chosen by “Auto” mode in the Advanced preferences for DeepPRIME acceleration, or that you explicitly chose your GPU there, you need to be careful with the tool you use to check GPU usage. For instance, Task Manager by default displays activity for 3D work, while what PL uses for DeepPRIME shows up under the “Compute” entries:
[Screenshot: Task Manager GPU engines, with DeepPRIME activity appearing under “Compute”]
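If you prefer numbers to graphs, the same per-engine counters that Task Manager draws can be read directly. A minimal Python sketch, assuming Windows 10+ and its built-in typeperf tool (the “GPU Engine” counter name and the wildcard match are assumptions that may need adjusting for your driver):

```python
import subprocess

# Sample the Compute queues (what DeepPRIME uses) rather than the
# default 3D graph; 30 one-second samples.
counter = r"\GPU Engine(*engtype_Compute)\Utilization Percentage"
subprocess.run(["typeperf", counter, "-si", "1", "-sc", "30"])
```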


As Lucas said, you have to be careful which metrics you are monitoring.

There are several queues on modern GPUs, and Windows in particular isn’t great at displaying them properly.
The default “GPU Usage” metric is very 3D-specific and mostly only useful for games or general real-time graphics.
And to make things worse, which queues are present also depends on the vendor, GPU generation and sometimes the drivers.

The best Tool for monitoring this is still HWInfo.

Another thing to be aware of is that DeepPrime tends to generate rather short “bursts” of load, first on the CPU, then on the GPU, and then on the CPU again, which aren’t necessarily displayed properly either. (The faster the CPU and GPU, and the smaller the data to be processed, the less obvious this is.)
This is especially true for the fastest systems, and it can be really hard to notice when a batch is running, where this behaviour is interwoven and overlapping.
This leads to high peak loads, but only for very short amounts of time, which can look as if there isn’t much load at all, simply because the sampling rate of the monitoring tool isn’t fast enough.
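One way around the sampling-rate problem on Nvidia cards is to poll much faster than the usual once per second. A rough sketch, assuming nvidia-smi is on the PATH (stop it with Ctrl+C):

```python
import subprocess

# Log GPU utilization and power draw every 100 ms via nvidia-smi's
# loop mode, fast enough to catch the short DeepPrime bursts that a
# 1-second monitor averages away.
subprocess.run([
    "nvidia-smi",
    "--query-gpu=timestamp,utilization.gpu,power.draw",
    "--format=csv",
    "-lms", "100",
])
```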

Fun fact:
This can also lead to stability issues when the PSU isn’t really sufficient for this kind of load.
I had to completely swap my be quiet! 600W PSU for an 800W one to get rid of the overcurrent protection tripping, which only happened when processing batches with DeepPrime!
That load was harder on the PSU than any other synthetic or non-synthetic load I’m aware of, because most games aren’t nearly as demanding on the CPU, and most synthetic benchmarks or “power viruses” used for PSU testing tend to generate a steady load.

If DxO are using such high-end cards to make the numbers look good, that is one thing, but if they are completely ignoring the fact that many of their users are not using such “exotic” hardware, then that is another!!

Well, I don’t know what you are expecting them to do.
PL4 probably already runs the neural network about as fast as it can go on older hardware (pre-2017/2018 stuff).

Optimizing consumer hardware for the execution of ML algorithms is quite a recent thing, and we’re still somewhat at the beginning here.
It’s just like any other novelty.
You don’t expect a 10+ year old PC to be able to display 4K or HDR video without problems either, do you?


Lucas Thank you for your feedback. I understand the need to embrace new technology, particularly when it helps to solve, or actually solves, a specific problem.

In addition, DxO is in competition with other manufacturers of editing software, and the “arms race” dictates using every means at your disposal to elevate your product’s position with respect to speed and quality, particularly in areas such as noise reduction.

Unfortunately, such an “arms race” can leave some of us “out in the cold”. Graphics cards are a particular area where this has happened big time, partly because of the use of the technology they contain for AI, and partly because the cryptocurrency boom has caused a huge price hike.

I programmed my first computer in 1964 (an ICT 1301) at the start of my degree in ‘Computing and Data Processing’, went on to do some research and then teaching, before joining Burroughs Machines (later Unisys), where I worked for 36 years before retiring in 2009. I have first-hand experience of the changes that computing has undergone, but retiring means mostly careful husbanding of resources and, hence, I am typically behind the curve with respect to the hardware at my disposal…

I have never been a fan of Apple for various reasons (that should generate a load of comments), although my youngest son uses Macs in his freelance videography business and recently bought an M1 laptop while my other son built a heavyweight Ryzen at the beginning of 2020, fully armed with a heavyweight graphics card, so that he could do architectural modelling and rendering at home as well as at work (came in handy given the lock downs we experienced)!!

I have only ever bought one manufactured machine in my lifetime (excluding laptops, Ataris and an Alan Sugar CPC or two); all the rest have been home (own) builds, and that effectively precludes using the Apple operating system (I have no desire to try to build a Hackintosh). Currently my systems consist of 2 x i7-4790K, 1 x i7-3770 and 1 x i7-2700K, all running Windows 10, with 12TB of storage on all but the i7-3770, which is effectively in retirement. I am now looking to build a new AMD system of some description and retire the i7-2700K, but that won’t happen for some months yet, and buying a manufactured system may be the only way of getting a new graphics card included without paying all of the current premium.

So my gripe is affordability; the only reason I currently have for any graphics card at all is DeepPrime. In your response you indicate that to use some of the older graphics cards more effectively, the algorithms would have to change, which might harm the rendering quality. From a DxO standpoint such a (sub)project may not be acceptable; from a user’s perspective, I might well be amenable to paying a bit more for an additional licence if it means not spending £500 - £1,000 just to buy a better card that still won’t be the pinnacle of excellence and will be “yesterday’s” technology even before I buy it!!

If you made it this far, thank you, if you didn’t I fully understand but of course you won’t get to read that bit!!


Abgestumft:
Yesterday I ran a test while preparing material for a digital portfolio and blog. I batch-processed 200 DNG images with a Photolab 5 DeepPrime export to JPEG and compared the result with old figures from doing the same when Photolab 4 was new.

With Photolab 4 it took my machine (a five-year-old Intel i5-6400 with 8 GB and an Nvidia GeForce 960) an average of 33 seconds per image with DeepPrime. With Photolab 5 the average was 24 seconds. It is absolutely faster.

Photolab 4 needs 37.5% more time on average with DeepPrime than Photolab 5 on my ASUS machine. It’s not that much, but it might make up for my anticipated upgrade from a Sony A7 III (24 MP) to an A7 IV with 33 MP: 33/24 is also exactly 1.375, so to the decimal it would preserve the status quo, which I am glad for :-). So thanks for that, DxO :-).

Alec: For what it’s worth, my machine and graphics card are about 5 years old. Still got some gain.

Stenis I am glad you experienced a performance boost going from PL4 to PL5. As I stated in an earlier post, my results were less dramatic than yours.

I repeated the tests on my two machines, which are very similar: i7-4790K @ 4.4GHz, 24GB of memory, SATA SSDs for the operating system etc. and HDDs for storage. The main difference is that one has a GTX 1050 2GB graphics card (A) and the other a GTX 1050Ti 4GB graphics card (B).

The Ti card is similar in performance to your GTX 960, I believe; the GTX 960 sits just ahead of the Ti cards in the Google spreadsheet column for processing D850 photos.

The software versions installed differ: the PL5 trial is on both machines, but PL 4.3.1 is on B and 4.3.3 on A.

The tests were individual exports (repeated a number of times) and then a group of 11 photos, all from a Panasonic G9 (Olympus lens); the RW2 files are 20 megapixels (23MB) in size.

Export on B (GTX 1050Ti 4GB) took 20s, 21s for PL5 and 23s, 22s for PL4 (V4.3.1).

Export on A (GTX 1050 2GB) took 28s, 29s for PL5 but then 54s, 31s, 32s for PL4 (V4.3.3). At this point I realised that the photos were on a SATA SSD, because I had been testing for any difference between SATA SSD and HDD (a little difference), but that still cannot account for the “high” times encountered.

Reverting to HDD on A gave PL5 - 27s, PL4 - 28s.

So a very small difference between PL4 and PL5 (1 second) but a bigger difference between the two graphics cards (6 seconds).

For the group tests I exported 11 photos on A, which took 3min 43s (20.27 sec/photo) for PL5 and 3min 49s (20.82 sec/photo) for PL4 (V4.3.3). On B this took 3min 9s (17.18 sec/photo) for PL5 and 3min 13s (17.55 sec/photo) for PL4 (V4.3.1).

Less than a second’s difference between the two releases, then, but about 3 to 3.5 seconds per photo between the two machines, and better performance when pushing many copies through the machine in a batch. Is the difference down to the nature (type/scene) of the photos being taken? As I indicated in a previous post, DeepPrime helps me reduce noise in skies after my (possibly over-enthusiastic) application of ‘ClearView’, amongst other PL edits.
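(For reference, a quick Python sketch of how the per-photo figures above fall out of the batch times:)

```python
def per_photo_seconds(minutes, seconds, photos):
    """Average export time per photo from a batch's wall-clock time."""
    return (minutes * 60 + seconds) / photos

print(f"{per_photo_seconds(3, 43, 11):.2f}")  # PL5 on machine A: 20.27
print(f"{per_photo_seconds(3, 9, 11):.2f}")   # PL5 on machine B: 17.18
```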


I feel the same as you … the U Point control is fun, but the extra control you gain over it vs PL4 is modest … to top it off, for me the upgraded software on my MacBook Air has made processing in DeepPrime in both PL4 and PL5 so freaking arduous…


This last week I have been post-processing old slides (diapositives) of pretty poor quality that I had repro-photographed earlier. When I processed them I had to use all sorts of tricks to get them usable. It took double the time, and sometimes more, to process them compared with digitally born images, and I wonder if that’s because I emulated Kodak T-Max 100 in Photolab in order to kill the really terrible grain these ORWO images have, especially in skies. I bought the film in Syria in 1973 (a country that had ties to Russia and East Germany at that time), and I believe some of the film had deteriorated because of hot, improper storage.

I took the images when I visited Petra in Jordan. You can look for yourselves; the image quality is far from today’s average digital image quality, but they gain some other, strange qualities instead. The images are very soft with low dynamics. I had been very close to throwing them away before I started to use Photolab intensively.

Petra - Den glömda staden - Fotosidan

Lucas, I was curious about this, but my experience on my machine is that the real test is not the speed to process one image but the speed to process 100 images. I don’t believe the processing-time spreadsheet really captures that full-load, potentially throttled-system scenario.

Anyway, I’m running on an RTX3070 based laptop (Ryzen 5900HS 8 core CPU) and the performance is really good.

Stenis (@Stenis) I replied to an earlier post of yours in this topic and congratulated you on finding an improvement in DeepPrime processing time between PL4 and PL5, which was not present with my 1050Ti & 1050 cards. I also came across your post above about the processing of your old images of Petra, and the fact that processing those photographs, taken with your current A7 III, took double the time of “normal” images from that camera.

With respect to the performance improvement, I am glad that you seem to have gained one, but on reflection that leaves me confused! I am confused because the first post in this thread (topic) described performance from a GTX 1060 card and referred to a post elsewhere involving a GTX 1080 card, which actually showed a slight decrease in performance!?

The GTX 960 you used is a generation before my 1050Ti (and 1050); it is faster than the 1050 but very slightly slower than the 1050Ti, although in the Google DxO spreadsheet the 960 is just above the 1050Ti!? My processors (i7-4790K) are a bit faster than yours, so I cannot understand where your performance improvement comes from!? In fact, the PL4 versus PL5 entry from @Savay for the GTX 970 GPU shows only a very slight improvement (1 second for processing the test batch).

It is possible that Lucas (@Lucas) can help, because he has stated that the performance improved from the RTX 2000 cards onwards, i.e. with cards equipped with Tensor cores.

With respect to your photos of Petra, I am glad that you managed to salvage your old images using PL5 and FilmPack (for the Kodak T-Max 100 emulation). I ran a test with an image I took at a very high ISO last year (ISO 20,000 in good light, because I had left the camera with the wrong settings from an “experiment” the night before), with and without the emulation, and the times were essentially the same. I have attached the photo and PL5 DOP (in a zip file), with a Virtual Copy (for the Kodak emulation), so that you can test that image on your machine if you wish.

I would guess that the increased processing time is caused by DeepPrime having to process the noise from the original media alongside the noise from the A7 III. If you would like to share a single image (+DOP) via the forum or a direct message, I will run a test on my system and see what it shows.

In the blog you describe the process more fully and indicate that ‘Bicubic’ may well have helped, implying that the images were resized on export. I have not applied the other fixes your blog suggests to my tulip image, nor have I resized the output.

Personally, I always export from PL at full size and then use FastStone Image Viewer to batch-resize (I maintain a 1920 x 1443 library alongside the original images; these are now the only images on our NAS and provide faster access for tablets and smartphones, plus a much more portable library). FIV offers Bicubic, Lanczos 2 (sharper), Lanczos 3 (default) and other resizing algorithms, and can be left unattended to resize large numbers of images. My personal choice has been Lanczos 2 (and given that all my images prior to 2018 were JPGs, I generally refrained from applying the PL ‘Lens Sharpness’ fix, because that plus Lanczos 2 tended to produce “oversharp” images!). I need to revisit my strategy for RAW processing, e.g. use ‘Lens Sharpness’ but change to Lanczos 3 or Bicubic for the library images; a sketch of the resize step is below. I am sure other forum members have their own resizing favourites.
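For anyone who wants to script that library step instead, a minimal Python sketch using Pillow (note Pillow only exposes Lanczos-3, not FIV’s Lanczos 2; the folder names are just placeholders):

```python
from pathlib import Path
from PIL import Image  # Pillow

SRC = Path("exports_fullsize")  # placeholder: full-size PL exports
DST = Path("library_1920")      # placeholder: resized library copies
DST.mkdir(exist_ok=True)

for jpg in sorted(SRC.glob("*.jpg")):
    with Image.open(jpg) as im:
        height = round(im.height * 1920 / im.width)  # keep aspect ratio
        im.resize((1920, height), Image.Resampling.LANCZOS).save(
            DST / jpg.name, quality=90
        )
```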

@MikeR and @Lucas if you want some “boring” general landscape photos taken while wandering a golf course with my wife during the UK lockdown earlier this year, I should be able to provide 50, 100, 150 etc. RAW (20-megapixel MFT ORF) images from an Olympus E-M1 MkII, showing landscape shots with sky, trees, wind turbines etc.; I will remove any images with people, dogs and the like. This would then require a new column or an additional sheet for anyone who wants to run long batch tests.

My issue with this would be actually getting the photos uploaded “somewhere” when I have an almost non-existent upload speed (though that could be done, e.g. via Flickr, and then “publishing” a Flickr “album”). That then poses the logistical issue of the downloads required by those (if any) wanting to take part, etc.

However, it would provide a consistent batch of images (others may have better ideas about how representative such a group of images would be) with which to compare and contrast both GPU and processor performance, plus additional factors like storing the images on NVMe versus SATA SSD versus HDD, and other “nerdy” issues that might well be important when tuning current hardware and selecting future upgrades!

PL5 Release 5.0.2.zip (21.3 MB)