PL4 GPU benchmarking?

That sounds reasonable. How about people download these selected images:

https://www.photographyblog.com/previews/nikon_d850_photos:

  • Images #4 through #8 (wedding couple - 5 RAW NEF - NOT JPG - images): nikon_d850_04.nef through nikon_d850_08.nef. These images have ISOs between 1800 and 12800, which should be a good test of DeepPRIME noise processing. NOTE: DxO PL reported ambiguity in which lens was used; I guessed it was an AF-S NIKKOR 105mm f/1.4E ED.

https://www.dxo.com/media.dxo.com/photolab/deepprime/raw/Egypte—copyright-Corinne_Vachon.raf

  • This is a 111MB image.

Report the total DxO DeepPRIME processing times (IN SECONDS) on this spreadsheet: https://docs.google.com/spreadsheets/d/1Yx-3n_8D3OreyVwQLA-RVqiQCAgNdXDkxScpXTur1Gs/edit?usp=sharing

(I added times reported by dma.)

Excellent spreadsheet. I went in and updated my numbers and added some of the wedding photos. I also reran my Egypte photo a couple of times and found that each run came out the same at 29 seconds, so I updated the spreadsheet to reflect that.

jch2103's Google Sheets spreadsheet is starting to get populated with data. It really looks like DxO has tuned PL4 for an Nvidia 1060-class GPU. I tried an RTX 2070 I had here and it was only 10% faster. That is a big jump in cost for only a 10% performance improvement.

What output file type is being produced: JPG, DNG, or TIFF? Does this influence the result?

I guess it may depend on the file write time. Choosing a full-size export should not depend on file type until DeepPRIME is working.
I used JPG, as my drive with the test photos is an HDD.

My data in the spreadsheet is for JPG output, read from SSD and written to SSD.

I have tested several combinations (TIFF/JPEG, with/without resize) on 3 different PCs, and none had any significant impact on the numbers.
But it was always from and to SSDs (SATA and M.2, depending on the system).

All my export times were for JPG output. I amended the spreadsheet to suggest that.

I added results with my RTX 3080.
I used an NVMe SSD to be sure storage has no impact on the numbers.

Some general observations.
Canon 6DII file of the night sky ISO 6400.

Export a 16-bit TIFF. PRIME at 60. PL3
Export a 16-bit TIFF. DeepPRIME at 60. PL4
PL3: 28 seconds. PL4: 17 seconds. Fans rev with PL3, not with PL4.
The screenshot shows the external GPU usage during export.

I’ve added my DeepPrime times to the spreadsheet.

So far, results have been interesting - it looks like most of my CPU cores are being used, albeit at less than 100% utilization, and the GPU is definitely in play given my fast render times (approximately 8.5 20-megapixel Sony images per minute on a 1080 Ti), but weirdly Task Manager is showing my GPU as basically idle for the duration. My laptop, with a 2060, is a bit slower (approximately 6 images per minute), and I can see similar levels of CPU usage, with a GPU that, again, shows as idle most of the time but spikes to 100% a few times per minute.

I have no idea if any of the utilization numbers I’m seeing are accurate. I’m guessing I’m missing some of the GPU spikes on the 1080Ti if they’re really short, but given that I’m not seeing either CPU or GPU running flat out, RAM is plentiful, and all of my data is stored on a SSD, I have to wonder where the bottleneck is. I’m inclined to think GPU, but it’s certainly not obvious from the numbers that I’m seeing.
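
For anyone comparing entries, the images-per-minute figures in this post can be converted to a megapixel-per-second rate with a little arithmetic. This is only a rough sketch: the function name is made up for illustration, and the inputs are just the numbers quoted above (20-megapixel files, about 8.5 images per minute on the 1080 Ti, about 6 on the 2060 laptop).

```python
def mpix_per_second(images_per_minute: float, megapixels_per_image: float) -> float:
    """Convert an images-per-minute export rate into megapixels per second."""
    return images_per_minute * megapixels_per_image / 60.0


# Figures quoted in the post above (approximate, as reported by the poster).
desktop = mpix_per_second(8.5, 20)  # 1080 Ti desktop
laptop = mpix_per_second(6, 20)     # 2060 laptop

print(f"1080 Ti: {desktop:.2f} Mpix/sec")
print(f"2060 laptop: {laptop:.2f} Mpix/sec")
```

Expressing results as a rate makes it easier to compare cameras with different sensor resolutions, which raw seconds per image do not allow.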

interesting…
only 36 sec (DeepPrime, Egypte) on Nvidia P2000 - so that’s around 3Mpix/sec.

It should not be that fast.
CPU - i9 7900
64 GB RAM (DDR4).
I guess the amount of RAM made all the difference here??

DXO performance settings:
Maximum cache size: 20000 MB
Maximum number of simultaneously processed images: 16

P2000 is a humble performer, something is not adding up here.

Would developers be in a position to comment?

Task Manager in Windows is not accurate for GPU usage; it also shows near 0% for me when running DeepPRIME.
Afterburner, which is more accurate, shows around 50% usage for the RTX 3080, but power draw is around 80% of the limit.
I also need at least 2 simultaneously processed images to reach max Mpix/sec.
I'll do some more tests to see if I can reach a higher speed by changing some parameters.

That makes sense. I’ve just fired up a test run with Afterburner monitoring, and that’s showing GPU spikes up to around 65% in places. GPU usage definitely looks like it’s in short bursts, so I’m guessing there’s still room for optimization somewhere, whether in terms of hardware or software.
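
If you want numbers rather than eyeballing a graph, one possible approach on NVIDIA cards is to poll `nvidia-smi` at a short interval during an export and summarize the samples. The sketch below is an assumption-laden illustration, not anything DxO provides: it assumes a single NVIDIA GPU with `nvidia-smi` on the PATH, the helper names and the 10% "busy" threshold are arbitrary, and the driver averages utilization over its own sample period, so very short bursts may still be smoothed out.

```python
import subprocess


def parse_utilization(line: str) -> float:
    """Parse one line of nvidia-smi CSV output holding a GPU utilization percentage."""
    return float(line.strip())


def summarize(samples: list[float], busy_threshold: float = 10.0) -> dict[str, float]:
    """Report peak, mean, and the fraction of samples above an arbitrary 'busy' threshold."""
    busy = sum(1 for s in samples if s > busy_threshold)
    return {
        "peak": max(samples),
        "mean": sum(samples) / len(samples),
        "busy_fraction": busy / len(samples),
    }


def sample_gpu(count: int = 300, interval_ms: int = 100) -> list[float]:
    """Collect `count` utilization samples (assumes one NVIDIA GPU, one line per sample)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=utilization.gpu",
        "--format=csv,noheader,nounits",
        f"--loop-ms={interval_ms}",
    ]
    samples: list[float] = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            samples.append(parse_utilization(line))
            if len(samples) >= count:
                proc.terminate()  # nvidia-smi loops forever otherwise
                break
    return samples


# Usage, started just before kicking off an export:
#   print(summarize(sample_gpu()))
```

A low mean combined with a high peak would be consistent with the short-burst pattern described above, while a steadily high mean would point toward the GPU being the limit.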

I was planning on upgrading from the 6700K when Zen 3 ships, and going up from 16 to 32GB RAM as well, so I guess it’ll be interesting to see if a faster CPU with more cores can help keep the GPU loaded, or whether I’m purely GPU bound by the 1080Ti at this point.

I added times for my Surface Book 2 notebook w/ GTX1050, both for running on AC and on battery. Not surprisingly, times are better on AC (D850 images 196 sec, Egypte image 64 seconds) than when on battery (D850 images 252 sec, Egypte image 89 seconds). The integrated Intel 620 GPU processor is marked with an asterisk by DxO (i.e., don’t use); sure enough, it creates errors when trying to run.
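
To put the AC versus battery difference in relative terms, here is a small worked calculation using only the four times quoted above. The function name is just for illustration.

```python
def percent_slower(baseline_seconds: float, other_seconds: float) -> float:
    """Return how much longer `other_seconds` is than the baseline, in percent."""
    return (other_seconds - baseline_seconds) / baseline_seconds * 100.0


# Surface Book 2 (GTX 1050) times from the post: AC power vs. battery.
d850_penalty = percent_slower(196, 252)   # five D850 images
egypte_penalty = percent_slower(64, 89)   # single Egypte image

print(f"D850 set on battery: {d850_penalty:.1f}% slower")
print(f"Egypte on battery: {egypte_penalty:.1f}% slower")
```

So on this machine battery power costs roughly 29% on the D850 set and roughly 39% on the Egypte image.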

This was mentioned in the German DSLR-Forum. So some of the forum names are from there.

Thanks; interesting. Google Translate is quite useful.

A Mac check:

I have an iMac Pro (base) with a Radeon Pro Vega 56 (8 GB) and an Intel Xeon W CPU (8 cores, 3.2 GHz). Processing a very noisy high-ISO Sony A7RM4 image (61.7 MB) took 20 seconds - about 3 MB/sec. I am used to running batches with PRIME, so I just walk away and let them run. But the GPU performance for DeepPRIME is substantially better than what I used to see.

One note: in my experience, DeepPRIME is valuable only for noisy high-ISO A7RM4 images - there is no improvement at all in good light at ISO 100. But I also have lots of old images shot with Canon Digital Rebel cameras, and these really benefit from DeepPRIME processing.

David

Hi,
It can also help in recovering shadows.

I added my results to the spreadsheet. Surprised to see that no other Mac user has shared results so far.