Did you use PRIME for your tests? Based on my own tests, I’d have expected 8 to be optimal for 32 cores. But PRIME does more multi-threading by default, so it could saturate your cores sooner.
But right, the GPU does nothing for export. The only case where I see more GPU utilization on macOS is when I do local adjustments.