PL4 GPU benchmarking?

The results you mentioned are not conflicting!
One run was performed on the GPU, the other on the NPU (aka the AI accelerator, aka the 16-core Apple Neural Engine or ANE, with its ~11 TOPS, which equals ~11 TFLOPS for fp16). Hence also the difference in the coloured cells in the “GFLOP” rows.

All M1 variants (and also the A14) have the same 16-core ANE. The Max and the Pro even have the same CPU.
Only the memory controller differs slightly, so it’s no wonder those two perform basically the same while being a tad faster than a regular M1.

In fact, the presence of the same NPU in every A14/M1 derivative suggests that even an iPhone 12 or an iPad Mini could theoretically perform in almost the same ballpark (lower power limit, fewer CPU cores and a weaker memory subsystem aside…).

This also suggests that DeepPrime itself could in theory run reasonably well on some ARM SoCs used in Android devices, Chromebooks or Windows on ARM, since those have pretty strong dedicated AI/ML accelerators too (especially the Snapdragons and Exynos).
Some of the newer ones are rated at roughly twice Apple’s figure (24-26 TOPS versus ~11 TOPS).
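
For anyone who wants to sanity-check that comparison, here is a quick back-of-the-envelope sketch in Python. It only uses the rated peak numbers quoted in this post (~11 TOPS for the ANE, 24-26 TOPS for the newer NPUs); the variable and function names are made up for illustration. Keep in mind that these are paper ratings, so the ratio says nothing definite about real DeepPrime processing times.

```python
# Rated peak throughput in TOPS, taken from the figures quoted in this post.
# These are vendor paper ratings, not measured DeepPrime performance.
APPLE_ANE_TOPS = 11.0
NEWER_NPU_TOPS = (24.0, 26.0)  # range quoted above for some newer SoCs


def ratio_vs_ane(tops: float, baseline: float = APPLE_ANE_TOPS) -> float:
    """Return how many times the baseline rating a given NPU rating is."""
    return tops / baseline


low, high = (ratio_vs_ane(t) for t in NEWER_NPU_TOPS)
print(f"{low:.2f}x to {high:.2f}x the ANE's rated throughput")
# prints "2.18x to 2.36x the ANE's rated throughput"
```

So on paper the gap is a bit over 2x, assuming the TOPS figures are even measured the same way across vendors (often they are not, e.g. int8 vs fp16).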