What I mean by subtractor: between stage 1 and the PMOS output pass transistor, insert a common-source stage with a diode-connected load. Make sure the gain of this stage is slightly less than 1 (gm1/gm2 < 1). The diode-connected PMOS will replicate the supply noise on the pass transistor's gate, so the pass transistor's Vgs stays nearly constant when the supply moves, and hence the PSRR will be better. Use an external capacitor to compensate the LDO. Try it and let me know if you see any issues.
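To check the gm1/gm2 < 1 condition quickly, here is a rough Python sketch using the long-channel square-law model, gm = sqrt(2 * k' * (W/L) * Id). It assumes an NMOS input device and a diode-connected PMOS load carrying the same bias current, and it ignores output conductance. The k' values, W/L ratios and bias current are made-up placeholders, not numbers from any real process, so treat it as a first-pass estimate before simulation.

```python
import math

def gm_square_law(k_prime, w_over_l, i_d):
    """Transconductance from the long-channel square-law model (A/V)."""
    return math.sqrt(2.0 * k_prime * w_over_l * i_d)

def cs_diode_load_gain(gm1, gm2):
    """Gain magnitude of a common-source stage with a diode-connected load,
    neglecting output conductances: |Av| ~ gm1 / gm2."""
    return gm1 / gm2

# Hypothetical process and sizing values, for illustration only.
KP_N = 200e-6    # NMOS k' = mu_n * Cox, in A/V^2 (assumed)
KP_P = 80e-6     # PMOS k' = mu_p * Cox, in A/V^2 (assumed)
I_BIAS = 20e-6   # both devices carry the same current, in A (assumed)

gm1 = gm_square_law(KP_N, 2.0, I_BIAS)   # input device, W/L = 2 (assumed)
gm2 = gm_square_law(KP_P, 6.0, I_BIAS)   # diode-connected load, W/L = 6 (assumed)
gain = cs_diode_load_gain(gm1, gm2)

print(f"gm1 = {gm1 * 1e6:.1f} uS, gm2 = {gm2 * 1e6:.1f} uS, |Av| = {gain:.3f}")
```

Because both devices share the same current, the bias current cancels in the ratio and the gain is set only by the k' and W/L ratios, so the "slightly less than 1" target can be set by sizing alone.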
I agree about the extra power dissipation, but this seems to be the best way, since the extra current only increases in proportion to the load. With the standard LDO architecture you will need to burn no less than 300 µA in the second stage (PMOS pass stage) to achieve 30 dB of PSRR at 1 MHz and still be stable. If you use other architectures, such as using an amplifier to cancel out the noise on the gate, it would work only for a specific range of load currents, and even this method consumes no less than 70 to 80 µA of extra current.