Set the transient time step to 10 ps and the maximum step to 100 ps, and use the traponly option (trapezoidal integration only). Simulate the negative resistance of the amplifier together with the load caps; to first order it is -gm/(omega^2*C1*C2). If needed, increase the negative resistance and verify it across corners. It should work. Then optimize for startup time: try a parametric analysis to find the point where the negative resistance is at its maximum, which should also minimize the startup time.
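
As a quick sanity check before running the simulator, the first-order formula above can be evaluated by hand per corner. The sketch below is only an illustration: the gm values, load caps, frequency, and crystal ESR are made-up placeholders, and comparing the magnitude of the negative resistance with the ESR is a common way to judge margin rather than something stated above.

```python
import math


def negative_resistance(gm, c1, c2, freq_hz):
    """First-order negative resistance -gm / (omega^2 * C1 * C2), in ohms."""
    omega = 2.0 * math.pi * freq_hz
    return -gm / (omega ** 2 * c1 * c2)


def corner_margins(corner_gm, c1, c2, freq_hz, esr_ohm):
    """Return |R_neg| / ESR for each corner (larger means more margin)."""
    return {
        name: abs(negative_resistance(gm, c1, c2, freq_hz)) / esr_ohm
        for name, gm in corner_gm.items()
    }


# Hypothetical numbers, for illustration only (not from the original post).
CORNER_GM = {"slow": 0.6e-3, "typical": 1.0e-3, "fast": 1.5e-3}  # siemens
C1 = C2 = 10e-12   # load caps, farads (assumed)
FREQ_HZ = 10e6     # oscillation frequency, hertz (assumed)
ESR_OHM = 50.0     # crystal series resistance, ohms (assumed)

margins = corner_margins(CORNER_GM, C1, C2, FREQ_HZ, ESR_OHM)
worst_corner = min(margins, key=margins.get)

for name, gm in CORNER_GM.items():
    r_neg = negative_resistance(gm, C1, C2, FREQ_HZ)
    print(f"{name:8s} R_neg = {r_neg:9.1f} ohm, |R_neg|/ESR = {margins[name]:.1f}")
print("worst corner:", worst_corner)
```

The corner with the smallest ratio is the one to watch in simulation. This hand formula ignores parasitics and the crystal's shunt capacitance, so the simulated negative resistance can differ from it noticeably.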