I understand what you are saying, but I prefer a different approach. First, change all the parameters slightly and see whether the system still works with similar performance. Second, run it on different markets and see whether it still works there.
Those are excellent ideas as well. Depending on the system type, though, the fact that it doesn't work across different markets may not be a problem; it may simply reflect the "edge" that the system exploits. But why choose? You can use the two techniques you listed *and* do partitioned backtesting. It's your money.
I don't see how "changing all parameters slightly" validates anything. It seems to me you would just be tweaking the curve. As for running on different markets, you need to check for correlations, among other things. You gave examples of er2 and qqq; these are correlated markets, depending on the timeframe you evaluate.
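To make the correlation point concrete, here is a rough sketch of how one might measure the return correlation of two markets at several sampling intervals. The two price series are synthetic (generated from a shared random factor), and the helper names `returns` and `pearson` are made up for this illustration; nothing here is real er2 or qqq data.

```python
import math
import random

def returns(prices, step=1):
    """Simple returns, sampling the price series every `step` bars."""
    sampled = prices[::step]
    return [b / a - 1.0 for a, b in zip(sampled, sampled[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# ASSUMPTION: synthetic stand-ins for two markets that share a common factor.
rng = random.Random(0)
market_a, market_b = [100.0], [100.0]
for _ in range(2000):
    common = rng.gauss(0, 0.01)  # shared market move
    market_a.append(market_a[-1] * (1 + common + rng.gauss(0, 0.005)))
    market_b.append(market_b[-1] * (1 + common + rng.gauss(0, 0.005)))

# Correlation of returns measured at 1-bar, 5-bar and 20-bar intervals.
corr_by_step = {
    step: pearson(returns(market_a, step), returns(market_b, step))
    for step in (1, 5, 20)
}

for step, corr in corr_by_step.items():
    print(f"step={step:>2} bars  correlation={corr:.2f}")
```

If the correlation is high at the timeframe the system trades, a pass on the second market is weaker evidence than it looks, because it is not really an independent test.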
By checking strategies with similar parameters, you make sure that you are not picking up random noise. By noise I mean strategies that accidentally did well in the past.
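A minimal sketch of that neighbour check, under heavy assumptions: the strategy is a toy long-only moving-average crossover with no costs, the prices are a synthetic random walk, and the parameter values (10, 50) and the 10% perturbation are arbitrary. The names `backtest` and `neighbourhood` are hypothetical helpers for this illustration, not anyone's actual system.

```python
import itertools
import random

def sma(prices, n, i):
    """Simple moving average of the last n prices ending at index i."""
    return sum(prices[i - n + 1 : i + 1]) / n

def backtest(prices, fast, slow):
    """Total return of a toy long-only SMA crossover (no costs, no slippage)."""
    equity = 1.0
    for i in range(slow, len(prices) - 1):
        if sma(prices, fast, i) > sma(prices, slow, i):
            equity *= prices[i + 1] / prices[i]  # hold for the next bar
    return equity - 1.0

def neighbourhood(params, pct=0.1):
    """Every combination of each parameter at -pct, unchanged, and +pct."""
    grids = [
        sorted({max(1, round(p * (1 - pct))), p, max(1, round(p * (1 + pct)))})
        for p in params
    ]
    return list(itertools.product(*grids))

# ASSUMPTION: synthetic random-walk prices stand in for real market data.
rng = random.Random(1)
prices = [100.0]
for _ in range(1500):
    prices.append(prices[-1] * (1 + rng.gauss(0.0002, 0.01)))

chosen = (10, 50)  # hypothetical "optimized" fast/slow lengths
results = {p: backtest(prices, *p) for p in neighbourhood(chosen)}

base = results[chosen]
neighbours = [r for p, r in results.items() if p != chosen]
print(f"chosen {chosen}: {base:+.2%}")
print(f"neighbours: min {min(neighbours):+.2%}, max {max(neighbours):+.2%}")
```

If the chosen parameters do well while the neighbours fall apart, the result is more likely an isolated spike than a real edge. If performance degrades gently across the neighbourhood, that is some evidence (not proof) that the system is not just fitting noise.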