Say, for example, you are testing a system on US equities using NYSE trade data. Plenty of other transactions never hit that exchange: OTC trades, OTC derivative transactions, even trades in dark pools. So the data you are analyzing does not show the full picture. In theory, then, there is data out there that you are not accounting for, which could have a huge impact on price movements and lead you to think there is an edge when there really isn't. Am I thinking about this the right way, or am I wrong here? Thanks!
You are thinking of it wrong. If you have an edge on one exchange, trades that happen elsewhere can't change that. To the extent that those trades have any effect on prices, that effect is already reflected in the data you tested, and therefore in the results you are getting.
All trades executed in stocks listed on a US exchange are reported to an exchange (and print on the tape), even trades filled in dark pools. OTC stocks (the so-called pink sheets) are not listed, hence not reported.
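If you want to check how much of the tape your single-exchange data actually covers, here is a minimal sketch. It assumes you have consolidated prints as (symbol, venue, shares) tuples; the venue labels, the symbol, and every number below are invented for illustration, and the function names are hypothetical rather than any vendor's API.

```python
from collections import defaultdict

# Toy consolidated-tape prints: (symbol, venue, shares).
# All values and venue labels are made up for illustration.
TAPE = [
    ("XYZ", "NYSE", 500),
    ("XYZ", "OFF_EXCHANGE", 300),    # e.g. a dark-pool fill that still prints
    ("XYZ", "NYSE", 200),
    ("XYZ", "OTHER_EXCHANGE", 1000),
]


def volume_by_venue(prints):
    """Sum traded shares per venue label."""
    totals = defaultdict(int)
    for _symbol, venue, shares in prints:
        totals[venue] += shares
    return dict(totals)


def venue_share(prints, venue):
    """Fraction of total printed volume that came from one venue."""
    totals = volume_by_venue(prints)
    total = sum(totals.values())
    return totals.get(venue, 0) / total if total else 0.0


if __name__ == "__main__":
    print(volume_by_venue(TAPE))
    print(f"NYSE share of printed volume: {venue_share(TAPE, 'NYSE'):.0%}")
```

A low share only tells you how much volume your feed omits. It says nothing by itself about whether a backtested edge is real.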