Use of Sporadically Incorrect Data for Historical Simulations
I'd like to share some thoughts on the concern about data vendors whose data contain errors. I do not know how bad any particular vendor's data is, but in general, I believe the following principles apply.
For historical testing, although perfectly correct data might be ideal for perfect optimization, I do not believe historical data needs to be this good to develop a dependable and profitable trading system for real-world use. Assuming the errors in the data are relatively infrequent and not obviously absurd, there should be little concern about using this kind of data for system development and testing.
Since markets tend to look "continuous" on charts, with only occasional discontinuities that still make "common sense," you can generally spot data that is grossly in error and make estimated corrections to it. Errors of a few ticks in the high or low of a daily range (or any other sampling period) may be considered "noise." Errors in opens or closes that still fall within the high-low range may or may not be detectable by common sense, so this kind of data "noise" may be of most concern. However, although market behaviors do tend to repeat, rarely, if ever, do patterns of behavior duplicate themselves tick-for-tick.
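Part of the "common-sense" screening described above can be mechanized. The following is a minimal Python sketch, not the author's procedure: the `Bar` type, the function name `flag_suspect_bars`, and the thresholds (an opening gap larger than 5 times the median range of the previous 20 bars) are hypothetical choices. It flags bars whose open or close falls outside their own high-low range, and bars that open implausibly far from the prior close. As noted above, an erroneous open or close that still sits inside the range passes both checks.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


def flag_suspect_bars(bars: List[Bar], gap_multiple: float = 5.0,
                      lookback: int = 20) -> List[int]:
    """Return indices of bars that fail two simple 'common-sense' checks."""
    suspects: List[int] = []
    ranges = [b.high - b.low for b in bars]
    for i, b in enumerate(bars):
        # Check 1: open and close must lie inside the bar's own high-low range.
        if not (b.low <= b.open <= b.high and b.low <= b.close <= b.high):
            suspects.append(i)
            continue
        # Check 2: a jump from the prior close that is many times the median
        # recent bar range is a candidate gross error (or a real gap to verify).
        if i > 0:
            typical = median(ranges[max(0, i - lookback):i])
            gap = abs(b.open - bars[i - 1].close)
            if typical > 0 and gap > gap_multiple * typical:
                suspects.append(i)
    return suspects
```

Flagged bars are candidates for a human look, not automatic deletions; a genuine overnight gap will trip the second check just as a bad print will.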
Therefore, assuming that a trading-decision strategy must be based on high-precision tick-range patterns or indicators is asking for trouble; such a dependence would be indicative of over-optimization. In my opinion, "robust" optimization with shallow sensitivity is quite desirable, but optimization with steep sensitivity is likely to be disastrous. (Here, "sensitivity" refers to the change in simulation results as the characteristics of a market change over time, and "shallow" and "steep" refer to how abruptly the results change.)
My presumption also takes into account the total number of trades that a strategy may generate over its useful lifetime. Fewer trades imply longer trend durations, so the "noise" will be small and insignificant relative to the magnitude of the moves. As the number of trades over a given lifetime increases, trend durations shorten, moves generally become much smaller, and the relative "noise" becomes more significant.
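The point about trade count comes down to simple arithmetic: the same absolute data error is a larger fraction of a smaller move. A small sketch, where the function name `relative_noise` and the tick figures are hypothetical and chosen only for illustration:

```python
def relative_noise(error_ticks: float, avg_move_ticks: float) -> float:
    """Fraction of a typical trade's move represented by a data error of the given size."""
    if avg_move_ticks <= 0:
        raise ValueError("avg_move_ticks must be positive")
    return error_ticks / avg_move_ticks


# Hypothetical figures: a 3-tick data error measured against a long-trend
# system's typical 200-tick move and a short-term system's typical 10-tick move.
long_trend = relative_noise(3, 200)   # about 0.015, i.e. 1.5% of the move
short_term = relative_noise(3, 10)    # about 0.3, i.e. 30% of the move
```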
However, assuming that enough trades are generated, both in historical simulation and in real trading, that statistically no single trade dominates the overall results, a robust strategy that produces consistent "small advantages" will, by design, be inherently "noise immune."
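One rough way to check the "no single trade dominates" condition on a list of simulated trade results is to measure how much of the net profit the best trade supplies, and whether the strategy is still profitable without it. This is a hypothetical sketch: the function names and sample trade lists are illustrative, and what share counts as "dominating" is a judgment call the text leaves open.

```python
from typing import Sequence


def largest_trade_share(trade_pnls: Sequence[float]) -> float:
    """Fraction of net profit contributed by the single best trade."""
    net = sum(trade_pnls)
    if net <= 0:
        raise ValueError("only meaningful for a net-profitable list of trades")
    return max(trade_pnls) / net


def net_without_best(trade_pnls: Sequence[float]) -> float:
    """Net profit after removing the single best trade."""
    if not trade_pnls:
        raise ValueError("need at least one trade")
    return sum(trade_pnls) - max(trade_pnls)


# Hypothetical per-trade profit/loss figures.
balanced = [100, -50, 80, 120, -30, 60]   # best trade is about 43% of net; +160 without it
lopsided = [1000, -50, -40, -30, 20]      # best trade exceeds net profit; -100 without it

balanced_share = largest_trade_share(balanced)
lopsided_share = largest_trade_share(lopsided)
```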
Bottom line: So what if the historical data is somewhat in error -- the future is likely to produce data that differs somewhat from the past anyway, so a profitable trading strategy for any given market should be tolerant of some reasonable variation in data, whether that data be historical or yet-to-occur as the future develops. A "good" trading system should be reasonably "noise" immune, and data that is somewhat "noisy" can be quite adequate for trading strategy development purposes.
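The claim that a good system should tolerate somewhat noisy data can be tested directly: perturb the highs and lows by a few ticks, rerun the simulation many times, and see how much the result moves. This is a minimal Python sketch under stated assumptions. `perturb_bars`, `noise_robustness`, the toy breakout rule, the synthetic random-walk data, the 0.25 tick size, and the 2-tick error bound are all hypothetical stand-ins for a real backtest on a real market. A tight spread of results around the baseline is consistent with the "shallow sensitivity" favored above; a wide spread suggests dependence on tick-precise patterns.

```python
import random
from typing import Callable, List, Tuple

Bar = Tuple[float, float, float, float]  # (open, high, low, close)


def perturb_bars(bars: List[Bar], max_ticks: int, tick_size: float,
                 rng: random.Random) -> List[Bar]:
    """Shift each high and low by a random whole number of ticks in [-max_ticks, max_ticks]."""
    out: List[Bar] = []
    for o, h, l, c in bars:
        h2 = h + rng.randint(-max_ticks, max_ticks) * tick_size
        l2 = l + rng.randint(-max_ticks, max_ticks) * tick_size
        # Keep each bar internally consistent: open and close stay inside the range.
        out.append((o, max(h2, o, c), min(l2, o, c), c))
    return out


def noise_robustness(backtest: Callable[[List[Bar]], float], bars: List[Bar],
                     max_ticks: int = 2, tick_size: float = 0.25,
                     trials: int = 50, seed: int = 0) -> Tuple[float, List[float]]:
    """Return the baseline result and the results on `trials` noisy copies of the data."""
    rng = random.Random(seed)
    baseline = backtest(bars)
    noisy = [backtest(perturb_bars(bars, max_ticks, tick_size, rng))
             for _ in range(trials)]
    return baseline, noisy


def toy_breakout(bars: List[Bar]) -> float:
    """Hypothetical rule: hold long for one bar after a close above the prior bar's high."""
    pnl = 0.0
    for i in range(1, len(bars) - 1):
        if bars[i][3] > bars[i - 1][1]:
            pnl += bars[i + 1][3] - bars[i][3]
    return pnl


def synthetic_bars(n: int, seed: int = 1) -> List[Bar]:
    """Random-walk bars used only to exercise the sketch."""
    rng = random.Random(seed)
    bars: List[Bar] = []
    price = 100.0
    for _ in range(n):
        o = price
        c = o + rng.gauss(0, 1)
        bars.append((o, max(o, c) + abs(rng.gauss(0, 0.5)),
                     min(o, c) - abs(rng.gauss(0, 0.5)), c))
        price = c
    return bars


baseline, noisy = noise_robustness(toy_breakout, synthetic_bars(500))
spread = max(noisy) - min(noisy)  # compare against abs(baseline) to judge sensitivity
```

Only highs and lows are perturbed here, matching the "few ticks on highs or lows" case; perturbing opens and closes inside the range would probe the harder-to-detect errors the text also mentions.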
Having said all this, would I, or could I, trust potentially flaky recent data to generate real-time trading orders, for either day trades or position trades? If I did not want to take the time to look over the data for obvious gross errors before mechanically (blindly) generating trading orders, unreliable data would likely result in some very expensive losing trades. (There could also be some serendipitous profitable trades, but I wouldn't hold my breath!) So, in this context, reliably accurate data is imperative, and I would definitely want to use a vendor whose data I could trust.
Understanding the strengths, weaknesses, and underlying design of one's trading strategy, together with the emotional considerations of trust, confidence, and belief in that trading model, would determine one's comfort level with data that may contain sporadic errors of certain kinds. Even if I were willing to take the time to carefully examine all the data for "common-sense" correctness, I might not be too comfortable using data that required my constant vigilance, even though my "noise immune" trading strategy would probably produce reliably profitable results over the longer term. Bottom line: In real-time trading, for peace of mind, get the most reliable data available.