Products cap rows to protect memory. Sampling is fine for schema checks but risky for tail anomalies or rare event rates, issues may live beyond the first N lines.
Balance
- Use full load when hardware and limits allow.
- Stratified sample (head, middle, tail) when forced to sample.
- Push exhaustive checks to the warehouse when files are huge.