One of the pitfalls in developing production-ready machine learning solutions is failing to identify the appropriate data assets. When evaluating the data assets for your project, use the Data Iceberg Model approach to uncover the underlying (i.e., not visible) structures that led to the dataset's creation.
The Iceberg Model is a useful tool for discovering the underlying patterns, structures, and behaviors that cause an observable event. Roughly 90% of an iceberg lies underwater, and only about 10% is visible above the surface. In the model, the observable event corresponds to that visible 10%, while the hidden 90% below the waterline represents the patterns, structures, and behaviors that produce it.
The following Data Iceberg Model can be used to evaluate the quality and limitations of the data used to train your machine learning models.