Data Modelling
Data modelling is the process of combining data, possibly from different sources, into a new model that is easier to use and supports further work with the data.
Data can be modelled at different stages, and the strategy may vary depending on the situation.
1. Modelling during the [etl] process
- The ETL framework usually has access to multiple database instances and a higher level of access, so the Extract step is largely solved by the nature of the process.
- Depending on how the framework is implemented, a single pipeline can read from multiple data sources.
- An advantage of modelling during the ETL process is that it can produce better data structures at the [data-lake] level.
- Some modelling also takes place implicitly when the Transform step heavily changes the initial data.
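
As a rough illustration of the points above, the following minimal sketch joins two sources inside the Transform step of one pipeline. The in-memory lists `CRM_CUSTOMERS` and `BILLING_INVOICES`, the field names, and the `extract`/`transform`/`load` functions are all hypothetical stand-ins, not part of any particular ETL framework.

```python
# Hypothetical rows standing in for two separate source databases.
CRM_CUSTOMERS = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
BILLING_INVOICES = [
    {"invoice_id": 10, "customer_id": 1, "amount": 120.0},
    {"invoice_id": 11, "customer_id": 1, "amount": 80.0},
    {"invoice_id": 12, "customer_id": 2, "amount": 50.0},
]


def extract():
    """Extract: read from both sources in the same pipeline."""
    return CRM_CUSTOMERS, BILLING_INVOICES


def transform(customers, invoices):
    """Transform: join the two sources into one denormalised row per invoice."""
    names = {c["customer_id"]: c["name"] for c in customers}
    return [
        {
            "invoice_id": inv["invoice_id"],
            "customer_id": inv["customer_id"],
            "customer_name": names.get(inv["customer_id"]),
            "amount": inv["amount"],
        }
        for inv in invoices
    ]


def load(rows, lake):
    """Load: append the modelled rows to an in-memory stand-in for a data-lake table."""
    lake.extend(rows)
    return lake


lake_invoices = load(transform(*extract()), [])
```

Because the join happens before loading, the data-lake already holds a structure that consumers can use without looking up customer names themselves.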
2. Modelling within the [data-lake]
- Once the data is available in the data-lake, further models are created for use by different stakeholders.
- Data can be modelled for further aggregation and then moved into the [data-warehouse].
- Heavy modelling at this stage is not advised, but it can happen when a proper data-warehouse is lacking.
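
A small aggregation over data already in the data-lake might look like the sketch below. The `lake_orders` rows and the `daily_revenue` function are hypothetical and only illustrate the idea of preparing an aggregate for later loading into the data-warehouse.

```python
from collections import defaultdict

# Hypothetical rows already available in the data-lake.
lake_orders = [
    {"customer_id": 1, "day": "2024-01-01", "amount": 120.0},
    {"customer_id": 1, "day": "2024-01-01", "amount": 80.0},
    {"customer_id": 2, "day": "2024-01-02", "amount": 50.0},
]


def daily_revenue(rows):
    """Aggregate order amounts per day, returning one row per day sorted by date."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["day"]] += row["amount"]
    return [{"day": day, "revenue": total} for day, total in sorted(totals.items())]


daily = daily_revenue(lake_orders)
```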
3. Modelling from [data-lake] into [data-warehouse]
- This is the most common use case of data modelling.
- The raw data loaded into the data-lake by the ETL process is in an unstructured state that can be cumbersome to use, so it serves as the source for different modelling processes whose results are loaded into the data-warehouse.
- [data-cleansing] strategies can also be applied during this process to improve the quality of the data.
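
The move from data-lake to data-warehouse, including a cleansing step, can be sketched as follows. This is only an illustration under stated assumptions: `RAW_EVENTS`, the `dim_user` table, and the cleansing rules (drop records without an id, keep the first record per user, normalise email and country) are hypothetical, and an in-memory SQLite database stands in for the data-warehouse.

```python
import json
import sqlite3

# Hypothetical raw JSON lines as they might sit in the data-lake.
RAW_EVENTS = [
    '{"user_id": 1, "email": " Ada@Example.com ", "country": "uk"}',
    '{"user_id": 1, "email": "ada@example.com", "country": "UK"}',
    '{"user_id": 2, "email": "grace@example.com"}',
    '{"email": "no-id@example.com", "country": "US"}',
]


def cleanse(raw_lines):
    """Yield cleaned (user_id, email, country) tuples from raw JSON lines."""
    seen = set()
    for line in raw_lines:
        record = json.loads(line)
        user_id = record.get("user_id")
        # Skip records without an id and keep only the first record per user.
        if user_id is None or user_id in seen:
            continue
        seen.add(user_id)
        yield (
            user_id,
            record["email"].strip().lower(),
            (record.get("country") or "unknown").upper(),
        )


def load_warehouse(rows):
    """Create a structured table in the stand-in warehouse and load the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_user ("
        "user_id INTEGER PRIMARY KEY, email TEXT NOT NULL, country TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO dim_user VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


warehouse = load_warehouse(cleanse(RAW_EVENTS))
users = warehouse.execute(
    "SELECT user_id, email, country FROM dim_user ORDER BY user_id"
).fetchall()
```

Keeping the cleansing rules in one function makes it easier to see which quality checks were applied before the data reached the structured table.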