ETL Testing - How to validate complete data in Source and Target tables
Best Techniques to validate Source and Target
data in ETL
So first question that comes in mind that why do we need to validate source and target data in ETL if data is already transformed and loaded into the Datawarehouse?
To answer this question, we have to first understand what is ETL, so its a process of "Extracting" source data from multiple places (heterogeneous source ) that could be a flat-file, database, apps, databrick or any other source system and to take a meaning full information from it we have to process the data as per the business rule that could be provided by the client. To make data meaningful we have to "Transform" the data as per the rule, and once the data transformed we "Load" it into the target table that could be a database, Datawarehouse or cloud system.
Now, once the data loaded into the final tables, testers has to ensure that whatever data we have taken from the source is correctly transformed and imported into the final tables without any data loss. So, here source and target data validation comes into the picture.
List of checks that we have to validate in source and target tables -
1- Validate that the counts should match in source and target
2- Validate that data should match in source and target
3- Validate that their should be no duplicate data
4- Validate that all the transformation logic applied correctly
5- Validate that there should be no incomplete data
Now, come to the techniques to validate source and target data in ETL -
1- MINUS Query - By using minus query source to target and target to source, we identify if there any record that are not present in either source or target.
For Example -
Source having 5 records -
Now, use MINUS query to check the data in source and target
First do Target minus Source
select * from Test_Traget
MINUS
select * from Test_Source
This query will return the row of data that are not present in source but available in target, so we have 1 extra record in target that is missing in source
6 Rintu 200
Secondly, Source minus Target
select * from Test_Source
MINUS
select * from Test_Traget
here, this query will return rows that present in source but not in Target, so we have 5 records in source and all are present in Target, hence 0 records this query will return.
2- By comparing Source data and Target data in Excel format or in CSV format by using VLOOKUP
Such an informative read! Your insights into organic gardening have inspired me to rethink my approach. Looking forward to creating a more sustainable garden space with your expert guidance.Business data validation Services
ReplyDelete