ETL Testing - How to validate complete data in Source and Target tables

 Best Techniques to validate Source and Target

 data in ETL

So first question that comes in mind that why do we need to validate source and target data in ETL if data is already transformed and loaded into the Datawarehouse?

To answer this question, we have to first understand what is ETL, so its a process of "Extracting" source data from multiple places (heterogeneous source ) that could be a flat-file, database, apps, databrick or any other source system and to take a meaning full information from it we have to process the data as per the business rule that could be provided by the client. To make data meaningful we have to "Transform" the data as per the rule, and once the data transformed we "Load" it into the target table that could be a database, Datawarehouse or cloud system.

Now, once the data loaded into the final tables, testers has to ensure that whatever data we have taken from the source is correctly transformed and imported into the final tables without any data loss. So, here source and target data validation comes into the picture.

List of checks that we have to validate in source and target tables -

1- Validate that the counts should match in source and target
2- Validate that data should match in source and target
3- Validate that their should be no duplicate data
4- Validate that all the transformation logic applied correctly
5- Validate that there should be no incomplete data

Now, come to the techniques to validate source and target data in ETL -

1- MINUS Query - By using minus query source to target and target to source, we identify if there any record that are not present in either source or target.

For Example -

Source having 5 records -

etl-testing-data-validation

Target having 6 records - [In Target we do not need emp-address as per the rule]

etl-testing-data-validation

Now, use MINUS query to check the data in source and target

First do Target minus Source

select * from Test_Traget

MINUS

select * from Test_Source

This query will return the row of data that are not present in source but available in target, so we have 1 extra record in target that is missing in source 
 
6        Rintu           200

Secondly, Source minus Target

select * from Test_Source

MINUS

select * from Test_Traget

here, this query will return rows that present in source but not in Target, so we have 5 records in source and all are present in Target, hence 0 records this query will return.




2- By comparing Source data and Target data in Excel format or in CSV format by using VLOOKUP

Comments

  1. Such an informative read! Your insights into organic gardening have inspired me to rethink my approach. Looking forward to creating a more sustainable garden space with your expert guidance.Business data validation Services

    ReplyDelete

Post a Comment

Popular posts from this blog

Test cases and Test scenarios for Full Load and Incremental data load in ETL Testing

ETL Testing Interview Questions Mphasis, TCS, IBM, Wipro