How to do manual data reconciliation using Python and SQL

Data reconciliation is an important part of data migration. After a migration completes, we have to compare the source data to the destination data. In this post we compare record counts, which is one part of data reconciliation.

Whenever we do a data migration we build a manifest file that records when, where, and how much data was transferred. In this blog we are going to do manual data reconciliation using that manifest file, following the steps below (a sample manifest is shown after the list). Before starting the reconciliation you must already have migrated the data to BigQuery.

  • Create a manifest table in BigQuery using Terraform
  • Load the manifest CSV data into BigQuery using a Python script
  • Compare the manifest record count to the BigQuery table record count
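
A manifest file of this kind might look like the following hypothetical example. The column names (file_name, table_name, record_count, transfer_date) are my assumptions, reused in the sketches later in the post, so adjust them to match your own manifest.

```csv
file_name,table_name,record_count,transfer_date
orders_2022_01.csv,orders,120000,2022-01-31T23:59:00Z
orders_2022_02.csv,orders,98000,2022-02-28T23:59:00Z
customers_full.csv,customers,45000,2022-02-28T23:59:00Z
```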

1. Create the manifest table using Terraform

Here we are going to use the same dataset that we created in the previous blog, Create BigQuery dataset and table using terraform. We will deploy our manifest table into that dataset by referencing its dataset ID.

main.tf
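
A minimal sketch of what main.tf could look like. The resource name, variable names, and schema here are assumptions based on the sample manifest above, not necessarily the exact file from my repo.

```hcl
# main.tf - sketch of the manifest table resource (names and schema are assumptions)
provider "google" {
  project = var.project_id
  region  = var.region
}

resource "google_bigquery_table" "manifest" {
  project    = var.project_id
  dataset_id = var.dataset_id        # dataset created in the previous blog
  table_id   = var.manifest_table_id

  deletion_protection = false

  # Schema matches the manifest CSV columns shown earlier
  schema = jsonencode([
    { name = "file_name",     type = "STRING",    mode = "NULLABLE" },
    { name = "table_name",    type = "STRING",    mode = "NULLABLE" },
    { name = "record_count",  type = "INTEGER",   mode = "NULLABLE" },
    { name = "transfer_date", type = "TIMESTAMP", mode = "NULLABLE" }
  ])
}
```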

variable.tf
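
A matching variable.tf could look like the following; again, the variable names are assumptions and should line up with whatever your existing dataset configuration expects.

```hcl
# variable.tf - variables used by main.tf
variable "project_id" {
  description = "GCP project that hosts the BigQuery dataset"
  type        = string
}

variable "region" {
  description = "Default region for the Google provider"
  type        = string
  default     = "us-central1"
}

variable "dataset_id" {
  description = "Existing BigQuery dataset created in the previous blog"
  type        = string
}

variable "manifest_table_id" {
  description = "Name of the manifest table to create"
  type        = string
  default     = "manifest"
}
```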

terraform.tfvars
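
And terraform.tfvars simply supplies the values; the ones below are placeholders, not real project names.

```hcl
# terraform.tfvars - example values only, replace with your own
project_id        = "your-gcp-project-id"
region            = "us-central1"
dataset_id        = "your_dataset_id"
manifest_table_id = "manifest"
```

With these three files in place, the usual terraform init, terraform plan, and terraform apply will create the manifest table in the existing dataset.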

2. Load manifest data from a CSV file to BigQuery using Python

Here is the Python script we can use to load data from a CSV file stored in a GCS bucket.

load data python script
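
Below is a minimal sketch of such a load script using the google-cloud-bigquery client. The bucket path, file name, dataset, and table names are my assumptions, so swap in your own values.

```python
# Sketch: load the manifest CSV from a GCS bucket into the BigQuery manifest table.
# PROJECT_ID, BUCKET_NAME, DATASET_ID, TABLE_ID and the file name are placeholders.
from google.cloud import bigquery

PROJECT_ID = "your-gcp-project-id"
BUCKET_NAME = "your-bucket-name"
DATASET_ID = "your_dataset_id"
TABLE_ID = "manifest"

client = bigquery.Client(project=PROJECT_ID)

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the CSV header row
    autodetect=False,     # the table schema was already created by Terraform
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

uri = f"gs://{BUCKET_NAME}/manifest.csv"
table_ref = f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}"

load_job = client.load_table_from_uri(uri, table_ref, job_config=job_config)
load_job.result()  # wait for the load job to complete

table = client.get_table(table_ref)
print(f"Loaded {table.num_rows} rows into {table_ref}")
```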

In this Python script you need to add your own project ID and bucket name.

3. Compare the BigQuery table record count to the manifest record count

compare record count using SQL
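
One way to write the comparison is sketched below, assuming the manifest columns from earlier and a migrated table called your_target_table; the project, dataset, and table names are all placeholders.

```sql
-- Sketch: compare the manifest record count with the actual BigQuery row count.
-- Project, dataset, and table names below are placeholders.
WITH manifest_counts AS (
  SELECT table_name, SUM(record_count) AS manifest_count
  FROM `your-gcp-project-id.your_dataset_id.manifest`
  GROUP BY table_name
),
bq_counts AS (
  SELECT 'your_target_table' AS table_name, COUNT(*) AS bq_count
  FROM `your-gcp-project-id.your_dataset_id.your_target_table`
)
SELECT
  m.table_name,
  m.manifest_count,
  b.bq_count,
  IF(m.manifest_count = b.bq_count, 'MATCH', 'MISMATCH') AS status
FROM manifest_counts AS m
JOIN bq_counts AS b USING (table_name)
```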

We need to run this query in BigQuery.
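
You can paste the query into the BigQuery console, or run it from the command line with the bq tool (saving the query as compare.sql is my assumption, not part of the original steps):

```bash
# Run the saved comparison query using standard SQL
bq query --use_legacy_sql=false < compare.sql
```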

Output after running the SQL query: the record count comparison.

Conclusion

In this blog we created the BigQuery manifest table, loaded the CSV data into it, and then wrote a SQL query to compare the manifest record count to the BigQuery table record count. You can get all the scripts from my GitHub account and the Excel sheet from Google Sheets.

Links

Reference: Load data using Python
