Data Wrangling

November 8, 2022 by Julian.Kellett

Data wrangling, also called data cleaning, data remediation, or data munging, refers to a variety of processes that transform raw data into formats that are easier to use. The exact methods differ from project to project, depending on the data you’re working with and the goal you’re trying to achieve.

Some examples of data wrangling include:

Merging multiple data sources into a single dataset for analysis
Identifying gaps in data (for example, empty cells in a spreadsheet) and either filling or deleting them
Deleting data that’s either unnecessary or irrelevant to the project you’re working on
Identifying extreme outliers in data and either explaining the discrepancies or removing them so that analysis can take place
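
The four examples above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a recommended pipeline: the sample records, the field names (`customer_id`, `amount`, `internal_note`, `region`), the `wrangle` helper, the choice to fill gaps with the median, and the "more than ten times the median" outlier rule are all assumptions made for the example.

```python
from statistics import median

# Hypothetical sample data: order records plus a separate region lookup.
orders = [
    {"customer_id": 1, "amount": 120.0, "internal_note": "a"},
    {"customer_id": 2, "amount": None, "internal_note": "b"},  # gap
    {"customer_id": 3, "amount": 95.0, "internal_note": "c"},
    {"customer_id": 4, "amount": 110.0, "internal_note": "d"},
    {"customer_id": 5, "amount": 9000.0, "internal_note": "e"},  # outlier
]
customers = {1: "North", 2: "South", 3: "North", 4: "East", 5: "South"}


def wrangle(orders, regions, drop_fields=("internal_note",), max_ratio=10.0):
    """Merge, fill gaps, drop unneeded fields, and remove extreme outliers."""
    # 1. Merge the two sources into one dataset, keyed on customer_id.
    merged = [{**row, "region": regions.get(row["customer_id"])} for row in orders]

    # 2. Fill gaps: replace missing amounts with the median of the known ones.
    #    (Deleting the incomplete rows would be the other option.)
    fill_value = median(r["amount"] for r in merged if r["amount"] is not None)
    for row in merged:
        if row["amount"] is None:
            row["amount"] = fill_value

    # 3. Delete fields that are irrelevant to the analysis.
    merged = [{k: v for k, v in row.items() if k not in drop_fields} for row in merged]

    # 4. Remove extreme outliers. The threshold here is an assumed rule:
    #    anything above max_ratio times the median amount is dropped.
    mid = median(row["amount"] for row in merged)
    return [row for row in merged if row["amount"] <= max_ratio * mid]


clean = wrangle(orders, customers)
for row in clean:
    print(row)
```

In a real project the outlier rule and the fill strategy would depend on the data and the goal, and an outlier might be investigated and explained rather than dropped.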

Data wrangling can be a manual or automated process. In scenarios where datasets are exceptionally large, automated data cleaning becomes a necessity. In organizations that employ a full data team, a data scientist or other team member is typically responsible for data wrangling. In smaller organizations, non-data professionals are often responsible for cleaning their data before leveraging it.

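Any analysis is only as reliable as the data behind it: if the data is incomplete or faulty, the conclusions drawn from it will be too. Data wrangling seeks to remove that risk by ensuring data is in a reliable state before it’s analysed and leveraged. This makes it a critical part of the analytical process.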
