Cleaning Data
Getting rid of errors
Why clean the data?
2. Cleaning
- The Crux: Raw data is messy! Errors left in the data carry straight through to the results, so cleaning is a prerequisite for accurate analysis. It involves:
  - Fixing Errors: Correcting typos, incorrect values, and inconsistencies.
  - Addressing Missing Data: Filling in gaps (imputation) or removing incomplete entries.
  - Removing Duplicates: Dropping repeated records so each data point is counted only once.
  - Normalization: Standardizing formats (e.g., dates, units of measurement).
  - Outlier Detection: Identifying unusual data points that might skew results.
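
The steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a recommended pipeline: the records, the field names (`id`, `city`, `date`, `temp_c`), the two accepted date formats, the choice of median imputation, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
from datetime import datetime
from statistics import median, quantiles

# Hypothetical raw records; every name and value here is invented for illustration.
raw = [
    {"id": 1, "city": " boston ", "date": "2023-01-05", "temp_c": 4.0},
    {"id": 2, "city": "Boston", "date": "01/06/2023", "temp_c": None},
    {"id": 2, "city": "Boston", "date": "01/06/2023", "temp_c": None},  # exact duplicate
    {"id": 3, "city": "BOSTON", "date": "2023-01-07", "temp_c": 5.0},
    {"id": 4, "city": "Boston", "date": "2023-01-08", "temp_c": 6.0},
    {"id": 5, "city": "Boston", "date": "2023-01-09", "temp_c": 95.0},  # suspicious value
]

# Assumed input formats: ISO dates and US-style month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(text):
    """Normalization: convert any accepted date format to ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean(records):
    # Fixing errors and normalization: trim whitespace, unify case, unify dates.
    rows = [
        {**r, "city": r["city"].strip().title(), "date": normalize_date(r["date"])}
        for r in records
    ]

    # Removing duplicates: keep the first occurrence of each identical record.
    seen, unique = set(), []
    for r in rows:
        key = (r["id"], r["city"], r["date"], r["temp_c"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Addressing missing data: impute gaps with the median of the known values.
    fill = median(r["temp_c"] for r in unique if r["temp_c"] is not None)
    for r in unique:
        if r["temp_c"] is None:
            r["temp_c"] = fill
    return unique


def iqr_outliers(values, k=1.5):
    """Outlier detection: flag values more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


cleaned = clean(raw)
outliers = iqr_outliers([r["temp_c"] for r in cleaned])
```

Flagging an outlier is deliberately kept separate from removing it: whether a value such as 95.0 is a typo or a real measurement is a judgment call that depends on the data, so the sketch only reports it.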