Understanding Outliers in Data Analysis
Understanding Outliers
- Outliers are data points that deviate markedly from the rest of the dataset.
- Identifying outliers is crucial in statistics and data analysis because they can distort summary statistics such as the mean and standard deviation, and with them the conclusions drawn from the data.
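To make the impact concrete, here is a minimal sketch using only Python's standard library. The readings are made-up illustrative values, not taken from any real dataset. It compares the mean and median of a small sample before and after one extreme value is appended.

```python
from statistics import mean, median

# Hypothetical sensor readings (illustrative values only).
readings = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 12.0]

# The same readings with one extreme value appended.
with_outlier = readings + [95.0]

# The mean is pulled strongly toward the extreme value...
print("mean without outlier:  ", round(mean(readings), 2))
print("mean with outlier:     ", round(mean(with_outlier), 2))

# ...while the median barely reacts, which is why it is called a robust statistic.
print("median without outlier:", median(readings))
print("median with outlier:   ", median(with_outlier))
```

A single extreme value nearly doubles the mean here, while the median stays where it was. This is one reason to inspect a dataset for outliers before reporting averages.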
Causes of Outliers
- Measurement errors: Errors in data collection or measurement processes can lead to outliers.
- Sampling errors: A flawed sampling process, such as drawing a non-representative sample, can lead to outliers.
- Natural variability: Inherent variability in certain phenomena can lead to outliers.
- Data entry errors: Human errors during data entry can introduce outliers.
- Experimental errors: Anomalies may occur due to uncontrolled factors, equipment malfunctions, or unexpected events.
- Sampling from multiple populations: Outliers can appear when data from multiple populations with different characteristics is inadvertently combined.
- Intentional outliers: Outliers are intentionally introduced to test the robustness of statistical methods.
Program-1: Visualize outliers using box plots and scatter plots.
Dataset Used For Outlier Detection
The dataset used in this article is the Diabetes dataset, which is bundled with the scikit-learn library and can be loaded with sklearn.datasets.load_diabetes.
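As a rough sketch of what Program-1 might look like, the snippet below loads the Diabetes dataset and draws a box plot and a scatter plot. Several choices here are assumptions made for illustration and are not specified in the article: the bmi and bp features, the 1.5 × IQR rule used to highlight points, and the output file name diabetes_outliers.png.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_diabetes

# Load the bundled dataset; features are already mean-centered and scaled.
diabetes = load_diabetes()
X = diabetes.data
names = list(diabetes.feature_names)

# Assumption: bmi and bp are picked purely for illustration.
bmi = X[:, names.index("bmi")]
bp = X[:, names.index("bp")]

# Flag bmi values outside the common 1.5 * IQR fences (an assumed rule).
q1, q3 = np.percentile(bmi, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (bmi < lower) | (bmi > upper)

fig, (ax_box, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Box plot: points drawn beyond the whiskers are candidate outliers.
ax_box.boxplot(bmi)
ax_box.set_title("Box plot of bmi")
ax_box.set_ylabel("bmi (scaled)")

# Scatter plot: highlight the points flagged by the IQR rule on bmi.
ax_scatter.scatter(bmi[~is_outlier], bp[~is_outlier], s=12, label="within fences")
ax_scatter.scatter(bmi[is_outlier], bp[is_outlier], s=24, color="red",
                   label="outside 1.5 * IQR fences")
ax_scatter.set_xlabel("bmi (scaled)")
ax_scatter.set_ylabel("bp (scaled)")
ax_scatter.set_title("bmi vs. bp")
ax_scatter.legend()

fig.tight_layout()
fig.savefig("diabetes_outliers.png")  # hypothetical output file name
print("points flagged on bmi:", int(is_outlier.sum()))
```

In an interactive session or notebook, the backend line can be dropped and plt.show() used in place of savefig. The 1.5 multiplier matches the default whisker length of matplotlib's box plot, so the highlighted scatter points should correspond to the points drawn beyond the whiskers.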