Techniques to handle missing values in a dataset

** single imputation

** regression imputation

** multiple imputation

LET'S DISCUSS MORE

1. Single imputation replaces each missing value with either the mean or the mode. If the variable with missing values is a character (categorical) variable, the mode is used; if it is numeric, the mean is used.

Note: this kind of imputation is reasonable when about 20% of a variable's values are missing. If the missing proportion is much higher, such as 70 to 80%, the imputation is of little use.

One more thing: check the standard deviation before and after imputing. The distribution should not change much.
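Here is a minimal sketch of single imputation with pandas. The column names (`age`, `city`) and the values are made up for illustration. It checks the share of missing values first, fills the numeric column with the mean and the character column with the mode, and then compares the standard deviation before and after.

```python
import pandas as pd

# Toy data (hypothetical): one numeric and one character column, 20% missing in each
df = pd.DataFrame({
    "age": [25.0, 30.0, None, 35.0, 40.0, None, 45.0, 50.0, 55.0, 60.0],
    "city": ["Pune", "Delhi", "Pune", None, "Pune",
             "Delhi", "Pune", None, "Delhi", "Pune"],
})

# Share of missing values per column; imputing is only sensible when this is small
missing_share = df.isna().mean()

# Standard deviation of the numeric column before imputing (NaN values are skipped)
std_before = df["age"].std()

imputed = df.copy()
# Numeric column: fill with the mean of the observed values
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())
# Character column: fill with the most frequent value (the mode)
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

# Standard deviation after imputing, to see how much the distribution changed
std_after = imputed["age"].std()

print(missing_share)
print(std_before, std_after)
```

Mean imputation always pulls the standard deviation down a little, because every filled value sits exactly at the centre. If the drop is large, too many values were missing for this method.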

2. Multiple Imputation

Multiple imputation fills in estimates for the missing data. But to capture the uncertainty in those estimates, the values are imputed multiple times.

For example:

A strategy that imputes the missing values by modelling each feature that has missing values as a function of the other features, in a round-robin fashion.
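One way to try this is scikit-learn's `IterativeImputer`, which models each feature with missing values from the other features in turn. The sketch below is only an illustration on made-up data: it assumes that running the imputer several times with `sample_posterior=True` and different seeds is an acceptable way to get multiple imputed datasets. The number of imputations (5) and the toy features are arbitrary choices.

```python
import numpy as np
# IterativeImputer is still experimental, so this enabling import is required
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy data (hypothetical): the second feature is roughly twice the first
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=50)
X = np.column_stack([x1, x2])

# Hide some values in both features (the chosen rows do not overlap)
X_missing = X.copy()
X_missing[::5, 1] = np.nan
X_missing[2::10, 0] = np.nan

# Impute several times; sample_posterior=True draws each value from the
# model's predictive distribution, so every run gives a different answer
imputed_sets = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, max_iter=10, random_state=seed)
    imputed_sets.append(imputer.fit_transform(X_missing))

stacked = np.stack(imputed_sets)  # shape: (imputations, rows, features)

# Spread of the imputed values across runs for each hidden cell of feature 2
between_std = stacked[:, ::5, 1].std(axis=0)
print(between_std)
```

The spread across the runs is the uncertainty that a single imputation would hide. Any analysis is then run on each imputed dataset and the results are combined.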

CASES for imputation

** First of all, if the values are missing purely at random, go for single imputation.

** If the missing values are related to another feature, go for multiple imputation.

** If the output (target) variable itself is missing, go for advanced imputation techniques.
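A rough way to tell the first two cases apart is to compare another feature between the rows where a value is missing and the rows where it is present. The sketch below uses made-up `age` and `income` columns and is only a quick check, not a formal test.

```python
import pandas as pd

# Toy data (hypothetical): income tends to be missing for older people
df = pd.DataFrame({
    "age": [22, 25, 31, 38, 45, 52, 58, 63, 67, 71],
    "income": [30.0, 32.0, 41.0, 48.0, 55.0, None, None, 60.0, None, None],
})

# True where income is missing
income_missing = df["income"].isna()

# Average age in each group
age_when_missing = df.loc[income_missing, "age"].mean()
age_when_observed = df.loc[~income_missing, "age"].mean()

print(age_when_missing, age_when_observed)
```

A large gap between the two averages suggests the missingness is related to `age`, so filling with a plain mean would be a poor choice here. Similar averages are consistent with values missing at random, though they do not prove it.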

IMBALANCED DATA

Under sampling

Over sampling

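SMOTE: an over-sampling technique that creates new synthetic minority-class samples by interpolating between existing ones. It is often used together with under-sampling of the majority class.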
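The three ideas can be sketched with NumPy on made-up data. The function `smote_like` is a hypothetical helper written for this post that shows only the interpolation step behind SMOTE. It is not the full algorithm, and the class sizes and `k=3` are arbitrary. In practice a library such as imbalanced-learn is the usual choice.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data (hypothetical): 90 majority rows and 10 minority rows, 2 features
X_major = rng.normal(loc=0.0, size=(90, 2))
X_minor = rng.normal(loc=3.0, size=(10, 2))

# Under-sampling: keep a random subset of the majority class
keep = rng.choice(len(X_major), size=len(X_minor), replace=False)
X_major_under = X_major[keep]

# Over-sampling: repeat randomly chosen minority rows until the classes match
extra = rng.choice(len(X_minor), size=len(X_major) - len(X_minor), replace=True)
X_minor_over = np.vstack([X_minor, X_minor[extra]])


def smote_like(X, n_new, k, rng):
    """Create n_new synthetic rows by interpolating between a row and one of its k nearest neighbours."""
    synthetic = np.empty((n_new, X.shape[1]))
    for i in range(n_new):
        j = rng.integers(len(X))                 # pick a random minority row
        dist = np.linalg.norm(X - X[j], axis=1)  # distance to every minority row
        neighbours = np.argsort(dist)[1:k + 1]   # k nearest, skipping the row itself
        nb = rng.choice(neighbours)
        gap = rng.random()                       # random position between the two rows
        synthetic[i] = X[j] + gap * (X[nb] - X[j])
    return synthetic


# SMOTE-style over-sampling: add 80 synthetic minority rows
X_minor_smote = np.vstack([X_minor, smote_like(X_minor, 80, k=3, rng=rng)])

print(X_major_under.shape, X_minor_over.shape, X_minor_smote.shape)
```

Plain over-sampling only repeats rows that already exist, while the SMOTE-style step creates new rows that lie between existing minority rows.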
