Techniques to handle missing values in a dataset
** single imputation
** regression imputation
** multiple imputation
LET'S DISCUSS MORE
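- SINGLE imputation replaces each missing value with the mean or the mode of the column. If the variable is a character (categorical) type, use the mode; if it is numeric, use the mean.
Note: this works when the share of missing values in the variable is small, around 20% or less. If the missing proportion is much higher, such as 70-80%, imputation is of little use, because most of the column would be filled-in values rather than real data.
One more thing: check the standard deviation (std) before and after imputing. The distribution should not change much. Mean imputation always pulls the std down a little, since every filled value sits exactly at the mean.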
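A minimal sketch of the above, using only the Python standard library. The column names and values are made up for illustration, and `missing_fraction` and `single_impute` are hypothetical helper names, not functions from any library. In practice the same idea is usually done with pandas or scikit-learn.

```python
from statistics import mean, mode, stdev

def missing_fraction(values):
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def single_impute(values):
    """Fill None with the mean (numeric column) or the mode (character column)."""
    observed = [v for v in values if v is not None]
    numeric = all(isinstance(v, (int, float)) for v in observed)
    fill = mean(observed) if numeric else mode(observed)
    return [fill if v is None else v for v in values]

# Made-up example columns, each with 20% missing
ages = [20, 30, None, 40, 30]                      # numeric -> mean
cities = ["Pune", None, "Delhi", "Pune", "Pune"]   # character -> mode

ages_filled = single_impute(ages)
cities_filled = single_impute(cities)

# Compare the spread before and after imputing
std_before = stdev([v for v in ages if v is not None])
std_after = stdev(ages_filled)
print(missing_fraction(ages), ages_filled, cities_filled)
print(std_before, std_after)
```

Here the std drops after imputing because the filled value sits exactly at the mean. If the drop is large, the missing proportion is probably too high for single imputation.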
2. Multiple Imputation
Multiple imputation fills in estimates for the missing data. To capture the uncertainty in those estimates, the values are imputed multiple times, which gives several completed copies of the dataset whose results are then combined.
One strategy for imputing the missing values is to model each feature that has missing values as a function of the other features, in a round-robin fashion.
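A rough sketch of the round-robin idea, assuming just two numeric columns and a simple straight-line model between them. The data, `fit_line`, and `impute_once` are all made up for illustration. This is a simplified version: a proper multiple imputation also draws the model parameters at random, which is skipped here.

```python
import random
from statistics import mean, stdev

def fit_line(xs, ys):
    """Least-squares slope, intercept and residual spread for y ~ x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, stdev(residuals)

def impute_once(cols, rng, rounds=5):
    """Build one completed copy of the data, round-robin over the columns."""
    missing = {n: [v is None for v in vals] for n, vals in cols.items()}
    data = {}
    for n, vals in cols.items():
        # Start from a mean fill so every column can act as a predictor.
        fill = mean([v for v in vals if v is not None])
        data[n] = [fill if v is None else v for v in vals]
    names = list(cols)
    for _ in range(rounds):
        for target in names:
            # Assumption: exactly two columns, so the other one is the predictor.
            predictor = next(n for n in names if n != target)
            rows = [i for i, m in enumerate(missing[target]) if not m]
            slope, intercept, spread = fit_line(
                [data[predictor][i] for i in rows],
                [data[target][i] for i in rows],
            )
            for i, m in enumerate(missing[target]):
                if m:
                    guess = intercept + slope * data[predictor][i]
                    # Random noise keeps the uncertainty in the imputed value.
                    data[target][i] = guess + rng.gauss(0, spread)
    return data

cols = {
    "height": [150, 160, None, 170, 180, 175, None, 165],
    "weight": [50, 58, 63, None, 80, 74, 69, 60],
}
rng = random.Random(0)
completed = [impute_once(cols, rng) for _ in range(5)]   # 5 imputations
estimates = [mean(d["height"]) for d in completed]
pooled = mean(estimates)      # combined estimate of the mean height
between = stdev(estimates)    # spread across the imputed copies
print(pooled, between)
```

The spread between the five estimates is what single imputation cannot give: it shows how much the answer depends on the filled-in values.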
CASES for imputation
- First of all, if the values are missing purely at random, go for single imputation.
- If the missingness is related to other features, go for multiple imputation.
- If the output (target) itself is missing, go for advanced imputation techniques.
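One rough way to check whether missingness is related to another feature is to compare that feature between the rows where the value is missing and the rows where it is present. The sketch below assumes made-up `income` and `age` columns, and `missingness_gap` is a hypothetical helper. It is only a quick look, not a formal test.

```python
from statistics import mean

def missingness_gap(target, other):
    """Mean of `other` where `target` is missing minus mean where it is present."""
    when_missing = [o for t, o in zip(target, other) if t is None]
    when_present = [o for t, o in zip(target, other) if t is not None]
    return mean(when_missing) - mean(when_present)

# Made-up data: income is mostly missing for older people
income = [30, None, 45, None, 52, 38, None, 41]
age = [25, 61, 33, 58, 40, 29, 65, 35]

gap = missingness_gap(income, age)
print(gap)
```

A gap close to zero fits the random case, so single imputation may be enough. A large gap, as in this made-up data, suggests the missingness depends on `age`, which points toward multiple imputation.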
SMOTE: often used as a combination of both over- and under-sampling.
It Synthetically (S) creates Minority (M) class samples for Over-sampling (O) using this TEchnique (TE); under-sampling the majority class alongside it gives a good class ratio.
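A small sketch of the idea, using only the standard library. Each synthetic minority point is placed somewhere on the line between a real minority point and one of its nearest minority neighbours, and the majority class is randomly under-sampled. The points, `smote_like`, and `undersample` are made up for illustration; the imbalanced-learn library has tested versions of both steps.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

def undersample(majority, n_keep, seed=0):
    """Keep a random subset of the majority class."""
    return random.Random(seed).sample(majority, n_keep)

# Made-up 2-D points: 4 minority vs 20 majority
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(i), float(i % 5) + 3.0) for i in range(20)]

new_minority = minority + smote_like(minority, n_new=6)   # over-sample to 10
new_majority = undersample(majority, n_keep=10)           # under-sample to 10
print(len(new_minority), len(new_majority))
```

The ratio goes from 4:20 to 10:10. The synthetic points are new values, not copies, but they always stay between existing minority points.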