As an experienced statistician committed to continued growth and innovation, I focus my research on developing methodologies that improve prediction results from large, imperfect datasets. The merit of these methodologies is that they adapt to big data and require no statistical assumptions. My dissertation designs experiments that evaluate how large amounts of missing data affect regression results, with the aim of providing guidelines for avoiding biased inferences. The major implication is higher prediction accuracy and computational efficiency at lower cost. With enhanced datasets, shallow learning models (e.g., linear regression) are widely expected to perform better than deep learning models (e.g., CNNs) on big data.