Researcher(s)
Yao Du, Zehua (David) Wang, Victor Leung, Cyril Leung
Date of Publication
Description
Data quality assessment is critical for distributed machine learning (DML). Data collected from heterogeneous Internet of Things (IoT) devices may contain biased information that decreases the prediction accuracy of DML models. To address this challenge, we propose a blockchain-based approach to assess the quality of data that are not independent and identically distributed (non-IID). A blockchain running atop mobile edge computing (MEC) helps protect the privacy, security, and integrity of healthcare data when IoT devices are connected to MEC servers. Therefore, it is critical to integrate a data quality assessment module on the blockchain when building a blockchain-enabled DML system. In this paper, we jointly consider the information loss and marginal utility of non-IID data samples. Specifically, we use Kullback-Leibler (KL) divergence to evaluate the information loss between IID and non-IID data samples and apply the reciprocal of data quantity to model the marginal utility of data samples. Human activity and handwritten digit recognition data sets are used for performance evaluation. Experiments show that our proposed scheme outperforms the benchmarks in model test accuracy on various non-IID data samples.
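The two quantities named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the IID reference is a uniform distribution over class labels, that the divergence is taken from the device's label distribution to that reference, and that marginal utility is simply 1/n for a device holding n samples. The function names and the toy labels are hypothetical, and how the paper combines the two terms into a single quality score is not stated here, so the sketch reports them separately.

```python
import math
from collections import Counter


def label_distribution(labels, num_classes):
    """Empirical class distribution of one device's labels (classes 0..num_classes-1)."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def kl_divergence(p, q):
    """D_KL(p || q) in nats; terms with p_i == 0 contribute 0 by convention.

    Assumes q_i > 0 wherever p_i > 0 (true for a uniform reference).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def marginal_utility(num_samples):
    """Reciprocal of data quantity (assumed form: 1 / n)."""
    return 1.0 / num_samples


# Hypothetical device holding a skewed (non-IID) set of labels.
labels = [0, 0, 0, 1]
num_classes = 2

p_device = label_distribution(labels, num_classes)   # [0.75, 0.25]
q_iid = [1.0 / num_classes] * num_classes            # assumed uniform IID reference

info_loss = kl_divergence(p_device, q_iid)
utility = marginal_utility(len(labels))

print(f"information loss (KL): {info_loss:.4f}")
print(f"marginal utility (1/n): {utility:.4f}")
```

A device whose labels match the reference distribution yields a divergence of zero, and the divergence grows as the label distribution becomes more skewed. The 1/n term shrinks as a device contributes more samples, which is one way to express diminishing returns from additional data.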
External Link