Paper Title
Air Pollution Hotspot Detection and Source Feature Analysis using Cross-domain Urban Data
Paper Authors
Paper Abstract
Air pollution is a major global environmental health threat, particularly for people who live or work near pollution sources. Areas adjacent to pollution sources often have high ambient pollution concentrations and are commonly referred to as air pollution hotspots. Detecting and characterizing pollution hotspots is of great importance for air quality management, but is challenging due to the high spatial and temporal variability of air pollutants. In this work, we explore the use of mobile sensing data (i.e., data collected by air quality sensors installed on vehicles) to detect pollution hotspots. One major challenge with mobile sensing data is uneven sampling, i.e., the amount of data collected varies across both space and time. To address this challenge, we propose a two-step approach to detect hotspots from mobile sensing data, consisting of local spike detection and sample-weighted clustering. Essentially, this approach tackles the uneven sampling issue by weighting samples based on their spatial frequency and temporal hit rate, so as to identify robust and persistent hotspots. To contextualize the hotspots and discover potential pollution source characteristics, we explore a variety of cross-domain urban data and extract features from them. As a soft validation of the extracted features, we build hotspot inference models for cities with and without mobile sensing data. Evaluation results using real-world mobile sensing air quality data as well as cross-domain urban data demonstrate the effectiveness of our approach in detecting and inferring pollution hotspots. Furthermore, the empirical analysis of hotspots and source features yields useful insights regarding neighborhood pollution sources.
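The two-step idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the spike rule, the weighting formula, or the clustering algorithm, so everything here is an assumption made for illustration. The function names `detect_spikes` and `persistent_hotspots`, the median-based spike rule, the parameters `window`, `factor`, `cell_size`, `min_visits`, and `min_hit_rate`, and the use of a simple grid aggregation as a stand-in for sample-weighted clustering are all hypothetical. The temporal hit rate is taken here to mean the fraction of visited days on which a cell showed a spike.

```python
import math
from collections import defaultdict
from statistics import median


def detect_spikes(readings, window=2, factor=1.5):
    """Flag each reading that exceeds `factor` times the median of its neighbors.

    Assumed spike rule for illustration only; `readings` must be a list.
    """
    flags = []
    for i, value in enumerate(readings):
        # Up to `window` readings on each side, excluding the reading itself.
        neighbors = readings[max(0, i - window):i] + readings[i + 1:i + 1 + window]
        flags.append(bool(neighbors) and value > factor * median(neighbors))
    return flags


def persistent_hotspots(samples, cell_size=1.0, min_visits=3, min_hit_rate=0.5):
    """Return {cell: hit_rate} for grid cells that spike on most visited days.

    `samples` is an iterable of (day, x, y, is_spike) tuples. Grid aggregation
    is a simplified stand-in for sample-weighted clustering.
    """
    visited_days = defaultdict(set)  # cell -> days on which the cell was sampled
    spike_days = defaultdict(set)    # cell -> days on which a spike was observed
    for day, x, y, is_spike in samples:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        visited_days[cell].add(day)
        if is_spike:
            spike_days[cell].add(day)

    hotspots = {}
    for cell, days in visited_days.items():
        # Hit rate normalizes by visits, so heavily sampled cells are not favored.
        hit_rate = len(spike_days[cell]) / len(days)
        if len(days) >= min_visits and hit_rate >= min_hit_rate:
            hotspots[cell] = hit_rate
    return hotspots
```

Normalizing spike days by visited days is one way to keep heavily driven streets from dominating the result, while the `min_visits` floor guards against declaring a hotspot from a single pass. A cell seen on four days with spikes on three of them would be kept under these defaults, whereas a cell with one visit and one spike would not.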