|Abstract:||As urbanization accelerates, systematically modeling people's activities in urban space is increasingly recognized as a crucial socioeconomic task. This task was nearly impossible years ago due to the lack of reliable data sources, yet the emergence of geo-tagged social media (GTSM) data sheds new light on it. Recent studies have made progress on discovering geographical topics from GTSM data. However, their high computational costs and strong distributional assumptions about the latent topics prevent them from fully exploiting GTSM data.
To bridge this gap, we present CrossMap, a novel cross-modal representation learning method that uncovers urban dynamics from massive GTSM data. CrossMap first extracts activity-related tweets by measuring the dispersion degree of each keyword, and then applies an accelerated mode-seeking procedure to the extracted tweets to detect the spatiotemporal hotspots underlying people's activities. The detected hotspots not only capture spatiotemporal variations, but also largely alleviate the sparsity of the GTSM data. With the detected hotspots, CrossMap then jointly embeds all spatial, temporal, and textual units into the same space using two different strategies: one reconstruction-based and the other graph-based. Both strategies capture the correlations among the units by encoding their co-occurrence and neighborhood relationships, and learn low-dimensional representations that preserve these correlations. Our experiments show that CrossMap not only significantly outperforms state-of-the-art methods for activity recovery, but also greatly benefits downstream applications such as activity classification. Further, CrossMap can process millions of GTSM records within minutes, making it suitable for monitoring large-scale GTSM streams in practice.
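To make the hotspot-detection step concrete, the following is a minimal sketch of plain (non-accelerated) mean shift, a standard mode-seeking procedure, applied to 2-D points. It is an illustration under stated assumptions, not CrossMap's actual implementation: the function name `mean_shift_modes`, the Gaussian kernel, and the `bandwidth` and `merge_radius` parameters are all hypothetical choices made for this example.

```python
import numpy as np

def mean_shift_modes(points, bandwidth, max_iter=100, tol=1e-6, merge_radius=None):
    """Plain Gaussian-kernel mean shift (illustrative; not the accelerated variant).

    Returns (modes, labels): the detected density modes ("hotspots") and,
    for each input point, the index of the mode it converged to.
    """
    points = np.asarray(points, dtype=float)
    shifted = points.copy()
    if merge_radius is None:
        merge_radius = bandwidth / 2.0  # assumption: modes closer than this are merged

    for _ in range(max_iter):
        # Squared distances from every shifted point to every original point.
        diff = shifted[:, None, :] - points[None, :, :]
        sq_dist = np.sum(diff ** 2, axis=2)
        # Gaussian kernel weights; nearby points pull harder.
        weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        # Move each shifted point to the weighted mean of the original points.
        new_shifted = weights @ points / weights.sum(axis=1, keepdims=True)
        moved = np.max(np.linalg.norm(new_shifted - shifted, axis=1))
        shifted = new_shifted
        if moved < tol:
            break

    # Greedily merge converged points that landed on (nearly) the same mode.
    modes, labels = [], []
    for p in shifted:
        for idx, m in enumerate(modes):
            if np.linalg.norm(p - m) <= merge_radius:
                labels.append(idx)
                break
        else:
            modes.append(p)
            labels.append(len(modes) - 1)
    return np.array(modes), np.array(labels)
```

In this sketch each detected mode stands in for a hotspot, and the label array assigns every record to one, which is how grouping records under hotspots can reduce sparsity. The pairwise distance matrix makes this version quadratic in the number of points, so it is suitable only for small inputs.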
We further extend our model in two ways. First, we adopt a novel semi-supervised learning paradigm that leverages activity category information to guide the embedding learning process and produce higher-quality embeddings. Second, to overcome the inability of existing models to dynamically accommodate the latest information in the GTSM stream, we propose a method that processes continuous GTSM streams and obtains recency-aware urban activity models on the fly, so that the models reflect up-to-date urban activities.
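One simple way to picture recency awareness over a stream is to decay older co-occurrence evidence exponentially as new records arrive. The sketch below illustrates that general idea only and is not necessarily the method proposed here; the class `DecayedCooccurrence`, the `half_life` parameter, and the unit names in the usage note are hypothetical. It assumes timestamps arrive in non-decreasing order.

```python
import math
from collections import defaultdict
from itertools import combinations

class DecayedCooccurrence:
    """Co-occurrence weights between units, with exponential time decay (illustrative)."""

    def __init__(self, half_life):
        # After `half_life` time units, an old observation counts half as much.
        self.decay_rate = math.log(2) / half_life
        self.weights = defaultdict(float)
        self.last_time = None

    def update(self, timestamp, units):
        """Decay all existing weights to `timestamp`, then add the new record."""
        if self.last_time is not None:
            factor = math.exp(-self.decay_rate * (timestamp - self.last_time))
            for pair in self.weights:
                self.weights[pair] *= factor
        self.last_time = timestamp
        # Every pair of distinct units in the record co-occurs once.
        for a, b in combinations(sorted(set(units)), 2):
            self.weights[(a, b)] += 1.0

    def weight(self, a, b):
        """Current decayed co-occurrence weight of the unordered pair (a, b)."""
        return self.weights.get(tuple(sorted((a, b))), 0.0)
```

For example, with `half_life=10`, a pair such as `("beach", "hotspot_1")` observed at times 0 and 10 would carry weight 0.5 + 1 = 1.5 at time 10, since the first observation has decayed by half. Weights like these could then serve as edge weights when training embeddings on the most recent data.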