Urban Sensing Based on Mobile Phone Data: Approaches, Applications, and Challenges

2020-05-21 05:42 Mohammadhossein Ghahramani, MengChu Zhou, and Gang Wang
IEEE/CAA Journal of Automatica Sinica, 2020, Issue 3

Mohammadhossein Ghahramani, MengChu Zhou, and Gang Wang

Abstract: Data volume grows explosively with the proliferation of powerful smartphones and innovative mobile applications. The ability to accurately and extensively monitor and analyze these data is necessary. Much of the interest in cellular data analysis relates to human beings and their behaviors. Because of the potential value of these massive data, different approaches have been proposed for understanding the corresponding patterns. To that end, analyzing people’s activities, e.g., counting them at fixed locations and tracking them by generating origin-destination matrices, is crucial. The former can be used to determine the utilization of assets like roads and city attractions. The latter is valuable when planning transport infrastructure. Such insights allow a government to predict the adoption of new roads and new public transport routes, plan modifications of existing infrastructure, and detect congestion zones, resulting in more efficient designs and improvements. Smartphone data exploration can help research in various fields, e.g., urban planning, transportation, health care, and business marketing. It can also help organizations in decision making, policy implementation, monitoring, and evaluation at all levels. This work aims to review the methods and techniques that have been implemented to discover knowledge from mobile phone data. We classify these existing methods and present a taxonomy of the related work by discussing their pros and cons.

I. Introduction

SMARTPHONES have developed rapidly in recent years and are becoming the central devices of communication and computing in people’s daily lives. This tremendous growth in usage has improved people’s lives economically and socially. Along with this development, mobile phone sensing has also become popular because of its convenience. These sensor-based devices can record conversations, movements, and activity states of individuals. Due to the widespread availability of smartphones and other mobile sensing-capable devices, sensor information has become commonplace. Large data sets of human behavior are being collected and used to gain insights into human interactions. They are utilized to target social activities, guide traffic, post advertisements, and support health care. For instance, they can be used for real-time monitoring of population density in urban areas or for understanding the spread of diseases and providing procedural guidance accordingly. Furthermore, the smartphone has become a tool for economic growth and development. The extensive use of mobile applications has provided opportunities such as financial transactions through mobile devices (i.e., mobile payment) and entertainment applications. Reality Mining is the name coined for the exploration of this type of data. It can be defined as a system’s ability to regulate and extract a set of meaningful user behavioral patterns [1].

Smartphone data have been exploited in different directions, such as mobility paths, city-wide sensing applications, traffic planning, and route prediction. Previous work on their utilization accentuates their high potential in reading fine-grained variations of human movements. However, there is a disconnection between high-level mobility path information and low-level location data. Hence, proposing an appropriate method to deal with low-level location data and access meaningful users’ mobility patterns is crucial. It is worth mentioning that the methods proposed in the literature share a common assumption: a mobility/interaction path is defined so that cell phone users’ mobility/interaction patterns can be obtained at an abstraction level [2].

Smartphone data have distinctive characteristics that attract researchers and organizations to exploit them. The research undertaken in the past has resulted in different types of mobile sensing methodologies. They are based on position tracing or mobile positioning, i.e., tracing location coordinates of cell phones. Many location-based services (LBSs) integrate geographical information systems (GIS), global positioning systems (GPS), and the Internet to suggest social activities and promotions. LBSs record people’s movement, their flows, and events. Smartphone positioning can be categorized into active and passive approaches. The former is used for handset tracking, in which the device location is determined by a specific query using radio waves (i.e., network-based methods such as cell ID tracking and triangulation), known as pinging. The latter analyzes data that are already stored through regular operations, i.e., billing data. This method needs the ability to carry out distance-based billing. Calls and SMSs that are sent or received generate records containing the cell IDs where they take place, allowing the phone’s approximate location to be determined. By retrieving and analyzing such positioning information generated from mobile networks, mobile operators gain significant insights for designing effective strategies.

Various data/service provisioning approaches and applications have also been employed in mobile health, collaborative learning, and context-aware/location-based computing. We can categorize data/service combination into two distinctive directions, i.e., bottom-up and top-down. The former consists of an executable workflow, including known services, while the latter includes a non-executable scheme and a service selection phase. Given the highly heterogeneous characteristics of mobile computing, i.e., pervasive access to mobile services and ubiquitous communication among mobile devices, analyzing/tracing mobile data is not a trivial task. Consider the situation where cell phones are located outside of the communication range or when they are in offline mode. In these situations, mobile positioning and service provisioning are impossible. Therefore, an effective architecture for mobile service provisioning to address the challenges of service selection, e.g., avoiding frequent service recomposition, should be considered [3].

In this work, our goal is to review the techniques and methodologies that have been used in the literature for smartphone data exploration. We study the solutions that researchers have proposed to analyze people’s behaviors and their consistent patterns. We provide a typology of mobile phone data utilization in the urban sensing domain and compare different analysis approaches and end-uses for decision-making systems. We also aim to provide a taxonomy of challenges and issues that require careful consideration in the data acquisition and analysis phases. This work scrutinizes the proposed approaches/strategies and assesses existing challenges. We investigate their advantages and drawbacks and discuss various barriers that need to be addressed.

The remainder of this paper is organized as follows. In Section II, different strategies for tracking and exploring mobile phone devices are classified. Section III discusses the utilization of cell phone data for urban planning and introduces existing strategies and approaches for collecting and analyzing mobile phone data. Some case studies and empirical applications of mobile phone data are provided in Section IV. Section V presents potential challenges. Finally, Section VI concludes this paper.

II. Classification of Strategies

Most of the studies focusing on mobile phone data exploration aim to investigate human positioning. Unlike other movement tracking techniques, such as road sensors, ticket tracking, and surveys, the collection of cell phone location data provides widespread coverage of the population in real time. There are many methods for locating a mobile phone’s position, e.g., using built-in components. The most renowned is satellite positioning using GPS. Other technologies such as Wi-Fi and Bluetooth can also be employed [4]–[6]. Mobile phone positioning can be divided into two main categories: 1) network-based positioning; 2) handset-based positioning. Given the different characteristics of these two strategies, e.g., line-of-sight and network coverage, the accuracy of a positioning technique can vary.

A. Network-Based Positioning

This method includes cell activity and active/passive network querying. Inferring positions based on cell activity is simple to implement. However, because of uncertainty in spatial accuracy and the fact that this technique only counts handsets on a call, it is not a practical approach, and population tracking can be biased. To address this concern, active network querying methods (e.g., round trip time, angle of arrival (AOA), and triangulation) have been considered [7]. Although the population’s accurate locations can be polled with such methods, there are still some drawbacks, e.g., generating additional traffic on the network. Each phone should send information to a monitoring system, which could increase the communication load on the network and the energy consumption of the handset. Since cellular networks are designed to deal with normal loads, there is probably inadequate capacity to handle the sequential pinging of all phones. Hence, active querying is impractical for tracking the entire population and is only suitable for locating a small subset of handsets. Because of these problems, passive network querying techniques are needed.
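As a rough illustration of how an active query can yield a position, the sketch below converts round-trip times into ranges and solves a planar trilateration from three base stations. The function names, the planar coordinates, and the assumption of zero processing delay are illustrative choices and are not taken from [7].

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def rtt_to_distance(rtt_seconds):
    """One-way range implied by a round-trip time.

    Assumes the handset replies instantly (no processing delay).
    """
    return SPEED_OF_LIGHT * rtt_seconds / 2.0


def trilaterate(stations, distances):
    """Estimate (x, y) from three station positions and three ranges.

    Subtracting the first circle equation from the other two gives a
    2x2 linear system, which is solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = stations
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("stations are collinear; position is not unique")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical layout: three stations (metres) and a handset at (300, 400).
stations = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
ranges = [500.0, 650000.0 ** 0.5, 450000.0 ** 0.5]
estimate = trilaterate(stations, ranges)
```

With noisy ranges the three circles no longer meet in one point, so a practical system would use more stations and a least-squares fit. Every such query also costs network traffic, which is the drawback discussed above.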

When a phone is in its active mode (either calling or sending/receiving SMSs), its corresponding base station is logged continuously. In its idle mode, the information is stored once an hour. These data include the cell ID of the base station a handset is connected to and a time stamp. By passively scanning all of them, it is possible to track the locations of handsets in the network. This method is accurate to the nearest cell ID, can track journeys, and works wherever there is coverage. As mentioned above, the sample rate is around once per hour in idle mode, but can easily be increased by the carrier at the cost of additional network traffic. In other words, passive scanning can be used in conjunction with an active scanning method in cases where the location information of some handsets is needed more frequently. Table I summarizes various network-based positioning methods in terms of their strengths and weaknesses.
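The passive records described above (a cell ID plus a time stamp) can be turned into a coarse journey by collapsing consecutive observations at the same cell. The following is a minimal sketch under that assumption; the record layout and the function name are hypothetical.

```python
from datetime import datetime


def cell_trajectory(records):
    """Collapse (timestamp, cell_id) logs into a sequence of visits.

    Returns a list of (cell_id, first_seen, last_seen) tuples, one per
    run of consecutive records observed at the same cell.
    """
    visits = []
    for ts, cell in sorted(records):
        if visits and visits[-1][0] == cell:
            # Same cell as the previous record: extend the current visit.
            visits[-1] = (cell, visits[-1][1], ts)
        else:
            visits.append((cell, ts, ts))
    return visits


# Hypothetical log for one handset: hourly idle updates plus one call at 09:20.
logs = [
    (datetime(2020, 1, 6, 8, 0), "A"),
    (datetime(2020, 1, 6, 9, 0), "A"),
    (datetime(2020, 1, 6, 9, 20), "B"),
    (datetime(2020, 1, 6, 10, 0), "C"),
    (datetime(2020, 1, 6, 11, 0), "C"),
]
trajectory = cell_trajectory(logs)
```

The result is only as fine as the cell layout and the logging rate: a handset that crosses several cells between two hourly updates leaves no trace of the intermediate cells.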

Since network-based strategies are applicable only on the operator’s side, most of the work in the literature has focused on handset-based data sets and their relevant strategies. In the following sections, we study the methods that have been implemented based on handset-based data sets for exploring urban dynamics.

B. Handset-Based Positioning

Typically, handset-based data include handover records, location data, and call detail records (CDRs). Handover data are logs of a user’s movement from one cell tower to another during an active call. Location data include periodic location updates of cell towers. A mobile switching center (MSC) initiates a transition update in one of the location register databases, i.e., the home location register (HLR) or the visitor location register (VLR), when a location variation is detected. Due to the lack of incentive for long-term storage, it is difficult to obtain HLR and VLR data from operators. In contrast, CDRs are easy to obtain as they are required for legal compliance [8] and thus stored for a long period. They contain information about all interactions between a mobile network and its subscribers that are needed for billing purposes. Among these data, there is also information on which base station subscribers are connected to. These data can be used to obtain valuable information about movements.

TABLE I Comparison of Network-Based Techniques

Although mobile phone data are available on an operator’s side, researchers face difficulties in acquiring them, most notably due to privacy concerns and business confidentiality issues [9]. As a result, some approaches have emerged that aim to address these issues either by placing embedded applications/sensors on a handset to log data or by constructing data monitoring platforms [10]. Among the most prominent is the widely cited Reality Mining data set, an effort conducted at the MIT Media Laboratory. It follows nearly one hundred subjects whose mobile phones are pre-installed with applications that record and send data about call logs, Bluetooth devices within a proximity of approximately five meters, cell tower IDs, application usage, and handset status. Subjects, including students and faculty, are observed by using these measurements over nine months. It also collects self-reported relational data from individuals [11]. In [12], the authors have utilized the MIT data sets to present a visualization system for exploring the spatial and temporal data set. They have introduced a heterogeneous network to explore social-spatial data in a 2D graph visualization. A visual interface for performing semantic and temporal filtering is then proposed to support large-scale cell phone data investigation. Ficek et al. [13] have proposed a method for location data retrieval using the MIT data set. They have conducted statistical analysis of such location measurements, i.e., people’s mobility patterns, spatial trajectory investigation, and spatial-temporal data analysis. It should be mentioned that collecting data from embedded applications requires the cooperation of handset owners, who must install applications to enable the logging procedures. This cannot be widely accepted, primarily owing to privacy concerns.

III. Urban Management

A better understanding of when, where, and how individuals behave, particularly in populated regions, can lead to better urban infrastructure design. To that end, the dynamics of urban space and transportation should be explored. For example, understanding the flow of people and where they live is essential for urban planning. Such insights can help organizers manage traffic flow and plan public transportation services. Innovative ways of assessing urban dynamics and analyzing human behavior with mobile phone data have been considered. Smart cities incorporate pervasive and ubiquitous technologies to deal with environmental challenges. A multi-tier architecture for smart cities, consisting of various layers, e.g., human, service, infrastructure, and data layers, can be considered. All these layers should be interrelated. In this regard, related efforts have been made and different smart city perspectives, e.g., mobility and intelligent transportation, have been studied. In this section, the application of mobile phone data in achieving sustainable urban development is discussed. The aim is to explore whether and how research can support operations in cities by using a fine-grained data set. We intend to highlight various ways in which smartphone data sets have been utilized for urban development and to understand the increasing complexity of people’s movements while considering the limitations and potentialities.

A. Urban Dynamics

The increasing penetration of mobile phones has made them attractive as urban monitoring sensors. When a mobile phone is handed over from one cell tower to another, the area in which the mobile phone is located can be traced. This capability of smartphones, e.g., spatial coverage, together with their high penetration in the population, provides an opportunity to obtain valuable information cost-effectively. Both network-based and handset-based data sets can be utilized for analyzing urban dynamics. The former can help estimate the population within a cell’s coverage area. With handset-based data, a pre-recorded database of signal strength fingerprints can be queried to trace mobility. The latter is more accurate but much more time-consuming than the former.

Understanding urban structure has meaningful applications in various fields, including public transportation and location-based recommender systems. Thus, it is necessary to identify the relevant characteristics for a better understanding of such spatial structure. Research in this area aims to investigate dynamics by revealing the locations and intensities of urban activities and to analyze spatial mobility patterns. Human movement tracing is also needed to analyze how such activities can affect urban geographical space. Therefore, monitoring human movement is essential. In [14], the authors explain how data mining methods can be combined with large-scale multimedia storage. Their proposed approach can help mine massive amounts of user-generated content (UGC) and gain insights into different perspectives of urban reality. They have presented three cases where UGC is employed to discover a citizen’s perspective: city attractions, city issues/problems, and major events in the city. Chen et al. [15] have proposed a popularity index of a channel to identify the hot-lines based on a CDR data set. The density of users that travel across one channel and the diversity of travel behaviors are combined to infer each channel’s popularity level. In [16], the authors propose an analytical procedure intended to extract interconnections among different zones of a city, which emerge from highly correlated temporal variations of local population densities. First, a method to estimate the presence of people in different geographical areas is presented; then, they propose a method to extract spatially and temporally constrained patterns to obtain correlations among geographical areas in terms of considerable covariations of the estimated presence. They have combined these two methods to deal with realistic scenarios of different spatial scales. Some works have proposed a set of models for inferring the number of vehicles moving from one cell to another using anonymous data [17], [18]. These models contain terms related to a user’s calling behavior and other characteristics of the phenomenon, such as hourly intensity in cells and vehicles. A set of inter-cell boundaries with different traffic backgrounds and features has been selected for the field test.
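To make the idea of zones linked by correlated temporal variations of presence concrete, the sketch below computes the Pearson correlation between hourly presence series and keeps the strongly correlated pairs. This is a deliberately simplified stand-in: the pattern-extraction method of [16] is not reproduced here, and the zone names, counts, and threshold are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (vx * vy) ** 0.5


def correlated_zone_pairs(presence, threshold=0.9):
    """Return (zone_a, zone_b, r) for every zone pair with r >= threshold."""
    zones = sorted(presence)
    pairs = []
    for i, a in enumerate(zones):
        for b in zones[i + 1:]:
            r = pearson(presence[a], presence[b])
            if r >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical estimated presence per zone over four consecutive hours.
presence = {
    "centre": [10, 20, 40, 80],
    "station": [5, 10, 20, 40],
    "suburb": [80, 40, 20, 10],
}
linked = correlated_zone_pairs(presence)
```

Here only the centre and the station rise and fall together, so they form the single linked pair, while the suburb moves in the opposite direction.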

Regardless of the benefits of these approaches, due to inherent characteristics of mobile network geolocation, two consecutive measured spatial points might be separated by long distances and long periods. The corresponding trajectories may therefore be unreliable and cannot be considered a precise representation of individuals’ real paths. To overcome these concerns, Calabrese et al. describe a real-time urban monitoring system that uses the Localizing and Handling Network Event Systems platform. This system is developed for real-time evaluation of urban dynamics based on anonymous monitoring of mobile cellular networks [19]. Through the use of several probes, it extracts all traveling signals and stores measurements made by all active mobile phones. They have also visualized urban dynamics by using the developed real-time control system for cities.

B. Understanding Mobility Flows

Mobile phone data allow visualizing the flow of people throughout the entire urban system. They can be used to develop predictive models at a city scale as a low-cost estimation of traffic. These data sets can help one perform urban management, route planning, traffic estimation, emergency detection, and general traffic monitoring. Moreover, mobile data can be regarded as operational information for city administration by aggregating people’s traces and collecting the mobile phone traffic that results from their behaviors. To capture mobility flows, some researchers [20]–[22] have used handover data collected from cellular towers. After pre-processing the data, they have studied flows via visualization software (e.g., GIS) and by using machine learning methods (e.g., classification algorithms). A qualitative interpretation of how handover data can be useful in highlighting the flow of people in urban infrastructures is then provided. It has been demonstrated that there is a significant association between cell towers with a high number of incoming handovers and a high presence of people in their vicinity. Moreover, a high number of handovers denotes greater movement. Notwithstanding the association between handover and traffic volume, there is a primary limitation associated with this analysis: handover data are limited to mobile phones that are actively making calls, and the associated calls must last long enough to traverse the boundary between two cells. Thus, it is not possible to make a direct correspondence between handover and traffic counts. These data sets are also coarse in space because they record locations at the granularity of a cell tower. Hence, analysis can be biased by temporal or spatial variations. In [23], by utilizing a set of signaling events generated by active and idle devices, the authors have tried to overcome these drawbacks. While idle mobile phones provide a large volume of coarse-grained mobility data, active devices contribute fine-grained spatial accuracy for a limited subset of devices. The combined use of data from active and idle handsets enhances congestion detection in terms of accuracy, coverage, and timeliness.

In [24], the authors analyze different characteristics of human mobility by using billing data of more than one million anonymous users stored for seven days. They have proposed a method for recognizing the location of employment based on the regularity of an individual’s trajectory. The residents’ mobility is analyzed based on active cell phones to observe different modes of mobility, i.e., partial and overall. Iqbal et al. have proposed an approach to develop origin-destination (OD) matrices using traffic counts and CDRs [25]. First, they analyze CDRs, including time-stamped cell tower locations and callers’ IDs. Then, they use trips occurring within specified time windows to construct tower-to-tower transient OD matrices for different periods. These matrices are associated with the corresponding nodes and transformed into node-to-node transient OD matrices. The actual OD matrices are estimated by using a microscopic traffic simulation platform. An optimization-based method is then implemented to determine the scaling factors that result in the best matches with the observed traffic counts. A methodology for estimating passenger demand is presented in [26]. The inhabitants’ trajectories have been extracted and utilized to build OD matrices. Given the insights achieved, the authors suggest strategic locations for public transport services. In [27], Toole et al. have presented algorithms to create routable road networks and to generate verified OD matrices and trip summaries. They have routed these trips through road networks by using a parallel incremental traffic assignment algorithm. Aguiléra et al. [28] show that the specific conditions under which a cellular phone network is operated underground can make it possible to estimate passenger flows in an underground transit system. With the help of a mobile network operator, they have conducted experiments in an underground transit system to assess the potential of mobile phone data in the transportation context. They have also explored the dynamic nature of a transportation system, i.e., travel time, OD flows, and train occupancy levels, from their cellular data set. The derived results are compared with other methods based on automatic fare collection data and direct field observations provided by the public transport authority.
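The tower-to-tower step described for [25] can be sketched as follows. This is a simplified illustration under stated assumptions rather than the published procedure: the time window bounds, the record layout, and the use of a single least-squares scaling factor on OD pairs (instead of simulation-based matching against link counts) are all hypothetical.

```python
from collections import defaultdict


def transient_od(cdrs, min_gap=600, max_gap=3600):
    """Count tower-to-tower trips from (user_id, unix_time, tower_id) records.

    A trip is two consecutive records of one user at different towers
    whose time gap lies within [min_gap, max_gap] seconds (assumed window).
    """
    by_user = defaultdict(list)
    for user, t, tower in cdrs:
        by_user[user].append((t, tower))
    od = defaultdict(int)
    for events in by_user.values():
        events.sort()
        for (t0, a), (t1, b) in zip(events, events[1:]):
            if a != b and min_gap <= t1 - t0 <= max_gap:
                od[(a, b)] += 1
    return dict(od)


def least_squares_scale(od_flows, observed_counts):
    """Single factor s minimising sum((s * flow - count)^2) over shared pairs."""
    shared = [k for k in od_flows if k in observed_counts]
    num = sum(od_flows[k] * observed_counts[k] for k in shared)
    den = sum(od_flows[k] ** 2 for k in shared)
    if den == 0:
        raise ValueError("no overlapping non-zero flows to scale")
    return num / den


# Hypothetical records: u1 and u2 travel T1 -> T2; u3's gap is too long.
cdrs = [
    ("u1", 0, "T1"), ("u1", 1200, "T2"), ("u1", 1500, "T2"),
    ("u2", 100, "T1"), ("u2", 1900, "T2"), ("u2", 2000, "T2"),
    ("u3", 0, "T2"), ("u3", 7200, "T1"),
]
od = transient_od(cdrs)
scale = least_squares_scale(od, {("T1", "T2"): 50})
```

The scaling factor is needed because CDRs only capture subscribers of one operator who happened to use their phones at both ends of a trip, so raw trip counts understate real flows.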

Utilizing mobile phone data to reveal insights, e.g., OD matrices, is much faster than traditional surveying methods. However, there are serious concerns regarding employing them.

1) Origin-destination matrices are generated by aggregating the identified trips at the cell tower locations at a given time. Consider a situation where a single cell tower covers a large area. In such circumstances, intra-area movement cannot be traced. Hence, low sampling and penetration rates can negatively affect the validity of OD estimation. Ideally, additional mobility information should be integrated to validate the revealed pattern.

2) Identifying the location where mobile phone owners live and work can be beneficial to infer their trips, behaviors, and consequently improve the validity of the analysis; however, there are privacy concerns.

3) Assuming some hypotheses such as uniform distribution and duration threshold can yield biased results since parameter-based models are highly sensitive to them.

4) Mobile network coverage depends on traffic and local topography. Defining the boundaries of the coverage area and accounting for their impact on the construction of OD matrices are not trivial tasks.

5) Handset-based data are generated when a subscriber is active, i.e., making or answering a call. Thus, the location of subscribers might not be updated and the analysis can be biased by frequent users.

6) Uneven distribution of mobile phones in a geographical region can negatively affect the analysis results.

Exploring the use of mobile phone data indicates that such utilization is a promising way of analyzing urban dynamics and can be regarded as a complementary approach to the detection of movement patterns.

C. Intelligent Transportation

The data collected with travel questionnaires have been used to provide primary information for public transport providers, traffic planners, and infrastructure authorities [21]. These data sets are the basis for routing, transportation modeling, and optimization. The obtained data can be processed to reveal valuable insights about travel behavior in different areas. A traffic information system (TIS) has two monitoring forms: sensor-based and cellular network monitoring. The former is expensive to deploy and maintain and covers only a small fraction of roadways. The latter can alleviate issues like high cost and limited coverage but lacks accuracy. Traffic sensors, e.g., inductive loop detectors, magnetic sensors, video cameras, microwave radars, and infrared sensors, can be embedded in pavements and collect data from all vehicles as they pass over them [29]. These fixed devices can count the number of people and vehicles passing a given point. They allow an operator to see and measure how traffic is flowing at a particular location. Their performance can be degraded by pavement deterioration, improper installation, and weather-related effects. The main drawbacks of these technologies include their cost (e.g., installation, maintenance, operation, and repair cost) and their restricted spatial coverage. To gain a realistic and complete view of traffic conditions, they must be installed in large quantities. Therefore, they cannot be deployed globally at an acceptable resolution. Radio-frequency identification (RFID) transponders, GPS receivers, and mobile phones represent a novel way to monitor traffic data provided by vehicles.

Recently, intelligent transportation systems have come to play a vital role. Their use can reduce traffic congestion and pollution. An intelligent traffic information system (ITIS) can provide individuals with valuable traffic data and help them find the best route. It takes advantage of rapid advances in computers, sensors, and communication technologies. Since individual drivers are potential users of a mobile network, it is natural to consider them as a source of road traffic information. As a mobile network knows the approximate locations of active handsets, the stored data have the potential to revolutionize the study of city dynamics. Thus, the use of cellular data for intelligent monitoring of traffic has become popular. Understanding mobility could support measures for better traffic management and make it easier for governments to forecast traffic demand. It could also lead to more precise decision making in the city context and the transportation planning process. There have been several studies on the use of mobile data to monitor road traffic leveraging intelligent approaches. A typical mobile phone comprises several built-in micro-electro-mechanical systems (MEMS) sensors, e.g., accelerometer, magnetometer, GPS, and approximate network positioning that can be used for human mobility classification. GPS-based approaches have been commonly used to collect mobility-related information within a mobile network [30], [31]. GPS-equipped devices can compute the positions and instantaneous velocity readings of vehicles with high accuracy. They can either transmit their location data in real time or store them in memory for later retrieval. In [32], by utilizing data obtained from smartphones, the authors present an approach to support travel surveys. They have classified the features extracted from the motion trajectory recorded by the positioning system and from the signals of an embedded accelerometer. Although the accuracy of these methods is high, their main drawback is the low penetration of the mentioned technologies in the population. Furthermore, equipping vehicles with a GPS device represents an added cost. Using it requires each phone to send information to a monitoring system, which could increase the communication load and the energy consumption of the handset. Finally, GPS requires line-of-sight access to satellites; hence, it cannot determine an accurate location indoors.

The majority of the literature on traffic monitoring via cellular networks targets non-real-time applications, such as the extraction of traffic flow statistics and origin-destination matrices for urban movement. Only a few studies [19], [23], [29] address the specific problem of real-time road traffic estimation from cellular network signaling. Google Traffic is a feature of Google Maps that displays traffic conditions in real time on major roads and highways. However, it works by analyzing transmitted GPS-determined locations. As discussed earlier, there are some drawbacks to applying GPS-enabled technologies. It seems that an integrated system, one with consolidated phases comprising different layers such as traffic controllers, mobile communication systems, and the in-vehicle terminal, can improve monitoring efficiency. By implementing an effective real-time monitoring system, the information required to alert drivers to problems can be provided. Surveillance over a road, incident detection, and classification of vehicles are supplemental features that enable authorities to implement an efficient and convenient transport system that can detect threats and respond to security incidents to minimize risk.

Table II summarizes the techniques and methodologies that have been utilized and reviewed in this section regarding various movement tracing methods, traffic sensing, and urban planning and compares them in terms of their pros and cons.

IV. Empirical Applications of Mobile Phone Data

Mobile phones are among the technologies from which high-value solutions can be created. Significant changes in regular patterns of human behavior could signal the need for a quick response to an urgent situation. Thus, monitoring behaviors could be used to identify when and where an event has occurred. Given our discussion about positioning methods, we can divide mobile phone data into three main categories: 1) CDRs; 2) LBSs’ data; 3) handover data. These data sets contain the spatio-temporal information of users. These features enable us to represent the intensity of different human behaviors through space and time. As illustrated in Fig. 1, we have located different cell towers based on a CDR data set obtained from a telecommunication company in Macau. These spatial objects can be considered as points referenced by latitude and longitude and can be used to describe geographical patterns of interest. Different strategies regarding spatio-temporal clustering are discussed in more detail next.

A. Spatial-Temporal Analysis

Much of the world’s data can be geo-referenced and consists of measurements or observations taken at specific locations, which indicates the importance of geospatial big data handling. Such data can be points referenced by latitude and longitude or can fall within particular regions, so-called areal data. Related studies aim to describe geographical patterns of interest. Positioning techniques can be used to obtain the spatio-temporal distribution of smartphones, as the resolution of geo-location has improved recently. These investigations have attracted significant attention, specifically in urban planning and transportation studies. Mobile phone interaction can be considered as a function of the overall population and the observed spatial and temporal stationarity of different areas in a city. Given such data sets, we can identify spatial and temporal patterns of mobile phone use and their transformation based on population and density. By exploiting spatial and temporal data, i.e., the coordinates of cell towers and their interactions, we can present a spatio-temporal analysis model to capture the effect of urban density on transportation mode choices or to evaluate trends in human behavior.

In [51], we have utilized different correlation analyses to scrutinize the dynamics of a city. A descriptive spatial autocorrelation analysis (a global approach) is carried out to illustrate the relations among different areas. A local correlation measurement is then conducted to predict significant areas among cell towers. By determining clusters of spatial objects given the temporal characteristics of the CDR, we have predicted the location of hotspots. A kernel density estimation (KDE) method is then applied to the calling behavior dataset to depict these hotspots on the map. This mapping technique identifies the areas where there is a high level of activity in terms of calling patterns. Fig. 2 illustrates the results. We have considered the cell towers as the spatial objects and the frequency of calls as the variable.
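
To make the global autocorrelation step concrete, the sketch below computes global Moran's I, a common statistic for this purpose; whether [51] uses exactly this statistic or weighting scheme is not stated here, so it should be read as an assumption. The binary distance-band weights, the planar coordinates, and the sample call counts are all hypothetical.

```python
import math


def morans_i(coords, values, threshold):
    """Global Moran's I with binary distance-band weights.

    w_ij = 1 if towers i and j are distinct and at most `threshold`
    apart (planar coordinates assumed), otherwise w_ij = 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weighted_cross = 0.0   # sum_ij w_ij * dev_i * dev_j
    weight_sum = 0.0       # sum_ij w_ij
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= threshold:
                weighted_cross += dev[i] * dev[j]
                weight_sum += 1.0
    variance_sum = sum(d * d for d in dev)
    if weight_sum == 0 or variance_sum == 0:
        raise ValueError("Moran's I is undefined: no neighbours or no variance")
    return (n / weight_sum) * (weighted_cross / variance_sum)


# Four hypothetical towers on a line, one distance unit apart.
towers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
clustered = morans_i(towers, [10, 10, 1, 1], threshold=1.0)    # similar neighbours
alternating = morans_i(towers, [10, 1, 10, 1], threshold=1.0)  # dissimilar neighbours
```

Positive values indicate that towers with similar call frequencies tend to be neighbours, while negative values indicate alternation. Latitude and longitude would need to be projected to planar coordinates before using a Euclidean distance band as above.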

Spatio-temporal analysis is more sophisticated than relational data processing in terms of algorithm efficiency and the complexity of possible patterns. In such data analysis, interrelated information at spatial and temporal scales has to be considered. Mobile phone data can be used to interpret patterns embedded in the interaction flows of people. We can consider the geographical context of subscribers/cell towers to discover structures of interactions. Let us take the mobile phone interactions as a network graph with cell towers as its nodes and interactions as the edges. When the coordinates of nodes are available, such networks can be considered as geographical networks, and the relationships among their components can be analyzed. We can define G = (V, E) to be a call network with N nodes, where V = {V1, V2, ..., VN} is the set of vertices (cell towers) and E ⊆ V × V is the set of connecting edges, i.e.,

TABLE II Comparison of Different Approaches for the Mobile Phone Data Analysis

Fig. 1. Distribution of cell towers in Macau (Source: Google Maps).

Fig. 2. Spatial-temporal analysis of mobile phone data in Macau.

where i and j represent cell towers i and j. In line with this definition, we have implemented a hierarchical agglomerative clustering (HAC) method on a CDR data set to detect interaction communities [52]. HAC starts with each object (cell tower) in its own cluster and then repeatedly merges similar clusters into broader ones. We have explored significant interaction patterns given the spatial heterogeneity of a mobile phone network. By implementing similarity measures, the proposed algorithm calculates the distance among clusters. These clusters are then merged until only one cluster remains or a certain termination condition is met. The spatial characteristics of nodes, together with an optimal hierarchy level, are also considered in our partitioning method. Analyses of this kind, which describe a phenomenon at a certain location and time, can help organizations in decision-making and policy implementation. Fig. 3(a) illustrates the mobile phone network in Macau and the interactions of cell towers. Fig. 3(b) reveals the community patterns detected through mobile phone interaction exploration.
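
The merging loop described above can be sketched in a few lines of Python. This is a generic average-linkage HAC, not the algorithm of [52]: the linkage rule, the conversion of call counts into distances (1 / (1 + calls)), the stopping rule (a target number of clusters), and the sample interaction matrix are all assumptions made for illustration, and the spatial characteristics of nodes are not modeled.

```python
def hac(dist, n_clusters):
    """Average-linkage agglomerative clustering on a symmetric distance matrix."""
    if not 1 <= n_clusters <= len(dist):
        raise ValueError("n_clusters must be between 1 and the number of objects")
    clusters = [[i] for i in range(len(dist))]  # each tower starts in its own cluster

    def linkage(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]                          # q > p, so index p stays valid
    return clusters


# Hypothetical call counts between four towers (symmetric, zero diagonal).
calls = [
    [0, 50, 1, 2],
    [50, 0, 2, 1],
    [1, 2, 0, 40],
    [2, 1, 40, 0],
]
# Assumed conversion: more interaction means smaller distance.
dist = [[0.0 if i == j else 1.0 / (1 + calls[i][j]) for j in range(4)]
        for i in range(4)]
communities = hac(dist, n_clusters=2)
```

With these sample counts, towers 0 and 1 end up in one community and towers 2 and 3 in the other. Stopping at a fixed number of clusters is the simplest termination condition; cutting the hierarchy at a distance threshold or at a level chosen by a quality measure are common alternatives.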

Fig. 3. Mobile phone network analysis.

Dong et al. have analyzed social interactions by spatial modeling of the interplay between mobile phone subscribers’ demographics and their social behavior [53]. According to the experimental results reported in their work, it is possible to predict users’ gender and age by analyzing their calling behavior. By implementing a double-label classification model, they have shown how to infer subscribers’ demographic information. They have defined two dependent variables, i.e., gender and age, and modeled the correlations between them and other features. In another work, Qiao et al. have implemented a spatio-temporal model based on a hidden Markov model (HMM) to monitor traffic [54]. They have modeled the urban road network as a graph in which road junctions are taken as the nodes and the roads themselves as the edges. Fig. 4 shows the road segments that are considered as the graph components. The Markov model is then adopted to infer hidden underlying structures of sequential traffic data on that road network. They have also defined trip trajectories in almost the same way as the sequence Di presented in Section III.

Fig. 4. Segmentation of a road graph.
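
As an illustration of how an HMM can recover a hidden sequence on a road graph, the sketch below runs the standard Viterbi algorithm with road segments as hidden states and observed cell towers as emissions. This is not the model of [54]; the choice of states and observations, the segment and tower names, and all probabilities are hypothetical.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at step t, predecessor)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            prob, pred = max((prev[r][0] * trans_p[r][s], r) for r in states)
            layer[s] = (prob * emit_p[s][obs], pred)
        best.append(layer)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    path.reverse()
    return path


# Three hypothetical road segments in a chain: seg_A -> seg_B -> seg_C.
states = ["seg_A", "seg_B", "seg_C"]
start_p = {"seg_A": 0.6, "seg_B": 0.3, "seg_C": 0.1}
trans_p = {
    "seg_A": {"seg_A": 0.6, "seg_B": 0.4, "seg_C": 0.0},
    "seg_B": {"seg_A": 0.0, "seg_B": 0.6, "seg_C": 0.4},
    "seg_C": {"seg_A": 0.0, "seg_B": 0.0, "seg_C": 1.0},
}
# Probability that a phone on a segment is served by tower t1 or t2.
emit_p = {
    "seg_A": {"t1": 0.9, "t2": 0.1},
    "seg_B": {"t1": 0.5, "t2": 0.5},
    "seg_C": {"t1": 0.1, "t2": 0.9},
}
path = viterbi(["t1", "t1", "t2", "t2"], states, start_p, trans_p, emit_p)
```

For long observation sequences, the products of probabilities should be replaced by sums of log-probabilities to avoid numerical underflow.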

V. Challenges

We have presented how cell phone data can be utilized to gain insight into the complicated process of urban dynamics. We have outlined mobile phone data applications with a particular focus on human movement, traffic sensing, and urban planning. The strengths and weaknesses of various approaches are given in each specific subsection and are summarized in two tables to provide recommendations on different methods for different applications. Some other generic challenges are summarized next.

1) Data Access: Accessibility is probably the most significant hurdle to exploiting mobile phone data, because privacy concerns limit the interest of governments and organizations in making such data available. However, this can be changed by creating data standards that ensure data privacy. Network-based data can be costly to generate, and telecom companies treat it as a commodity. Moreover, sharing mobile phone data sets can be a threat to private companies’ business. Data deprivation can make sustainable development impossible.

2) Data Quality: The quality of data can be defined as the fitness of a data set for use in a specific domain. Take spatio-temporal analysis as an example. In such studies, fine-grained location data should be provided for applications such as location-based services, route planning, and transportation development. However, in rural regions, the spatial resolution may be poor. Data quality issues, e.g., lack of integrity constraints and inconsistent aggregation, lead to reduced reliability and validity.

3) Privacy Issue: As discussed, the location awareness of mobile phones can make the geographical position of these devices available. Positions can be determined either independently through built-in components or externally by the networks to which mobile phones connect. Together with the benefits that this ability brings, there are myriad privacy implications. Interaction logs can be stored and analyzed for multiple reasons (e.g., billing purposes, real-time routing assistance, destination guides, environmental conditions, and wireless advertising) and might be disclosed. Such disclosure has non-technical and technical aspects [55], [56]. For example, traffic interactions can be intercepted by unauthorized parties. However, the sensitive information in people’s communications must be protected. People’s mobility patterns can contain private data that they do not want to be revealed. Hence, to preserve privacy, mobile phone data sets must be anonymized before they are made publicly available, i.e., names and numbers must be removed and replaced by unique IDs or hashed values.

4) Computing Issue: Processing large amounts of mobile phone data may exceed the capacity of traditional analytic tools, so extracting meaningful insight from a massive data set can become a processing problem. Traditional data architectures cannot handle a large volume of mobile phone data because they cannot deal with the different characteristics of massive data sets (e.g., velocity, variety, and veracity). This inability has led to the development of Big Data analytics platforms, and Cloud-based and Edge Computing [57], [58] methodologies appear well suited not only to hosting big data workloads but also to analyzing them.
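
The anonymization step mentioned under Privacy Issue, i.e., replacing numbers with unique IDs or hashed values, can be sketched as follows. A keyed hash (HMAC-SHA256) is used here instead of a plain hash because the space of phone numbers is small enough to be enumerated; this choice, the 16-character truncation, the key, and the sample numbers are assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymize(phone_number: str, secret_key: bytes) -> str:
    """Map a phone number to a stable pseudonymous ID using HMAC-SHA256."""
    digest = hmac.new(secret_key, phone_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability (assumption)


# Hypothetical key; in practice it must be kept secret by the data owner.
key = b"example-secret-key"
id_a = pseudonymize("85360000001", key)
id_a_again = pseudonymize("85360000001", key)
id_b = pseudonymize("85360000002", key)
```

The same number always maps to the same ID, so interaction patterns remain analyzable, while the original number cannot be recomputed without the key. Pseudonymous IDs alone may still allow re-identification from distinctive mobility traces, so this step is only one part of preserving privacy.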

VI. Conclusions and Future Work

Cell phones can be viewed as effective sensors that help collect rich spatio-temporal data about human mobility patterns. Access to these anonymous data enables us to study people’s movement, measure the similarity of their travels, and track their mobility behaviors. In this work, we have studied the ways in which mobile phone data can be treated and have reviewed the existing applications and methods. We have investigated these approaches, together with their advantages and drawbacks, to present a taxonomy of capabilities. The mobility of people has predominantly been considered within the mobile networks domain in order to decrease management cost. Nonetheless, in recent studies, most researchers have focused on human mobility and its impact on various social issues. They have also concentrated on users’ routines and movement habits in order to improve mobile location-based services. Typically, in such services, academic research has focused on a single user, while human mobility research has considered human groups and their consequent mobility patterns. An understanding of the regularities of groups can be valuable in the fields of urban infrastructure planning, travel forecasting, and social relations. When it comes to the monitoring of mobile phone location data, data representation is a relatively immature area, and the techniques implemented for displaying and exploring routes, velocities, directions, and volumes are limited. In the context of traffic management, mobile phone data can be of great importance. The visualization of unconstrained movements within a region, as opposed to movements between pre-defined regions or along pre-defined routes, should be explored further. More research should be undertaken on the application of mobile phone data in infrastructure planning, public transportation, and disaster/rare event management [59]–[61].
Although the exponential growth of sensor data has created a myriad of new opportunities, exploring them requires considerable computational power to maintain and process large-scale datasets. A notable challenge is that the rate of data production surpasses the capacity of data processing methods. The application of big data frameworks and the real-time analysis of mobile phone data can open up a range of opportunities to understand diverse social activities [62]. They have the potential to improve evidence-based responses to various events, natural disasters, disease outbreaks, and emergencies, and to support better management of these circumstances. To address such challenges, e.g., storage and processing concerns, Cloud computing [63]–[67] can be considered as a way to fulfil the requirements. It is capable of providing a dynamic, flexible, resilient, and cost-effective infrastructure for both processing and analysis purposes [68]–[71]. Moreover, the recent Fog and Edge Computing paradigms promise to provide the benefits of the Cloud without incurring its problems (e.g., high latency). Future work should focus on applying such frameworks to analyze the characteristics of mobile phone data and to retrieve knowledge in an intelligent manner [57], [58], [72]–[77].