Recommending Personalized POIs from Location Based Social Network

Haiying Che, Di Sang and Billy Zimba

(School of Software, Beijing Institute of Technology, Beijing 100081, China)

Recently, online social network (OSN) service providers have adopted a new paradigm by adding geographical location as a new dimension to their frameworks. Combined with geographical information, online social networks have evolved into location based social networks (LBSNs)[1]. Through LBSNs, users not only post, reply to and forward messages but also declare their presence at specific locations (an activity often referred to as a “check-in”). As a result, users can establish social relationships and share their experiences of visiting points of interest (POIs), e.g. museums, mountains and coffee shops.

A recent survey[2] classifies recommendation systems in LBSNs. Among the categories it describes, our interest lies in location recommendation (e.g. restaurants, bars and tourist sites). Most location recommendation systems are based on collaborative filtering[3] and apply analysis algorithms[4] to determine similarities between objects in LBSNs. Beyond these concepts, studies on LBSNs have looked for correlated factors and their influence on making good recommendations. Some previous work explored users’ personal static check-in preferences through geographical check-ins for location recommendation[5-6]. Others found, based on social influence theories, that social friends tend to have similar check-in behavior, and researchers have also investigated explicit social friendships on LBSNs[7-8]. These results show that user preference for locations, social influence and geographical influence all play important roles in making recommendations.

In this paper, we propose a user social geographic personalized location recommendation algorithm (USGP), which combines user preference, social influence and geographical influence. Our idea rests on three observations: similar users share similar check-ins; check-in frequency and the number of common friends both affect user similarity; and each user’s geographical distribution of check-ins can be modeled individually with a non-parametric estimation method. We use a collaborative filtering based approach to determine similarities between users, present a function that incorporates the distance between users to adjust the weights of user preference and social influence, and then use a kernel density formula to determine the personalized geographical influence of locations on a user’s check-in behavior.

The rest of the paper is organized as follows. Section 1 reviews related work. Section 2 describes the dataset and analyzes user check-in behavior and social and geographical influence. Section 3 proposes a recommendation algorithm for location based social networks that considers geographical influence and social information. Section 4 experimentally evaluates the proposed algorithm on the described dataset by comparing it with five baseline algorithms. Section 5 concludes the paper.

1 Related Work

The rationale behind collaborative filtering is that if users have behaved similarly in the past, they will continue to do so in the future[9]. Collaborative filtering has therefore been used for recommendation in LBSNs (e.g. Ref.[10]) to suggest locations based on user similarity computed from check-in records. However, more accurate results require more than the user similarity drawn from user preferences, so additional information in LBSNs, such as social influence and geographical factors, has been studied for recommendation.

Refs.[11-13] made recommendations by matching user profiles against location metadata such as descriptions, tags, semantics, prices and categories. Because any user or venue with a complete profile can be matched with a similar profile, these algorithms address the cold start problem in recommendation systems. However, their performance suffers because a suggestion may match a place whose information, drawn only from social opinion, is of poor quality.

Some systems are based on social influence theories, according to which social friends tend to exhibit similar preferences and activities[14-15]. Ference et al.[16] combined users’ social network information with collaborative filtering and proposed a collaborative filtering algorithm based on user preferences and social networking. These algorithms demonstrate the importance of social influence in LBSN recommendation. Ye et al.[10] suggested friendship-based recommendation, based on the idea that using only a user’s friends’ ratings is more efficient than, and just as effective as, using the ratings of the top-k most similar users. They showed that a user’s friends share more similar preferences than strangers do, but that not all social friends have much in common. Further, the authors of Ref.[4] argued that not only direct friends but also indirect friends (friends of friends) influence users’ check-in behavior, so non-friends should also be taken into account.

Some research results have shown that geographical proximity derived from users’ locations is important in LBSN recommendation. Levandoski et al.[17] suggested that spatial items closer to a user in travel distance should be given precedence as recommendation candidates. Ref.[18] found that a significant share of users’ check-ins occur within a 10 km radius, a small proportion occur between 10 km and 100 km, and only a very small percentage occur beyond 100 km.

Analyzing the correlations between locations and users forms a good basis for making quality recommendations. Jiang et al.[19] combined user location information with collaborative filtering algorithms. Wang et al.[20] proposed two algorithms based on the random walk method[21]. The first selects top locations based on their geographical distance from previously visited locations. The second determines user similarity by combining friendship-based similarity with similarity based on location preferences through a weight parameter. They showed that incorporating geographical and social influences in one algorithm gave better results, and that social influence had the greater effect.

Recently, researchers have shown that unified frameworks combining user preference, social influence and geographical influence give superior results. Cheng[22] captured the geographical influence of users by modeling the probability of a user’s check-in at a location as a multi-center Gaussian model (MGM), further included social information, and fused the geographical influence into a generalized matrix factorization framework. The most closely related works are Refs.[23-24], which also combine user preference, social influence and geographical influence, but fuse these factors according to different principles.

Ye et al.[23] obtained their final recommendations by applying weights to three factors: user preference, social connections among friends (i.e. integrating social connections with the similarity of check-in activities among friends by applying weights), and the geographical distribution of users’ visited locations. However, they provided no way to determine the weights in their model explicitly; assigning optimal weighting parameters takes great effort and usually suffers from over-fitting. They further argued that geographical influence is important in users’ check-in behavior and can be modeled as a power-law distribution. However, they did not model the geographical influence on check-in behavior separately for each user, which limited its effectiveness.

Zhang and Chow[24] further explored geographical influence in location recommendation from the perspective of each user’s personalized travel pattern. In their model, each user has a personalized travel distance preference, which has the largest impact on which locations are recommended. They used kernel density estimation to model the geographical influence on each user’s check-in behavior as an individual distribution rather than a universal distribution shared by all users. However, they extracted user similarity only from a user’s friends and considered the distance between these users (on the premise that users in the same locality are likely to visit the same places). Since non-friends have also been found to exert influence, as they may share similar interests and activities, this approach may miss some quality recommendations. Further, they did not directly incorporate weights when integrating these factors to reflect the extent of each factor’s influence.

In summary, existing research has shown the importance of user preference, social influence and geographical influence in LBSN recommendation, and more recent works have combined these aspects in unified frameworks. However, to our knowledge, no single algorithm has yet incorporated the influence of non-friends, quantified check-in frequency, and used a weighting function to balance user preference, social influence and geographical influence. In this paper we attempt to address these problems.

2 Dataset Descriptions and Analysis of the Influence of Geolocation

2.1 Dataset

Foursquare is a popular LBSN service created in 2009. We use a Foursquare dataset[25] that contains the check-in records of 11 326 users from January 2011 to December 2011. For each user, it provides the user’s social network, historical check-in records and residence information. The statistics of the dataset are shown in Tab. 1.

Tab.1 Statistics of the Foursquare dataset

2.2 Influence of check-ins

We assess the value of suggesting new locations to a user and the influence of check-in frequency at a location. To do so, we examine the trends in first-time check-ins and follow-up check-ins in the dataset over the year.

Fig. 1 plots these data (black line: total check-ins; gray line: first check-ins). As time goes by, the proportion of first check-ins keeps decreasing, while recurring visits grow and then remain stable. This indicates that the longer a user uses the social network, the harder it is for the user to find new places, so it is important to help users discover new POIs by recommending new locations. It also indicates that as users become familiar with some locations, they make recurring visits to the places they find interesting. Therefore, check-in frequency is useful for extracting user preference, as it implicitly reflects user interests and activity. These two observations can help improve the quality of a location recommendation algorithm.

Fig.1 Daily number of check-in records
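The first-check-in versus total-check-in counts behind Fig. 1 can be sketched as follows. This is a minimal illustration, assuming each record is a `(user, location, day)` tuple and that a check-in counts as "first" when that user has no earlier record at that location; the function name and record layout are hypothetical.

```python
from datetime import date
from collections import Counter


def daily_checkin_trends(checkins):
    """Count total and first-time check-ins per day.

    checkins: iterable of (user_id, location_id, day) with day a datetime.date.
    A check-in is "first" if the (user, location) pair has not appeared in an
    earlier record; same-day records keep their input order (stable sort).
    Returns a list of (day, total, first, first / total) sorted by day.
    """
    total, first = Counter(), Counter()
    seen = set()  # (user, location) pairs already visited
    for user, loc, day in sorted(checkins, key=lambda c: c[2]):
        total[day] += 1
        if (user, loc) not in seen:
            seen.add((user, loc))
            first[day] += 1
    return [(d, total[d], first[d], first[d] / total[d]) for d in sorted(total)]


if __name__ == "__main__":
    records = [("u1", "a", date(2011, 1, 1)), ("u1", "a", date(2011, 1, 2)),
               ("u2", "a", date(2011, 1, 2)), ("u1", "b", date(2011, 1, 2))]
    for row in daily_checkin_trends(records):
        print(row)
```

A falling `first / total` ratio over the year corresponds to the trend described for the gray line in Fig. 1.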

2.3 Influence of distance

To assess which friends have more influence on a recommendation in terms of the distance between them, we analyze how the proportion of commonly visited places relates to the distance between friends’ residences. We group friend pairs by residence distance in 10 km intervals (i.e. friends living 0 to 10 km apart form the first group, 10 to 20 km the second group, and so on). Fig. 2 shows the average proportion of commonly visited places for each group. As the distance between friends increases, the proportion of commonly visited places decreases. This confirms that the closer two friends live, the more likely they are to visit the same places.

Fig.2 Proportion of common visits by friends’ residential distance
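The grouping behind Fig. 2 can be sketched as below. The text does not define the common-visit proportion, so this sketch assumes it is the Jaccard ratio |A ∩ B| / |A ∪ B| of the two friends’ visited-location sets, and measures residence distance with the haversine great-circle formula; all names are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def common_visit_by_distance(friend_pairs, home, visited, bin_km=10.0):
    """Average common-visit proportion of friend pairs per distance bin.

    friend_pairs: iterable of (u, v); home: user -> (lat, lon);
    visited: user -> set of location ids.
    Assumption: common-visit proportion = Jaccard ratio of the visited sets.
    Returns {bin index: mean proportion}; bin 0 covers [0, bin_km) km.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for u, v in friend_pairs:
        union = visited[u] | visited[v]
        if not union:
            continue  # neither friend has check-ins; nothing to compare
        b = int(haversine_km(home[u], home[v]) // bin_km)
        sums[b] += len(visited[u] & visited[v]) / len(union)
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(counts)}
```

Plotting the returned means against the bin index should reproduce a decreasing curve like the one in Fig. 2 if the trend holds.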

2.4 Personalization of location distribution influence for different people

In the real world, check-in data show that the influence of location distribution differs from user to user. For example, people who like staying at home tend to visit points of interest near home, whereas outdoor enthusiasts travel widely to find their points of interest[24]. Therefore, we need a personalized approach to modeling each user’s location history distribution.

3 User Social Geographic Personalized Location Recommendation Algorithm (USGP)

This paper aims to improve the accuracy of current recommendation algorithms in location based social networks by providing a way to assign suitable weights to different factors according to the extent of their real impact. Our analysis of the dataset shows that nearby friends tend to share more commonly visited locations (see Fig. 2). Thus, instead of the traditional similarity computation, we combine collaborative filtering with friend-based collaborative filtering, weighted according to the distance between user residences. Since the influence of location distribution on check-in behavior is personalized, we also incorporate a personalized location distribution influence into our method.

Fig. 3 shows the structure of the proposed algorithm, which consists of three primary steps: building the user similarity matrix, suggesting top candidate locations based on user similarity, and making the final recommendation.

Fig.3 Flowchart of the algorithm

3.1 User similarity computation

User similarity has two main components: similarity based on check-in history and similarity based on social relations. We derive a user-location-frequency matrix from the check-in records and a user-user common-friend-count matrix from the social relations.

Users who check in at the same locations are more similar, and the more frequently they check in at the same locations, the more similar they are. This paper therefore considers check-in frequency when computing user similarity. The check-in based user similarity is computed as


where $S_{u_1,u_2,l}$ represents the similarity between users $u_1$ and $u_2$ with respect to check-in data, and $l$ denotes the collection of locations. $f_{u,l}$ represents the check-in frequency of user $u$ at location $l$: if user $u$ has a check-in record at location $l$, then $f_{u,l}$ is user $u$’s check-in frequency at $l$; otherwise $f_{u,l}$ is 0.
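A minimal sketch of a frequency-weighted check-in similarity, assuming Eq.(1) takes the cosine form common in collaborative filtering, $S_{u_1,u_2,l}=\sum_l f_{u_1,l}f_{u_2,l}\,/\,\big(\sqrt{\sum_l f_{u_1,l}^2}\sqrt{\sum_l f_{u_2,l}^2}\big)$. Users are represented as sparse dictionaries, so an absent location means $f_{u,l}=0$; the function name is hypothetical.

```python
import math


def checkin_similarity(freq_u1, freq_u2):
    """Frequency-weighted cosine similarity between two users' check-ins.

    freq_u1, freq_u2: dict location_id -> check-in frequency f_{u,l};
    locations a user never visited are absent (f_{u,l} = 0).
    Assumption: Eq.(1) is a cosine over the users' frequency vectors.
    """
    dot = sum(f * freq_u2.get(loc, 0) for loc, f in freq_u1.items())
    norm1 = math.sqrt(sum(f * f for f in freq_u1.values()))
    norm2 = math.sqrt(sum(f * f for f in freq_u2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # a user with no check-ins is similar to nobody
    return dot / (norm1 * norm2)
```

Because the frequencies enter the product directly, two users who repeatedly visit the same place score higher than two users who each visited it once, which matches the motivation above.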

The method that computes user similarity based on social relationship is shown as


where $S_{u_1,u_2,f}$ represents the similarity between users $u_1$ and $u_2$ with respect to social relationships $f$, and $U$ denotes the collection of users. $N_{u_1,u}$ measures how close the social relation between $u_1$ and $u$ is: if the two users have common friends, $N_{u_1,u}$ is the number of their common friends; otherwise $N_{u_1,u}$ is 0. $N_{u_2,u}$ is defined in the same way.
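The social similarity can be sketched in the same way, assuming Eq.(2) has the same cosine form as Eq.(1) with the common-friend counts $N_{u,\cdot}$ in place of $f_{u,\cdot}$. Since $N_{u_1,u}$ is defined for every user $u\in U$, two users who are not friends can still obtain a non-zero score. The friendship graph is assumed undirected and stored as adjacency sets; the function names are hypothetical.

```python
import math


def common_friend_counts(friends, user):
    """N_{user,u} for every other user u: the number of friends they share.

    friends: dict user -> set of that user's friends (undirected graph).
    Users with no common friends are omitted, i.e. N = 0.
    """
    counts = {}
    for other, their_friends in friends.items():
        if other == user:
            continue
        n = len(friends[user] & their_friends)
        if n:
            counts[other] = n
    return counts


def social_similarity(friends, u1, u2):
    """Cosine similarity of the common-friend-count vectors of u1 and u2.

    Assumption: Eq.(2) has the same cosine form as Eq.(1).
    """
    n1 = common_friend_counts(friends, u1)
    n2 = common_friend_counts(friends, u2)
    dot = sum(c * n2.get(u, 0) for u, c in n1.items())
    norm1 = math.sqrt(sum(c * c for c in n1.values()))
    norm2 = math.sqrt(sum(c * c for c in n2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0
```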

The closer two users live to each other, the more likely they are to visit the same locations. We therefore generate the final user similarity $S_{u_1,u_2}$ based on the distance between the users’ residences, as follows:


Suggesting top candidate locations

We compute the likelihood of a user visiting a location from the preferences of similar users (referred to as neighbors, i.e. users with high similarity to the target user) by



We use the final similarity $S_{u,\mathrm{neighbor}}$ between user $u$ and each of $u$’s neighbors, together with $f_{\mathrm{neighbor},l}$, the frequency with which the neighbor visits location $l$, to calculate $f(u,l)$, the probability of user $u$ visiting location $l$. We then choose the top 50 candidate locations by this probability value.
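The candidate step can be sketched as a similarity-weighted sum, $f(u,l)=\sum_{\mathrm{neighbor}} S_{u,\mathrm{neighbor}}\,f_{\mathrm{neighbor},l}$. Several details are assumptions rather than statements from the text: neighbors are taken to be the `k_neighbors` most similar users (the value 20 is hypothetical), the sum is not normalized, and locations the user has already visited are excluded because the goal is to recommend new POIs. The final similarities of Eq.(3) are taken as a precomputed input.

```python
import heapq


def candidate_locations(user, similarity, freq, k_neighbors=20, top=50):
    """Score locations for `user` from neighbors' check-ins; return the top ones.

    similarity: dict (u, v) -> final similarity S_{u,v} (Eq.(3), precomputed).
    freq: dict user -> {location: check-in frequency}.
    Assumptions: neighbors = k most similar users; unnormalized weighted sum;
    locations already visited by `user` are skipped.
    Returns up to `top` (location, score) pairs, highest score first.
    """
    others = [v for v in freq if v != user]
    neighbors = heapq.nlargest(k_neighbors, others,
                               key=lambda v: similarity.get((user, v), 0.0))
    scores = {}
    for v in neighbors:
        s = similarity.get((user, v), 0.0)
        if s <= 0:
            continue  # dissimilar users contribute nothing
        for loc, f in freq[v].items():
            if loc not in freq[user]:
                scores[loc] = scores.get(loc, 0.0) + s * f
    return heapq.nlargest(top, scores.items(), key=lambda kv: kv[1])
```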

3.2 Final recommendation

As analyzed in Section 2.4, the influence of location distribution on check-in behavior differs from user to user. In this paper we address the geographical distribution of each user with a kernel density formula that determines the probability of the user visiting a new location given the locations the user has already visited. The final recommendation has two parts: computing the geographical influence and generating the final results.

3.2.1 Computation of geographical influence

Before computing the geographical influence of each candidate location for a user, we require the set of distances between every pair of the user’s check-in locations, which we call $D$, as well as the distances between each candidate location and each of the user’s check-in locations. We compute the geographical influence as follows; note that it is unique to each user and hence personalized.


In Eq.(5), given a user $u$ and a suggested location $l$, $n$ is the total number of locations the user has visited, $L=\{l_1,l_2,\dots,l_n\}$, and $d_i$ is the distance between $l$ and $l_i$. We calculate the summation of $f(d_i)$, where $f(d)$ is given by


This yields the probability of the user visiting the suggested location based on the user’s own check-in geographical distribution, represented by the set $D$.

In Eq.(6), $d_i$ is the distance between $l$ and $l_i\in L$, $d'$ is a distance from the user’s set $D$, and $n_D$ is the total number of elements in $D$. Note that $f(d)$ is a non-parametric density estimate (a kernel density estimator) with bandwidth $h$ and kernel $K(x)$; we use a Gaussian kernel.




In Eq.(7), $\sigma$ is the standard deviation of the sample in $D$.
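The geographical influence of Eqs.(5)-(7) can be sketched as below. The sketch assumes the standard forms suggested by the definitions above: $f(d)=\frac{1}{n_D h}\sum_{d'\in D}K\!\left(\frac{d-d'}{h}\right)$ with the Gaussian kernel $K(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, the rule-of-thumb bandwidth $h=1.06\,\sigma\,n_D^{-1/5}$ (Silverman’s rule, assumed from the mention of $\sigma$ in Eq.(7)), and an Eq.(5) that averages $f(d_i)$ over the $n$ visited locations. The distance function is passed in so that the same code works with planar or great-circle distances.

```python
import math
import statistics
from itertools import combinations


def gaussian_kernel(x):
    """Standard normal density K(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def pairwise_distances(visited, dist):
    """The set D: distances between every pair of the user's check-in locations."""
    return [dist(a, b) for a, b in combinations(visited, 2)]


def kde_density(d, D, h):
    """f(d) = 1/(n_D * h) * sum over d' in D of K((d - d') / h)."""
    return sum(gaussian_kernel((d - dp) / h) for dp in D) / (len(D) * h)


def silverman_bandwidth(D):
    """Assumed bandwidth h = 1.06 * sigma * n_D^(-1/5), sigma = sample std of D."""
    return 1.06 * statistics.stdev(D) * len(D) ** (-0.2)


def geo_influence(candidate, visited, dist):
    """Personalized geographical score of `candidate` for one user.

    Assumption: Eq.(5) averages f(d_i) over the user's n visited locations.
    Returns 0.0 when D is too small or degenerate to estimate a density.
    """
    D = pairwise_distances(visited, dist)
    if len(D) < 2:
        return 0.0  # statistics.stdev needs at least two distances
    h = silverman_bandwidth(D)
    if h == 0:
        return 0.0  # all pairwise distances identical
    return sum(kde_density(dist(candidate, li), D, h) for li in visited) / len(visited)
```

For example, `geo_influence((0.5, 0.5), [(0, 0), (1, 0), (0, 1)], math.dist)` is much larger than the same call with a far-away candidate such as `(10, 10)`, because the candidate’s distances to the visited locations resemble the user’s typical inter-check-in distances.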

3.2.2 Generating the final result

To obtain the final recommendation list, we rescore each of the top-50 candidate locations by multiplying its probability score from Eq.(4) by its geographical influence from Eq.(5), and re-rank the candidates. We then choose the user’s top-N recommended locations from these 50 candidates.
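The rescoring step described above is a product followed by a re-sort. In this minimal sketch the candidate list comes from the candidate step and `geo_score` is any callable returning the Eq.(5) value for a location; the function name and the default `top_n` are hypothetical.

```python
def final_recommendations(candidates, geo_score, top_n=10):
    """Re-rank candidates by (CF score x geographical influence) and keep top N.

    candidates: list of (location, score) pairs from the candidate step (Eq.(4)).
    geo_score: callable location -> geographical influence (Eq.(5)).
    """
    rescored = [(loc, s * geo_score(loc)) for loc, s in candidates]
    rescored.sort(key=lambda kv: kv[1], reverse=True)
    return rescored[:top_n]
```

Multiplying rather than adding the two scores means a candidate far outside a user’s usual travel range (geographical influence near zero) is pushed down regardless of how popular it is among similar users.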

4 Experimental Results

In this section, we experimentally evaluate the proposed algorithm USGP (user social geographic personalized location recommendation algorithm) on the Foursquare dataset described in Section 2. We compare it with the following baseline algorithms: User CF (U), Location CF (L) and Friend CF (S), which consider users, locations or friendships independently[20]; and User Social CF (US) and User Social Geography CF (USG)[23], which combine user, social and geographical aspects. The comparison is intended to demonstrate not only the importance of combining social and location information, but also the importance of finding appropriate weights to combine them. We use 80% of the check-in data as the training set and the remaining 20% as the testing set. We generate the recommendations from the training set and compare them with the testing set. When evaluating location recommendation algorithms, one of the most important questions is to what extent the predictions computed from the training set reflect what actually happens in the testing set.

We use two basic metrics, precision and recall, to evaluate the algorithms. Precision is the percentage of correctly recommended locations among the recommended results; recall is the percentage of correctly recommended locations among the locations the user actually visited in the testing set. We calculate the average precision and recall over all users for recommendation list sizes N = 5, 10 and 20 to obtain the overall performance of each algorithm. We further show how the algorithm performs under data sparsity, a common problem for recommendation algorithms.
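The two metrics can be sketched as follows. This assumes precision@N divides the hits by N even when fewer than N locations are recommended, and that users without any check-ins in the testing set are left out of the average; both are common conventions, not details stated in the text, and the function names are hypothetical.

```python
def precision_recall_at_n(recommended, relevant, n):
    """Precision@N and Recall@N for a single user.

    recommended: ranked list of recommended locations.
    relevant: set of locations the user visited in the testing set.
    """
    hits = sum(1 for loc in recommended[:n] if loc in relevant)
    precision = hits / n
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_recall(recs, test_sets, n):
    """Mean Precision@N and Recall@N over users with test check-ins.

    recs: dict user -> ranked recommendation list (missing user = no recs).
    test_sets: dict user -> set of locations visited in the testing set.
    """
    users = [u for u, rel in test_sets.items() if rel]  # skip empty test sets
    if not users:
        return 0.0, 0.0
    pairs = [precision_recall_at_n(recs.get(u, []), test_sets[u], n) for u in users]
    return (sum(p for p, _ in pairs) / len(users),
            sum(r for _, r in pairs) / len(users))
```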

4.1 Influence of recommended number

We run USGP and the baseline algorithms with recommendation list sizes N = 5, 10 and 20. The precision and recall of the algorithms are shown in Fig.4 and Fig.5.

The results show that whether N is 5, 10 or 20, USGP always performs better than the other algorithms: its precision is 10% higher and its recall 8% higher than those of the other algorithms. U and L perform worst, as few users visit the locations they recommend; this is because these two methods do not take social or geographical influence into account. Compared with U and L, S gives better results because it considers the influence of social relationships, which shows that adding social influence yields more accurate recommendations. USGP considers both social and geographical influence and consequently performs best.

Fig.4 Precision for different recommended numbers N

Fig.5 Recall for different recommended numbers N

4.2 Data sparsity

Fig.6 and Fig.7 show the performance of USGP when the data are sparse. To test the precision and recall of the proposed algorithm on sparse data, we delete 10%, 30% and 50% of the original data. As shown in Fig.6 and Fig.7, on all three reduced datasets the top-5 precision and recall of USGP remain higher than those of the other five algorithms.
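The reduced datasets can be sketched as a random deletion of check-in records. The text does not say how records were selected, so this sketch assumes deletion is uniform over individual check-in records, with a fixed `seed` (a hypothetical parameter) so that the same reduced dataset is reused across all compared algorithms.

```python
import random


def sparsify(checkins, deletion_rate, seed=0):
    """Randomly delete a fraction of check-in records to simulate sparsity.

    checkins: list of check-in records (any hashable or non-hashable items).
    deletion_rate: fraction in [0, 1] of records to remove, e.g. 0.1, 0.3, 0.5.
    Assumption: records are deleted uniformly at random.
    """
    rng = random.Random(seed)  # private generator, reproducible per seed
    keep = len(checkins) - round(deletion_rate * len(checkins))
    return rng.sample(checkins, keep)
```

Calling it with rates 0.1, 0.3 and 0.5 on the training check-ins yields the three progressively sparser user-location matrices.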

The higher the deletion rate, the sparser the user-location matrix. USGP performs better at all three deletion rates, especially at the sparsest setting with a deletion rate of 50%. When a user and the user’s friends have few check-in records, the preference-based similarity calculated from the sparse data can be misleading; in that situation, geographical influence plays an important role.

Fig.6 Precision under data sparsity (Top-5)

Fig.7 Recall under data sparsity (Top-5)

5 Conclusion

In this paper, we proposed a location recommendation algorithm for location based social networks (LBSNs) that takes advantage of the geographical influence drawn from users’ check-ins and of social network information to recommend new POIs to users. We first studied the influence of the social network on user check-in behavior and the influence of geographical factors on each user’s check-in activity in LBSNs. We then proposed an algorithm that calculates user similarity from check-in records (considering check-in frequency) and from social relationships (considering both friends and non-friends), and uses a weight function that explicitly calculates the weights of these two kinds of similarity based on the geographical distance between users. In addition, we used a non-parametric density estimation method (a kernel density formula) to determine the influence of each user’s personalized geographical distribution, and applied it to adjust the probability scores of the candidate locations. Experimental results on a Foursquare dataset show that our algorithm gives more satisfactory results than UserCF, LocationCF, FriendCF, UserSocialCF and UserSocialGeographyCF, and alleviates the sparsity problem in LBSNs to some extent.

