Blockchain and Machine Learning for Intelligent Multiple Factor-Based Ride-Hailing Services

2022-03-14 09:22 Zeinab Shahbazi and Yung-Cheol Byun
Computers, Materials & Continua, 2022, Issue 3

Zeinab Shahbazi and Yung-Cheol Byun

IIST, Department of Computer Engineering, Jeju National University, Jeju-si, Jeju Special Self-Governing Province, 63243, Korea

Abstract: One of the common transportation systems in Korea is calling taxis through online applications, which is more convenient for passengers and drivers in the modern era. However, a passenger's taxi request can be rejected by the driver depending on the driver's location and the trip distance. Therefore, there is a need to identify the factors behind drivers' acceptance and rejection of received requests. The security of this system is another core concern, both for storing transaction information and for the safety of passengers and drivers. In this study, origin and destination data for Jeju Island, South Korea, were collected from T-map and processed with decision tree and XGBoost machine learning techniques. The blockchain framework is implemented on the Hyperledger Fabric platform. The experimental results highlight the role of socio-economic features, and cross-validation was performed. Distance is another factor in taxi trips; the total trip distance at midnight is considerably shorter. The proposed process achieves successful matching of ride-hailing taxi services in terms of distance, trip request, and safety, based on city-wide measurements.

Keywords: Taxi trip; blockchain; machine learning; ride-hailing; prediction; trip distance

1 Introduction

The increasing use of online channels during the COVID-19 pandemic has accelerated new business models and given passengers comfortable and safe choices. The growth of Online-to-Offline (O2O) services has expanded ride-hailing within the transportation sector; e.g., personal mobility services in China have extended the transportation system to different groups of passengers [1]. Ride-hailing and ride-sharing are two different concepts. Ride-hailing is a requested taxi that picks up the passenger at a known place and drops them off at the requested destination, whereas ride-sharing means sharing a ride among passengers who do not have the same destination. Ride-sharing is therefore not a personal service or a private ride. Kakao Taxi and T-map are popular ride-hailing apps in Korea.

Although taxi apps are convenient for passengers, some complaints concern whether drivers accept or deny the passenger's requested location. Because driver supply and passenger demand may not be balanced, ride-sharing services can fit passenger demand better, and this has led drivers in Korea to protest against taxi-sharing services.

Blockchain technology brings major changes to taxi demand services by securing the database and transaction records [2]. Ride-hailing platforms based on blockchain technology can connect the passenger with a driver who accepts the requested location [3]. A blockchain-based ride-hailing system mitigates the problem described above by adding a manager between the passenger and driver. Passengers share transaction data across nodes instead of having to trust a central authority, which removes intermediaries that act as gatekeepers. The transaction information is kept in a distributed ledger that is accessible through the defined nodes, making the blockchain network more transparent. The main strength of blockchain in this system is recording data as time-series information so that processed transactions can be aggregated and shared across the blockchain network. Each block is encrypted, is not openly accessible, and is linked to the previous block by a hash code. Transactions are therefore persistent, shared, unalterable, and secure [4]. Fig. 1 shows a simple overview of the proposed system architecture. The machine learning component classifies passenger and driver information based on the requested location and the pick-up and drop-off distance. The overall data analysis and classification evaluate the quality of ride-hailing and passenger satisfaction. The blockchain component takes the data extracted by the machine learning component, guards against fake transaction information, and secures driver and passenger profile information against falsification.
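The hash linking between blocks can be illustrated with a minimal hash-chained ledger. This is a generic sketch using only the Python standard library; the `Block` and `Ledger` names and the transaction fields are hypothetical, it omits the encryption and consensus the text mentions, and it is not how Hyperledger Fabric stores blocks internally. It shows why altering a stored transaction is detectable: each block's hash covers its own content and the previous block's hash.

```python
import hashlib
import json
import time
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    timestamp: float
    transactions: list
    prev_hash: str
    block_hash: str = ""

    def compute_hash(self) -> str:
        # Hash every field except the stored hash itself, using canonical JSON.
        payload = json.dumps(
            {
                "index": self.index,
                "timestamp": self.timestamp,
                "transactions": self.transactions,
                "prev_hash": self.prev_hash,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()


class Ledger:
    """Append-only chain; each block stores the hash of its predecessor."""

    def __init__(self):
        genesis = Block(0, 0.0, [], "0" * 64)
        genesis.block_hash = genesis.compute_hash()
        self.chain = [genesis]

    def add_block(self, transactions: list) -> Block:
        prev = self.chain[-1]
        block = Block(len(self.chain), time.time(), transactions, prev.block_hash)
        block.block_hash = block.compute_hash()
        self.chain.append(block)
        return block

    def is_valid(self) -> bool:
        # Valid only if each stored hash matches the block content and links to the predecessor.
        for prev, cur in zip(self.chain, self.chain[1:]):
            if cur.block_hash != cur.compute_hash() or cur.prev_hash != prev.block_hash:
                return False
        return True
```

Because every block commits to its predecessor's hash, editing one record (for example, a fare) invalidates that block's hash and breaks the chain from that point on; in a distributed ledger, the other nodes would also have to accept the rewritten history.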

Figure 1: Simple overview of the proposed integrated taxi demand service based on blockchain and machine learning approaches

The main contributions of this paper are as follows:

• Identifying the matching factors in ride-hailing services using machine learning techniques.

• Minimizing overfitting and selection bias through cross-validation.

• Bridging the gap between passengers and drivers.

• Improving ride-hailing services in big cities.

• Using blockchain technology to improve the security and transparency of ride-hailing services.

• Deploying a smart contract prototype to handle expenses between passengers and drivers.

• Linking transportation problems to state-of-the-art machine learning.

• Exploring taxi demand prediction in terms of network architecture.

• Capturing demand service prediction based on temporal and spatial information.

The remainder of this paper is organized as follows. Section 2 briefly reviews related studies on taxi demand and ride-hailing services. Section 3 presents the methodology of the proposed taxi demand service based on an integrated blockchain and machine learning system. Section 4 describes the implementation of the proposed system. Section 5 presents the results, and Section 6 concludes the paper.

2 Related Work

This section briefly reviews the background of taxi demand services and related applications and methods. It has three parts: service operation platforms, taxi demand services based on machine learning, and taxi demand services based on blockchain.

2.1 Service Operation Platform

Platform operation is one of the most useful sharing-economy operations [5-10], and employing platform-based technology is regarded as a convenient approach for companies [11-16]. Mantin et al. [17] proposed a platform for peer-to-peer marketplaces with uncertain consumer valuations. In [18], timing and optimal versioning were proposed. In [19], platform investment was studied considering network externalities. Car sharing is presented in [20], where it plays a key role in product line design. Many studies focus on pricing decisions in platform services, e.g., optimal pricing for ride-sharing platforms [21], optimal policies supporting taxi-hailing services [22-27], and self-scheduling control based on service capacity [28].

2.2 Taxi Demand Service Based on Machine Learning

In recent years, many studies have been dedicated to forecasting trip demand and duration. The four-step model is one of the taxi demand prediction approaches that consider spatiotemporal factors [29]. Moreira-Matias et al. [30] proposed real-time forecasting of the spatiotemporal distribution of passenger demand. Zhang et al. [31] proposed an adaptive method for forecasting hotspot locations. Davis et al. [32] presented taxi demand prediction from time-series data mined from a mobile taxi application. In [33], New York City taxi data containing 1 billion taxi trips, together with related complaints and Google Trends information, were used to analyze the competition between ride-sharing and ride-hailing. Seow et al. [34] presented a simulation-based taxi dispatching system in Singapore to reduce passenger waiting time; compared with other state-of-the-art techniques, it reduced waiting time by 33.1% and the number of idle taxis by 26.3%.

2.3 Taxi Demand Service Based on Blockchain

Several blockchain studies have addressed taxi demand services to secure data sharing. Blockchain characteristics provide trust and security and support an intelligent transportation ecosystem, while data sharing enables better utilization of transportation resources [35]. In [36], a reward-based mechanism for vehicle communication was explored, built on Proof of Work (PoW) consensus. Liu et al. [37] applied the Byzantine Fault Tolerance algorithm in the consensus phase of their blockchain framework to secure a vehicular announcement network using an aggregation protocol. In [38], data-sharing security was addressed by applying a blockchain with DPoS consensus to the Internet of Vehicles (IoV). Lu et al. [39] proposed a trust-point mechanism for vehicles based on a blockchain framework to secure vehicle-to-vehicle communication. Applying a public blockchain in such systems incurs high costs and strains limited vehicle resources; therefore, it is not suitable for Peer-to-Peer (P2P) data-sharing networks.

3 System Model

This section describes the proposed system model in detail. This research aims to improve the performance and security of taxi demand services by integrating machine learning and blockchain. The section covers six parts: the machine learning framework for taxi demand services, decision tree and XGBoost performance in the proposed system, the optimization algorithm and cross-validation process, the blockchain framework, smart contract deployment for drivers, and the ride requesting phase.

3.1 Decision-Performance in Taxi Demand Service

A decision tree is used to identify the matching factors and build a suitable model for this process. A decision tree is an algorithm that applies a sequence of decision-making rules to classify and predict information by splitting it into smaller groups. These groups form sub-models that avoid excessive data splitting, and the approach supports automatic interaction detection, regression, classification, etc. The model is easy to understand and non-parametric, so it requires no normalization or distributional assumptions. It is also robust to outliers, and comparisons with other state-of-the-art methods show its superior prediction performance [40]. In this process, the Classification and Regression Tree (CART) algorithm performs binary splits, which distinguishes it from other splitting approaches. A classification tree has a categorical dependent variable, whereas a regression tree has a continuous dependent variable. Eq. (1) gives the probability that two elements drawn from a node belong to different groups.

$$A = 1 - \sum_{i} R(i)^{2}, \qquad R(i) = \frac{m_i}{m} \tag{1}$$

where A represents the Gini index, which measures the diversity of each node. R(i) is the probability that an object in the node belongs to the i-th category, m is the number of observations in the parent node, and m_i is the number of observations whose target variable falls in the i-th category.
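The Gini index can be checked numerically on a node of match labels. This sketch assumes Eq. (1) takes the standard CART form A = 1 − Σ R(i)², with R(i) = m_i / m; the function names are hypothetical, and `split_gini` shows how a binary CART split would be scored by the size-weighted impurity of its two children.

```python
from collections import Counter


def gini_index(labels) -> float:
    """Gini index A = 1 - sum_i R(i)^2, with R(i) = m_i / m for the m observations in a node."""
    m = len(labels)
    if m == 0:
        return 0.0
    counts = Counter(labels)  # m_i for each category i
    return 1.0 - sum((m_i / m) ** 2 for m_i in counts.values())


def split_gini(left, right) -> float:
    """Size-weighted Gini index of a binary split; lower means purer child nodes."""
    m = len(left) + len(right)
    return len(left) / m * gini_index(left) + len(right) / m * gini_index(right)
```

For a node with two accepted (1) and two rejected (0) requests, `gini_index([1, 1, 0, 0])` is 0.5, the maximum for two classes; a split that separates them perfectly, `split_gini([1, 1], [0, 0])`, scores 0.0, so CART would prefer it.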

3.2 Optimization and Cross-Validation Process

Machine learning algorithms produce better predictions than conventional methods but are prone to overfitting. Cross-validation helps fit the model without overfitting and balances classification accuracy against model complexity. It estimates how well the model performs and tests its accuracy, and it is also used to estimate the decision tree's errors. As the tree grows, it gains more terminal nodes and makes fewer errors on the training data, but it then performs poorly on new data. To overcome this problem, cross-validation uses the cost function shown in Eq. (2).

$$C_{\beta}(I) = C(I) + \beta\,|\tilde{I}| \tag{2}$$

where C(I) is the misclassification error of tree I, |Ĩ| is the number of terminal nodes of the tree, and β is the complexity parameter. In this system, the data are divided into training and test sets. The data are processed in tree form, and the parameter β controls regularization when the model is fitted to the observations in the training set. The remaining data are randomly assigned to the test set. The minimum node sample size and the tree depth were selected by an exhaustive brute-force search to optimize the model; this search evaluates minimum node sizes from 1 to 10 together with the depth of the tree. Optimization and cross-validation are applied in this system mainly to overcome overfitting and to balance decision tree errors. Fig. 2 shows the feature importance in this process, which is driven by population and trip distance according to travel purposes and modes.
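The pruning criterion and the cross-validation split can be sketched as follows. This assumes Eq. (2) takes the standard CART cost-complexity form C(I) + β·|Ĩ|; the candidate subtrees, their error rates, and the helper names are hypothetical, and the fold generator is a plain k-fold split written with the standard library rather than the authors' code.

```python
import random


def cost_complexity(error: float, n_terminal: int, beta: float) -> float:
    """Penalized cost C(I) + beta * |I~| (assumed form of Eq. (2))."""
    return error + beta * n_terminal


def select_subtree(candidates, beta):
    """candidates: (name, misclassification_error, n_terminal_nodes); lowest penalized cost wins."""
    return min(candidates, key=lambda c: cost_complexity(c[1], c[2], beta))


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

With candidates `[("full", 0.05, 20), ("pruned", 0.08, 8), ("stump", 0.30, 2)]`, β = 0 selects the full tree, while β = 0.01 selects the pruned one (0.16 versus 0.25 and 0.32): each extra terminal node must buy enough error reduction to pay its penalty. In practice β, the depth, and the minimum node size would be chosen by averaging test-fold errors over the k folds.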

Figure 2: Feature importance of proposed taxi demand service

3.3 Blockchain-Based Framework for Taxi Demand Service

The blockchain provides tamper-resistant transaction records: blockchain records are timestamped, which makes them difficult to tamper with. The consensus protocols presented in [41-44] show that blockchain servers are robust and reliable. Hyperledger Fabric uses a practical Byzantine fault tolerance consensus method that is appropriate for a private blockchain. The smart contract is one of the most important blockchain modules. In this system, smart contracts serve as a decision-making inference engine, performing run-time verification and actuating tasks. Task management based on IoT devices is also supported: sensor readings are sent to the sensing function, which is submitted by invoking the smart contract. The smart contract then verifies whether the sensor's license is registered on the blockchain server; if it is not, access is denied. The generated data are tracked and analyzed by the smart contract, and the transaction proposal contains a detailed analysis of the sensing information based on the defined rules. Fig. 3 shows the data flow of the inference engine within the smart contract. Once all conditions are met, the smart contract passes the information to the client to perform the task; otherwise, the client receives a notification that the rules are not met.
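The inference-engine flow in Fig. 3 (license check, rule evaluation, then either perform or notify) can be sketched as a small rule checker. This is a plain-Python illustration under stated assumptions, not Fabric chaincode: the `InferenceContract` class, the range-based rule format, and the `speed_kmh` metric are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    sensor_id: str
    metric: str
    value: float


class InferenceContract:
    """Toy stand-in for the smart contract's inference engine (not Fabric chaincode)."""

    def __init__(self, registered_sensors, rules):
        self.registered = set(registered_sensors)  # licenses assumed to be recorded on the ledger
        self.rules = dict(rules)  # metric -> (low, high) range that satisfies the rule

    def invoke(self, reading: SensorReading) -> dict:
        # Step 1: license check; an unregistered sensor gets no access.
        if reading.sensor_id not in self.registered:
            return {"status": "denied", "reason": "sensor not registered"}
        # Step 2: evaluate the rule for this metric against the reading.
        bounds = self.rules.get(reading.metric)
        if bounds is not None and bounds[0] <= reading.value <= bounds[1]:
            # All conditions matched: hand the task to the client.
            return {"status": "perform", "task": reading.metric}
        # Otherwise notify the client that the rules are not met.
        return {"status": "notify", "reason": "rules not met"}
```

A reading from an unregistered sensor is denied before any rule is evaluated, mirroring the order described in the text; a metric without a rule is treated as not meeting the rules, which is a conservative design choice of this sketch.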

Figure 3: Inference engine data flow

3.4 Driver Smart Contract Deployment

Drivers are identified by their driver ID in the blockchain network. The driver's registration information is passed to the smart contract and saved as primary information in the smart contract storage. Fig. 4 shows the detailed smart contract process in the taxi demand service. As explained above, the transaction information between the passenger and driver is saved in the blockchain network. The first step is data service registration, which records the driver information. Next, the passenger requests driver information, selects a driver based on the provided data, and sends the request to that driver. The driver then responds to the passenger, indicating whether the request for the given location is acceptable, and the payment records between passenger and driver for that location are kept. Finally, the transaction records are uploaded, and the block is verified and added to the blockchain network according to the contract defined in this system. The main role of the smart contract is to set the rules for taxi drivers and passengers, i.e., the conditions for identifying transactions and payment details for the defined location.
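The sequence in Fig. 4 (register, request, respond, record) can be expressed as a small state machine. The following is an in-memory Python sketch, assuming a hypothetical `RideContract` class whose `ledger` list stands in for the blockchain; it does not use the Hyperledger Fabric or Composer APIs and omits block validation.

```python
from dataclasses import dataclass, field


@dataclass
class RideContract:
    """In-memory sketch of the driver smart-contract steps (hypothetical names)."""
    drivers: dict = field(default_factory=dict)   # driver_id -> registration info
    requests: dict = field(default_factory=dict)  # request_id -> request record
    ledger: list = field(default_factory=list)    # stand-in for committed transactions

    def register_driver(self, driver_id, info):
        # Step 1: data service registration of the driver.
        if driver_id in self.drivers:
            raise ValueError("driver already registered")
        self.drivers[driver_id] = info

    def send_request(self, request_id, passenger_id, driver_id, destination):
        # Step 2: the passenger selects a registered driver and sends a request.
        if driver_id not in self.drivers:
            raise KeyError("unknown driver")
        self.requests[request_id] = {"passenger": passenger_id, "driver": driver_id,
                                     "destination": destination, "status": "pending"}

    def respond(self, request_id, accept, fare=None):
        # Step 3: the driver accepts or rejects; accepted rides carry a payment record.
        req = self.requests[request_id]
        req["status"] = "accepted" if accept else "rejected"
        if accept:
            req["fare"] = fare
        # Step 4: the outcome is appended as a transaction record.
        self.ledger.append(dict(req, request_id=request_id))
        return req["status"]
```

Rejections are recorded as well as acceptances, which keeps the acceptance and rejection history that the machine learning component analyzes available on the ledger.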

3.5 Phase of Ride Requesting

Ride requesting is the process in which a passenger requests a ride from the nearest available options. A registered passenger looking for a ride has an identity number (passenger ID), a reputation value, a ride request, a current location, a requested location, and a travel distance. Eq. (3) defines this process.

When the request is received, the ID holder ID_p attempts to find a match with a possible driver based on the passenger's location and requested destination. An interested driver then checks the passenger's Request_p and sends the details of the accepted response, such as the price. Finally, the driver's response is defined in Eq. (4).
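Matching a request to a nearby driver can be sketched as follows. The `RideRequest` fields mirror some of the attributes listed for Eq. (3) (the requested location is the destination and the travel distance is derived from it), but the exact notation of Eqs. (3) and (4) is not available here, so the field names, the haversine distance, and the 3 km pick-up radius are all assumptions for illustration.

```python
import math
from dataclasses import dataclass


def haversine_km(a, b) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


@dataclass
class RideRequest:
    passenger_id: str   # ID_p
    reputation: float   # reputation value
    current: tuple      # current location (lat, lon)
    destination: tuple  # requested location (lat, lon)

    @property
    def travel_km(self) -> float:
        # Travel distance derived from the current and requested locations.
        return haversine_km(self.current, self.destination)


def match_driver(request, drivers, max_pickup_km=3.0):
    """Return the ID of the nearest driver within max_pickup_km, or None."""
    candidates = [(haversine_km(request.current, pos), d_id)
                  for d_id, pos in drivers.items()]
    in_range = [c for c in candidates if c[0] <= max_pickup_km]
    return min(in_range)[1] if in_range else None
```

The matched driver would then answer with an acceptance and price (the response of Eq. (4)) or a rejection; this sketch covers only the candidate-selection step.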

Figure 4: Smart contract detail process in taxi demand service

4 Implementation

This section describes the implementation of the proposed system in detail, covering the environmental settings, the applied dataset, and the performance analyses of the machine learning and blockchain techniques.

4.1 Environmental Setting

This subsection details the environmental settings. Tab. 1 presents the specification of the system implementation environment. The main component of the system is the Integrated Development Environment (IDE), Composer Playground. The system uses 32 GB of memory and an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz. The programming language is Python 3.6.2, and the operating system is Ubuntu Linux 18.04.0 LTS. We used Docker Engine 18.06.1-ce and Docker Compose 1.13.0. The Hyperledger Fabric version is 1.2. The CLI tool is Composer REST Server, and the Node.js version is 8.11.4. Tab. 2 shows the development environment of the machine learning algorithms in the taxi demand service.

Table 1: Development environment of the proposed system

Table 2: Dataset overview

4.2 Data Description

The data for the proposed system were collected from T-map, with a focus on identifying the conditions under which taxi drivers reject passengers' ride requests. There are 22,101 cases in total on Jeju Island, of which 10,000 were used for processing in this study. Tab. 2 shows an overview of the applied dataset.

The match type indicates the acceptance or rejection of a request: 1 if the request is accepted and 0 if it is rejected. Many attributes of the origin and destination locations are recorded as independent variables in the Open Data (OD) portal, such as time, day, weekdays, weekends, land use, and socio-economic factors. The socio-economic variables include the population density, business density, and employee density at the origin and destination. This information was obtained from the OD portal in Jeju.

4.3 Performance Analysis of Machine Learning Framework

In this study, prediction is based on supervised learning, which predicts the dependent target from the input data. 80% of the data were used as the training set and 20% as the test set. Tab. 3 summarizes the analysis of the machine learning models. Results were recorded with and without cross-validation: 0.82 without cross-validation and 0.83 with cross-validation for Jeju Island. The results show that distance was the main factor in Jeju. Cross-validation highlighted which variables are important and which are not. There are five significant variables in Jeju: A(6) distance, A(8) midnight, A(9) pick-up time, A(5) D_employee density, and A(2) O_employee density.
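The 80/20 hold-out split and an accuracy score on the 0/1 match type can be sketched in plain Python. The helper names and the fixed random seed are assumptions, and reading the reported 0.82 and 0.83 as accuracies is an interpretation, since the metric is not named in the text.

```python
import random


def train_test_split(rows, labels, test_ratio=0.2, seed=42):
    """Shuffle indices once and hold out test_ratio of the samples (80/20 by default)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(ids, seq):
        return [seq[i] for i in ids]

    return pick(train_idx, rows), pick(test_idx, rows), pick(train_idx, labels), pick(test_idx, labels)


def accuracy(y_true, y_pred) -> float:
    """Share of requests whose match type (1 = accepted, 0 = rejected) is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For the 10,000 cases used in this study, the split yields 8,000 training and 2,000 test samples; rows and labels are shuffled with the same index order, so each label stays attached to its case.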

Table 3: Machine learning-based analysis records

4.4 Performance Analysis of Blockchain Framework

In some scenarios, the blockchain framework needs private data processing that only authorized users can access, endorse, submit, and query. Hyperledger Fabric provides dedicated APIs for private data, e.g., PutPrivateData and GetPrivateData, which are accessible to authorized peers. Fig. 5 shows the design of the Hyperledger Fabric process. Composer can generate an admin card for every network, and with the generated admin card, participants and assets can be defined. Rules are defined in the smart contract and packaged into a business network archive (BNA). The front end connects through a REST API generated from the .bna file, and the client side can be built in various programming languages. Hyperledger Fabric is used in this system because its advantages fit the presented goals: permissioned membership, a high level of scalability and trust, defined data scopes to provide privacy, sensitive data protection, and digital keys for system protection. Hyperledger Fabric provides an integrated development environment (IDE), Hyperledger Composer, that allows developers to build customized blockchain applications.

Figure 6: Transaction flow sequence diagram

4.5 Transaction Process in Blockchain System

The sequence diagram in Fig. 6 shows the transaction process of the blockchain environment. The client information contains the ClientID, ChaincodeID, transaction (tx), tx payload, timestamp, and client signature. EP denotes the endorsing peer, which checks the signature and executes the transaction. EPs check that the transaction is well formed and not redundant, and the membership service provider also checks the validity of the signature. After these checks, the EP adds its signature to the tx and sends a response. The client application assembles and inspects the proposal responses and then sends the transaction to the orderer peer. Finally, the committing peer validates the transaction and updates the ledger. Hyperledger Fabric achieves high transaction throughput (transactions per second).
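The endorse-then-commit flow can be sketched with shared-key HMACs standing in for the certificate-based signatures that Fabric's membership service provider actually verifies. Everything here is an assumption for illustration: the function names, deriving the tx ID from the payload, the two-endorsement policy, and the omission of the ordering service (the commit step receives the transaction directly).

```python
import hashlib
import hmac
import json
import time

REQUIRED_FIELDS = {"client_id", "chaincode_id", "tx_id", "payload", "timestamp"}


def sign(key: bytes, payload: dict) -> str:
    """HMAC-SHA256 over canonical JSON (stand-in for a certificate-based signature)."""
    msg = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: dict, signature: str) -> bool:
    return hmac.compare_digest(sign(key, payload), signature)


def client_proposal(client_id, client_key, chaincode_id, tx_payload):
    """Client builds and signs a proposal (ClientID, ChaincodeID, tx, payload, timestamp)."""
    tx_id = hashlib.sha256(json.dumps(tx_payload, sort_keys=True).encode()).hexdigest()[:16]
    proposal = {"client_id": client_id, "chaincode_id": chaincode_id, "tx_id": tx_id,
                "payload": tx_payload, "timestamp": time.time()}
    return proposal, sign(client_key, proposal)


def endorse(proposal, client_sig, client_keys, seen_tx_ids, peer_key):
    """Endorsing peer: check format, redundancy and client signature, then co-sign."""
    if set(proposal) != REQUIRED_FIELDS or proposal["tx_id"] in seen_tx_ids:
        return None  # malformed or already-seen (redundant) transaction
    client_key = client_keys.get(proposal["client_id"])  # membership lookup stand-in
    if client_key is None or not verify(client_key, proposal, client_sig):
        return None
    seen_tx_ids.add(proposal["tx_id"])
    return sign(peer_key, proposal)


def commit(ledger, proposal, endorsements, peer_keys, policy=2):
    """Committing peer: append the tx only if at least `policy` endorsements verify."""
    valid = sum(1 for peer, sig in endorsements.items()
                if sig is not None and verify(peer_keys[peer], proposal, sig))
    if valid >= policy:
        ledger.append(proposal)
        return True
    return False
```

A proposal with a bad client signature gets no endorsement, and a transaction with fewer valid endorsements than the policy requires is never written to the ledger, which is the property the sequence diagram relies on.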

5 Results

This section presents the taxi-hailing prediction results based on machine learning and the management of transaction records and system security based on blockchain.

5.1 Taxi-Hailing Demand Forecasting

Taxi-hailing demand and various indicators were forecast with the XGBoost algorithm, using time, environment, and taxi demand as the main factors. Tab. 4 shows the prediction indicators for the trip demand service. It has three columns: model type, input, and output. T denotes time and E denotes environment. The system output is the following week's demand.

Table 4: Trip demand predictive indicators

A grid search algorithm was applied to tune the XGBoost hyperparameters. Tab. 5 shows the defined hyperparameters. There are five in total: gamma, learning_rate, max_depth, min_child_weight, and n_estimators.
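An exhaustive grid search over these five hyperparameters can be sketched as below. The candidate values in `PARAM_GRID` are hypothetical (the values in Tab. 5 are not reproduced in this text), and `score_fn` is a placeholder: in practice it could fit `xgboost.XGBRegressor(**params)` on the training data and return the validation RMSE, which is why lower scores are treated as better.

```python
import itertools


def grid_search(param_grid: dict, score_fn):
    """Evaluate every combination in param_grid exhaustively; lower score is better."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; the values actually used in Tab. 5 may differ.
PARAM_GRID = {
    "gamma": [0, 0.1],
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 6],
    "min_child_weight": [1, 5],
    "n_estimators": [100, 300],
}
```

This grid already has 2^5 = 32 combinations, and each additional candidate value multiplies the number of model fits, so the grid is usually kept coarse or combined with cross-validated scoring only on the most promising region.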

5.2 Performance Analysis of Taxi Demand Service Based on Machine Learning Algorithms

Jeju Island was selected as the research object because it contains many tourist spots and receives many visits and trip demands. The taxi-hailing information is divided by time, as shown in Fig. 7. Taxi demand and ride-hailing services follow regular patterns, and as ride-hailing service increases, taxi demand service decreases. Detailed records are presented in Tab. 6.

Fig. 8 presents the taxi demand prediction results of the XGBoost algorithm on both the training and test sets.

The experimental results show XGBoost performance under the defined hyperparameters and their optimal values, and clearly show the XGBoost model inputs and outputs in terms of demand for the coming week. The MAPE and RMSE of the presented system were also evaluated under various combinations of time, environment, and taxi inputs, and the performance of the XGBoost predictions at different hours was assessed with these two metrics. Fig. 9 presents the MAPE of XGBoost across time stamps; at eleven in the morning and five in the afternoon, the model performs better than at other times.
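For reference, MAPE and RMSE grouped by hour, as in Figs. 9 and 10, can be computed as below. MAPE is (100/n) · Σ |(y_i − p_i) / y_i| and RMSE is sqrt((1/n) · Σ (y_i − p_i)²), where y_i is actual and p_i predicted demand. Skipping hours with zero actual demand in MAPE is our assumption (MAPE is undefined there), and the `(hour, actual, predicted)` record format is hypothetical.

```python
import math
from collections import defaultdict


def mape(actual, predicted) -> float:
    """Mean absolute percentage error (%); pairs with zero actual demand are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        return float("nan")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def rmse(actual, predicted) -> float:
    """Root mean squared error, in the same units as demand (e.g., trips per hour)."""
    sq = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))


def per_hour_errors(records):
    """records: iterable of (hour, actual, predicted); returns {hour: (MAPE, RMSE)}."""
    groups = defaultdict(lambda: ([], []))
    for hour, a, p in records:
        groups[hour][0].append(a)
        groups[hour][1].append(p)
    return {h: (mape(a, p), rmse(a, p)) for h, (a, p) in sorted(groups.items())}
```

The two metrics complement each other: MAPE is scale-free and comparable across quiet and busy hours, whereas RMSE is in trip units and weights large absolute misses more heavily, so an hour can look good on one metric and worse on the other.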

Table 5: Model hyperparameter settings

Figure 7: Online-taxi hailing and taxi demand service in Jeju

Table 6: Proposed system resource utilization analysis

Fig. 10 shows the RMSE records: the XGB + T + E + TX model performs well except at eleven and twelve in the morning and at four and five in the afternoon.

Figure 8: Taxi demand service based on XGBoost

Figure 9: Average MAPE of XGBoost prediction model in various hours

Figure 10: Average RMSE of XGBoost prediction model in various hours

5.3 Blockchain Performance Analysis for Secure Taxi Demand Service

The security analysis considered the probability of attacks in the taxi demand environment. Four attack types were covered: key attacks, false data injection attacks, replay attacks, and man-in-the-middle attacks. Against key attacks, the taxi demand service relies on encryption that would require high computational power to break, which is difficult for an intruder. False data injection is countered by validating records before they are accepted: every node requires verification and authorization for the consensus mechanism to succeed. Replay attacks are countered by requiring a private key for agreement between nodes, and man-in-the-middle attacks are countered by authorizing nodes with a temporary private key for each session. The presented framework protects data, locks the access points of various devices, and provides fault tolerance, data encryption, and decentralization. Smart contracts in this system also serve to reduce costs.
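The replay and man-in-the-middle defenses described above can be sketched with a per-session key and nonce tracking. This is an assumption-laden illustration: it uses a symmetric HMAC key from Python's `secrets` module in place of the private-key scheme the text refers to, and one `SessionChannel` object plays both endpoints, which share the session key.

```python
import hashlib
import hmac
import json
import secrets


class SessionChannel:
    """Per-session HMAC key plus nonce tracking (an HMAC stand-in for the paper's key scheme)."""

    def __init__(self):
        self.key = secrets.token_bytes(32)  # fresh temporary key for this session only
        self.seen_nonces = set()

    def _mac(self, body: dict, nonce: str) -> str:
        msg = json.dumps({"body": body, "nonce": nonce}, sort_keys=True).encode()
        return hmac.new(self.key, msg, hashlib.sha256).hexdigest()

    def send(self, body: dict) -> dict:
        nonce = secrets.token_hex(16)  # unique per message
        return {"body": body, "nonce": nonce, "mac": self._mac(body, nonce)}

    def receive(self, message: dict) -> bool:
        """Accept a message only if its MAC is valid and its nonce has not been seen before."""
        expected = self._mac(message["body"], message["nonce"])
        if not hmac.compare_digest(expected, message["mac"]):
            return False  # tampered or forged, e.g., altered by a man in the middle
        if message["nonce"] in self.seen_nonces:
            return False  # replayed message
        self.seen_nonces.add(message["nonce"])
        return True
```

Because the key is discarded when the session ends, a message captured in one session cannot be replayed in another, and the nonce set blocks replays within the same session.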

5.4 Simulation Results

Fig. 11 presents the transaction throughput of three user groups on Hyperledger Fabric, containing 700, 1200, and 1700 users, respectively. Statistical measures, i.e., Min, Max, and Avg, are used to evaluate the performance of the blockchain platform. For example, the 700-user group stabilizes between a Min of 37 TPS and a Max of 40 TPS.

Figure 11: Hyperledger fabric transaction records per second

Similarly, the 1200-user group has a Min of 57 TPS and a Max of 63 TPS, and the 1700-user group has a Min of 79 TPS and a Max of 95 TPS. These records show that throughput increases with the number of users; that is, a larger number of users does not degrade the system's transaction performance.
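Min, Max, and Avg TPS figures of this kind can be derived from per-transaction commit timestamps by bucketing them into one-second windows. A minimal sketch, assuming commit times in seconds are available from the benchmark logs; counting idle seconds as zero is a design choice that lowers the minimum and average, and the benchmark tool the authors used may compute these statistics differently.

```python
from collections import Counter


def tps_stats(commit_times):
    """Per-second throughput summary (min, max, avg) from commit timestamps in seconds."""
    if not commit_times:
        raise ValueError("no commits recorded")
    per_second = Counter(int(t) for t in commit_times)
    start, end = min(per_second), max(per_second)
    # Include seconds with zero commits so idle gaps are reflected in min and avg.
    counts = [per_second.get(s, 0) for s in range(start, end + 1)]
    return {"min": min(counts), "max": max(counts), "avg": sum(counts) / len(counts)}
```

For commits at 0.1, 0.5, 0.9, 1.2, 1.3, and 3.0 s, the per-second counts are 3, 2, 0, and 1, giving a Min of 0, a Max of 3, and an Avg of 1.5 TPS.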

Figure 12: Query transaction latency

Fig. 12 shows the query transaction latency of the proposed taxi demand case study. The performance of three user groups was evaluated on the blockchain platform, with latency measured as the time taken to receive a response from the blockchain. As the number of users increases, the query transaction latency on the blockchain platform also increases.

The performance of the presented system was also analyzed using resource utilization records: CPU usage (Max and Min), memory usage (Max and Min), and network traffic in and out. Tab. 6 shows that the presented blockchain-based taxi demand service improves network performance with efficient resource utilization. Tab. 7 compares the machine learning algorithm with other state-of-the-art methods; XGBoost achieves higher accuracy than the others, at 92%.

Table 7: Proposed system resource utilization analysis

6 Conclusion

In this study, we examined the factors that determine the reliability of taxi apps on Jeju Island. Weather conditions, distance, time of day, and other variables were used to measure the success of the proposed system. The key approach of the proposed system is the integration of machine learning and blockchain. XGBoost and decision tree algorithms were applied for analysis both before and after cross-validation, and the performance of the blockchain framework was evaluated. The Hyperledger Fabric framework showed high efficiency in securing the taxi demand service using smart contracts and other factors. Overall, the system performed well on both the training and test sets. The machine learning algorithms identified distance as an important factor in drivers' acceptance of passengers' ride requests. The main focus of this manuscript is to improve the security and intelligence of ride-hailing systems by integrating machine learning and a blockchain framework. We analyzed this process using taxi usage records from Jeju Island, South Korea, together with passenger ratings and satisfaction from Kakao Taxi and T-map. Passengers' ride-hailing requests and the probability of driver acceptance for the defined locations were analyzed. The designed framework improves the security of passengers' transaction records and reduces passengers' waiting time for the requested location.

Funding Statement: This research was financially supported by the Ministry of Small and Medium-sized Enterprises (SMEs) and Startups (MSS), Korea, under the “Regional Specialized Industry Development Program (R&D, S3091627)” supervised by the Korea Institute for Advancement of Technology (KIAT).

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.