Adversarial Attacks on Featureless Deep Learning Malicious URLs Detection

Computers, Materials & Continua, 2021, Issue 7

Bader Rasheed, Adil Khan, S. M. Ahsan Kazmi, Rasheed Hussain, Md. Jalil Piran and Doug Young Suh

1 Institute of Data Science and Artificial Intelligence, Innopolis University, Innopolis, 420500, Russia

2 Institute of Information Security and Cyberphysical Systems, Innopolis University, Innopolis, 420500, Russia

3 Department of Computer Science and Engineering, Sejong University, Seoul, Korea

4 Department of Electronics Engineering, Kyung Hee University, Yongin, Korea

Abstract: Detecting malicious Uniform Resource Locators (URLs) is crucially important to prevent attackers from committing cybercrimes. Recent research has investigated the role of machine learning (ML) models in detecting malicious URLs. With ML algorithms, the features of URLs are first extracted and then different ML models are trained. The limitation of this approach is that it requires manual feature engineering and does not consider the sequential patterns in the URL. Therefore, deep learning (DL) models are used to solve these issues, since they are able to perform featureless detection. Furthermore, DL models give better accuracy and generalization to newly designed URLs; however, the results of our study show that these models, like any other DL models, can be susceptible to adversarial attacks. In this paper, we examine the robustness of these models and demonstrate the importance of considering this susceptibility before applying such detection systems in real-world solutions. We propose and demonstrate a black-box attack based on scoring functions with a greedy search for the minimum number of perturbations leading to a misclassification. The attack is examined against different types of convolutional neural network (CNN)-based URL classifiers, and it causes a tangible decrease in performance, with more than a 56% reduction in the accuracy of the best classifier (among the classifiers selected for this work). Moreover, adversarial training shows promising results in reducing the influence of the attack on the robustness of the model to less than 7% on average.

Keywords: Malicious URLs; detection; deep learning; adversarial attack; web security

1 Introduction

Recent cyber-attacks have spurred an increased interest in devising security solutions to circumvent the threats posed by cyber attackers. This is, at least in part, due to the leakage of critical information as a result of attacks such as identity theft, Denial of Service (DoS), masquerading, impersonation, and so on. Attackers attempt to impersonate authorized users to steal important and critical information such as passwords, secret keys, and other personal information including bank account details. These attackers use any possible and available medium to attract victims, such as distributing impressive ads on the Internet, including malicious URLs in informative emails, or hacking a website. Such threats are collectively referred to as phishing, a type of threat to sensitive information or data in which attackers intentionally target a victim [1]. The attacker lures the victim to a phishing webpage using different mediums and waits for the victim to access the phishing webpage (approaches collectively referred to as social engineering). According to the anti-phishing working group (APWG)'s latest report [2], in the first quarter of 2020, 165,772 phishing sites were detected on the Internet. Besides, 75% of these malicious sites use secure socket layer (SSL) protection, which implies that it is not enough to rely only on SSL against such attacks.

Malicious URL detection is a highly challenging task since there are no rules for generating URLs, and the behavior of a URL must be studied to detect potentially malicious URLs. Most of the existing traditional detection systems use database-oriented solutions such as blacklists, or heuristic-oriented solutions such as content- or visual-based detection [3]. URL-based techniques are safer and more realistic from three perspectives: there is no need to access the malicious webpage to perform dynamic analysis, they can perform zero-hour threat detection (i.e., for newly created websites), and they reduce the amount of work and time needed to process a webpage compared to other existing approaches.

With the tremendous development in the field of ML in general and DL models in particular, these models are used to solve a wide variety of tasks in computer security and other fields [4–6]. ML helps in detecting offensive events, including spam content, DoS attacks [7], attacks on industrial control systems (ICS) [8], attacks on Internet of Things (IoT) devices [9], malware, and malicious URLs [10,11]. During the training process, ML models learn a prediction function that is used to classify a URL, and the trained model is then used to classify new unseen URLs. For this purpose, we need a training dataset consisting of benign and malicious URLs. Furthermore, the training dataset should have informative features that adequately characterize the URL and under which the benign and malicious URLs have different distributions. These features are usually extracted by domain experts and may include lexical features (such as the length of the URL, the existence of certain words, bag of words, n-grams, etc.) and host-based features (domain name properties, the geographical location of the host, etc.) [3]. To this end, different classification algorithms, such as support vector machines (SVM), logistic regression, and decision tree classifiers, can be used over the training data to learn the prediction function. However, extracting informative features is essential for the success of any classification model training. On the other hand, DL does both feature learning and prediction within the same model; hence, deep neural networks (DNNs) have the ability to discover the required hidden features and use them to find a model that maps the input data to the desired output without the features being explicitly defined by domain experts. The first layers of the network discover informative features and the later layers use these features to make decisions [12]. In natural language processing (NLP), for instance, a model classifies a sentence according to the existence of certain keywords. Hence, featureless malicious URL detection uses DL to classify a URL as benign or malicious [10]: the URL is treated as a sentence and classified using the same approach used in NLP. This approach can eliminate the limitation of typical blacklist-based solutions [11], which cannot generalize to novel URLs that do not exist in the blacklist. Another advantage of DL-based featureless malicious URL detection systems is that they do not require feature engineering because they extract the features automatically.

Nevertheless, despite remarkable results in providing intelligent solutions in different domains, ML and DL systems have shown susceptibility to adversarial attacks in the form of small, purposely created perturbations leading to misclassifications [13]. These attacks can cause dire consequences, especially when developing deep learning-based solutions for security-related problems, where attackers work hard to discover new attack vectors. Evaluating the performance of these systems from an accuracy standpoint alone is not enough to decide whether they are suitable for real-life applications. Since the purpose of applying these techniques is to protect against malicious activities, they cannot be applied until their robustness is considered, to prevent the attacker from easily developing adversarial samples with small changes to the input sample.

To fill these gaps, the goal of this paper is three-fold: to highlight the existence of a vulnerability in featureless DL-based malicious URL detection systems that can be used by attackers to launch adversarial attacks on such systems; to develop a proof-of-concept algorithm to launch three attacks (character-based, segment-based, and full attack) and test these attacks on three kinds of classifiers (character-based, word-based, and full joint classifier); and to show how adversarial learning can be used to mitigate the threat of such attacks by augmenting the training data to effectively defend against them.

The main contributions of this paper are summarized below.

• A novel attack on featureless deep learning-based malicious URL classification systems is introduced. The attack exploits the sensitivity of these systems to small input manipulations, causing a malicious URL to be misclassified as benign. The attack works in a greedy mode, providing the required perturbation with the least possible number of steps.

• CNN-based URL classifiers (character-based, word-based, and an optimized joint classifier) are implemented, and the performance of these classifiers is evaluated under our proposed attack.

• The ability of adversarial training to defend against the proposed attack is examined, and the attack is shown to be usable for data augmentation, generating additional malicious URLs for training DL-based URL classification systems that are both accurate and secure.

The remainder of this paper is organized as follows. Section 2 presents related work on malicious URL detection with deep learning and adversarial attacks on text data. In Section 3, the problem of malicious URL detection is formulated as binary classification, along with background on adversarial attacks. In Section 4, the attack for fooling URL classifiers is presented, and in Section 5, the results of experiments on the robustness of three classifiers against the proposed attack are presented, and the effectiveness of the proposed adversarial attack, along with adversarial training as a defense against such attacks, is discussed. We discuss why these systems are vulnerable to adversarial attacks in Section 6 and provide some suggestions on how to make them more secure. The limitations of this work are described in Section 7, and we conclude in Section 8.

2 Related Work

In this section, we review the existing works in malicious URL detection using deep learning and adversarial attacks on text data.

Malicious URL classification is a well-studied problem, as malicious websites are a primary source of undesirable content and their timely detection is a crucial task. Recently, deep learning has been used extensively for malicious URL classification. In this regard, NLP-based deep learning models have been successfully applied to this task due to their ability to recognize semantic features in unstructured text data. For instance, eXpose [14] used character-level convolutional networks to classify a URL sample. The convolutional network tries to locate informative patterns of specific characters appearing together in the URLs. Because the model does not use word encoding, it does not suffer from an explosion problem when increasing the number of features. The explosion problem appears when using word encoding because the vocabulary of URL words is unbounded, with the ability to add new unseen words with each new URL. In addition to removing manual feature extraction, eXpose outperforms URL classification models based on manually extracted features. Similarly, URLNet [10] applied convolutions to both the characters and the words of the URL sample to learn the final optimized embedding of the URL. Therefore, the model has the ability to discover several types of semantic features of the URL. To solve the problem of large vocabulary size, URLNet uses an additional character-level word embedding, where the final word embedding is created from the word itself and the characters present in that word. Furthermore, Shima et al. [15] used an advanced embedding method by embedding combinations of two characters appearing sequentially, after which CNNs are applied to classify the sample. In this work, on the other hand, we apply a fully featureless approach to URL classification without any feature engineering, and we deal with rare unknown words by using word-embedding techniques.

Research on adversarial attacks against DL is more active for image data [16] than for text. Consequently, most of the works in the adversarial text field try to take methods from the image field and apply them to text. Samanta et al. [17] used the concept of the fast gradient sign method (FGSM) to find and replace the important or salient words that significantly affect the resulting class of the text when they are removed. Similarly, Sato et al. [18] operated in the embedding space instead of the discrete input space. They preserved the semantic meaning of the sentence by restricting the directions of perturbations to find a substitute word from a pre-defined vocabulary instead of an arbitrary unknown word. The previous two methods [17,18] are white-box attacks. On the other hand, Gao et al. [19] proposed DeepWordBug, which is a black-box attack. First, they determined the important words to change using scoring functions, then they created perturbations on these words to cause a misclassification. To preserve the readability of the perturbed sentence, the authors used edit distance. In another work, Ebrahimi et al. [20] proposed a white-box attack called HotFlip that works against character-level neural network classifiers. The authors used the gradients of the input vector to find the manipulations needed to fool the classifier. Inspired by DeepWordBug [19], in this paper, we propose a black-box attack using scoring strategies. Our attack is possible at the character level, by changing specific characters of the malicious URL to fool the classifier, or at the segment level, by replacing a full URL segment. The goal is to change the predicted label of the URL by introducing minimum alterations.

There also exist some studies on adversarial attacks against URL detection systems, but they were designed to introduce perturbations to the URL input features. Chen et al. [21] used a differential evolution algorithm to find the minimum number of bytes that must be changed to create an adversarial attack. Similarly, Shirazi et al. [22] tried to measure the number of features that should be changed to create adversarial attacks and the cost of each manipulation. On the other hand, Aleroud et al. [23] used feature perturbation for adversarial attack generation using generative adversarial networks (GAN). The main difference between our study and previous studies is that we perform both attack and defense on featureless DL systems, working directly on the raw data while treating the URL as a sentence or text.

3 Problem Formulation

In this section, we first provide a real-life scenario of how the attack could happen. Then, the problem of malicious URL detection is formulated as a binary classification task. After that, a brief introduction to adversarial attacks and adversarial training is provided. Lastly, we discuss typical URL parsing, as it determines the restrictions on the modifications of the input URL.

3.1 Real-Life Scenario

Fig.1 shows an example of how the proposed attack could happen in a real-life scenario.

Figure 1: Real-life attack scenario

1) The attacker designs and runs a malicious website with a URL generated using our proposed attack. This website could ask the victim to download malicious files or to fill in online forms with user credentials (such as passwords and account details), mimicking some original service (such as a bank website).

2) The attacker uses different techniques to lure or redirect the victim to the malicious website, such as social engineering, phishing, email spoofing, website spoofing, and exploitation of browser vulnerabilities. In our case, the victim could be using a machine learning-based classifier to detect malicious URLs, but since the attacker uses the proposed attack to generate the malicious URL, the classifier is not able to detect it.

3) Once the victim browses the malicious URL, the victim is redirected to the webserver on which the attacker has hosted the malicious webpage.

4) The victim executes what the attacker wants, and the attacker gets the information he/she needs.

5) The attacker uses the information provided by the victim to access the original website using the victim’s credentials.

3.2 Malicious URL Detection

Consider a dataset consisting of $N$ URLs with their corresponding labels, $\{(u_1, y_1), \ldots, (u_N, y_N)\}$, where $u_i \in U$ ($i = 1, 2, \ldots, N$) represents a URL, $U$ is the URL space, which ensures that the input has a valid URL parsing, and $y_i \in \{0, 1\}$ is the label of the URL, with $y_i = 1$ for a malicious URL and $y_i = 0$ for a benign URL. First, we need to convert each URL to its feature representation $u_i \to x_i$, where $x_i \in \mathbb{R}^n$ is an $n$-dimensional vector representing the features of the URL $u_i$. In the case of deep learning, this can be done automatically. Thus, all we need is a prediction function $f(u_i): \mathbb{R}^n \to \mathbb{R}$ that can perform both feature learning and URL classification. This function $f$ is represented by the neural network architecture of the classifier, such as a CNN or a recurrent neural network (RNN). The result of this function is the probability of each class, benign $P_f(\text{ben} \mid u_i)$ or malicious $P_f(\text{mal} \mid u_i)$, and the final output $\hat{y}_i$ is the class with the maximum probability, denoted as:

$$\hat{y}_i = \arg\max_{c \in \{\text{ben}, \text{mal}\}} P_f(c \mid u_i) \quad (1)$$

The goal is to learn the parameters or weights of the prediction function $f$ that minimize the number of prediction errors over the entire dataset. To accomplish this, we need to choose and minimize a loss function; different loss functions can be used, such as the mean squared error and the cross-entropy.
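As an illustration only (not the authors' implementation), the prediction function and the training objective can be sketched in PyTorch as follows; the classifier `f`, the 128-dimensional input encoding, and the optimizer choice are assumptions for the sketch:

```python
import torch
import torch.nn as nn

# Hypothetical classifier f: an encoded URL vector -> logits for (benign, malicious).
f = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))

criterion = nn.CrossEntropyLoss()                # loss minimized over the dataset
optimizer = torch.optim.Adam(f.parameters())

def predict(x):
    """Return P_f(ben|u), P_f(mal|u) and the arg-max label of Eq. (1)."""
    probs = torch.softmax(f(x), dim=-1)
    return probs, probs.argmax(dim=-1)

def training_step(x_batch, y_batch):
    """One gradient step reducing the prediction error (cross-entropy loss)."""
    optimizer.zero_grad()
    loss = criterion(f(x_batch), y_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```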

3.3 Adversarial Attacks

Adversarial attacks exploit security vulnerabilities in ML and DL models. Adversaries can utilize these attacks to fool DL models by altering samples with a small perturbation imperceptible to humans. Formally, for a given DL classifier $f$, a small perturbation $\Delta x$ applied to an input sample $x$ results in a new sample $x'$ as an adversarial sample:

$$x' = x + \Delta x, \quad \|\Delta x\|_p \le \varepsilon \quad (2.1)$$
$$f(x') \ne f(x) \ \text{(untargeted)} \quad \text{or} \quad f(x') = t \ \text{(targeted)} \quad (2.2)$$

The classifier here is denoted as $f: X \to Y$, where $X$ is the input sample space and $Y$ is the space of output classes. $\|\Delta x\|_p$ is the $L_p$-norm of the perturbation $\Delta x$, and it measures the degree to which an adversarial example $x'$ is imperceptible from its original $x$. $\varepsilon$ is the permitted perturbation, chosen so that $x'$ stays in the input space and is indistinguishable from $x$ by a human observer. The constraint in (2.2) means that the class of the adversarial sample can either be any class different from that of the original sample (untargeted attacks) or a specific class $t$ (targeted attacks).

In our case, the input sample is a malicious URL $u$, and we aim to find a perturbation that changes $u$ into $u'$ so that $u'$ is classified as a benign URL:

$$u' = u + \Delta u, \quad u' \in U, \quad \|\Delta u\|_p < \varepsilon \quad (3.1)$$
$$f(u) = \text{malicious} \quad (3.2)$$
$$f(u') = \text{benign} \quad (3.3)$$

The constraints in (3.1), namely $u' \in U$ and $\|\Delta u\|_p < \varepsilon$, mean that $u'$ should follow typical URL parsing and that $\Delta u$ should be as small as possible and below a predefined permitted perturbation parameter $\varepsilon$. The constraints in (3.2) and (3.3) mean that the input sample $u$ is classified as malicious while the perturbed sample $u'$ should be classified as benign.

Adversarial attacks are divided into white-box and black-box attacks according to the amount of knowledge that the attacker has about the model [24]. In white-box attacks, the attacker knows the architecture and weights of the model, which makes the attack easier to launch. In black-box attacks, the attacker has no or limited knowledge about the model, which makes the attack more difficult but also more realistic in some cases. If the attacker has no knowledge of the model, such as the training process and weights, but can query the model with input samples, the attack is called an adaptive black-box attack. The attacker constructs adaptive queries to the target model by changing the queries according to the label $y$ obtained from the target model for a sample $x$. The attacker then builds a surrogate model and trains it on the pairs $(x, y)$ obtained by querying the target model [24]. This surrogate model replaces the target model, and white-box attacks can be constructed against it. In this study, we consider adaptive black-box attacks as they are more realistic, since the model could be deployed somewhere on the Internet, for example as a browser extension or a spam email detector.
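The adaptive black-box setting can be sketched as follows; this is an illustrative outline, where `query_target` stands for whatever interface returns the deployed classifier's label and the logistic-regression surrogate is an arbitrary choice of locally trainable substitute:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def query_target(encoded_url):
    """Placeholder for querying the deployed black-box URL classifier."""
    raise NotImplementedError

def build_surrogate(encoded_urls):
    """Label queried samples with the target model, then fit a local substitute;
    white-box attacks can afterwards be crafted against this surrogate."""
    labels = np.array([query_target(x) for x in encoded_urls])
    surrogate = LogisticRegression(max_iter=1000)
    surrogate.fit(np.stack(encoded_urls), labels)
    return surrogate
```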

3.4 Adversarial Training

Since it is essential to mitigate the effects of adversarial attacks, many robust DL methods and models have been proposed. Some of these methods are adversarial training, defensive distillation, and using GANs as defense mechanisms against adversarial attacks [13]. It is important to mention that each of these defense methods is designed to defend against specific attacks, and none of them alone can mitigate all kinds of adversarial attacks.

In this paper, we examine the effect of using adversarial training as a defense for the classifier. In [25], the authors introduced adversarial training, which secures the model by augmenting the training dataset with perturbed data obtained by attacking the original dataset. Different adversarial attacks are used to generate adversarial examples, and these perturbed data are then added to the model's training data. Formally, first, for each input sample, the samples that maximize the loss function are found using the proposed attack. Then, during the training of the model, instead of updating the parameters based only on the loss of the original samples, the pre-treated input is included as follows.

$I$ is an indicator function, $f$ is the classifier, $L$ is the loss function, $\Delta$ is the allowable perturbation, and $\delta$ is the perturbation that maximizes the loss function of the classifier for an input sample $x$. In (4.2), this perturbed sample is included in the training process of the model to find the optimal model parameters $\theta$.
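A minimal PyTorch sketch of this min-max idea is given below; it illustrates adversarial training in general rather than the URL-specific procedure of Section 5.2, it assumes continuous (e.g., embedded) inputs, and it uses a single FGSM-style step to stand in for whatever attack supplies the worst-case perturbation:

```python
import torch

def worst_case_perturbation(f, criterion, x, y, eps=0.1):
    """Approximate the loss-maximizing delta with one gradient-sign step."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = criterion(f(x_adv), y)
    loss.backward()
    return eps * x_adv.grad.sign()

def adversarial_training_step(f, criterion, optimizer, x, y):
    """Update the model parameters on the perturbed (pre-treated) input."""
    delta = worst_case_perturbation(f, criterion, x, y)
    optimizer.zero_grad()
    loss = criterion(f(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```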

3.5 URL Parsing

URLs have a specific structure (as shown in Fig. 2) that determines how a request is forwarded from the user to the end servers. A URL includes the following segments:

• A protocol or scheme identifier (e.g., HTTP, HTTPS),

• A domain name or netloc registered in a DNS server,

• A path, which identifies where the page is located on the server,

• Query: name-value pairs passed as parameters that provide information the resource can use for some purpose, and

• Fragment: used to direct the browser to a reference or a function.

URL parsing breaks a URL down into its component segments, allowing it to be treated like words in a text sentence. All URL components can be replaced with any other suitable alternative except for the domain name, which is the only part the programmer cannot replace arbitrarily, as it is registered in DNS servers. For example, changing the path will not change the content as long as the programmer is able to place the page on a different path. We exploit this property to build the attacks.

Figure 2: URL structure
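For illustration, Python's standard library already exposes this segmentation; the example URL below is hypothetical and the comments indicate which segments the attack may touch:

```python
from urllib.parse import urlparse

parts = urlparse("http://example.com/login/index.php?user=abc#top")

print(parts.scheme)    # 'http'              - protocol, left unchanged by the attack
print(parts.netloc)    # 'example.com'       - domain name, only changeable character by character
print(parts.path)      # '/login/index.php'  - replaceable segment
print(parts.query)     # 'user=abc'          - replaceable segment
print(parts.fragment)  # 'top'               - replaceable segment
```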

4 Proposed Attack on Featureless URL Classifiers

We propose an adversarial attack that can operate at the character level, by changing specific characters, or at the segment level, by replacing a full URL segment. In this study, we refer to characters or segments as 'tokens,' depending on the attacker's purpose. We also note that the previous definition of adversarial modification given in (3.1), $\Delta u = u' - u$, cannot be directly applied here because the input URL sample is symbolic and the $L_p$-norm works for continuous image data, yet it is meaningless for text data. Therefore, replacement modifications of characters or words in the text are proposed to alter a malicious URL so that it is classified as a benign URL. The definition of adversarial modification here is the edit distance between the input URL $u$ and the perturbed URL $u'$, defined as the minimal number of edit operations required to change $u$ into $u'$. The goal is to change the predicted label of the URL by introducing minimum alterations. Moreover, we apply scoring strategies inspired by DeepWordBug [19] to find the important URL segments or characters that, if changed, can cause the misclassification needed for (3.1) and give an incorrect prediction as in (3.2), (3.3). The proposed attack follows a black-box scenario, as it is more realistic to assume that the model is deployed somewhere as part of a security system on cloud servers as a service. Such a service may receive input from users and return the corresponding outputs, so the architecture and the parameters or gradients of the neural network used for classification are not available. Therefore, the required adversarial modifications are created for the chosen tokens of the input without considering the gradients or weights of the model. Given the numerous choices of potential input changes (among all the URL segment/character changes), we design an approach that consists of three steps to choose the tokens and replace them:

• Step 1: Determine the important malicious URL tokens to change,

• Step 2: Determine the important benign URL tokens (the candidate pool), and

• Step 3: Introduce the potential attacks that can evade the deep learning classifier, with more than one suggestion if possible.

To find the tokens for the first two steps, we design scoring functions. However, the proposed changes should preserve the structure of the URL. The scoring functions first find the important tokens in the malicious input URL and the potential candidates to replace them, and then the attack executes a specific modification of those tokens that causes the classifier to misclassify the URL. To find the important tokens, we define scoring functions (discussed in Subsections 4.1 and 4.2) that evaluate which tokens affect the decision of the target model most, in both the malicious and benign cases. After detecting the tokens to be changed in the malicious URL and the best candidates to replace them from the benign URLs, the modification is applied to form an adversarial sample. We start by finding the score of each input URL segment towards determining the maliciousness of the input URL with respect to the classifier $f$. Next, the URL segments are ranked by score in decreasing order. Thus, we ensure that the task is accomplished with the least possible number of steps. Then, we iterate through the URL segments in decreasing order of importance and transform each token into a new token. Unless the segment is a domain name, it is possible to replace it with another segment; the domain name, however, must be handled at the character level to create a new domain name that is not registered in the DNS server.

4.1 Determining the Important Malicious URL Tokens to Change

The ultimate goal is to find a perturbation that leads to misclassifying a malicious input URL as a benign one. For this purpose, first, we define a scoring function $SCR_m(\text{token}_i): \mathbb{R}^n \to \mathbb{R}$ to determine the important tokens used by the classifier when measuring the maliciousness of an input (malicious) URL. Then we calculate the importance of the tokens in an input URL according to their contribution to the classification confidence, or the class probabilities produced by the classifier, and rank the tokens in decreasing order of importance. Formally, the score of a malicious URL's token is calculated as follows:

$$SCR_m(\text{token}_i) = \big[P_f(\text{ben} \mid u \setminus \text{token}_i) - P_f(\text{mal} \mid u \setminus \text{token}_i)\big] - \big[P_f(\text{ben} \mid u) - P_f(\text{mal} \mid u)\big]$$
$$= 2\,\big[P_f(\text{ben} \mid u \setminus \text{token}_i) - P_f(\text{ben} \mid u)\big] \quad (5)$$

where $u \setminus \text{token}_i$ denotes the URL $u$ with $\text{token}_i$ deleted.

To calculate the importance of a token, we measure the effect of this token on the probability of the URL belonging to the benign or malicious class. This is accomplished by subtracting the difference between the prediction probabilities of the benign class and the malicious class before and after deleting the token. In the second line of (5), we use the fact that the sum of the two output probabilities (the probability of an input URL being malicious or benign) is equal to one, since we use a binary classifier. Large score values mean that deleting this token would lead to a larger benignness probability, or a lower maliciousness probability. Hence, exchanging this token for another can lead to a higher benignness probability.

Figure 3: An example of calculating the score of the path in a given URL

Perturbing tokens that have high scores leads to lower maliciousness and higher benignness scores. The scores can be computed at the segment level for all URL segments except for the domain name, where the score is calculated at the character level. This is due to the attacker's ability to replace any segment with another segment inside the site; for the domain name, however, the attacker should create a new unregistered domain name. As shown in Fig. 3, the score of the path in the given URL is positive and has a relatively high value; therefore, replacing this path with one from a benign URL could increase the benignness probability of this URL. Once we estimate the importance of each token in the input sequence, the tokens can be ranked and the top $n$ tokens selected for perturbation to create an adversarial sequence, where $n$ is the number of allowed perturbations.
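A rough Python sketch of this scoring step is shown below; it is an illustration under assumptions, where `predict_benign_prob` stands for querying the classifier for $P_f(\text{ben} \mid u)$ and `rebuild_url` for whatever reassembly the attacker uses after tokenization:

```python
def token_scores_malicious(url_tokens, predict_benign_prob, rebuild_url):
    """Score each token of a malicious URL by how much deleting it raises the
    benign probability (the spirit of Eq. (5), up to the constant factor 2)."""
    base = predict_benign_prob(rebuild_url(url_tokens))
    scores = []
    for i in range(len(url_tokens)):
        reduced = url_tokens[:i] + url_tokens[i + 1:]   # URL with token_i deleted
        scores.append(predict_benign_prob(rebuild_url(reduced)) - base)
    # Return token indices ranked by decreasing importance.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```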

4.2 Determining the Important Benign URL Tokens

Here the main goal is to build a benign candidate pool for each token in the malicious URL. The candidate pool for each token consists of the scored benign tokens that can replace this token. The scoring function in this step finds the important tokens used by the classifier to determine the benignness of a URL. Formally, the score of a benign URL's token is calculated using the following formula:

$$SCR_b(\text{token}_i) = \big[P_f(\text{mal} \mid u \setminus \text{token}_i) - P_f(\text{ben} \mid u \setminus \text{token}_i)\big] - \big[P_f(\text{mal} \mid u) - P_f(\text{ben} \mid u)\big] = 2\,\big[P_f(\text{mal} \mid u \setminus \text{token}_i) - P_f(\text{mal} \mid u)\big] \quad (6)$$

The concept is the same as in the previous step, i.e., tokens that have high scores are chosen, and changing them leads to higher maliciousness and lower benignness. We start by calculating the scores of all tokens in all benign URLs in the dataset, then rank the tokens in each URL segment by score in descending order. Again, the calculation is at the segment level for all URL tokens except for the domain name, where the calculation is at the character level.

4.3 Token Transformer

To this end, we have scored and sorted the tokens of the malicious URL using the scoring function of Step 1 (Section 4.1) and have built the candidate pool by choosing the $n$ most important candidates for each token in Step 2 (Section 4.2). The next part of creating the adversarial sample is to transform or modify the tokens. The modification of a token can be done directly by replacing tokens from Step 1 with corresponding candidates from Step 2 recursively until the URL is predicted as benign by the model. This simple mechanism for perturbing malicious URLs is summarized in Algorithm 1.

[Algorithm 1: greedy replacement of the highest-scoring malicious URL tokens with their best benign candidates until the model predicts the URL as benign]
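A rough illustration of this greedy loop is sketched below; it reuses the hypothetical `token_scores_malicious` helper from Section 4.1 and assumes a `candidates` mapping from token index to its ranked benign candidates, so it is a sketch of the idea rather than the paper's Algorithm 1 verbatim:

```python
def greedy_segment_attack(tokens, candidates, benign_prob, classify_label,
                          rebuild_url, max_changes=5):
    """Replace the highest-scoring malicious tokens with their best benign
    candidates, one at a time, until the classifier predicts 'benign'."""
    tokens = list(tokens)
    order = token_scores_malicious(tokens, benign_prob, rebuild_url)
    changes = 0
    for i in order:
        if changes >= max_changes:
            break
        if not candidates.get(i):             # no benign candidate for this token
            continue
        tokens[i] = candidates[i][0]          # best-ranked benign candidate
        changes += 1
        url = rebuild_url(tokens)
        if classify_label(url) == "benign":
            return url                        # adversarial sample found
    return None                               # attack failed within the budget
```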

Since tokens are considered in the order of their contribution score $SCR(\text{token}_i)$, constructing adversarial samples is achievable with the least possible number of changes using this greedy method. For the domain name, the attack is character-based, because a new domain name must be created that is not registered in the DNS server. We replace a portion of the input URL with its counterpart in the candidate pool, up to a maximum length $m$. The attack starts with the most important character of the hostname, replaces it with its corresponding candidate, and continues recursively until an adversarial sample is reached. If the number of characters is too small to construct an attack, new characters are appended to the end of the hostname. An example of a character-based domain name attack is shown in Fig. 4.

Figure 4: An example of changing a malicious input URL to a benign URL by changing three characters using the proposed attack

For the other segments of the URL, both character-level and segment-level attacks are available as long as the crafted URL follows standard URL parsing. We chose not to change the protocol (or scheme identifier) if it is http or https; hence, the changes are executed on all other segments. The protocol affects the decision of the classifier only if the dataset implies this. For example, if all benign URLs in the dataset start with https and all malicious URLs start with http, then any URL starting with http will be classified as malicious with high probability. Below is the algorithm of the full proposed method for converting an input malicious URL $u$ into a URL classified as benign.

[Algorithm: the full proposed attack, combining segment-level replacements with character-level changes to the domain name]

5 Experiments and Results

In this section, we discuss the implementation details and the results obtained by testing different versions of our proposed attack against three kinds of DL-based URL classifiers. Moreover, we evaluate the performance of the proposed attack by testing the accuracy of the model while increasing the number of allowed perturbations. Finally, we examine adversarial training for designing models that are more robust against such attacks. We test our attack on three kinds of DL classifiers: (i) a character-based classifier, (ii) a word-based classifier, and (iii) a full character-based and word-based classifier. To train the DL binary classifier, we need a dataset containing labeled benign and malicious URLs. There are many open-source datasets for this purpose. We consider a dataset that was created to address the problem of malicious URL classification on the Internet [26]. The dataset contains malicious and benign URLs that can be used for analysis or for building classifiers. The dataset was acquired from various sources, such as PhishTank [27], and contains 450,176 unique URLs, of which 77% are benign and 23% are malicious.

As mentioned earlier, in the DL approach, feature engineering is not required before feeding the URL to the model. Nevertheless, some pre-processing of the raw URLs is still needed. The characters and words of the URL need to be expressed as integers, which requires building a dictionary containing all possible characters or words in the URL. The featureless DL approach treats malicious URL classification as a text classification problem and uses techniques from NLP to solve it. After preprocessing the raw URL, an embedding layer is used to move characters or words that occur in the same context closer to each other in an $n$-dimensional space. This layer is part of the model, so the embedding is learned along with the model during training.

For the character-based classifier [28], each character is treated as a word, and the position of a character within the $n$-dimensional vector space is learned from the URL. This position is based on the characters that surround this character as its context, as shown in Fig. 5 for example.

Figure 5: The character ‘?’ indicates the end of the path segment and the beginning of the query segment

The vocabulary size, which contains the allowed characters for embedding, is 100 here and covers all unique printable characters in Python. As a result, each character has its own embedding vector. Character-based classifiers decide the class of a URL depending on a specific set of characters appearing together. For word-based classification, on the other hand, all unique words in the training dataset are considered. Because there are no restrictions on the words in a URL, and especially in the domain name, the vocabulary size grows without bound as the training dataset gets bigger. To handle this, we replace words that appear fewer than two times with “unknown.” Word-based classifiers determine the class of a URL depending on a specific set of words appearing together.
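A minimal sketch of this vocabulary construction is given below; the padding index, maximum length, and tokenizer are assumptions, while the 100 printable characters and the 'fewer than two occurrences' threshold follow the description above:

```python
import string
from collections import Counter

# Character vocabulary: the 100 printable characters, index 0 reserved for padding.
char_vocab = {c: i + 1 for i, c in enumerate(string.printable)}

def build_word_vocab(training_urls, tokenize, min_count=2):
    """Keep words seen at least twice; everything else maps to 'unknown'."""
    counts = Counter(w for url in training_urls for w in tokenize(url))
    vocab = {"unknown": 1}
    for word, count in counts.items():
        if count >= min_count:
            vocab[word] = len(vocab) + 1
    return vocab

def encode(sequence, vocab, max_len):
    """Map characters or words to integer ids and pad/truncate to max_len."""
    ids = [vocab.get(tok, vocab.get("unknown", 0)) for tok in sequence[:max_len]]
    return ids + [0] * (max_len - len(ids))
```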

For the combined word-level and character-level classifier, we use the same embedding concept proposed in [10]. In this embedding, two matrices are used, one for words and the other for characters, and the final URL representation is the sum of these two matrices. The input URL $u$ to a classifier is converted into a 2D matrix $u \to x \in \mathbb{R}^{L \times K}$, where $L$ is the maximum length of the components (words or characters) of each URL and $K$ is the embedding dimension. There are several architectures for this task, for instance LSTMs and 1D convolutions. We use CNNs because they can discover, from groups of characters or words appearing together in the URL, information that indicates whether a URL is benign or malicious [29,30]. In 1D convolutions, the width of the sliding window is constant and, in this case, equals the embedding dimension $K$. The convolution is performed over $x \in \mathbb{R}^{L \times K}$. We use the same CNN classifier architecture proposed in [10], with four convolutional filter shapes $W \in \mathbb{R}^{K \times h}$, where $h = 3, 4, 5, 6$, and 456 filters for each filter size. Using these filters, the network can consider the relationship among $h$ components (characters or words) appearing together. In this study, we use cross-entropy as the loss function for training all classifiers.
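A PyTorch sketch of such a multi-filter-size text CNN is shown below; it is an illustration of the described character-level branch rather than the authors' code, with the embedding dimension chosen arbitrarily, while the filter sizes h = 3, 4, 5, 6 and the 456 filters per size follow the description above:

```python
import torch
import torch.nn as nn

class URLCharCNN(nn.Module):
    """Character-level text CNN in the spirit of the classifiers described above."""
    def __init__(self, vocab_size=101, embed_dim=32, num_classes=2,
                 filter_sizes=(3, 4, 5, 6), num_filters=456):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1D convolution per filter size h; the window always spans the
        # full embedding dimension K, as in the description above.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=h) for h in filter_sizes]
        )
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, x):                        # x: (batch, L) integer ids
        e = self.embed(x).transpose(1, 2)        # (batch, K, L) for Conv1d
        pooled = [torch.relu(conv(e)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # logits for (benign, malicious)
```

A word-level branch would follow the same pattern over word ids, with the two embeddings summed to form the joint representation.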

5.1 Performance Evaluation

To measure the performance of the proposed attacking methods, the accuracy of the model is observed on the generated adversarial samples. Effective attacks lead to lower accuracy, as they are able to successfully fool the classifier. We test three variants of the attack: (i) a segment-based attack on all URL segments except the domain name, (ii) a character-based attack on the domain name, and (iii) a full attack where both character-based and segment-based attacks are combined. To study the influence of the dataset size on the accuracy and robustness of the model, we conduct experiments on the full dataset and on a partial dataset of 100k training samples that preserves the original proportions of benign and malicious samples. We also measure the effectiveness of each of these attacks by determining the reduction in the correct classification rate of malicious URLs, i.e., first, adversarial samples are constructed, then the percentage of those that are still correctly classified as malicious URLs is found. The results of these experiments are presented in Tab. 1.

From Tab. 1, we can see that the best baseline accuracy is achieved by the model proposed in [10], which uses both characters and words to classify the sample. We also note that the proposed attack was able to reduce the accuracy of all models. Character-based classifiers have better accuracy and robustness than word-based classifiers. This is due to the large vocabulary size in these applications and the freedom that the programmer has to name the site and the paths inside it. We also observe that, when the number of characters allowed to change equals five, the full attack achieves a 56.3% reduction in the accuracy of the full character + word-level classifier, 60.1% for the character-based classifier, and 67.5% for the word-based classifier. It is worth mentioning that the attack works better for small datasets; nevertheless, it achieved a 56.3% reduction in the accuracy of the best model, making these models insufficient on their own for real protection systems, where attackers try every possible way to attack the system.

Table 1: Reduction in the accuracy of three classifiers against the proposed attack

For the full attack, we observe the effectiveness of the attack as the number of characters allowed to change in the domain name increases. As shown in Fig. 6, the accuracy of the model starts to drop significantly when the number of characters allowed to change is more than two.

Moreover, to measure the effectiveness of our scoring functions, the proposed attack is compared with the following four random-based attacks that depend on introducing random changes on the URL:

1) Attack 1: Remove tokens randomly from the malicious URL without replacing them with candidates from benign URLs. Four characters are allowed to be deleted from the domain name, and all other segments are allowed to be deleted completely.

2) Attack 2: Order the tokens of the malicious URL according to their cost and pick tokens randomly from benign URLs for replacement.

3) Attack 3: Order the tokens of the malicious URL randomly and replace them with randomly selected tokens from benign URLs.

4) Attack 4: Order the tokens of the malicious URL randomly and replace them with tokens from benign URLs ordered according to their cost.

The reduction in the rate of correctly classified malicious URLs for the three previous classifiers under these random-based attacks is shown in Tab. 2. The results show that random-based attacks influence all classifiers considerably less than our proposed attack; thus, the proposed ordering and replacement approach using scoring functions is important. Moreover, the ranking of the classifiers by robustness against the random attacks is the same as against our proposed attack. We also note that Attack 4 influences the classifiers more than the other attacks because it orders the benign candidates according to their cost. Attack 2 comes next because it orders the malicious tokens according to their cost.

Figure 6: Change in model accuracy according to the number of changed characters in the domain name

Table 2: Reduction in the accuracy of three classifiers against the random-based attacks

5.2 Increasing the Robustness with Adversarial Training

Next, we study the impact of adversarial training on mitigating the effect of small perturbations on input samples. To understand how adversarial training could increase the robustness of the classifier, the above-mentioned experiments are repeated to observe the difference in performance between the standard and adversarially trained models, using two training datasets of different sizes. However, we restrict ourselves to augmenting the full classifier (as it achieved the best results) in an effort to make it more robust. The goal of using adversarial training is to improve the generalization performance of the classifier by considering crafted samples outside the original training dataset. Effective generalization reduces the classifier's sensitivity to minor perturbations, which increases its resilience against adversarial attacks.

Adversarial training is applied by performing the attacks on all malicious URLs; the perturbed URLs are then added to the training set as malicious URLs (as shown in Fig. 7). Using the same approach suggested earlier, the number of characters allowed to change in the domain name ($m$) ranges from one to six. Thus, we fix $m$, find all adversarial samples, add the samples to the training dataset, train the model again, and finally run the attack on the final model and compare the accuracy before and after applying adversarial training. The results are shown in Tab. 3.
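A sketch of this augmentation loop is shown below; `attack` and `retrain` are hypothetical stand-ins for the proposed attack and the training routine, so this outlines the procedure rather than reproducing the authors' pipeline:

```python
def adversarially_augment(train_urls, train_labels, attack, retrain, m=3):
    """Generate adversarial variants of the malicious URLs with a fixed
    character budget m, keep their malicious label, and retrain the model."""
    adv_urls, adv_labels = [], []
    for url, label in zip(train_urls, train_labels):
        if label != 1:                       # only malicious URLs are perturbed
            continue
        adv = attack(url, max_domain_chars=m)
        if adv is not None:                  # attack succeeded within the budget
            adv_urls.append(adv)
            adv_labels.append(1)             # augmented sample stays malicious
    return retrain(train_urls + adv_urls, train_labels + adv_labels)
```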

Figure 7: Applying adversarial training by retraining the model on a dataset obtained by augmenting the original dataset with adversarial examples generated by the proposed attack

Table 3: Increase in the rate of correctly classified malicious URLs for three classifiers against the proposed attack after applying adversarial training

The columns in Tab. 3 represent the attack used to augment the training dataset for adversarial training, and the rows represent the attack tried after re-training. Adversarial training shows promising results in reducing the influence of the attack on the robustness of the model. The influence of the segment-based attack was reduced to less than 3% on average, and the influence of the character-based and full attacks was reduced to less than 7% on average. Furthermore, it was observed that a larger training dataset leads to a smaller gap between the accuracies of the standard and adversarially trained models.

6 Discussion

We aim to estimate the robustness of DL-based malicious URL classification systems under adversarial attacks. The results of this study show that these systems are not ready for deployment until the robustness of the model, and not just its accuracy, is considered. Observing the results, one of the challenges that can make these systems less secure than other NLP DL-based systems is the vast number of unique words that appear in URLs but do not exist in formal English. This is due to the fact that the domain name and other variables are set by the programmer without any rules about how to name them. Using n-grams could reduce the effect of this problem by sliding a window of n characters over the URL domain name to generate n-gram tokens. This could alleviate the problem but does not solve it completely when the attacker adds n-grams from a benign URL. Another problem with word-based classifiers is that the attacker can replace all segments (except for the domain name) of the malicious URL with segments from a benign URL, which can reduce the maliciousness score of the URL significantly. To address this issue, the domain name of any full URL classified as malicious should be added to a blacklist, so that changing the path and other segments of the URL would not result in classifying it as benign. Here are some recommendations to make these systems more robust against these kinds of attacks:

1) Include the pre-treated input in the loss function, which reduces the model's sensitivity to small changes.

2) Consider a large training dataset with a reasonable proportion of malicious URLs, which increases the accuracy and robustness of the model.

3) Create some predictive features, such as the length of the URL, the existence of executable extensions in the URL, the existence of redirection in the URL, etc. Although these features require domain knowledge and increase the complexity of the model, they make crafting adversarial samples harder. In addition, add the domain name of any URL flagged by the model to a blacklist, to prevent changes to the other segments from leading to misclassification.

7 Limitations

Our study has some potential limitations. For instance, we performed the experiments on an open-source, publicly available dataset; a larger dataset with more up-to-date malicious URLs could lead to more precise results. Moreover, the proposed attack was tested against classifiers that we designed ourselves. Testing the attack against real classifiers used by security companies and web browsers would be more realistic. However, security companies usually do not share the architecture of their malicious URL classifiers, which is why we could not test the attack against them. Besides, testing against these classifiers is considered illegal. Furthermore, their classifiers are usually a mixture of various approaches, such as machine learning methods, blacklist-based techniques, heuristic approaches, etc. For adequate security, attacks on each employed strategy and their defense mechanisms should be studied separately. Nevertheless, our proposed attack highlights the security issues surrounding the potential use of DL-based featureless methods for malicious URL detection in industrial solutions.

8 Conclusions

In this paper, we investigated the robustness of featureless malicious URL detection models and proposed a black-box attack against these models. The attack exploits the sensitivity of NLP-based classifiers to small, purposely crafted perturbations. It can work at the segment level, at the character level, or use both segment and character changes to fool the classifier, and all changes preserve typical URL parsing. We examined three CNN-based classifiers: character-based, word-based, and joint word-and-character. The results of our experiments show that the attack causes a 56% decrease in classification accuracy for the joint model, a 77% decrease for the word-level model, and a 60% decrease for the character-level model. Furthermore, we used adversarial training to increase the robustness of the model, converting this attack into a means of augmenting the training data. Lastly, we introduced some recommendations that should be considered when designing such systems. The results of this paper indicate that there are still loose ends to study further before applying such systems in real-life security applications, and this is related to the progress in defending against adversarial attacks on deep learning models.

Funding Statement: This research was supported by Korea Electric Power Corporation (Grant Number: R18XA02).

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.