On Multi-Granulation Rough Sets with Its Applications

2024-05-25 14:41 Radwan Abu-Gdairi, Mareay and Badr
Computers, Materials & Continua, 2024, Issue 4

Radwan Abu-Gdairi, R. Mareay and M. Badr

1 Department of Mathematics, Faculty of Science, Zarqa University, Zarqa, 13132, Jordan

2 Department of Mathematics, Faculty of Science, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt

3 Department of Mathematics, Faculty of Science, New Valley University, El Kharga, 72713, Egypt

ABSTRACT Recently, much interest has been given to multi-granulation rough sets (MGRS), and various types of MGRS models have been developed from different viewpoints. In this paper, we introduce two techniques for the classification of MGRS. Firstly, we generate multi-topologies from multi-relations defined on the universe. Hence, a novel approximation space is established by leveraging the underlying topological structure. The characteristics of the newly proposed approximation space are discussed. We introduce an algorithm for the reduction of multi-relations. Secondly, a new approach for the classification of MGRS based on neighborhood concepts is introduced. Finally, a real-life application from medical records is presented via our approach to the classification of MGRS.

KEYWORDS Multi-granulation; rough sets; data classifications; information systems; interior operators; closure operators; approximation structures

1 Introduction

Owing to the rapid growth of data and the widespread use of Internet broadcasting, extracting useful information for decision-making has become an urgent issue. To do this accurately, quickly, and at low cost, researchers need to work together in this field to unify their research framework. Many researchers have solved some of the problems of data sharing without a general conceptual framework governing their techniques. Some of them have used old mathematical techniques; some have used modern statistical methods; and others have developed hybrid methods between mathematics, statistics, and computer science.

In 1982, Pawlak defined the seminal theory of rough sets [1], a pioneering mathematical framework designed to address the challenges posed by uncertainty, incompleteness, and imprecision in knowledge representation. Pawlak’s conceptualization delineates an approximation structure, denoted as AS=(ʊ,R), where ʊ represents a universal set and R denotes an equivalence relation on ʊ. The equivalence classes that arise within ʊ are commonly referred to as the knowledge base. The lower approximation of a subset A of ʊ is the union of all equivalence classes wholly included in A, while the upper approximation is the union of all equivalence classes that have a non-empty intersection with A. Consequently, a rough set is expressed as a pair comprising the lower approximation and the upper approximation of the set A, each a precise set in its own right. Because equivalence relations are too restrictive for many real-world applications, Pawlak’s classical rough set theory needed to be generalized. This generalization unfolds on two fronts. First, the equivalence relation is replaced with alternative relations such as tolerance relations [2,3], similarity relations [4], characteristic relations [5,6], and arbitrary binary relations [7]. Second, the partition induced by the equivalence relation is replaced with a covering, which makes it easier to approximate any subset of the universe [8]. These versatile frameworks fall under the rubric of granular computing, representing mathematical models that offer solutions to a diverse array of challenges spanning data mining, machine learning, pattern recognition, and cognitive science. Nonetheless, certain challenges persist, necessitating further extensions.

In 2006, Qian et al. [9] defined the concept of multi-granular computing as a paradigm that uses rough sets not in isolation but as an ensemble of relations acting upon the same universal set. This approach supersedes the single relation employed in a single-granulation setting, as elucidated in references [9–11]. Topology is one of the most pivotal branches of mathematics and serves as an indispensable tool for representing intricate relationships between objects or features, particularly when dealing with complex relational structures. Pawlak emphasized the close connection between topology and rough set theory, and the topological space of rough sets is regarded as a cornerstone of this domain. This relationship has led researchers to investigate its features and real-world applications comprehensively (see [12–14]). In 2013, Qian et al. investigated a new theory of MGRS from the topological point of view by inducing n topological structures on the universe set ʊ from n equivalence relations on ʊ. They studied the multi-granulation topological rough structure and its topological properties (see [15]). Other studies focus on enhancing the accuracy measure of rough sets by containment neighbourhoods, specifically in the context of a medical application, and compare two different types of neighbourhood-based rough approximations for new applications (see [16–20]). Topology also has many applications to real-life problems [21–25].

Multi-source information fusion based on rough set theory integrates information from multiple sources while handling inexact and uncertain data. Rough set theory has been applied in various domains, such as parallel computing, neural network modeling, and information entropy. Key aspects discussed in the literature include the combination of several rough set models and the use of rough set theory to measure uncertainty in an information system. Zhang et al. [26,27] presented an application of rough set theory to multi-source information fusion that integrates heterogeneous data from multiple sources and treats rough set theory as an efficient tool for dealing with uncertainty. Their work lays the foundation for integrating rough sets into decision support systems, with an emphasis on data fusion techniques, and reviews decision support applications. It also examines the synergies between rough set theory and intelligent systems, with applications in knowledge representation and decision-making, and it systematically categorises and analyses existing approaches, shedding light on the strengths and challenges of employing rough sets for integrating information from multiple sources. Finally, with a focus on practical applications, it discusses real-world examples that showcase the utility of rough sets in handling uncertainties arising from different information sources.

Rough set theory has found practical application in the domain of decision support systems, specifically within the context of data fusion. This theoretical framework proves to be a proficient tool for managing information characterized by imprecision and uncertainty. Numerous studies have examined the use of rough set theory within decision support systems, with a particular emphasis on information fusion scenarios. Notably, Han et al. [28] created an evaluation method based on rough set theory for determining the effect of missing data in decision fusion. Furthermore, the literature extensively explores the application of rough set theory to knowledge acquisition in incomplete information systems, showcasing its relevance in constructing decision support models [29–31]. Rough set theory is also deemed instrumental in amalgamating disparate data from diverse sources while offering a means to quantify uncertainty in the information fusion process [32].

The work of Huang et al. [33] is a pivotal contribution in the domain of rough set theory, specifically exploring the integration of multi-granulation and fuzzy sets for applications in feature selection. First, it enriches the theoretical foundations of rough set theory by integrating concepts from multi-granulation and fuzzy sets. The novel framework opens avenues for more expressive and adaptable modelling of uncertainty in real-world datasets. Second, the application of these concepts to feature selection showcases the practical utility of the proposed methodology. This not only enhances the understanding of data representation but also provides a valuable tool for data scientists and practitioners.

Chen et al. [34,35] introduced a novel variable precision multi-granulation rough set model that extends traditional rough set theory by accommodating variable precision granules. The authors delve into the mathematical foundations of this model, elucidating the principles governing the variable precision within granules. The study also explores the practical implications of this model, particularly in the realm of attribute reduction, demonstrating its effectiveness in handling uncertainty and imprecision in real-world datasets.

The multi-granulation decision-theoretic rough set (MG-DTRS) helps with cost-sensitive decision-making in multi-view and multi-level situations. One shortcoming of the MG-DTRS model is the use of subjectively assigned probability parameters (α and β) to compute the three regions. The adaptive MG-DTRS (AMG-DTRS) was introduced in [36] to overcome this issue. The AMG-DTRS model uses a compensation coefficient ζ to provide adaptability in acquiring probabilistic thresholds. That research also examines three mean AMG-DTRS models, providing a new perspective on multi-granulation information fusion. Its analysis compares the AMG-DTRS model with existing MGRS models, highlighting its advantages and generalizations, and shows that the framework can explicitly derive several existing models, namely the MG-DTRS, MGRS, and VP-MGRS models. These findings strengthen the information fusion framework of granular computing [36].

This work presents a novel approach that combines topology and rough set theory to address the challenge of exchanging multi-source, variable, and large-scale data more efficiently. Additionally, we develop algorithms derived from the extraction of knowledge from such data. The structure of this work is as follows: Section 2 provides an exposition of the essential concepts and features of general topology, along with an introduction to certain notions pertaining to information systems. In Section 3, we present two methodologies for generalised multi-granulation. The first strategy establishes a novel approximation space with the objective of reducing the boundary region, while the second employs the notion of minimal neighbourhoods. In Section 4, we apply our findings to the problem of attribute reduction in medical information systems. The conclusion and potential avenues for future research are outlined in Section 5.

2 Preliminaries

We provide the basic definitions and results on topological structures and rough sets. In classical rough set theory, the approximation structure is defined as (ʊ,R), where ʊ is a non-empty finite set and R is an equivalence relation on ʊ.

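Definition 2.1. [1]. Let (ʊ,R) be a classical approximation structure. The lower and upper approximations of a given set Y ⊆ ʊ are defined as follows:

$$\underline{R}(Y)=\bigcup\{[x]_R : [x]_R\subseteq Y\},\qquad \overline{R}(Y)=\bigcup\{[x]_R : [x]_R\cap Y\neq\emptyset\},$$

where $[x]_R$ denotes the equivalence class of x under R.

Lemma 2.2. The boundary region of Y is given by $\overline{R}(Y)-\underline{R}(Y)$, where $\underline{R}(Y)$ and $\overline{R}(Y)$ denote the lower and upper approximations of Y. The set $\underline{R}(Y)$ is called the positive region, while ʊ − $\overline{R}(Y)$ is called the negative region.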
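As an illustration of Definition 2.1 and Lemma 2.2, the following minimal Python sketch computes the lower and upper approximations and the three regions of a set with respect to a partition. The universe, partition, and target set are hypothetical toy data, not taken from the paper, and the function name `approximations` is introduced only for this example.

```python
def approximations(partition, target):
    """Return (lower, upper, boundary) of `target` w.r.t. a partition.

    `partition` is an iterable of equivalence classes (sets) covering the universe.
    """
    target = set(target)
    lower, upper = set(), set()
    for block in partition:
        block = set(block)
        if block <= target:   # class wholly included in the target
            lower |= block
        if block & target:    # class has non-empty intersection with the target
            upper |= block
    return lower, upper, upper - lower


# Hypothetical toy data (not from the paper)
universe = {1, 2, 3, 4, 5, 6}
partition = [{1, 2}, {3, 4}, {5, 6}]
Y = {1, 2, 3}

lower, upper, boundary = approximations(partition, Y)
positive = lower              # positive region
negative = universe - upper   # negative region
print(lower, upper, boundary, negative)
```

The set Y is rough here because its boundary region is non-empty; it would be exact if the lower and upper approximations coincided.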

Definition 2.3. [37]. Let (ʊ,τ) be a topological structure. Then the τ-closure of a subset A1 ⊆ ʊ is defined as follows: cl_τ(A1) = ∩{F ⊆ ʊ : A1 ⊆ F and F ∈ τ^c}, where τ^c denotes the family of closed sets.

Definition 2.4. [37]. Let (ʊ,τ) be a topological structure. Then the τ-interior of a subset A1 ⊆ ʊ is defined as follows: int_τ(A1) = ∪{G ⊆ ʊ : G ⊆ A1 and G ∈ τ}.

Pawlak pointed out in [1] that lower approximations correspond to interiors and upper approximations correspond to closures. This idea has prompted researchers to study the theory of rough sets from a topological point of view.

Definition 2.5. [37]. If ʊ is a finite universe and R is a binary relation on ʊ, then we define the right neighborhood of x ∈ ʊ as follows: RN(x) = {y : xRy}.

Definition 2.6. [38]. Let ʊ be a non-empty set. A basis for a topology on ʊ is a collection β of subsets of ʊ such that:

1. For every x ∈ ʊ, there is at least one basis element B containing x.

2. If x belongs to the intersection of two basis elements B1 and B2, then there is a basis element B3 containing x such that B3 ⊆ B1 ∩ B2.

Definition 2.7. [37]. Let τ be a topology on a finite set ʊ with base β. Then the rough membership function of X ⊆ ʊ is $\mu_X^{\tau}(x)=\dfrac{|(\cap B_x)\cap X|}{|\cap B_x|}$, x ∈ ʊ, where B_x is any member of β containing x.

Theorem 2.8. [38]. Let (ʊ,τ) be a topological space and A ⊆ ʊ. Then x ∈ cl_τ(A) if and only if G ∩ A ≠ ∅ for every G ∈ τ with x ∈ G.

The idea of multi-granulation is to use multiple relations instead of a single relation to obtain a better approximation. Thus, we start by giving the definition of multi-granular rough sets based on equivalence relations.

Definition 2.9. [15]. Let (ʊ,τ1), (ʊ,τ2), ..., (ʊ,τn) be n topological structures induced by equivalence relations R1, R2, ..., Rn, respectively, and X ⊆ ʊ. Then, we define the mint and mcl operators of X with respect to Γ, where Γ = {τ1, τ2, ..., τn}, respectively, as follows:

3 MGRS Based on Topological Structure

In this section, we introduce a new theory of MGRS from the point of view of topological structures. We generate topological structures from arbitrary relations suitable for real-life problems in other branches such as artificial intelligence, knowledge discovery, machine learning, and data mining. We also propose that it may be considered an extension or generalization of the Pawlak rough set framework, and we introduce a new algorithmic method for the reduction of attributes in the information (decision) system.

Theorem 3.1. Let R be a binary relation on the nonempty set ʊ, and for every p ∈ ʊ let RN(p) ⊆ ʊ be its right neighborhood. Then the family τ = {X ⊆ ʊ : ∀p ∈ X, RN(p) ⊆ X} forms a topology on ʊ.

Proof.

Obviously, ∅ and ʊ belong to τ;

Suppose that {Xi : i ∈ N} is a family of sets in τ and p ∈ ∪i∈N Xi. Then there exists i0 ∈ N such that p ∈ Xi0. Hence, RN(p) ⊆ Xi0, which leads to RN(p) ⊆ ∪i∈N Xi. Thus ∪i∈N Xi ∈ τ;

Let X1, X2 ∈ τ and p ∈ X1 ∩ X2. Then p ∈ X1 and p ∈ X2, which leads to RN(p) ⊆ X1 and RN(p) ⊆ X2. Therefore, RN(p) ⊆ X1 ∩ X2 and X1 ∩ X2 ∈ τ.
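The statement of Theorem 3.1 can be checked by brute force on a small universe. The sketch below, offered only as an illustration, enumerates every subset X with RN(p) ⊆ X for all p ∈ X and verifies the topology axioms on the resulting family. The relation used is a hypothetical toy example, and the helper names (`right_neighborhood`, `closed_family`, `is_topology`) are assumptions of this sketch rather than anything defined in the paper.

```python
from itertools import combinations


def right_neighborhood(relation, x):
    """RN(x) = {y : x R y} for a relation given as a set of pairs."""
    return frozenset(y for (p, y) in relation if p == x)


def closed_family(universe, relation):
    """All X ⊆ universe such that RN(p) ⊆ X for every p in X (family of Theorem 3.1)."""
    elems = sorted(universe)
    family = set()
    for r in range(len(elems) + 1):
        for combo in combinations(elems, r):
            X = frozenset(combo)
            if all(right_neighborhood(relation, p) <= X for p in X):
                family.add(X)
    return family


def is_topology(universe, family):
    """Check the topology axioms; on a finite family pairwise checks suffice."""
    U = frozenset(universe)
    if frozenset() not in family or U not in family:
        return False
    return all((A | B) in family and (A & B) in family
               for A in family for B in family)


# Hypothetical toy relation (not from the paper)
universe = {1, 2, 3}
relation = {(1, 2), (2, 2), (3, 1)}

tau = closed_family(universe, relation)
print(sorted(sorted(X) for X in tau))
print(is_topology(universe, tau))
```

The enumeration is exponential in the size of the universe, so it is only meant for checking small examples by hand.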

The following example illustrates Theorem 3.1.

Example 3.2. Let ʊ = {a,b,c,d} be a non-empty set and R = {(a,a),(a,b),(b,b),(b,a),(c,c),(c,d),(d,b)} be an arbitrary relation. Then, RN(a) = {a,b}, RN(b) = {a,b}, RN(c) = {c,d}, RN(d) = {b}. Hence, τ = {ʊ,∅,{a,b},{c,d},{b},{b,c,d}} forms a topology on ʊ.
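The family τ listed in Example 3.2 coincides with the topology obtained by taking the right neighborhoods as a subbase (finite intersections first, then unions). The following sketch assumes that reading and reproduces the example; the function names are hypothetical and introduced only for illustration.

```python
def right_neighborhoods(universe, relation):
    """Map each x to RN(x) = {y : x R y}."""
    return {x: frozenset(y for (p, y) in relation if p == x) for x in universe}


def topology_from_subbase(universe, subbase):
    """Topology generated by `subbase` on a finite universe.

    Assumption of this sketch: the right neighborhoods act as a subbase.
    """
    U = frozenset(universe)
    # Base: all finite intersections of subbase members (empty intersection = U).
    base = {U}
    for S in subbase:
        base |= {B & S for B in base}
    # Topology: all unions of base members (empty union = empty set).
    topology = {frozenset()}
    for B in base:
        topology |= {T | B for T in topology}
    return topology


universe = {"a", "b", "c", "d"}
relation = {("a", "a"), ("a", "b"), ("b", "b"), ("b", "a"),
            ("c", "c"), ("c", "d"), ("d", "b")}

rn = right_neighborhoods(universe, relation)
tau = topology_from_subbase(universe, set(rn.values()))
print(sorted(sorted(X) for X in tau))
```

Running the same two functions on the relations R1, R2, and R3 of Example 3.7 can be used to cross-check the topologies listed there.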

Remark 3.3. From Theorem 3.1, we can generate many topological structures from any finite number of arbitrary relations. So, we are ready for the following definition of the approximation structure.

Definition 3.4. Let (ʊ,R) be a knowledge base, where R is a family of binary relations R1, R2, R3, ..., Rn on the universe ʊ and τ1, τ2, τ3, ..., τn are the topologies induced on ʊ by these binary relations. Then, the multi-lower approximation and multi-upper approximation of the set X ⊆ ʊ are defined, respectively, as

The pair (MLA(X), MUA(X)) is called an MGRS of X.

When MLA(X) = X, resp. MUA(X) = X, we say that X is a lower definable, resp. an upper definable, set in the MGRS model. If the set X is both lower definable and upper definable, we say that X is definable.

Lemma 3.5. Suppose that (ʊ,R) is a knowledge base and τ1, τ2, τ3, ..., τn are the induced topologies on ʊ. Then for X ⊆ ʊ and i ∈ N, we have:

Proof.

Proposition 3.6. Assume that (ʊ,R) is a knowledge base, where R is a family of binary relations R1, R2, R3, ..., Rn on the universe ʊ and τ1, τ2, τ3, ..., τn are the topologies induced on ʊ by these binary relations. For X1, X2 ⊆ ʊ, the following properties hold:

Proof. We prove parts 7, 8, 9, 10, 11 and 12. The proof of the other parts is clear from Definition 3.4.

Example 3.7. Suppose that (ʊ,R) is a knowledge base, where R is a family of binary relations R1, R2, R3 on the universe ʊ = {a,b,c,d}, with R1 = {(a,a),(a,b),(b,c),(b,d),(c,a),(d,a)}, R2 = {(b,b),(c,d),(c,a),(d,b),(d,c)} and R3 = {(a,a),(b,c),(b,d),(c,c)}. Then, by Theorem 3.1, the induced topologies are τ1 = {{a,b},{c,d},{a},{a,c,d},ʊ,∅}, τ2 = {{a,d},{b,c},{b},{a,b,d},ʊ,∅} and τ3 = {{a},{c,d},{c},{a,c,d},{a,c},ʊ,∅}. Let X1 = {a,b,c} and X2 = {b,c,d}. Then, the approximations of the two sets are presented in Tables 1 and 2.

Table 1: Comparison among different approaches for approximation of X1

Table 2: Comparison among different approaches for approximation of X2
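The single-topology interiors and closures of X1 and X2 in Example 3.7 can be computed directly from Definitions 2.3 and 2.4. The sketch below does only that, using the three topologies exactly as listed in the example; it does not reproduce the multi-lower and multi-upper approximations of Definition 3.4 or the entries of Tables 1 and 2, whose formulas and values are not reproduced in this text. The function names are hypothetical.

```python
def interior(topology, X):
    """Union of all open sets contained in X (Definition 2.4)."""
    X = frozenset(X)
    result = frozenset()
    for G in topology:
        if G <= X:
            result |= G
    return result


def closure(universe, topology, X):
    """Intersection of all closed sets containing X (Definition 2.3)."""
    U = frozenset(universe)
    X = frozenset(X)
    result = U
    for G in topology:
        F = U - G          # closed set = complement of an open set
        if X <= F:
            result &= F
    return result


def fs(*items):
    """Small helper to build frozensets."""
    return frozenset(items)


U = fs("a", "b", "c", "d")
tau1 = {fs("a", "b"), fs("c", "d"), fs("a"), fs("a", "c", "d"), U, fs()}
tau2 = {fs("a", "d"), fs("b", "c"), fs("b"), fs("a", "b", "d"), U, fs()}
tau3 = {fs("a"), fs("c", "d"), fs("c"), fs("a", "c", "d"), fs("a", "c"), U, fs()}

X1 = fs("a", "b", "c")
X2 = fs("b", "c", "d")

for name, tau in (("tau1", tau1), ("tau2", tau2), ("tau3", tau3)):
    for label, X in (("X1", X1), ("X2", X2)):
        print(name, label, sorted(interior(tau, X)), sorted(closure(U, tau, X)))
```

Each single topology leaves a non-empty boundary for X1, which is the situation the multi-granulation approximations are meant to improve on.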

Remark 3.8. From Example 3.7, we note that the accuracy measure of our approximation structure is higher than that of the other approaches, as our approach is a generalization of the others.

3.1 Multi-Granulation of Rough Set Based on Neighborhood Concept

Definition 3.9. Let R1, R2, R3, ..., Rn be binary relations on the nonempty set ʊ. We define the minimal neighborhood MN : ʊ → P(ʊ) as MN(x) = ∩ RNi(x), ∀x ∈ ʊ, where RNi(x) is the right neighborhood of x ∈ ʊ with respect to Ri.

Theorem 3.10. If R1, R2 are two binary relations on the nonempty set ʊ, then the family {MN(x) : x ∈ ʊ} forms a topological base for ʊ.

Proof. Since x1 ∈ MN(x1) for every x1 ∈ ʊ, the family covers ʊ. Now suppose that β1, β2 belong to the family and e ∈ β1 ∩ β2. Then β1 = MN(x1) and β2 = MN(x2) for some x1, x2 ∈ ʊ. Hence, e ∈ β1 ∩ β2 if and only if e ∈ β1 = MN(x1) and e ∈ β2 = MN(x2). Obviously, e ∈ MN(e) ⊆ MN(x1) and e ∈ MN(e) ⊆ MN(x2). So, there exists β3 = MN(e) in the family such that e ∈ β3 ⊆ β1 ∩ β2.

Corollary 3.11. Suppose that R1, R2, R3, ..., Rn is a family of binary relations on the nonempty set ʊ. Then, the collection of neighborhoods {MN(x) : x ∈ ʊ} forms a topological base for ʊ, and the structure (ʊ,τMN) is called the generalized multi-granular rough based topological structure GMRTS.

Theorem 3.12. Suppose that R1, R2, R3, ..., Rn is a family of binary relations on the nonempty set ʊ. If any relation Ri0 is an identity, then the generalized multi-granular rough based topological structure GMRTS (ʊ,τMN) is the same as the topology generated by Ri0.

Proof. Let Ri0, i0 ∈ {1,2,3,...,n}, be an identity relation. Then RNi0(x) = {x}, ∀x ∈ ʊ. Thus, MN(x) = RNi0(x), ∀x ∈ ʊ, and the base {MN(x) : x ∈ ʊ} is equal to the base generated by Ri0. So, the topological structure GMRTS (ʊ,τMN) is the same as the topology generated by Ri0.

Example 3.13. Let ʊ = {b1,b2,b3,b4,b5,b6} and R1, R2, R3 be a family of relations on ʊ defined as: R1 = {(b1,b1),(b1,b2),(b3,b3),(b3,b5),(b5,b5),(b6,b3),(b6,b6)}, R2 = {(b1,b1),(b1,b2),(b2,b1),(b2,b2),(b3,b3),(b3,b4),(b4,b4)}, R3 = {(b2,b1),(b2,b2),(b3,b3),(b3,b4),(b4,b4),(b4,b5),(b4,b6),(b5,b5),(b5,b6),(b6,b4),(b6,b5),(b6,b6)}. Then MN(b1) = {b1,b2}, MN(b2) = {b1,b2}, MN(b3) = {b3}, MN(b4) = {b4}, MN(b5) = {b5}, MN(b6) = {b6}. Hence, {{b1,b2},{b3},{b4},{b5},{b6}} is the base of the generalized multi-granular rough structure GMRTS. Table 3 compares different approximations of the set B = {b1,b4,b6}.

Table 3: Comparison among different approaches for approximations on B
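The minimal neighborhoods of Example 3.13 can be recomputed with a short script. The listed values are reproduced when empty right neighborhoods are skipped in the intersection (for instance, b1 has no R3-successor); that convention is an assumption of this sketch, since Definition 3.9 does not state it explicitly. The sketch also checks the two base conditions of Definition 2.6. Function names are hypothetical.

```python
def right_neighborhood(relation, x):
    """RN(x) = {y : x R y}."""
    return frozenset(y for (p, y) in relation if p == x)


def minimal_neighborhood(relations, x):
    """Intersection of the right neighborhoods of x over all relations.

    Assumption of this sketch: empty right neighborhoods are skipped.
    """
    nonempty = [n for n in (right_neighborhood(R, x) for R in relations) if n]
    if not nonempty:
        return frozenset()
    return frozenset.intersection(*nonempty)


def is_base(universe, family):
    """Check the two conditions of Definition 2.6 on a finite family."""
    covers = all(any(x in B for B in family) for x in universe)
    refines = all(
        any(x in B3 and B3 <= (B1 & B2) for B3 in family)
        for B1 in family for B2 in family for x in (B1 & B2)
    )
    return covers and refines


universe = ["b1", "b2", "b3", "b4", "b5", "b6"]
R1 = {("b1", "b1"), ("b1", "b2"), ("b3", "b3"), ("b3", "b5"),
      ("b5", "b5"), ("b6", "b3"), ("b6", "b6")}
R2 = {("b1", "b1"), ("b1", "b2"), ("b2", "b1"), ("b2", "b2"),
      ("b3", "b3"), ("b3", "b4"), ("b4", "b4")}
R3 = {("b2", "b1"), ("b2", "b2"), ("b3", "b3"), ("b3", "b4"),
      ("b4", "b4"), ("b4", "b5"), ("b4", "b6"), ("b5", "b5"),
      ("b5", "b6"), ("b6", "b4"), ("b6", "b5"), ("b6", "b6")}

mn = {x: minimal_neighborhood([R1, R2, R3], x) for x in universe}
base = set(mn.values())
for x in universe:
    print(x, sorted(mn[x]))
print(is_base(universe, base))
```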

The reduction of data is very important, since it expresses the whole data by a part of it while conserving the structure of the whole data. We therefore introduce an algorithm for relation reduction that removes the superfluous relations and expresses the whole data of the universe by a smaller number of relations. In this algorithm, we remove the redundant bases that are generated from the superfluous relations. This reduction may be helpful in the process of decision-making.

This algorithm is shown by the following example:

Example 3.14. Let ʊ = {b1,b2,b3,b4,b5,b6} and X = {B1,B2,B3} be a family of bases induced by arbitrary relations R1, R2, R3, respectively, where B1 = {{b1,b2,b3},{b4,b5,b6}}, B2 = {{b1,b2,b3,b4,b6},{b5}} and B3 = {{b1,b4},{b2,b5},{b3,b6}}. Here the intersection of bases consists of the non-empty intersections of their members. Then ∩i Bi = {{b1},{b2},{b3},{b4},{b5},{b6}}. If we remove B1 from X, then B2 ∩ B3 = {{b1,b4},{b2},{b3,b6},{b5}} ≠ ∩i Bi. Hence, R1 belongs to the reduct. Similarly, we find that R3 belongs to the reduct, but R2 is a redundant relation and can be omitted. So, the reduct is {R1,R3}.
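The reduction step worked through in Example 3.14 can be sketched as follows. The paper's own algorithm listing is not reproduced in this text, so this is only one plausible reading of the example: a relation is dropped when removing its base leaves the blockwise intersection of the remaining bases unchanged. The sketch assumes each base is a partition of the universe, and the function names `common_refinement` and `reduct` are hypothetical.

```python
def common_refinement(partitions):
    """Non-empty intersections obtained by picking one block from each partition."""
    blocks = None
    for P in partitions:
        P = [frozenset(b) for b in P]
        if blocks is None:
            blocks = set(P)
        else:
            blocks = {a & b for a in blocks for b in P if a & b}
    return blocks


def reduct(bases):
    """Greedy relation reduction.

    `bases` maps a relation name to its base (assumed to be a partition).
    A relation is removed if the refinement of the remaining bases is unchanged.
    """
    full = common_refinement(bases.values())
    kept = dict(bases)
    for name in list(bases):
        trial = {k: v for k, v in kept.items() if k != name}
        if trial and common_refinement(trial.values()) == full:
            kept = trial   # `name` is redundant
    return set(kept)


bases = {
    "R1": [{"b1", "b2", "b3"}, {"b4", "b5", "b6"}],
    "R2": [{"b1", "b2", "b3", "b4", "b6"}, {"b5"}],
    "R3": [{"b1", "b4"}, {"b2", "b5"}, {"b3", "b6"}],
}

print(sorted(sorted(b) for b in common_refinement(bases.values())))
print(sorted(reduct(bases)))
```

A greedy pass like this returns one reduct; when several reducts exist, the result can depend on the order in which relations are tried.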

4 Real Life Applications

The data used in this study are taken from [14].

4.1 Clinical Data Description

The number of patients with digestive diseases has grown because of the heavy consumption of fast food, which is high in calories, and of processed meat. As a direct result of this diet, many people suffer from excessive infusion and the subsequent diseases of the digestive system, the most serious of which are stomach and colon cancers. After removal of the stomach, food goes directly to the intestine, which disturbs absorption. Patients experience severe symptoms after meals, such as dizziness, headache, colic, and increased blood sugar. After a period, the patient may develop the most dangerous complications, such as high cholesterol and clogged arteries leading to heart attacks.

Hereditary nonpolyposis colorectal cancer (HNPCC) is the most common type of inherited stomach and colon cancer syndrome. HNPCC, also known as Lynch syndrome, raises the risk of stomach and colon cancer, as well as other malignancies. People with HNPCC are more likely to develop stomach and colon cancer before the age of 50. FAP (familial adenomatous polyposis) is an uncommon condition that causes hundreds of polyps to grow in the inner lining of the stomach, colon, and rectum. People with untreated FAP have a significantly higher chance of acquiring stomach and colon cancer before the age of 40.

4.2 Analysis of the Problem

Our aim in this study is to derive recommendations for patients: an appropriate approach that combines treatment and exercise, together with an explanation of the positive and negative impact of each test result on the patient. Based on the medical reports, the physician decides either that the medical tests should continue or that no further analysis is needed because the patient's condition is stable.

4.3 Problem Formulation

According to the medical reports requested by the doctor for patients in this case, we consider the following attributes:

1) Liver Functions: S.GPT (normal range 0 to 45 U/L) and S.GOT (normal range 0 to 37 U/L).

2) Kidney Functions: The level of uric acid in the blood (normal range 3 to 7 mg/dl).

3) Fat Percentage: Fats in the blood are divided into two types. First, the cholesterol level has a normal range of less than 200 mg/dl; the borderline range is 200 to 240 mg/dl, and the critical range, which causes arteriosclerosis or heart disease, is higher than 240 mg/dl. Second, triglycerides have a reference range of up to 150 mg/dl.

4) Heart Efficiency: The enzyme serum LDH, with a reference range of 0 to 480 U/L.

5) Signs of Tumors: The digestive system was tested through the CEA scale, which is normal for nonsmokers if it is less than 5 mg/ml. The other measure is CA 19.9, with a reference range of 0 to 39 U/ml.

6) Viruses Hepatitis: The patient's immunity against viruses of type B (HBC) and of type C (highly infectious) is tested; the result is either positive or negative.

7) Blood Sugar: The patient's blood sugar is measured after fasting for 6 hours, one hour after eating, and two hours after eating.

The results of the seven patients were collected from the physician's official files six months after surgery (see Table 4).

Table 4: Medical decision information system

We define a suitable relation for every attribute and apply our approach to this data as follows:

By calculating the neighborhood of every element, we get the family {{p1},{p2},{p3},{p4},{p5},{p6},{p7}} as a topological base for the approximation structure (ʊ,τMN). Take the set of patients with condition C (need more check-ups), PC = {p1,p2,p5,p6}. The lower approximation int_MN(PC) and the upper approximation cl_MN(PC) are both equal to PC = {p1,p2,p5,p6}, so the accuracy measure is 100%. When we apply the reduction algorithm, we get reduct(R) = {R7,R11,R12}, and the accuracy measure of the approximation remains 100%. After reduction, the information table is reduced to Table 5 and has the same structure as the original data in Table 4, where {R7,R11,R12} represent the attributes {FP1,ST2,BS}, respectively. From this reduct we obtain the decision rules to be used by a decision program in future tests.

Table 5: Reduct information system

4.4 Results Analysis

With this method of partitioning patient data, the 12 medical examinations are reduced to only three tests, which are sufficient to make the right decision for these patients. There are other alternatives for decision-making. Using the topological method of data analysis and partitioning, we were able to find more than one reduct of the medical examinations, so every patient can choose the appropriate alternative in terms of financial capacity and likelihood.

5 Conclusions and Future Works

The number of research papers available online on topological applications is growing, and this growth has generated a need for a unifying theory to compare the results. We also need new techniques and tools that can intelligently and automatically extract implicit information from these data. These tools and techniques are the subject of future research using general topological concepts. It may be inferred that incorporating topology in the construction of knowledge base concepts facilitates the generation of comprehensive outcomes, which encompass several logical statements that unveil concealed linkages within data. Furthermore, this integration potentially contributes to the formulation of precise rules.

In future papers, we hope to study more generalizations using topological concepts such as near sets and to apply these generalized concepts to realistic medical data of large size. The topic of multivariate data reduction can also be studied using generalized topological concepts, developing a unifying theory of topological generalizations that uses rough concepts. We also plan to scale up by designing topological software to handle high-dimensional classification problems.

Acknowledgement: None.

Funding Statement: This research is funded by Zarqa University, Jordan.

Author Contributions: Study conception and design: Radwan Abu-Gdairi (RG), R. Mareay (RM), M. Badr (MB); data collection: RG, RM and MB; analysis and interpretation of results: RG, RM and MB; draft manuscript preparation: RG, RM and MB. All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials: Data are contained within the article.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.