Ran Xu, Yuehong Zhao*, Qingzhen Han, Xinyu Liu, Hongbin Cao, Hao Wen
1 Division of Environmental Engineering and Technology, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
2 State Key Laboratory of Multiphase Complex Systems, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
Keywords: Existing commercial compounds database; De-phenol extractant; Database-based generation strategy; Environmental risk
ABSTRACT A database-based strategy of candidate generation is proposed for the molecular design of new de-phenol extractants, following the idea of finding new applications for existing commercial compounds. The strategy has the advantage that the environmental, safety and health risks of the candidate compounds are known and controllable. In this work, the Existing Commercial Compounds (ECC) database and a special combined search strategy were developed as the basis for the proposed CAMD method, and molecules for phenol extraction in coking wastewater treatment were selected from the ECC database. The candidate solvents cover the following categories: ketones, esters, ethers, alcohols, anhydrides and benzene compounds, which are consistent with the de-phenol extractants commonly used in industry or experiment. The compounds with higher partition coefficient and selectivity than the widely used methyl isobutyl ketone (MIBK) are mainly ketones. The 26 molecules obtained show higher partition coefficient and selectivity than MIBK and are suggested for further experimental investigation. Furthermore, analysis of these potential molecules may identify effective functional groups to serve as the initial group set for generating new molecular structures of de-phenol extractants. The results show that the proposed method enables efficient generation of chemicals with the benefits of less time, lower economic cost, and known environmental impact.
Phenols are a series of organic pollutants with strong biological toxicity that occur in the wastewater from the petroleum, petrochemical, coal conversion, phenol-producing, and phenolic resin industries [1]. Extraction is an effective method of gathering and recovering phenols and other organic pollutants from wastewater, where benzene, octanol, isopropoxy propane, methyl isobutyl ketone (MIBK), pentyl acetate, ethyl ethanoate, isopropyl ethanoate, etc. are commonly used extractants [2]. Improving extraction efficiency while lowering the loss of extractant therefore depends on the design and screening of new extractants.
Computer-aided molecular design (CAMD) [3] has become one of the most promising methodologies for the design and selection of chemicals over the last two decades, alongside traditional experimental synthesis and screening. CAMD is a method to determine the structures of feasible chemical compounds that meet the target properties. It combines property prediction methods with computer-assisted search in the design of various chemical products. Great progress has been made on property prediction models and on CAMD methodology, namely the generate-and-test [4,5] and mathematical programming [6–9] approaches, with applications to chemical development in a variety of fields, such as extractants [10], absorbents [11], crystallization [12] and reaction solvent selection [13,8], refrigerants [14], catalysts [15] and polymers [16].
CAMD generates a list of chemical candidates with reasonable accuracy within a moderate time scale. However, new candidates may have unknown environmental, safety and health risks, such as toxicity, body contact, emergency risk, and potential health effects. Assessing these unknown risks before a chemical is produced and applied takes considerable time and effort. In Europe, the cost of registering chemicals to comply with REACH could exceed 2.1 billion Euros, based on about 30000 substances [17]. In China, the reporting period for new chemical substances takes 8 to 56 months [18].
In addition, the choice of functional groups for candidate structure generation is highly dependent on the experience of researchers [4], which brings uncertainties in the type and total number of initial functional groups and in the complexity of structure generation. The preselection of functional groups proposed by Song [19] for extractant design, based on the interactions between functional groups and solvent selectivity, helps to limit the types of initial functional groups.
In this work, a strategy of candidate generation is attempted in the molecular design of new de-phenol extractants, by selecting potential molecules from the existing commercial compounds database. The advantage of this strategy is that the candidate molecules selected from the existing commercial compounds database have full information on environmental, safety and health risks. In other words, the strategy offers the possibility of finding new applications for existing commercial compounds. Furthermore, analysis of these potential molecules may identify effective functional groups to serve as the initial group set for generating new molecular structures of de-phenol extractants.
In order to avoid the unknown risks of new candidates generated by CAMD, we propose a database-based generation strategy for candidate extractants. The strategy relies on existing commercial compounds to design or select new de-phenol extractants. Existing commercial compounds have the advantage that their environmental, safety and health risks, such as toxicity, body contact, emergency risk, and potential health effects, are known and controllable. The environmental, safety and health impact of all these compounds can be obtained via the CAS number from many open databases, such as the Material Safety Data Sheet (MSDS) [20].
By collecting the molecules in chemical inventories, the Existing Commercial Compounds (ECC) database was built. The ECC database is described in detail below, along with its structural information, property estimation methods, and the estimation procedure based on group contribution methods and a group match toolkit. By setting multiple property criteria for extraction, a list of potential molecules with high de-phenol performance can be generated. The strategy therefore also offers the possibility of finding new applications for existing commercial compounds.
The promising molecules are ranked and compared in terms of extraction performance. Owing to the rich known data on storage, transportation, personnel protection, accident emergency, waste disposal, etc., candidate molecules generated from the existing commercial compounds database can be used directly for experimental verification and further industrial production, which saves considerable cost and time. By analyzing the candidate molecular structures, the effective functional groups were obtained, which are useful as initial groups for generating new molecular structures of extractants.
The removal of phenol from coking wastewater by liquid–liquid extraction is the key step in the treatment of such wastewater. In order to find potential molecules for the phenol extraction process, a framework following the database-based generation strategy was proposed, as shown in Fig. 1. It can be carried out in three steps: (1) development of the existing commercial compounds database; (2) generation of candidate de-phenol extractants; and (3) analysis of the generated candidate extractants. The details are given in the following parts.
In order to obtain compounds with known environmental, safety and health risks, the ECC database was developed. The data on commercial compounds come from inventories of existing chemical substances: the European Inventory of Existing Commercial Chemical Substances (EINECS), the Toxic Substances Control Act Chemical Substance Inventory (TSCA Inventory), and the Inventory of Existing Chemical Substances Produced or Imported in China (IECSC) [21]. These three inventories are the main ones, covering most of the chemicals commonly used in the corresponding regions and countries.
However, an inventory of existing chemical substances contains only CAS numbers and chemical substance names, and cannot meet the requirements of extractant selection. In this paper, we used the CAS number as input to a compound registration system to extract the corresponding structure information (Mol files [22]), which was then inserted into the database for estimation of the other properties needed. An overview of the chemical inventories is given in Table 1. Finally, a total of 34195 compounds were obtained. The distribution of chemical compounds in the three inventories is shown in Fig. 2.
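Because the CAS number is the key used to link inventories, structures and properties, a simple sanity check on each number before lookup can be useful. The paper does not describe such a step; the sketch below is only an illustration of the standard CAS check-digit rule, and the function name `is_valid_cas` is an assumption introduced here.

```python
import re


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well formed and its check digit is correct."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weights 1, 2, 3, ... are applied starting from the rightmost body digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


print(is_valid_cas("108-10-1"))  # methyl isobutyl ketone (MIBK)
print(is_valid_cas("108-95-2"))  # phenol
```

A check like this only catches typing or extraction errors; it does not confirm that the number is actually registered in any of the three inventories.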
Fig. 1. Framework for candidate extractant generation.
Table 1 Overview of the compounds in three inventories
Fig. 2. Distribution of 34195 chemical compounds in EINECS, TSCA and IECSC.
Besides the basic information (CAS, molecular formula, inventory type, English name, etc.) and structure information (Mol files), the ECC database should also include the target properties used for extractant generation. Constraints on the target properties narrow the search scope of candidates. Based on the criteria for extractant selection [23], we summarize the target properties used as criteria for solvent design, and their impact on the extraction process, in Table 2.
Because experimental solvent data for a given separation problem are limited, predictive group contribution (GC) methods [24] were used to calculate the above properties. GC methods offer acceptable estimation accuracy, little reliance on other physical properties or parameters, simple calculation, and a wide application range [3]. The extraction performance was formulated in terms of the activity coefficients at infinite dilution evaluated by the Dortmund UNIFAC method [25]. Compared with the original UNIFAC method, Dortmund UNIFAC provides a much better description of the temperature dependence of the activity coefficients [26] and a more reliable representation of the real behavior of phase equilibria in the dilute region [27]. The estimation models for the physical properties, namely normal boiling point [24,28], melting point [24,28], density [24,29], vapor pressure [30], surface tension [30] and viscosity [30], are shown in Table 3.
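To illustrate the general idea of a group contribution estimate (a property computed as a base value plus the sum of group counts times group contributions), the sketch below uses the first-order Joback boiling-point correlation. This is a swapped-in, simpler method chosen only for illustration; it is not the estimation model of Refs. [24,28] used in this work, and the group split of MIBK is written by hand.

```python
# Joback first-order contributions to the normal boiling point (K).
# Illustration only: the paper's own models are those cited in Table 3.
JOBACK_TB = {
    "CH3": 23.58,   # -CH3
    "CH2": 22.88,   # -CH2- (non-ring)
    "CH": 21.74,    # >CH- (non-ring)
    "C=O": 76.75,   # >C=O (non-ring ketone)
}


def joback_boiling_point(groups: dict[str, int]) -> float:
    """Estimate Tb (K) as 198 + sum(count * contribution) over all groups."""
    return 198.0 + sum(count * JOBACK_TB[name] for name, count in groups.items())


# MIBK, CH3-CO-CH2-CH(CH3)2, split into Joback groups by hand.
mibk_groups = {"CH3": 3, "CH2": 1, "CH": 1, "C=O": 1}
tb_mibk = joback_boiling_point(mibk_groups)
print(f"Estimated Tb of MIBK: {tb_mibk:.2f} K")
```

The estimate of about 390 K is close to the commonly reported boiling point of MIBK (about 389 K), which shows why additive group schemes are attractive for batch estimation over a large database.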
Table 3 Summary of estimation methods of target properties
Table 4 The constraint value of the target properties used in de-phenol extraction
Table 5 Summary of contaminant removal
We then proposed an estimation procedure for the batch calculation of the target properties, as shown in Supplementary Material S1. The collected data form the basis for generating the potential candidate extractants. The ECC database consists of the basic information of each compound (CAS, molecular formula, inventory, English name), the structural information (Mol, SMILES, SMARTS), the extraction performance properties (partition coefficient, selectivity, solvent power, solvent loss), the physical properties (melting point, normal boiling point, vapor pressure, density, molar mass, surface tension, viscosity) and the number of groups in each molecule. All information is stored in a SQL Server database. An example of how the molecular information is stored in the ECC database is given in Supplementary Material S2.
Fig. 3. The joint retrieval process and entity relationship diagram of the ECC database.
Fig. 4. Number of feasible solvent candidates selected in different screening steps.
Generally, in industrial production, the selection of appropriate extraction agents mainly considers the following criteria [23]:
(1) Extraction performance: high selectivity, high partition coefficient, low water solubility;
(2) Physical and chemical properties: large density difference, adequate interfacial tension, low melting point, low dynamic viscosity;
(3) Environmental friendliness: good chemical stability, high flash point, low toxicity and corrosivity;
(4) Economy: convenient storage and transportation, low price, easy acquisition, easy recovery, etc.
Fig. 5. Matrix scatter plots and correlation coefficients of different extraction properties. (The abscissa and ordinate represent the values of the extraction performance properties, that is, partition coefficient (m), selectivity (β), solvent power (SP) and solvent loss (SL), respectively.)
Fig. 6. Partition coefficient (m) vs. selectivity (β) for different solvent classes, shown as open points (solvents selected from the database) and solid points (solvents commonly used in experiment).
Based on the criteria for extractant screening, the property ranges for phenol extraction were determined from the extractants commonly used in experiment. The criteria used for phenol extraction are shown in Table 4.
Extractants must also obey environmental, health and safety constraints in addition to the usual separation performance requirements. In order to minimize the adverse effects of extractants on the environment, solvents that are corrosive, environmentally toxic, or chemically unstable must be avoided. Table 5 shows the compounds removed from the list of candidate compounds.
In this paper, we used SQL statements to perform the joint search on the target properties. The joint retrieval process is shown in Fig. 3. The entities in the entity relationship diagram of the ECC database are connected by the unique CAS number, which makes the combined search easy to carry out. The numbers of potential solvent candidates after each screening step are given in Fig. 4. In the end, a list of 594 feasible solvent candidates was generated from the ECC database.
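The joint search described above can be sketched as a single SQL query that joins the property tables on the CAS number and applies all constraints at once. The sketch below uses Python's built-in sqlite3 module as a stand-in for the SQL Server database; the table names, column names, identifiers, property values and threshold values are all hypothetical and are not taken from Table 4 or from the actual ECC schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE basic_info (cas TEXT PRIMARY KEY, name TEXT);
CREATE TABLE extraction_props (cas TEXT PRIMARY KEY, m REAL, beta REAL, sl REAL);
CREATE TABLE physical_props (cas TEXT PRIMARY KEY, melting_point_k REAL,
                             density_g_cm3 REAL);
""")

# Hypothetical rows; "CAS-A" etc. are placeholders, not real CAS numbers.
conn.executemany("INSERT INTO basic_info VALUES (?, ?)", [
    ("CAS-A", "compound A"), ("CAS-B", "compound B"), ("CAS-C", "compound C"),
    ("CAS-D", "compound D"), ("CAS-E", "compound E"),
])
conn.executemany("INSERT INTO extraction_props VALUES (?, ?, ?, ?)", [
    ("CAS-A", 60.0, 900.0, 0.010), ("CAS-B", 80.0, 300.0, 0.020),
    ("CAS-C", 30.0, 700.0, 0.005), ("CAS-D", 45.0, 650.0, 0.020),
    ("CAS-E", 25.0, 1200.0, 0.015),
])
conn.executemany("INSERT INTO physical_props VALUES (?, ?, ?)", [
    ("CAS-A", 190.0, 0.80), ("CAS-B", 200.0, 0.85), ("CAS-C", 320.0, 0.90),
    ("CAS-D", 250.0, 0.98), ("CAS-E", 230.0, 1.10),
])

# Joint search: every table is linked through the unique CAS key.
query = """
SELECT b.cas, b.name, e.m, e.beta
FROM basic_info AS b
JOIN extraction_props AS e ON e.cas = b.cas
JOIN physical_props AS p ON p.cas = b.cas
WHERE e.m >= ? AND e.beta >= ? AND e.sl <= ?
  AND p.melting_point_k <= ?
  AND ABS(p.density_g_cm3 - 1.0) >= ?
ORDER BY e.m DESC
"""
# Hypothetical thresholds: min m, min beta, max SL, max Tm (K), min |density gap|.
thresholds = (20.0, 500.0, 0.025, 280.0, 0.05)
candidates = conn.execute(query, thresholds).fetchall()
for row in candidates:
    print(row)
```

Keeping each property family in its own table keyed by CAS number means that new constraints can be added by extending the WHERE clause, without changing how the compounds themselves are stored.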
In order to narrow the solvent list further and prioritize the selected solvents, the most promising candidates from the previous stage should be sorted by evaluating the target properties. The top molecules in the prioritized list can be used for further research and experimental investigation, which saves considerable time and effort.
Generally, the sorting criterion is based on one main target property or on an objective function that assigns weights to the properties [23]. However, the weighting factors are greatly influenced by experience. In this work, we studied the relationships between the extraction properties, as shown in Fig. 5, to reduce the number of objective properties. The matrix scatter plots show the joint distribution of two properties, for example the m–β, m–SL and m–SP relationships in the first column. The histograms show the distribution of values of each property. The numbers with asterisks are the correlation coefficients between extraction properties. The correlation analysis of the extraction performance properties was carried out with the chart.Correlation function of the PerformanceAnalytics package in R [31].
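The correlation screening can be sketched in a few lines. The analysis in this work was done in R; the Python version below uses NumPy as a stand-in, and all property values are synthetic (SP is deliberately constructed to be proportional to m so that the pair shows up as redundant). The 0.95 cut-off is an assumption chosen for the illustration.

```python
import numpy as np

# Synthetic property values for 200 hypothetical candidates.
rng = np.random.default_rng(0)
n_candidates = 200
m = rng.uniform(5.0, 80.0, size=n_candidates)
beta = rng.uniform(100.0, 1500.0, size=n_candidates)
sl = rng.uniform(0.001, 0.025, size=n_candidates)
sp = 0.4 * m  # constructed to be exactly proportional to m

names = ["m", "beta", "SP", "SL"]
# np.corrcoef treats each row as one variable and returns the Pearson matrix.
corr = np.corrcoef(np.vstack([m, beta, sp, sl]))

# Pairs whose absolute correlation exceeds the (assumed) cut-off are redundant.
redundant = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > 0.95
]
print(np.round(corr, 2))
print(redundant)
```

When a pair is flagged as redundant, one of the two properties can be dropped from the ranking without losing information, which is the argument used for SP in the analysis.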
We can see that only the partition coefficient m and the solvent power SP are highly linearly correlated. Based on the formulas of m and SP, the differences between them are the mass ratio and the activity coefficient at infinite dilution of phenol in water. In this paper, phenol and water are fixed substances, so the value of this activity coefficient is a constant. Meanwhile, the molar mass of the extractants ranges from 70 to 200, so the mass ratio of water and extractant changes only slightly. Therefore, SP can be ignored in the further analysis of the extraction properties.
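The relation between m and SP can be made concrete with a small sketch. The definitions below are the forms commonly used in the CAMD literature for solute A (phenol), carrier B (water) and solvent S; they are assumed here and may differ in detail from the formulas in Table 2. All activity coefficient values are placeholders, not Dortmund UNIFAC results.

```python
from dataclasses import dataclass

M_PHENOL = 94.11   # g/mol
M_WATER = 18.015   # g/mol


@dataclass
class InfiniteDilution:
    """Infinite-dilution activity coefficients for one candidate solvent."""
    gamma_phenol_in_water: float
    gamma_phenol_in_solvent: float
    gamma_water_in_solvent: float
    gamma_solvent_in_water: float
    molar_mass_solvent: float  # g/mol


def extraction_properties(d: InfiniteDilution) -> dict[str, float]:
    """Compute m, beta, SP and SL under the assumed mass-based definitions."""
    m = (d.gamma_phenol_in_water / d.gamma_phenol_in_solvent
         * M_WATER / d.molar_mass_solvent)
    beta = (d.gamma_water_in_solvent / d.gamma_phenol_in_solvent
            * M_PHENOL / M_WATER)
    sp = 1.0 / d.gamma_phenol_in_solvent * M_PHENOL / d.molar_mass_solvent
    sl = 1.0 / d.gamma_solvent_in_water * d.molar_mass_solvent / M_WATER
    return {"m": m, "beta": beta, "SP": sp, "SL": sl}


# Placeholder inputs for two hypothetical solvents.
solvent_1 = InfiniteDilution(50.0, 0.20, 8.0, 400.0, 100.0)
solvent_2 = InfiniteDilution(50.0, 0.35, 5.0, 900.0, 130.0)
props_1 = extraction_properties(solvent_1)
props_2 = extraction_properties(solvent_2)
print(props_1["m"] / props_1["SP"], props_2["m"] / props_2["SP"])
```

Under these assumed definitions the ratio m/SP depends only on the phenol-water pair, so the two properties carry the same ranking information for a fixed separation problem.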
On the other hand, the solvent losses SL of the candidate solvents are all below 0.025, which indicates that SL can also be ignored in the analysis of the extraction properties. In the following, the distribution of extractant candidates with different m and β is analyzed.
The distribution of candidate extractants with different partition coefficient m and selectivity β is shown in Fig. 6. Promising de-phenol extractants cover the following categories: ketones, esters, ethers, alcohols, anhydrides and benzene compounds, which are consistent with the de-phenol extractants commonly used in industry or experiment. The most promising molecule for de-phenol extraction is methyl isobutyl ketone (MIBK), obtained from the experimental database [23]. In this work, MIBK is set as the basis for selecting molecules with better extraction performance. The candidate solvents are divided into four types according to whether their partition coefficient m and selectivity β are above or below those of MIBK, with the dividing lines shown in Fig. 6.
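The division into four types relative to MIBK amounts to a quadrant test on the (m, β) plane. The sketch below shows one way to express it; the function name, the reference values and the candidate values are hypothetical, and treating a value equal to the reference as "not higher" is an assumption.

```python
def classify_against_reference(m: float, beta: float,
                               m_ref: float, beta_ref: float) -> str:
    """Assign a candidate to one of four parts relative to a reference solvent."""
    higher_m = m > m_ref
    higher_beta = beta > beta_ref
    if higher_m and higher_beta:
        return "first part"    # better than the reference on both properties
    if higher_m:
        return "second part"   # higher m, lower (or equal) beta
    if higher_beta:
        return "third part"    # higher beta, lower (or equal) m
    return "fourth part"       # not better on either property


# Hypothetical reference and candidate values, for illustration only.
M_REF, BETA_REF = 50.0, 800.0
candidates = {"X1": (60.0, 900.0), "X2": (70.0, 500.0),
              "X3": (30.0, 1200.0), "X4": (20.0, 400.0)}
parts = {name: classify_against_reference(m, beta, M_REF, BETA_REF)
         for name, (m, beta) in candidates.items()}
print(parts)
```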
The list of 594 promising solvent candidates covers the 20 compounds commonly used in experiment, as shown in Table 6, which indicates that the method used in this paper is reliable. In order to compare the polarity of the molecules, the dipole moment was calculated with Gaussian 03W. The dipole moments of water and phenol are 1.82 and 1.53 Debye, respectively.
The first part contains 26 compounds whose partition coefficient m and selectivity β are both higher than those of MIBK, as shown in Table 7. These solvents are basically ketones and esters, and most of the ketones contain two carbonyl groups. Compounds with two carbonyl groups may be unstable, for example by undergoing condensation, and need further experimental verification. Apart from several cyclic ketones, the others are all chain ketones. The two carbonyl groups in the type II molecules shown in Table 8 are adjacent or symmetric, which leads to a small dipole moment and low polarity. Based on the like-dissolves-like rule [32], phenol is more likely to be dissolved by these candidate extractants with lower dipole moment. Although the type I molecules have a high dipole moment and high polarity, the oxygen atoms in the carbonyl group easily form a hydrogen bond with the α-H in phenol, which promotes the dissolution of phenol in the candidate extractants. Meanwhile, because of the large charge density of the carbonyl groups, their oxygen atoms repel those in water, which reduces the mutual solubility. Therefore, these molecules show good overall performance (high partition coefficient m and selectivity β). The molecules selected in the first part can be used to guide the experimental screening of extraction solvents by giving a broad range of preferred solvents.
Table 6 Summary of solvent candidates commonly used in experiment
Table 7 Summary of the solvent candidates in first part
The second part contains 17 compounds whose partition coefficient m is higher than that of MIBK and whose selectivity β is lower. We can see from Fig. 6 and Table 9 that most of the solvents are chain molecules. Besides the carbonyl group, ester groups and ether groups also appear in the compounds. The molecules in the second part contain more oxygen atoms than those in the first part. Owing to the high polarity of the oxygen atom, the more oxygen atoms there are in the molecule, the easier it is to form hydrogen bonds with the α-H in phenol. Therefore, the molecules in the second part have a high dipole moment and a high partition coefficient m. On the other hand, the oxygen atoms of ester groups and ether groups easily form hydrogen bonds with water; thus, the dissolution of these molecules in water results in a low selectivity β.
Table 8 The dipole moment of the solvent candidates in first part
The third part includes 45 compounds whose β is higher than that of MIBK while m is lower. Table 10 lists the extraction properties and structural information of the solvent candidates in the third part. Most of the solvents are ketones and esters, and many of these compounds contain one functional group. These molecules have a large dipole moment and high polarity. Almost all of the compounds are cyclic or aromatic compounds, which are similar in structure to phenol. Because similar substances are more likely to dissolve each other, these molecules interact more strongly with phenol than with water. As a result, the selectivity β of the molecules in the third part is higher. However, owing to steric hindrance, the interaction between phenol and extractant is not as strong as for the molecules in the first part. Consequently, the partition coefficient m is lower.
Table 9 Summary of the solvent candidates in second part
The solvents selected in the second and third parts excel in only one property, either high partition coefficient m or high selectivity β. Although their extraction performance is not as good as that of the solvents in the first part, they can be used to find suitable solvent mixtures by combining solvents with high partition coefficient and solvents with high selectivity. Therefore, the molecules in the second and third parts are promising solvents for the further design of solvent mixtures.
In order to assist the preselection of functional groups in CAMD, we also studied the structure–property relationships and summarized the potential functional groups and the ranges of the numbers of different groups. The promising molecules in the three parts were split into sets of groups using the Dortmund UNIFAC method.
Fig. 7 summarizes the total number of each group in each part. In the first part, the functional groups ‘CH3’ and ‘CH2’ show a higher frequency than ‘CH’ and ‘C’, which implies that branched structures appear less often within the molecules. Chain groups also have a higher frequency than cyclic groups. The functional groups ‘CH3CO’ and ‘CH2CO’ have the highest frequency. In the second part, there are more straight-chain hydrocarbon molecules and no aromatic or cyclic groups. There are four kinds of functional groups: ester (CH3COO, CH2COO), ether (CH3O, CH2O), carbonyl (CH3CO, CH2CO) and hydroxyl (OH(S)). In the third part, almost all molecules contain aromatic or cyclic groups, and the frequencies of the chain groups are basically similar to those of the aromatic or cyclic groups. On the other hand, the frequencies of the ‘CH3CO’ and ‘CH2CO’ groups are higher than those of the ‘CH2COO’, ‘CH3O’ and ‘CH2O’ groups. Fig. 8 shows the maximum number of each group in the candidate solvents. The maximum number of chain groups, aromatic groups and cyclic groups is 6, while the maximum number of the functional groups ‘CH3COO’, ‘CH2COO’, ‘CH3CO’, ‘CH2CO’, ‘CH3O’, ‘CH2O’ and ‘OH(S)’ is 2. Fig. 9 shows the frequency of different carbon numbers in the molecules of each part. The distribution of the number of carbon atoms is approximately normal.
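The group statistics behind Figs. 7 and 8 (total count and maximum count of each group over a set of molecules) can be sketched as follows. The three molecules and their UNIFAC-style group splits are written by hand for illustration and are not the candidate lists of Tables 7, 9 and 10.

```python
from collections import Counter

# Hand-written UNIFAC-style group decompositions (illustrative input only).
molecules = {
    "MIBK": {"CH3": 2, "CH2": 1, "CH": 1, "CH3CO": 1},
    "2-pentanone": {"CH3": 1, "CH2": 2, "CH3CO": 1},
    "butyl acetate": {"CH3": 1, "CH2": 3, "CH3COO": 1},
}

group_sum = Counter()   # total occurrences of each group (as in Fig. 7)
group_max = {}          # largest count within a single molecule (as in Fig. 8)
for groups in molecules.values():
    group_sum.update(groups)
    for name, count in groups.items():
        group_max[name] = max(group_max.get(name, 0), count)

print(dict(group_sum))
print(group_max)
```

The per-group maxima give upper bounds on group counts, which is the form in which such statistics would typically be passed to a structure generator as constraints.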
The number ranges of chain groups, aromatic groups and cyclic groups of the compounds in each part can be used as the initial input conditions for CAMD. Depending on the number of functional groups, compounds with different extraction properties (high partition coefficient or high selectivity) can be designed. Overall, the proposed method enables the design of more efficient chemicals with the benefits of less time, lower economic cost, and known environmental impact.
In this work, a database-based extractant generation strategy was established, following the idea of finding new applications for existing commercial compounds. The Existing Commercial Compounds (ECC) database was developed as the basis for molecular selection and design.
The case study on de-phenol extractant generation for the liquid–liquid extraction step of coking wastewater treatment shows that this method can successfully select promising novel solvents. A list of 594 feasible solvent candidates was obtained. The resulting candidate solvents cover the following categories: ketones, esters, ethers, alcohols, anhydrides and benzene compounds, which are consistent with the de-phenol extractants commonly used in industry or experiment. Among them, 26 promising molecules with higher partition coefficient and selectivity than MIBK were selected, in addition to 17 molecules with higher partition coefficient and 45 molecules with higher selectivity. Moreover, we also studied the structure–property relationships. Four kinds of functional groups, namely ester (CH3COO, CH2COO), ether (CH3O, CH2O), carbonyl (CH3CO, CH2CO) and hydroxyl (OH(S)), showed good de-phenol performance. The potential functional groups and the ranges of the numbers of different groups were also summarized to assist in carrying out CAMD. The results show that the proposed method enables the design of more efficient chemicals with the benefits of less time, lower economic cost, and known environmental impact.
Table 10 Summary of the solvent candidates in third part
Acknowledgments
The authors thank the Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, for helping to provide the Mol files of the compounds.
Fig. 7. Total numbers of the groups in each part.
Fig. 8. Maximum numbers of the groups in each part.
Fig. 9. Frequency of different carbon numbers in the molecules of each part.
Supplementary Material
Supplementary data to this article can be found online at https://doi.org/10.1016/j.cjche.2018.01.014.
Chinese Journal of Chemical Engineering, 2018, Issue 7