Yun Kaiguo, Yang Yu, Yang Yixian
(State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Abstract: Audio information hiding technology is an important branch of steganography. This article introduces the basic concepts, principles, characteristics and typical models of audio information hiding. It then classifies audio information hiding methods according to the domain in which they operate, namely the time, frequency, Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT) and compressed domains, and discusses several mainstream algorithms in each domain. Finally, it points out that audio information hiding technology cannot yet withstand re-record attacks, and puts forward possible solutions to this problem.
Audio information hiding technology has recently become a research focus for two reasons. One is that audio is an important tool for human communication and an indispensable part of daily life. The other is that audio contains enough information redundancy to provide a favorable environment for information imbedding. The core idea of the technology is to use audio as the hiding carrier and identify the characteristics of the audio to which human ears are insensitive; the secret information is then imbedded into the audio by modifying some parameters of these characteristics; finally, the audio carrying the secret information is transmitted to the receiver. Some applications of this idea are discussed in references [1-6].
The main method used in audio information hiding is to modify audio parameters to which human ears are insensitive, so that the secret information can be imbedded into the audio. Therefore, the primary task in audio information hiding is to find those audio parameters.
Research shows that features such as the hearing threshold and the auditory masking effect have a great influence on the sensitivity of human ears.
In acoustics, the sound intensity, often represented by the letter "I", is the amount of energy flowing per unit time through a unit area perpendicular to the direction in which the sound waves travel. The reference intensity is commonly defined as $I_0 = 10^{-12}$ W/m$^2$, and the sound intensity level, represented by the letter "L", is expressed as $L = 10\lg(I/I_0)$ (dB).
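As a quick numerical check of the sound intensity level, the following Python sketch evaluates the standard definition L = 10 lg(I/I0) for a few intensities. The function name and the sample intensities are illustrative assumptions; only the reference value I0 = 10^-12 W/m^2 comes from the text.

```python
import math

I0 = 1e-12  # reference intensity in W/m^2, as given in the text


def intensity_level_db(intensity, reference=I0):
    """Return the sound intensity level L = 10 * lg(I / I0) in dB."""
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return 10.0 * math.log10(intensity / reference)


# Illustrative intensities (assumed values): the reference itself,
# a moderate sound, and a very loud one.
levels = [intensity_level_db(i) for i in (1e-12, 1e-6, 1.0)]
```

Each factor of ten in intensity adds 10 dB to the level, so a wide range of physical intensities maps to a compact scale.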
A sound wave with a frequency between 20 Hz and 20 kHz can be perceived by human ears only when its sound intensity reaches a certain level. This level of sound intensity is called the hearing threshold. Many experimental results [7-8] show that sound waves at different frequencies require different sound intensity levels to be perceived as equally loud. Thus, the concept of the equal loudness contour is introduced in acoustics.
Figure 1 shows a set of equal loudness contours. Of all the sound intensity level curves in the figure, the lowest one (i.e., the dotted one) is called the hearing threshold curve; it represents the minimum sound intensity levels required for human ears to perceive sound waves at different frequencies. From the shape of the hearing threshold curve, we can see that human ears are less responsive to sound waves at the two ends of the frequency range than to those at intermediate frequencies. This feature of the hearing threshold is one of the important theoretical bases for audio information hiding.
In some cases, two sound waves reach the ears at almost the same time, but one is stronger than the other. The weaker sound wave will then be ignored by human ears because of the stronger one. In other cases, two sound waves at neighboring frequencies coexist, and one is stronger than the other; the weaker one is then likewise hard to perceive. This is the auditory masking effect. Hiding methods such as Least Significant Bit (LSB) substitution and band splitting hiding are developed on this characteristic.
Figure 1. Equal loudness contour.
The range of sounds that can be heard by human ears at different sound intensity levels and different frequencies is called the tessitura. Within this range, three aspects of a sound are primarily perceived by human ears in terms of hearing psychology: loudness, pitch and timbre, which are expressed as amplitude, frequency and phase, respectively. Human ears are quite sensitive to changes in amplitude and frequency, but much less sensitive to changes in phase. As a result, phase becomes an important aspect to be taken into account in audio information hiding.
(1) Invisibility
Invisibility means that the information imbedded in the carrier is unlikely to attract the attention of an unauthorized third party. To meet this requirement, two points have to be taken into account in designing audio information hiding algorithms: one is to take full advantage of the audio characteristics to which human ears are insensitive, so that the audio with and without secret information has the same auditory effect; the other is to study and make use of other audio processing techniques to ensure that the audio with secret information holds up well against spectrum analysis and voice analysis.
(2) Robustness
Robustness in audio information hiding is defined as the ability of the audio to keep the secret information from being lost under attacks, including modification of the audio file, processing with signal processing techniques and environmental noise. This index is quite important for information hiding. To ensure the robustness of the secret information, relatively stable audio characteristics should be chosen as the objects of operation, and error-correction codes should be used to strengthen the hiding. By doing so, the secret information can still be restored even after some file operations or signal processing.
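The role of error-correction codes in robustness can be illustrated with the simplest possible code, a repetition code with majority voting. This is only a sketch: the text does not say which code is used, and the function names and the repetition factor of 3 are assumptions made for illustration.

```python
def rep_encode(bits, n=3):
    """Repeat every bit n times (n is an assumed, illustrative factor)."""
    return [b for b in bits for _ in range(n)]


def rep_decode(coded, n=3):
    """Recover each bit by majority vote over its group of n copies."""
    return [
        1 if sum(coded[i:i + n]) * 2 > n else 0
        for i in range(0, len(coded), n)
    ]


message = [1, 0, 1, 1, 0]
coded = rep_encode(message)

# Simulate an attack that flips one copy in two different groups.
damaged = list(coded)
damaged[0] ^= 1
damaged[7] ^= 1

recovered = rep_decode(damaged)
```

The price of this protection is capacity: the carrier must hold three times as many bits for the same message.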
(3) Undetectability
Undetectability means that the secret information imbedded in the audio will not be detected by hiding analysis tools. In recent years, great progress has been made in research on hiding analysis technologies. Information hidden with simple LSB substitution, improved LSB substitution or even more complicated information hiding algorithms is now likely to be detected.
Therefore, undetectability becomes an important aspect to be considered in designing audio information hiding algorithms. The core idea of undetectability is to make the carriers with and without secret information statistically consistent.
(4) Security
Security in audio information hiding means that the secret information is difficult for unauthorized users to restore, or that, even if the information is restored, its real meaning cannot be read. There are two ways to improve the security of information hiding algorithms. The first is to keep the critical parameters of these algorithms (called hiding keys) secret so that unauthorized users will not be able to restore the secret information. The second is to combine cryptography with information hiding and encrypt the secret information before imbedding it.
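The second approach, encrypting before imbedding, can be sketched as follows. The keystream construction here (SHA-256 over a key and a counter, XORed with the message) is a hypothetical stand-in chosen only to keep the example self-contained; it is not a vetted cipher and is not taken from the text. A real system would use a standard, reviewed encryption scheme.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (illustrative only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice restores the data."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


secret = b"meet at noon"
hiding_key = b"hypothetical shared key"  # assumed to be known to both ends

scrambled = xor_with_key(secret, hiding_key)       # what gets imbedded
restored = xor_with_key(scrambled, hiding_key)     # receiver with the key
wrong = xor_with_key(scrambled, b"another key")    # extractor without it
```

Even if the imbedded bits are extracted from the carrier, they remain unreadable without the key.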
An audio information hiding system is mainly made up of hiding and dehiding models. The hiding model describes the process of imbedding the secret information into the audio carrier, while the dehiding model describes the process of restoring the secret information from the carrier.
Figure 2 is a typical hiding model. The steps to imbed the secret information are as follows:
(1) Collect the original secret information, which may be in the format of audio, image or text.
(2) Encrypt the secret information to enhance security.
(3) Perform operations such as error correction and/or interleaving on the secret information to improve robustness.
(4) Conduct parallel/serial conversion of the error-corrected and/or interleaved data, as information hiding often proceeds bit by bit.
(5) Access the original audio. If a transform-domain hiding algorithm is used, the carrier must first be transformed accordingly.
(6) Add the synchronization signals to the carrier before information imbedding. This step is often necessary to enable accurate blind detection of the hidden information.
(7) Imbed the secret information, perform the inverse transform of the audio, and transmit the audio carrier with the secret information to the receiver.
In some applications, the model may vary with the special requirements for information hiding.
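The payload preparation in steps (3), (4) and (6) can be sketched as below: bytes are converted to a serial bit stream, block-interleaved, and prefixed with a synchronization marker. Encryption and error correction are omitted for brevity. The 11-bit Barker code used as the marker, the interleaver depth and all function names are assumptions for illustration; the text does not specify them.

```python
# 11-bit Barker code, used here as a hypothetical synchronization marker.
SYNC = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0]


def bytes_to_bits(data):
    """Parallel-to-serial conversion, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Serial-to-parallel conversion, the inverse of bytes_to_bits."""
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


def interleave(bits, depth):
    """Write bits row by row into rows of width `depth`, read by columns."""
    rows = len(bits) // depth
    return [bits[r * depth + c] for c in range(depth) for r in range(rows)]


def deinterleave(bits, depth):
    """Undo interleave(): restore the original row-by-row order."""
    rows = len(bits) // depth
    return [bits[c * rows + r] for r in range(rows) for c in range(depth)]


payload = b"hi!"
frame = SYNC + interleave(bytes_to_bits(payload), depth=8)

# Receiver side: check the marker, then reverse the operations.
body = frame[len(SYNC):]
recovered = bits_to_bytes(deinterleave(body, depth=8))
```

Interleaving spreads neighboring bits apart, so a burst of errors in the carrier damages bits from different bytes rather than one byte entirely.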
Figure 2. A hiding model.
Figure 3 is a typical dehiding model. The steps to restore the hidden information are as follows:
(1) Receive the audio with secret information from the sender. The audio carrier may be an audio file or an audio stream.
(2) Apply the corresponding transform to the audio carrier if the information was hidden in a transform domain.
(3) Locate the synchronization signals and get ready to extract the hidden information. In most cases, this is a critical step for correct information extraction.
(4) Perform parallel/serial conversion, reverse the error-correction and/or interleaving operations, and decrypt the information to obtain the original secret information.
In some cases, other audio processing steps, such as removing the secret information, filtering and smoothing, may be required to ensure the auditory effect of the received audio.
According to the domain in which the information is imbedded, audio information hiding methods can be divided into the following categories: time-domain, frequency-domain, Discrete Cosine Transform (DCT)-domain, Discrete Wavelet Transform (DWT)-domain and compressed-domain.
The time-domain audio information hiding methods are relatively simple. They imbed the secret information directly into the amplitude of the audio signal or into the audio file structure. The methods include LSB substitution, improved LSB substitution, echo hiding and audio file structure hiding.
LSB substitution is a hiding method that replaces the least significant bits of the audio samples with the secret information according to certain rules. It offers a large hiding capacity and is easy to implement, but its robustness is so poor that it cannot resist even weak noise. Besides, it does not perform very well against detection.
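A minimal sketch of time-domain LSB substitution, assuming the carrier is a list of signed 16-bit integer samples and one secret bit is written per sample. The function names and sample values are illustrative. The last lines also show the poor robustness: noise of a single quantization step is enough to corrupt every hidden bit.

```python
def lsb_embed(samples, bits):
    """Replace the least significant bit of the first len(bits) samples."""
    if len(bits) > len(samples):
        raise ValueError("carrier too short for the message")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to bit
    return stego


def lsb_extract(samples, count):
    """Read back the least significant bit of the first `count` samples."""
    return [s & 1 for s in samples[:count]]


# Assumed 16-bit PCM sample values, for illustration only.
carrier = [1200, -873, 15, 0, -1, 32767, -32768, 404]
bits = [1, 0, 1, 1, 0, 0, 1, 0]

stego = lsb_embed(carrier, bits)
extracted = lsb_extract(stego, len(bits))

# A "weak noise" attack: shifting every sample by one quantization step
# flips every least significant bit.
noisy = [s + 1 for s in stego]
after_noise = lsb_extract(noisy, len(bits))
```

Each sample changes by at most one quantization step, which is why the method is inaudible yet fragile.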
Echo hiding adds the secret information to the audio signal in the form of weak echoes. At the receiver end, the secret information is extracted by identifying the echoes. This method has good invisibility and robustness.
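The idea of echo hiding can be sketched by encoding each bit as one of two echo delays and detecting the delay at the receiver. Several simplifications are assumed here and are not from the text: the carrier is synthetic white noise, the detector compares plain autocorrelation at the two candidate delays (practical schemes usually rely on cepstral analysis of real audio), and the delays, gain and segment length are arbitrary illustrative values.

```python
import random


def add_echo(segment, delay, gain):
    """Return the segment plus a scaled copy of itself delayed by `delay`."""
    return [
        s + (gain * segment[n - delay] if n >= delay else 0.0)
        for n, s in enumerate(segment)
    ]


def echo_embed(carrier, bits, seg_len, d0, d1, gain):
    """Encode each bit as an echo with delay d0 (bit 0) or d1 (bit 1)."""
    out = []
    for i, bit in enumerate(bits):
        seg = carrier[i * seg_len:(i + 1) * seg_len]
        out.extend(add_echo(seg, d1 if bit else d0, gain))
    out.extend(carrier[len(bits) * seg_len:])  # untouched remainder
    return out


def autocorr(seg, lag):
    """Unnormalized autocorrelation of the segment at the given lag."""
    return sum(seg[n] * seg[n - lag] for n in range(lag, len(seg)))


def echo_extract(stego, count, seg_len, d0, d1):
    """Decide each bit by which candidate delay correlates more strongly."""
    bits = []
    for i in range(count):
        seg = stego[i * seg_len:(i + 1) * seg_len]
        bits.append(1 if autocorr(seg, d1) > autocorr(seg, d0) else 0)
    return bits


rng = random.Random(0)
SEG, D0, D1, GAIN = 2048, 60, 90, 0.5          # assumed parameters
bits = [1, 0, 0, 1, 1, 0, 1, 0]
carrier = [rng.gauss(0.0, 1.0) for _ in range(len(bits) * SEG)]

stego = echo_embed(carrier, bits, SEG, D0, D1, GAIN)
decoded = echo_extract(stego, len(bits), SEG, D0, D1)
```

The gain is exaggerated to keep the detector simple; in practice the echo is kept weak and short enough to be perceived as natural resonance rather than as a separate sound.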
The audio file structure hiding method imbeds the secret information into the unimportant structure segments of an audio file. It is easy to implement but has poor robustness.
Frequency-domain audio information hiding is a category of methods that imbed information by first transforming the audio with the Discrete Fourier Transform (DFT) and then processing some frequency-domain characteristics of the audio. Therefore, it is also called DFT-domain audio information hiding. The methods in this category include frequency-domain LSB substitution, spread spectrum hiding, phase hiding and band splitting hiding.
The frequency-domain LSB substitution method is similar to that in the time domain, characterized by simple operation, large capacity and poor robustness.
Spread spectrum hiding, which borrows the idea of spread spectrum communications, spreads the secret information over the whole audio band in the form of pseudo-noise. With good invisibility, strong noise resistance and high practicality, the method is one of the most successful audio information hiding algorithms.
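The principle can be sketched with a simplified direct-sequence scheme: each bit is multiplied by a keyed pseudo-noise sequence of +1/-1 chips, scaled down and added to the carrier, and the receiver recovers it by correlating with the same sequence. As assumptions for illustration only, the sketch works on raw samples rather than DFT coefficients, uses a synthetic noise carrier, and picks arbitrary values for the key, chip count and embedding strength.

```python
import random


def pn_sequence(key, length):
    """Keyed pseudo-noise sequence of +1/-1 chips."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]


def ss_embed(carrier, bits, key, chips, alpha):
    """Add +alpha*pn (bit 1) or -alpha*pn (bit 0) to each block of samples."""
    pn = pn_sequence(key, chips)
    stego = list(carrier)
    for i, bit in enumerate(bits):
        sign = 1 if bit else -1
        for j in range(chips):
            stego[i * chips + j] += alpha * sign * pn[j]
    return stego


def ss_extract(stego, count, key, chips):
    """Correlate each block with the pn sequence; the sign gives the bit."""
    pn = pn_sequence(key, chips)
    return [
        1 if sum(stego[i * chips + j] * pn[j] for j in range(chips)) > 0 else 0
        for i in range(count)
    ]


KEY, CHIPS, ALPHA = 1234, 4096, 0.2            # assumed parameters
bits = [0, 1, 1, 0, 1, 0, 0, 1]
noise = random.Random(7)
carrier = [noise.gauss(0.0, 1.0) for _ in range(len(bits) * CHIPS)]

stego = ss_embed(carrier, bits, KEY, CHIPS, ALPHA)
decoded = ss_extract(stego, len(bits), KEY, CHIPS)
```

Because each bit is spread over thousands of samples, the per-sample change stays small while the correlation at the receiver adds up, which is the source of the noise resistance.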
The phase hiding algorithm makes use of the fact that human ears are insensitive to absolute phase, and imbeds the information by changing the phase. Although this method is quite invisible, its noise resistance is poor.
Band splitting hiding splits the band of the audio carrier into many sub-bands, and hides the information in those sub-bands to which human ears are insensitive, based on characteristics such as the hearing threshold and the auditory masking effect. This method can hide a large amount of information and is imperceptible to the ear, but the hidden information is not well concealed in the frequency domain.
Figure 3. Dehiding model.
DCT-domain hiding transforms the audio carrier using the DCT, and then makes some changes to the DCT coefficients so as to imbed the information. The most important advantage of this method is that it performs excellently in resisting analog/digital (A/D) or digital/analog (D/A) conversion; it is therefore highly valuable in practical applications and has been widely used. The main methods in this category include:
(1) DCT-domain LSB substitution, which is similar to LSB substitution in the time and frequency domains;
(2) DCT-domain phase hiding, which imbeds the information by changing the DCT phase and, like phase hiding in the frequency domain, has good invisibility.
In the DCT domain, there are many other methods that imbed the information based on the quantity of data in different ranges or the parity of data in different bands. All of them perform well in terms of invisibility and robustness.
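A parity-based DCT-domain scheme of the kind mentioned above can be sketched as follows: selected coefficients are quantized with a fixed step, and the parity of the quantization index is forced to match the secret bit. This is one possible reading of "parity of data", not the specific algorithm of any cited work. The naive O(N^2) transform, the step size, the choice of coefficients and the test frame are all assumptions for illustration.

```python
import math


def dct(x):
    """Unnormalized DCT-II, written out directly for clarity."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(c)
    return [
        c[0] / n + 2.0 / n * sum(
            c[k] * math.cos(math.pi / n * (t + 0.5) * k) for k in range(1, n)
        )
        for t in range(n)
    ]


def embed_parity(frame, bits, first, step):
    """Force the parity of quantized coefficients first, first+1, ... to bits."""
    coeffs = dct(frame)
    for i, bit in enumerate(bits):
        q = round(coeffs[first + i] / step)
        if q % 2 != bit:
            q += 1  # move to the neighboring index with the right parity
        coeffs[first + i] = q * step
    return idct(coeffs)


def extract_parity(frame, count, first, step):
    """Read the bits back from the parity of the quantized coefficients."""
    coeffs = dct(frame)
    return [round(coeffs[first + i] / step) % 2 for i in range(count)]


N, FIRST, STEP = 32, 4, 0.05                   # assumed parameters
frame = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
bits = [1, 0, 1, 1, 0, 1, 0, 0]

stego = embed_parity(frame, bits, FIRST, STEP)
decoded = extract_parity(stego, len(bits), FIRST, STEP)
```

A larger step makes the parity survive stronger distortion at the cost of a larger change to the audio, which is the usual trade-off between robustness and invisibility.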
The wavelet-domain hiding methods imbed the information by transforming the audio carrier with the wavelet transform and then modifying the wavelet coefficients. Like the methods in the DCT domain, these methods perform excellently in resisting A/D or D/A conversion. The following are some of the main methods in this category:
(1) The wavelet-domain LSB substitution method replaces the least significant bits of the wavelet coefficients with the secret information. Its realization is similar to the LSB substitution methods in other domains.
(2) Wavelet-domain energy ratio hiding imbeds the information by comparing and modifying the energies at different wavelet levels, or by modifying the quantity or parity of coefficients in a specific energy range of a wavelet level.
Other wavelet-domain hiding methods likewise operate on the wavelet coefficients to imbed the information. The hiding methods in the wavelet domain are applied more widely than those in other domains.
The compressed-domain hiding methods have been introduced in recent years. Their main objective is to imbed the secret information into the code stream or the related coding table of a compression algorithm, such as a Huffman coding table or a MIDI coding table. These methods are highly invisible, but they are not robust enough to resist audio format conversion and signal processing.
Audio information hiding technology is an important branch of steganography, and its methods operate in the time, frequency, DCT, DWT and compressed domains.
The algorithms in the time and frequency domains are simple but rather weak. The technologies in the DCT and DWT domains have good invisibility and robustness, and they perform especially well under A/D or D/A conversion. However, their high complexity makes them difficult to implement. Although the methods in the compressed domain have good invisibility, they are vulnerable. Audio information hiding has been applied successfully in many settings and has solved many practical problems.
However, no satisfactory solution yet exists for re-record attacks. Precise synchronization information and robust hiding methods are the two keys to solving this problem. For the latter, solutions can be found in the DCT or DWT domain; for the former, there are no satisfactory ideas yet, and more research is needed.