Journal of Ravishankar University–B 34 (1), 19-28 (2021)




Higher Order Statistics Based Blind Steganalysis using Deep Learning
S. Bera^{1}, K. Thakur^{2}, P. Vyas^{3}*,
.M.Thakur^{4} and A. Shrivastava^{5}
^{1}VYTPG College, Durg
^{2}Professor, PTRSU, Raipur
^{3}Asst. Professor, Disha College, Raipur
^{4}Developer Associate, SAP Labs Pvt. Ltd, Bangalore
^{5}Asst. Professor, GEC, Jagdalpur
*Corresponding Author Email: prafullavyas@gmail.com
[Received: 12 February 2021; Revised: 25 March 2021; Accepted: 01 April 2021]
Abstract: Universal steganalysis of gray-level JPEG images is addressed by modelling the neighbourhood relationship of the image coefficients with a higher order statistical method based on the two-step Markov Transition Probability Matrix (TPM). Combining the TPM with the neighbouring pixel relationship provides better or comparable detection results. The detection accuracy is evaluated on a stego image database generated with the nsF5 and J-UNIWARD hiding techniques, using eXtreme Gradient Boosting (XGBoost) with Principal Component Analysis (PCA). Execution time is also compared for all the classifiers. The images are taken from the Greenspun library and the Google website.
Keywords: Steganography,
Steganalysis, CNN, DWT, DCT.
Introduction
Image steganography is the art of concealing data in an image; conversely, steganalysis is the detection of hidden data in an image. JPEG steganography and steganalysis techniques have progressed rapidly. As steganalysis research has advanced, researchers have developed increasingly complex hiding processes so that the embedding is less detectable. Steganalysis methodology must therefore be strengthened to increase its detection capability.
The complexity of hiding techniques increased with the introduction of content-adaptive steganography, which makes detection techniques more difficult to design. Non-adaptive techniques include Jsteg (JPG, 2020), OutGuess (outgues, 2020), F5 (Westfeld et al., 2001), nsF5 (Fridrich et al., 2007), Model Based (MB) (Song et al., 2015) and Perturbed Quantization (PQ) (Fridrich et al., 2004). Content-adaptive hiding techniques developed later include Texture Adaptive PQ (PQt), Model Optimized Distortion (MOD) (Filler et al., 2011), Uniform Embedding Distortion (UED) (Guo et al., 2014) and JPEG UNIversal WAvelet Relative Distortion (J-UNIWARD) (Holub et al., 2013). In content-adaptive steganography, a distortion function is first defined and the message is then embedded while minimizing this distortion, which increases steganographic security. Different content-adaptive techniques use different distortion functions, based on the texture and energy content of the image. Content-adaptive steganography is more secure than non-adaptive steganography, and in this paper we concentrate on steganalysis of content-adaptive steganography.
Literature Review
In the history of universal steganalysis, the trend has been to extract image features that change after information hiding because the embedding disturbs the existing correlations between image coefficients. The features can be obtained either from the spatial domain or from a transform domain, in which the gray-level values are mapped to other values by a specific mathematical transformation. Almost all image steganalysis methods try to find statistical features that represent the variations of the image coefficients. These statistical features are extracted from various preprocessed images. The preprocessing used includes the Discrete Cosine Transform (DCT) (Holub et al., 2013), the Discrete Wavelet Transform (DWT) (Laimeche et al., 2018), image segmentation and calibration (Bashkirova et al., 2015), intra- and inter-block difference arrays from the spatial and transform domains (Karimi et al., 2015), the Short Time Fourier Transform (STFT) (Song et al., 2015) (Clausi et al., 2005) and the Gabor transform.
The features used in the transform domain include statistics of wavelet coefficients (Lyu et al., 2002), higher order statistics of DCT coefficients with the calibration concept (Fridrich, 2004), Markov features (Chen et al., 2014), and merged DCT and Markov features named Cartesian Calibrated features (CC-PEV) (Pevny et al., 2007). In almost all cases the feature dimension keeps increasing, with the aim of increasing the detection accuracy. The recent trend in steganalysis design concentrates on projection models constrained to the complex texture regions of the image, mainly to detect content-adaptive steganography. The Cartesian Calibrated JPEG Rich Model (CC-JRM), based on intra- and inter-block co-occurrences (Kodovsky et al., 2012), is one of them. Steganalysis design changed considerably with the introduction of the DCTR (Discrete Cosine Transform Residual) technique (Holub et al., 2014). In DCTR, the decompressed JPEG image is convolved with each of the 64 DCT basis kernels, the residual images are subsampled, and histograms of the quantized noise residuals are obtained as features. This technique provides better detection accuracy than the previous techniques. A further improvement was the Phase-Aware Projection Model (PHARM) (Holub et al., 2015), which uses the pixel residuals of JPEG images and their phase with respect to the 8 × 8 grid.
A 2D Gabor filter was introduced to capture image texture characteristics at various scales and orientations, and first and higher order statistical features are obtained from the filtered images in GFR (Song et al., 2015) and Gabor Rich Filter (GRF) (Song et al., 2017). The introduction of Gabor filters increased the detection accuracy for content-adaptive JPEG steganography such as J-UNIWARD (Denemark et al., 2016), and Gabor filtering has proved to be an effective tool for designing image steganalysis.
The statistical features acquired from stego and non-stego images are used to train and test the chosen classifier, so that it becomes a software-based empirical detector of secret messages in unknown images. Well-known classifiers suggested in popular research papers include the Fisher Linear Discriminant (FLD) (Holub et al., 2014), Support Vector Machine (SVM) (Gul et al., 2013) (Pathak et al., 2014), Ensemble Classifier (Song et al., 2017), Convolutional Neural Network (CNN) (Bashkirova et al., 2016) and Random Forest (Laimeche et al., 2018) (Bera et al., 2018). These papers show that the feature dimension keeps increasing with each improved steganalysis technique. Although a larger feature set increases detection accuracy, it also increases execution time and memory cost. Proper feature selection and reduction techniques can therefore be introduced to reduce the feature dimension without affecting detection accuracy. Techniques suggested in the cited papers include Markov Random Field (MRF) cliques (Gul et al., 2013), the Iterative Best strategy (Kodovsky et al., 2012), an improved dynamic discrete Firefly Algorithm (DyFA) with an SVM-based fitness function (Chhikara et al., 2018), Analysis of Variance (ANOVA) (Laimeche et al., 2018), Artificial Bee Colony (ABC) (Mohammadi et al., 2014) and selection on the basis of highest detection accuracy (Song et al., 2017). Global Local PSO (GLBPSO), an improved version of discrete PSO, has been applied to 486-dimensional Markov features for feature selection and obtained better results (Kumari et al., 2017). The Binary Bat Algorithm (BBA) feature selection technique has been applied to SPAM features; various classifiers were tested and SVM performed best (Liu et al., 2019). In another study, first-order features, second-order features, extended DCT features and Markov features were extracted from F5 stego images and classified with SVM using six different kernels and four different sampling methods, with and without cross-validation. Without cross-validation, the radial and Epanechnikov kernels gave very low detection percentages, linear sampling always gave lower detection percentages, and ANOVA gave a better result (Shankar et al., 2018).
The major challenge of the present work can be summarized as the application of a suitable image feature extraction technique, together with an efficient machine learning tool, to obtain better detection performance for gray-level JPEG images. This paper presents blind steganalysis of JPEG hiding techniques based on the two-step Markov TPM. In the following sections, the statistical methods used to generate the image features are discussed, followed by the experimental setup; finally, the results and conclusions are presented.
Proposed Methodology
Image Features Generation
The difference between the absolute values of two immediately neighbouring block DCT (BDCT) coefficients is highly concentrated around zero and has a Laplacian-like distribution; almost all values lie within −4 to +4. Therefore, instead of using the full range of difference values for generating the features, the values are limited to the range −4 to +4 before the features are calculated. This is known as thresholding. The range −4 to +4 is widely used in various research works, although some studies use −3 to +3 instead. The image features here are the second-step TPM, a higher order statistical parameter calculated from the difference neighbouring matrices, which represent the regional correlation. In this work, the final image features consist of the TPMs of both the test image and its calibrated image. The calibrated image has statistics similar to those of the test image if the test image does not contain any hidden data. The number of parameters in the final image feature vector is known as the feature dimension; it depends on the selected threshold and on the number of steps used in the transition probability matrix.
The implemented technique is explained in five steps. The first subsection discusses the block image coefficients, the second introduces the difference neighbouring matrix, the third explains feature extraction based on the second-step Markov transition probability matrix, the fourth explains the generation of the calibrated image, and the fifth discusses the generation of the final image features.
Block Image Coefficients
The two-dimensional discrete cosine transform is applied to the input gray-level JPEG image, and the frequency components of the gray levels are obtained using equation 1. The block size is 8 × 8. Figure 1 shows the image blocks after transformation.
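Equation 1 itself is not reproduced in the text. For reference, the standard JPEG 8 × 8 forward DCT-II and quantization are shown below; we assume, without confirmation from the source, that this is the transform meant. Here $g(x,y)$ are the level-shifted gray values of one block (pixel value minus 128), $D(u,v)$ are the DCT coefficients and $Q(u,v)$ is the quantization table entry.

```latex
D(u,v) = \frac{1}{4}\,C(u)\,C(v)\sum_{x=0}^{7}\sum_{y=0}^{7} g(x,y)\,
         \cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16},
\qquad
C(w) = \begin{cases} 1/\sqrt{2}, & w = 0,\\ 1, & w > 0,\end{cases}
\qquad
\hat{D}(u,v) = \operatorname{round}\!\left(\frac{D(u,v)}{Q(u,v)}\right)
```

The block image coefficients used below are the magnitudes $|\hat{D}(u,v)|$ of these quantized values.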
Most of the resulting transform coefficients are close to zero. The first coefficient of each block is the DC coefficient and the remaining 63 are AC coefficients of progressively higher frequency.
Figure 1. Representation of the Discrete Cosine Transformation in an Image
The absolute values of the DCT coefficients are used for finding the neighbouring image coefficients. In the JPEG steganographic techniques considered here, the DC coefficients and the high-frequency components are largely left untouched during embedding, while the middle-frequency components are used for embedding; it is therefore the deviations of the middle-frequency components that must be detected. The absolute values of the quantized block DCT coefficients form the block image coefficients. Absolute values are taken because JPEG DCT coefficients can be positive, negative or zero, and the differences are computed on magnitudes. The DCT coefficients in general do not follow a Gaussian distribution, and they are not statistically independent of each other: each coefficient is correlated with its neighbours in all four directions. The transformed and quantized block coefficient matrices are known as image block coefficient matrices.
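A minimal NumPy sketch of this step, assuming the decoded gray levels and an 8 × 8 quantization table are available as arrays. The names `dct_matrix` and `block_image_coefficients` are hypothetical, and a real pipeline would normally read the quantized coefficients directly from the JPEG file instead of recomputing them from decoded pixels, which introduces small rounding differences.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix C, so that C @ block @ C.T is the 2D DCT."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    x = np.arange(n)[None, :]          # spatial index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)         # DC row uses the smaller normalisation
    return c


def block_image_coefficients(gray: np.ndarray, q_table: np.ndarray) -> np.ndarray:
    """Return |round(DCT(block) / Q)| for every non-overlapping 8x8 block.

    gray    : 2D array of gray levels (0..255); height and width are assumed
              to be multiples of 8 (e.g. 480 x 640).
    q_table : 8x8 quantization table (an assumed input for this sketch).
    Output shape: (number_of_blocks, 8, 8), blocks in row-major order.
    """
    h, w = gray.shape
    c = dct_matrix(8)
    shifted = gray.astype(np.float64) - 128.0                     # JPEG level shift
    blocks = shifted.reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2).reshape(-1, 8, 8)
    dct_blocks = c @ blocks @ c.T                                  # batched 2D DCT
    quantized = np.round(dct_blocks / q_table)                     # rounds halves to even
    return np.abs(quantized).astype(np.int64)
```

For a 480 × 640 image this yields 60 × 80 = 4,800 blocks, matching the count given for the image size used in this work.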
Difference Neighbouring Array
The smoothness, regularity, continuity and periodicity of the original image are disturbed by data hiding. These deviations become more pronounced when the differences between neighbouring elements of the block image coefficients are computed. For this purpose, four difference neighbouring matrices are evaluated from each image block coefficient matrix.
Let the image block coefficient matrix of the $k$-th block be denoted by $F_k$:
$$
F_k = \big[\,F_k(u,v)\,\big], \qquad 1 \le u \le m,\quad 1 \le v \le n,\quad 1 \le k \le K, \qquad K = \frac{M \times N}{m \times n} \qquad (2)
$$
where $m$ is the size of the block image coefficients in the horizontal direction, $n$ is the size in the vertical direction, $K$ is the number of blocks in the image, and the image has dimension $M \times N$, with $M$ rows and $N$ columns.
In the present work, the dimensions of all the images are 640 × 480 and the size of each block is 8 × 8, so there are (640 × 480)/(8 × 8) = 4,800 image coefficient blocks in total. A coefficient is strongly related to its four nearest neighbours, and the four-neighbourhood relationship within an image block is considered a strong tool for measuring this relationship. The difference neighbouring arrays are generated for each block using equation 3, for the four neighbours shown in Figure 2.
$$
\begin{aligned}
F_f(u,v) &= F(u,v) - F(u,v+1), & \qquad F_r(u,v) &= F(u,v) - F(u,v-1),\\
F_u(u,v) &= F(u,v) - F(u-1,v), & \qquad F_d(u,v) &= F(u,v) - F(u+1,v)
\end{aligned}
\qquad (3)
$$
where $F$ is the block image coefficient matrix of a single block and $F_f$, $F_r$, $F_u$ and $F_d$ denote the forward, reverse, up and down difference arrays, respectively. The differences are calculated only within the same block. The values of the difference neighbouring arrays are restricted to the range −4 to +4: values greater than +4 are set to +4 and values less than −4 are set to −4. The entries of each difference neighbouring array therefore take only the values −4, −3, −2, −1, 0, +1, +2, +3, +4. This thresholding decreases the complexity and execution time without much affecting the detection accuracy.
Figure 2. Four Neighbouring Pixel Arrays
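The four difference neighbouring arrays and the ±4 thresholding can be sketched as follows. The sketch assumes the sign convention "coefficient minus its neighbour" in all four directions, which is our reading rather than a confirmed detail of equation 3; the function name `difference_arrays` is hypothetical.

```python
import numpy as np

T = 4  # threshold used in the text: values are clipped to [-T, +T]


def difference_arrays(block: np.ndarray, t: int = T) -> dict:
    """Four within-block difference arrays of one m x n block of |DCT| values.

    Assumed convention: each array holds the coefficient minus its right,
    left, upper or lower neighbour; only neighbours inside the block are used.
    """
    b = block.astype(np.int64)
    diffs = {
        "forward": b[:, :-1] - b[:, 1:],   # F(u,v) - F(u,v+1)
        "reverse": b[:, 1:] - b[:, :-1],   # F(u,v) - F(u,v-1)
        "up":      b[1:, :] - b[:-1, :],   # F(u,v) - F(u-1,v)
        "down":    b[:-1, :] - b[1:, :],   # F(u,v) - F(u+1,v)
    }
    # Thresholding: values above +t become +t, values below -t become -t.
    return {name: np.clip(d, -t, t) for name, d in diffs.items()}
```

Under this convention the reverse array is the element-wise negative of the forward array (and the up array of the down array), so each pair describes the same neighbour pairs with opposite sign.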
Markov’s Transition Probability Matrix
A random process is a collection of random variables indexed by some set and taking values in another set. A random process is called a Markov process if, given its current state, its future evolution is independent of its past. Markov chains were introduced in 1906 by Andrei Andreyevich Markov (1856–1922) and were named in his honor.
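Let $I$ be a countable set, $\{i, j, k, \dots\}$. Each $i \in I$ is called a state and $I$ is called the state space.
Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Here $\Omega$ is a set of outcomes, $\mathcal{F}$ is a set of subsets of $\Omega$, and for $A \in \mathcal{F}$, $\mathbb{P}(A)$ is the probability of $A$.
Let the sequence of random variables be $X_0, X_1, \dots$ (taking values in $I$). A random variable $X$ with values in $I$ is a function $X : \Omega \to I$.
A row vector $\lambda = (\lambda_i : i \in I)$ is called a measure if $\lambda_i \ge 0$ for all $i$. If $\sum_i \lambda_i = 1$, then it is a distribution (or probability measure). We start with an initial distribution over $I$, specified by $\{\lambda_i : i \in I\}$ such that $0 \le \lambda_i \le 1$ for all $i$ and $\sum_{i \in I} \lambda_i = 1$. The special case that with probability 1 we start in state $i$ is denoted $\lambda = \delta_i = (0, \dots, 0, 1, 0, \dots, 0)$.
Then consider a transition matrix $P = (p_{ij} : i, j \in I)$ with $p_{ij} \ge 0$ for all $i, j$. It is a stochastic matrix, meaning that $p_{ij} \ge 0$ for all $i, j \in I$ and $\sum_{j \in I} p_{ij} = 1$ for all $i \in I$ (i.e. each row of $P$ is a distribution over $I$).
$(X_n)_{n \ge 0}$ is a Markov chain with initial distribution $\lambda$ and transition matrix $P$ if for all $n \ge 0$ and $i_0, \dots, i_{n+1} \in I$:
$$
\text{(i)}\;\; \mathbb{P}(X_0 = i_0) = \lambda_{i_0}, \qquad
\text{(ii)}\;\; \mathbb{P}(X_{n+1} = i_{n+1} \mid X_0 = i_0, \dots, X_n = i_n) = p_{i_n i_{n+1}} \qquad (4)
$$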
By checking conditions (i) and (ii), it can be determined whether a random process is a Markov process.
Let $X_n$ denote the state of the process at instant $n$. Then the $n$-step transition probabilities can be represented as
$$
p_{ij}^{(n)} = \mathbb{P}(X_{m+n} = j \mid X_m = i) \qquad (5)
$$
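If first-step transition probabilities are considered, the probability of going from state $i$ at time $m$ to state $j$ at time $m+1$ is
$$
p_{ij} = p_{ij}^{(1)} = \mathbb{P}(X_{m+1} = j \mid X_m = i) \qquad (6)
$$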
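Similarly, the second-step transition probability is the probability of going from state $i$ at time $m$, passing through state $k$ at time $m+1$, and ending at state $j$ at time $m+2$. Summing over all intermediate states $k$ gives
$$
p_{ij}^{(2)} = \sum_{k \in I} p_{ik}\, p_{kj} = \big(P^{2}\big)_{ij} \qquad (7)
$$
where $P$ is the one-step transition probability matrix.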
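Now consider the probability of going from state $i$ at time $0$, passing through state $k$ at time $m$, and ending at state $j$ at time $m+n$. For all values of $m$ and $n$, the $n$-step transition probabilities satisfy
$$
p_{ij}^{(m+n)} = \sum_{k \in I} p_{ik}^{(m)}\, p_{kj}^{(n)} \quad \text{for all } i, j \in I, \qquad \text{i.e.}\quad P^{(m+n)} = P^{(m)} P^{(n)} \qquad (8)
$$
These are known as the Chapman–Kolmogorov equations.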
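Equations (7) and (8) can be checked numerically. The sketch below uses a made-up three-state transition matrix (the numbers are purely illustrative) and confirms that summing explicitly over intermediate states gives the same two-step matrix as squaring $P$, and that the Chapman–Kolmogorov relation holds for $m = 1$, $n = 2$.

```python
import numpy as np

# A small 3-state transition matrix (hypothetical numbers; each row sums to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])


def two_step_by_paths(p: np.ndarray) -> np.ndarray:
    """p_ij^(2) = sum_k p_ik * p_kj, summing explicitly over intermediate states k."""
    n = p.shape[0]
    out = np.zeros_like(p)
    for i in range(n):
        for j in range(n):
            out[i, j] = sum(p[i, k] * p[k, j] for k in range(n))
    return out


P2 = two_step_by_paths(P)
# P2[0, 0] = 0.5*0.5 + 0.3*0.1 + 0.2*0.2 = 0.32
assert np.allclose(P2, P @ P)                  # equation (7): P^(2) = P^2
assert np.allclose(P2.sum(axis=1), 1.0)        # still a stochastic matrix
# Chapman-Kolmogorov with m = 1, n = 2: P^(3) = P^(1) P^(2)
assert np.allclose(np.linalg.matrix_power(P, 3), P @ P2)
```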
The first-step transition probability matrix is used in several papers. The second-step matrix is expected to give better results because the transition of the quantized DCT values is checked not only for the nearest coefficient but also for the alternate (next-but-one) coefficient. In this work, the Markov two-step transition probability matrix of each of the four difference neighbouring arrays is used to obtain the image features using equation (9). For the forward difference array, denoted $F_f$, the probability is calculated over the values $F_f(u,v)$, where $u$ and $v$ are the row and column numbers of the array:
$$
p_f(i,j) = \mathbb{P}\{F_f(u,v+2) = j \mid F_f(u,v) = i\}
= \frac{\sum_{u}\sum_{v}\sum_{k}\delta\big(F_f(u,v) = i,\; F_f(u,v+1) = k,\; F_f(u,v+2) = j\big)}{\sum_{u}\sum_{v}\delta\big(F_f(u,v) = i\big)} \qquad (9)
$$
where $i, k, j \in \{-4, -3, -2, -1, 0, 1, 2, 3, 4\}$, and $\delta(\cdot) = 1$ if all of its conditions are satisfied and $\delta(\cdot) = 0$ otherwise. The reverse, up and down arrays are treated in the same way along their own directions.
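One way to estimate the forward two-step TPM of equation (9) from the thresholded arrays is to count pairs of values two positions apart and normalise each row, as sketched below. Pooling the counts over all blocks of an image is an assumption, as is the function name `two_step_tpm_forward`. The other three directions could reuse the same function on reoriented arrays (for example `a[:, ::-1]` for the reverse direction, `a.T` for downward and `a[::-1].T` for upward transitions).

```python
import numpy as np

T = 4  # thresholded values lie in -T..+T, giving (2T+1) x (2T+1) = 9 x 9 matrices


def two_step_tpm_forward(arrays, t: int = T) -> np.ndarray:
    """Empirical two-step TPM along the horizontal (forward) direction.

    arrays : iterable of 2D integer arrays already clipped to [-t, t]
             (e.g. the forward difference array of every block).
    Entry [i, j] estimates P(F(u, v+2) = j | F(u, v) = i), with states
    shifted to indices 0..2t. Summing over the intermediate value
    k = F(u, v+1) is implicit because every position has exactly one k.
    """
    size = 2 * t + 1
    counts = np.zeros((size, size), dtype=np.float64)
    for a in arrays:
        start = a[:, :-2].ravel() + t   # state i at column v
        end = a[:, 2:].ravel() + t      # state j at column v + 2
        np.add.at(counts, (start, end), 1.0)
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows whose start state never occurs are left as zeros instead of NaN.
    return np.divide(counts, row_totals, out=np.zeros_like(counts),
                     where=row_totals > 0)
```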
Let $M_f$, $M_r$, $M_u$ and $M_d$ be the four second-step transition probability matrices defined by equation (10), where $M_f$ is the forward, $M_r$ the reverse, $M_u$ the upward and $M_d$ the downward transition probability matrix:
$$
M_x = \big[\,p_x(i,j)\,\big]_{9 \times 9}, \qquad i, j \in \{-4, \dots, 4\}, \quad x \in \{f, r, u, d\} \qquad (10)
$$
Each transition probability matrix has (2 × 4 + 1) × (2 × 4 + 1) = 9 × 9 = 81 elements, so the four matrices together give 81 × 4 = 324 elements. These elements are termed image features. To reduce the feature dimension further, the four forward, reverse, up and down matrices $M_f$, $M_r$, $M_u$ and $M_d$ are averaged as in equation (11) (Shi et al., 2006), and the final features are obtained using equation (12).
$$
\bar{M} = \frac{M_f + M_r + M_u + M_d}{4} \qquad (11)
$$
… (12)
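Equation (11) reduces the 4 × 81 = 324 values to 81 by element-wise averaging. A minimal sketch, assuming the four 9 × 9 matrices are NumPy arrays; the function name is hypothetical, and how the test-image and calibrated-image vectors are then combined in equation (12) is not shown.

```python
import numpy as np


def averaged_markov_features(m_f, m_r, m_u, m_d) -> np.ndarray:
    """Element-wise mean of the four 9x9 two-step TPMs, flattened to 81 values."""
    stacked = np.stack([m_f, m_r, m_u, m_d])   # shape (4, 9, 9): 4 x 81 = 324 values
    return stacked.mean(axis=0).ravel()          # shape (81,)
```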
Calibrated Image
The input test image, which may be a stego image, is decompressed, cropped by 4 pixels in each direction and then recompressed with the same quantization table as the input image to obtain the calibrated image, as shown in Figure 3. Most of the macroscopic features of the original cover image are found to be similar to those of the calibrated image obtained from the stego image. It is therefore expected that the DCT coefficients of the original cover image and of the calibrated image are similar and hence provide similar image features: the cropping and recompression produce an approximation of the cover image of the input stego image. In the research introduced by Jessica Fridrich in 2005, the inclusion of the calibration technique increased the detection accuracy.
Figure 3. Generation of Calibrated Image
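A spatial-domain sketch of the calibration step, assuming the decompressed gray levels and the input image's quantization table are available as arrays. Two simplifications are assumptions of this sketch rather than details from the text: the 4 pixels are removed from the top rows and left columns, and the cropped image is truncated to a multiple of 8 instead of being padded by a JPEG encoder. The function names are hypothetical.

```python
import numpy as np


def _dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis: C @ block @ C.T is the 2D DCT of a block."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def calibrated_coefficients(decoded: np.ndarray, q_table: np.ndarray,
                            crop: int = 4) -> np.ndarray:
    """|quantized block DCT| of the test image after cropping `crop` rows and columns.

    decoded : 2D array of decompressed gray levels of the test image.
    q_table : 8x8 quantization table of the test image (reused unchanged).
    Returns an array of shape (number_of_blocks, 8, 8).
    """
    cropped = decoded[crop:, crop:].astype(np.float64)   # shifts the 8x8 grid
    h, w = (cropped.shape[0] // 8) * 8, (cropped.shape[1] // 8) * 8
    shifted = cropped[:h, :w] - 128.0                      # JPEG level shift
    blocks = shifted.reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2).reshape(-1, 8, 8)
    c = _dct_matrix(8)
    quantized = np.round((c @ blocks @ c.T) / q_table)     # "recompress" with same table
    return np.abs(quantized).astype(np.int64)
```

The calibrated coefficients can then go through the same difference-array and TPM steps as the test image itself.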
Experimental Implementation and Results
In the experiment, 2000 grayscale images are obtained from the Greenspun library (www.philip.greenspun.com) and from the Google website. These images are cropped to 640 × 480 and recompressed with a quality factor of 95 to form the cover images. A stego image database is then generated from the 2000 JPEG images using the nsF5 and J-UNIWARD JPEG hiding techniques at payloads of 0.05, 0.1 and 0.2 bpac. All the steganography code was downloaded from (Binghamton, 2020). Figures 4 and 5 show the Markov features for J-UNIWARD and nsF5, respectively.
Figure 4. Image Features for J-UNIWARD of capacity 0.05 bpac

Figure 5. Image Features for nsF5 of capacity 0.05 bpac

XGBoost is used for classification with seed = 7 and test size = 0.33; 80% of the dataset is used for training and the rest for testing. Table 1 presents the obtained detection accuracy and compares it with competing techniques. Besides increasing the accuracy, the proposed method also reduces the computation time substantially compared with the reference methods.
Table 1: Comparison of the Proposed Detection Technique with the Present State of the Art

| JPEG Steganography | Embedding Capacity (bpnc/bpac) | Detection Accuracy (%): DCTR [Holub et al., 2014] | Detection Accuracy (%): GFR [Song et al., 2015] | Computation Time for Reference Methods | Detection Accuracy (%): Proposed Method with Random Forest (RF) [Bera et al., 2018] | Detection Accuracy (%): Proposed Method with XGBoost | Computation Time for Proposed Method with XGBoost |
|---|---|---|---|---|---|---|---|
| J-UNIWARD [Holub et al., 2013] | 0.05 | 83.73 | 94.79 | 264.43 | 57.8 | 94.31 | 23.49 |
| | 0.1 | 86.73 | 95.73 | 278.23 | 58.84 | 95.89 | 23.88 |
| | 0.2 | 94.63 | 96.52 | 279.59 | 59.48 | 97.79 | 28.11 |
| nsF5 [Fridrich et al., 2007] | 0.05 | 87.36 | 91.79 | 148.76 | 50.8 | 92.42 | 23.75 |
| | 0.1 | 93.05 | 95.89 | 159.38 | 64.823 | 93.2 | 26.87 |
| | 0.2 | 95.58 | 96.84 | 163.22 | 82.203 | 95.26 | 28.28 |

Graph 1 shows that in most cases XGBoost performs better for steganalysis.
Graph 1: Detection Accuracy with respect to hiding techniques
Conclusion
With the introduction of eXtreme Gradient Boosting (XGBoost) as the classifier together with PCA, the detection accuracy improves compared with the RF results. Thresholding in the calculation of the transition probability matrices and averaging reduced both the computational complexity and the feature dimensionality. For J-UNIWARD, the proposed technique outperforms DCTR at all hiding capacities and GFR at 0.1 and 0.2 bpac; for nsF5, it outperforms both reference methods only at 0.05 bpac.
Acknowledgments
We thank Dr. Ayush Singhal, Postdoctoral Research Fellow at NCBI (National Institutes of Health), Maryland, who holds a PhD and an M.Sc. from the University of Minnesota, Twin Cities, MN, USA, and a B.Tech. from IIT Roorkee, India, for his support in providing information about the WEKA data mining tool and its applications.
References
Bashkirova, D. (2016). Convolutional neural networks for image steganalysis. BioNanoScience, 6(3), 246-248.
Bera, S., et al. (2018). Performance Analysis of Universal Steganalysis Based on Higher Order Statistics for Neighbourhood Pixels. Coimbatore Institute of Information Technology, CiiT International Journal of Fuzzy System, 10(4), 85-91.
Binghamton, retrieved Oct. 4, 2020 from http://dde.binghamton.edu/download.edu/download/syndrome.
Chen, B., et al. (2014). Mixing high-dimensional features for JPEG steganalysis with ensemble classifier. Signal, Image and Video Processing, 8(8), 1475-1482.
Chhikara, R. R., et al. (2018). An improved dynamic discrete firefly algorithm for blind image steganalysis. International Journal of Machine Learning and Cybernetics, 9(5), 821-835.
Clausi, D. A., et al. (2005). Design-based texture feature fusion using Gabor filters and co-occurrence probabilities. IEEE Transactions on Image Processing, 14(7), 925-936.
Denemark, T. D., et al. (2016). Steganalysis features for content-adaptive JPEG steganography. IEEE Transactions on Information Forensics and Security, 11(8), 1736-1746.
Filler, T., et al. (2011). Design of adaptive steganographic schemes for digital images. In Media Watermarking, Security, and Forensics III (Vol. 7880, p. 78800F). International Society for Optics and Photonics.
Fridrich, J. (2004). Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes. In International Workshop on Information Hiding (pp. 67-81). Springer, Berlin, Heidelberg.
Fridrich, J., et al. (2004). Perturbed quantization steganography with wet paper codes. In Proceedings of the 2004 Workshop on Multimedia and Security (pp. 4-15).
Fridrich, J., et al. (2007). Statistically undetectable JPEG steganography: dead ends, challenges, and opportunities. In Proceedings of the 9th Workshop on Multimedia & Security (pp. 3-14).
Gul, G., et al. (2013). JPEG image steganalysis using multivariate PDF estimates with MRF cliques. IEEE Transactions on Information Forensics and Security, 8(3), 578-587.
Guo, L., et al. (2014). Uniform embedding for efficient JPEG steganography. IEEE Transactions on Information Forensics and Security, 9(5), 814-825.
Holub, V., et al. (2013). Digital image steganography using universal distortion. In Proceedings of the First ACM Workshop on Information Hiding and Multimedia Security (pp. 59-68).
Holub, V., et al. (2014). Low-complexity features for JPEG steganalysis using undecimated DCT. IEEE Transactions on Information Forensics and Security, 10(2), 219-228.
Holub, V., et al. (2015). Phase-aware projection model for steganalysis of JPEG images. In Media Watermarking, Security, and Forensics 2015 (Vol. 9409, p. 94090T). International Society for Optics and Photonics.
JPG steganography, retrieved Sept. 29, 2020 from http://www.guillermito2.net/stegano/jsteg.
Karimi, H., et al. (2015). Steganalysis of JPEG images using enhanced neighbouring joint density features. IET Image Processing, 9(7), 545-552.
Kumari, M., et al. (2017). Blind image steganalysis using neural networks and wrapper feature selection. In 2017 International Conference on Computing, Communication and Automation (ICCCA) (pp. 1065-1069).
Kodovský, J., et al. (2012). Steganalysis of JPEG images using rich models. In Media Watermarking, Security, and Forensics 2012 (Vol. 8303, p. 83030A). International Society for Optics and Photonics.
Laimeche, L., et al. (2018). A new feature extraction scheme in wavelet transform for stego image classification. Evolving Systems, 9(3), 181-194.
Liu, F., et al. (2019). Feature selection for image steganalysis using binary bat algorithm. IEEE Access, 8, 4244-4249.
Lyu, S., et al. (2002). Detecting hidden messages using higher-order statistics and support vector machines. In International Workshop on Information Hiding (pp. 340-354). Springer, Berlin, Heidelberg.
Mohammadi, F. G., et al. (2014). Image steganalysis using a bee colony based feature selection algorithm. Engineering Applications of Artificial Intelligence, 31, 35-43.
Outgues, retrieved Oct. 3, 2020 from http://www.outgues.org.
Pathak, P., et al. (2014). Blind image steganalysis of JPEG images using feature extraction through the process of dilation. Digital Investigation, 11(1), 67-77.
PCA, retrieved Oct. 10, 2020 from https://en.wikipedia.org/wiki/Principal_component_analysis.
Pevny, T., et al. (2007). Merging Markov and DCT features for multi-class JPEG steganalysis. In Security, Steganography, and Watermarking of Multimedia Contents IX (Vol. 6505, p. 650503). International Society for Optics and Photonics.
Shankar, D. D., et al. (2018). Result analysis of cross-validation on low embedding feature-based blind steganalysis of 25 percent on JPEG images using SVM. In 2018 International Conference on Circuits and Systems in Digital Enterprise Technology (ICCSDET) (pp. 1-5).
Song, X., et al. (2015). Steganalysis of perturbed quantization steganography based on the enhanced histogram features. Multimedia Tools and Applications, 74(24), 11045-11071.
Song, X., et al. (2015, June). Steganalysis of adaptive JPEG steganography using 2D Gabor filters. In Proceedings of the 3rd ACM Workshop on Information Hiding and Multimedia Security (pp. 15-23).
Song, X., et al. (2017). 2D Gabor filters-based steganalysis of content-adaptive JPEG steganography. Multimedia Tools and Applications, 76(24), 26391-26419.
Westfeld, A., (2001). F5—a steganographic algorithm.
In International Workshop on Information Hiding (pp. 289-302). Springer, Berlin, Heidelberg.