Author(s): S. Bera, K. Thakur, P. Vyas, M. Thakur, A. Shrivastava

Address: VYTPG College, Durg
Professor, PTRSU, Raipur
Asst. Professor, Disha College, Raipur
Developer Associate, SAP Labs Pvt. Ltd, Bangalore
Asst. Professor, GEC, Jagdalpur

Published In: Volume 34, Issue 1, Year 2021

Cite this article:
Bera et al. (2021). Higher Order Statistics Based Blind Steganalysis using Deep Learning. Journal of Ravishankar University (Part-B: Science), 34(1), pp. 19-28.


Journal of Ravishankar University–B 34 (1), 19-28 (2021)



Higher Order Statistics Based Blind Steganalysis using Deep Learning

S. Bera1, K. Thakur2, P. Vyas3*, M. Thakur4 and A. Shrivastava5

1VYTPG College, Durg

2Professor, PTRSU, Raipur

3Asst. Professor, Disha College, Raipur

4Developer Associate, SAP Labs Pvt. Ltd, Bangalore

5Asst. Professor, GEC, Jagdalpur


*Corresponding Author Email:

[Received: 12 February 2021; Revised: 25 March 2021; Accepted: 01 April 2021]

Abstract: Universal steganalysis of grey-level JPEG images is addressed by modelling the neighbourhood relationship of the image coefficients with a higher order statistical method based on the two-step Markov Transition Probability Matrix (TPM). Combining the TPM with the neighbouring pixel relationship gives detection results that are better than or comparable to existing methods. The detection accuracy is evaluated on a stego image database using eXtreme Gradient Boosting (XGBoost) with Principal Component Analysis (PCA) for the nsF5 and JUNIWARD hiding techniques. Execution time is also compared for all the classifiers. The images are taken from the Greenspun library and the Google website.

Keywords: Steganography, Steganalysis, CNN, DWT, DCT.


Introduction

Image steganography is the art of concealing data in an image; steganalysis, in contrast, is the detection of hidden data in an image. JPEG steganography and steganalysis techniques have both progressed steadily. Over the course of this research, increasingly complex hiding processes have been developed to make embedding less detectable, so steganalysis methods must be improved in turn to raise their detection capability.

The complexity of hiding techniques increased with the introduction of content-adaptive steganography, which makes detection techniques more difficult to design. Non-adaptive techniques such as Jsteg (JPG, 2020), Outguess (Outgues, 2020), F5 (Westfeld et al., 2001), nsF5 (Fridrich et al., 2007), Model Based (MB) (Song et al., 2015) and Perturbed Quantization (PQ) (Fridrich et al., 2004) were developed first. Content-adaptive hiding techniques followed, such as Texture Adaptive PQ (PQt) (Filler et al., 2011), Model Optimized Distortion (MOD) (Guo et al., 2014), Uniform Embedding Distortion (UED) (Holub et al., 2013) and JPEG UNIversal Wavelet Relative Distortion (JUNIWARD) (Chhikara et al., 2018). In all content-adaptive steganography, a distortion function is first defined and the message is then embedded while minimizing that function, which increases steganographic security. Different content-adaptive techniques use different distortion functions, each based on the texture and energy content of the image. Content-adaptive steganography is more secure than non-adaptive steganography. In this paper, we concentrate on steganalysis of content-adaptive steganography.

Literature Review

In universal steganalysis, the general approach is to extract image features that change after information hiding, because hiding disturbs the correlation that exists among image coefficients. The features can be obtained either from the spatial domain or from a transform domain, in which the image gray-level values are mapped to other values by a specific transformation. Almost all image steganalysis methods try to find statistical features that capture the statistical variations of the image coefficients. These features are extracted from pre-processed images. The pre-processing steps used include the Discrete Cosine Transformation (DCT) (Holub et al., 2013), the Discrete Wavelet Transformation (DWT) (Laimeche et al., 2018), Image Segment and Calibration (Bashkirova et al., 2015), Intra and Inter Block Difference from the spatial and transform domains (Karimi et al., 2015), and the Short Time Fourier Transform (STFT) (Song et al., 2015) (Clausi et al., 2005) or Gabor Transform.

The features used in the transform domain include statistics of wavelet coefficients (Lyu et al., 2002), higher order statistics of DCT coefficients with the calibration concept (Fridrich et al., 2004), Markov features (Chen et al., 2014), and merged DCT and Markov features, named Cartesian Calibrated Features (CCPEV) (Pevny et al., 2007). In almost all cases the feature dimension keeps increasing, with the aim of increasing detection accuracy. The recent trend in steganalysis design concentrates on projection models restricted to the complex texture regions of the image, mainly to detect content-adaptive steganography. The Cartesian Calibrated JPEG Rich Model (CC-JRM), based on intra- and inter-block co-occurrences (Kodovsky et al., 2012), is one of them. Steganalysis design changed considerably with the introduction of the Discrete Cosine Transform Residual (DCTR) technique (Holub et al., 2014). In DCTR, the decompressed JPEG image is convolved with each of the 64 DCT kernels, and histograms of the quantized noise residuals are obtained by subsampling the residual images. This technique provides better detection accuracy than the previous techniques. A further improvement, the Phase-Aware Projection Model (PHARM) (Holub et al., 2015), utilizes the pixel residuals of JPEG images and their phase with respect to the 8 × 8 grid. A 2D Gabor filter was introduced to capture image texture characteristics at various scales and orientations, and first and higher order statistical features are obtained from it in GFR (Song et al., 2015) and Gabor Rich Filter (GRF) (Song et al., 2017). The introduction of the Gabor filter increases the detection accuracy for content-adaptive JPEG steganographic schemes such as J-UNIWARD (Denemark et al., 2016). Deploying the Gabor filter has thus proved to be an effective tool for designing image steganalysis.

The statistical features acquired from the stego and non-stego images are used to train and test the chosen classifier, which then acts as a software-based empirical detector of secret messages in an unknown media image. Well-known classifiers suggested in the literature include the Fisher Linear Discriminant (FLD) (Holub et al., 2014), Support Vector Machine (SVM) (Gul et al., 2013) (Pathak et al., 2014), Ensemble Classifier (Song et al., 2017), Convolutional Neural Network (CNN) (Bashkirova et al., 2016) and Random Forest (Laimeche et al., 2018) (Bera et al., 2018). The cited papers show that the feature dimension keeps increasing with each improved steganalysis technique. A larger feature set increases the detection accuracy, but it also increases the execution time and the memory cost. Feature selection and reduction techniques can therefore be introduced to reduce the feature dimension without affecting the detection accuracy. Those suggested in the cited papers include Markov Random Field (MRF) cliques (Gul et al., 2013), the Iterative Best strategy (Kodovsky et al., 2012), the improved firefly algorithm DyFA with an SVM fitness function (Chhikara et al., 2018), Analysis of Variance (ANOVA) (Laimeche et al., 2018), Artificial Bee Colony (ABC) (Mohammadi et al., 2014), and selection on the basis of highest detection accuracy (Song et al., 2017). Global Local PSO (GLBPSO), an improved version of discrete PSO, was applied to 486-dimensional Markov features for feature selection and obtained better results (Kumari et al., 2017). The Binary Bat Algorithm (BBA) feature selection technique was applied to SPAM features; of the various classifiers tried, SVM performed best (Liu et al., 2019).

First order features, second order features, extended DCT features and Markov features were extracted from F5 stego images and classified with an SVM. Six different kernels and four different sampling methods were used to obtain the detection results, and the dataset underwent cross-validation. The radial kernel and the Epanechnikov kernel gave very low detection percentages when the dataset was not cross-validated. Linear sampling always gave a lower detection percentage, and ANOVA gave a better result (Shankar et al., 2018).

The major challenge of the present work can be summarized as applying a suitable image feature extraction technique, together with an efficient machine learning tool, to obtain better detection performance for gray-level JPEG images. This paper presents blind steganalysis of JPEG hiding techniques based on the two-step Markov TPM. The following sections discuss the statistical methods used to generate the image features and then the experimental setup, followed by the results and conclusions.

Proposed Methodology 

Image Features Generation

The difference of the absolute values of two immediately neighbouring block DCT (BDCT) coefficients is highly concentrated around zero, with a Laplacian-like distribution. Almost all the values lie within -4 to +4. So, instead of using all the values for generating the features, only values within -4 to +4 are considered for the feature calculation. This is known as thresholding. The range -4 to +4 is widely used in the literature, and -3 to +3 is also common. The image feature here is the second-step TPM, a higher order statistical parameter calculated from the difference neighbouring matrix, which represents the regional correlation. In this work, the final image features consist of the TPM for both the test image and its calibrated image. The calibrated image has statistics similar to those of the test image if the test image does not contain any hidden data. The number of parameters in the final image features is known as the feature dimension; it depends on the threshold value selected and on the number of steps used in the transition probability matrix.

The implemented technique is explained in five parts: the block image coefficients, the difference neighbouring matrix, the feature extraction based on the second-step Markov transition probability matrix, the generation of the calibrated image, and the generation of the final image features.

Block Image Coefficients

The two-dimensional discrete cosine transformation is applied to the input gray-level JPEG image, and the frequency components of the gray levels of the image are obtained using equation 1. The block size is 8 × 8. Figure 1 represents the image blocks after transformation.

Most of the values in the obtained transform coefficient matrices are close to zero. The first coefficient of each block is the DC coefficient and the rest are the middle and high frequency AC coefficients. The absolute values of the DCT coefficients are used for finding the neighbouring image coefficients. In the steganographic techniques, the DC coefficients and the high frequency components are left untouched while embedding the data, and the middle frequency components are used for embedding. So, it is required to find the deviations of the middle frequency components. The absolute values of the quantized block DCT coefficients construct the block image coefficients. Absolute values are taken because JPEG DCT coefficients can be positive, negative or zero, so the differences are computed on magnitudes. The DCT coefficients in general do not obey a Gaussian distribution; however, these coefficients are not statistically independent of each other. There exists correlation between an individual coefficient and its neighbouring coefficients in all four directions. The transformed and quantized image block coefficient matrices are known as the image block coefficient matrix.

Figure 1. Representation of the Discrete Cosine Transformation in an Image
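As a rough illustration of the block transform described above, the following Python sketch applies an orthonormal 8 × 8 DCT-II to each block of a grayscale array and keeps the absolute values of the rounded coefficients. The function names, the use of NumPy and the omission of a JPEG quantization table are assumptions made for illustration; in practice the quantized coefficients would normally be read directly from the JPEG file.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    x = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # the DC row uses a different scale factor
    return c

def block_abs_dct(image, block=8):
    """Absolute rounded DCT coefficients of every block x block tile.

    `image` is a 2-D array whose sides are multiples of `block`.
    No quantization table is applied (an assumption for illustration).
    """
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    c = dct_matrix(block)
    out = np.empty_like(image)
    for r in range(0, rows, block):
        for s in range(0, cols, block):
            tile = image[r:r + block, s:s + block]
            out[r:r + block, s:s + block] = c @ tile @ c.T  # 2-D DCT of the tile
    return np.abs(np.rint(out))

if __name__ == "__main__":
    flat = np.full((16, 16), 10.0)    # constant image: only DC terms survive
    coeffs = block_abs_dct(flat)
    print(coeffs[0, 0], coeffs[0, 1])
```

For a constant tile only the DC coefficient is non-zero, which matches the observation that most transform coefficients lie close to zero.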

Difference Neighbouring Array

The smoothness, regularity, continuity and periodicity of the original image are disturbed by data hiding. These deviations become more pronounced when the differences between neighbouring elements of the block image coefficients are computed. For this purpose, four difference neighbouring matrices are evaluated from the image block coefficient matrix.

… (2)

In the image block coefficient matrix, each block has a fixed size in the horizontal direction and in the vertical direction. The number of blocks in an image follows from the image dimension, that is, the number of rows and the number of columns of the image, and the block size, as shown in equation 2.

In the present work, the dimensions of all the images are 640 × 480 and the size of each block is 8 × 8. So, in total there are (640 × 480) / (8 × 8) = 4,800 image coefficient blocks. There is a strong relationship between an image pixel and its nearest four neighbours, and the four-neighbourhood relationship within an image block is considered a strong tool for measuring the pixel relationship. The difference neighbouring arrays are generated using equation 3 for each block, for the four neighbouring pixels shown in figure 2.

… (3)

The four arrays are the forward, reverse, up and down difference arrays. The differences are calculated between pixels within the same block. The values of the difference neighbouring arrays are restricted to the range -4 to +4: a value greater than +4 is set to +4, and a value less than -4 is set to -4. So, the difference neighbouring arrays take only the values -4, -3, -2, -1, 0, +1, +2, +3, +4. This thresholding decreases the complexity and the computation time without much affecting the detection accuracy.
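The difference arrays and the thresholding step can be sketched in Python as follows. The direction conventions used here (forward as the value minus its right neighbour, reverse as the value minus its left neighbour, and similarly for up and down) are assumptions for illustration, since the defining equation is not legible in this copy; the function name is hypothetical.

```python
import numpy as np

def difference_arrays(block, threshold=4):
    """Forward, reverse, up and down difference arrays of one block.

    Direction conventions are assumed for illustration:
    forward = value minus right neighbour, reverse = value minus left
    neighbour, up = value minus neighbour above, down = value minus
    neighbour below. Differences never cross the block boundary.
    """
    b = np.abs(np.asarray(block, dtype=int))   # absolute coefficient values
    forward = b[:, :-1] - b[:, 1:]
    reverse = b[:, 1:] - b[:, :-1]
    up = b[1:, :] - b[:-1, :]
    down = b[:-1, :] - b[1:, :]
    # Thresholding: clip every difference into [-threshold, +threshold].
    return tuple(np.clip(d, -threshold, threshold)
                 for d in (forward, reverse, up, down))

if __name__ == "__main__":
    demo = np.array([[0, 9, 2], [1, 1, 1], [7, 0, 3]])
    fwd, rev, up, down = difference_arrays(demo)
    print(fwd)
```

Clipping keeps every array within the nine values from -4 to +4, so each later transition matrix has a fixed 9 × 9 size regardless of image content.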

Figure 2. Four Neighbouring Pixel Arrays

Markov’s Transition Probability Matrix

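A random process is a collection of random variables indexed by some set and taking values in some other set. A random process in which the next state depends on the past only through the current state is called a Markov process. Markov chains were introduced in 1906 by Andrei Andreyevich Markov (1856–1922) and were named in his honor.

Let $I$ be a countable set, $I = \{i, j, k, \ldots\}$. Each $i \in I$ is called a state and $I$ is called the state-space.

Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Here $\Omega$ is a set of outcomes, $\mathcal{F}$ is a set of subsets of $\Omega$, and for $A \in \mathcal{F}$, $\mathbb{P}(A)$ is the probability of $A$.

Let the sequence of random variables be $X_0, X_1, \ldots$ (taking values in $I$).

A random variable $X$ with values in $I$ is a function $X : \Omega \to I$.

A row vector $\lambda = (\lambda_i : i \in I)$ is called a measure if $\lambda_i \ge 0$ for all $i$.

If $\sum_{i} \lambda_i = 1$ then it is a distribution (or probability measure). We start with an initial distribution over $I$, specified by $\{\lambda_i : i \in I\}$ such that $0 \le \lambda_i \le 1$ for all $i$ and $\sum_{i \in I} \lambda_i = 1$.

The special case that with probability 1 we start in state $i$ is denoted $\lambda = \delta_i = (\delta_{ij} : j \in I)$.

Then there is a transition matrix $P = (p_{ij} : i, j \in I)$ with $p_{ij} \ge 0$ for all $i, j$.

It is a stochastic matrix, meaning that $p_{ij} \ge 0$ for all $i, j \in I$ and $\sum_{j \in I} p_{ij} = 1$ (i.e. each row of $P$ is a distribution over $I$).

$(X_n)_{n \ge 0}$ is a Markov chain with initial distribution $\lambda$ and transition matrix $P$ if for all $n \ge 0$ and $i_0, \ldots, i_{n+1} \in I$,

(i) $\mathbb{P}(X_0 = i_0) = \lambda_{i_0}$;
(ii) $\mathbb{P}(X_{n+1} = i_{n+1} \mid X_0 = i_0, \ldots, X_n = i_n) = \mathbb{P}(X_{n+1} = i_{n+1} \mid X_n = i_n) = p_{i_n i_{n+1}}$ … (4)

By checking the conditions (i) and (ii) it can be determined whether a random process is a Markov process or not.

Let $\mathbb{P}(X_n = j)$ be the probability of the event that the process is in state $j$ at instant $n$.

Then the n-step transition probabilities can be represented as

$p_{ij}^{(n)} = \mathbb{P}(X_n = j \mid X_0 = i)$ … (5)

If first-step transition probabilities are considered, then the probability of going from state $i$ at $t = 0$ to state $j$ at $t = 1$ is

$p_{ij} = \mathbb{P}(X_1 = j \mid X_0 = i)$ … (6)

Similarly, the second-step transition probability, the probability of going from state $i$ at $t = 0$, passing through state $k$ at $t = 1$, and ending at state $j$ at $t = 2$, is

$p_{ij}^{(2)} = \sum_{k} p_{ik} \, p_{kj}$ … (7)

where $P = (p_{ij})$ is the one-step transition probability matrix.

Now consider the probability of going from state $i$ at $t = 0$, passing through $k$ at $t = m$ and ending at state $j$ at time $t = n$. For all the values of $i$ and $j$ the n-step transition probability matrix is

$p_{ij}^{(n)} = \sum_{k} p_{ik}^{(m)} \, p_{kj}^{(n-m)}$ for all $i, j$ and all $0 \le m \le n$ … (8)

These are known as the Chapman-Kolmogorov equations.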
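A small numerical check can make the Chapman-Kolmogorov relation concrete. The sketch below uses a hypothetical 3-state transition matrix (the values are illustrative only) and verifies that the two-step matrix is the matrix square of the one-step matrix, that its rows remain distributions, and that an n-step matrix factors through an intermediate step m.

```python
import numpy as np

# Hypothetical 3-state one-step transition matrix; each row sums to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

def n_step(P, n):
    """n-step transition matrix, i.e. the n-th matrix power of P."""
    return np.linalg.matrix_power(P, n)

def two_step_entry(P, i, j):
    """Two-step probability from i to j, summed over the middle state k."""
    return sum(P[i, k] * P[k, j] for k in range(P.shape[0]))

if __name__ == "__main__":
    P2 = n_step(P, 2)
    # Rows of the two-step matrix are still probability distributions.
    print(np.allclose(P2.sum(axis=1), 1.0))
    # Entry-wise sum over the intermediate state matches the matrix square.
    print(np.isclose(P2[0, 2], two_step_entry(P, 0, 2)))
    # Chapman-Kolmogorov with n = 5 split at m = 2.
    print(np.allclose(n_step(P, 5), n_step(P, 2) @ n_step(P, 3)))
```

All three checks print True for any valid stochastic matrix, not only for the values chosen here.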

The first-step transition probability matrix is used in several papers. The second-step transition probability matrix is expected to give a better result because the transition of the quantized DCT values is checked not only against the nearest element but also against the alternate element. In this work, the Markov two-step transition probability matrix of all four difference neighbouring arrays is used for obtaining the image features using equation (9). The probability is calculated over the row and column positions of the image coefficient matrix.

Here i, k, j ∈ {-4, -3, -2, -1, 0, 1, 2, 3, 4}, and each term counted in the probability takes the value 1 if its condition is satisfied and 0 if the condition fails.

Four second-step transition probability matrices are obtained using equation (10): the forward, the reverse, the upward and the downward transition probability matrix.

Each transition probability matrix has (2 × 4 + 1) × (2 × 4 + 1) = 9 × 9 = 81 elements. In total, 81 × 4 = 324 elements are obtained from the four transition probability matrices. These elements are termed image features. To decrease the feature dimension further, the average value is taken in equation 11 (Shi et al., 2006) and the final features are obtained using equation 12.

… (11)

… (12)
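The following Python sketch shows one way the two-step transition matrix and the averaging step could be computed. It is a minimal sketch under stated assumptions: the counting scheme (the value two positions ahead given the current value, with the intermediate value implicitly summed out) is an assumed reading of the two-step TPM, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def two_step_tpm(diff, axis=1, threshold=4):
    """Empirical two-step transition matrix of a thresholded difference array.

    Entry [i + T, j + T] estimates the probability that the element two
    positions further along `axis` equals j, given the current element is i.
    This counting scheme is an assumed reading of the two-step TPM.
    """
    d = np.clip(np.asarray(diff, dtype=int), -threshold, threshold)
    d = np.moveaxis(d, axis, -1)
    start = d[..., :-2].ravel() + threshold    # current state, shifted to 0..2T
    end = d[..., 2:].ravel() + threshold       # state two steps later
    size = 2 * threshold + 1
    counts = np.zeros((size, size))
    np.add.at(counts, (start, end), 1)         # accumulate pair counts
    totals = counts.sum(axis=1, keepdims=True)
    # Rows with no observations stay all-zero instead of dividing by zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def averaged_features(tpms):
    """Average several 9 x 9 matrices and flatten to an 81-value vector."""
    return np.mean(np.stack(tpms), axis=0).ravel()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    horizontal = [rng.integers(-6, 7, size=(8, 7)) for _ in range(2)]
    vertical = [rng.integers(-6, 7, size=(7, 8)) for _ in range(2)]
    tpms = [two_step_tpm(a, axis=1) for a in horizontal]
    tpms += [two_step_tpm(a, axis=0) for a in vertical]
    print(averaged_features(tpms).shape)
```

Averaging the four 9 × 9 matrices reduces the 324 elements to 81, in line with the dimension reduction described above.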

Calibrated Image

The input is the stego image. It is decompressed, cropped by 4 pixels in each direction, and then recompressed with the same quantization table to obtain the calibrated image, as represented in figure 3. Most of the macroscopic features of the original cover image are similar to those of the calibrated image obtained from the stego image. So, the DCT coefficients of the original cover image and the calibrated image are expected to be similar and hence to provide the same image features. The cropping and recompression produce an approximation of the cover image from the input stego image. In the research introduced by Jessica Fridrich in 2005, the inclusion of the calibration technique increases the accuracy of detection.








Figure 3. Generation of Calibrated Image

Experimental Implementation and Results

In the experiment, 2000 grayscale images are obtained from the Greenspun library and from the Google website. These images are cropped to 640 × 480 and recompressed with a quality factor of 95 to form the cover images. The stego image database is then generated from the 2000 JPEG images by applying the nsF5 and JUNIWARD JPEG hiding techniques at payloads of 0.05, 0.1 and 0.2 bpac. All the steganography codes are downloaded (Binghamton, 2020). Figure 4 and figure 5 represent the Markov features for JUNIWARD and nsF5.

Figure 4. Image Features for JUNIWARD of capacity 0.05 bpac

Figure 5. Image Features for nsF5 of capacity 0.05 bpac

XGBoost is used for classification with the settings seed=7 and test size=0.33. 80% of the dataset is used for training and the rest is used for testing. Table 1 presents the obtained detection accuracy and compares it with the competitive techniques. It is also observed that the computation time is reduced drastically while the accuracy increases.
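The classification stage pairs PCA with XGBoost. The sketch below covers only the parts that can be shown with NumPy alone: a seeded shuffle split using the stated seed of 7 and test size of 0.33, and a PCA projection fitted on the training part through the singular value decomposition. The feature matrix is random placeholder data, the number of retained components (10) is an arbitrary assumption, the helper names are hypothetical, and the boosted-tree classifier itself is not shown.

```python
import numpy as np

def split_indices(n_samples, test_size=0.33, seed=7):
    """Shuffle sample indices and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_size * n_samples))
    return order[n_test:], order[:n_test]

def fit_pca(X, n_components):
    """Return the training mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    """Project samples onto the principal directions."""
    return (X - mean) @ components.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 81))           # random placeholder, not real features
    train_idx, test_idx = split_indices(len(X))
    mean, comps = fit_pca(X[train_idx], n_components=10)
    Z_train = apply_pca(X[train_idx], mean, comps)
    Z_test = apply_pca(X[test_idx], mean, comps)
    print(Z_train.shape, Z_test.shape)
```

Fitting the projection on the training part only, and reusing its mean and directions for the test part, avoids leaking test statistics into the reduced features that the classifier would see.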

Table 1: Comparison of the Proposed Detection Technique with the Present State of the Art

Column headers: JPEG Steganography | Embedding Capacity (bpnc/ | Detection Accuracy | Computation Time for Reference Methods | Detection Accuracy | Computation Time for Proposed Method with XGBoost

Methods compared: DCTR (Holub et al., 2014) | GFR (Song et al., 2015) | Proposed Method with Random Forest (RF) (Bera et al., 2018) | Proposed Method with XGBoost

Rows: JUNIWARD (Holub et al., 2013) | nsF5 (Fridrich et al., 2007)

Graph 1 shows that, in most cases, XGBoost performs better for steganalysis.

Graph 1: Detection Accuracy with respect to hiding techniques


Conclusion

With the introduction of eXtreme Gradient Boosting (XGBoost) as the classifier, together with PCA, the detection accuracy improves compared with the RF results. The thresholding in the calculation of the transition probability matrices and the averaging reduce both the computational complexity and the feature dimensionality. The proposed technique outperforms the reference methods on JUNIWARD at all hiding capacities, but on nsF5 only at 0.05 bpac.


Acknowledgement

We thank Dr. Ayush Singhal, Postdoctoral Research Fellow at NCBI (National Institutes of Health), Maryland, who holds a PhD and M.Sc from the University of Minnesota, Twin Cities, MN, USA and a B.Tech from IIT Roorkee, India, for his support in providing information about the WEKA data mining tool and its applications.


References

Bashkirova, D. (2016). Convolutional neural networks for image steganalysis. BioNanoScience, 6(3), 246-248.

Bera, S., et al. (2018). Performance Analysis of Universal Steganalysis Based on Higher Order Statistics for Neighbourhood Pixels in Coimbatore Institute of Information Technology, CiiT International Journal of Fuzzy System, 10(4), 85-91.

Binghamton, retrieved Oct. 4, 2020 from http://dde.binghamton.edu/

Chen, B., et al. (2014). Mixing high-dimensional features for JPEG steganalysis with ensemble classifier. Signal, Image and Video Processing, 8(8), 1475-1482.

Chhikara, R. R., et al. (2018). An improved dynamic discrete firefly algorithm for blind image steganalysis. International Journal of Machine Learning and Cybernetics, 9(5), 821-835.

Clausi, D. A., et al. (2005). Design-based texture feature fusion using Gabor filters and co-occurrence probabilities. IEEE Transactions on Image Processing, 14(7), 925-936.

Denemark, T. D., et al. (2016). Steganalysis features for content-adaptive JPEG steganography. IEEE Transactions on Information Forensics and Security, 11(8), 1736-1746.

Filler, T., et al. (2011). Design of adaptive steganographic schemes for digital images. In Media Watermarking, Security, and Forensics III (Vol. 7880, p. 78800F). International Society for Optics and Photonics.

Fridrich, J. (2004). Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes. In International Workshop on Information Hiding. Springer, Berlin, Heidelberg, 67-81.

Fridrich, J., et al. (2004). Perturbed quantization steganography with wet paper codes. In Proceedings of the 2004 workshop on Multimedia and security, 4-15.

Fridrich, J., et al. (2007). Statistically undetectable JPEG steganography: dead ends, challenges, and opportunities. In Proceedings of the 9th workshop on Multimedia & security, 3-14.

Gul, G., et al. (2013). JPEG image steganalysis using multivariate PDF estimates with MRF cliques. IEEE transactions on information forensics and security, 8(3), 578-587.

Guo, L., et al. (2014). Uniform embedding for efficient JPEG steganography. IEEE transactions on Information Forensics and Security, 9(5), 814-825.

Holub, V., et al. (2013). Digital image steganography using universal distortion. In Proceedings of the first ACM workshop on Information hiding and multimedia security, 59-68.

Holub, V., et al. (2014). Low-complexity features for JPEG steganalysis using undecimated DCT. IEEE Transactions on Information Forensics and Security, 10(2), 219-228.

Holub, V., et al. (2015). Phase-aware projection model for steganalysis of JPEG images. In Media Watermarking, Security, and Forensics 2015 (Vol. 9409, p. 94090T). International Society for Optics and Photonics.

JPG steganography, retrieved Sept. 29, 2020 from

Karimi, H., et al. (2015). Steganalysis of JPEG images using enhanced neighbouring joint density features. IET Image Processing, 9(7), 545-552.

Kumari, M., et al. (2017). Blind image steganalysis using neural networks and wrapper feature selection. In 2017 International Conference on Computing, Communication and Automation (ICCCA), 1065-1069.

Kodovský, J., et al. (2012). Steganalysis of JPEG images using rich models. In Media Watermarking, Security, and Forensics 2012 (Vol. 8303, p. 83030A). International Society for Optics and Photonics.

Laimeche, L., et al. (2018). A new feature extraction scheme in wavelet transform for stego image classification. Evolving Systems, 9(3), 181-194.

Liu, F., et al. (2019). Feature selection for image steganalysis using binary bat algorithm. IEEE Access, 8, 4244-4249.

Lyu, S., et al. (2002). Detecting hidden messages using higher-order statistics and support vector machines. In International Workshop on information hiding. Springer, Berlin, Heidelberg, 340-354.

Mohammadi, F. G., et al. (2014). Image steganalysis using a bee colony based feature selection algorithm. Engineering Applications of Artificial Intelligence, 31, 35-43.

Outgues, retrieved Oct.3,2020 from

Pathak, P., et al. (2014). Blind Image Steganalysis of JPEG images using feature extraction through the process of dilation. Digital Investigation, 11(1), 67-77.

PCA, retrieved Oct 10, 2020, from

Pevny, T., et al. (2007). Merging Markov and DCT features for multi-class JPEG steganalysis. In Security, Steganography, and Watermarking of Multimedia Contents IX (Vol. 6505, p. 650503). International Society for Optics and Photonics.

Shankar, D. D., et al. (2018). Result Analysis of Cross-Validation on low embedding Feature-based Blind Steganalysis of 25 percent on JPEG images using SVM. In 2018 International Conference on Circuits and Systems in Digital Enterprise Technology (ICCSDET), 1-5.

Song, X., et al. (2015). Steganalysis of perturbed quantization steganography based on the enhanced histogram features. Multimedia Tools and Applications, 74(24), 11045-11071.

Song, X., et al. (2015, June). Steganalysis of adaptive JPEG steganography using 2D Gabor filters. In Proceedings of the 3rd ACM workshop on information hiding and multimedia security (pp. 15-23).

Song, X., et al. (2017). 2D Gabor filters-based steganalysis of content-adaptive JPEG steganography. Multimedia Tools and Applications, 76(24), 26391-26419.

Westfeld, A., (2001). F5—a steganographic algorithm. In International workshop on information hiding. Springer, Berlin, Heidelberg, 289-302.


DOI: 10.52228/JRUB.2021-34-1-3