ENSEMBLES OF NEURAL NETWORKS

FOR SOFT CLASSIFICATION OF REMOTE-SENSING IMAGES

Giorgio Giacinto and Fabio Roli

Dept. of Electrical and Electronic Eng., University of Cagliari, ITALY

Piazza D'Armi, 09123, Cagliari, Italy

Phone: +39-70-6755874 Fax: +39-70-6755900 e-mail {giacinto, roli}@diee.unica.it

ABSTRACT: In recent years, the remote-sensing community has become very interested in applying neural networks to image classification and in comparing neural network performances with those of classical statistical methods. These experimental comparisons pointed out that no single classification algorithm can be regarded as a "panacea". In this paper, we propose the use of "ensembles" of neural networks as an alternative approach based on the exploitation of the complementary characteristics of different neural classifiers. Classification results provided by the neural networks contained in these ensembles are "merged" according to statistical combination methods. In addition, the use of ensembles formed by neural and statistical classifiers is considered. Experimental results on a multisensor remote-sensing data set are reported which point out that the use of neural network ensembles can constitute a valid alternative to the development of new neural classifiers "more complex" than the present ones. In particular, we show that the combination of results provided by statistical and neural algorithms yields classification accuracies better than those obtained by single classifiers after long "designing" phases.

1. INTRODUCTION

Classification of remote-sensing images has traditionally been performed by classical statistical methods (e.g., Bayesian and k-nearest-neighbor classifiers). In recent years, the remote-sensing community has become very interested in applying neural networks to image classification and in comparing neural network performances with those of statistical methods [Benediktsson et al. 1990, Roli et al. 1996]. This interest has also been motivated by the potential offered by neural networks to provide "soft classifications", which are necessary to manage the uncertainty contained in remote-sensing data (e.g., uncertainty related to "mixed" pixels) [Binagi et al. eds. 1995]. Experimental comparisons among neural and statistical classifiers reported in the remote-sensing literature pointed out that no single classification algorithm can be regarded as a "panacea". In particular, it seems to us that the reported superiority of one algorithm over another strongly depends on the selected data set and on the effort devoted to the "designing phase" of each classifier (i.e., design of the classifier "architecture", choice of learning parameters, etc.). As an example, the superiority of the k-nearest-neighbor classifier (KNN) over the Multilayer Perceptron (MLP) neural network, or vice versa, strongly depends on the effort devoted to the related designing phases: selection of the appropriate "k" value and of the appropriate "distance measure" for the KNN classifier, selection of the appropriate architecture and suitable learning parameters for the MLP. In addition, according to our experience, any algorithm may reach a certain level of classification accuracy through a reasonable "designing" effort; further improvements often require an increasingly expensive designing phase [Bruzzone et al. 1997, Roli et al. 1996, Serpico et al. 1995].

In spite of the above considerations, most of the present research work on remote-sensing image classification is focused on the development of new statistical and neural classification algorithms. No emphasis is given to the exploitation of the complementary characteristics of existing algorithms by suitable techniques that "combine" the results provided by each algorithm. Few papers have addressed the need to integrate classification results provided by different algorithms [Kanellopoulos et al. 1993]. In addition, no investigation has been carried out to evaluate the benefits of combining different classification algorithms in order to reduce the complexity of the designing phase (i.e., in order to obtain the desired classification accuracy with a reduced designing effort).

In this paper, we propose the use of "ensembles" of neural networks as an alternative approach based on the exploitation of the complementary characteristics of different neural classifiers. Ensembles of neural and statistical classifiers are also considered. We report experimental results on a multisensor remote-sensing data set that prove that the use of neural network ensembles can constitute a valid alternative to the development of new algorithms "more complex" than the present ones. In particular, we show that the combination of results provided by statistical and neural algorithms yields classification accuracies better than those obtained by single classifiers after long "designing" phases.

2. METHODS FOR COMBINING MULTIPLE CLASSIFIERS

In the following, we propose various methods that can be used to combine the results provided by an ensemble of classification algorithms (e.g., an ensemble of neural networks). In Section 2.1, some combination methods previously proposed in the handwriting recognition field are described [Xu et al. 1992]. These methods assume that the classifiers contained in the ensemble behave "independently", that is, they make uncorrelated classification mistakes. In Section 2.2, a combination method based on a "metaclassification" paradigm is proposed that avoids the assumption of independent mistakes.

2.1 STATISTICAL COMBINATION METHODS

Let us assume an image classification problem with M "data classes". Each class represents a set of specific patterns. Each pattern is characterized by a feature vector X. In addition, let us assume that K different classification algorithms are available to solve the classification problem at hand. Therefore, we can consider ensembles formed by "k" different classifiers (k=1..K). In order to exploit the complementary characteristics of the available classifiers, the statistical combination methods described in the following sections can be used [Xu et al. 1992].

2.1.1 Combination by Voting Principle

Let us assume that each classifier contained in a given ensemble performs a "hard" classification, assigning each input pattern to one of the M data classes. A simple method to combine the results provided by different classifiers is to interpret each classification result as a "vote" for one of the M data classes. Consequently, the data class that receives a number of votes higher than a prefixed threshold is taken as the "final" classification. Typically, the threshold is half the number of classifiers considered ("majority rule"). More conservative rules can be adopted (e.g., the "unison" rule).
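For concreteness, the following Python sketch implements the voting principle under the majority rule; the function name, the default threshold handling, and the example labels are our own illustrative choices, not taken from the paper.

```python
from collections import Counter

def combine_by_voting(hard_labels, threshold=None):
    """Combine hard classifications by the voting principle.

    hard_labels: list of class labels, one per classifier in the ensemble.
    threshold:   minimum number of votes required; defaults to the majority
                 rule (more than half of the classifiers must agree).
    Returns the winning class label, or None if no class reaches the
    threshold (the pattern can then be rejected or handled separately).
    """
    if threshold is None:
        threshold = len(hard_labels) / 2.0   # majority rule
    votes = Counter(hard_labels)
    best_class, n_votes = votes.most_common(1)[0]
    return best_class if n_votes > threshold else None

# Example: three classifiers vote on one pixel
print(combine_by_voting(["sugar beets", "sugar beets", "stubble"]))  # -> "sugar beets"
```

A more conservative "unison" rule is obtained by raising the threshold so that all classifiers must agree.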

2.1.2 Combination by Bayesian Average

It is well known that some classification algorithms are able to provide an estimation of the posterior probability that an input pattern belongs to the data class ωi:

P( X ∈ ωi / X ),  i = 1..M    (1)

For example, estimates of the posterior probabilities are provided by multilayer perceptrons, and it is straightforward to compute them for the KNN classifier.

A natural way of combining the estimates provided by “K” different classifiers is to use the following average value:

Pav( X ∈ ωi / X ) = (1/K) Σk=1..K Pk( X ∈ ωi / X ),  i = 1..M    (2)

The final classification is taken according to the Bayesian criterion, that is, the input pattern X is assigned to the data class for which Pav( X ∈ ωi / X ) is maximum.
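The sketch below illustrates the Combination by Bayesian Average of equation (2), assuming the outputs of the K classifiers are collected in a (K x M) array of posterior estimates; the array layout and function name are illustrative assumptions.

```python
import numpy as np

def combine_by_bayesian_average(posteriors):
    """posteriors: array of shape (K, M), one row of posterior estimates
    Pk(X in wi / X) per classifier.  Returns the winning class index and the
    averaged posteriors, following equation (2) and the Bayesian criterion."""
    p_avg = posteriors.mean(axis=0)        # average over the K classifiers
    return int(np.argmax(p_avg)), p_avg

# Example with K = 3 classifiers and M = 5 classes
P = np.array([[0.6, 0.1, 0.1, 0.1, 0.1],
              [0.5, 0.2, 0.1, 0.1, 0.1],
              [0.2, 0.4, 0.2, 0.1, 0.1]])
winner, p_avg = combine_by_bayesian_average(P)
print(winner, p_avg)   # class 0 wins with the highest average posterior
```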

2.1.3 Combination by Belief Functions

This method exploits the prior knowledge available on each classifier. In particular, the knowledge of the "errors" made by each classifier is exploited. Such prior knowledge is contained in the so-called "confusion matrix". For the kth classifier Ck, it is quite simple to see that the confusion matrix can provide estimates of the following probabilities:

P( X ∈ ωi / Ck(X) = j ) = nij(k) / Σi=1..M nij(k),  i = 1..M, j = 1..M, k = 1..K    (3)

where nij(k) denotes the number of samples of class ωi that have been assigned the label j by Ck.

On the basis of the above probabilities, the combination can be carried out by the following "belief" functions:

bel(i) = η Πk=1..K P( X ∈ ωi / Ck(X) = jk ),  i = 1..M    (4)

where η is a normalization constant and jk is the class label assigned to X by the kth classifier.

The final classification is taken by assigning the input pattern X to the data class for which bel(i) is maximum.
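As an illustration, the following sketch estimates the probabilities of equation (3) from per-classifier confusion matrices and combines them as in equation (4); the row/column convention of the confusion matrix and the variable names are our assumptions.

```python
import numpy as np

def conditional_probs(confusion):
    """confusion[i, j] = number of validation samples of class i labelled j
    by the classifier.  Returns P(X in wi / C(X) = j) as in equation (3):
    each column is normalised by its sum."""
    confusion = confusion.astype(float)
    return confusion / confusion.sum(axis=0, keepdims=True)

def combine_by_belief(confusions, assigned_labels):
    """confusions: list of K confusion matrices (M x M), one per classifier.
    assigned_labels: list of the K labels jk assigned to the input pattern.
    Returns the class index maximising bel(i) of equation (4), and bel itself."""
    M = confusions[0].shape[0]
    bel = np.ones(M)
    for conf, j_k in zip(confusions, assigned_labels):
        bel *= conditional_probs(conf)[:, j_k]   # P(X in wi / Ck(X) = jk)
    bel /= bel.sum()                             # normalisation constant eta
    return int(np.argmax(bel)), bel
```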

2.2 METACLASSIFICATION AS A COMBINATION METHOD

In this Section, we propose an alternative approach for combining the results provided by different classifiers. This approach is based on a concept that we call "metaclassification" (Figure 1). The results provided by each classifier are interpreted as new "features" to be used for characterizing the input pattern to be classified. In particular, each input pattern is characterized by a feature vector containing the outputs of all the classifiers contained in the given ensemble. Let us assume that each classifier contained in the ensemble provides as output an estimation of the posterior probability that an input pattern belongs to the data class ωi:

Pk( X ∈ ωi / X ),  k = 1..K, i = 1..M    (5)

Therefore, each input pattern can be characterized by the following feature vector P:

P = { ( P1( X ∈ ω1 / X ), ..., P1( X ∈ ωM / X ) ), ..., ( PK( X ∈ ω1 / X ), ..., PK( X ∈ ωM / X ) ) }    (6)

[Figure 1 - A basic scheme for a "metaclassifier": the class 1..class M outputs of Classifiers 1..K for the input pattern X are fed to the metaclassifier.]

The feature vector P can be given as input to a "metaclassifier" that classifies the input pattern on the basis of the classifications provided by all the classifiers contained in the given ensemble. In principle, any kind of classifier can be used as a metaclassifier. For example, [Suen et al. 1995] proposed a combination method that can be regarded as the use of an MLP neural network as a metaclassifier.

However, in order to perform effective metaclassifications, it is necessary to take into account the special characteristics of the feature space related to the metaclassifier. In this paper, we propose a metaclassifier based on a modified version of the k-nearest-neighbor rule. In particular, the distances between a given input pattern and the training samples are computed by means of the inner product between the corresponding feature vectors. This similarity measure was chosen because the most similar training patterns are those that obtained the same classification from the classifier ensemble. Taking into account that the outputs of each classifier sum to one, it follows that the nearest samples are those with the highest inner product.
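A minimal sketch of such a metaclassifier follows, assuming the stacked classifier outputs of equation (6) are available both for the training samples and for the pattern to be classified; the function name and the tie-breaking of the final vote are illustrative choices, not details given in the paper.

```python
import numpy as np
from collections import Counter

def metaclassify_knn(P_train, y_train, P_test, k=5):
    """P_train: (N, K*M) stacked classifier outputs for the training patterns.
    y_train:  (N,) class labels of the training patterns.
    P_test:   (K*M,) stacked classifier outputs for the pattern to classify.
    The k training samples with the highest inner product with P_test vote
    for the final class (modified k-nearest-neighbor rule of Section 2.2)."""
    similarities = P_train @ P_test           # inner products instead of distances
    nearest = np.argsort(similarities)[-k:]   # indices of the k most similar samples
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]
```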

In addition, some methods for reducing the dimensionality of the feature vector P were proposed. For example, given an input pattern, it is possible to describe the outputs of the ensemble for each class by their mean value and standard deviation. In this way, the feature space has a fixed dimensionality of 2*M, whatever the number of classifiers in the ensemble.
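A short sketch of this dimensionality reduction, assuming the same (K x M) array of ensemble outputs used above; the function name is illustrative.

```python
import numpy as np

def reduce_ensemble_outputs(posteriors):
    """posteriors: array of shape (K, M) with the outputs of the K classifiers
    for one input pattern.  Returns a 2*M vector: the per-class mean followed
    by the per-class standard deviation over the ensemble."""
    return np.concatenate([posteriors.mean(axis=0), posteriors.std(axis=0)])
```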

3. EXPERIMENTAL RESULTS

3.1 SELECTED DATA SET

The selected data set consists of a set of multisensor remote-sensing images related to an agricultural area near the village of Feltwell (UK) [Serpico et al. 1995]. The images (each of 250 x 350 pixels) were acquired by two imaging sensors installed on an airplane: a multi-band optical sensor (an Airborne Thematic Mapper sensor with eleven bands) and a multi-channel radar sensor (a Synthetic Aperture Radar with twelve channels related to three bands, with four polarizations for each band). For our experiments, six bands of the optical sensor and nine channels of the radar sensor were selected. As the image recognition process was carried out on a "pixel basis", each pixel was characterized by a fifteen-element "feature vector" containing the brightness values in the six optical bands and the nine radar channels considered. For our experiments, we selected 10944 pixels belonging to five agricultural classes (i.e., sugar beets, stubble, bare soil, potatoes, carrots) and subdivided them into a training set (5124 pixels) and a test set (5820 pixels).

3.2 RESULTS AND COMPARISONS

Our experiments were mainly aimed at investigating the following aspects:

(a) to prove that the use of ensembles consisting of different classifiers allows one to obtain satisfactory classification accuracies with short designing phases;

(b) to compare the performances of the proposed combination method (Section 2.2) with those provided by the combination methods based on the assumption of independent errors (Section 2.1).

First of all, in order to create a large "library" of classification algorithms that would allow us to test many different ensembles, we applied various classification algorithms to the selected data set: two statistical classifiers, namely the Gaussian classifier and the k-nearest-neighbor classifier (KNN), and three neural network classifiers, namely the Multilayer Perceptron neural network (MLP), the Radial Basis Functions neural network (RBF), and the Probabilistic Neural Network (PNN). For each classifier, a careful designing phase was carried out in order to assess the best performances provided by single classifiers after long designing phases. For the k-nearest-neighbor classifier, we carried out different trials with "k" values ranging from 1 up to 91. For the Multilayer Perceptron neural networks, 5 different architectures with one or two hidden layers (15-30-5, 15-8-5, 15-15-5, 15-30-15-5, 15-7-7-5) were tried; for each architecture, 20 trials with different random initial weights ("multi-start" learning strategy) were carried out. For the Radial Basis Functions neural networks, different trials of the clustering algorithm ("k-means") used to define the network architecture (from 10 up to 30 hidden nodes) were performed. The Gaussian classifier and the Probabilistic Neural Network require no designing phase.

Consequently, 182 classifiers were trained and tested on the selected data set. The performances provided by the different classifiers on the test set are summarized in Table 1.

CLASSIFIER                     Gaussian   KNN       MLP       RBF       PNN
Lower accuracy                 79.37%     86.63%    73.45%    71.40%    88.66%
Mean accuracy                  79.37%     88.36%    81.60%    78.95%    88.66%
Higher accuracy                79.37%     90.10%    .75%      86.51%    88.66%
No. of "trained" classifiers   1          46        100       34        1

Table 1 - Performances on the test set provided by the 182 classifiers considered.

It is worth noticing that the best classifier obtained after the above "very long" designing phase is the k-nearest-neighbor classifier (k=21), with a classification accuracy of 90.10%.

In order to prove that the combination of different classifiers allows one to obtain satisfactory classification accuracies with "reduced" designing phases, we combined the best k-nearest-neighbor classifier (k=21) with the "best trial" of the Multilayer Perceptron with architecture 15-15-5 (.48%) and the Probabilistic Neural Network (88.62%). The Combination by Majority Rule of the three above classifiers provided a classification accuracy of 90.87%. The Combination by Belief Functions provided a classification accuracy of 93.44%. This experiment proves that the combination of just three classifiers performs better than the best classifier among the 182. It is worth noticing that the designing phase necessary to produce these three classifiers requires training and testing 66 classifiers (46 trials for the KNN and 20 trials for the MLP) and allows one to obtain performances that are better than those provided by the best classifier (KNN, k=21) obtained after a designing phase involving 182 classifiers. Other similar experiments, which we do not report for the sake of brevity, were carried out using different ensembles (containing just neural classifiers or a "mixture" of statistical and neural classifiers). These experiments also confirmed the conclusion that the combination of different classification algorithms allows one to obtain satisfactory classification accuracies with reduced designing phases.

In order to evaluate the performances of the combination method that we proposed in Section 2.2, we carried out several experiments using different neural network ensembles. Here we report one experiment where three MLPs with different network architectures (15-30-15-5, 15-7-7-5, and 15-15-5) were combined. In this experiment, the combination method based on the metaclassifier described in Section 2.2 was compared with the statistical methods described in Section 2.1. The neural classifiers provided the following classification accuracies: 81.87% for the network with architecture 15-30-15-5, 83.21% for the network with architecture 15-7-7-5, and 87.25% for the network with architecture 15-15-5. The Combination by Majority Rule provided 85.99%, the Combination by Bayesian Average provided 85.45%, and the Combination by Belief Functions provided 85.6%. Our combination based on metaclassification provided a classification accuracy of 87.08%, which proves the usefulness of a combination method that avoids the assumption of independent errors.

REFERENCES

Benediktsson, J.A.; Swain, P.H.; Ersoy, O.K. 1990. Neural network approaches versus statistical methods in classification of multisource remote-sensing data. IEEE Transactions on Geoscience and Remote Sensing, Vol. 28, pp. 540-552.

Binagi et al. (Editors) 1995. Proc. of the Int. Workshop on Soft Computing in Remote Sensing Data Analysis. World Scientific Press, Milan.

Bruzzone, L.; Conese, C.; Maselli, F.; Roli, F. 1997. Multisource classification of complex rural areas by statistical and neural-network approaches. Photogrammetric Engineering and Remote Sensing (PE&RS), in press.

Kanellopoulos, I. et al. 1993. Integration of neural network and statistical image classification for land cover mapping. Proc. IGARSS '93, Tokyo, pp. 511-513.

Roli, F.; Serpico, S.B.; Vernazza, G. 1996. Neural Networks for Classification of Remotely Sensed Images. In: Fuzzy Logic and Neural Network Handbook (C.H. Chen, Editor), Part 2, Chapter 15, McGraw-Hill Series on Computer Eng., pp. 15.1-15.28.

Serpico, S.B.; Roli, F. 1995. Classification of multisensor remote-sensing images by structured neural networks. IEEE Transactions on Geoscience and Remote Sensing, Vol. 33, No. 3, pp. 562-578.

Suen, C.Y. et al. 1995. The combination of multiple classifiers by a neural network approach. Int. Journal of Pattern Recognition and Artificial Intelligence, Vol. 9, No. 3, pp. 579-597.

Xu, L.; Krzyzak, A.; Suen, C.Y. 1992. Methods for combining multiple classifiers and their applications to handwriting recognition. IEEE Trans. on Systems, Man, and Cybernetics, Vol. 22, No. 3, pp. 418-435.
