UNSUPERVISED IMAGE CLASSIFICATION USING

ISODATA AND FUZZY C-MEANS

KEK SOO LING

UNIVERSITI TEKNIKAL MALAYSIA MELAKA

THESIS STATUS VERIFICATION FORM

TITLE: UNSUPERVISED IMAGE CLASSIFICATION USING ISODATA AND FUZZY C-MEANS

ACADEMIC SESSION: 2013/2014

I, KEK SOO LING, agree to allow this thesis (PSM/Master's/Doctor of Philosophy) to be kept at the Library of the Faculty of Information and Communication Technology under the following conditions of use:

1. The thesis and project are the property of Universiti Teknikal Malaysia Melaka.
2. The Library of the Faculty of Information and Communication Technology is permitted to make copies for study purposes only.
3. The Library of the Faculty of Information and Communication Technology is permitted to make copies of this thesis as exchange material between institutions of higher learning.
4. ** Please tick (/):

_____________ SULIT (CONFIDENTIAL) (Contains information of security classification or of importance to Malaysia as set out in the OFFICIAL SECRETS ACT 1972)

_____________ TERHAD (RESTRICTED) (Contains RESTRICTED information as determined by the organization/body where the research was carried out)

√ TIDAK TERHAD (UNRESTRICTED)

__________________________ ___________________________

(AUTHOR'S SIGNATURE) (SUPERVISOR'S SIGNATURE)

Permanent address: 3510, Taman Muhibbah, 73200 Gemencheh, NS

Supervisor's name: EN. OTHMAN BIN MOHD

Date: ___________________ Date: ____________________

NOTE: * Thesis refers to the Final Report of the Bachelor's Degree Project (PSM). ** If this thesis is SULIT or TERHAD, please attach a letter from the relevant authority.

This report is submitted in partial fulfillment of the requirements for the Bachelor of Computer Science (Computer Networking)

FACULTY OF INFORMATION AND COMMUNICATION TECHNOLOGY

UNIVERSITI TEKNIKAL MALAYSIA MELAKA

2014

DECLARATION

I hereby declare that this project entitled

UNSUPERVISED IMAGE CLASSIFICATION USING ISODATA AND FUZZY C-MEANS

is written by me and is my own effort, and that no part has been plagiarized without citations.

STUDENT: ___________________________ DATE: _________________

(KEK SOO LING)

SUPERVISOR: ________________________ DATE: _________________

(EN.OTHMAN BIN MOHD)

DEDICATION

Dear Parents

Thank you for giving me so much support and encouragement. Your support and care have helped me to complete the final task of my university life.

Dear Lecturer, Supervisors and Evaluator

Thank you for your guidance, encouragement and knowledge.

ACKNOWLEDGEMENTS

Firstly, I feel lucky to have had perseverance, patience and good health throughout the duration of this final year project research. I feel very fortunate to have En. Othman Bin Mohd as my research supervisor, and I would like to express my deep appreciation to him. The supervision and support that he gave truly helped the progress and smoothness of my final year project, and his co-operation is much appreciated. To my evaluator, Dr Azman Abu, thank you for your guidance during the PSM presentation and for evaluating and reading this report.

Moreover, my deepest thanks and appreciation go to my parents and family members for their encouragement and full support in completing this report, from beginning to end.

Last but not least, thanks to all of my friends, who contributed by supporting my work and helping me during the final year project.

ABSTRACT

This research focuses on the unsupervised classification of a satellite image into a thematic image using two techniques: the Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA) and Fuzzy C-Means (FCM). ISODATA is an extension of the K-Means algorithm that determines the number of clusters dynamically. It searches for the best cluster centroids iteratively until a convergence criterion is met, and it splits and merges the resulting clusters according to user-specified thresholds: a cluster is split into two when its standard deviation exceeds one threshold, and two clusters are merged when the distance between their centroids falls below another threshold. FCM is the most popular fuzzy clustering technique; it allows one data point to belong to two or more clusters with different membership degrees between 0 and 1. The aim of FCM is to find the cluster centers that minimize a dissimilarity function. The research is therefore needed to understand how the classification works and which of the two techniques produces the better output when classifying the satellite image. The research is carried out using Matlab R2010a and compares the final outputs of the two techniques to show which one produces the better result.

ABSTRAK

This study focuses on unsupervised techniques for classifying a satellite image using two techniques, namely ISODATA and FCM. ISODATA is an extension of the K-Means algorithm, but ISODATA determines the number of clusters dynamically. ISODATA searches for the best cluster centroids through an iterative approach until it reaches the convergence criteria. In addition, ISODATA involves splitting and merging the resulting clusters based on thresholds specified by the user. In the splitting case, a cluster is split into two parts when its standard deviation is above the specified threshold. In the merging case, two clusters are merged when the distance between their centroids is less than the set level. FCM is a popular fuzzy technique because it relates data points to membership values between 0 and 1. The objective of FCM is therefore to find the cluster centroids that minimize dissimilarity. In conclusion, this study carries out the classification and discusses which technique produces good results, using Matlab 2010a.

TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER 1 INTRODUCTION
1.1 Project Background
1.2 Problem Statements
1.3 Research Questions
1.4 Objective
1.5 Scope
1.6 Project Significance
1.7 Expected Result
1.8 Report Organization
1.9 Summary

CHAPTER 2 LITERATURE REVIEW
2.1 Introduction
2.2 Related Work
2.2.1 Satellite Image
2.2.2 Image Classification Definitions
2.2.3 Unsupervised Classification Techniques
2.3 Analysis of current problem and justification
2.4 Proposed Solution
2.5 Summary

CHAPTER 3 METHODOLOGY
3.1 Introduction
3.2 Project Methodology
3.3 Project schedule and Milestones
3.4 Gantt Chart
3.5 Summary

CHAPTER 4 IMPLEMENTATION
4.1 Introduction
4.2 Project Requirements
4.2.1 Software Requirements
4.2.2 Hardware Requirements
4.3 Unsupervised Classification Techniques Algorithms
4.3.1 ISODATA
4.3.2 FCM
4.4 Summary

CHAPTER 5 TESTING
5.1 Introduction
5.2 Test Plan
5.2.1 Test Organization
5.3 Test Description
5.3.1 Preprocessing
5.3.2 Processing
5.3.3 Decision
5.3.4 Accuracy Test
5.3.5 Final Output
5.4 Summary

CHAPTER 6 PROJECT CONCLUSION
6.1 Introduction
6.2 Advantages and Disadvantages of Techniques
6.3 Contribution
6.4 Future Work
6.5 Summary

REFERENCES
APPENDIX A
APPENDIX B

LIST OF TABLES

TABLE TITLE

1.1 Summary of Problem Statements
1.2 Summary of Research Questions
1.3 Summary of Research Objectives
3.1 Milestones for PSM 1 Summary
3.2 Milestones for PSM 2 Summary
3.3 PSM 1 Activities
3.4 PSM 2 Activities
4.1 Software Requirements
4.2 Hardware Requirement
4.3 ISODATA Parameters
4.4 FCM Parameters
4.5 ClusterFinder Buttons Description
4.6 Specified ISODATA Parameters
5.1 Organization responsibilities on testing phase
5.2 Organization responsibilities on testing phase
5.3 Objects for ISODATA clusters
5.4 Objects for FCM clusters
5.5 Comparison of clusters between ISODATA and FCM
5.6 Classified Image and Pie Chart of ISODATA
5.7 Classified Image and Pie Chart of FCM
5.8 MSE of ISODATA and FCM
5.9 Characteristic comparison between ISODATA and FCM
6.1 Advantages of ISODATA and FCM
6.2 Disadvantages of ISODATA and FCM

LIST OF FIGURES

FIGURE TITLE

2.1 Summary Chart for Image Processing
2.2 Image Classification Process
2.3 Overview of Proposed Solution
3.1 Summary Chart of Image Processing
3.2 Unsupervised Classification of the Satellite Image
3.3 Five main stages for the overall project
3.4 Project Methodology Flows
4.1 The Flow of Unsupervised Classification Techniques Implementation
4.2 The Overall Flow of ISODATA
4.3 The Flow Chart of ISODATA
4.4 The Overall Flow of FCM
4.5 The Flow Chart of FCM
4.6 GUI for ISODATA testing
4.7 GUI for FCM testing
4.8 Code to read RGB image
4.9 Codes to convert RGB image to L*a*b image
4.10 Codes to convert data into ‘double’ format
4.11 Codes to reshape data
4.12 Codes for FCM to train data
4.13 Codes for FCM to train data
4.14 Codes for reshaping resulted cluster of ISODATA
4.15 Codes for reshaping resulted cluster of FCM
4.16 Codes for ISODATA color segmentation
4.17 Codes for FCM color segmentation
4.18 Codes for display FCM segmentation
4.19 Codes for display ISODATA segmentation
4.20 MSE Main Function
5.1 Preprocessing Flow
5.2 Reshape data (X), mean and standard deviation of X
5.3 Preprocessing test output image. (a) Original RGB image, (b) L*a*b image
5.4 Preprocessing test output image
5.5 ISODATA Classified Image
5.6 Segmented ISODATA Image. (a) Objects in cluster 1, (b) Objects in cluster 2, (c) Objects in cluster 3, (d) Objects in cluster 4, (e) Objects in cluster 5
5.7 FCM Classified Image
5.8 Segmented FCM Image. (a) Objects in cluster 1, (b) Objects in cluster 2, (c) Objects in cluster 3, (d) Objects in cluster 4, (e) Objects in cluster 5
5.9 The Flow of Decision Test for ISODATA and FCM
5.10 The Flow of Final Output Testing
5.11 Final Classified Images. (a) ISODATA Classified Image, (b) FCM Classified Image

LIST OF ABBREVIATIONS

TERMS DESCRIPTIONS

FCM Fuzzy C-Means
ISODATA Iterative Self-Organizing Data Analysis Technique Algorithm
MSE Mean Square Error
RP Research Problem
RQ Research Question
RO Research Objective
TIFF Tagged Image File Format
uint8 Unsigned 8-bit integer

CHAPTER I

INTRODUCTION

Nowadays, a large number of satellite images are generated and widely used in many fields such as forestry, geology and the military. A satellite image commonly has multiple bands, and each band represents a different range of wavelengths, from the ultraviolet through the visible and infrared portions of the electromagnetic spectrum. In other words, a satellite image consists of layers, and each layer contains specific information that needs to be revealed (Arias et al., 2009). Satellite image data provide direct observation of the land cover at repeated intervals, which allows classes to be mapped and changes in land cover to be identified. The techniques used to classify a satellite image fall into two categories: supervised and unsupervised classification. Supervised classification techniques include Minimum Distance to Mean, Maximum Likelihood, Mahalanobis, Stepwise Linear, Suits’ Maximum Relative and Back Propagation, while unsupervised classification techniques include the Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA), Fuzzy C-Means, K-Means, Simple One-Pass Clustering, Minimum Distribution Angle, Self-Organization and Adaptive Resonance. This project discusses the use of two unsupervised classification techniques to identify the land cover in a QuickBird satellite image. The main focus of this study is to classify the satellite image using ISODATA and Fuzzy C-Means and to compare which technique produces the better, more accurate representation of the data in the satellite image.

1.1 Project Background

Image processing is a method of extracting useful information from an image by performing algorithms or operations on it after converting it into digital form (Hossain, 2012). It is a form of signal processing in which the input is an image, such as a photo or a video frame, and the output carries characteristics related to the input image (Hossain, 2012). Image processing can be divided into several parts, including image classification, image compression, image segmentation and image restoration. This project, however, covers only image classification.

Image classification is the process of extracting differentiated classes or themes from raw remotely sensed data. The main objective of the classification process is to categorize all pixels in a digital image into several spectral classes, or themes, with similar spectral attributes; each spectral class is then assigned to an information class (Caprioli et al., 2003).

Normally, multispectral data are used to perform the classification and, indeed, the spectral pattern present within the data for each pixel is used as the numerical basis for categorization (Lillesand and Kiefer, 1994). Image pixels are categorized on the basis of their digital numbers, or gray values, in one or more spectral bands.

Human experts interpret a satellite image through visual perception, which provides information about the surroundings. Human eyes, however, are sensitive only to the visible range of electromagnetic energy, whereas a sensor can form a lasting record of what is viewed. Many methods have therefore been developed to increase the ability to see and record the physical properties of land cover (Tur et al., 2001).

Generally, there are two approaches to digital image classification: supervised and unsupervised. Supervised image classification requires prior knowledge of, and familiarity with, the land cover, and the spectral classes must be identified in advance. Data from pixels in the image are used to train an image classification algorithm. Once trained, the algorithm is applied to the whole image to obtain the classified image (Kamaruzaman et al., 2009). Supervised classification techniques include Minimum Distance to Mean, Maximum Likelihood, Mahalanobis, Stepwise Linear, Suits’ Maximum Relative and Back Propagation.

Unsupervised image classification does not require prior knowledge of the land cover; the image is automatically classified into spectral classes based on natural groupings found in the data (Caprioli et al., 2003). Its techniques include the Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA), K-Means, Fuzzy C-Means, Simple One-Pass Clustering, Minimum Distribution Angle, Self-Organization and Adaptive Resonance.
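As summarized in the abstract, ISODATA splits a cluster when its standard deviation is above one user-specified threshold and merges two clusters when the distance between their centroids is below another. The two tests can be sketched in Python as follows. This is only an illustrative sketch, not the Matlab code used in this project: the function names, the sample values and the choice of the largest per-band standard deviation as the split criterion are assumptions.

```python
import math


def band_std(points):
    """Per-band (population) standard deviation of one cluster.

    `points` is a list of equal-length tuples, one tuple of band values per pixel.
    """
    n = len(points)
    bands = len(points[0])
    means = [sum(p[b] for p in points) / n for b in range(bands)]
    return [
        math.sqrt(sum((p[b] - means[b]) ** 2 for p in points) / n)
        for b in range(bands)
    ]


def should_split(points, split_threshold):
    """Split test (assumed form): largest per-band standard deviation exceeds the threshold."""
    return max(band_std(points)) > split_threshold


def should_merge(centroid_a, centroid_b, merge_threshold):
    """Merge test: Euclidean distance between two centroids is below the threshold."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(centroid_a, centroid_b)))
    return distance < merge_threshold


# Hypothetical clusters in a three-band feature space (values are made up).
tight_cluster = [(10, 10, 10), (11, 10, 10), (10, 11, 10)]
wide_cluster = [(0, 0, 0), (100, 0, 0)]

print(should_split(tight_cluster, split_threshold=20))  # False
print(should_split(wide_cluster, split_threshold=20))   # True
print(should_merge((0, 0, 0), (3, 4, 0), merge_threshold=10))  # True
```

In a full ISODATA run these tests would be applied after each reassignment of pixels to centroids, which is how the number of clusters changes from one iteration to the next.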

A pixel in a satellite image might represent a mixture of class covers, within-class variability or other complex surface cover patterns. Because of the spatial resolution of the image (pixel size), such an overlapping pixel cannot be correctly described by a single class. Mixed (overlapping) pixels may reduce the accuracy of the classification or cause misclassification even when a good training area has been selected (Mohd Othman et al., 2012).

According to Ozesmi et al. (2002), the problem of overlapping pixels can be dealt with by fuzzy classification. In supervised classification, mixed pixels typically cause misclassification, while in unsupervised classification mixed pixels are dealt with by using mixed labels, because the information can be extracted without using the traditional per-pixel classifier (Mohd Othman et al., 2012).
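To illustrate how a fuzzy classifier can describe a mixed pixel, the Python sketch below computes Fuzzy C-Means membership degrees for one pixel using the standard FCM membership formula, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), where d_i is the distance from the pixel to cluster center i. It is a sketch under stated assumptions rather than the project's Matlab implementation: the function name, the two cluster centers, the pixel values and the fuzzifier m = 2 are all hypothetical.

```python
import math


def fcm_memberships(pixel, centers, m=2.0):
    """Membership degree of one pixel in each cluster (standard FCM formula).

    `m` is the fuzzifier (m > 1); m = 2 is an assumed, commonly used value.
    """
    dists = [
        math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel, center)))
        for center in centers
    ]
    # A pixel that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Two hypothetical cluster centers in a three-band feature space.
centers = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]

# A pixel halfway between the centers is shared equally between them.
print(fcm_memberships((5.0, 0.0, 0.0), centers))  # [0.5, 0.5]

# A pixel closer to the first center still keeps a small share of the second.
print(fcm_memberships((2.0, 0.0, 0.0), centers))
```

The memberships of a pixel always sum to 1, so a mixed pixel is represented by a spread of degrees instead of being forced into a single class.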

However, this project focuses only on unsupervised image classification using two different techniques: ISODATA and Fuzzy C-Means.

1.2 Problem Statements

Each pixel in an image reflects a different spectral response and represents information that needs to be revealed. Sometimes there is no prior knowledge of the land cover, and it is not easy to recognize or identify the required objects or pixels in the image. Unsupervised image classification is applied so that the image can be classified, recognized and identified without prior knowledge. However, there are many image classification techniques and each produces a different result, so it is necessary to find out which technique is more suitable. The Research Problems (RP) are summarized in Table 1.1 below:

Table 1.1: Summary of Problem Statements

RP     Research Problem
RP1    Too many techniques can be applied to image classification.
RP2    Lack of understanding of how unsupervised techniques work.
RP3    Lack of image understanding and recognition of an object or pixel which represents information in the image.

Based on the research problems in Table 1.1, three research questions are formulated. Each research problem is explained below:

RP1: Too many techniques can be applied to image classification.

This research problem arises because many techniques can be applied to image classification; without prior knowledge of the techniques, a decision still has to be made to select suitable ones to classify the satellite image.

RP2: Lack of understanding of how unsupervised techniques work.

This research problem arises because different techniques may produce different results or outputs. It is therefore necessary to understand and analyze how these unsupervised image classification techniques classify the image.

RP3: Lack of image understanding and recognition of an object or pixel which represents information in the image.

This research problem arises because each pixel in an image can be either mixed or unmixed. In an unmixed pixel, the data or information can easily be identified, but a mixed pixel may contain two different classes within one pixel, so it is hard to classify.

1.3 Research Questions

Based on the problem statements in Table 1.1, the research questions were identified. They are summarized in Table 1.2.

Table 1.2: Summary of Research Questions

RP     RQ     Research Questions
RP1    RQ1    Which image classification techniques are selected to classify the satellite image?
RP2    RQ2    How do unsupervised image classification techniques function?
RP3    RQ3    What meaning or information do the groups of pixels produced by the techniques represent?

Each of the research questions in Table 1.2 is explained as follows:

RQ1: Which image classification techniques are selected to classify the satellite image?

This research question is stated in order to understand which classification techniques are suitable for this project. Without prior knowledge of the land cover, unsupervised image classification is more suitable for this project.

RQ2: How do unsupervised image classification techniques function?

This research question is stated in order to understand the procedure by which the unsupervised image classification techniques function. Each unsupervised technique classifies an image in a different way.

RQ3: What meaning or information do the groups of pixels produced by the techniques represent?

This research question is stated in order to understand and recognize the groups of pixels that have been classified and the pixels that carry meaningful information in the image.

1.4 Objective

The research objectives are constructed from the research problems and research questions mentioned above. They are summarized in Table 1.3.

Table 1.3: Summary of Research Objectives

RP     RQ     RO     Research Objectives
RP1    RQ1    RO1    To select different techniques of unsupervised image classification to separate land cover from satellite image.
RP2    RQ2    RO2    To analyze unsupervised classification techniques for satellite image processing.
RP3    RQ3    RO3    To separate land cover of satellite image using different unsupervised classification techniques.

Each of the research objectives in Table 1.3 is explained as follows:

RO1: To select different techniques of unsupervised image classification to separate land cover from satellite image.

In order to classify the pixels of an image into meaningful data, the image classification techniques must first be identified. Since there is no prior knowledge of the land cover, unsupervised classification techniques are more suitable. The unsupervised classification techniques chosen are ISODATA and Fuzzy C-Means.

RO2: To analyze unsupervised classification techniques for satellite image processing.

Each unsupervised technique has its own procedure and function, so the procedure of each technique must be analyzed in order to classify the pixels into meaningful classes.

RO3: To separate land cover of satellite image using different unsupervised classification techniques.

Based on the analysis, the data in the image are separated into different clusters and the output is produced.

1.5 Scope

The scope of this project is the ISODATA and Fuzzy C-Means techniques, which are compared to find out which unsupervised image classification technique produces the more accurate output. The QuickBird satellite image used in this project is stored as uint8 (unsigned 8-bit integer) with a size of 225x277x3 pixels and was obtained from Erdas Imagine. The image, taken on 17 March 2010, shows the Kuala Linggi Mangrove Forest, Alor Gajah, Melaka. The software used to conduct the study was Matlab R2010a.
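Clustering algorithms usually work on a list of pixels rather than on a three-dimensional image array, and the List of Figures indicates that the project converts the data to 'double' format and reshapes it. The Python sketch below shows that preprocessing idea for an image of the stated size. It is an assumption-laden illustration only: the image content is synthetic, 225 is assumed to be the number of rows, and pixels are flattened row by row, which differs from the column-major order of Matlab's reshape.

```python
ROWS, COLS, BANDS = 225, 277, 3  # size stated in the scope; row/column order is assumed


def flatten_image(image):
    """Turn a ROWS x COLS grid of band tuples into a flat list of float pixel vectors."""
    return [tuple(float(v) for v in pixel) for row in image for pixel in row]


# Synthetic stand-in for the uint8 satellite image (values 0..255, made up).
image = [
    [(r % 256, c % 256, (r + c) % 256) for c in range(COLS)]
    for r in range(ROWS)
]

pixels = flatten_image(image)
print(len(pixels))     # 62325 pixel vectors (225 * 277)
print(len(pixels[0]))  # 3 values per pixel
```

Each entry of `pixels` is then one data point for ISODATA or Fuzzy C-Means.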

1.6 Project Significance

The main purpose of this project is to classify the image automatically into spectral classes based on the natural groupings found in the data. The procedures for classifying the satellite image using ISODATA and Fuzzy C-Means are shown in this project, and the thematic classes are produced. At the end of the project, the image classification results of the two techniques are analyzed and compared. Finally, the percentage of each generated cluster is calculated.
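The percentage of each generated cluster can be obtained by counting how many pixels carry each cluster label. A minimal Python sketch is given below; the function name and the label values are hypothetical and do not come from the project's results.

```python
from collections import Counter


def cluster_percentages(labels):
    """Percentage of pixels assigned to each cluster label."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: 100.0 * n / total for cluster, n in sorted(counts.items())}


# Hypothetical cluster labels for eight pixels.
labels = [1, 1, 2, 3, 3, 3, 3, 2]
print(cluster_percentages(labels))  # {1: 25.0, 2: 25.0, 3: 50.0}
```

Percentages of this kind are what a pie chart of the classified image would display.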

1.7 Expected Result

By the end of this project, the following results are expected in order to achieve its goals:

i. The procedure that each unsupervised image classification technique performs to classify the satellite image is shown.

ii. The objects or pixels in an image can be distinguished and identified easily.

iii. The thematic image is identified and shown.

1.8 Report Organization

This report is organized into six main chapters. Each chapter describes a main task or procedure carried out in this project.

Chapter I: Introduction discusses eight main topics: the project background, problem statements, research questions, project objectives, project scope, project significance, expected results and report organization. The ideas for this project came from the stated research questions and problem statements, from which the research objectives were generated. The project background is important for understanding the flow of the project. The project scope focuses on the domain of the project, namely ISODATA and Fuzzy C-Means. The expected results state the goals that must be achieved by the end of the project. Finally, the report organization ensures that the report is organized in a systematic manner.

Chapter II: Literature Review discusses the related work, the analysis of the current problem and its justification, and the proposed solution for the project. The related work covers research on the basic concepts of the project domain. The current problems in image classification are then analyzed and discussed, and solutions are proposed to overcome them.

Chapter III: Methodology discusses the project methodology and the project schedule and milestones. The project methodology shows the overall flow of the project and its five main stages. The project schedule and milestones show the people involved in the project and how time and activities are organized so that the work is carried out in a timely manner.

Chapter IV: Implementation discusses two main topics: the project requirements and the algorithms of the unsupervised classification techniques. The project requirements are divided into two subtopics, software requirements and hardware requirements. The discussion of the algorithms covers each classification technique, its processing flow and the steps taken to implement it in this project.

Chapter V: Testing and Analysis has two main topics: the test plan and the test description. The test plan identifies the person responsible for testing this project. The test description gives the step-by-step testing applied to the two classification techniques.