Supervised vs Unsupervised (Perspective: Maximum Likelihood to Select The Sample Area)
As we know, data mining techniques come in two main forms: supervised
(also known as predictive or directed) and unsupervised (also known as
descriptive or undirected). Both categories encompass functions capable of
finding different hidden patterns in large data sets. Supervised data mining
techniques are appropriate when we have a specific target value that we would like to
predict from our data. The target can have two or more possible outcomes, or
even be a continuous numeric value. The accuracy of the result depends
on the quality of the sampling and the number of samples. The sample area is
defined using a Region of Interest (ROI).
An ROI must be created before running the supervised classification
process. The Region of Interest is the sampling area that serves as the training
area for the supervised classification. The classification model can have more
than two possible values in the target attribute.
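As a minimal sketch of how ROI polygons become training samples, the snippet below assumes the ROIs have already been rasterized into a label grid with the same rows and columns as the image, where 0 means "outside any ROI". The function name `extract_roi_samples`, the array shapes, and the synthetic values are illustrative assumptions, not part of the ArcGIS workflow described here.

```python
import numpy as np

def extract_roi_samples(image, roi_labels):
    """Collect training pixels per class from a rasterized ROI grid.

    image:      array of shape (rows, cols, bands)
    roi_labels: integer array of shape (rows, cols); 0 = outside any ROI
    Returns a dict mapping class id -> array of shape (n_pixels, bands).
    """
    samples = {}
    for class_id in np.unique(roi_labels):
        if class_id == 0:
            continue  # skip pixels that belong to no training area
        # Boolean mask indexing keeps the band axis: result is (n_pixels, bands)
        samples[int(class_id)] = image[roi_labels == class_id]
    return samples

# Tiny synthetic image (4 x 4 pixels, 3 bands) and two hypothetical ROIs
image = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
roi_labels = np.zeros((4, 4), dtype=int)
roi_labels[0, :2] = 1   # class 1: two pixels in the first row
roi_labels[3, 1:] = 2   # class 2: three pixels in the last row

samples = extract_roi_samples(image, roi_labels)
for class_id, pixels in samples.items():
    print(class_id, pixels.shape)
```

The number of pixels collected per class is what "the number of samples" refers to above: very small ROIs give few rows per class and therefore weaker class statistics.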
To use these methods, we ideally have a subset of data points for which
this target value is already known. We use that data to build a model of what a
typical data point looks like when it has one of the various target values. We
then apply that model to data for which that target value is currently unknown.
The algorithm identifies the “new” data points that match the model of each
target value.
Pixel-based classification using the maximum likelihood algorithm is a
guided (supervised) classification method based on Bayes' theorem. It assumes
that homogeneous objects always show a normally distributed histogram. MLE uses
a discriminant function based on the probability density. During
classification, every unclassified pixel is assigned to the class for which it
has the highest probability of occurrence. If that probability is smaller than
the specified threshold value, the pixel is left unclassified.
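The decision rule above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the ArcGIS implementation: each class is modeled as a multivariate normal distribution with equal prior probability, the threshold is expressed as a hypothetical minimum log-density (`min_log_density`), and the two-band training values are synthetic.

```python
import numpy as np

def fit_class_stats(samples):
    """Estimate a mean vector and covariance matrix per class from training pixels."""
    return {c: (px.mean(axis=0), np.cov(px, rowvar=False)) for c, px in samples.items()}

def log_density(pixels, mean, cov):
    """Log of the multivariate normal density for each row of `pixels`."""
    diff = pixels - mean
    inv_cov = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every pixel to the class mean
    mahal = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return -0.5 * (mahal + logdet + mean.shape[0] * np.log(2.0 * np.pi))

def classify(pixels, stats, min_log_density=-np.inf, unclassified=0):
    """Assign each pixel to the most likely class, or leave it unclassified."""
    class_ids = np.array(list(stats))
    scores = np.column_stack([log_density(pixels, *stats[c]) for c in class_ids])
    labels = class_ids[scores.argmax(axis=1)]
    # Pixels whose best score is below the threshold are not grouped
    labels[scores.max(axis=1) < min_log_density] = unclassified
    return labels

# Synthetic two-band training samples (assumed values, e.g. red and NIR reflectance)
rng = np.random.default_rng(0)
samples = {
    1: rng.normal([0.05, 0.02], 0.01, size=(50, 2)),  # hypothetical "water" ROI
    2: rng.normal([0.04, 0.40], 0.03, size=(50, 2)),  # hypothetical "vegetation" ROI
}
stats = fit_class_stats(samples)

pixels = np.array([[0.05, 0.02], [0.04, 0.40], [0.90, 0.90]])
labels = classify(pixels, stats, min_log_density=-50.0)
print(labels)  # [1 2 0]
```

The third pixel lies far from both training clusters, so its best log-density falls below the assumed threshold and it receives the "unclassified" label 0.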
Fig. 1 Sentinel-2 satellite image, composite 432 plus near-infrared band 8
Figure 1 shows the satellite image, a Sentinel-2 scene dated 01 January 2017
over part of the Wielkopolski area, downloaded from https://scihub.copernicus.eu.
We first used the Sentinel true color band composite and then changed the
composite from true color to RGB (432) plus NIR (band 8). The aim of this
change is to sharpen vegetation and water objects. The satellite image was then
processed by supervised classification using the Maximum Likelihood Estimation
(MLE) algorithm. Figure 2 shows the resulting supervised classification map
produced in ArcGIS 10.3.
Fig. 2 The result map of maximum likelihood (supervised classification) in ArcGIS 10.3.
The maximum likelihood method gives the interpreter the chance to determine
the types of objects in the image, but it has a weakness: when the samples do
not follow a normal distribution, pixels are scattered into areas that do not
correspond to the actual object. For example, inside the high vegetation
(forest) object a few pixels appear as high building, which is impossible in
the real world. These interpretation errors are caused by the non-normal
distribution and by the sampling.
In conclusion, using maximum likelihood (the supervised method) to interpret
land use/land cover has the following advantages and disadvantages:
Advantages:
1. Maximum likelihood provides a consistent approach to parameter estimation
problems. This means that maximum likelihood estimates can be developed for a
large variety of estimation situations.
2. Maximum likelihood methods have desirable mathematical and optimality
properties: they become minimum variance unbiased estimators as the sample
size increases, and they have approximately normal distributions and
approximate sample variances that can be used to generate confidence bounds
and hypothesis tests for the parameters.
Disadvantages:
1. The likelihood equations need to be specifically worked out for a given
distribution and estimation problem.
2. Maximum likelihood estimates can be heavily biased for small samples. The
optimal properties may not apply for small samples.
3. Maximum likelihood can be sensitive to the choice of starting values, for
example for classes such as high building and low building.
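The small-sample bias in disadvantage 2 can be illustrated with a short simulation. This is a sketch under assumptions chosen for the example: the data are drawn from a normal distribution with true variance 1, and the sample sizes (5 and 200) and number of trials are arbitrary. For a normal sample of size n, the maximum likelihood variance estimate divides by n rather than n - 1, so its expected value is (n - 1)/n times the true variance.

```python
import numpy as np

rng = np.random.default_rng(42)
true_variance = 1.0
trials = 10000  # number of repeated samples per sample size (arbitrary choice)

def mean_mle_variance(sample_size):
    """Average the ML variance estimate over many simulated samples."""
    draws = rng.normal(0.0, np.sqrt(true_variance), size=(trials, sample_size))
    # ddof=0 divides by n, which is the maximum likelihood estimate
    return draws.var(axis=1, ddof=0).mean()

small = mean_mle_variance(5)    # expected value is (5 - 1) / 5 = 0.8
large = mean_mle_variance(200)  # expected value is 199 / 200 = 0.995

print(f"n=5:   average ML variance estimate = {small:.3f}")
print(f"n=200: average ML variance estimate = {large:.3f}")
```

With only a handful of training pixels per class, the estimated class variance is systematically too small, which is one reason small ROIs can distort the classification.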