Thursday, 22 November 2018

Clustering Techniques



The literature describes many clustering techniques. They can be grouped into five categories:

Partitioning Clustering
All elements are initially treated as a single cluster and are then iteratively regrouped into smaller clusters. In other words, it is a clustering technique that divides a data set of n elements into k partitions. Partitioning clustering is usually driven by an objective function, such as the sum of squared distances between each element and its cluster center. The most popular partitioning clustering techniques are k-Means (Lloyd, 1982), k-Median, k-Medoids, PAM (Rousseeuw and Kaufman, 1990), CLARA (Rousseeuw and Kaufman, 1990) and CLARANS (Ng and Han, 2002).
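
As an illustration of partitioning driven by an objective function, here is a minimal sketch of Lloyd-style k-means in plain Python. The function names, the optional `initial` parameter and the random initialization are assumptions made for this example, not part of any cited paper or library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, initial=None, iterations=100, seed=0):
    """Lloyd-style k-means sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from the given centroids, else from k random points.
    centroids = [tuple(c) for c in initial] if initial else rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, labels


def objective(points, centroids, labels):
    """Within-cluster sum of squared distances (the quantity k-means lowers)."""
    return sum(
        squared_distance(p, centroids[label]) for p, label in zip(points, labels)
    )
```

Calling `k_means(points, 2)` on a list of coordinate tuples returns the final centroids and one cluster index per point; `objective` can be used to compare two partitions of the same data. The result depends on the initial centroids, so different seeds may give different partitions.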

Hierarchical Clustering
In hierarchical clustering, data objects are grouped by building tree-like structures. There are two approaches: (i) agglomerative and (ii) divisive. In the agglomerative method, each object initially forms its own cluster, and the closest clusters are merged step by step according to the distance between them. In the divisive method, all data initially form a single cluster, which is then split iteratively into smaller partitions. The most popular hierarchical clustering techniques are BIRCH (Zhang et al., 1996), CURE (Guha et al., 1998), ROCK (Guha et al., 2000), Chameleon (Karypis et al., 1999) and CACTUS (Ganti et al., 1999).
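
The agglomerative (bottom-up) approach can be sketched in a few lines. This version assumes every point starts as its own cluster and uses single linkage (closest pair of members) as the distance between clusters; both the linkage choice and the function name `agglomerative` are assumptions for illustration and do not correspond to any of the named algorithms above.

```python
import math


def agglomerative(points, target_clusters):
    """Bottom-up clustering sketch with single linkage (an assumed choice)."""
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # merge history, i.e. the tree read from the leaves upward

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((sorted(clusters[i]), sorted(clusters[j]), dist))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges
```

The `merges` list records which clusters were joined and at what distance, which is the information a dendrogram displays. The pairwise search is recomputed at every step, so this sketch is only suitable for small inputs.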


Density Based Clustering
Data objects are categorized as core points, border points, and noise points. Elements that lie in the dense neighborhood of a core point are placed in the same cluster as that core point. The most popular density-based clustering techniques are DBSCAN (Ester et al., 1996), OPTICS (Ankerst et al., 1999), DBCLASD (Xu et al., 1998), DENCLUE (Hinneburg et al., 1998) and SUBCLU (Kailing et al., 2004).
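
The core / border / noise distinction can be made concrete with a simplified DBSCAN-style sketch. It assumes a point is a core point when at least `min_pts` points (itself included) lie within distance `eps`; the function names and the brute-force neighbor search are choices made for this example, not the original implementation.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' in the DBSCAN sense."""
    # Neighborhood of a point: every point within eps, itself included.
    neighbours = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbours) if len(nb) >= min_pts}
    kinds = []
    for i, nb in enumerate(neighbours):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in nb):
            kinds.append("border")  # not dense itself, but next to a core point
        else:
            kinds.append("noise")
    return kinds, neighbours, core


def dbscan(points, eps, min_pts):
    """Returns (labels, kinds); label -1 marks noise."""
    kinds, neighbours, core = classify_points(points, eps, min_pts)
    labels = [-1] * len(points)
    cluster_id = 0
    for start in sorted(core):
        if labels[start] != -1:
            continue  # already absorbed into an earlier cluster
        # Grow a new cluster outward from an unvisited core point.
        labels[start] = cluster_id
        frontier = [start]
        while frontier:
            current = frontier.pop()
            for j in neighbours[current]:
                if labels[j] == -1:
                    labels[j] = cluster_id
                    if j in core:
                        frontier.append(j)  # only core points expand the cluster
        cluster_id += 1
    return labels, kinds
```

A border point that is within `eps` of core points from two different clusters is assigned to whichever cluster reaches it first, so the result for such points depends on processing order.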

Grid Based Clustering
The data set is divided into a fixed number of cells to form a grid structure, and all clustering operations are performed on this grid rather than on the individual data objects. The most popular grid-based clustering techniques are STING (Wang et al., 1997), CLIQUE (Agrawal et al., 1998), WaveCluster (Sheikholeslami et al., 1998), BANG (Schikuta and Erhart, 1997) and OptiGrid (Hinneburg and Keim, 1999).
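
The general grid idea can be sketched as follows. This is a generic illustration, not STING, CLIQUE or any other named algorithm: it assumes 2-D points, a uniform cell size, a density threshold `min_count`, and that dense cells touching each other (including diagonally) belong to the same cluster. All names are hypothetical.

```python
import math
from collections import defaultdict


def grid_cluster(points, cell_size, min_count):
    """Cluster 2-D points by merging adjacent dense grid cells."""
    # Step 1: assign every point to a cell of the grid.
    cells = defaultdict(list)
    for index, (x, y) in enumerate(points):
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(index)

    # Step 2: keep only the cells that hold enough points.
    dense = {cell for cell, members in cells.items() if len(members) >= min_count}

    # Step 3: join dense cells that touch (8-neighborhood) into clusters.
    labels = [-1] * len(points)  # -1: the point lies in a sparse cell
    seen = set()
    cluster_id = 0
    for cell in sorted(dense):
        if cell in seen:
            continue
        seen.add(cell)
        stack = [cell]
        while stack:
            cx, cy = stack.pop()
            for index in cells[(cx, cy)]:
                labels[index] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nxt = (cx + dx, cy + dy)
                    if nxt in dense and nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
        cluster_id += 1
    return labels
```

After the first step the work is done on cells rather than on individual points, so the cost of the merging stage depends on the number of occupied cells, not on the number of data objects.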

Model Based Clustering
Data elements are grouped using statistical and conceptual methods that try to optimize the fit between the data and a mathematical model. There are two approaches in model-based clustering: the statistical approach and artificial neural networks. The most popular model-based clustering techniques are EM (Dempster et al., 1977), COBWEB (Fisher, 1987), CLASSIT (Gennari et al., 1989), SOM (Kohonen, 1997) and SLINK (Han et al., 2011).
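
For the statistical approach, a small EM sketch for a one-dimensional Gaussian mixture shows what "fitting the data to a model" means in practice. The unit initial variances, equal initial weights, variance floor and function names are assumptions for this example, and the code has no safeguards beyond that floor (for instance against points that are extremely far from every component).

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, iterations=50):
    """EM sketch for a 1-D Gaussian mixture; `means` are the initial means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k      # assumption: unit initial variances
    weights = [1.0 / k] * k    # assumption: equal initial mixing weights
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            scores = [
                weights[c] * gaussian_pdf(x, means[c], variances[c])
                for c in range(k)
            ]
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: re-estimate the parameters from the weighted data.
        for c in range(k):
            mass = sum(r[c] for r in resp)
            weights[c] = mass / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / mass
            variances[c] = max(
                sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / mass,
                1e-6,  # floor keeps a variance from collapsing to zero
            )
    return weights, means, variances
```

Unlike k-means, each point receives a soft membership (its responsibilities) rather than a single hard label; a hard clustering can be obtained by assigning each point to the component with the largest responsibility.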

All of the aforementioned clustering algorithms perform batch processing: they operate on data stored on disk and therefore have information about the whole data set. They can process the data multiple times and can randomly access any element at any point in the algorithm.

