Thursday, 22 November 2018

Clustering Techniques



The literature describes many clustering techniques, which can be grouped into five categories:

Partitioning Clustering
Partitioning clustering divides a data set of n elements into k clusters. Starting from an initial partition, the elements are iteratively reassigned between clusters until the partition stops improving. The quality of a partition is usually measured with an objective function, such as the sum of squared distances between each element and its cluster center. The most popular partitioning clustering techniques are k-Means (Lloyd, 1982), k-Median, k-Medoids, PAM (Rousseeuw and Kaufman, 1990), CLARA (Rousseeuw and Kaufman, 1990) and CLARANS (Ng and Han, 2002).
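
As a minimal sketch of the partitioning idea, the following Python code implements a Lloyd-style k-means loop together with the sum-of-squared-distances objective. The function names, the sample points, and the choice to pass the initial centroids explicitly are assumptions made for illustration; they do not come from any of the cited papers.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iterations=100):
    """Lloyd-style k-means; the caller supplies the starting centroids (assumption)."""
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # no centroid moved, so the partition is stable
        centroids = new_centroids
    return labels, centroids


def objective(points, labels, centroids):
    """Objective function: sum of squared distances to the assigned centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical sample data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels, objective(points, labels, centroids))
```

Here n = 6 and k = 2. The result depends on the starting centroids, which is why practical implementations usually try several initializations and keep the partition with the lowest objective value.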

Hierarchical Clustering
In hierarchical clustering, data objects are grouped by building a tree-like structure. There are two approaches to hierarchical clustering: (i) agglomerative and (ii) divisive. In the agglomerative method, each object starts as its own cluster, and the closest clusters are merged step by step according to the distance between them. In the divisive method, all data initially form a single cluster, which is then iteratively split into smaller clusters. The most popular hierarchical clustering techniques are BIRCH (Zhang et al., 1996), CURE (Guha et al., 1998), ROCK (Guha et al., 2000), Chameleon (Karypis et al., 1999) and CACTUS (Ganti et al., 1999).
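
The agglomerative (bottom-up) approach can be sketched in a few lines of Python. This example assumes single linkage (the distance between two clusters is the distance between their closest members) and stops when a requested number of clusters remains. The function name, the stopping rule, and the sample data are illustrative assumptions, and the naive search is only suitable for very small inputs.

```python
import math


def single_linkage(points, num_clusters):
    """Naive agglomerative clustering with single linkage (illustrative sketch)."""
    # Every point starts as its own cluster, stored as a list of point indices.
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

    merges = []  # merge history, i.e. the tree read from the leaves upward
    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical one-dimensional sample data.
points = [(0.0,), (0.5,), (5.0,), (5.4,), (10.0,)]
clusters, merges = single_linkage(points, num_clusters=2)
print(sorted(sorted(c) for c in clusters))
print(merges)
```

The `merges` list records which clusters were joined at each step, which is the information a dendrogram would display.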


Density Based Clustering
Data objects are categorized as core points, border points, or noise points. Elements that lie in the dense neighborhood of a core point are placed in the same cluster as that core point. The most popular density-based clustering techniques are DBSCAN (Ester et al., 1996), OPTICS (Ankerst et al., 1999), DBCLASD (Xu et al., 1998), DENCLUE (Hinneburg et al., 1998) and SUBCLU (Kailing et al., 2004).
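
The three point categories can be illustrated with a short Python sketch in the style of DBSCAN. It assumes two parameters, a neighborhood radius `eps` and a minimum neighbor count `min_pts` (counting the point itself), and it only labels the points; it does not build the clusters. The function name and the sample data are hypothetical.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border' or 'noise' (DBSCAN-style sketch)."""
    n = len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    # A core point has at least min_pts points in its eps-neighborhood.
    core = {i for i in range(n) if len(neighbors(i)) >= min_pts}

    kinds = []
    for i in range(n):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in neighbors(i)):
            # Not dense itself, but within reach of a core point.
            kinds.append("border")
        else:
            kinds.append("noise")
    return kinds


# Hypothetical one-dimensional sample data.
points = [(0.0,), (0.1,), (0.2,), (0.4,), (5.0,)]
kinds = classify_points(points, eps=0.25, min_pts=3)
print(kinds)
```

In this data the first three points are dense enough to be core points, the point at 0.4 is a border point because it lies within `eps` of a core point, and the isolated point at 5.0 is noise.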

Grid Based Clustering
The data space is divided into a finite number of cells to form a grid structure, and all clustering operations are performed on this grid rather than on the individual data points. The most popular grid-based clustering techniques are STING (Wang et al., 1997), CLIQUE (Agrawal et al., 1998), WaveCluster (Sheikholeslami et al., 1998), BANG (Schikuta and Erhart, 1997) and OptiGrid (Hinneburg and Keim, 1999).
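
A minimal Python sketch of the grid idea follows. It is not an implementation of any of the algorithms listed above. It assumes two-dimensional points, square cells of a fixed `cell_size`, and a `min_count` threshold that decides whether a cell is dense; dense cells that share a side are merged into one cluster. All names and data are hypothetical.

```python
import math
from collections import Counter, deque


def grid_clusters(points, cell_size, min_count):
    """Cluster 2-D points by joining side-adjacent dense grid cells (sketch)."""

    def cell_of(p):
        # Map a point to the integer coordinates of its grid cell.
        return (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))

    counts = Counter(cell_of(p) for p in points)
    dense = {cell for cell, count in counts.items() if count >= min_count}

    # Breadth-first search over dense cells that share a side.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Points in sparse cells get the label -1 (treated as outliers here).
    return [cell_label.get(cell_of(p), -1) for p in points]


# Hypothetical sample data.
points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
          (5.5, 5.5), (5.6, 5.7), (9.0, 1.0)]
labels = grid_clusters(points, cell_size=1.0, min_count=2)
print(labels)
```

After the points are counted into cells, the remaining work depends only on the number of occupied cells, not on the number of points, which is the main attraction of grid-based methods.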

Model Based Clustering
Model-based clustering assumes that the data can be described by some mathematical model and tries to optimize the fit between the data and that model, using a series of statistical and conceptual methods. There are two approaches in model-based clustering: the statistical approach and artificial neural networks. The most popular model-based clustering techniques are EM (Dempster et al., 1977), COBWEB (Fisher, 1987), CLASSIT (Gennari et al., 1989), SOM (Kohonen, 1997) and SLINK (Han et al., 2011).
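
As an illustration of the statistical approach, the following Python sketch runs expectation-maximization (EM) for a one-dimensional mixture of Gaussians. The model choice, the function name, the fixed iteration count, the variance floor, and the sample data are all assumptions made for this example.

```python
import math


def em_gmm_1d(data, initial_means, iterations=50):
    """EM for a 1-D Gaussian mixture; one component per initial mean (sketch)."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k      # assumption: unit starting variances
    weights = [1.0 / k] * k    # assumption: equal starting mixture weights

    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: re-estimate the parameters from the weighted data.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nc
            variances[c] = max(var, 1e-6)  # floor avoids a collapsed component
            weights[c] = nc / len(data)

    return weights, means, variances


# Hypothetical sample data: two groups centered near 1 and 5.
data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
weights, means, variances = em_gmm_1d(data, initial_means=[0.0, 6.0])
print(weights, means, variances)
```

Each iteration alternates between estimating how strongly each component explains each point and refitting the components to the points they explain, so the fit between the data and the model improves step by step.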

All of the aforementioned clustering algorithms are batch algorithms that work on data stored on disk. As a result, they have information about the whole data set, can make multiple passes over it, and can access any element at any point in the algorithm.


Friday, 16 November 2018

Clustering



Clustering is an important topic in computer science. It belongs to data mining, because it extracts meaningful patterns from data, and to machine learning, because it is a learning method (unsupervised learning). For this reason, a great deal of research has been done on clustering. In machine learning, classification is known as supervised learning and clustering as unsupervised learning: group labels are known during classification, whereas in clustering they are not, and finding them is the task of the clustering algorithm. Clustering is therefore more difficult than classification. There are many definitions of cluster and clustering in the literature (Everitt, 1980):

·         A cluster is a collection of elements such that elements in the same cluster are similar and elements in different clusters are dissimilar.
·         Clusters are groups in which the distance between any two elements in the same group is smaller than the distance between any two elements in different groups.
·         A cluster is a region of high-density points in a d-dimensional attribute space, separated from other such regions by regions of lower density.

 

The purpose of clustering is to divide a finite, unlabeled data set into a finite set of natural groups, each of which receives a label (Baraldi and Alpaydin, 2002; Vladimir S et al., 2007).

Clustering, as mentioned earlier, is a learning method, and it is now used in many areas, ranging from manufacturing to artificial intelligence and from network security to surveillance systems. Suppose you run a computer as a server on the internet, with many nodes connecting to it. Most of these nodes are innocent and make normal TCP or UDP connections to the server. However, some malicious nodes may try to attack and disrupt the server, for example with DDoS or man-in-the-middle attacks. Clustering is one way to learn which nodes are innocent and which are malicious.