DSpace Repository

A k-Median Algorithm with Running Time Independent of Data Size

Show simple item record

dc.contributor.author Meyerson Adam
dc.contributor.author 'callaghan Liadan O
dc.contributor.author Plotkin Serge
dc.contributor.author Ben-David Shai
dc.date.accessioned 2017-11-14T17:04:18Z
dc.date.available 2017-11-14T17:04:18Z
dc.date.issued 2004
dc.identifier.uri http://hdl.handle.net/123456789/3799
dc.description.abstract We give a sampling-based algorithm for the k-Median problem, with running time O(k(k 2 log k) 2 log (k log k)), where k is the desired number of clusters and is a confidence parameter. This is the first k-Median algorithm with fully polynomial running time that is independent of n, the size of the data set. It gives a solution that is, with high probability, an O(1)-approximation, if each cluster in some optimal solution has (n k) points. We also give weakly-polynomial-time algorithms for this problem and a relaxed version of k-Median in which a small fraction of outliers can be excluded. We give near-matching lower bounds showing that this assumption about cluster size is necessary. We also present a related algorithm for finding a clustering that excludes a small number of outliers.
dc.format application/pdf
dc.title A k-Median Algorithm with Running Time Independent of Data Size
dc.type journal-article
dc.source.volume 56
dc.source.journal Machine Learning


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search DSpace


Browse

My Account