AN 856: K-Mean Clustering with the Intel® FPGA SDK for OpenCL™

ID 683395
Date 6/12/2018
Public

1.3. Performance Results

We compared the performance of this FPGA implementation with that of an optimized k-mean implementation on a CPU. The same data set was used for both the FPGA and CPU runs.

The FPGA used for performance comparison was an Intel® Arria® 10 GX FPGA Development Kit. The FPGA was programmed with Intel® FPGA SDK for OpenCL™ Version 17.1 Update 1. During testing, the FPGA had an fMAX of 320 MHz.

The CPU used for performance comparison was an Intel® Xeon® E5-2680 (24 cores, no hyperthreading).

The following table shows the time to converge on acceptable clusters for various data sizes.
Data Size (bytes)   FPGA, initialization method 1 (ms)   FPGA, initialization method 2 (ms)   CPU (ms)
512                 0.028                                0.016                                0.065
1024                0.042                                0.032                                0.573
2048                0.051                                0.037                                0.627
4096                0.089                                0.039                                0.804
8192                0.105                                0.044                                0.919
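
To make the comparison easier to read, the timings in the table above can be turned into speedup ratios (CPU time divided by FPGA time). The following Python sketch is only an illustration: the names RESULTS and speedups are hypothetical and are not part of the application note or the SDK, and the numbers are copied directly from the table.

```python
# Timings in milliseconds, copied from the results table.
# Each row: (data size, FPGA init. method 1, FPGA init. method 2, CPU)
RESULTS = [
    (512, 0.028, 0.016, 0.065),
    (1024, 0.042, 0.032, 0.573),
    (2048, 0.051, 0.037, 0.627),
    (4096, 0.089, 0.039, 0.804),
    (8192, 0.105, 0.044, 0.919),
]


def speedups(rows):
    """Return (data size, speedup with method 1, speedup with method 2) per row."""
    return [(size, cpu / m1, cpu / m2) for size, m1, m2, cpu in rows]


if __name__ == "__main__":
    print("size   method 1   method 2")
    for size, s1, s2 in speedups(RESULTS):
        print(f"{size:>5}  {s1:8.1f}x  {s2:8.1f}x")
```

By this measure, the FPGA is faster than the CPU for every data size listed, and the gap widens as the data size grows: at 8192, initialization method 2 is roughly 21 times faster than the CPU run.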

In this experiment, the number of clusters was set to 10.

Each data point has two floating-point features, and input data sizes from 512 to 8192 were used to compare the performance of the FPGA and the CPU.

For the FPGA runs, we tried two initialization methods. In the first method, we used the first k data points as the initial centroids of the clusters. In the second method, we chose the initial centroids randomly. With randomly chosen initial centroids, the algorithm required fewer iterations and therefore converged faster.
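
The two initialization methods can be sketched on the host side as follows. This is a minimal Python illustration, not the OpenCL kernel from this application note. The function names are hypothetical, and the sketch assumes that "randomly" means picking k distinct data points at random, which the text does not specify. The kmeans function is a plain Lloyd's iteration that reports how many iterations it needed, so the two methods can be compared on the same data.

```python
import random


def init_first_k(points, k):
    """Method 1: use the first k data points as the initial centroids."""
    return [tuple(p) for p in points[:k]]


def init_random(points, k, seed=None):
    """Method 2 (assumed variant): pick k distinct data points at random."""
    rng = random.Random(seed)
    return [tuple(p) for p in rng.sample(points, k)]


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on 2-feature points.

    Returns (centroids, iterations), where iterations is the number of
    assignment passes run until the assignments stopped changing.
    """
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        if new_assignment == assignment:
            return centroids, iteration
        assignment = new_assignment

        # Move each centroid to the mean of the points assigned to it.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                updated.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        centroids = updated
    return centroids, max_iter


if __name__ == "__main__":
    # Synthetic data for illustration only; not the data set used in the tests.
    rng = random.Random(0)
    points = [(rng.random(), rng.random()) for _ in range(512)]
    k = 10  # number of clusters used in the experiment
    for name, init in (
        ("first k points", init_first_k(points, k)),
        ("random points", init_random(points, k, seed=1)),
    ):
        _, iterations = kmeans(points, init)
        print(f"{name}: converged after {iterations} iterations")
```

The iteration counts depend on the data and on the random seed, so this sketch will not necessarily reproduce the ordering seen in the measurements above.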