Tudor Lapusan's Blog

Setting Ganglia filters for Hadoop

Metrics filtering is useful when you don’t need all metrics and don’t want to overload your Ganglia cluster with useless information.

First of all, I’m working with Apache Hadoop 1.1.2 (the metrics2 implementation) and Ganglia 3.1.7, and I assume you already know how to integrate Hadoop with Ganglia. If not, you can read this article.

Filtering can be applied at three levels: source, record, and metric. Before applying any kind of filtering, you must choose how patterns will be matched: Regex or Glob.
I chose the datanode metrics to show how to apply filters.
See here how the datanode metrics look without any filtering.
The following lines should be put into the conf/hadoop-metrics2.properties file:


# setting Glob as the filter type for all three levels
# filename to store all the datanode metrics information
# filter out any metric whose name starts with heart (ex. heartBeats_num_ops=6)
# filter out any record whose context value is dfs (ex. context=dfs)
# filter out any record whose context value is jvm (ex. context=jvm)
# filter out any record whose hostName value is gdelt (ex. hostName=gdelt)
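The property lines that go with those comments appear to have been lost from the post. Here is a sketch of what they plausibly looked like, based on the metrics2 GlobFilter configuration keys; the file-sink instance name (file), the output filename, and the exact tag-filter syntax (tagName:pattern pairs) are my assumptions, not taken from the original post:

```properties
# set Glob as the filter type for all three levels
*.source.filter.class=org.apache.hadoop.metrics2.filter.GlobFilter
*.record.filter.class=${*.source.filter.class}
*.metric.filter.class=${*.source.filter.class}

# write all datanode metrics to a file sink (filename is an assumption)
datanode.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
datanode.sink.file.filename=datanode-metrics.out

# drop every metric whose name starts with heart (e.g. heartBeats_num_ops)
datanode.sink.file.metric.filter.exclude=heart*

# drop records by tag value; each entry is a tagName:globPattern pair
# (a similar entry would be needed to also exclude context:jvm)
datanode.sink.file.record.filter.exclude.tags=context:dfs,hostName:gdelt
```

If you use Regex instead of Glob, swap GlobFilter for org.apache.hadoop.metrics2.filter.RegexFilter and write the patterns as regular expressions.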
All of the sink-side metrics filtering happens in the MetricsSinkAdapter class from the hadoop-core package.
If you want to take a look at it and better understand how filtering works, search for the publishMetrics(MetricsBuffer) method in the MetricsSinkAdapter class.
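To give a rough idea of what that method does, here is a simplified, self-contained sketch of the filter chain. All class and variable names below are illustrative stand-ins, not Hadoop's actual types: a record that the record filter rejects is skipped entirely, and for accepted records each metric is published only if the metric filter accepts it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Simplified sketch of the filtering done around MetricsSinkAdapter.publishMetrics().
// These are stand-in types for illustration, not Hadoop's real classes.
public class SinkFilterSketch {

    // A Glob-style exclude filter: "heart*" rejects any name starting with "heart".
    static class GlobExcludeFilter {
        private final Pattern pattern;

        GlobExcludeFilter(String glob) {
            // naive glob -> regex translation, good enough for this sketch
            pattern = Pattern.compile(glob.replace(".", "\\.").replace("*", ".*"));
        }

        boolean accepts(String name) {
            return !pattern.matcher(name).matches();
        }
    }

    public static void main(String[] args) {
        GlobExcludeFilter recordFilter = new GlobExcludeFilter("jvm*");
        GlobExcludeFilter metricFilter = new GlobExcludeFilter("heart*");

        String[] records = {"dfs.datanode", "jvm.metrics"};
        String[] metrics = {"heartBeats_num_ops", "blocks_read"};

        // The adapter walks the buffered records: a rejected record is
        // skipped entirely; otherwise each of its metrics is published
        // only if the metric filter accepts it.
        List<String> published = new ArrayList<>();
        for (String record : records) {
            if (!recordFilter.accepts(record)) continue;   // record-level filter
            for (String metric : metrics) {
                if (metricFilter.accepts(metric)) {        // metric-level filter
                    published.add(record + ":" + metric);
                }
            }
        }
        System.out.println(published); // only dfs.datanode:blocks_read survives
    }
}
```

With the filters above, the jvm.metrics record and every heart* metric are dropped before anything reaches the sink, which mirrors the behavior the config comments describe.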
It took me days to understand how filtering works, mostly because of the poor documentation; in the end I had to read the source code.
If you have questions, let me know.