[jinghang@hadoop102 job]$ vim flume-netcat-logger.conf
Add the following content:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Note: this configuration file comes from the official user guide: http://flume.apache.org/FlumeUserGuide.html
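The last two properties in the configuration above bind the source and the sink to channel c1, and a mistyped channel name there is easy to miss. The sketch below is one possible pre-flight check, not part of Flume: the helper name undeclared_channels is hypothetical, and it assumes agent names contain only letters and digits. It prints every channel that a source or sink refers to but that the agent's ".channels" line does not declare.

```shell
#!/usr/bin/env bash
# Print every channel name that agent $1 references in file $2
# (via sources.*.channels or sinks.*.channel) but never declares
# in "<agent>.channels". No output means the bindings are consistent.
# Hypothetical helper; assumes the agent name has no regex metacharacters.
undeclared_channels() {
  local agent="$1" file="$2"
  awk -v agent="$agent" '
    /^[ \t]*#/ || !/=/ { next }            # skip comments and non-property lines
    {
      eq  = index($0, "=")
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)     # trim whitespace around key and value
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      n = split(val, names, /[ \t]+/)      # a source may list several channels
      if (key == (agent ".channels")) {
        for (i = 1; i <= n; i++) declared[names[i]] = 1
      } else if (key ~ ("^" agent "\\.sources\\.[^.]+\\.channels$") ||
                 key ~ ("^" agent "\\.sinks\\.[^.]+\\.channel$")) {
        for (i = 1; i <= n; i++) used[names[i]] = 1
      }
    }
    END { for (c in used) if (!(c in declared)) print c }
  ' "$file" | sort
}
```

For example, running "undeclared_channels a1 flume-netcat-logger.conf" in the job directory should print nothing if the file matches the listing above.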
[jinghang@hadoop102 group1]$ touch flume-flume-hdfs.conf
[jinghang@hadoop102 group1]$ vim flume-flume-hdfs.conf
Add the following content:
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
# The avro source acts as a data-receiving service
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://hadoop102:9000/flume2/%Y%m%d/%H
# Prefix for uploaded files
a2.sinks.k1.hdfs.filePrefix = flume2-
# Whether to roll directories by time
a2.sinks.k1.hdfs.round = true
# How many time units pass before a new directory is created
a2.sinks.k1.hdfs.roundValue = 1
# The time unit used for rounding
a2.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a2.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events to accumulate before flushing to HDFS
a2.sinks.k1.hdfs.batchSize = 100
# File type; compression is supported
a2.sinks.k1.hdfs.fileType = DataStream
# How often (in seconds) a new file is created
a2.sinks.k1.hdfs.rollInterval = 600
# Roll size of each file, roughly 128 MB
a2.sinks.k1.hdfs.rollSize = 134217700
# File rolling is independent of the number of events
a2.sinks.k1.hdfs.rollCount = 0

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
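Two values in the HDFS sink settings above are easier to read with a little arithmetic. The sketch below first compares rollSize with an exact 128 MiB (the value sits slightly below it, presumably so that a rolled file stays within one 128 MB HDFS block; that motive is an assumption, not something the configuration states). It then imitates the %Y%m%d/%H escapes in hdfs.path with the date command. Flume performs this substitution itself; the helper name flume2_dir_for is hypothetical and the -d @<epoch> form assumes GNU date.

```shell
#!/usr/bin/env bash
# rollSize from the config versus an exact 128 MiB.
roll_size=134217700
mib_128=$((128 * 1024 * 1024))
echo "128 MiB  = ${mib_128} bytes"
echo "rollSize = ${roll_size} bytes ($((mib_128 - roll_size)) bytes below 128 MiB)"

# Imitate the %Y%m%d/%H escapes in hdfs.path for a Unix timestamp in seconds.
# Requires GNU date for the -d @<epoch> form. The time zone comes from TZ,
# much as useLocalTimeStamp = true takes the agent host's local time.
# Hypothetical helper, only meant to show which directory an event lands in.
flume2_dir_for() {
  date -d "@$1" +"/flume2/%Y%m%d/%H"
}
```

With round = true, roundValue = 1 and roundUnit = hour, all events stamped within the same clock hour map to the same directory, so "flume2_dir_for" returns one path for every timestamp inside that hour.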
3. Create flume-flume-dir.conf
Configure a Source that receives the output of the upstream Flume, and a Sink that writes to a local directory. Create the configuration file and open it:
[jinghang@hadoop102 group1]$ touch flume-flume-dir.conf
[jinghang@hadoop102 group1]$ vim flume-flume-dir.conf
Add the following content:
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop102
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = file_roll
a3.sinks.k1.sink.directory = /opt/module/data/flume3

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2
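The file_roll sink above writes into the directory named by sink.directory, and creating that directory before starting the agent avoids relying on the sink to create it. The following is a minimal sketch under that assumption: the helper name ensure_file_roll_dir is hypothetical, and it simply reads the property from the configuration file and runs mkdir -p on the value.

```shell
#!/usr/bin/env bash
# Read "<agent>.sinks.<sink>.sink.directory" from a Flume properties file,
# create that directory if it is missing, and print its path.
# Hypothetical helper. Usage: ensure_file_roll_dir <file> <agent> <sink>
ensure_file_roll_dir() {
  local file="$1" agent="$2" sink="$3" key dir
  key="${agent}.sinks.${sink}.sink.directory"
  dir=$(awk -F= -v key="$key" '
    { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }      # trimmed property name
    k == key {
      v = substr($0, index($0, "=") + 1)            # everything after the first "="
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      print v
      exit
    }
  ' "$file")
  [ -n "$dir" ] || { echo "no $key in $file" >&2; return 1; }
  mkdir -p "$dir" && echo "$dir"
}
```

For the listing above the call would be "ensure_file_roll_dir flume-flume-dir.conf a3 k1", run as a user that is allowed to write under /opt/module/data.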
[jinghang@hadoop102 group2]$ touch flume-flume-console1.conf
[jinghang@hadoop102 group2]$ vim flume-flume-console1.conf
Add the following content:
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = logger

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
3. Create flume-flume-console2.conf
Configure a Source that receives the output of the upstream Flume, and a Sink that writes to the local console. Create the configuration file and open it:
[jinghang@hadoop102 group2]$ touch flume-flume-console2.conf
[jinghang@hadoop102 group2]$ vim flume-flume-console2.conf
Add the following content:
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop102
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2
[jinghang@hadoop103 group3]$ touch flume1-logger-flume.conf
[jinghang@hadoop103 group3]$ vim flume1-logger-flume.conf
Add the following content:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/group.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop104
a1.sinks.k1.port = 4141

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

2. Create flume2-netcat-flume.conf
Configure a Source that monitors the data stream on port 44444, and a Sink that sends the data to the next-level Flume. Create the configuration file on hadoop102 and open it:
[jinghang@hadoop102 group3]$ touch flume2-netcat-flume.conf
[jinghang@hadoop102 group3]$ vim flume2-netcat-flume.conf
Add the following content:
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 44444

# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = hadoop104
a2.sinks.k1.port = 4141

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
3. Create flume3-flume-logger.conf
Configure a Source that receives the data streams sent by flume1 and flume2, and a Sink that writes the merged stream to the console. Create the configuration file on hadoop104 and open it:
[jinghang@hadoop104 group3]$ touch flume3-flume-logger.conf
[jinghang@hadoop104 group3]$ vim flume3-flume-logger.conf
Add the following content:
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop104
a3.sources.r1.port = 4141

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1
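In this aggregation setup the avro sinks of flume1 and flume2 must point at the host and port that the avro source of flume3 binds (hadoop104:4141 in the listings above). The sketch below is one way to check that before starting the agents. Both helper names, flume_prop and avro_hop_matches, are hypothetical, and the comparison is a literal string match, so it assumes the source binds a concrete host name as in this document rather than a wildcard address.

```shell
#!/usr/bin/env bash
# Print the value of one property from a Flume properties file.
# Hypothetical helper. Usage: flume_prop <file> <key>
flume_prop() {
  awk -v key="$2" '
    /^[ \t]*#/ || !/=/ { next }                      # skip comments and non-property lines
    {
      eq = index($0, "=")
      k = substr($0, 1, eq - 1); v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k); gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { print v; exit }
    }
  ' "$1"
}

# Succeed only if the avro sink <up_agent>.<sink> in <up_file> targets the
# host and port that the avro source <down_agent>.<source> in <down_file> binds.
# Usage: avro_hop_matches <up_file> <up_agent> <sink> <down_file> <down_agent> <source>
avro_hop_matches() {
  local host port
  host=$(flume_prop "$1" "$2.sinks.$3.hostname")
  port=$(flume_prop "$1" "$2.sinks.$3.port")
  [ -n "$host" ] && [ -n "$port" ] &&
    [ "$host" = "$(flume_prop "$4" "$5.sources.$6.bind")" ] &&
    [ "$port" = "$(flume_prop "$4" "$5.sources.$6.port")" ]
}
```

With the three files copied to one machine, a check for the first hop could look like "avro_hop_matches flume1-logger-flume.conf a1 k1 flume3-flume-logger.conf a3 r1 && echo ok", and likewise with flume2-netcat-flume.conf and a2 for the second.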
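gmond (Ganglia Monitoring Daemon) is a lightweight service installed on every node whose metrics need to be collected. With gmond you can easily collect many system metrics, such as CPU, memory, disk, network, and active-process data.
gmetad (Ganglia Meta Daemon) is the service that aggregates all the information and stores it on disk in RRD format.
gweb (Ganglia Web) is Ganglia's visualization tool: a PHP front end that displays the data stored by gmetad in the browser. Its web interface presents the various metrics collected from the running cluster as charts.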
4) Edit the configuration file /etc/httpd/conf.d/ganglia.conf
[jinghang@hadoop102 flume]$ sudo vim /etc/httpd/conf.d/ganglia.conf
Change the configuration as follows:
# Ganglia monitoring system php web frontend
Alias /ganglia /usr/share/ganglia
<Location /ganglia>
  Order deny,allow
  #Deny from all
  Allow from all
  # Allow from 127.0.0.1
  # Allow from ::1
  # Allow from .example.com
</Location>
5) Edit the configuration file /etc/ganglia/gmetad.conf
[jinghang@hadoop102 flume]$ sudo vim /etc/ganglia/gmetad.conf
Change it to:
data_source "hadoop102" 192.168.1.102
6) Edit the configuration file /etc/ganglia/gmond.conf
[jinghang@hadoop102 flume]$ sudo vim /etc/ganglia/gmond.conf
Change it to:
cluster {
  name = "hadoop102"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname. Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  # mcast_join = 239.2.11.71
  host = 192.168.1.102
  port = 8649
  ttl = 1
}
udp_recv_channel {
  # mcast_join = 239.2.11.71
  port = 8649
  bind = 192.168.1.102
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  # buffer = 10485760
}
7) Edit the configuration file /etc/selinux/config
[jinghang@hadoop102 flume]$ sudo vim /etc/selinux/config
Change it to:
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
Important: disabling SELinux through this file only takes effect after a reboot. If you do not want to reboot right now, you can relax it temporarily (setenforce 0 switches SELinux to permissive mode until the next reboot):
[jinghang@hadoop102 flume]$ sudo setenforce 0
5) Start Ganglia
[jinghang@hadoop102 flume]$ sudo service httpd start
[jinghang@hadoop102 flume]$ sudo service gmetad start
[jinghang@hadoop102 flume]$ sudo service gmond start
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = com.jinghang.MySource
a1.sources.r1.delay = 1000
#a1.sources.r1.field = jinghang

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = com.jinghang.MySink
#a1.sinks.k1.prefix = jinghang:
a1.sinks.k1.suffix = :jinghang

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1