Hadoop Install directory - /usr/lib/hadoop-0.20/
The default web UI ports are 50070 for the NameNode, 50030 for the JobTracker and 50060 for the TaskTracker.
3 config files: core-site.xml, mapred-site.xml, hdfs-site.xml
The spill factor is the threshold at which the in-memory map-output buffer is spilled to temporary files on disk; the Hadoop temp directory is used for this.
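A minimal sketch of the related mapred-site.xml settings (assuming the classic io.sort.* property names; the values shown are just the usual defaults, not a recommendation):
<property><name>io.sort.mb</name><value>100</value></property>             <!-- in-memory sort buffer size, MB -->
<property><name>io.sort.spill.percent</name><value>0.80</value></property> <!-- buffer fill level that triggers a spill -->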
hdfs-site.xml properties:
dfs.name.dir, dfs.data.dir and fs.checkpoint.dir
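A minimal hdfs-site.xml sketch showing these properties (the paths are placeholders, not recommendations):
<configuration>
  <property><name>dfs.name.dir</name><value>/var/lib/hadoop/dfs/name</value></property>               <!-- NameNode metadata -->
  <property><name>dfs.data.dir</name><value>/var/lib/hadoop/dfs/data</value></property>               <!-- DataNode block storage -->
  <property><name>fs.checkpoint.dir</name><value>/var/lib/hadoop/dfs/namesecondary</value></property> <!-- SecondaryNameNode checkpoints -->
</configuration>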
fsck – file system check (run as hadoop fsck)
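For example (the path and flags are just an illustration):
hadoop@computer:~$ hadoop fsck / -files -blocks -locations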
jps – to check whether the Hadoop daemons are running
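Typical jps output on a pseudo-distributed node might look like this (the process IDs are made up):
hadoop@computer:~$ jps
2387 NameNode
2475 DataNode
2554 SecondaryNameNode
2631 JobTracker
2712 TaskTracker
2790 Jps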
Restart hadoop daemons
start-yarn.sh, stop-yarn.sh
start-all.sh, stop-all.sh
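For example, a full restart on a classic (MR1) setup can be as simple as (a sketch; newer releases prefer the per-service scripts over start-all.sh/stop-all.sh):
hadoop@computer:~$ stop-all.sh
hadoop@computer:~$ start-all.sh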
The slaves and masters files are used by the startup and shutdown scripts.
slaves contains a list of hosts, one per line, that run the DataNode and TaskTracker daemons.
masters contains a list of hosts, one per line, that run the SecondaryNameNode daemon.
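For example, the two files might look like this (the hostnames are purely illustrative):
conf/slaves (DataNode + TaskTracker hosts):
slave1.example.com
slave2.example.com
slave3.example.com
conf/masters (SecondaryNameNode host):
master2.example.com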
hadoop-env.sh provides the environment for Hadoop to run; JAVA_HOME is set here.
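For example (the JDK path is an assumption and will differ per machine):
# conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-sun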
The mapred.job.tracker property (in mapred-site.xml) specifies which node acts as the JobTracker.
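A minimal mapred-site.xml sketch (the hostname and port are placeholders):
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Hadoopmaster:8021</value>   <!-- host:port of the JobTracker -->
  </property>
</configuration>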
/etc/init.d is where daemon (service) scripts are placed and where their status can be checked. It is Linux-specific and has nothing to do with Hadoop itself.
Which are the three modes in which Hadoop can be run?
1. Standalone (local) mode – no daemons; everything runs in a single JVM, no HDFS, only the local file system.
2. Pseudo-distributed mode – all daemons run on a single machine, each in its own JVM.
3. Fully distributed mode – daemons run across a cluster of machines.
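For instance, in pseudo-distributed mode core-site.xml typically points the default file system at a local HDFS (a sketch; the port follows the Apache quick-start docs):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>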
How can we check whether Namenode is working or not?
To check whether the NameNode is working, use the command /etc/init.d/hadoop-0.20-namenode status, or simply run jps.
Default Ports
SSH – 22
NameNode – 50070
JobTracker – 50030
TaskTracker – 50060
http://Hadoopmaster:50070/ – web UI of the NameNode daemon
http://Hadoopmaster:50030/ – web UI of the JobTracker daemon
http://Hadoopmaster:50060/ – web UI of the TaskTracker daemon
Quickly switching hadoop modes
hadoop@computer:~$ cd /your/hadoop/installation/
hadoop@computer:~$ cp -R conf conf.standalone
hadoop@computer:~$ cp -R conf conf.pseudo
hadoop@computer:~$ cp -R conf conf.distributed
hadoop@computer:~$ rm -R conf
ln – to create a link to a folder.
Switching to standalone mode:
hadoop@computer:~$ ln -s conf.standalone conf
Switching to pseudo-distributed mode:
hadoop@computer:~$ ln -s conf.pseudo conf
Switching to fully distributed mode:
hadoop@computer:~$ ln -s conf.distributed conf
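Note that ln -s will not overwrite an existing conf link, so when switching again remove the old link first, for example:
hadoop@computer:~$ rm conf
hadoop@computer:~$ ln -s conf.pseudo conf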
Map and reduce slots are controlled in mapred-site.xml:
mapreduce.tasktracker.map.tasks.maximum
mapreduce.tasktracker.reduce.tasks.maximum
Important: If you change these settings, restart all of the TaskTracker nodes.
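A sketch of how the slot settings might appear in mapred-site.xml (2 slots each is only an example value):
<configuration>
  <property><name>mapreduce.tasktracker.map.tasks.maximum</name><value>2</value></property>
  <property><name>mapreduce.tasktracker.reduce.tasks.maximum</name><value>2</value></property>
</configuration>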
What are the network requirements for Hadoop?
The Hadoop core uses SSH to launch the server processes on the slave nodes. It requires a password-less SSH connection between the master and all the slave and secondary machines.
SSH is a secure shell protocol that works on port 22. An SSH login normally asks for a password, which is why Hadoop relies on key-based (password-less) authentication between the nodes.
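A typical way to set up password-less SSH from the master to a slave (a sketch; the key type and slave hostname are illustrative):
hadoop@computer:~$ ssh-keygen -t rsa -P ""
hadoop@computer:~$ ssh-copy-id hadoop@slave1.example.com
hadoop@computer:~$ ssh hadoop@slave1.example.com    # should log in without asking for a password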