Reference: http://www.thegeekstuff.com/
Hadoop filesystem commands
hadoop fs -mkdir /dir
hadoop fs -ls
hadoop fs -cat <filename>
hadoop fs -rm <filename>
hadoop fs -moveFromLocal /data/datafile /user/hduser/data
-- Move a local file into HDFS (hadoop fs -mv only moves files within one filesystem)
hadoop fs -touchz <filename>
-- Create an empty file
hadoop fs -stat <filename>
hadoop fs -expunge
-- Empty the trash on HDFS
ram@ram:/etc/init.d$ hadoop fs -du /user
50270 /user/1.log
0 /user/hive
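The per-path sizes printed by -du can be totalled with a small filter. This is only a sketch: du_total is a hypothetical helper name, and it assumes the size is the first column of each line, as in the output above.

```shell
# Sum the size column (first field) of `hadoop fs -du` output read from stdin.
# du_total is a hypothetical helper, not part of Hadoop.
du_total() {
  awk '{ total += $1 } END { printf "%d\n", total }'
}

# Sample lines copied from the listing above; on a live cluster one would
# run `hadoop fs -du /user | du_total` instead.
printf '50270 /user/1.log\n0 /user/hive\n' | du_total   # prints 50270
```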
hadoop fs -copyFromLocal <source> <destination>
hadoop fs -copyToLocal <source> <destination>
hadoop fs -put <source> <destination>
-- Copy from the local filesystem to HDFS
hadoop fs -get <source> <destination>
-- Copy from HDFS to the local filesystem
hadoop distcp hdfs://192.168.0.8:8020/input hdfs://192.168.0.8:8020/output
-- Copy data between clusters by giving each cluster's NameNode URL (in this example both URLs point to the same NameNode)
hadoop fs -setrep -w 3 file1
hadoop fs -getmerge mydir bigfile
-- Merge the files in the mydir directory and download them as one local file named bigfile
Hadoop job commands
hadoop job -submit <job-file>
hadoop job -status <job-id>
hadoop job -history <job-output-dir>
hadoop job -kill-task <task-id>
ram@ram:/etc/init.d$ hadoop job -list all
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.
15/07/29 21:03:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/29 21:03:51 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Total jobs:0
JobId State StartTime UserName Queue Priority UsedContainers RsvdContainers UsedMem RsvdMem NeededMem AM info
ram@ram:/etc/init.d$ hadoop job -list-active-trackers
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.
15/07/29 21:04:24 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
tracker_ram:49874
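The DEPRECATED notice above names the replacement launcher (mapred for job commands), and the same kind of notice appears in these notes for fsck, dfsadmin and balancer, pointing to hdfs. A rough sketch of that mapping follows. modernize is a hypothetical helper, and the entry for namenode is an assumption, since no notice for it is shown here.

```shell
# Rewrite a deprecated `hadoop <subcommand>` command line to its replacement.
# modernize is a hypothetical helper; it only prints the new command line.
modernize() {
  case "$1 $2" in
    "hadoop job")
      shift 2
      set -- mapred job "$@" ;;
    # namenode is assumed to follow the same pattern as the other three
    "hadoop fsck"|"hadoop dfsadmin"|"hadoop balancer"|"hadoop namenode")
      sub=$2
      shift 2
      set -- hdfs "$sub" "$@" ;;
  esac
  # Anything else (for example `hadoop fs ...`) is printed unchanged.
  printf '%s\n' "$*"
}

modernize hadoop job -list all    # prints: mapred job -list all
modernize hadoop fsck / -files    # prints: hdfs fsck / -files
```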
Hadoop Namenode commands
hadoop namenode -format
hadoop namenode -upgrade
hadoop namenode -recover -force
-- Recover namenode metadata after a cluster failure (may lose data)
hadoop fsck <path> -delete
-- Delete corrupted files
hadoop fsck <path> -move
-- Move corrupted files to the /lost+found folder
ram@ram:/etc/init.d$ stop-dfs.sh
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
ram@ram:/etc/init.d$ stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop
ram@ram:/etc/init.d$ start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-ram-namenode-ram.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-ram-datanode-ram.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-ram-secondarynamenode-ram.out
ram@ram:/etc/init.d$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-ram-resourcemanager-ram.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-ram-nodemanager-ram.out
ram@ram:/etc/init.d$
ram@ram:/etc/init.d$ jps
6330 NodeManager
6192 ResourceManager
5827 DataNode
6649 Jps
6028 SecondaryNameNode
5664 NameNode
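After a restart it can be handy to check the jps listing against the daemons one expects. The sketch below assumes a single-node setup with the five daemons shown above. missing_daemons is a hypothetical helper that reads jps output from stdin.

```shell
# Print the name of every expected daemon that is absent from `jps` output.
# missing_daemons is a hypothetical helper; the daemon list assumes the
# single-node layout shown in the listing above.
missing_daemons() {
  jps_out=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second column exactly, so NameNode does not match
    # SecondaryNameNode.
    printf '%s\n' "$jps_out" |
      awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' || echo "$d"
  done
}

# On a live node: jps | missing_daemons
# No output means all five daemons were found.
```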
ram@ram:/etc/init.d$ hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Connecting to namenode via http://localhost:50070/fsck?ugi=ram&path=%2F
FSCK started by ram (auth:SIMPLE) from /127.0.0.1 for path / at Wed Jul 29 20:56:55 IST 2015
.
/user/1.log: Under replicated BP-393036986-127.0.1.1-1437358619878:blk_1073741825_1001. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
Total size: 50270 B
Total dirs: 7
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 50270 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 2 (66.666664 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Wed Jul 29 20:56:55 IST 2015 in 3 milliseconds
The filesystem under path '/' is HEALTHY
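The "Missing replicas" figure in this report is consistent with simple arithmetic: one block with a target of 3 replicas and only 1 found leaves 2 of 3 expected replicas missing. A small sketch of that calculation follows. missing_pct is a hypothetical helper, and the formula is inferred from the numbers above rather than taken from Hadoop's source.

```shell
# Usage: missing_pct BLOCKS TARGET_REPLICATION FOUND_REPLICAS
# Hypothetical helper; the formula is inferred from the fsck report above.
missing_pct() {
  awk -v b="$1" -v r="$2" -v f="$3" 'BEGIN {
    expected = b * r
    missing = expected - f
    printf "%d of %d replicas missing (%.2f %%)\n", missing, expected, missing * 100 / expected
  }'
}

missing_pct 1 3 1   # prints: 2 of 3 replicas missing (66.67 %)
```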
ram@ram:/etc/init.d$ hadoop fsck / -files -blocks -locations -racks
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Connecting to namenode via http://localhost:50070/fsck?ugi=ram&files=1&blocks=1&locations=1&racks=1&path=%2F
FSCK started by ram (auth:SIMPLE) from /127.0.0.1 for path / at Wed Jul 29 20:58:22 IST 2015
/ <dir>
/tmp <dir>
/tmp/hive <dir>
/tmp/hive/ram <dir>
/user <dir>
/user/1.log 50270 bytes, 1 block(s): Under replicated BP-393036986-127.0.1.1-1437358619878:blk_1073741825_1001. Target Replicas is 3 but found 1 replica(s).
0. BP-393036986-127.0.1.1-1437358619878:blk_1073741825_1001 len=50270 repl=1 [/default-rack/127.0.0.1:50010]
/user/hive <dir>
/user/hive/warehouse <dir>
Status: HEALTHY
Total size: 50270 B
Total dirs: 7
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 50270 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 2 (66.666664 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Wed Jul 29 20:58:22 IST 2015 in 3 milliseconds
The filesystem under path '/' is HEALTHY
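To pull just the affected file names out of a long fsck report, one option is to filter on the "Under replicated" marker. This is a sketch that assumes the two line layouts shown above and paths without spaces. under_replicated_paths is a hypothetical helper.

```shell
# List the paths that fsck flags as under replicated, one per line.
# Hypothetical helper; assumes the path is the first whitespace-separated
# field, optionally followed by a colon, as in the reports above.
under_replicated_paths() {
  awk '/Under replicated/ { sub(/:$/, "", $1); print $1 }' | sort -u
}

# On a live cluster: hadoop fsck / -files -blocks | under_replicated_paths
```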
Hadoop dfsadmin commands
ram@ram:/etc/init.d$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Configured Capacity: 98496679936 (91.73 GB)
Present Capacity: 80164052992 (74.66 GB)
DFS Remaining: 80163958784 (74.66 GB)
DFS Used: 94208 (92 KB)
DFS Used%: 0.00%
Under replicated blocks: 1
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (1):
Name: 127.0.0.1:50010 (localhost)
Hostname: ram
Decommission Status : Normal
Configured Capacity: 98496679936 (91.73 GB)
DFS Used: 94208 (92 KB)
Non DFS Used: 18332626944 (17.07 GB)
DFS Remaining: 80163958784 (74.66 GB)
DFS Used%: 0.00%
DFS Remaining%: 81.39%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Jul 29 21:06:41 IST 2015
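The capacity figures in this report can be cross-checked by hand. In this output, Non DFS Used equals Configured Capacity minus Present Capacity, and DFS Remaining% is DFS Remaining divided by Configured Capacity. The helper names below are hypothetical and the relationships are read off this one report, so treat it as a sanity check rather than a definition.

```shell
# Hypothetical helpers for cross-checking the dfsadmin -report figures above.

# Usage: non_dfs_used CONFIGURED_BYTES PRESENT_BYTES
non_dfs_used() {
  echo $(( $1 - $2 ))
}

# Usage: pct PART WHOLE  (prints PART as a percentage of WHOLE, 2 decimals)
pct() {
  awk -v p="$1" -v w="$2" 'BEGIN { printf "%.2f\n", p * 100 / w }'
}

non_dfs_used 98496679936 80164052992   # prints 18332626944
pct 80163958784 98496679936            # prints 81.39
```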
ram@ram:/etc/init.d$ hadoop dfsadmin -setQuota 10 /user
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
ram@ram:/etc/init.d$ hadoop fs -count -q /user
10 6 none inf 3 1 50270 /user
ram@ram:/etc/init.d$
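The eight columns printed by -count -q are, in order: name quota, remaining name quota, space quota, remaining space quota, directory count, file count, content size in bytes, and path. In the line above, a name quota of 10 with 6 remaining means 4 names are in use, which matches 3 directories plus 1 file. A sketch that labels the columns follows. label_count_q is a hypothetical helper.

```shell
# Label the columns of `hadoop fs -count -q` output read from stdin.
# label_count_q is a hypothetical helper, not part of Hadoop.
label_count_q() {
  awk '{
    printf "name quota: %s (remaining %s)\n", $1, $2
    printf "space quota: %s (remaining %s)\n", $3, $4
    printf "dirs: %s, files: %s, bytes: %s, path: %s\n", $5, $6, $7, $8
  }'
}

# Sample line copied from the output above.
echo '10 6 none inf 3 1 50270 /user' | label_count_q
```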
ram@ram:/etc/init.d$ hadoop dfsadmin -safemode enter
Safe mode is ON
ram@ram:/etc/init.d$ hadoop dfsadmin -saveNamespace
-- Save the current namespace metadata (fsimage and edits) to disk. Put the cluster in safe mode before running this command.
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Save namespace successful
ram@ram:/etc/init.d$ hadoop dfsadmin -safemode get
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Safe mode is ON
ram@ram:/etc/init.d$ hadoop dfsadmin -safemode leave
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Safe mode is OFF
ram@ram:/etc/init.d$
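The enter / saveNamespace / leave sequence above could be wrapped in one function so that safe mode is always left again. This is a sketch only: checkpoint_namespace and the HDFS_CMD variable are hypothetical names, it uses the hdfs launcher that the deprecation notices recommend, and leaving safe mode unconditionally may not be what one wants if the cluster was already in safe mode for another reason.

```shell
# Save the namespace inside a safe-mode window.
# HDFS_CMD is an assumed override hook (not a Hadoop setting): set it to
# `echo` for a dry run that only prints the commands.
checkpoint_namespace() {
  hdfs_cmd=${HDFS_CMD:-hdfs}
  $hdfs_cmd dfsadmin -safemode enter || return 1
  $hdfs_cmd dfsadmin -saveNamespace
  status=$?
  # Leave safe mode even if the save failed, so the cluster accepts writes again.
  $hdfs_cmd dfsadmin -safemode leave
  return $status
}

# Dry run: HDFS_CMD=echo checkpoint_namespace
```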
Hadoop yarn commands
Hadoop balancer commands
ram@ram:/etc/init.d$ start-balancer.sh
starting balancer, logging to /usr/local/hadoop/logs/hadoop-ram-balancer-ram.out
hadoop dfsadmin -setBalancerBandwidth <bandwidth-in-bytes-per-second>
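Since -setBalancerBandwidth takes its value in bytes per second, a limit that is usually thought of in megabytes has to be converted first. A minimal sketch, with mb_to_bytes as a hypothetical helper and 1 MB taken as 1024 * 1024 bytes:

```shell
# Convert megabytes to bytes (1 MB = 1024 * 1024 bytes here).
# mb_to_bytes is a hypothetical helper, not part of Hadoop.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 10   # prints 10485760
# Possible use: hadoop dfsadmin -setBalancerBandwidth "$(mb_to_bytes 10)"
```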
ram@ram:/etc/init.d$ hadoop balancer -threshold 20
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
15/07/29 21:16:22 INFO balancer.Balancer: Using a threshold of 20.0
15/07/29 21:16:22 INFO balancer.Balancer: namenodes = [hdfs://localhost:9000]
15/07/29 21:16:22 INFO balancer.Balancer: parameters = Balancer.Parameters[BalancingPolicy.Node, threshold=20.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
15/07/29 21:16:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/29 21:16:24 INFO net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010
15/07/29 21:16:24 INFO balancer.Balancer: 0 over-utilized: []
15/07/29 21:16:24 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
29 Jul, 2015 9:16:24 PM 0 0 B 0 B -1 B
29 Jul, 2015 9:16:24 PM Balancing took 2.217 seconds
ram@ram:/etc/init.d$
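The -threshold value is a percentage: roughly, a datanode counts as over-utilized or under-utilized when its disk usage differs from the cluster average by more than the threshold. With a single datanode the node's usage is the average, which fits the "0 over-utilized / 0 underutilized" lines above. The sketch below illustrates the comparison. classify_node is a hypothetical helper and a simplification of what the balancer actually does.

```shell
# Usage: classify_node NODE_USED_PCT CLUSTER_AVG_PCT THRESHOLD_PCT
# Hypothetical helper illustrating the threshold comparison only.
classify_node() {
  awk -v u="$1" -v a="$2" -v t="$3" 'BEGIN {
    if (u > a + t)      print "over-utilized"
    else if (u < a - t) print "under-utilized"
    else                print "within threshold"
  }'
}

classify_node 75 50 20   # prints: over-utilized
classify_node 50 50 20   # prints: within threshold
```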