Monday, August 31, 2015

Hive DDL and DML Commands Quick Reference

Create table in Hive
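
The examples below assume a database named hivedemo (the load output further down shows the table as hivedemo.employee); a minimal sketch to set that up:

hive> create database if not exists hivedemo;
hive> use hivedemo;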

hive> create table employee(
    > empno int,
    > empname string,
    > empcity string,
    > empcountry string)
    > row format delimited
    > fields terminated by ','
    > stored as textfile;
OK
Time taken: 0.168 seconds


hive> describe employee;
OK
empno                   int                                        
empname                 string                                     
empcity                 string                                     
empcountry              string                                     
Time taken: 0.115 seconds, Fetched: 4 row(s)
 

Load data from local path
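
For reference, employee.txt would look like the two lines below (reconstructed from the SELECT output that follows):

1,ramkumar,palani,india
2,kavitha,ramnad,india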

hive> load data local inpath '/home/ram/employee.txt' into table employee;
Loading data to table hivedemo.employee
Table hivedemo.employee stats: [numFiles=1, totalSize=48]
OK
Time taken: 0.352 seconds
hive> select * from employee;
OK
1    ramkumar    palani    india
2    kavitha    ramnad    india
Time taken: 0.146 seconds, Fetched: 2 row(s)
hive>



Load data from HDFS

hadoop fs -copyFromLocal /home/ram/employee1.txt /tmp

hive> load data inpath '/tmp/employee1.txt' into table employee;
hive> select * from employee where empno is not null;
OK
1    ramkumar    palani    india
2    kavitha    ramnad    india
1    ramkumar    palani    india
2    kavitha    ramnad    india
Time taken: 0.179 seconds, Fetched: 4 row(s)
hive>
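
Note the difference: LOAD DATA LOCAL INPATH copies the file from the local filesystem, while LOAD DATA INPATH (without LOCAL) moves the file within HDFS into the table's directory, so the source file disappears:

hadoop fs -ls /tmp/employee1.txt   # expect 'No such file or directory' after the load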

Files available in the employee folder


ram@ram:~$ hadoop fs -ls /user/hive/warehouse/hivedemo.db/employee
Found 2 items
-rwxrwxr-x   3 ram 48 2015-08-31 21:00 /user/hive/warehouse/hivedemo.db/employee/employee.txt
-rwxrwxr-x   3 ram 48 2015-08-31 21:11 /user/hive/warehouse/hivedemo.db/employee/employee1.txt

hive> alter table employee rename to empnew; 

OK
Time taken: 0.223 seconds
 

hive> show tables; 
OK
empnew
Time taken: 0.036 seconds, Fetched: 1 row(s)
hive>


ram@ram:~$ hadoop fs -ls /user/hive/warehouse/hivedemo.db/
Found 1 items
drwxrwxr-x   - ram 0 2015-08-31 21:12 /user/hive/warehouse/hivedemo.db/empnew
 

ram@ram:~$ hadoop fs -ls /user/hive/warehouse/hivedemo.db/empnew
Found 2 items
-rwxrwxr-x   3 ram 48 2015-08-31 21:00 /user/hive/warehouse/hivedemo.db/empnew/employee.txt
-rwxrwxr-x   3 ram 48 2015-08-31 21:11 /user/hive/warehouse/hivedemo.db/empnew/employee1.txt
ram@ram:~$ 


Overwrite - replace the existing files in the table folder

hive> load data inpath '/tmp/employee1.txt' overwrite into table empnew;
 

Loading data to table hivedemo.empnew
Table hivedemo.empnew stats: [numFiles=1, numRows=0, totalSize=48, rawDataSize=0]
OK
Time taken: 0.42 seconds


hive> select * from empnew;
OK
1    ramkumar    palani    india
2    kavitha    ramnad    india
NULL    NULL    NULL    NULL
Time taken: 0.182 seconds, Fetched: 3 row(s)
hive>
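
The extra NULL row most likely comes from a line in employee1.txt that does not split into the four comma-separated columns (a trailing or blank line, for example); Hive returns NULL for fields it cannot parse. The raw file can be inspected in place:

hadoop fs -cat /user/hive/warehouse/hivedemo.db/empnew/employee1.txt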


Drop the table

hive> drop table empnew;
OK
Time taken: 0.185 seconds
hive>
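
Since empnew is a managed (internal) table, dropping it deletes its files from the warehouse directory as well; this can be confirmed by listing the database folder (the empnew directory should be gone):

hadoop fs -ls /user/hive/warehouse/hivedemo.db/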







Create external table

- multiple tables can point to the same path
- non-Hive applications can use the data, since Hive does not take an exclusive lock on it
- dropping an external table removes only the metadata; the files at the location are left intact




hive> create external table empexternal (col1 int, col2 string, col3 string, col4 string)
    > row format delimited
    > fields terminated by ','
    > stored as textfile
    > location '/tmp/emptmp'
    > ;
 

OK
Time taken: 0.129 seconds
 

hive> select * from empexternal;
OK
1    ramkumar    palani    india
2    kavitha    ramnad    india
Time taken: 0.146 seconds, Fetched: 3 row(s)
hive>
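
For the SELECT above to return rows, a data file must already be present under the location path; a minimal sketch, assuming the same comma-separated employee data is copied there:

hadoop fs -mkdir -p /tmp/emptmp
hadoop fs -copyFromLocal /home/ram/employee.txt /tmp/emptmp/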



Copy a file to the external table path

hive> load data inpath '/tmp/emptmp/emp2.txt' into table empexternal;
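
Because the table is external with an explicit location, simply copying a file into that directory also works; Hive picks it up on the next query (the source path below is illustrative):

hadoop fs -copyFromLocal /home/ram/emp2.txt /tmp/emptmp/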


Partitions

source: http://hadooptutorial.info/partitioning-in-hive/


hive> CREATE TABLE partitioned_user(
    > firstname VARCHAR(64),
    > lastname  VARCHAR(64),
    > address   STRING,
    > city   VARCHAR(64),
    > post      STRING,
    > phone1    VARCHAR(64),
    > phone2    STRING,
    > email     STRING,
    > web       STRING
    > )
    > PARTITIONED BY (country VARCHAR(64), state VARCHAR(64))
    > row format delimited
    > fields terminated by ','
    > STORED AS TEXTFILE;
OK
Time taken: 0.139 seconds

USCA.txt

Rebbecca,Didio,171 E 24th St,Leith,7315,03-8174-9123,0458-665-290,rebbecca.didio@didio.com.au,http://www.brandtjonathanfesq.com.au
Stevie,Hallo,22222 Acoma St,Proston,4613,07-9997-3366,0497-622-620,stevie.hallo@hotmail.com,http://www.landrumtemporaryservices.com.au
Mariko,Stayer,534 Schoenborn St #51,Hamel,6215,08-5558-9019,0427-885-282,mariko_stayer@hotmail.com,http://www.inabinetmacreesq.com.au


hive> LOAD DATA LOCAL INPATH '/home/ram/USCA.txt'
    >       INTO TABLE partitioned_user
    >       PARTITION (country = 'US', state = 'CA');


Loading data to table default.partitioned_user partition (country=US, state=CA)
Partition default.partitioned_user{country=US, state=CA} stats: [numFiles=1, numRows=0, totalSize=400, rawDataSize=0]
OK
Time taken: 0.619 seconds
 

hive> select firstname, city, state, country, post from partitioned_user;
OK
Rebbecca    Leith    CA    US    7315
Stevie    Proston    CA    US    4613
Mariko    Hamel    CA    US    6215
Time taken: 0.238 seconds, Fetched: 3 row(s)
 

hive>

ram@ram:~$ hadoop fs -ls /user/hive/warehouse/partitioned_user/country=US/state=CA
Found 1 items
-rwxrwxr-x   3 ram 400 2015-09-01 09:50 /user/hive/warehouse/partitioned_user/country=US/state=CA/USCA.txt
ram@ram:~$

 

INDCHN.txt
Ramkumar,Gopal,171 E 24th St,Leith,7315,03-8174-9123,0458-665-290,rebbecca.didio@didio.com.au,http://www.brandtjonathanfesq.com.au
Nagarajan,Palani,22222 Acoma St,Proston,4613,07-9997-3366,0497-622-620,stevie.hallo@hotmail.com,http://www.landrumtemporaryservices.com.au
Karthik,Bhaskaran,534 Schoenborn St #51,Hamel,6215,08-5558-9019,0427-885-282,mariko_stayer@hotmail.com,http://www.inabinetmacreesq.com.au


hive> LOAD DATA LOCAL INPATH '/home/ram/INDCHN.txt'
    > INTO TABLE partitioned_user
    > PARTITION (country = 'IND', state = 'CHN');
Loading data to table default.partitioned_user partition (country=IND, state=CHN)
Partition default.partitioned_user{country=IND, state=CHN} stats: [numFiles=1, numRows=0, totalSize=409, rawDataSize=0]
OK
Time taken: 0.614 seconds


hive> select firstname, city, state, country, post from partitioned_user;
OK
Ramkumar    Leith    CHN    IND    7315
Nagarajan    Proston    CHN    IND    4613
Karthik    Hamel    CHN    IND    6215
Rebbecca    Leith    CA    US    7315
Stevie    Proston    CA    US    4613
Mariko    Hamel    CA    US    6215
Time taken: 0.238 seconds, Fetched: 6 row(s)




hive> select firstname, city, state, country, post from partitioned_user
    > where country = 'IND';

OK
Ramkumar    Leith    CHN    IND    7315
Nagarajan    Proston    CHN    IND    4613
Karthik    Hamel    CHN    IND    6215
Time taken: 0.584 seconds, Fetched: 3 row(s)
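
Because country and state are partition columns, the WHERE clause above prunes the scan to the country=IND/state=CHN directory only. The partitions created so far can be listed with SHOW PARTITIONS (output reconstructed from the two loads above):

hive> show partitions partitioned_user;
country=IND/state=CHN
country=US/state=CA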








INSERT FROM ANOTHER TABLE

 

hive> CREATE TABLE user_demo(
    > firstname VARCHAR(64),
    > lastname  VARCHAR(64),
    > address   STRING,
    > city   VARCHAR(64),
    > post      STRING,
    > phone1    VARCHAR(64),
    > phone2    STRING,
    > email     STRING,
    > web       STRING,
    > country varchar(64),
    > state varchar(64)
    > )
    > row format delimited
    > fields terminated by ','
    > STORED AS TEXTFILE;
OK
Time taken: 0.163 seconds
 

hive> insert overwrite table user_demo
    > select * from partitioned_user
    > where firstname = 'Nagarajan' or firstname = 'Mariko';
 

Query ID = ram_20150901102328_85ea0fcb-86d4-4aa4-8d5a-6a5eb7e997c3
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1441075729906_0001, Tracking URL = http://ram:8088/proxy/application_1441075729906_0001/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1441075729906_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-09-01 10:23:39,660 Stage-1 map = 0%,  reduce = 0%
2015-09-01 10:23:48,262 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.97 sec
MapReduce Total cumulative CPU time: 2 seconds 970 msec
Ended Job = job_1441075729906_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://localhost:9000/user/hive/warehouse/user_demo/.hive-staging_hive_2015-09-01_10-23-28_494_5725270058421044549-1/-ext-10000
Loading data to table default.user_demo
Table default.user_demo stats: [numFiles=1, numRows=2, totalSize=287, rawDataSize=285]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 2.97 sec   HDFS Read: 7370 HDFS Write: 361 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 970 msec
OK
Time taken: 22.406 seconds
hive>
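
One detail worth noting: on a partitioned table, SELECT * returns the partition columns (country, state) after the regular columns, which is why the result lines up with the last two columns of user_demo. A quick check of the copied rows (expected output, based on the filter above):

hive> select firstname, country, state from user_demo;
Nagarajan    IND    CHN
Mariko    US    CA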

Saturday, August 29, 2015

How to Install Pig on Ubuntu 14.04

1. Download the Pig tar file from the location below:
http://mirrors.sonic.net/apache/pig/latest/

2. Extract the tar file in the Downloads folder (assuming the downloaded archive is named pig-0.15.0.tar.gz):
tar -xzf ~/Downloads/pig-0.15.0.tar.gz -C ~/Downloads

3. Create a folder for Pig:
sudo mkdir /usr/lib/pig

4. Move the extracted folder to /usr/lib/pig:
sudo mv Downloads/pig-0.15.0 /usr/lib/pig/

5. Update .bashrc with the Pig home path:

export PIG_HOME=/usr/lib/pig/pig-0.15.0
export PATH=$PATH:$PIG_HOME/bin

6. Apply the change by sourcing .bashrc:
source ~/.bashrc

7. Type 'pig' to open the Grunt shell:
pig
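
Note: Pig also needs JAVA_HOME set. If it is not already exported in .bashrc, a line like the one below should work (the path assumes the default OpenJDK 7 location on 64-bit Ubuntu 14.04):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64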

Ubuntu 14.04 post-installation add-ons


sudo apt-get update && sudo apt-get upgrade
sudo apt-get install aptitude synaptic gdebi-core
sudo apt-get install flashplugin-installer
sudo apt-get install vlc
sudo apt-get install xubuntu-restricted-extras libavcodec-extra
sudo gdebi skype-ubuntu-precise_4.2.0.13-1_i386.deb
sudo apt-get install uget
sudo apt-get install rar unrar
sudo apt-get install kde-wallpapers

Monday, August 3, 2015

Steps to Install Cloudera 5.3.2 on Ubuntu 14.04 single node cluster

Install CDH 5.3.2 using Cloudera Manager

Prerequisites

1. Install Java

sudo apt-get update
sudo apt-get install openjdk-7-jdk
java -version


2. Install and configure ssh
sudo apt-get install ssh
ssh-keygen -t rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
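
To verify passwordless SSH works (tightening the key file permissions first, in case they are too open):

chmod 600 ~/.ssh/authorized_keys
ssh localhost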

3. Configure passwordless sudo (here 'ram' refers to the login user / the machine name; using visudo instead of gedit is safer, since it validates the file):

su root

gedit /etc/sudoers
ram ALL=(ALL) NOPASSWD: ALL

4. Update /etc/hosts as:
127.0.0.1 localhost
127.0.0.1 ram


Cloudera Manager (CDH) download link

http://www.cloudera.com/content/cloudera/en/downloads/cloudera_manager/cm-5-4-3.html

Steps

Prerequisites: multiple, Internet-connected Linux machines with SSH access and significant free space in /var and /opt.

    $ wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
    $ chmod u+x cloudera-manager-installer.bin
    $ sudo ./cloudera-manager-installer.bin


----------------------------

Error 1: failed to detect root privileges

Fix:

su root

gedit /etc/sudoers
ram ALL=(ALL) NOPASSWD: ALL

Error 2:
Installation failed. Failed to receive heartbeat from agent.
Ensure that the host's hostname is configured properly.
Ensure that port 7182 is accessible on the Cloudera Manager server (check firewall rules).
Ensure that ports 9000 and 9001 are free on the host being added.

Fix:

/etc/hosts looked like this when I had the problem:
127.0.0.1 localhost
127.0.1.1 ubuntu

I changed it to:
127.0.0.1 localhost
127.0.0.1 ubuntu


Error 3:

Failed to perform First Run of services.
Command (22) has failed
Failed to execute command Deploy Client Configuration on service HDFS
Failed to execute command Deploy Client Configuration on service Solr
Failed to execute command Deploy Client Configuration on service HBase

Fix: follow the thread below:
https://community.cloudera.com/t5/Cloudera-Manager-Installation/Error-during-service-installation/td-p/7542/page/3

Error 4:
Port 7180 issue (connection refused)

Troubleshooting:

ps -ef | grep 7180
sudo netstat -anp | grep 7180
sudo curl localhost:7180
sudo service cloudera-scm-server restart

Fix:
sudo ufw allow 7180

Uninstall Cloudera Manager (including package and DB configurations)
sudo /usr/share/cmf/uninstall-cloudera-manager.sh
sudo rm -r /etc/cloudera-scm-server/
sudo rm -r /etc/cloudera-scm-agent/