Wednesday, June 17, 2015

Installing Neo4j 2.2.2 on Ubuntu 14.04

ram@ram-pc:~$ sudo apt-get update && sudo apt-get install python-software-properties

[sudo] password for ram:
Ign http://extras.ubuntu.com trusty InRelease
Ign http://security.ubuntu.com trusty-security InRelease                      
Hit http://extras.ubuntu.com trusty Release.gpg                               
Ign http://in.archive.ubuntu.com trusty InRelease                             
Ign http://repo.mongodb.org trusty/mongodb-org/3.0 InRelease                  
Hit http://extras.ubuntu.com trusty Release                                   
Hit http://repo.mongodb.org trusty/mongodb-org/3.0 Release.gpg                
Get:1 http://security.ubuntu.com trusty-security Release.gpg [933 B]          
Ign http://in.archive.ubuntu.com trusty-updates InRelease                     
Hit http://extras.ubuntu.com trusty/main Sources                              
Hit http://repo.mongodb.org trusty/mongodb-org/3.0 Release                    
Get:2 http://security.ubuntu.com trusty-security Release [63.5 kB]            
Hit http://extras.ubuntu.com trusty/main amd64 Packages                       
Ign http://in.archive.ubuntu.com trusty-backports InRelease                   
Hit http://extras.ubuntu.com trusty/main i386 Packages                        
Hit http://in.archive.ubuntu.com trusty Release.gpg                           
Get:3 http://in.archive.ubuntu.com trusty-updates Release.gpg [933 B]         
Hit http://in.archive.ubuntu.com trusty-backports Release.gpg                 
Hit http://in.archive.ubuntu.com trusty Release                               
Get:4 http://security.ubuntu.com trusty-security/main Sources [85.8 kB]       
Get:5 http://in.archive.ubuntu.com trusty-updates Release [63.5 kB]           
Ign http://extras.ubuntu.com trusty/main Translation-en_IN                    
Ign http://extras.ubuntu.com trusty/main Translation-en                       
Get:6 http://security.ubuntu.com trusty-security/restricted Sources [2,061 B] 
Hit http://in.archive.ubuntu.com trusty-backports Release                     
Get:7 http://security.ubuntu.com trusty-security/universe Sources [25.7 kB]   
Hit http://in.archive.ubuntu.com trusty/main Sources                          
Get:8 http://security.ubuntu.com trusty-security/multiverse Sources [2,333 B] 
Hit http://in.archive.ubuntu.com trusty/restricted Sources                    
Get:9 http://security.ubuntu.com trusty-security/main amd64 Packages [299 kB] 
Hit http://in.archive.ubuntu.com trusty/universe Sources                      
Hit http://repo.mongodb.org trusty/mongodb-org/3.0/multiverse amd64 Packages  
Hit http://in.archive.ubuntu.com trusty/multiverse Sources                    
Hit http://repo.mongodb.org trusty/mongodb-org/3.0/multiverse i386 Packages   
Hit http://in.archive.ubuntu.com trusty/main amd64 Packages                   
Hit http://in.archive.ubuntu.com trusty/restricted amd64 Packages             
Hit http://in.archive.ubuntu.com trusty/universe amd64 Packages               
Get:10 http://security.ubuntu.com trusty-security/restricted amd64 Packages [8,875 B]
Hit http://in.archive.ubuntu.com trusty/multiverse amd64 Packages             
Get:11 http://security.ubuntu.com trusty-security/universe amd64 Packages [108 kB]
Hit http://in.archive.ubuntu.com trusty/main i386 Packages                    
Get:12 http://security.ubuntu.com trusty-security/multiverse amd64 Packages [3,686 B]
Hit http://in.archive.ubuntu.com trusty/restricted i386 Packages              
Get:13 http://security.ubuntu.com trusty-security/main i386 Packages [285 kB] 
Hit http://in.archive.ubuntu.com trusty/universe i386 Packages                
Hit http://in.archive.ubuntu.com trusty/multiverse i386 Packages              
Ign http://repo.mongodb.org trusty/mongodb-org/3.0/multiverse Translation-en_IN
Hit http://in.archive.ubuntu.com trusty/main Translation-en                   
Ign http://repo.mongodb.org trusty/mongodb-org/3.0/multiverse Translation-en  
Get:14 http://security.ubuntu.com trusty-security/restricted i386 Packages [8,846 B]
Hit http://in.archive.ubuntu.com trusty/multiverse Translation-en             
Get:15 http://security.ubuntu.com trusty-security/universe i386 Packages [108 kB]
Hit http://in.archive.ubuntu.com trusty/restricted Translation-en             
Get:16 http://security.ubuntu.com trusty-security/multiverse i386 Packages [3,841 B]
Hit http://security.ubuntu.com trusty-security/main Translation-en            
Hit http://in.archive.ubuntu.com trusty/universe Translation-en               
Hit http://security.ubuntu.com trusty-security/multiverse Translation-en      
Get:17 http://in.archive.ubuntu.com trusty-updates/main Sources [207 kB]      
Hit http://security.ubuntu.com trusty-security/restricted Translation-en      
Hit http://security.ubuntu.com trusty-security/universe Translation-en        
Get:18 http://in.archive.ubuntu.com trusty-updates/restricted Sources [3,433 B]
Get:19 http://in.archive.ubuntu.com trusty-updates/universe Sources [121 kB]  
Get:20 http://in.archive.ubuntu.com trusty-updates/multiverse Sources [5,143 B]
Get:21 http://in.archive.ubuntu.com trusty-updates/main amd64 Packages [541 kB]
Get:22 http://in.archive.ubuntu.com trusty-updates/restricted amd64 Packages [11.8 kB]
Get:23 http://in.archive.ubuntu.com trusty-updates/universe amd64 Packages [287 kB]
Get:24 http://in.archive.ubuntu.com trusty-updates/multiverse amd64 Packages [12.0 kB]
Get:25 http://in.archive.ubuntu.com trusty-updates/main i386 Packages [528 kB]
Ign http://ppa.launchpad.net trusty InRelease                                 
Hit http://ppa.launchpad.net trusty Release.gpg                               
Hit http://ppa.launchpad.net trusty Release                                   
Get:26 http://in.archive.ubuntu.com trusty-updates/restricted i386 Packages [11.8 kB]
Hit http://ppa.launchpad.net trusty/main amd64 Packages                       
Hit http://ppa.launchpad.net trusty/main i386 Packages                        
Hit http://ppa.launchpad.net trusty/main Translation-en                       
Get:27 http://in.archive.ubuntu.com trusty-updates/universe i386 Packages [288 kB]
Get:28 http://in.archive.ubuntu.com trusty-updates/multiverse i386 Packages [12.1 kB]
Hit http://in.archive.ubuntu.com trusty-updates/main Translation-en           
Hit http://in.archive.ubuntu.com trusty-updates/multiverse Translation-en     
Hit http://in.archive.ubuntu.com trusty-updates/restricted Translation-en     
Get:29 http://in.archive.ubuntu.com trusty-updates/universe Translation-en [150 kB]
Hit http://in.archive.ubuntu.com trusty-backports/main Sources                
Hit http://in.archive.ubuntu.com trusty-backports/restricted Sources          
Hit http://in.archive.ubuntu.com trusty-backports/universe Sources            
Hit http://in.archive.ubuntu.com trusty-backports/multiverse Sources          
Hit http://in.archive.ubuntu.com trusty-backports/main amd64 Packages         
Hit http://in.archive.ubuntu.com trusty-backports/restricted amd64 Packages   
Hit http://in.archive.ubuntu.com trusty-backports/universe amd64 Packages     
Hit http://in.archive.ubuntu.com trusty-backports/multiverse amd64 Packages   
Hit http://in.archive.ubuntu.com trusty-backports/main i386 Packages          
Hit http://in.archive.ubuntu.com trusty-backports/restricted i386 Packages    
Hit http://in.archive.ubuntu.com trusty-backports/universe i386 Packages      
Hit http://in.archive.ubuntu.com trusty-backports/multiverse i386 Packages    
Hit http://in.archive.ubuntu.com trusty-backports/main Translation-en         
Hit http://in.archive.ubuntu.com trusty-backports/multiverse Translation-en   
Hit http://in.archive.ubuntu.com trusty-backports/restricted Translation-en   
Hit http://in.archive.ubuntu.com trusty-backports/universe Translation-en     
Ign http://in.archive.ubuntu.com trusty/main Translation-en_IN                
Ign http://in.archive.ubuntu.com trusty/multiverse Translation-en_IN          
Ign http://in.archive.ubuntu.com trusty/restricted Translation-en_IN          
Ign http://in.archive.ubuntu.com trusty/universe Translation-en_IN            
Fetched 3,248 kB in 27s (117 kB/s)                                            
Reading package lists... Done
Reading package lists... Done
Building dependency tree      
Reading state information... Done
ram@ram-pc:~$


ram@ram-pc:~$ wget -O - http://debian.neo4j.org/neotechnology.gpg.key >> key.pgp
--2015-06-17 22:31:49--  http://debian.neo4j.org/neotechnology.gpg.key
Resolving debian.neo4j.org (debian.neo4j.org)... 52.0.233.188
Connecting to debian.neo4j.org (debian.neo4j.org)|52.0.233.188|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1679 (1.6K) [application/octet-stream]
Saving to: ‘STDOUT’

100%[======================================================================================================>] 1,679       --.-K/s   in 0s     

2015-06-17 22:31:49 (135 MB/s) - written to stdout [1679/1679]

ram@ram-pc:~$

ram@ram-pc:~$ sudo apt-key add key.pgp
OK

ram@ram-pc:~$ sudo apt-get update && sudo apt-get install neo4j
Ign http://extras.ubuntu.com trusty InRelease
Ign http://in.archive.ubuntu.com trusty InRelease                             
Ign http://repo.mongodb.org trusty/mongodb-org/3.0 InRelease                  
Hit http://extras.ubuntu.com trusty Release.gpg
...
Ign http://in.archive.ubuntu.com trusty/universe Translation-en_IN
Fetched 2,374 kB in 17s (132 kB/s)                                            
Reading package lists... Done
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following extra packages will be installed:
  daemon

The following NEW packages will be installed:
  daemon neo4j
0 upgraded, 2 newly installed, 0 to remove and 304 not upgraded.
Need to get 53.6 MB of archives.
After this operation, 62.8 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://debian.neo4j.org/repo/ stable/ neo4j 2.2.2 [53.5 MB]
Get:2 http://in.archive.ubuntu.com/ubuntu/ trusty/universe daemon amd64 0.6.4-1 [98.2 kB]
Fetched 53.6 MB in 1min 15s (709 kB/s)                                        
Selecting previously unselected package daemon.
(Reading database ... 169071 files and directories currently installed.)
Preparing to unpack .../daemon_0.6.4-1_amd64.deb ...
Unpacking daemon (0.6.4-1) ...
Selecting previously unselected package neo4j.
Preparing to unpack .../archives/neo4j_2.2.2_all.deb ...
Unpacking neo4j (2.2.2) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up daemon (0.6.4-1) ...
Setting up neo4j (2.2.2) ...
Adding system user `neo4j' (UID 119) ...
Adding new user `neo4j' (UID 119) with group `nogroup' ...
Not creating home directory `/var/lib/neo4j'.
 Adding system startup for /etc/init.d/neo4j-service ...
   /etc/rc0.d/K20neo4j-service -> ../init.d/neo4j-service
   /etc/rc1.d/K20neo4j-service -> ../init.d/neo4j-service
   /etc/rc6.d/K20neo4j-service -> ../init.d/neo4j-service
   /etc/rc2.d/S20neo4j-service -> ../init.d/neo4j-service
   /etc/rc3.d/S20neo4j-service -> ../init.d/neo4j-service
   /etc/rc4.d/S20neo4j-service -> ../init.d/neo4j-service
   /etc/rc5.d/S20neo4j-service -> ../init.d/neo4j-service
WARNING: Max 1024 open files allowed, minimum of 40 000 recommended. See the Neo4j manual.
Starting Neo4j Server...WARNING: not changing user
process [6662]... waiting for server to be ready........... OK.
http://localhost:7474/ is ready.
Processing triggers for ureadahead (0.100.0-16) ...
ram@ram-pc:~$

ram@ram-pc:~$ sudo service neo4j-service restart
 * Restarting Neo4j Graph Database neo4j
WARNING: Max 1024 open files allowed, minimum of 40 000 recommended. See the Neo4j manual.
Starting Neo4j Server...WARNING: not changing user
process [6914]... waiting for server to be ready...... OK.
http://localhost:7474/ is ready.
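
The "Max 1024 open files" warning can be fixed by raising the file-descriptor limit for the neo4j user. A minimal sketch, assuming the stock PAM setup on Ubuntu 14.04 (the 40 000 figure comes from the warning itself; see the Neo4j manual for the recommended procedure):

ram@ram-pc:~$ sudo sh -c 'echo "neo4j soft nofile 40000" >> /etc/security/limits.conf'
ram@ram-pc:~$ sudo sh -c 'echo "neo4j hard nofile 40000" >> /etc/security/limits.conf'

Depending on the setup, you may also need to uncomment the pam_limits.so session line in /etc/pam.d/su before the init script picks up the new limit.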

ram@ram-pc:~$ service neo4j-service status
 * neo4j is running

ram@ram-pc:~$ neo4j-shell
Welcome to the Neo4j Shell! Enter 'help' for a list of commands
NOTE: Remote Neo4j graph database service 'shell' at port 1337

neo4j-sh (?)$ help
Available commands: alias begin cd commit create cypher dbinfo drop dump env explain export gsh help index jsh load ls man match merge mknode mkrel mv optional paths planner profile pwd return rm rmnode rmrel rollback schema set start trav unwind using with
Use man <command> for info about each command.
neo4j-sh (?)$
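
As a quick smoke test, you can create and read back a node straight from this shell. A minimal sketch (the Person label and name property are made-up examples; Cypher statements end with a semicolon):

neo4j-sh (?)$ CREATE (n:Person {name: 'Ram'});
neo4j-sh (?)$ MATCH (n:Person) RETURN n.name;

The browser interface at http://localhost:7474/ accepts the same Cypher statements.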



Reference:
http://codiply.com/blog/standalone-neo4j-server-setup-on-ubuntu-14-04

Redis 3.0.2 shell commands on Ubuntu 14.04

ram@ram-pc:~$ redis-cli INFO
# Server
redis_version:3.0.2
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:9a95cedf214a3630
redis_mode:standalone
os:Linux 3.16.0-30-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.2
process_id:5171
run_id:37b0cd8c28fc3303789eac3e2b08b62866a25804
tcp_port:6379
uptime_in_seconds:623
uptime_in_days:0
hz:10
lru_clock:8494215
config_file:/etc/redis/redis.conf

# Clients
connected_clients:1
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:508784
used_memory_human:496.86K
used_memory_rss:7081984
used_memory_peak:508784
used_memory_peak_human:496.86K
used_memory_lua:36864
mem_fragmentation_ratio:13.92
mem_allocator:jemalloc-3.6.0

# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1434556952
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats
total_connections_received:11
total_commands_processed:13
instantaneous_ops_per_sec:0
total_net_input_bytes:330
total_net_output_bytes:15769
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.41
used_cpu_user:0.17
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Cluster
cluster_enabled:0

# Keyspace
ram@ram-pc:~$



ram@ram-pc:~$ redis-cli

127.0.0.1:6379> help
redis-cli 3.0.2
Type: "help @<group>" to get a list of commands in <group>
      "help <command>" for help on <command>
      "help <tab>" to get a list of possible help topics
      "quit" to exit

127.0.0.1:6379> CONFIG GET *
  1) "dbfilename"
  2) "dump.rdb"
  3) "requirepass"
  4) ""
  5) "masterauth"
  6) ""
  7) "unixsocket"
  8) ""
  9) "logfile"
..
..
127) "notify-keyspace-events"
128) ""
129) "bind"
130) "127.0.0.1"
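
CONFIG GET takes a glob pattern, so a single setting can be fetched without scrolling through the full list. A sketch (on a fresh install maxmemory reports 0, the compiled-in default; your value may differ):

127.0.0.1:6379> CONFIG GET maxmemory
1) "maxmemory"
2) "0"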

Single Key-Value Pair

127.0.0.1:6379[1]> set emp_name "ramkumar"
OK
127.0.0.1:6379[1]> set emp_name "Karthik"
OK
127.0.0.1:6379[1]> get emp_name
"Karthik"
127.0.0.1:6379[1]>
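
String keys can also be given a time-to-live, after which Redis deletes them automatically. A sketch continuing with the key above (60 seconds is an arbitrary choice):

127.0.0.1:6379[1]> expire emp_name 60
(integer) 1
127.0.0.1:6379[1]> ttl emp_name
(integer) 60

Once the TTL lapses, get emp_name returns (nil).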

Multiple Key-Value Pairs

127.0.0.1:6379[1]> mset emp_no "1001" emp_name "Ramkumar" emp_deptid "dept01" location "Chennai"
OK
127.0.0.1:6379[1]> mget emp_no emp_name emp_deptid
1) "1001"
2) "Ramkumar"
3) "dept01"
127.0.0.1:6379[1]>

Some Numeric Operations

127.0.0.1:6379[1]> set incr 1
OK
127.0.0.1:6379[1]> set counter 1
OK
127.0.0.1:6379[1]> incr counter
(integer) 2
127.0.0.1:6379[1]> incr counter
(integer) 3
127.0.0.1:6379[1]> get counter
"3"

Incrementing a string value results in an error:
127.0.0.1:6379[1]> set strcounter "a"
OK
127.0.0.1:6379[1]> incr strcounter
(error) ERR value is not an integer or out of range
127.0.0.1:6379[1]> set strcounter a
OK
127.0.0.1:6379[1]> incr strcounter
(error) ERR value is not an integer or out of range
127.0.0.1:6379[1]> set strcounter 1a
OK
127.0.0.1:6379[1]> incr strcounter
(error) ERR value is not an integer or out of range
127.0.0.1:6379[1]>
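
incrby and decr cover steps other than +1. A sketch, assuming counter still holds 3 from the session above:

127.0.0.1:6379[1]> incrby counter 10
(integer) 13
127.0.0.1:6379[1]> decr counter
(integer) 12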

Batch operation

127.0.0.1:6379[1]> MULTI
OK
127.0.0.1:6379[1]> set name "Karthik"
QUEUED
127.0.0.1:6379[1]> set salary 100
QUEUED
127.0.0.1:6379[1]> incr salary
QUEUED
127.0.0.1:6379[1]> exec
1) OK
2) OK
3) (integer) 101
127.0.0.1:6379[1]>
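
A queued batch can also be abandoned instead of executed: DISCARD flushes the queue and leaves the data untouched. A sketch, assuming salary still holds 101 from the batch above:

127.0.0.1:6379[1]> MULTI
OK
127.0.0.1:6379[1]> set salary 0
QUEUED
127.0.0.1:6379[1]> DISCARD
OK
127.0.0.1:6379[1]> get salary
"101"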


Namespaced keys

127.0.0.1:6379[1]> mset emp:personal:name "Ram" emp:personal:native "Chennai" emp:personal:education "MSC"
OK
127.0.0.1:6379[1]> mget emp:personal
1) (nil)
127.0.0.1:6379[1]> mget emp:personal:name emp:personal:education
1) "Ram"
2) "MSC"
127.0.0.1:6379[1]>

Alternatively, store the same data as a hash:

127.0.0.1:6379[1]> hmset emp:personal name "Karthik" native "hyd" education "MCA"
OK
127.0.0.1:6379[1]> hvals emp:personal
1) "Karthik"
2) "hyd"
3) "MCA"
127.0.0.1:6379[1]>

127.0.0.1:6379[1]> hkeys emp:personal
1) "name"
2) "native"
3) "education"
127.0.0.1:6379[1]>
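
Individual fields are read with hget, and hgetall returns field names and values interleaved. A sketch against the same hash:

127.0.0.1:6379[1]> hget emp:personal name
"Karthik"
127.0.0.1:6379[1]> hgetall emp:personal
1) "name"
2) "Karthik"
3) "native"
4) "hyd"
5) "education"
6) "MCA"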

Lists - ordered collections of values, which may repeat (can act as queues or stacks)

127.0.0.1:6379[1]> rpush emp:info 1001 ram dept51 gritt chennai 27000
(integer) 6

0 - starting index
-1 - last element (negative indexes count from the end)

127.0.0.1:6379[1]> lrange emp:info 0 -1
1) "1001"
2) "ram"
3) "dept51"
4) "gritt"
5) "chennai"
6) "27000"
127.0.0.1:6379[1]> lrange emp:info 2 2
1) "dept51"
127.0.0.1:6379[1]> lrange emp:info 2 3
1) "dept51"
2) "gritt"
127.0.0.1:6379[1]>

127.0.0.1:6379[1]> lrem emp:info 1 "gritt"
(integer) 1

127.0.0.1:6379[1]> lrange emp:info 0 -1
1) "1001"
2) "ram"
3) "dept51"
4) "chennai"
5) "27000"

127.0.0.1:6379[1]> lpop emp:info
"1001"
127.0.0.1:6379[1]> lpop emp:info
"ram"
127.0.0.1:6379[1]> lpop emp:info
"dept51"
127.0.0.1:6379[1]> lrange emp:info 0 -1
1) "chennai"
2) "27000"
127.0.0.1:6379[1]>
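
Whether a list behaves as a queue or a stack depends on which end you pop from. A sketch with a fresh, made-up key: rpush + lpop is FIFO (queue), rpush + rpop is LIFO (stack).

127.0.0.1:6379[1]> rpush jobs "job1" "job2"
(integer) 2
127.0.0.1:6379[1]> lpop jobs
"job1"
127.0.0.1:6379[1]> rpop jobs
"job2"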

Move keys between databases

127.0.0.1:6379[1]> select 2
OK
127.0.0.1:6379[2]> set name "ram"
OK
127.0.0.1:6379[2]> select 3
OK
127.0.0.1:6379[3]> get name
(nil)
127.0.0.1:6379[3]> select 2
OK
127.0.0.1:6379[2]> get name
"ram"
127.0.0.1:6379[2]> move name 3
(integer) 1
127.0.0.1:6379[2]> select 3
OK
127.0.0.1:6379[3]> get name
"ram"

reference:
Seven Databases in Seven Weeks by Eric Redmond and Jim R. Wilson

Installing Redis on Ubuntu 14.04

ram@ram-pc:~$  sudo add-apt-repository ppa:chris-lea/redis-server
[sudo] password for ram:
 Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
 More info: https://launchpad.net/~chris-lea/+archive/ubuntu/redis-server
Press [ENTER] to continue or ctrl-c to cancel adding it
gpg: keyring `/tmp/tmpydr5ohrk/secring.gpg' created
gpg: keyring `/tmp/tmpydr5ohrk/pubring.gpg' created
gpg: requesting key C7917B12 from hkp server keyserver.ubuntu.com
gpg: /tmp/tmpydr5ohrk/trustdb.gpg: trustdb created
gpg: key C7917B12: public key "Launchpad chrislea" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
OK

ram@ram-pc:~$  sudo apt-get update
Ign http://repo.mongodb.org trusty/mongodb-org/3.0 InRelease
Ign http://extras.ubuntu.com trusty InRelease                                 
Hit http://extras.ubuntu.com trusty Release.gpg                               
Ign http://security.ubuntu.com trusty-security InRelease                      
Ign http://in.archive.ubuntu.com trusty InRelease   
...
...
Ign http://in.archive.ubuntu.com trusty/restricted Translation-en_IN          
Ign http://in.archive.ubuntu.com trusty/universe Translation-en_IN            
Fetched 3,115 kB in 26s (116 kB/s)                                            
Reading package lists... Done

ram@ram-pc:~$  sudo apt-get install redis-server
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following extra packages will be installed:
  libjemalloc1 redis-tools
The following NEW packages will be installed:
  libjemalloc1 redis-server redis-tools
0 upgraded, 3 newly installed, 0 to remove and 297 not upgraded.
Need to get 485 kB of archives.
After this operation, 1,426 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu/ trusty/main libjemalloc1 amd64 3.6.0-1chl1~trusty1 [77.2 kB]
Get:2 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu/ trusty/main redis-tools amd64 3:3.0.2-1chl1~trusty1 [79.7 kB]
Get:3 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu/ trusty/main redis-server amd64 3:3.0.2-1chl1~trusty1 [329 kB]
Fetched 485 kB in 16s (28.6 kB/s)    
Selecting previously unselected package libjemalloc1.
(Reading database ... 169044 files and directories currently installed.)
Preparing to unpack .../libjemalloc1_3.6.0-1chl1~trusty1_amd64.deb ...
Unpacking libjemalloc1 (3.6.0-1chl1~trusty1) ...
Selecting previously unselected package redis-tools.
Preparing to unpack .../redis-tools_3%3a3.0.2-1chl1~trusty1_amd64.deb ...
Unpacking redis-tools (3:3.0.2-1chl1~trusty1) ...
Selecting previously unselected package redis-server.
Preparing to unpack .../redis-server_3%3a3.0.2-1chl1~trusty1_amd64.deb ...
Unpacking redis-server (3:3.0.2-1chl1~trusty1) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Processing triggers for ureadahead (0.100.0-16) ...
ureadahead will be reprofiled on next reboot
Setting up libjemalloc1 (3.6.0-1chl1~trusty1) ...
Setting up redis-tools (3:3.0.2-1chl1~trusty1) ...
Setting up redis-server (3:3.0.2-1chl1~trusty1) ...
Starting redis-server: redis-server.
Processing triggers for libc-bin (2.19-0ubuntu6.5) ...
Processing triggers for ureadahead (0.100.0-16) ...
ram@ram-pc:~$

ram@ram-pc:~$ redis-cli ping
PONG
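
redis-cli also runs single commands non-interactively, which is handy in scripts. A sketch with a made-up key:

ram@ram-pc:~$ redis-cli set greeting "hello"
OK
ram@ram-pc:~$ redis-cli get greeting
"hello"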

ram@ram-pc:~$ redis-cli
127.0.0.1:6379>

127.0.0.1:6379> shutdown
not connected> exit

ram@ram-pc:~$ sudo service redis-server restart
Stopping redis-server: redis-server.
Starting redis-server: redis-server.
ram@ram-pc:~$

ram@ram-pc:~$ redis-cli INFO
# Server
redis_version:3.0.2
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:9a95cedf214a3630
redis_mode:standalone
os:Linux 3.16.0-30-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.2
process_id:5171
run_id:37b0cd8c28fc3303789eac3e2b08b62866a25804
tcp_port:6379
uptime_in_seconds:264
uptime_in_days:0
hz:10
lru_clock:8493856
config_file:/etc/redis/redis.conf

# Clients
connected_clients:1
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0


reference:
http://linuxg.net/how-to-install-redis-server-2-8-17-on-ubuntu-14-04-ubuntu-12-04-and-derivative-systems/


Tuesday, June 16, 2015

MongoDB shell commands

Open the MongoDB shell
ram@ram-pc:~$ mongo
MongoDB shell version: 3.0.4
connecting to: test
Server has startup warnings:
2015-06-17T09:48:50.525+0530 I CONTROL  [initandlisten]
2015-06-17T09:48:50.525+0530 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2015-06-17T09:48:50.525+0530 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2015-06-17T09:48:50.525+0530 I CONTROL  [initandlisten]
2015-06-17T09:48:50.525+0530 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2015-06-17T09:48:50.525+0530 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2015-06-17T09:48:50.525+0530 I CONTROL  [initandlisten]

> db.help()
DB methods:
    db.adminCommand(nameOrDocument) - switches to 'admin' db, and runs command [ just calls db.runCommand(...) ]
    db.auth(username, password)
    db.cloneDatabase(fromhost)
    db.commandHelp(name) returns the help for the command
    db.copyDatabase(fromdb, todb, fromhost)
    db.createCollection(name, { size : ..., capped : ..., max : ... } )
    db.createUser(userDocument)
    db.currentOp() displays currently executing operations in the db
    db.dropDatabase()
    db.eval() - deprecated
    db.fsyncLock() flush data to disk and lock server for backups
    db.fsyncUnlock() unlocks server following a db.fsyncLock()
    db.getCollection(cname) same as db['cname'] or db.cname
    db.getCollectionInfos()
    db.getCollectionNames()
    db.getLastError() - just returns the err msg string
    db.getLastErrorObj() - return full status object
    db.getLogComponents()
    db.getMongo() get the server connection object
    db.getMongo().setSlaveOk() allow queries on a replication slave server
    db.getName()
    db.getPrevError()
    db.getProfilingLevel() - deprecated
    db.getProfilingStatus() - returns if profiling is on and slow threshold
    db.getReplicationInfo()
    db.getSiblingDB(name) get the db at the same server as this one
    db.getWriteConcern() - returns the write concern used for any operations on this db, inherited from server object if set
    db.hostInfo() get details about the server's host
    db.isMaster() check replica primary status
    db.killOp(opid) kills the current operation in the db
    db.listCommands() lists all the db commands
    db.loadServerScripts() loads all the scripts in db.system.js
    db.logout()
    db.printCollectionStats()
    db.printReplicationInfo()
    db.printShardingStatus()
    db.printSlaveReplicationInfo()
    db.dropUser(username)
    db.repairDatabase()
    db.resetError()
    db.runCommand(cmdObj) run a database command.  if cmdObj is a string, turns it into { cmdObj : 1 }
    db.serverStatus()
    db.setLogLevel(level,<component>)
    db.setProfilingLevel(level,<slowms>) 0=off 1=slow 2=all
    db.setWriteConcern( <write concern doc> ) - sets the write concern for writes to the db
    db.unsetWriteConcern( <write concern doc> ) - unsets the write concern for writes to the db
    db.setVerboseShell(flag) display extra information in shell output
    db.shutdownServer()
    db.stats()
    db.version() current version of the server
>

> db.stats()
{
    "db" : "test",
    "collections" : 0,
    "objects" : 0,
    "avgObjSize" : 0,
    "dataSize" : 0,
    "storageSize" : 0,
    "numExtents" : 0,
    "indexes" : 0,
    "indexSize" : 0,
    "fileSize" : 0,
    "ok" : 1
}
>
> show dbs
local  0.078GB

> use abccompany
switched to db abccompany

> db.getName()
abccompany

> show dbs
local  0.078GB

> db.stats()
{
    "db" : "abccompany",
    "collections" : 0,
    "objects" : 0,
    "avgObjSize" : 0,
    "dataSize" : 0,
    "storageSize" : 0,
    "numExtents" : 0,
    "indexes" : 0,
    "indexSize" : 0,
    "fileSize" : 0,
    "ok" : 1
}
>
> db.employees.insert({empId:24123,name:'Ramkumar',gender:'M',dept:'GMOT',location:'chennai',salaray:23500})
WriteResult({ "nInserted" : 1 })

> db.employees.insert({empId:24121,name:'Nagaraj',gender:'M',dept:'GRITT',location:'Mumbai',salaray:73500})
WriteResult({ "nInserted" : 1 })

> db.employees.insert({empId:24125,name:'Sandhya',gender:'F',dept:'GMOT',location:'NJ',salaray:1273500})
WriteResult({ "nInserted" : 1 })


> db.employees.insert({
... empId:24131,
... name:'Prakash',
... gender:'M',
... dateofbirth:new Date(1980,2,12,10,12),
... favmovies:['OKKanmani','JurasicWorld'],
... location:'NJ',
... salaray:33500})

WriteResult({ "nInserted" : 1 })

> db.employees.find({gender:'F'})

{ "_id" : ObjectId("5580fb46bec5eb9369552319"), "empId" : 24125, "name" : "Sandhya", "gender" : "F", "dept" : "GMOT", "location" : "NJ", "salaray" : 1273500 }
>

> db.employees.find({gender:'M', salaray:{$gt:50000}})

{ "_id" : ObjectId("5580fb03bec5eb9369552318"), "empId" : 24121, "name" : "Nagaraj", "gender" : "M", "dept" : "GRITT", "location" : "Mumbai", "salaray" : 73500 }

> db.employees.find( {gender:'M',     $or: [  {salaray:{$gt:50000}},   {dept:'GMOT'}]})

{ "_id" : ObjectId("5580fabebec5eb9369552317"), "empId" : 24123, "name" : "Ramkumar", "gender" : "M", "dept" : "GMOT", "location" : "chennai", "salaray" : 23500 }
{ "_id" : ObjectId("5580fb03bec5eb9369552318"), "empId" : 24121, "name" : "Nagaraj", "gender" : "M", "dept" : "GRITT", "location" : "Mumbai", "salaray" : 73500 }
>

> db.employees.find({"_id" : ObjectId("5580fb03bec5eb9369552318")})
{ "_id" : ObjectId("5580fb03bec5eb9369552318"), "empId" : 24121, "name" : "Nagaraj", "gender" : "M", "dept" : "GRITT", "location" : "Mumbai", "salaray" : 73500 }
>
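
find() accepts an optional second argument, a projection, that limits which fields come back (1 includes a field, 0 suppresses it; _id is returned unless suppressed explicitly). A sketch against the data above, which should print only the three male employees' names:

> db.employees.find({gender:'M'}, {name:1, _id:0})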

> db.employees.update({name:"Nagaraj"}, {location: "NY"})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
>

Note: because the update document above contains no update operators, MongoDB replaces the whole document (only _id is kept), as the next find() shows. Use $set, as in the later example, to modify a single field.

> db.employees.find()
{ "_id" : ObjectId("5580fabebec5eb9369552317"), "empId" : 24123, "name" : "Ramkumar", "gender" : "M", "dept" : "GMOT", "location" : "chennai", "salaray" : 23500 }
{ "_id" : ObjectId("5580fb03bec5eb9369552318"), "location" : "NY" }
{ "_id" : ObjectId("5580fb46bec5eb9369552319"), "empId" : 24125, "name" : "Sandhya", "gender" : "F", "dept" : "GMOT", "location" : "NJ", "salaray" : 1273500 }
{ "_id" : ObjectId("5580fc1dbec5eb936955231a"), "empId" : 24131, "name" : "Prakash", "gender" : "M", "dateofbirth" : ISODate("1980-03-12T04:42:00Z"), "favmovies" : [ "OKKanmani", "JurasicWorld" ], "location" : "NJ", "salaray" : 33500 }
>

> db.employees.update({name: 'Ramkumar'}, {$set: {location:'London'}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

> db.employees.findOne()
{
    "_id" : ObjectId("5580fabebec5eb9369552317"),
    "empId" : 24123,
    "name" : "Ramkumar",
    "gender" : "M",
    "dept" : "GMOT",
    "location" : "London",
    "salaray" : 23500
}
>

> db.employees.remove({name: 'Ramkumar'})
WriteResult({ "nRemoved" : 1 })

> db.employees.remove({})
WriteResult({ "nRemoved" : 3 })

> db.employees.find()

> db.employees.drop()

Steps To Install MongoDB on Ubuntu 14.04

1. Import the public key used by the package management system. This ensures
package consistency and authenticity, since distributors are required to sign packages with GPG keys.


ram@ram-pc:~$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10

[sudo] password for ram:
Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --homedir /tmp/tmp.T0369j2BIX --no-auto-check-trustdb --trust-model always --keyring /etc/apt/trusted.gpg --primary-keyring /etc/apt/trusted.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
gpg: requesting key 7F0CEB10 from hkp server keyserver.ubuntu.com
gpg: key 7F0CEB10: public key "Richard Kreuter <richard@10gen.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
ram@ram-pc:~$
 

2. Create a list file for MongoDB.

ram@ram-pc:~$ echo "deb http://repo.mongodb.org/apt/ubuntu "$(lsb_release -sc)"/mongodb-org/3.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.0.list
deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.0 multiverse
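
You can confirm the entry was written (a quick check; tee already echoed the same line above):

ram@ram-pc:~$ cat /etc/apt/sources.list.d/mongodb-org-3.0.list
deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.0 multiverse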



3. Reload the local package database.

ram@ram-pc:~$ sudo apt-get update

4. Install the MongoDB packages.


ram@ram-pc:~$ sudo apt-get install -y mongodb-org
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following extra packages will be installed:
  mongodb-org-mongos mongodb-org-server mongodb-org-shell mongodb-org-tools
The following NEW packages will be installed:
  mongodb-org mongodb-org-mongos mongodb-org-server mongodb-org-shell
  mongodb-org-tools
0 upgraded, 5 newly installed, 0 to remove and 290 not upgraded.
Need to get 50.7 MB of archives.
After this operation, 157 MB of additional disk space will be used.
Get:1 http://repo.mongodb.org/apt/ubuntu/ trusty/mongodb-org/3.0/multiverse mongodb-org-shell amd64 3.0.4 [4,245 kB]
Get:2 http://repo.mongodb.org/apt/ubuntu/ trusty/mongodb-org/3.0/multiverse mongodb-org-server amd64 3.0.4 [8,607 kB]
Get:3 http://repo.mongodb.org/apt/ubuntu/ trusty/mongodb-org/3.0/multiverse mongodb-org-mongos amd64 3.0.4 [4,029 kB]
Get:4 http://repo.mongodb.org/apt/ubuntu/ trusty/mongodb-org/3.0/multiverse mongodb-org-tools amd64 3.0.4 [33.8 MB]
Get:5 http://repo.mongodb.org/apt/ubuntu/ trusty/mongodb-org/3.0/multiverse mongodb-org amd64 3.0.4 [3,616 B]
Fetched 50.7 MB in 37s (1,355 kB/s)                                           
Selecting previously unselected package mongodb-org-shell.
(Reading database ... 168992 files and directories currently installed.)
Preparing to unpack .../mongodb-org-shell_3.0.4_amd64.deb ...
Unpacking mongodb-org-shell (3.0.4) ...
Selecting previously unselected package mongodb-org-server.
Preparing to unpack .../mongodb-org-server_3.0.4_amd64.deb ...
Unpacking mongodb-org-server (3.0.4) ...
Selecting previously unselected package mongodb-org-mongos.
Preparing to unpack .../mongodb-org-mongos_3.0.4_amd64.deb ...
Unpacking mongodb-org-mongos (3.0.4) ...
Selecting previously unselected package mongodb-org-tools.
Preparing to unpack .../mongodb-org-tools_3.0.4_amd64.deb ...
Unpacking mongodb-org-tools (3.0.4) ...
Selecting previously unselected package mongodb-org.
Preparing to unpack .../mongodb-org_3.0.4_amd64.deb ...
Unpacking mongodb-org (3.0.4) ...
Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
Processing triggers for ureadahead (0.100.0-16) ...
ureadahead will be reprofiled on next reboot
Setting up mongodb-org-shell (3.0.4) ...
Setting up mongodb-org-server (3.0.4) ...
Adding system user `mongodb' (UID 117) ...
Adding new user `mongodb' (UID 117) with group `nogroup' ...
Not creating home directory `/home/mongodb'.
Adding group `mongodb' (GID 125) ...
Done.
Adding user `mongodb' to group `mongodb' ...
Adding user mongodb to group mongodb
Done.
mongod start/running, process 3003
Setting up mongodb-org-mongos (3.0.4) ...
Setting up mongodb-org-tools (3.0.4) ...
Processing triggers for ureadahead (0.100.0-16) ...
Setting up mongodb-org (3.0.4) ...

5. Alternatively, install a specific release of MongoDB by pinning each package version.

ram@ram-pc:~$ sudo apt-get install -y mongodb-org=3.0.4 mongodb-org-server=3.0.4 mongodb-org-shell=3.0.4 mongodb-org-mongos=3.0.4 mongodb-org-tools=3.0.4

Reading package lists... Done
Building dependency tree      
Reading state information... Done
mongodb-org is already the newest version.
mongodb-org-mongos is already the newest version.
mongodb-org-mongos set to manually installed.
mongodb-org-server is already the newest version.
mongodb-org-server set to manually installed.
mongodb-org-shell is already the newest version.
mongodb-org-shell set to manually installed.
mongodb-org-tools is already the newest version.
mongodb-org-tools set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 290 not upgraded.

6. Start the MongoDB service

ram@ram-pc:~$ sudo service mongod start
start: Job is already running: mongod
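
MongoDB on 14.04 runs under upstart, so the usual service verbs apply if you ever need to check or bounce the daemon (a sketch; output omitted):

ram@ram-pc:~$ sudo service mongod status
ram@ram-pc:~$ sudo service mongod restart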

7. Connect to the mongo shell



ram@ram-pc:~$ mongo
MongoDB shell version: 3.0.4
connecting to: test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
    http://docs.mongodb.org/
Questions? Try the support group
    http://groups.google.com/group/mongodb-user
Server has startup warnings:
2015-06-16T22:08:48.583+0530 I CONTROL  [initandlisten]
2015-06-16T22:08:48.583+0530 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2015-06-16T22:08:48.583+0530 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2015-06-16T22:08:48.583+0530 I CONTROL  [initandlisten]
2015-06-16T22:08:48.583+0530 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2015-06-16T22:08:48.583+0530 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2015-06-16T22:08:48.583+0530 I CONTROL  [initandlisten]
>


Source/Reference:
http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/
http://docs.mongodb.org/getting-started/shell/client/

Monday, June 15, 2015

HBase and Cassandra - Similarities and Differences

HBase vs Cassandra


Note: The entire content of this blog post is copied from the two sources below; please refer to them for more details.


Source: http://bigdatanoob.blogspot.in/2012/11/hbase-vs-cassandra.html

Point | HBase | Cassandra
CAP Theorem Focus | Consistency, Availability | Availability, Partition-Tolerance
Consistency | Strong | Eventual (Strong is Optional)
Single Write Master | Yes | No (R+W+1 to get Strong Consistency)
Optimized For | Reads | Writes
Main Data Structure | CF, RowKey, Name Value Pair Set | CF, RowKey, Name Value Pair Set
Dynamic Columns | Yes | Yes
Column Names as Data | Yes | Yes
Static Columns | No | Yes
RowKey Slices | Yes | No
Static Column Value Indexes | No | Yes
Sorted Column Names | Yes | Yes
Cell Versioning Support | Yes | No
Bloom Filters | Yes | Yes (only on Key)
CoProcessors | Yes | No
Triggers | Yes (Part of Coprocessor) | No
Push Down Predicates | Yes (Part of Coprocessor) | No
Atomic Compare and Set | Yes | No
Explicit Row Locks | Yes | No
Row Key Caching | Yes | Yes
Partitioning Strategy | Ordered Partitioning | Random Partitioning recommended
Rebalancing | Automatic | Not Needed with Random Partitioning
Availability | N-Replicas across Nodes | N-Replicas across Nodes
Data Node Failure | Graceful Degradation | Graceful Degradation
Data Node Failure - Replication | N-Replicas Preserved | (N-1) Replicas Preserved + Hinted Handoff
Data Node Restoration | Same as Node Addition | Requires Node Repair Admin-action
Data Node Addition | Rebalancing Automatic | Rebalancing Requires Token-Assignment Adjustment
Data Node Management | Simple (Roll In, Roll Out) | Human Admin Action Required
Cluster Admin Nodes | Zookeeper, NameNode, HMaster | All Nodes are Equal
SPOF | Now, all the Admin Nodes are Fault Tolerant | All Nodes are Equal
Write.ANY | No, but Replicas are Node Agnostic | Yes (Writes Never Fail if this option is used)
Write.ONE | Standard, HA, Strong Consistency | Yes (often used), HA, Weak Consistency
Write.QUORUM | No (not required) | Yes (often used with Read.QUORUM for Strong Consistency)
Write.ALL | Yes (performance penalty) | Yes (performance penalty, not HA)
Asynchronous WAN Replication | Yes, but it needs testing on corner cases | Yes (Replicas can span data centers)
Synchronous WAN Replication | No | Yes with Write.QUORUM or Write.EACH_QUORUM
Compression Support | Yes | Yes
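
A worked example for the QUORUM rows above: with replication factor N, reading R replicas and writing W replicas gives strong consistency whenever R + W > N. With N = 3, QUORUM reads and writes use R = W = 2, and 2 + 2 > 3, so every quorum read overlaps the latest quorum write.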









Foundations
HBase: HBase is based on BigTable (Google).
Cassandra: Cassandra is based on Dynamo (Amazon). It was initially developed at Facebook by former Amazon engineers, which is one reason why Cassandra supports multiple data centers. Rackspace is a big contributor to Cassandra due to its multi-data-center support.

Infrastructure
HBase: HBase uses the Hadoop infrastructure (Zookeeper, NameNode, HDFS). Organizations that will deploy Hadoop anyway may be comfortable leveraging their Hadoop knowledge by using HBase.
Cassandra: Cassandra started and evolved separately from Hadoop, and its infrastructure and operational knowledge requirements differ from Hadoop's. However, for analytics, many Cassandra deployments use Cassandra + Storm (which uses Zookeeper) and/or Cassandra + Hadoop.

Infrastructure Simplicity and SPOF
HBase: The HBase-Hadoop infrastructure has several "moving parts": Zookeeper, NameNode, HBase Master, and data nodes. Zookeeper is clustered and naturally fault tolerant; the NameNode needs to be clustered to be fault tolerant.
Cassandra: Cassandra uses a single node type. All nodes are equal and perform all functions, and any node can act as a coordinator, ensuring no SPOF. Adding Storm or Hadoop, of course, adds complexity to the infrastructure.

Read Intensive Use Cases
HBase: HBase is optimized for reads, supported by a single-write master and the resulting strict consistency model, as well as the use of Ordered Partitioning, which supports row scans. HBase is well suited for range-based scans.
Cassandra: Cassandra has excellent single-row read performance as long as eventual consistency semantics are sufficient for the use case. Cassandra quorum reads, which are required for strict consistency, will naturally be slower than HBase reads. Cassandra does not support range-based row scans, which may be limiting in certain use cases. Cassandra is well suited for single-row queries, or for selecting multiple rows based on a column-value index.

Multi-Data Center Support and Disaster Recovery
HBase: HBase provides asynchronous replication of an HBase cluster across a WAN. HBase clusters cannot be set up to achieve zero RPO, but in steady state HBase should be roughly failover-equivalent to any other DBMS that relies on asynchronous replication over a WAN. Fall-back processes and procedures (e.g. after failover) are TBD.
Cassandra: Cassandra's Random Partitioning provides row replication of a single row across a WAN, either asynchronously (write.ONE, write.LOCAL_QUORUM) or synchronously (write.QUORUM, write.ALL). Cassandra clusters can therefore be set up to achieve zero RPO, but each write will require at least one WAN ACK back to the coordinator to achieve this capability.

Write.ONE Durability
HBase: Writes are replicated in a pipeline fashion: the first data node for the region persists the write, then sends it to the next natural endpoint, and so on down the pipeline. HBase's commit log "acks" a write only after *all* of the nodes in the pipeline have written the data to their OS buffers. The first region server in the pipeline must also have persisted the write to its WAL.
Cassandra: Cassandra's coordinators send parallel write requests to all natural endpoints. The coordinator "acks" the write after exactly one natural endpoint has "acked" it, which means that node has also persisted the write to its WAL. The write may or may not have been committed to any other natural endpoint.

Ordered Partitioning
HBase: HBase only supports Ordered Partitioning. This means that rows for a CF are stored in RowKey order in HFiles, where each HFile contains a "block" or "shard" of all the rows in a CF. HFiles are distributed across all data nodes in the cluster.
Cassandra: Cassandra officially supports Ordered Partitioning, but no production user of Cassandra uses it, due to the "hot spots" it creates and the operational difficulties such hot spots cause. Random Partitioning is the only recommended Cassandra partitioning scheme, and rows are distributed across all nodes in the cluster.

RowKey Range Scans
HBase: Because of Ordered Partitioning, HBase queries can be formulated with partial start and end row keys, and can locate rows inclusive or exclusive of those partial row keys. The start and end row keys in a range scan need not even exist in HBase.
Cassandra: Because of Random Partitioning, partial row keys cannot be used with Cassandra; RowKeys must be known exactly. Counting rows in a CF is complicated. It is highly recommended that for these types of use cases, data be stored in columns in Cassandra, not in rows.

Linear Scalability for Large Tables and Range Scans
HBase: Due to Ordered Partitioning, HBase scales horizontally easily while still supporting row-key range scans.
Cassandra: If data is stored in columns in Cassandra to support range scans, the practical limit on row size is tens of megabytes; larger rows cause problems with compaction overhead and time.

Atomic Compare and Set
HBase: HBase supports Atomic Compare and Set, and supports transactions within a row.
Cassandra: Cassandra does not support Atomic Compare and Set. Counters require dedicated counter column families which, because of eventual consistency, require that all replicas in all natural endpoints be read and updated with ACK. However, hinted-handoff mechanisms can make even these built-in counters suspect for accuracy. FIFO queues are difficult (if not impossible) to implement with Cassandra.

Read Load Balancing - Single Row
HBase: HBase does not support read load balancing against a single row. A single row is served by exactly one region server at a time; other replicas are used only in case of a node failure. Scalability is primarily supported by partitioning, which statistically distributes reads of different rows across multiple data nodes.
Cassandra: Cassandra supports read load balancing against a single row. However, this is primarily supported by Read.ONE, and eventual consistency must be taken into consideration. Scalability is primarily supported by partitioning, which distributes reads of different rows across multiple data nodes.

Bloom Filters
HBase: Bloom filters can be used in HBase as another form of indexing. They work on the basis of RowKey or RowKey+ColumnName to reduce the number of data blocks HBase has to read to satisfy a query. Bloom filters may exhibit false positives (reading too much data), but never false negatives (reading not enough data).
Cassandra: Cassandra uses bloom filters for key lookup.

Triggers
HBase: Triggers are supported via the CoProcessor capability in HBase. They allow HBase to observe the get/put/delete events on a table (CF) and then execute the trigger logic. Triggers are coded as Java classes.
Cassandra: Cassandra does not support coprocessor-like functionality (as far as we know).

Secondary Indexes
HBase: HBase does not natively support secondary indexes, but one use case of triggers is that a trigger on a "put" can automatically keep a secondary index up to date, and therefore not put the burden on the application (client).
Cassandra: Cassandra supports secondary indexes on column families where the column name is known (not on dynamic columns).

Simple Aggregation
HBase: HBase CoProcessors support out-of-the-box simple aggregations: SUM, MIN, MAX, AVG, STD. Other aggregations can be built by defining Java classes to perform the aggregation.
Cassandra: Aggregations are not supported by the Cassandra nodes; the client must provide them. When the aggregation requirement spans multiple rows, Random Partitioning makes aggregation very difficult for the client. The recommendation is to use Storm or Hadoop for aggregations.

HIVE Integration
HBase: HIVE can access HBase tables directly (it uses de-serialization under the hood that is aware of the HBase file format).
Cassandra: Work in progress (https://issues.apache.org/jira/browse/CASSANDRA-4131).

PIG Integration
HBase: PIG has native support for writing into/reading from HBase.
Cassandra: Cassandra 0.7.4+.


Source:http://www.javaworld.com/article/2140805/big-data/big-data-showdown-cassandra-vs-hbase.html

Similarities


- both Cassandra and HBase are open source projects managed under the Apache Software Foundation
- both are available free under an Apache version 2 license
- Cassandra descends from both Bigtable and Amazon's Dynamo
- HBase describes itself as an "open source Bigtable implementation"

- Both Cassandra and HBase are NoSQL databases
- Generally, it means you cannot manipulate the database with SQL.
- However, Cassandra has implemented CQL (Cassandra Query Language), the syntax of which is obviously modeled after SQL.
- Both are designed to manage extremely large data sets (rows in the billions).
- Anything less, and you're advised to stick with an RDBMS


- Both are distributed databases, not only in how data is stored, but also in how the data can be accessed.
- Clients can connect to any node in the cluster and access any data.

- Both claim near linear scalability. Need to manage twice the data? Then double the number of nodes in your cluster

- Both safeguard data loss from cluster node failure via replication
- If the primary node fails, its data can still be fetched from one of the replica nodes.

- Both are referred to as column-oriented databases
- unlike a relational database, no two rows in a column-oriented database need have the same columns.

- you can add columns to a row on the fly
- it's unlikely you'll hit the limit even if you add tens of thousands of columns.

- Both implement similar write paths that begin by logging the write operation to a log file (the write-ahead log, or WAL) to ensure durability.
- The data is next written to a memory cache, then finally to disk via a large, sequential write (essentially a copy of the memory cache).
- The overall memory-and-disk data structure used by both Cassandra and HBase is more or less a log-structured merge tree.

- The disk component in Cassandra is the SSTable; in HBase it is the HFile.
- Both provide command-line shells implemented in JRuby. Both are written largely in Java


Differences:

1. Cassandra requires that you identify some nodes as seed nodes, which serve as concentration points for intercluster communication. Meanwhile, on HBase, you must press some nodes into serving as master nodes, whose job it is to monitor and coordinate the actions of region servers.
Thus, Cassandra guarantees high availability by allowing multiple seed nodes in a cluster, while HBase guarantees the same via standby master nodes -- one of which will become the new master should the current master fail.

2.
Cassandra uses the Gossip protocol for internode communications, and Gossip services are integrated with the Cassandra software.
HBase relies on Zookeeper -- an entirely separate distributed application -- to handle corresponding tasks

3. Cassandra lets you create additional, secondary indexes on column values. HBase does not have a native secondary index option.

4. While the data manipulation commands of HBase are not as rich as CQL, HBase does have a "filter" capability that executes on the server side of a session and improves scanning (search) throughput.

5.  HBase's reliance on Zookeeper -- a separate application -- introduces an additional point of failure (and the attendant difficulties troubleshooting the source of a problem) that Cassandra avoids.
