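First, a few ways in which HBase differs from a traditional relational database at the logical level:
1. An HBase table definition does not declare columns, only column families (for now, think of a column family as a group of columns). When you create a table you only specify its column families. Adding a new column to a family needs no prior declaration; you just start using it.
2. In HBase, rows are located by key, and scans are driven by key as well, so the choice of row key matters a great deal. The table layout and the row key design effectively determine whether the data can be used efficiently.
3. In HBase, row key + column family name + column name identifies a single cell (strictly speaking, each stored value also carries a timestamp, so a cell can hold several versions).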
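To make point 2 more concrete, here is a minimal Python sketch of why the row key matters. It assumes, as HBase does, that rows are kept sorted by row key, so rows sharing a key prefix sit next to each other and can be read with one range scan. The `SortedRows` class, the `prefix_stop` helper and the `patient#date` key layout are hypothetical names made up for illustration; they are not part of HBase.

```python
from bisect import bisect_left


class SortedRows:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> row payload

    def put(self, row_key, payload):
        if row_key not in self._rows:
            # Insert the new key at the position that keeps the list sorted.
            self._keys.insert(bisect_left(self._keys, row_key), row_key)
        self._rows[row_key] = payload

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan(self, start, stop):
        """Yield (key, payload) for start <= key < stop, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


def prefix_stop(prefix):
    """Smallest string that sorts after every key starting with prefix."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


table = SortedRows()
# Hypothetical composite row keys: patient id first, then visit date.
table.put("pat002#2015-07-27", "visit003")
table.put("pat001#2015-07-25", "visit001")
table.put("pat002#2015-07-26", "visit002")

# All visits of pat002 form one contiguous key range.
pat002_visits = [key for key, _ in table.scan("pat002#", prefix_stop("pat002#"))]
print(pat002_visits)
```

With a key like this, "all visits of one patient" is a cheap range scan. With the plain `visit001`-style keys used later in this post, the same question would need a scan over the whole table.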
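From this you can see that:
1. By placing certain restrictions on rows, HBase makes columns flexible and solves the column-extension problem.
2. In practice, we often store the main table together with several related tables in the columns of a single row, duplicates and all, trading wasted space for saved time in order to solve the query-efficiency problem.
Put another way, data is growing faster than hardware is improving. A single machine can no longer process this much data quickly enough, so the only option is distributed processing: split the work across multiple nodes and run it in parallel.
3. Distributed processing has a price. Communication between nodes, however small, becomes a technical bottleneck. Seen this way, if the data volume is not that large, distributed processing may well perform worse than a well-tuned single node.
4. The advantage of distributed processing is that a pile of ordinary machines can match the processing speed of one high-performance computer, with built-in data redundancy that takes some maintenance work off your hands. Overall, though, the maintenance burden still goes up. So whether to adopt it comes down to weighing data volume against maintenance cost.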
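The space-for-time trade in point 2 can be sketched in a few lines of Python. This is only an illustration with plain dictionaries, not how HBase stores anything: the normalized layout needs a second lookup (the "join") at read time, while the wide row repeats the patient fields in every visit row and answers with a single lookup. The function names are made up for this example.

```python
# Normalized layout: two "tables" that have to be joined at read time.
patients = {"pat001": {"name": "zhangsan", "sex": "male"}}
visits = {"visit001": {"patid": "pat001", "visittime": "2015-07-25 10:10:00"}}


def read_normalized(visit_id):
    visit = visits[visit_id]
    patient = patients[visit["patid"]]  # second lookup, i.e. the join
    return {**patient, **visit}


# Denormalized layout: patient fields are copied into every visit row.
wide_rows = {
    "visit001": {
        "personinfo:name": "zhangsan",
        "personinfo:sex": "male",
        "personinfo:patid": "pat001",
        "visitinfo:visittime": "2015-07-25 10:10:00",
    }
}


def read_wide(visit_id):
    return wide_rows[visit_id]  # one lookup, no join


print(read_normalized("visit001"))
print(read_wide("visit001"))
```

The cost is that a patient with many visits has the same name and sex stored many times, and changing them means rewriting every one of those rows.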
Oops, I digress... let's get back on track.
The first step, of course, is to look at the help:
hadoop@hadoop-master:~/Deploy/hbase-1.1.2$ bin/hbase shell
hbase(main):061:0> help
HBase Shell, version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26 20:11:27 PDT 2015
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
COMMAND GROUPS:
Group name: general
Commands: status, table_help, version, whoami
Group name: ddl
Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters
Group name: namespace
Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
Group name: dml
Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve
Group name: tools
Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, split, trace, unassign, wal_roll, zk_dump
Group name: replication
Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs
Group name: snapshots
Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot
Group name: configuration
Commands: update_all_config, update_config
Group name: quotas
Commands: list_quotas, set_quota
Group name: security
Commands: grant, revoke, user_permission
Group name: visibility labels
Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility
SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:
{'key1' => 'value1', 'key2' => 'value2', ...}
and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.
If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
hbase(main):062:0>
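Next, let's create a patient visit table, patientvisit, with four column families: personinfo for basic patient information, personinfoex for additional patient information, visitinfo for visit information, and visitinfoex for additional visit information.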
hbase(main):002:0> create 'patientvisit','personinfo','personinfoex','visitinfo','visitinfoex'
0 row(s) in 3.0060 seconds
=> Hbase::Table - patientvisit
Then let's add some data:
hbase(main):003:0> put 'patientvisit','visit001','personinfo:empi','empi001'
0 row(s) in 0.1750 seconds
hbase(main):004:0> put 'patientvisit','visit001','personinfo:name','zhangsan'
0 row(s) in 0.0150 seconds
hbase(main):005:0> put 'patientvisit','visit001','personinfo:sex','male'
0 row(s) in 0.0070 seconds
hbase(main):006:0> put 'patientvisit','visit001','personinfo:birthday','1999-12-31'
0 row(s) in 0.0060 seconds
hbase(main):007:0> put 'patientvisit','visit001','personinfo:address','shanghai xxx road'
0 row(s) in 0.0110 seconds
hbase(main):008:0> put 'patientvisit','visit001','personinfo:patid','pat001'
0 row(s) in 0.0140 seconds
hbase(main):009:0> put 'patientvisit','visit001','visitinfo:visitid','visit001'
0 row(s) in 0.0190 seconds
hbase(main):010:0> put 'patientvisit','visit001','visitinfo:visittime','2015-07-25 10:10:00'
0 row(s) in 0.0130 seconds
hbase(main):011:0> put 'patientvisit','visit001','visitinfo:visitdocid','pat001'
0 row(s) in 0.0100 seconds
hbase(main):012:0> put 'patientvisit','visit001','visitinfo:visitdocname','Dr. Yang'
0 row(s) in 0.0140 seconds
hbase(main):014:0> put 'patientvisit','visit002','personinfo:empi','empi002'
0 row(s) in 0.0090 seconds
hbase(main):015:0> put 'patientvisit','visit002','personinfo:name','lisi'
0 row(s) in 0.0120 seconds
hbase(main):016:0> put 'patientvisit','visit002','personinfo:sex','male'
0 row(s) in 0.0140 seconds
hbase(main):017:0> put 'patientvisit','visit002','personinfo:patid','pat002'
0 row(s) in 0.0150 seconds
hbase(main):018:0> put 'patientvisit','visit002','personinfo:address','beijing'
0 row(s) in 0.0150 seconds
hbase(main):019:0> put 'patientvisit','visit002','visitinfo:visitdocid','doc001'
0 row(s) in 0.0390 seconds
hbase(main):020:0> put 'patientvisit','visit001','visitinfo:visitdocid','doc001'
0 row(s) in 0.0150 seconds
hbase(main):021:0> put 'patientvisit','visit002','visitinfo:visitdocname','Dr. Yang'
0 row(s) in 0.0150 seconds
hbase(main):022:0> put 'patientvisit','visit002','visitinfo:visitid','visit002'
0 row(s) in 0.0100 seconds
hbase(main):023:0> put 'patientvisit','visit002','visitinfo:visittime','2015-07-26 11:11:00'
0 row(s) in 0.0150 seconds
hbase(main):024:0> put 'patientvisit','visit003','visitinfo:visittime','2015-07-27 13:13:00'
0 row(s) in 0.0100 seconds
hbase(main):025:0> put 'patientvisit','visit003','visitinfo:visitid','visit003'
0 row(s) in 0.0120 seconds
hbase(main):026:0> put 'patientvisit','visit003','visitinfo:visitdocname','Dr. Li'
0 row(s) in 0.0110 seconds
hbase(main):027:0> put 'patientvisit','visit003','visitinfo:visitdocid','doc002'
0 row(s) in 0.0070 seconds
hbase(main):028:0> put 'patientvisit','visit003','personinfo:empi','empi003'
0 row(s) in 0.0110 seconds
hbase(main):029:0> put 'patientvisit','visit003','personinfo:name','wangwu'
0 row(s) in 0.0100 seconds
hbase(main):030:0> put 'patientvisit','visit003','personinfo:sex','female'
0 row(s) in 0.0100 seconds
hbase(main):031:0> put 'patientvisit','visit003','personinfo:patid','pat002'
0 row(s) in 0.0070 seconds
hbase(main):032:0> put 'patientvisit','visit003','personinfo:addresss','guangzhou'
0 row(s) in 0.0100 seconds
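Note that visitinfo:visitdocid of visit001 was written twice in the session above, first as 'pat001' and later as 'doc001'. There is no separate update command: a second put to the same cell simply writes a newer version, and a read returns the newest one. The Python sketch below models that behaviour under the assumption that the column family keeps a single version. `VersionedCell` and the integer timestamps are hypothetical, and real HBase cleans up old versions in the background rather than immediately.

```python
class VersionedCell:
    """Toy model of one cell that keeps at most max_versions values."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self._versions = []  # (timestamp, value) pairs, newest first

    def put(self, value, timestamp):
        self._versions.append((timestamp, value))
        # Keep the newest versions first and drop anything beyond the limit.
        self._versions.sort(key=lambda pair: pair[0], reverse=True)
        del self._versions[self.max_versions:]

    def get(self):
        # A plain read returns the newest surviving value, if there is one.
        return self._versions[0][1] if self._versions else None


cell = VersionedCell(max_versions=1)
cell.put("pat001", timestamp=1)  # the mistaken first write
cell.put("doc001", timestamp=2)  # the later write takes its place

latest = cell.get()
print(latest)
```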
Now let's look at some information about the table:
hbase(main):001:0> list
TABLE
patientvisit
1 row(s) in 0.4150 seconds
=> ["patientvisit"]
hbase(main):047:0> describe 'patientvisit'
Table patientvisit is ENABLED
patientvisit
COLUMN FAMILIES DESCRIPTION
{NAME => 'personinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'personinfoex', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'visitinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'visitinfoex', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
4 row(s) in 0.0400 seconds
hbase(main):048:0> count 'patientvisit'
3 row(s) in 0.4010 seconds
=> 3
hbase(main):049:0> status 'patientvisit'
1 servers, 0 dead, 3.0000 average load
Now let's run a query:
hbase(main):033:0> get 'patientvisit','visit003'
COLUMN CELL
personinfo:addresss timestamp=1454231560629, value=guangzhou
personinfo:empi timestamp=1454231518425, value=empi003
personinfo:name timestamp=1454231527200, value=wangwu
personinfo:patid timestamp=1454231549310, value=pat002
personinfo:sex timestamp=1454231537129, value=female
visitinfo:visitdocid timestamp=1454231491395, value=doc002
visitinfo:visitdocname timestamp=1454231479591, value=Dr. Li
visitinfo:visitid timestamp=1454231462038, value=visit003
visitinfo:visittime timestamp=1454231439570, value=2015-07-27 13:13:00
9 row(s) in 0.0380 seconds
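Hmm, notice anything?
This is really just one giant three-level key-value store: use the row key to locate a row, then the column family name to locate the family, then the column name to locate a cell, and you have your data.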
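The three-level lookup described above can be written down as nested dictionaries. This is only a mental model in Python, not HBase's storage format, and `get_cell` is a hypothetical helper. The values are a subset of the visit003 row shown in the get output.

```python
# row key -> column family -> column name -> value
table = {
    "visit003": {
        "personinfo": {"name": "wangwu", "sex": "female"},
        "visitinfo": {
            "visitdocname": "Dr. Li",
            "visittime": "2015-07-27 13:13:00",
        },
    }
}


def get_cell(table, row_key, column):
    """Resolve 'family:name' for one row; return None if any level is missing."""
    family, name = column.split(":", 1)
    return table.get(row_key, {}).get(family, {}).get(name)


print(get_cell(table, "visit003", "personinfo:name"))
print(get_cell(table, "visit003", "visitinfo:visitdocname"))
print(get_cell(table, "visit003", "personinfo:address"))  # column never written
```

A column that was never written for a row is simply absent at the innermost level, which is why nothing has to be declared up front.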
Next, let's scan the table's address column:
hbase(main):038:0> scan 'patientvisit',{COLUMNS=>'personinfo:address'}
ROW COLUMN+CELL
visit001 column=personinfo:address, timestamp=1454230894972, value=shanghai xxx road
visit002 column=personinfo:address, timestamp=1454231323278, value=beijing
2 row(s) in 0.0400 seconds
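Hmm, that is one row fewer than expected. It turns out a column name was misspelled (addresss), so we need to put the value under the correct column name and delete the misspelled column.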
hbase(main):039:0> get 'patientvisit','visit003'
COLUMN CELL
personinfo:addresss timestamp=1454231560629, value=guangzhou
personinfo:empi timestamp=1454231518425, value=empi003
personinfo:name timestamp=1454231527200, value=wangwu
personinfo:patid timestamp=1454231549310, value=pat002
personinfo:sex timestamp=1454231537129, value=female
visitinfo:visitdocid timestamp=1454231491395, value=doc002
visitinfo:visitdocname timestamp=1454231479591, value=Dr. Li
visitinfo:visitid timestamp=1454231462038, value=visit003
visitinfo:visittime timestamp=1454231439570, value=2015-07-27 13:13:00
9 row(s) in 0.0430 seconds
hbase(main):040:0> put 'patientvisit','visit003','personinfo:address','guangzhou'
0 row(s) in 0.0140 seconds
hbase(main):054:0> delete 'patientvisit','visit003','personinfo:addresss'
0 row(s) in 0.0240 seconds
hbase(main):042:0> scan 'patientvisit',{COLUMNS=>'personinfo:address'}
ROW COLUMN+CELL
visit001 column=personinfo:address, timestamp=1454230894972, value=shanghai xxx road
visit002 column=personinfo:address, timestamp=1454231323278, value=beijing
visit003 column=personinfo:address, timestamp=1454232027650, value=guangzhou
3 row(s) in 0.0410 seconds
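The reason the first scan returned only two rows is that a column scan skips every row that has no cell under that exact column name. There is also no rename operation, so the fix is a put under the correct name plus a delete of the wrong one. A small Python sketch of both ideas, using plain dictionaries and hypothetical helper names:

```python
rows = {
    "visit001": {"personinfo:address": "shanghai xxx road"},
    "visit002": {"personinfo:address": "beijing"},
    "visit003": {"personinfo:addresss": "guangzhou"},  # misspelled column name
}


def scan_column(rows, column):
    """Return (row key, value) pairs for rows that have this column."""
    return [
        (key, cells[column])
        for key, cells in sorted(rows.items())
        if column in cells
    ]


def rename_column(rows, row_key, old, new):
    """Emulate a rename: write under the new name, then delete the old one."""
    cells = rows[row_key]
    cells[new] = cells[old]  # like put with the correct column name
    del cells[old]           # like delete on the misspelled column


before = scan_column(rows, "personinfo:address")
rename_column(rows, "visit003", "personinfo:addresss", "personinfo:address")
after = scan_column(rows, "personinfo:address")

print(len(before), len(after))
```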
How do we delete an entire record (row)?
hbase(main):056:0> put 'patientvisit','visit004','personinfo:name','zhaoliu'
0 row(s) in 0.0220 seconds
hbase(main):059:0> deleteall 'patientvisit','visit004'
0 row(s) in 0.0300 seconds
hbase(main):060:0> get 'patientvisit','visit004'
COLUMN CELL
0 row(s) in 0.0080 seconds
Finally, how do we drop the table? It has to be disabled first:
hbase(main):043:0> disable 'patientvisit'
0 row(s) in 2.7570 seconds
hbase(main):044:0> drop 'patientvisit'
0 row(s) in 2.3230 seconds
And with that, the most basic create, read, update and delete operations are done.