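First, a few ways in which HBase differs from a traditional relational database at the logical level:
1. An HBase table definition does not declare columns, only column families (for now, think of a column family as a set of columns). So when you create a table you only specify its column families. Adding a column to a family needs no prior declaration; you simply write to it.
2. In HBase, rows are located by their key, and scans are driven by the key as well. The choice of row key therefore matters a great deal: the table layout and the row key design largely determine whether the data can be used efficiently.
3. In HBase, row key + column family name + column name identifies a single cell. (Strictly speaking, a cell can hold several timestamped versions, and a read returns the newest one by default.)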
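The three points above can be sketched with a toy in-memory model. This is plain Python for illustration only, not the HBase client API: the class `ToyTable`, its method names, and the `nickname` and `insurance` columns are all made up for the example. It assumes, as in HBase, that rows are kept sorted by row key so that a prefix scan only touches a contiguous range.

```python
from bisect import bisect_left, insort


class ToyTable:
    """In-memory stand-in for an HBase table (illustration only, not the HBase API)."""

    def __init__(self, *families):
        self.families = set(families)   # fixed when the table is created
        self.row_keys = []              # kept sorted, like HBase row keys
        self.rows = {}                  # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self.rows:
            insort(self.row_keys, row_key)
            self.rows[row_key] = {}
        # The qualifier was never declared anywhere; it exists once it is written.
        self.rows[row_key][column] = value

    def scan_prefix(self, prefix):
        """Return the row keys that start with prefix, using the sorted key order."""
        start = bisect_left(self.row_keys, prefix)
        keys = []
        for key in self.row_keys[start:]:
            if not key.startswith(prefix):
                break   # sorted order: no later key can match
            keys.append(key)
        return keys


table = ToyTable("personinfo", "visitinfo")
table.put("visit002", "personinfo:name", "lisi")
table.put("visit001", "personinfo:name", "zhangsan")
table.put("visit001", "personinfo:nickname", "san")   # hypothetical new column, no declaration
table.put("other001", "personinfo:name", "x")         # hypothetical row outside the prefix

print(table.scan_prefix("visit"))
try:
    table.put("visit001", "insurance:plan", "basic")   # this family was never defined
except KeyError as err:
    print("rejected:", err)
```

The point of the sketch is the asymmetry: column families are part of the table definition, while column names are just data, and the row key is the only thing a scan can use to narrow its range.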
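From this you can see that:
1. By placing some restrictions on rows, HBase makes columns flexible to work with, which solves the problem of extending columns.
2. In practice we often store the main table and several related tables together in the columns of one row, duplication and all, trading wasted space for saved time, which solves the problem of query efficiency.
In other words, data is growing faster than hardware is improving. A single machine can no longer process this much data fast enough, so the only way to get the required speed is to use distributed techniques and spread the work across many nodes in parallel.
3. Distributed processing has its own cost: communication between nodes, however small, is a bottleneck. From that angle, if the data volume is not that large, distributed processing may well do worse than tuning a single node.
4. The advantage of distributed processing is that a pile of ordinary machines can match the processing speed of one high-performance machine, and the built-in data redundancy takes away some maintenance work. Overall, though, the maintenance burden still goes up. So whether to adopt it comes down to weighing data volume against maintenance cost.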
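Point 2 above (storing the main table and its related tables together in one row) can be sketched as follows. This is only an illustration in plain Python: the `patients`, `doctors` and `visits` structures and the helper `to_wide_row` are hypothetical, and the values are borrowed from the sample data used later in this post.

```python
# Hypothetical normalized data, roughly as a relational schema might hold it.
patients = {"pat001": {"name": "zhangsan", "sex": "male"}}
doctors = {"doc001": {"name": "Dr. Yang"}}
visits = [
    {"visitid": "visit001", "patid": "pat001", "docid": "doc001",
     "visittime": "2015-07-25 10:10:00"},
]


def to_wide_row(visit):
    """Flatten one visit plus its patient and doctor into a single wide row."""
    patient = patients[visit["patid"]]
    doctor = doctors[visit["docid"]]
    return {
        "personinfo:patid": visit["patid"],
        "personinfo:name": patient["name"],
        "personinfo:sex": patient["sex"],
        "visitinfo:visitid": visit["visitid"],
        "visitinfo:visittime": visit["visittime"],
        "visitinfo:visitdocid": visit["docid"],
        "visitinfo:visitdocname": doctor["name"],
    }


# Row key -> wide row. The join is paid once at write time; a read needs only the key.
wide_table = {v["visitid"]: to_wide_row(v) for v in visits}
print(wide_table["visit001"]["visitinfo:visitdocname"])
```

The patient's name and the doctor's name are copied into every visit row, which wastes space, but a query for one visit no longer has to look anything up in a second table.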
Oh, I digress... let's carry on.
The first step, of course, is to look at the help:
hadoop@hadoop-master:~/Deploy/hbase-1.1.2$ bin/hbase shell
hbase(main):061:0> help
HBase Shell, version 1.1.2, rcc2b70cf03e3378800661ec5cab11eb43fafe0fc, Wed Aug 26 20:11:27 PDT 2015
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools
  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, compact_rs, flush, major_compact, merge_region, move, split, trace, unassign, wal_roll, zk_dump

  Group name: replication
  Commands: add_peer, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, list_peers, list_replicated_tables, remove_peer, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs

  Group name: snapshots
  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot

  Group name: configuration
  Commands: update_all_config, update_config

  Group name: quotas
  Commands: list_quotas, set_quota

  Group name: security
  Commands: grant, revoke, user_permission

  Group name: visibility labels
  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:

  {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

  hbase> get 't1', "key\x03\x3f\xcd"
  hbase> get 't1', "key\003\023\011"
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/book.html
hbase(main):062:0>
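Next, let's create a patient table, patientvisit, with four column families: personinfo for basic patient information, personinfoex for additional patient information, visitinfo for visit information, and visitinfoex for additional visit information.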
hbase(main):002:0> create 'patientvisit','personinfo','personinfoex','visitinfo','visitinfoex'
0 row(s) in 3.0060 seconds

=> Hbase::Table - patientvisit
Then let's add some data:
hbase(main):003:0> put 'patientvisit','visit001','personinfo:empi','empi001'
0 row(s) in 0.1750 seconds
hbase(main):004:0> put 'patientvisit','visit001','personinfo:name','zhangsan'
0 row(s) in 0.0150 seconds
hbase(main):005:0> put 'patientvisit','visit001','personinfo:sex','male'
0 row(s) in 0.0070 seconds
hbase(main):006:0> put 'patientvisit','visit001','personinfo:birthday','1999-12-31'
0 row(s) in 0.0060 seconds
hbase(main):007:0> put 'patientvisit','visit001','personinfo:address','shanghai xxx road'
0 row(s) in 0.0110 seconds
hbase(main):008:0> put 'patientvisit','visit001','personinfo:patid','pat001'
0 row(s) in 0.0140 seconds
hbase(main):009:0> put 'patientvisit','visit001','visitinfo:visitid','visit001'
0 row(s) in 0.0190 seconds
hbase(main):010:0> put 'patientvisit','visit001','visitinfo:visittime','2015-07-25 10:10:00'
0 row(s) in 0.0130 seconds
hbase(main):011:0> put 'patientvisit','visit001','visitinfo:visitdocid','pat001'
0 row(s) in 0.0100 seconds
hbase(main):012:0> put 'patientvisit','visit001','visitinfo:visitdocname','Dr. Yang'
0 row(s) in 0.0140 seconds
hbase(main):014:0> put 'patientvisit','visit002','personinfo:empi','empi002'
0 row(s) in 0.0090 seconds
hbase(main):015:0> put 'patientvisit','visit002','personinfo:name','lisi'
0 row(s) in 0.0120 seconds
hbase(main):016:0> put 'patientvisit','visit002','personinfo:sex','male'
0 row(s) in 0.0140 seconds
hbase(main):017:0> put 'patientvisit','visit002','personinfo:patid','pat002'
0 row(s) in 0.0150 seconds
hbase(main):018:0> put 'patientvisit','visit002','personinfo:address','beijing'
0 row(s) in 0.0150 seconds
hbase(main):019:0> put 'patientvisit','visit002','visitinfo:visitdocid','doc001'
0 row(s) in 0.0390 seconds
hbase(main):020:0> put 'patientvisit','visit001','visitinfo:visitdocid','doc001'
0 row(s) in 0.0150 seconds
hbase(main):021:0> put 'patientvisit','visit002','visitinfo:visitdocname','Dr. Yang'
0 row(s) in 0.0150 seconds
hbase(main):022:0> put 'patientvisit','visit002','visitinfo:visitid','visit002'
0 row(s) in 0.0100 seconds
hbase(main):023:0> put 'patientvisit','visit002','visitinfo:visittime','2015-07-26 11:11:00'
0 row(s) in 0.0150 seconds
hbase(main):024:0> put 'patientvisit','visit003','visitinfo:visittime','2015-07-27 13:13:00'
0 row(s) in 0.0100 seconds
hbase(main):025:0> put 'patientvisit','visit003','visitinfo:visitid','visit003'
0 row(s) in 0.0120 seconds
hbase(main):026:0> put 'patientvisit','visit003','visitinfo:visitdocname','Dr. Li'
0 row(s) in 0.0110 seconds
hbase(main):027:0> put 'patientvisit','visit003','visitinfo:visitdocid','doc002'
0 row(s) in 0.0070 seconds
hbase(main):028:0> put 'patientvisit','visit003','personinfo:empi','empi003'
0 row(s) in 0.0110 seconds
hbase(main):029:0> put 'patientvisit','visit003','personinfo:name','wangwu'
0 row(s) in 0.0100 seconds
hbase(main):030:0> put 'patientvisit','visit003','personinfo:sex','female'
0 row(s) in 0.0100 seconds
hbase(main):031:0> put 'patientvisit','visit003','personinfo:patid','pat002'
0 row(s) in 0.0070 seconds
hbase(main):032:0> put 'patientvisit','visit003','personinfo:addresss','guangzhou'
0 row(s) in 0.0100 seconds
Then let's look at some information about the table:
hbase(main):001:0> list
TABLE
patientvisit
1 row(s) in 0.4150 seconds

=> ["patientvisit"]
hbase(main):047:0> describe 'patientvisit'
Table patientvisit is ENABLED
patientvisit
COLUMN FAMILIES DESCRIPTION
{NAME => 'personinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'personinfoex', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'visitinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'visitinfoex', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
4 row(s) in 0.0400 seconds
hbase(main):048:0> count 'patientvisit'
3 row(s) in 0.4010 seconds

=> 3
hbase(main):049:0> status 'patientvisit'
1 servers, 0 dead, 3.0000 average load
Then let's run a query:
hbase(main):033:0> get 'patientvisit','visit003'
COLUMN                     CELL
 personinfo:addresss       timestamp=1454231560629, value=guangzhou
 personinfo:empi           timestamp=1454231518425, value=empi003
 personinfo:name           timestamp=1454231527200, value=wangwu
 personinfo:patid          timestamp=1454231549310, value=pat002
 personinfo:sex            timestamp=1454231537129, value=female
 visitinfo:visitdocid      timestamp=1454231491395, value=doc002
 visitinfo:visitdocname    timestamp=1454231479591, value=Dr. Li
 visitinfo:visitid         timestamp=1454231462038, value=visit003
 visitinfo:visittime       timestamp=1454231439570, value=2015-07-27 13:13:00
9 row(s) in 0.0380 seconds
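Hmm, notice anything?
This is essentially one giant three-level key-value store: use the row key to locate a row, then the column family name to locate a family, then the column name to locate a cell, and you have your data.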
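The three-level lookup can be written out as nested dictionaries. The sketch below is plain Python for illustration, not how HBase stores data internally; the helper `get_cell` is hypothetical. The values and timestamps are taken from the `get` output for visit003, and because that output also shows a timestamp on every cell, the innermost level here maps timestamps to values. The second write, with timestamp 1454231999999 and value "wang wu", is invented for the example.

```python
# row key -> column family -> column name -> {timestamp: value}
table = {
    "visit003": {
        "personinfo": {
            "name": {1454231527200: "wangwu"},
            "sex": {1454231537129: "female"},
        },
        "visitinfo": {
            "visitdocname": {1454231479591: "Dr. Li"},
        },
    },
}


def get_cell(table, row_key, family, qualifier):
    """Walk the three levels, then return the value with the newest timestamp."""
    versions = table[row_key][family][qualifier]
    return versions[max(versions)]


print(get_cell(table, "visit003", "personinfo", "name"))

# Hypothetical second write: same cell, newer timestamp, so it is a new version,
# not a new column.
table["visit003"]["personinfo"]["name"][1454231999999] = "wang wu"
print(get_cell(table, "visit003", "personinfo", "name"))
```

With VERSIONS => '1', as in the describe output for this table, only the newest version of a cell is retained, so in practice the innermost level behaves like a single value.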
Then scan the table's address column:
hbase(main):038:0> scan 'patientvisit',{COLUMNS=>'personinfo:address'}
ROW                        COLUMN+CELL
 visit001                  column=personinfo:address, timestamp=1454230894972, value=shanghai xxx road
 visit002                  column=personinfo:address, timestamp=1454231323278, value=beijing
2 row(s) in 0.0400 seconds
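Hmm, one row fewer than expected. It turns out the column name was misspelled for visit003 (addresss), so we need to delete the misspelled column and put the value under the correct name: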
hbase(main):039:0> get 'patientvisit','visit003'
COLUMN                     CELL
 personinfo:addresss       timestamp=1454231560629, value=guangzhou
 personinfo:empi           timestamp=1454231518425, value=empi003
 personinfo:name           timestamp=1454231527200, value=wangwu
 personinfo:patid          timestamp=1454231549310, value=pat002
 personinfo:sex            timestamp=1454231537129, value=female
 visitinfo:visitdocid      timestamp=1454231491395, value=doc002
 visitinfo:visitdocname    timestamp=1454231479591, value=Dr. Li
 visitinfo:visitid         timestamp=1454231462038, value=visit003
 visitinfo:visittime       timestamp=1454231439570, value=2015-07-27 13:13:00
9 row(s) in 0.0430 seconds
hbase(main):040:0> put 'patientvisit','visit003','personinfo:address','guangzhou'
0 row(s) in 0.0140 seconds
hbase(main):054:0> delete 'patientvisit','visit003','personinfo:addresss'
0 row(s) in 0.0240 seconds
hbase(main):042:0> scan 'patientvisit',{COLUMNS=>'personinfo:address'}
ROW                        COLUMN+CELL
 visit001                  column=personinfo:address, timestamp=1454230894972, value=shanghai xxx road
 visit002                  column=personinfo:address, timestamp=1454231323278, value=beijing
 visit003                  column=personinfo:address, timestamp=1454232027650, value=guangzhou
3 row(s) in 0.0410 seconds
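Why the first scan skipped visit003, and why the fix is a put followed by a delete, can be seen in a small model. This is a plain Python sketch, not the HBase API; `scan_column` and `rename_column` are hypothetical helpers. It assumes only what the transcript shows: a column-restricted scan returns just the rows that have a cell under exactly that column name, and there is no rename operation, so the value has to be rewritten under the new name.

```python
rows = {
    "visit001": {"personinfo:address": "shanghai xxx road"},
    "visit002": {"personinfo:address": "beijing"},
    "visit003": {"personinfo:addresss": "guangzhou"},   # misspelled column name
}


def scan_column(rows, column):
    """Return (row key, value) for every row that has exactly this column name."""
    return [(key, cells[column]) for key, cells in sorted(rows.items())
            if column in cells]


def rename_column(rows, row_key, old, new):
    """Emulate a rename: write the value under the new name, then delete the old one."""
    cells = rows[row_key]
    cells[new] = cells[old]
    del cells[old]


print(len(scan_column(rows, "personinfo:address")))   # visit003 is skipped
rename_column(rows, "visit003", "personinfo:addresss", "personinfo:address")
print(len(scan_column(rows, "personinfo:address")))   # visit003 is now included
```

Because column names are not declared anywhere, a misspelling is not an error at write time; it silently creates a different column, which only shows up when a scan comes back short.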
How do we delete a whole record (row)?
hbase(main):056:0> put 'patientvisit','visit004','personinfo:name','zhaoliu'
0 row(s) in 0.0220 seconds
hbase(main):059:0> deleteall 'patientvisit','visit004'
0 row(s) in 0.0300 seconds
hbase(main):060:0> get 'patientvisit','visit004'
COLUMN                     CELL
0 row(s) in 0.0080 seconds
Finally, how do we drop a table? A table has to be disabled before it can be dropped:
hbase(main):043:0> disable 'patientvisit'
0 row(s) in 2.7570 seconds
hbase(main):044:0> drop 'patientvisit'
0 row(s) in 2.3230 seconds
And that covers the most basic create, read, update, and delete operations.