
Monday, December 1, 2014

Python: Handling the Dictionary Results Produced by a Generator

In Django, when we want to display, on a web page,
the HBase table scan result that Python obtains through HappyBase's scan function:

Example: 'pytest' Table
RowKey    f:id
"John"    "a"
"Mary"    "b"
"Tom"     "c"

views.py (returns the HBase scan result to detail.html)
...
connection = happybase.Connection('192.168.0.7')
connection.open()
table = connection.table('pytest')
result = table.scan(columns=['f:id'], filter="SingleColumnValueFilter('f', 'id', !=, 'binary:a')")
template = loader.get_template('app1/detail.html')
context = Context({'messages': result, })
return HttpResponse(template.render(context))
...

connection = happybase.Connection('192.168.0.7')
connection.open()
Connect to HBase with HappyBase (HappyBase talks to the HBase Thrift server; the default Thrift port 9090 is assumed when none is given, which is why no port has to be set here)

table = connection.table('pytest')
result = table.scan(columns=['f:id'], filter="SingleColumnValueFilter('f', 'id', !=, 'binary:a')")
Scan the "pytest" table on HBase, keep only the rows whose 'f:id' is not equal to "a", and return just the id column of each row

template = loader.get_template('app1/detail.html')
context = Context({'messages': result, })
return HttpResponse(template.render(context))
Wrap the result in a Context, render it with the specified HTML template, and return it as an HttpResponse

----------------- HTML display: method 1 ---------------------
Displaying the values directly in the HTML:
detail.html
{% for v in messages %}
    {{v}}
    <br>

{% endfor %}

will display:
('Mary', {'f:id': 'b'})
('Tom', {'f:id': 'c'})

(X1, X2, ...) => a Python tuple
{a1: b1, a2: b2, ...} => a Python dictionary (similar to Java's Map)
--------------------------------------------
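To see what the scan generator actually yields, it can also be iterated outside Django. A minimal sketch, reusing the host and table from the example above (note that scan() returns a generator, so it can only be walked once):

# inspect_scan.py -- print each item yielded by table.scan()
import happybase

connection = happybase.Connection('192.168.0.7')   # same Thrift host as above
table = connection.table('pytest')

for rowkey, data in table.scan(columns=['f:id']):
    # each item is a (row key, {column: value}) pair, e.g. ('Mary', {'f:id': 'b'})
    print(rowkey, data)

connection.close()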

---------------- HTML display: method 2 -------------------
But if we only want to show the row key and the value of id, we have to do the following.

Add this to views.py:

@register.filter
def get_item(dictionary, key):    
     return dictionary.get(key)

and add this to detail.html:

{% for k, v in messages %}  <-- k, v map to the two values of each tuple
    {{ k }}
    {% for id in v %}
        {{ v | get_item:id }} <-- values in a dictionary can only be fetched this way inside a template
    {% endfor %}
{% endfor %}

it will display:
Mary  b
Tom   c
--------------------------------------------
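Side note: Django's documented convention is to register custom filters in a templatetags package inside the app rather than in views.py. A minimal sketch, assuming the app is named app1 and using a hypothetical module name dict_filters:

# app1/templatetags/dict_filters.py  (hypothetical module; app1/templatetags/__init__.py must exist)
from django import template

register = template.Library()

@register.filter
def get_item(dictionary, key):
    # allows {{ v|get_item:id }} in templates
    return dictionary.get(key)

The template then needs {% load dict_filters %} at the top before the filter can be used.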

Thursday, October 16, 2014

Hive output result to file

[Problem]
We want to write the results that Hive computes or collects out to HDFS or to the local disk.

[Method]  INSERT OVERWRITE
Reference:
http://stackoverflow.com/questions/18129581/how-do-i-output-the-results-of-a-hiveql-query-to-csv

ex: (insert overwrite directory writes to a path on HDFS by default)
hive> insert overwrite directory '/user/hduser/temp' select Avg(times) from hivetableerr;

Result:
$ hdfs dfs -cat /user/hduser/temp/000000_0
47.0

To write to the local disk instead, use:

hive> insert overwrite local directory '/home/hduser/temp' select Avg(times) from hivetableerr;

Another way to write to the local disk does not need the Hive shell at all; it can be run directly as a bash command:

$ hive -e 'select Avg(times) from hivetableerr;' > /home/hduser/temp
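The same export can also be driven from Python instead of bash. A minimal sketch, assuming the hive CLI is on the PATH and reusing the query and output path from the example above:

# run_hive_export.py -- run a Hive query and save its stdout to a local file
import subprocess

query = 'select Avg(times) from hivetableerr;'
with open('/home/hduser/temp', 'w') as out:
    # equivalent to: hive -e '<query>' > /home/hduser/temp
    subprocess.check_call(['hive', '-e', query], stdout=out)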


SQL Function


You can issue SQL functions through Hive to run computations,
such as AVG(), on the data stored in HBase.

ex: 

hive>  select AVG(times) from hivetableerr;


[Result]

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-10-16 15:45:47,503 Stage-1 map = 0%,  reduce = 0%
2014-10-16 15:46:05,165 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.19 sec
2014-10-16 15:46:19,847 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.49 sec
MapReduce Total cumulative CPU time: 3 seconds 490 msec
Ended Job = job_1412160247541_0074
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 3.49 sec   HDFS Read: 255 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 490 msec
OK
47.0
Time taken: 44.88 seconds, Fetched: 1 row(s)

HBase Select with Join through Hive

[Problem]
We have created the two tables below in HBase and want to use them to find Mary's error count (errorInfo:times).

"hivetable" 
cf
RowKey
id
id2
Jack
1423
Mary
1425
1745

 "hivetableErr"
errorInfo
RowKey
times
1423
43
1425
51


[Method]
Step 1: Create two Hive tables linked to the two HBase tables above.

For "hivetable":
create external table hivetable(name string, id int, id2 int)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,cf:id,cf:id2")
tblproperties("hbase.table.name" = "hivetable");

For "hivetableerr":
create external table hivetableerr(id int, times int)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key,errorInfo:times")
tblproperties("hbase.table.name" = "hivetableErr");

Step 2: The lookup query is as follows:

hive> select times from hivetableerr join hivetable on (hivetable.id = hivetableerr.id) where hivetable.name = "Mary";

SELECT FROM: fetch times from hivetableerr
JOIN ON: join the two tables by matching id in hivetableerr with id in hivetable
WHERE: restrict the lookup to the person named Mary (the row key of hivetable)
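For comparison, the same lookup can also be done without Hive by chaining two reads directly against HBase, e.g. with HappyBase. A minimal sketch, assuming the Thrift host used in the earlier posts and the column layout shown above:

# mary_errors.py -- resolve Mary's id in 'hivetable', then read her error count.
# Note: under Python 3 happybase returns bytes keys/values (b'cf:id'); the plain
# string keys below match the Python 2 style used in this post.
import happybase

connection = happybase.Connection('192.168.0.7')

person = connection.table('hivetable').row('Mary', columns=['cf:id'])
error_id = person['cf:id']                                   # e.g. '1425'

errors = connection.table('hivetableErr').row(error_id, columns=['errorInfo:times'])
print(errors['errorInfo:times'])                             # expected: '51'

connection.close()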

[Result]

Total jobs = 1
14/10/16 14:41:01 WARN conf.Configuration: file:/tmp/hduser/hive_2014-10-16_14-40-59_385_3743927975833041-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/10/16 14:41:01 WARN conf.Configuration: file:/tmp/hduser/hive_2014-10-16_14-40-59_385_3743927975833041-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
Execution log at: /tmp/hduser/hduser_20141016144040_963a7821-4ec5-4343-9a53-fc4a413057c1.log
2014-10-16 02:41:02     Starting to launch local task to process map join;      maximum memory = 477102080
2014-10-16 02:41:04     Dump the side-table into file: file:/tmp/hduser/hive_2014-10-16_14-40-59_385_3743927975833041-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile10--.hashtable
2014-10-16 02:41:04     Uploaded 1 File to: file:/tmp/hduser/hive_2014-10-16_14-40-59_385_3743927975833041-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile10--.hashtable (282 bytes)
2014-10-16 02:41:04     End of local task; Time Taken: 1.425 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1412160247541_0072, Tracking URL = http://master:8088/proxy/application_1412160247541_0072/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1412160247541_0072
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2014-10-16 14:41:17,420 Stage-3 map = 0%,  reduce = 0%
2014-10-16 14:41:25,978 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.99 sec
MapReduce Total cumulative CPU time: 1 seconds 990 msec
Ended Job = job_1412160247541_0072
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.99 sec   HDFS Read: 258 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 990 msec
OK
51
Time taken: 27.719 seconds, Fetched: 1 row(s)

Wednesday, October 15, 2014

Hive: Connecting to an Existing HBase Table

[Problem]
We already know how to create a table on HBase through Hive,
but what if we want Hive to connect directly to a table that already exists on HBase?



[Table Connection]
Reference: http://item.iqadd.com/item/hive-hbase-integration

Hive Code
CREATE EXTERNAL TABLE hivetable(key String, value int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,cf:id")
TBLPROPERTIES("hbase.table.name" = "hivetable");

Explanation

CREATE EXTERNAL TABLE hivetable(key String, value int)
Creates an external table named "hivetable": the table itself lives in another database, and Hive only keeps the table's metadata while operating directly on the linked table in the external database.

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
Says that the table above is accessed through the HBase storage handler provided by Hive; Hive also provides storage handlers for other databases and file systems.

WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,cf:id")
Chooses which part of the HBase table this Hive table covers (specific columns can be selected). In this example the Hive column key maps to the HBase row key and value maps to the cf:id column (several columns can be mapped, e.g. ":key,cf:id,cf:id2,cf2:name").

TBLPROPERTIES("hbase.table.name" = "hivetable");
Specifies the name of the HBase table to link to; in this example it is the "hivetable" table on HBase.

This way, whenever the data in the HBase table "hivetable" changes, the change can be observed directly through the Hive table "hivetable".
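For example, a row written to the HBase table with HappyBase (host reused from the earlier posts; the row key and value below are made up for illustration) should show up in the next query on the Hive side:

# put_row.py -- insert a row into the HBase 'hivetable';
# a following "select * from hivetable" in Hive should include it
import happybase

connection = happybase.Connection('192.168.0.7')
table = connection.table('hivetable')
table.put('Alice', {'cf:id': '42'})    # hypothetical row key and value
connection.close()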

Sunday, October 12, 2014

Put Data into a Remote HBase Using a Linux Bash Script


1. Start a TCP listening server on 192.168.0.7:5002
and pipe everything it receives into the HBase shell

start_collection.sh
while true;
do nc -l 192.168.0.7 5002
done | hbase shell

[What the script does]
nc -l : opens port 5002 on IP 192.168.0.7 and listens for any TCP packets sent to that port
(nc -lu would listen for UDP packets instead)
while true; ... do ... done : nc exits after handling one connection, so this loop is needed to keep receiving
| hbase shell : pipes whatever nc receives (HBase shell operations) into the hbase shell

--------------------------------------------------------------------------------------------------------------------
2. Send Info to Server

tcollector.sh
#!/bin/bash
set -e
while true;
do awk -v now=`date +%s` \
'{ print "put " $1", " now", " $2", " $3}' HBase_Info
sleep 15
done | nc -w 30 192.168.0.7 5002

[What the script does]
set -e : abort the script as soon as any command in the pipeline ( | ) fails

awk -v now=`date +%s` \
'{ print "put " $1", " now", " $2", " $3}' HBase_Info
    - sets the variable now to the current time (date +%s gives seconds since the epoch)
    - reads the file HBase_Info line by line, where
                  column 1 is $1 = 'hiveTableonHB'
                  column 2 is $2 = 'cf:time'
                  column 3 is $3 = 'test'

nc -w 30 192.168.0.7 5002 : sends everything the awk command prints to 192.168.0.7:5002

HBase_Info
'hiveTableonHB' 'cf:time' 'test'
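A rough Python equivalent of tcollector.sh, assuming the nc listener from start_collection.sh is already running and reusing the table, column, and value from HBase_Info:

# tcollector.py -- periodically send HBase shell "put" commands to the listener
import socket
import time

TABLE, COLUMN, VALUE = 'hiveTableonHB', 'cf:time', 'test'

sock = socket.create_connection(('192.168.0.7', 5002))
try:
    while True:
        now = int(time.time())                 # seconds since the epoch, like date +%s
        cmd = "put '%s', '%d', '%s', '%s'\n" % (TABLE, now, COLUMN, VALUE)
        sock.sendall(cmd.encode())
        time.sleep(15)
finally:
    sock.close()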

--------------------------------------------------------------------------------------------------------------------
[RESULT]

1.8.7-p357 :004 > scan 'hiveTableonHB'
ROW                      COLUMN+CELL
 1413177758              column=cf:time, timestamp=1413177976111, value=test
 1413180302              column=cf:time, timestamp=1413180310949, value=test
 1413180317              column=cf:time, timestamp=1413180324753, value=test
6 row(s) in 0.0270 seconds

Tuesday, October 7, 2014

Implementing an HBase Operation Interface

[To be improved]
- Use a connection pool
- Add Spark-based computation


Features:
0. Operation menu (HBaseGet.java)
[Implemented]
- Choose between computation and query
- Commands are validated; an invalid command triggers a re-prompt
- Option to exit

[Desired improvements]
- Add a graphical interface (menus, parameter boxes, etc.)

1. Computation         (CountJob.java)
[Basic features]
- Table, column, and output location can be chosen; the row range still has to be changed in the code by hand
- Automatically validates the entered table name, column family name, and column name

[Desired improvements]
- Computations to support:
- Average
- Output the result as a text file to HDFS

[Desired improvements]
1. conf.set("mapred.jar", ...) needs a local-side path and still has to be modified by hand (can this be automated?)
2. Move the average / standard deviation computation into a separate class

2. Query         (ScanHTable.java)
[Implemented]
      - Simple filters can be set

      [Desired improvements]
      - A better way to enter filter values

3. Create Table
     [Implemented]
     - Table name and column families can be entered manually

    [Desired improvements]
    - Automatically create tables with a large number of column families

4. Delete Table
     [Implemented]
     - Table name and column families can be entered manually

    [Desired improvements]
    - Should disabling the table be a separate step?

5. List Table
     [Implemented]
     - List the names of the existing tables


* 6. Import Data to Table
*- Online
- Offline (HBaseimporttsv_v4.jar)

Tuesday, September 30, 2014

HBase Count Table Rows (Using Java Jar File)

[Software]
    Hadoop2.4.1
    Eclipse IDE for Java Developers Luna Release (4.4.0)
    HBase0.98.5

/*
 * Version:
 * v1 : counts the rows of the appointed table; map tasks only; output: the "Rows" counter
 */
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

import org.apache.hadoop.util.GenericOptionsParser;

public class HbaseGet {
    private static byte[] tablename;
    private static byte[] familyname;
    private static byte[] columnname;

    public static class GetMap
            extends TableMapper<Text, LongWritable> { // in Java: Text => String, LongWritable => long

        public static enum Counters {Rows, Times};

        @Override
        public void map(ImmutableBytesWritable rowkey, Result result, Context context)
                throws IOException {
            // The scan is restricted to the single <CF>:<CN> column, so the value of
            // the first (and only) cell in the result is the value of that column.
            byte[] b = result.value();
            String msg = Bytes.toString(b);
            if (msg != null && !msg.isEmpty())
                context.getCounter(Counters.Rows).increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Wrong number of arguments: " + otherArgs.length);
            System.err.println("Usage: hadoop jar HBaseGet.jar HbaseGet <tablename> <CF> <CN>");
            System.exit(-1);
        }
        tablename  = Bytes.toBytes(otherArgs[0]);
        familyname = Bytes.toBytes(otherArgs[1]);
        columnname = Bytes.toBytes(otherArgs[2]);

        Job job = new Job(conf, otherArgs[0]);
        job.setJarByClass(HbaseGet.class);

        Scan scan = new Scan();
        scan.addColumn(familyname, columnname);
        TableMapReduceUtil.initTableMapperJob(
                Bytes.toString(tablename),
                scan,
                GetMap.class,
                Text.class,         // mapper output key class
                LongWritable.class, // mapper output value class
                job);
        job.setOutputFormatClass(NullOutputFormat.class); // counters are the only output
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Import CSV file to HBase (Using Jar File)

[Software]
    Hadoop2.4.1
    Eclipse IDE for Java Developers Luna Release (4.4.0)
    HBase0.98.5

Reference:
- http://hbase.apache.org/xref/org/apache/hadoop/hbase/mapreduce/SampleUploader.html

Step:
- Create Table
$hbase shell
hbase> create 'TEST3','m0','m1','m2','m3','m4','m5','m6','m7','m8','m9','m10','m11','m12','m13','m14','m15'
- Input File in HDFS
- log file with:
- 98 columns
- 1  timestamp
- 1  ??
- 16 monitored servers( 6 info per server)
- Run the "HBimporttsv_v2.jar" to insert log file to HBase table
$ hadoop jar Downloads/HBimporttsv_v2.jar HBimporttsv.Hbaseimporttsv /user/hduser/test.log2 "TEST3"

[Code]
where Hbaseimporttsv.java is:

/*
 * This program is the operation of importing csv file into HBase
 *
 */
package HBimporttsv;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.io.*;
import org.apache.hadoop.hbase.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.*;
//import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.GenericOptionsParser;

public class Hbaseimporttsv {
private static final String NAME = "SampleUploader";
public static int NUM_OF_SERVER = 16;// number of monitored server
public  static int NUM_OF_VAR = 6;// number of info per server
//public static String[]  VAR = {"var1","var2","var3","var4","var5","var6"};//server info type
public static String[] VAR = {
"Tj.00","Cal Tj.00","Tc.00","DutV00","DutA00","ErrCode00",
"Tj.01","Cal Tj.01","Tc.01","DutV01","DutA01","ErrCode01",
"Tj.02","Cal Tj.02","Tc.02","DutV02","DutA02","ErrCode02",
"Tj.03","Cal Tj.03","Tc.03","DutV03","DutA03","ErrCode03",
"Tj.04","Cal Tj.04","Tc.04","DutV04","DutA04","ErrCode04",
"Tj.05","Cal Tj.05","Tc.05","DutV05","DutA05","ErrCode05",
"Tj.06","Cal Tj.06","Tc.06","DutV06","DutA06","ErrCode06",
"Tj.07","Cal Tj.07","Tc.07","DutV07","DutA07","ErrCode07",
"Tj.08","Cal Tj.08","Tc.08","DutV08","DutA08","ErrCode08",
"Tj.09","Cal Tj.09","Tc.09","DutV09","DutA09","ErrCode08",
"Tj.10","Cal Tj.10","Tc.10","DutV10","DutA10","ErrCode10",
"Tj.11","Cal Tj.11","Tc.11","DutV11","DutA11","ErrCode11",
"Tj.12","Cal Tj.12","Tc.12","DutV12","DutA12","ErrCode12",
"Tj.13","Cal Tj.13","Tc.13","DutV13","DutA13","ErrCode13",
"Tj.14","Cal Tj.14","Tc.14","DutV14","DutA14","ErrCode14",
"Tj.15","Cal Tj.15","Tc.15","DutV15","DutA15","ErrCode15",
};

public static int NUM_OF_TOTAL_COLUMNS = 98;

static class Uploader
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

      //private long checkpoint = 100;
      //private long count = 0;

      @Override
      public void map(LongWritable key, Text line, Context context)
      throws IOException {
        // Input is a CSV file
        // Split CSV line
    // Ex:
    // Input: 14/09/15 18:20:35, Z00 B00 ,50.5,53.26,53.45,1251.06,291.25,FF
    // Output: values[0]="14/09/15 18:20:35", values[1]="Z00 B00",
    //         values[2]="50.5", values[3]="53.26", values[4]="53.45",
    //         values[5]="1251.06", values[6]="291.25", values[7]="FF"
    String [] values = line.toString().split(",");
        if(values.length != NUM_OF_TOTAL_COLUMNS)
          return;

        // Extract values[0] >> timestamp
        byte [] timestamp = Bytes.toBytes(values[0]);
     
        // Extract values[1] >> ??
        //byte [] ?? = Bytes.toBytes(values[1]);
     
        // Create Put
        Put put = new Put(timestamp);//Using first row(timestamp) as ROW_KEY
        //int var_index = 2; // server info star from values[2]
        for(int j = 0; j< NUM_OF_SERVER;j++){
        for(int i = 0; i< NUM_OF_VAR; i++){
            put.add(Bytes.toBytes("m"+j), // Column Family name
            Bytes.toBytes(VAR[(j*NUM_OF_VAR)+i]), // Column name
            Bytes.toBytes(values[2+(j*NUM_OF_VAR)+i])); // Value
            }
        }

        // Uncomment below to disable WAL. This will improve performance but means
        // you will experience data loss in the case of a RegionServer crash.
        // put.setWriteToWAL(false);

        try {
          context.write(new ImmutableBytesWritable(timestamp), put);
        } catch (InterruptedException e) {
          e.printStackTrace();
        }
     
        /*
        // Set status every checkpoint lines
        if(++count % checkpoint == 0) {
          context.setStatus("Emitting Put " + count);
        }
        */
      }
    }

public static Job configureJob(Configuration conf, String [] args)
  throws IOException {
    Path inputPath = new Path(args[0]); // input path
    String tableName = args[1]; // Table name which is already in Database
    Job job = new Job(conf, NAME + "_" + tableName);
    job.setJarByClass(Uploader.class);
    FileInputFormat.setInputPaths(job, inputPath);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(Uploader.class);
    // No reducers.  Just write straight to table.  Call initTableReducerJob
    // because it sets up the TableOutputFormat. And Output write to table
    TableMapReduceUtil.initTableReducerJob(tableName, null, job);
    job.setNumReduceTasks(0);
    return job;
  }

public static void main(String[] args) throws Exception{
Configuration conf = HBaseConfiguration.create();
String[] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();
if(otherArgs.length !=2){
System.err.println("Wrong number of arguments:"+ otherArgs.length);
System.err.println("Usage:"+ NAME + " <input> <tablename>");
System.exit(-1);
}
Job job = configureJob(conf, otherArgs);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}


Check Result:
$hbase shell
hbase>  get 'TEST3','14/09/16 06:35:38'

Import CSV file to HBase (Using HBase Shell Command)

[Software]
    Hadoop2.4.1
    HBase 0.98.5
[Reference]
- http://www.openscg.com/2013/08/hadoop-hbase-tutorial/  (Operations)
- http://wiki.apache.org/hadoop/Hbase/Shell  (HBase shell command)


Input type:

14/09/15 18:20:35, Z00 B00 ,0050.50     ,0053.26     ,0053.45     ,1251.06     ,0291.25      ,FF     
14/09/15 18:20:35, Z00 B01 ,0053.50     ,0055.80     ,0056.03     ,1249.79     ,0357.45      ,FF     
.......

Table:

                  |  type   |                          m1                          |
HBASE_ROW_KEY     | states  |   deg   |  high   |  heat   | length  |   avg   | char
------------------+---------+---------+---------+---------+---------+---------+------
14/09/15 18:20:35 | Z00 B00 | 0050.50 | 0053.26 | 0053.45 | 1251.06 | 0291.25 | FF
14/09/15 18:20:35 | Z00 B01 | 0053.50 | 0055.80 | 0056.03 | 1249.79 | 0357.45 | FF
.......

Step:
$hbase shell
> create 'log_data', 'type','m1'    # create "log_data" with two column families, "type" and "m1"
> quit

$hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.columns=HBASE_ROW_KEY,type:states,m1:deg,m1:high,m1:heat,m1:length,m1:avg,m1:char log_data /user/hduser/test_log.csv

- org.apache.hadoop.hbase.mapreduce.ImportTsv
Runs the ImportTsv class in hbase-server-${version}-hadoop2.jar, which lets HBase load CSV-formatted data
- '-Dimporttsv.separator=,'
Tells HBase that the values on each line are separated by ","
- -Dimporttsv.columns
Maps the input fields to columns (the 'type' and 'm1' column families created in the hbase shell); at least one field must be HBASE_ROW_KEY to serve as the row key.
Columns are written as "columnfamilyname:columnname", e.g. "m1:deg"
- log_data
This argument is the target table name (the "log_data" table created in the hbase shell)
- /user/hduser/test_log.csv
This argument is the input file path, located on the HDFS that HBase is connected to

$hbase shell
> scan 'log_data'    # view the table with the imported data

[Future Work]
1. The complete log, including the columns that are currently dropped at the end, is not imported yet
2. Do the import through a more convenient interface or program

HBase create Table (Using Jar File)

[Software]
    Hadoop2.4.1
    HBase0.98.5

[Reference]
 http://diveintodata.org/2009/11/27/how-to-make-a-table-in-hbase-for-beginners/

Run the Java program:
$ hadoop jar hbop.jar HBoperation.HbaseOperation

where the jar file contains:
package HBoperation;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseOperation {
    public static void main(String[] args) throws Exception {
        Configuration myConf = HBaseConfiguration.create(); // create the HBase conf object
        // The conf is normally picked up from $HBASE_HOME/conf (if it is on $HADOOP_CLASSPATH)

        // myConf.set() isn't necessary if the conf is already on $HADOOP_CLASSPATH
        myConf.set("hbase.master", "192.168.0.7:60000");

        HBaseAdmin hbase = new HBaseAdmin(myConf); // create an Admin to operate HBase

        /////////////////////
        //  Create Table   //
        /////////////////////
        //HTableDescriptor desc = new HTableDescriptor("TEST"); // deprecated since 0.98
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("TEST"));
        HColumnDescriptor meta = new HColumnDescriptor("personal".getBytes());
        HColumnDescriptor pref = new HColumnDescriptor("account".getBytes());
        desc.addFamily(meta);
        desc.addFamily(pref);
        hbase.createTable(desc);

        ///////////////////////
        //  Connect Table    //
        ///////////////////////
        HConnection hconnect = HConnectionManager.createConnection(myConf);
        HTableInterface testTable = hconnect.getTable("TEST");

        //////////////////////////
        //   Put Data to Table  //
        //////////////////////////
        Put p = new Put(Bytes.toBytes("student1"));
        p.add(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("John"));
        p.add(Bytes.toBytes("account"), Bytes.toBytes("id"), Bytes.toBytes("3355454"));
        testTable.put(p);

        testTable.close();
        hbase.close();
    }
}

- Check HBase
$hbase shell
hbase>list
- Result
TABLE
1 row(s) in 0.0390 seconds

[Problem]
When I ran the jar file for the first time, the following error occurred:
"opening socket connection to server localhost 127.0.0.1:2181 will not attempt to authenticate using SASL"

[Solution]
- THINK:
We set the HBase location (with ZooKeeper) to "192.168.0.7", so "server localhost 127.0.0.1" is odd. The HBase conf is probably not included in HADOOP_CLASSPATH, because we used the "hadoop jar" command.
- Method:
1. modify the bashrc file and add:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/conf
2. rerun the env conf:
$. ~/.bashrc

HBase create Table (Using Hive Script)

[Software]
    HBase 0.98.5

[Reference]
 https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-Introduction    (Hive HBase Integration)
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveCommandLineOptions
 (Hive Command and Hive Shell Command)
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL    (Hive DDL)

Step:

- Create a Hive script called hive-script.sql
$nano hive-script.sql :

CREATE TABLE hbase_table_1(key int, value string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hiveTableonHB");

- Execute the sql script :
$hive -f hive-script.sql    # this command only executes the sql file and does not start the Hive shell

- Check the result :
$hive 
hive> DESCRIBE hbase_table_1;  # which shows the following:

OK
key int from deserializer
value string from deserializer
Time taken: 0.948 seconds, Fetched: 2 row(s)

hive> quit;
$hbase shell
hbase> list

hiveTableonHB
1 row(s) in 1.0810 seconds

HBase Table Management Web UI: HareDB Client (on HBase)

 Introduction
[Reference]
  http://www.haredb.com/haredb/file/tutorial/HBaseClient_Web_Version_Manual1.94.03.pdf
  http://www.haredb.com/HareDB/src_ap/Product_HareDBClient_Install.aspx?l=4

[HareDB Client Features]
        - Visualized client tool for HBase  >> Better than the command mode
- Easily retrieve data from and put data into HBase
- Can transfer data from an RDB to HBase (using the "Data Model Management" function in HBase Client)
- Design your HBase schema only through some configuring in some GUI pages

Install and Start up
[Download]
- http://sourceforge.net/projects/haredbhbaseclie/files/
- Version: HareDBClient_1.98.01s

[Startup]
1. The hostnames of the HBase nodes must be set up in /etc/hosts
  (otherwise it will cause an "unknown hostname" error)
ex:  $gedit /etc/hosts
  ...
192.168.0.7 master          -------> HBase master/slave
192.168.0.23 regionserver2  -------> HBase slave
...
2. Execute the sh file
$sh Download/HareDBClient_1.98.01s/startup.sh
          A HareDB web UI page will then appear at:
http://localhost:8080/HareDBClient/index.html

3. Set up a new connection to HBase:
1. Click the upper-left button on the web page
> "Manage Connections"
> Right-click "Allen" (the default connection)
> "Clone" and enter a new name such as "HBase"
> Click the "HBase" connection manager
2. Set the connection information:

Connection Name: HBase
ZooKeeper Host/ip: 192.168.0.7 (-> $HBASE_HOME/conf/hbase-site.xml)
ZooKeeper Client Port: 2222   (-> $HBASE_HOME/conf/hbase-site.xml)
fs.default.name: hdfs://192.168.0.7:9000 (-> $HADOOP_HOME/etc/hadoop/core-site.xml)
yarn.resourcemanager.address: 192.168.0.7:8032 (Default)
yarn.resourcemanager.scheduler.address: 192.168.0.7:8030 (Default)
yarn.resourcemanager.resource-tracker.address: 192.168.0.7:8031 (Default)
yarn.resourcemanager.admin.address: 192.168.0.7:8033 (Default)
mapreduce.jobhistory.address: 192.168.0.7:10020 (Default)
coprocessor folder: hdfs://192.168.0.7:9000/tmp (Default)

Hive metastore: "Embedded"

> Click "Apply"

4. Connect to HBase
> click the upper-left button on the main page
> select the "HBase" connection we just created
> the left pane will show the tables of the HBase cluster we connected to

[Important Note]
- Every table in HBase must have a coprocessor registered before it is operated on for the first time:
0. There is a cross mark "X" on the table's icon,
          and you may find that none of the operations work.
1. Right-click the table > select "Coprocessor" > select "Register"