智慧筋肉人: Why does a Hadoop sequence file look strange?

Thursday, December 11, 2014

Sometimes files are sent through Flume (using the spooling directory source) to HDFS for storage. Suppose the files contain the following:
a.txt
a file is here
b.txt
b file is here

When you inspect the resulting Hadoop sequence file on HDFS with

$ hdfs dfs -cat <SEQUENCEFILE>

the output looks like this:

SEQ!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.TextK▒*▒▒ ▒▒▒▒͇J<▒▒a file is herJ<▒▒b file is here

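This happens because the data is stored as a sequence file, a binary format, and -cat prints the raw bytes. The file therefore starts with "SEQ", and "!org.apache.hadoop.io.LongWritableorg.apache.hadoop.io.Text" gives the class types of the keys and values stored in the sequence data.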
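Since every sequence file begins with the three bytes "SEQ", a quick check on those bytes tells you whether an odd-looking file is a sequence file at all. The following is a minimal sketch, not part of the original post: the function name is_sequence_file is made up for illustration, and it assumes a local copy of the file and a head command that supports -c.

```shell
# is_sequence_file FILE
# Hypothetical helper: succeeds (exit status 0) when FILE starts with the
# 3-byte magic "SEQ" that opens a Hadoop sequence file, fails otherwise.
is_sequence_file() {
  [ "$(head -c 3 "$1" 2>/dev/null)" = "SEQ" ]
}
```

For a file that is still on HDFS, the same idea can be applied by piping: hdfs dfs -cat <SEQUENCEFILE> | head -c 3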

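To see the text content, use -text, which decodes the file and prints each record as its key followed by its value: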

$ hdfs dfs -text <SEQUENCEFILE>

1418355275278 a file is here
1418355275280 b file is here

If you download the file, it is still a sequence file, so you need a conversion tool such as strings or od to view it:

$ hdfs dfs -get <SEQUENCEFILE>

$ strings <SEQUENCEFILE>

!org.apache.hadoop.io.LongWritable
org.apache.hadoop.io.Text
a file is here
b file is here
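The od tool mentioned above shows the bytes that strings hides. The sketch below is an illustration only (the function name dump_header and the 80-byte default are assumptions, not from the original post); it prints the start of a local file as characters, with non-printable bytes shown in octal.

```shell
# dump_header FILE [N]
# Hypothetical helper: print the first N bytes (default 80) of a local file
# as characters with decimal offsets. Non-printable bytes appear as
# three-digit octal codes, so the hidden header bytes become visible.
dump_header() {
  od -A d -c -N "${2:-80}" "$1"
}
```

Run as dump_header <SEQUENCEFILE> on the downloaded file. In the header you should see "SEQ" followed by a version byte, then the key class name preceded by its length. The "!" in the output above is most likely that length byte, since "org.apache.hadoop.io.LongWritable" is 33 characters long and 33 is the character code of "!". The length byte before the value class name (25) is not printable, which would explain why strings splits the two class names onto separate lines.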

[Reference]
http://stackoverflow.com/questions/23827051/sequence-and-vectors-from-csv-file
