
Monday, October 13, 2014

Java TCP Server receiving data from the Linux nc command

[Problem]

The Linux nc (netcat) command can send data or messages over TCP/UDP, but I cannot use nc or other Linux commands to create a multithreaded TCP server; nc only supports one-to-one transfers,
as in Put Data into Remote HBase by using Linux Bash Script. I therefore wrote a multithreaded server in Java instead.
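
Once the Java server below is running, nc itself can be used as the client to exercise it. A minimal test (assuming the server runs on the local host and listens on port 9075 as in the code below; adjust the host and port as needed):

$ echo "hello server" | nc localhost 9075
$ nc localhost 9075 < test.log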

------------------------------------------------------------------------------------------------------------
[Concept]


Reference:
How to write RAW data to a file using Java? e.g same as: nc -l 8000 > capture.raw
Multithreaded Server in Java
-------------------------------------------------------------------------------------------------------------
[Code]

tcpserver.java  (Main Class)

package tcpserver;

public class tcpserver {
    public static void main(String args[]) {
        MultiThreadedServer server = new MultiThreadedServer(9075);
        System.out.println("***************************");
        System.out.println("** TCP Collection Server **");
        System.out.println("***************************");
        new Thread(server).start();
    }
}

MultiThreadedServer.java

package tcpserver;

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class MultiThreadedServer implements Runnable {
    protected int          serverPort    = 50200;
    protected ServerSocket serverSocket  = null;
    protected boolean      isStopped     = false;
    protected Thread       runningThread = null;

    public MultiThreadedServer(int port){
        this.serverPort = port;
        System.out.println("System port setting : " + port);
    }

    public void run() {
        synchronized (this) {
            this.runningThread = Thread.currentThread(); // remember the thread running this server
        }

        openServerSocket();

        System.out.println("[INFO] Start Listening!");
        while(!isStopped()){
            Socket clientSocket = null;
            try {
                clientSocket = this.serverSocket.accept(); //wait client
            } catch (IOException e) {
                if(isStopped()) {
                    System.out.println("[ERROR] Server Stopped.") ;
                    return;
                }
                throw new RuntimeException("[ERROR] Error accepting client connection", e);
            }
            // Create a New Thread for client
            new Thread(
                new WorkerRunnable(clientSocket)
            ).start();
        }
        System.out.println("Server Stopped.") ;
    }


    private synchronized boolean isStopped() {
        return this.isStopped;
    }

    public synchronized void stop(){
        this.isStopped = true;
        try {
            this.serverSocket.close();
        } catch (IOException e) {
            throw new RuntimeException("Error closing server", e);
        }
    }

    private void openServerSocket() {
        try {
            this.serverSocket = new ServerSocket(this.serverPort);
            System.out.println("[INFO] Create a ServerSocket");
        } catch (IOException e) {
            throw new RuntimeException("Cannot open port 8080", e);
        }
    }
}

WorkerRunnable.java

package tcpserver;

import java.io.*;
import java.net.*;

public class WorkerRunnable implements Runnable {
    protected Socket clientSocket = null;

    public WorkerRunnable(Socket clientSocket) {
        this.clientSocket = clientSocket;
    }

    public void run() {
        try {
            byte[] buff = new byte[1024];
            int bytes_read = 0;
            InputStream data = clientSocket.getInputStream(); // get input data
            while ((bytes_read = data.read(buff)) != -1) {    // print out
                // convert only the bytes actually read; new String(buff) would also
                // print stale bytes left over from earlier, larger reads
                System.out.println(new String(buff, 0, bytes_read));
            }
            data.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

-------------------------------------------------------------------------------------------------------------
[Result]

Wednesday, October 1, 2014

Java: compare part of a String (substring check)

In Java, to check whether a String contains a substring (i.e. to compare part of a string), for example to check whether the String "ERROR543" contains "ERROR", you can use:

public class test {
    public static void main(String[] args) {

        String a = "ERROR543";
        if (a.indexOf("ERROR") != -1)
            System.out.println("yes");
        else
            System.out.println("NO");
    }
}
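
Since Java 5, String.contains() expresses the same check more directly; a one-line alternative to indexOf (not in the original post):

if (a.contains("ERROR"))
    System.out.println("yes");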

Tuesday, September 30, 2014

Hadoop-2.4.1 Example (WordCount) on Eclipse

[Software]
    Hadoop2.4.1
    Eclipse IDE for Java Developers Luna Release (4.4.0)


1. Create a Map/Reduce Project

2. Add lib jar:
- Right-click the project > Build Path > Configure Build Path > Java Build Path > Libraries
 > Add External JARs (add the jars under the following directories):
  - share/hadoop/common
  - share/hadoop/common/lib
  - share/hadoop/mapreduce
  - share/hadoop/mapreduce/lib
  - share/hadoop/yarn
  - share/hadoop/yarn/lib
  --------additional-----------
  - HDFS lib
  - HBase lib

3. In this project, add the following new classes:
- Mapper: Mp.java

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Mapper.Context;

public class Mp extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        String line = ivalue.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}


- Reducer: Rd.java

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;


public class Rd extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text _key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // process values: sum the counts for this key
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(_key, new IntWritable(sum));
    }
}

- MapReduce Driver: WC.java

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WC {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "wordcount");
        //Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WC.class);
        // TODO: specify a mapper
        job.setMapperClass(Mp.class);
        // TODO: specify a reducer
        job.setReducerClass(Rd.class);
        // TODO: specify output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // TODO: specify input and output DIRECTORIES (not files)
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        if (!job.waitForCompletion(true))
            return;
    }
}

4. Create jar file
- File > Export > JAR file
- Select the resources and the JAR file location

5. Run application
- Select "Run Configurations" > check "Java Application", "Name", "Project", "Main class"
- Enter "Arguments" > add "file1 Output" in Program arguments
 [Note] Because main() does not hard-code the input and output paths, they must be passed to the application here,
  which is equivalent to running $ hadoop jar project.jar file1 Output1 in a terminal.
  If no path is given, the default input and output location is the local $ECLIPSE_WORKSPACE/PROJECT_FOLDER.

- Click "Run"

[Question] How can the application be run on an existing Hadoop cluster instead of only locally?

===> 2014/09/18: As tested so far, exporting the jar file and running it on the master works.
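
For example (a sketch only; the jar name and HDFS paths are assumptions, not from the original post), after copying the exported jar to the master:

$ hadoop jar WC.jar WC /user/hduser/file1 /user/hduser/Output1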

[Solution]

HBase Count Table Rows (Using Java Jar File)

[Software]
    Hadoop2.4.1
    Eclipse IDE for Java Developers Luna Release (4.4.0)
    HBase0.98.5

/*
 * Version:
 * v1 : count rows of the specified table; map-only job; output: counter "Rows"
 */
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

import org.apache.hadoop.util.GenericOptionsParser;

public class HbaseGet {
    private static byte[] tablename;
    private static byte[] familyname;
    private static byte[] columnname;

    public static class GetMap
            extends TableMapper<Text, LongWritable> { // in Java: Text => String, LongWritable => long

        public static enum Counters { Rows, Times };

        @Override
        public void map(ImmutableBytesWritable rowkey, Result result, Context context)
                throws IOException {
            // NOTE: the column family/qualifier are hard-coded here (v1);
            // the <CF> <CN> arguments only restrict the Scan configured in main()
            byte[] b = result.getColumnLatest(Bytes.toBytes("m0"), Bytes.toBytes("Tj.00")).getValue();
            String msg = Bytes.toString(b);
            if (msg != null && !msg.isEmpty())
                context.getCounter(Counters.Rows).increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Wrong number of arguments: " + otherArgs.length);
            System.err.println("Usage: hadoop jar HBaseGet.jar HbaseGet <tablename> <CF> <CN>");
            System.exit(-1);
        }
        tablename  = Bytes.toBytes(otherArgs[0]);
        familyname = Bytes.toBytes(otherArgs[1]);
        columnname = Bytes.toBytes(otherArgs[2]);

        Job job = new Job(conf, otherArgs[0]);
        job.setJarByClass(HbaseGet.class);

        Scan scan = new Scan();
        scan.addColumn(familyname, columnname);
        TableMapReduceUtil.initTableMapperJob(
                Bytes.toString(tablename),
                scan,
                GetMap.class,
                ImmutableBytesWritable.class,
                Result.class, // single-row result of a Get or Scan query
                job);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
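
An example invocation (assumed, using the TEST3 table and the m0 / Tj.00 column from the import post below; not from the original post):

$ hadoop jar HBaseGet.jar HbaseGet TEST3 m0 Tj.00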

Import CSV file to HBase (Using Jar File)

[Software]
    Hadoop2.4.1
    Eclipse IDE for Java Developers Luna Release (4.4.0)
    HBase0.98.5

Reference:
- http://hbase.apache.org/xref/org/apache/hadoop/hbase/mapreduce/SampleUploader.html

Step:
- Create Table
$hbase shell
hbase> create 'TEST3','m0','m1','m2','m3','m4','m5','m6','m7','m8','m9','m10','m11','m12','m13','m14','m15'
- Input File in HDFS
- log file with:
- 98 columns
- 1  timestamp
- 1  ??
- 16 monitored servers (6 values per server)
- Run "HBimporttsv_v2.jar" to insert the log file into the HBase table
$ hadoop jar Downloads/HBimporttsv_v2.jar HBimporttsv.Hbaseimporttsv /user/hduser/test.log2 "TEST3"

[Code]
where Hbaseimporttsv.java is:

/*
 * This program imports a CSV file into an HBase table
 *
 */
package HBimporttsv;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.io.*;
import org.apache.hadoop.hbase.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.*;
//import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.GenericOptionsParser;

public class Hbaseimporttsv {
    private static final String NAME = "SampleUploader";
    public static int NUM_OF_SERVER = 16; // number of monitored servers
    public static int NUM_OF_VAR = 6;     // number of values per server
    //public static String[] VAR = {"var1","var2","var3","var4","var5","var6"}; // server info type
    public static String[] VAR = {
        "Tj.00","Cal Tj.00","Tc.00","DutV00","DutA00","ErrCode00",
        "Tj.01","Cal Tj.01","Tc.01","DutV01","DutA01","ErrCode01",
        "Tj.02","Cal Tj.02","Tc.02","DutV02","DutA02","ErrCode02",
        "Tj.03","Cal Tj.03","Tc.03","DutV03","DutA03","ErrCode03",
        "Tj.04","Cal Tj.04","Tc.04","DutV04","DutA04","ErrCode04",
        "Tj.05","Cal Tj.05","Tc.05","DutV05","DutA05","ErrCode05",
        "Tj.06","Cal Tj.06","Tc.06","DutV06","DutA06","ErrCode06",
        "Tj.07","Cal Tj.07","Tc.07","DutV07","DutA07","ErrCode07",
        "Tj.08","Cal Tj.08","Tc.08","DutV08","DutA08","ErrCode08",
        "Tj.09","Cal Tj.09","Tc.09","DutV09","DutA09","ErrCode09",
        "Tj.10","Cal Tj.10","Tc.10","DutV10","DutA10","ErrCode10",
        "Tj.11","Cal Tj.11","Tc.11","DutV11","DutA11","ErrCode11",
        "Tj.12","Cal Tj.12","Tc.12","DutV12","DutA12","ErrCode12",
        "Tj.13","Cal Tj.13","Tc.13","DutV13","DutA13","ErrCode13",
        "Tj.14","Cal Tj.14","Tc.14","DutV14","DutA14","ErrCode14",
        "Tj.15","Cal Tj.15","Tc.15","DutV15","DutA15","ErrCode15",
    };

    public static int NUM_OF_TOTAL_COLUMNS = 98;

static class Uploader
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

      //private long checkpoint = 100;
      //private long count = 0;

      @Override
      public void map(LongWritable key, Text line, Context context)
      throws IOException {
        // Input is a CSV file
        // Split CSV line
    // Ex:
    // Input: 14/09/15 18:20:35, Z00 B00 ,50.5,53.26,53.45,1251.06,291.25,FF
    // Output: values[0]="14/09/15 18:20:35", values[1]="Z00 B00",
    //         values[2]="50.5", values[3]="53.26", values[4]="53.45",
    //         values[5]="1251.06", values[6]="291.25", values[7]="FF"
    String [] values = line.toString().split(",");
        if(values.length != NUM_OF_TOTAL_COLUMNS)
          return;

        // Extract values[0] >> timestamp
        byte [] timestamp = Bytes.toBytes(values[0]);
     
        // Extract values[1] >> ??
        //byte [] ?? = Bytes.toBytes(values[1]);
     
        // Create Put
        Put put = new Put(timestamp);//Using first row(timestamp) as ROW_KEY
        //int var_index = 2; // server info star from values[2]
        for (int j = 0; j < NUM_OF_SERVER; j++) {
            for (int i = 0; i < NUM_OF_VAR; i++) {
                put.add(Bytes.toBytes("m" + j),                           // column family name
                        Bytes.toBytes(VAR[(j * NUM_OF_VAR) + i]),         // column name
                        Bytes.toBytes(values[2 + (j * NUM_OF_VAR) + i])); // value
            }
        }

        // Uncomment below to disable WAL. This will improve performance but means
        // you will experience data loss in the case of a RegionServer crash.
        // put.setWriteToWAL(false);

        try {
          context.write(new ImmutableBytesWritable(timestamp), put);
        } catch (InterruptedException e) {
          e.printStackTrace();
        }
     
        /*
        // Set status every checkpoint lines
        if(++count % checkpoint == 0) {
          context.setStatus("Emitting Put " + count);
        }
        */
      }
    }

public static Job configureJob(Configuration conf, String [] args)
  throws IOException {
    Path inputPath = new Path(args[0]); // input path
    String tableName = args[1]; // Table name which is already in Database
    Job job = new Job(conf, NAME + "_" + tableName);
    job.setJarByClass(Uploader.class);
    FileInputFormat.setInputPaths(job, inputPath);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(Uploader.class);
    // No reducers; write straight to the table. Call initTableReducerJob
    // because it sets up the TableOutputFormat, so output goes directly to the table.
    TableMapReduceUtil.initTableReducerJob(tableName, null, job);
    job.setNumReduceTasks(0);
    return job;
  }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Wrong number of arguments: " + otherArgs.length);
            System.err.println("Usage: " + NAME + " <input> <tablename>");
            System.exit(-1);
        }
        Job job = configureJob(conf, otherArgs);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}


Check Result:
$hbase shell
hbase>  get 'TEST3','14/09/16 06:35:38'
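
The imported row count can also be checked directly from the shell (an extra check, not in the original post):
hbase> count 'TEST3'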

HBase create Table (Using Jar File)

[Software]
    Hadoop2.4.1
    HBase0.98.5

[Reference]
 http://diveintodata.org/2009/11/27/how-to-make-a-table-in-hbase-for-beginners/

Run the Java program:
$ hadoop jar hbop.jar HBoperation.HbaseOperation

where the jar file contains:
package HBoperation;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseOperation {
    public static void main(String[] args) throws Exception {
        Configuration myConf = HBaseConfiguration.create(); // create HBase conf object
        // The conf is generally picked up from $HBASE_HOME/conf (if it is on $HADOOP_CLASSPATH)

        // myConf.set() isn't necessary if the conf has been set in $HADOOP_CLASSPATH
        myConf.set("hbase.master", "192.168.0.7:60000");

        HBaseAdmin hbase = new HBaseAdmin(myConf); // create Admin to operate HBase

        /////////////////////
        //  Create Table   //
        /////////////////////
        //HTableDescriptor desc = new HTableDescriptor("TEST"); // deprecated since 0.98
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("TEST"));
        HColumnDescriptor meta = new HColumnDescriptor("personal".getBytes());
        HColumnDescriptor pref = new HColumnDescriptor("account".getBytes());
        desc.addFamily(meta);
        desc.addFamily(pref);
        hbase.createTable(desc);

        ///////////////////////
        //  Connect Table    //
        ///////////////////////
        HConnection hconnect = HConnectionManager.createConnection(myConf);
        HTableInterface testTable = hconnect.getTable("TEST");

        //////////////////////////
        //   Put Data to Table  //
        //////////////////////////
        Put p = new Put(Bytes.toBytes("student1"));
        p.add(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("John"));
        p.add(Bytes.toBytes("account"), Bytes.toBytes("id"), Bytes.toBytes("3355454"));
        testTable.put(p);

        testTable.close();
        hbase.close();
    }
}

- Check HBase
$hbase shell
hbase>list
- Result
TABLE
1 row(s) in 0.0390 seconds
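
To also verify the inserted row, not just the table listing (an extra check, not in the original post):
hbase> get 'TEST','student1'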

[Problem]
When I ran the jar file the first time, the following error occurred:
"opening socket connection to server localhost 127.0.0.1:2181 will not attempt to authenticate using SASL"

[Solution]
- THINK:
We set the HBase location (with ZooKeeper) to "192.168.0.7", so "server localhost 127.0.0.1" is odd. The HBase conf is probably not included in HADOOP_CLASSPATH, because we used the "hadoop jar" command.
- Method:
1. Modify the .bashrc file and add:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/conf
2. Reload the environment configuration:
$ . ~/.bashrc
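
To confirm the HBase conf directory is now on Hadoop's classpath (a quick sanity check, not from the original post):

$ hadoop classpath | grep hbase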

Hadoop Simple WordCount Example

[Software]
    Hadoop2.4.1

[Reference]
http://azure.microsoft.com/zh-tw/documentation/articles/hdinsight-develop-deploy-java-mapreduce/

1. Open Notepad.
2. Copy the following program, paste it into Notepad, and save it as WordCount.java.

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper 
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer 
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

3. Make a directory:
$mkdir Word_Count
4. Compile the Java file:
$javac -classpath $HADOOP_INSTALL/share/hadoop/common/hadoop-common-2.4.1.jar:$HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar:$HADOOP_INSTALL/share/hadoop/common/lib/commons-cli-1.2.jar:$HADOOP_INSTALL/share/hadoop/common/lib/hadoop-annotations-2.4.1.jar -d Word_Count WordCount.java

5. Create the jar file:
$jar -cvf WordCount.jar -C Word_Count/ . 

WordCount.jar will then be created in the current directory.

6. Put the input file into HDFS:
$hdfs dfs -put WordCount.java /WordCount/Input/file1

7. Execute the jar:
$hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount /WordCount/Input /WordCount/Output

8. Check the result:
$hdfs dfs -cat /WordCount/Output/part-r-00000

Tuesday, February 26, 2013

How to sort a Map collection

Java's Map classes do not provide a method for sorting entries by value,
so you have to write it yourself. A reference example follows:

   
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;


public class sortMap {
 public static void main(String[] args){
  
  Map<String, Integer> sortMap = new TreeMap<String, Integer>();
  sortMap.put("N1", 5);
  sortMap.put("N2", 3);
  sortMap.put("N3", 1);
  sortMap.put("N4", 2);
  sortMap.put("N5", 4);
  Collection<Integer> sortMapTmp = sortMap.values();
  
  System.out.println("--unsort Map--");
  printSortMap(sortMap);
  
  sortMap = sortMapByComparator(sortMap);
  
  System.out.println("--sorted Map--");
  printSortMap(sortMap);
 }
 
 private static Map<String, Integer> sortMapByComparator(Map<String, Integer> unsortMap){
  
  List<Entry<String, Integer>> list = new LinkedList<Entry<String, Integer>>(unsortMap.entrySet());
  
  // Sorting the list based on values
  Collections.sort(list, new Comparator<Entry<String, Integer>>()
  {
    public int compare(Entry<String, Integer> o1, Entry<String, Integer> o2)
    {
      //to sort in ascending order:
      //if o1 > o2 => return  1
      //   o1 = o2 => return  0
      //   o1 < o2 => return -1
      return o1.getValue().compareTo(o2.getValue());
    }
  });
  
  Map<String, Integer> sortedMap = new LinkedHashMap<String, Integer>();
  for (Entry<String, Integer> entry : list)
        {
            sortedMap.put(entry.getKey(), entry.getValue());
        }
  
  return sortedMap;
 }
 
 private static void printSortMap(Map<String, Integer> sortMap){
  
  for(Entry<String, Integer> entry : sortMap.entrySet()){
   System.out.println("Key : " + entry.getKey() + " Value : "+ entry.getValue());
  }
 }
}
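
To sort in descending order instead, reverse the comparison inside compare() (a small variation on the example above, not in the original post):

return o2.getValue().compareTo(o1.getValue());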


[Reference]
http://stackoverflow.com/questions/1448369/how-to-sort-a-treemap-based-on-its-values

Monday, February 25, 2013

Dimensions expected after this token

When declaring a Map object, pay attention to the <key, value> type parameters:
they cannot be the primitive type int; the wrapper class Integer must be used instead.

Otherwise the compiler reports "Syntax error on token "int", Dimensions expected after this token".

Example:
(X) Map<String, int> list_test = new TreeMap<String, int>();
(O) Map<String, Integer> list_test = new TreeMap<String, Integer>();
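
Even with Integer as the type parameter, int values can still be stored directly thanks to autoboxing; a minimal illustration (not from the original post):

Map<String, Integer> list_test = new TreeMap<String, Integer>();
list_test.put("N1", 5); // the int literal 5 is autoboxed to an Integer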

Wednesday, October 31, 2012

The java.lang.NullPointerException problem

When developing large programs, object-oriented concepts are used constantly,
but it is easy to accidentally call a method through an object that has not been created yet,
which causes a
java.lang.NullPointerException

ex:
...
Job job;
job.setconf(); //setconf() is a method of the Job class
...

This fails, because Job job; merely declares a variable of type Job.

An actual object must be created before its methods can be used:

...
Job job = new Job();
job.setconf();
...


[Note] So when a java.lang.NullPointerException occurs, check whether some object has not yet been created with new.