[Reference]
https://github.com/winghc/hadoop2x-eclipse-plugin (package download)
http://blog.jourk.com/compile-hadoop-eclipse-plugin-2-4-1-plugin.html (method)
http://blog.csdn.net/yueritian/article/details/23868175 (Debug)
Software:
Eclipse: eclipse-java-luna-R-linux-gtk-x86_64.tar.gz
Hadoop: 2.4.1
# Hadoop 2.4.1 does not ship with an Eclipse plugin, so we need to build one ourselves.
Creating the plugin:
Download: https://github.com/winghc/hadoop2x-eclipse-plugin
1. Modify:
(As of 2014/09/15 this tool targets Hadoop 2.2.0, so it must be modified for 2.4.1.)
[Change 1]
Find ~/hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin/build.xml
Add the commons-collections-3.2.1.jar setting:
...
<!-- Add this line -->
<copy file="${hadoop.home}/share/hadoop/common/lib/commons-collections-${commons-collections.version}.jar" todir="${build.dir}/lib" verbose="true"/>
<!-- -->
...
In the jar list of the same build.xml, change
lib/commons-lang-2.5.jar
to
lib/commons-lang-2.6.jar
and add
lib/commons-collections-${commons-collections.version}.jar,
[Change 2]
Find ~/hadoop2x-eclipse-plugin-master/ivy/libraries.properties
Add
commons-collections.version=3.2.1
2. Build the jar file
$cd src/contrib/eclipse-plugin
$ant jar -Dversion=2.4.1 -Declipse.home=/home/hdp2/eclipse -Dhadoop.home=/home/hdp2/hadoop-2.4.1
The jar file will then appear in ~/hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin
# -Dversion : the installed Hadoop version
# -Declipse.home : the ECLIPSE_HOME directory
# -Dhadoop.home : the HADOOP_HOME directory
3. Move the plugin to eclipse/plugins
$cp build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.4.1.jar /home/hdp2/eclipse/plugins
4. Start Eclipse with the debug parameters:
$ eclipse/eclipse -clean -consolelog -debug
[Note] This command matters: if nothing happens when you click the Map/Reduce location after Eclipse starts, you can check the terminal for the error info!
5. After Eclipse starts
- Windows > Open Perspective > Other > Map/Reduce
- Windows > Show View > Other > MapReduce Tools > Map/Reduce Locations
- Click the blue elephant icon at the bottom right > configure the Hadoop server location
[Note] If no settings dialog pops up when you click the elephant, check the terminal to see what the error message is.
- Setting:
Location name : master
MapReduce Master >> Host: 192.168.0.7 Port: 8032
DFS Master >> Host:(use M/R Master host) Port:9000
User name : hduser (hadoop user name)
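For reference, those two ports correspond to cluster-side settings: the DFS Master port matches fs.defaultFS (hdfs://192.168.0.7:9000 in core-site.xml) and the MapReduce Master port matches yarn.resourcemanager.address in yarn-site.xml. Below is a minimal sketch, assuming those values, that targets the same endpoints from plain Java; the /user/hduser path is just an example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocationCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same endpoints as the Map/Reduce Location dialog above.
        conf.set("fs.defaultFS", "hdfs://192.168.0.7:9000");          // DFS Master
        conf.set("yarn.resourcemanager.address", "192.168.0.7:8032"); // MapReduce Master
        FileSystem fs = FileSystem.get(conf);
        // If this listing succeeds, the host/port settings are reachable.
        for (FileStatus s : fs.listStatus(new Path("/user/hduser"))) {
            System.out.println(s.getPath());
        }
    }
}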
Wednesday, October 1, 2014
Eclipse Submit Job from Hadoop Client to remote Hadoop Master
[Software]
Hadoop2.4.1
Eclipse IDE for Java Developers Luna Release (4.4.0)
[Problem]
You want to submit a job from the local side to a remote Hadoop server by running the Java application directly in Eclipse, without sending a jar file to the Hadoop server by hand (e.g. Eclipse on 192.168.0.51 and the Hadoop master on 192.168.0.7).
1. Point to the jar file that will be created in step 4
Add the following line where you set up your Configuration, so that it points to the jar file exported in step 4:
conf.set("mapred.jar", "/home/hdp2/workspace/HBaseGet/HbaseGet_v3.jar");
[Reference]
http://stackoverflow.com/questions/21793565/class-not-found-exception-in-eclipse-wordcount-program
[NOTICE 1]
The "/home/hdp2/workspace/HBaseGet/HbaseGet_v3.jar" is the jar file location at local side
[NOTICE 2]
If you skip this step, Eclipse may throw the following error:
[Error]
... Class org.apache.... Map not found ...
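For orientation, here is a minimal sketch of where that line sits in a driver class; the HBaseGetDriver name and the surrounding job setup are illustrative assumptions, not the post's actual code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class HBaseGetDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the job at the locally exported jar (step 4) so its classes
        // are shipped to the remote cluster without a manual copy.
        conf.set("mapred.jar", "/home/hdp2/workspace/HBaseGet/HbaseGet_v3.jar");
        Job job = Job.getInstance(conf, "hbase-get");
        // ... set mapper, reducer, input and output as usual, then:
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}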
2. Set the YARN master location
Add the following to yarn-site.xml:
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
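The same three addresses can also be set on the client-side Configuration in code. A minimal sketch, equivalent in effect to the XML above (here "master" must resolve to the Hadoop master host, e.g. via /etc/hosts):

import org.apache.hadoop.conf.Configuration;

public class YarnClientConf {
    public static Configuration create() {
        Configuration conf = new Configuration();
        // Client-side equivalents of the yarn-site.xml entries above.
        conf.set("yarn.resourcemanager.address", "master:8032");
        conf.set("yarn.resourcemanager.scheduler.address", "master:8030");
        conf.set("yarn.resourcemanager.resource-tracker.address", "master:8031");
        return conf;
    }
}

Either way works; the XML route keeps the cluster addresses out of your source code.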
3. Set Configuration for Client Side
Select "Run Configuration" > "Classpath" > "Advanced" > "Add External Folder" > Select the Hadoop and HBase conf folder(ex. $HADOOP_HOME/etc/hadoop and $HBASE_HOME/conf)
[IMPORTANT!]
The settings in both the Hadoop and HBase conf folders MUST be consistent with the configuration on the Hadoop and HBase masters!!
# The simplest method is to copy the conf folders from the master to the local machine
[Notice]
Eclipse runs locally and does not have the correct server configuration by default,
so if you skip this step the following errors occur:
[Error 1]
Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
[Error 2]
"eclipse" ... Retrying connect to server: 0.0.0.0/0.0.0.0:8032 ....
4. Export Jar File
"File" > "Export..." > "Java" > "JAR file" > select resource to export > "Finish"
5. Run Application
"Run Configuration" > Set "Arguments" > "Run"
6. Staging problem (multiple users in Hadoop)
If you run as a Hadoop client user rather than the master's user (e.g. client username "hdp2", Hadoop username "hduser"), an error like the following may occur when the app executes:
2012-10-09 10:06:31,233 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=READ_EXECUTE, inode="system":mapred:supergroup:rwx------
[NEED TO DO]
http://amalgjose.wordpress.com/2013/02/09/setting-up-multiple-users-in-hadoop-clusters/
1. Create a new user (hdp2) on the Hadoop master server
$sudo adduser hdp2
2. Create a home directory for the client user on HDFS
$hdfs dfs -mkdir /user/hdp2
3. Log in to the Hadoop master server with the new username (hdp2) and execute any example jar (such as WordCount)
This creates the staging folder for the new user under /tmp/ in HDFS
(Not sure why this is necessary......)
4. Change ownership and permissions
$hdfs dfs -chown -R hdp2:hadoop /tmp/hadoop-yarn/staging/hdp2/
$hdfs dfs -chmod 777 /tmp/
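A different workaround, sketched here with the same usernames, is to run the submission as the cluster account via Hadoop's UserGroupInformation so that HDFS sees a permitted user; this is a swapped-in technique, not what the steps above do:

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.UserGroupInformation;

public class SubmitAsHduser {
    public static void main(String[] args) throws Exception {
        // Act as "hduser" (the cluster-side account) so staging dirs in HDFS
        // are created and read under a user with the right permissions.
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("hduser");
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                Configuration conf = new Configuration();
                Job job = Job.getInstance(conf, "example");
                // ... configure and submit the job here ...
                return null;
            }
        });
    }
}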
Tuesday, September 30, 2014
Hadoop-2.4.1 Example(WordCount) on Eclipse
[Software]
Hadoop2.4.1
Eclipse IDE for Java Developers Luna Release (4.4.0)
1. Open a Map/Reduce Project
2. Add lib jar:
- Right click the project > Build Path > Configure Build Path > Java Build Path > Libraries
> Add External JARs (include the jars in the following directories):
- share/hadoop/common
- share/hadoop/common/lib
- share/hadoop/mapreduce
- share/hadoop/mapreduce/lib
- share/hadoop/yarn
- share/hadoop/yarn/lib
--------additional-----------
- HDFS lib
- HBase lib
3. On this project, add new:
- Mapper: Mp.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Mp extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        // Emit (word, 1) for every token on the input line.
        String line = ivalue.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
- Reducer: Rd.java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Rd extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text _key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this word.
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(_key, new IntWritable(sum));
    }
}
- MapReduce Driver: WC.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WC {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "wordcount");
        //Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WC.class);
        // TODO: specify a mapper
        job.setMapperClass(Mp.class);
        // TODO: specify a reducer
        job.setReducerClass(Rd.class);
        // TODO: specify output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // TODO: specify input and output DIRECTORIES (not files)
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        if (!job.waitForCompletion(true))
            return;
    }
}
4. Create jar file
- File > Export > JAR file
- Select the resources and the JAR file location
5. Run application
- Select "Run Configurations" > check "Java Application", "Name", "Project", "Main class"
- Enter "Arguments" > add "file1 Output" to Program arguments
[Note] Because main() does not specify the input and output, they must be given to the app here.
This is equivalent to running $ hadoop jar project.jar file1 Output1 in a terminal.
If no full paths are given, the default input and output locations are on the local machine under $ECLIPSE_WORKSPACE/PROJECT_FOLDER.
- Click "Run"
[Question] How do we run the application on the existing Hadoop cluster instead of locally?
===> As of 2014/09/18: exporting the jar file and running it on the master works.
[Solution] See "Eclipse Submit Job from Hadoop Client to remote Hadoop Master" above.