
How Do You Run and Debug a Hadoop Java JAR?

The hadoop jar command is the fundamental way to execute a Java JAR file on a Hadoop cluster. It is the bridge between your compiled Java code, the Hadoop Distributed File System (HDFS), and the MapReduce/YARN execution framework.


Let's break it down from the basics to advanced usage.


The Basic Command Structure

The core command is straightforward:

hadoop jar <your-jar-file.jar> <main-class> [main-args]
  • hadoop: The Hadoop command-line interface script.
  • jar: The subcommand telling Hadoop to execute a Java JAR file.
  • <your-jar-file.jar>: The path to your compiled Java archive (JAR file) containing your MapReduce (or other Hadoop) job.
  • <main-class>: The fully qualified name of the main class inside your JAR file that contains the public static void main(String[] args) method. This is the entry point for your application. (If the JAR's manifest already specifies a Main-Class, this argument can be omitted.)
  • [main-args]: Any arguments you want to pass to your main method.
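For example, a run might look like this (the JAR name, class, and paths here are purely illustrative):

hadoop jar analytics.jar com.example.LogCount /data/in /data/out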

A Complete Step-by-Step Example

Let's create a simple "Word Count" application and run it.

Step 1: Write the Java Code (WordCount.java)

This is the classic example. It reads text files, splits them into words, counts the occurrences of each word, and writes the output.

// WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
  // Mapper Class
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
  // Reducer Class
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
  // Main Method
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
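
A detail worth noting: the driver registers IntSumReducer as the combiner as well (job.setCombinerClass). That is safe here because summing counts is associative and commutative, so pre-aggregating on the map side cannot change the final result, but it does cut the volume of data shuffled to the reducers.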

Step 2: Compile and Package the Code

You need Hadoop's libraries on your classpath to compile. The easiest way to get them is the hadoop classpath command, which prints the full Hadoop classpath.

# Create a directory for your project
mkdir wordcount
cd wordcount
# Save the code above as WordCount.java
# Compile the Java file
# The -cp flag adds all Hadoop libraries to the classpath
javac -cp $(hadoop classpath) WordCount.java
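# This produces WordCount.class plus the nested classes
# WordCount$TokenizerMapper.class and WordCount$IntSumReducer.class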
# Package the compiled .class files into a JAR file
# The 'e' flag specifies the entry point (main class)
jar -cvfe WordCount.jar WordCount *.class
# You should now have a WordCount.jar file
ls -l

Step 3: Prepare Input Data on HDFS

Let's put some sample text files into HDFS.

# Start Hadoop (if not already running)
start-dfs.sh
start-yarn.sh
# Create an input directory in HDFS
hdfs dfs -mkdir -p /user/hadoop/input
# Put some local text files into HDFS
# For example, if you have files file1.txt, file2.txt in your local directory
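# If you don't have sample files handy, create a couple first (contents
# chosen to match the example output shown in Step 5):
echo "hello hadoop world" > file1.txt
echo "hadoop world" > file2.txt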
hdfs dfs -put file1.txt /user/hadoop/input/
hdfs dfs -put file2.txt /user/hadoop/input/
# Verify the files are in HDFS
hdfs dfs -ls /user/hadoop/input

Step 4: Run the Job with hadoop jar

Now, execute the JAR file. We need to provide the input and output paths as arguments to our main method.

# Run the WordCount job
# The output directory must NOT exist before running the job.
hadoop jar WordCount.jar WordCount /user/hadoop/input /user/hadoop/output
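# (To re-run the job later, remove the old output first:
#  hdfs dfs -rm -r /user/hadoop/output)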

Explanation of the command:

  • hadoop jar WordCount.jar: Execute the WordCount.jar file.
  • WordCount: The name of the main class to run.
  • /user/hadoop/input: The first argument passed to main, which our code uses as the input path.
  • /user/hadoop/output: The second argument passed to main, which our code uses as the output path.

Step 5: Check the Results

After the job completes, you can see the results in the output directory on HDFS.

# Check the job's output directory
hdfs dfs -cat /user/hadoop/output/part-r-00000
# Example output:
# hadoop    2
# hello     1
# world     2
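# List the output directory; an empty _SUCCESS marker file means the job
# completed without errors
hdfs dfs -ls /user/hadoop/output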

You can also open the YARN ResourceManager web UI (usually at http://<resourcemanager-host>:8088) to see job status, logs, and counters.


Advanced Options and Common Flags

Strictly speaking, hadoop jar itself has no flags of its own: everything after the main-class argument is passed straight to your main method. What look like flags below are Hadoop's generic options (-D, -conf, -fs, -files, -libjars, -archives). They are recognized only if your driver parses them, typically by implementing the Tool interface and launching through ToolRunner, and they must appear before your application's own arguments.
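The WordCount driver above calls Job directly from main, so generic options would reach it unparsed. Here is a minimal ToolRunner-based driver sketch; the class name WordCountDriver is illustrative, and it reuses the mapper and reducer from the WordCount class above.

// WordCountDriver.java -- a sketch of a driver that accepts generic options
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // getConf() already reflects any -D overrides from the command line
    Job job = Job.getInstance(getConf(), "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCount.TokenizerMapper.class);
    job.setCombinerClass(WordCount.IntSumReducer.class);
    job.setReducerClass(WordCount.IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner strips the generic options and passes the remainder to run()
    System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
  }
}

With a driver like this, the generic options in the subsections below work as shown. Here are the most important ones: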

Specifying a Different Main Class

If your JAR contains several runnable classes, the class to run is simply the second argument to hadoop jar, so you can switch entry points without re-packaging:

hadoop jar MyBigApp.jar com.company.alternative.MainClass /input /output

(If the JAR's manifest specifies a Main-Class, you can omit the class-name argument entirely and hadoop jar will use the manifest entry.)

Managing Resources

You can control how much memory and CPU your job uses.

  • Memory for Map and Reduce Tasks:

    # Set map task memory to 2 GB and reduce task memory to 4 GB
    # (generic options go before the application arguments)
    hadoop jar MyJob.jar MyJob \
      -D mapreduce.map.memory.mb=2048 \
      -D mapreduce.reduce.memory.mb=4096 \
      /input /output
  • Number of Map and Reduce Tasks:

    # Request 50 map tasks and 10 reduce tasks; note that mapreduce.job.maps
    # is only a hint, since the real map count follows the input splits
    hadoop jar MyJob.jar MyJob \
      -D mapreduce.job.maps=50 \
      -D mapreduce.job.reduces=10 \
      /input /output
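
The same settings can also be applied programmatically in the driver instead of on the command line; a short sketch (the property names are standard, the job name is illustrative):

// Inside the driver (e.g. in run() or main()): programmatic equivalents
// of the -D overrides above
Configuration conf = new Configuration();
conf.setInt("mapreduce.map.memory.mb", 2048);
conf.setInt("mapreduce.reduce.memory.mb", 4096);
Job job = Job.getInstance(conf, "word count");
job.setNumReduceTasks(10); // authoritative for the reduce count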

Debugging and Logging

  • Client-Side Progress: The boolean passed to job.waitForCompletion(true) (already set in the example code) makes the client print progress, status, and counters while the job runs.
  • Task Log Levels: You can raise the log level of map and reduce tasks with generic options:

    hadoop jar MyJob.jar MyJob \
      -D mapreduce.map.log.level=DEBUG \
      -D mapreduce.reduce.log.level=DEBUG \
      /input /output
  • Viewing Job Logs: The standard output and error streams of your mappers and reducers are captured per container by YARN. With log aggregation enabled, you can fetch them after the job finishes:

    # Find the application ID, then dump all container logs
    yarn application -list -appStates FINISHED
    yarn logs -applicationId <application-id>

    (The details depend on your Hadoop version and on whether yarn.log-aggregation-enable is set; without aggregation, logs remain on the individual NodeManager hosts and are easiest to reach through the ResourceManager web UI.)

Managing JAR Dependencies

If your job depends on other libraries, you have two main options:

  1. Package Everything in One Fat JAR (Recommended): Use a build tool like Maven or Gradle with a "shade" or "assembly" plugin. This packages all dependencies into a single, executable JAR.

    # Example with Maven (after running 'mvn package')
    hadoop jar target/my-fat-jar-1.0.jar com.company.MyDriver /input /output
  2. Use the -libjars Option: Pass additional JAR files as a comma-separated list; Hadoop ships them to the cluster nodes for your job. Like all generic options, -libjars needs a ToolRunner-based driver and must come before your own arguments:

    hadoop jar MyJob.jar MyJob -libjars /path/to/lib1.jar,/path/to/lib2.jar /input /output

Modern Alternative: yarn jar

On Hadoop 2.x and later, where YARN (Yet Another Resource Negotiator) is the resource-management layer, the yarn jar command is functionally identical to hadoop jar and states the intent more precisely.

# These two commands do the same thing
hadoop jar WordCount.jar WordCount /input /output
yarn jar WordCount.jar WordCount /input /output

In practice, hadoop jar is still widely used and understood, but you might see yarn jar in more modern documentation or scripts.
