Spark cluster run error

Source: 2-13 Developing the First Spark Application with IDEA and Maven

慕容3565349

2022-07-03

package com.wsx.spark

import org.apache.spark.{SparkConf, SparkContext}

object SparkTest {

  val SPARK_URL = "spark://192.168.72.132:7077"
  val HADOOP_URI = "hdfs://192.168.72.132:9000"
  val FILE_PATH = "/input/wc.input"
  val OUTPUT = "output"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkHomework").setMaster(SPARK_URL)
    // Ship the application jar so the executors can load this class
    conf.setJars(Seq("C:\\Users\\wusx\\Desktop\\wsx\\Code\\review\\hadoop-train\\target\\hadoop-train-1.0.jar"))

    val sc = new SparkContext(conf)
    println(sc)
    println(HADOOP_URI + FILE_PATH)

    // Word count: split each line on commas, pair each word with 1, sum per word.
    // foreach returns Unit, so the RDD has to be built before it is printed.
    val counts = sc.textFile(HADOOP_URI + FILE_PATH)
      .flatMap(line => line.split(","))
      .map(x => (x, 1))
      .reduceByKey((x, y) => x + y)

    // Check whether the output file already exists (needs import java.io.File if enabled)
    //    val file = new File(OUTPUT)
    //    if (file.exists) {
    //      file.delete()
    //    }

    counts.foreach(println)

    sc.stop()
  }

}

Error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (192.168.72.134 executor 2): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD

Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD

Teacher, how do I fix this problem?


1 Answer

Michael_PK

2022-07-04

Strong recommendation: while developing, don't connect to a cluster (standalone, YARN, etc.); use local mode instead.

Once the code passes in local mode, package it and then submit it to the different run modes.

First get the code to pass the way it was taught in class, and post whatever exception comes up.
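
For example, a minimal local-mode version for development might look like the sketch below (SparkTestLocal and local[2] are illustrative choices; the input path is the one from the question):

import org.apache.spark.{SparkConf, SparkContext}

object SparkTestLocal {
  def main(args: Array[String]): Unit = {
    // local[2] runs driver and executors in one JVM with 2 threads:
    // no jar has to be shipped, so no class-loading mismatch can occur
    val conf = new SparkConf().setAppName("SparkHomework").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val counts = sc.textFile("hdfs://192.168.72.132:9000/input/wc.input")
      .flatMap(line => line.split(","))
      .map(x => (x, 1))
      .reduceByKey((x, y) => x + y)

    counts.foreach(println)
    sc.stop()
  }
}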

Michael_PK replied to 慕容3565349:

Don't hard-code anything in the code, such as the master; specify all of it when submitting with spark-submit.

2022-07-07
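As a sketch of what that submission looks like (jar path and class name taken from the question; adjust to your build):

spark-submit \
  --master spark://192.168.72.132:7077 \
  --name SparkHomework \
  --class com.wsx.spark.SparkTest \
  target/hadoop-train-1.0.jar

With the master supplied on the command line, the code itself only needs new SparkConf().setAppName("SparkHomework"), so the same jar runs unchanged in local, standalone, or YARN mode.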
