Run Test Case on Spark

Shared by LinuxBoy on 2019-03-27 05:03:27


A friend asked me today how to unit test Spark, so here is how to do it with sbt.

Spark's test cases can be run with sbt's test command:

1. Run all test cases

sbt/sbt test

2. Run a single test case

sbt/sbt "test-only *DriverSuite*"

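Here is an example.

The test case lives at $SPARK_HOME/core/src/test/scala/org/apache/spark/DriverSuite.scala

FunSuite is a test suite class from ScalaTest, and the test class has to extend it. DriverSuite is mainly a regression test: it checks whether the driver exits normally once a Spark program has finished.

Note: I am only borrowing DriverSuite to simulate a passing run and a failing run. What I do with it has nothing to do with the real purpose of DriverSuite; it is purely for demonstration. :)

Below is the version that runs and exits normally: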

package org.apache.spark

import java.io.File

import org.apache.log4j.Logger
import org.apache.log4j.Level

import org.scalatest.FunSuite
import org.scalatest.concurrent.Timeouts
import org.scalatest.prop.TableDrivenPropertyChecks._
import org.scalatest.time.SpanSugar._

import org.apache.spark.util.Utils

import scala.language.postfixOps

class DriverSuite extends FunSuite with Timeouts {

  test("driver should exit after finishing") {
    val sparkHome = sys.env.get("SPARK_HOME").orElse(sys.props.get("spark.home")).get
    // Regression test for SPARK-530: "Spark driver process doesn't exit after finishing"
    val masters = Table(("master"), ("local"), ("local-cluster[2,1,512]"))
    forAll(masters) { (master: String) =>
      failAfter(60 seconds) {
        Utils.executeAndGetOutput(
          Seq("./bin/spark-class", "org.apache.spark.DriverWithoutCleanup", master),
          new File(sparkHome),
          Map("SPARK_TESTING" -> "1", "SPARK_HOME" -> sparkHome))
      }
    }
  }
}

/**
 * Program that creates a Spark driver but doesn't call SparkContext.stop() or
 * Sys.exit() after finishing.
 */
object DriverWithoutCleanup {
  def main(args: Array[String]) {
    Logger.getRootLogger().setLevel(Level.WARN)
    val sc = new SparkContext(args(0), "DriverWithoutCleanup")
    sc.parallelize(1 to 100, 4).count()
  }
}
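
The suite above wraps the child process in failAfter(60 seconds), so a driver that never exits shows up as a failed test instead of a hung build. To get a feel for that idea without ScalaTest, here is a rough stand-in that uses only the Scala standard library. TimeoutSketch and finishesWithin are hypothetical names of mine, and Await.result is a substitute technique: it does not reproduce how failAfter works internally, and unlike a real test timeout it does not try to interrupt the body.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TimeoutSketch {
  // Hypothetical helper (not part of Spark or ScalaTest):
  // returns true if `body` completes within `limit`, false if the wait times out.
  def finishesWithin(limit: FiniteDuration)(body: => Unit): Boolean =
    try {
      // Run the body on another thread and wait for it at most `limit`.
      Await.result(Future(body), limit)
      true
    } catch {
      case _: TimeoutException => false
    }

  def main(args: Array[String]): Unit = {
    println(finishesWithin(2.seconds) { Thread.sleep(100) })    // true
    println(finishesWithin(200.millis) { Thread.sleep(2000) })  // false
  }
}
```

In DriverSuite the body is the call that launches the driver process, so a timeout there means the driver did not exit in time.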

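The executeAndGetOutput method takes a command (plus an optional working directory and extra environment variables). Here the command invokes spark-class to run the DriverWithoutCleanup class.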

 /**
   * Execute a command and get its output, throwing an exception if it yields a code other than 0.
   */
  def executeAndGetOutput(command: Seq[String], workingDir: File = new File("."),
                          extraEnvironment: Map[String, String] = Map.empty): String = {
    val builder = new ProcessBuilder(command: _*) 
        .directory(workingDir)
    val environment = builder.environment()
    for ((key, value) <- extraEnvironment) {
      environment.put(key, value)
    }
    val process = builder.start()  // start a child process to run the Spark job
    new Thread("read stderr for " + command(0)) {
      override def run() {
        for (line <- Source.fromInputStream(process.getErrorStream).getLines) {
          System.err.println(line)
        }
      }
    }.start()
    val output = new StringBuffer
    val stdoutThread = new Thread("read stdout for " + command(0)) { // read the Spark job's stdout
      override def run() {
        for (line <- Source.fromInputStream(process.getInputStream).getLines) {
          output.append(line)
        }
      }
    }
    stdoutThread.start()
    val exitCode = process.waitFor()
    stdoutThread.join()   // Wait for it to finish reading output
    if (exitCode != 0) {
      throw new SparkException("Process " + command + " exited with code " + exitCode)
    }
    output.toString // return the Spark job's output
  }
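
The key point in the method above is that a non-zero exit code is turned into an exception, and that exception is what ends up failing the test. Here is a minimal sketch of the same pattern using only the Scala standard library. ExitCodeSketch and runOrThrow are hypothetical names of mine, it uses scala.sys.process instead of the java.lang.ProcessBuilder that Spark uses, it throws a plain RuntimeException instead of SparkException, and it assumes a Unix-like system where sh is available.

```scala
import scala.sys.process._

object ExitCodeSketch {
  // Hypothetical helper (not Spark's API): run a command, collect its stdout,
  // and throw if the exit code is anything other than 0.
  def runOrThrow(command: Seq[String]): String = {
    val output = new StringBuilder
    val logger = ProcessLogger(
      line => output.append(line).append('\n'), // collect stdout
      line => System.err.println(line))         // forward stderr
    // `!` blocks until the process has exited and returns its exit code.
    val exitCode = Process(command) ! logger
    if (exitCode != 0) {
      throw new RuntimeException("Process " + command + " exited with code " + exitCode)
    }
    output.toString
  }

  def main(args: Array[String]): Unit = {
    // A command that succeeds: its output is returned.
    println(runOrThrow(Seq("sh", "-c", "echo ok")))
    // A command that fails: the non-zero exit code becomes an exception.
    try runOrThrow(Seq("sh", "-c", "exit 1"))
    catch { case e: RuntimeException => println(e.getMessage) }
  }
}
```

Inside a test, letting that exception propagate is enough to mark the test as failed.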

Run the second command to see the result:

sbt/sbt "test-only *DriverSuite*"

Output:

[info] Compiling 1 Scala source to /app/hadoop/spark-1.0.1/core/target/scala-2.10/test-classes...
[info] DriverSuite: // runs the DriverSuite test suite
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/lib_managed/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop0.20.2-cdh3u5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/08/14 18:20:15 WARN spark.SparkConf: 
SPARK_CLASSPATH was detected (set to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
        
14/08/14 18:20:15 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
14/08/14 18:20:15 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/lib_managed/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop0.20.2-cdh3u5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/08/14 18:20:19 WARN spark.SparkConf: 
SPARK_CLASSPATH was detected (set to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
        
14/08/14 18:20:19 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
14/08/14 18:20:19 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Spark assembly has been built with Hive, including Datanucleus jars on classpath
[info] - driver should exit after finishing
[info] ScalaTest
[info] Run completed in 12 seconds, 586 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 76 s, completed Aug 14, 2014 6:20:26 PM

The test passes: Total 1, Failed 0, Errors 0, Passed 1.

Now let's tweak the test case slightly so that the Spark job throws an exception. The test case will then fail:

object DriverWithoutCleanup {
  def main(args: Array[String]) {
    Logger.getRootLogger().setLevel(Level.WARN)
    val sc = new SparkContext(args(0), "DriverWithoutCleanup")
    sc.parallelize(1 to 100, 4).count()
    throw new RuntimeException("OopsOutOfMemory, haha, not real OOM, don't worry!") // line added
  }
}

Run the test again and the failure shows up:

 [info] DriverSuite:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/lib_managed/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/hadoop/spark-1.0.1/assembly/target/scala-2.10/spark-assembly-1.0.1-hadoop0.20.2-cdh3u5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/08/14 18:40:07 WARN spark.SparkConf: 
SPARK_CLASSPATH was detected (set to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath
        
14/08/14 18:40:07 WARN spark.SparkConf: Setting 'spark.executor.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
14/08/14 18:40:07 WARN spark.SparkConf: Setting 'spark.driver.extraClassPath' to '/home/hadoop/src/hadoop/lib/:/app/hadoop/sparklib/*:/app/hadoop/spark-1.0.1/lib_managed/jars/*' as a work-around.
Exception in thread "main" java.lang.RuntimeException: OopsOutOfMemory, haha, not real OOM, don't worry! // the deliberately thrown exception makes the Spark job fail; the stack trace is printed and the test case fails
        at org.apache.spark.DriverWithoutCleanup$.main(DriverSuite.scala:60)
        at org.apache.spark.DriverWithoutCleanup.main(DriverSuite.scala)
[info] - driver should exit after finishing *** FAILED ***
[info]   SparkException was thrown during property evaluation. (DriverSuite.scala:40)
[info]     Message: Process List(./bin/spark-class, org.apache.spark.DriverWithoutCleanup, local) exited with code 1
[info]     Occurred at table row 0 (zero based, not counting headings), which had values (
[info]       master = local
[info]     )
[info] ScalaTest
[info] Run completed in 4 seconds, 765 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
[info] *** 1 TEST FAILED ***
[error] Failed: Total 1, Failed 1, Errors 0, Passed 0
[error] Failed tests:
[error]         org.apache.spark.DriverSuite
[error] (core/test:testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 14 s, completed Aug 14, 2014 6:40:10 PM

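As you can see, the result is TEST FAILED. The uncaught exception makes the driver process exit with code 1, so executeAndGetOutput throws a SparkException, and that is what fails the test.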

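3. Summary

This post covered how to run Spark's test cases: the command that runs all of them and the command that runs a single one, with an example showing what a passing run and a failing run look like. The finer details still need exploring. If you want to become a contributor, this is a hurdle you have to clear.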

——EOF——

Original article. If you repost it, please credit the source: http://blog.csdn.net/oopsoom/article/details/38555173


