This article walks through setting up a Spark development environment in Eclipse and covers the errors you are most likely to hit along the way.
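First, download the prebuilt Spark release that matches your cluster's Hadoop version and extract it to a location of your choice. Make sure the ownership and permissions suit the user that will run Spark.
Change into the extracted directory, which will serve as SPARK_HOME.
Set SPARK_HOME in /etc/profile or ~/.bashrc.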
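As a minimal sketch, the lines added to ~/.bashrc might look like the following. The install path is an assumption, taken from the spark-1.4.0-bin-hadoop2.6 directory that appears in the configuration in this article. Adding bin and sbin to PATH is an optional convenience and not part of the original steps.

```shell
# Assumed install location; adjust to wherever you extracted Spark.
export SPARK_HOME=/home/hadoop/cluster/spark-1.4.0-bin-hadoop2.6

# Optional: make the Spark launcher and cluster scripts available on PATH.
export PATH="$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin"

echo "SPARK_HOME=$SPARK_HOME"
```

Run `source ~/.bashrc` (or open a new shell) for the change to take effect.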
Create spark-env.sh from the template and open it for editing:

    cd $SPARK_HOME/conf
    cp spark-env.sh.template spark-env.sh
    vim spark-env.sh

Add the following settings:

    export SCALA_HOME=/home/hadoop/cluster/scala-2.10.5
    export JAVA_HOME=/home/hadoop/cluster/jdk1.7.0_79
    export HADOOP_HOME=/home/hadoop/cluster/hadoop-2.6.0
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    # Note: this must be set to an IP address. Otherwise Eclipse will fail to
    # connect later with the error: All masters are unresponsive! Giving up.
    SPARK_MASTER_IP=10.16.112.121
    SPARK_LOCAL_DIRS=/home/hadoop/cluster/spark-1.4.0-bin-hadoop2.6
    SPARK_DRIVER_MEMORY=1G
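Start the master and a worker from SPARK_HOME:

    cd $SPARK_HOME
    sbin/start-master.sh
    # Pass the master URL so the worker knows where to register.
    sbin/start-slave.sh spark://10.16.112.121:7077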
You can now check the state of the Spark cluster in a browser at http://yourip:8080
The default Spark master URL at this point is: spark://10.16.112.121:7077
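Before moving on to the IDE, it can help to confirm from the command line that the master is reachable. The following is only a sketch: the variable names MASTER_HOST and MASTER_URL are illustrative, the IP is the one used in this article, and the probe merely checks that the web UI on port 8080 answers within five seconds.

```shell
# Master address from spark-env.sh; replace with your own IP.
MASTER_HOST=10.16.112.121
MASTER_URL="spark://${MASTER_HOST}:7077"
echo "Master URL: ${MASTER_URL}"

# -s silences progress output, -f makes HTTP errors return non-zero,
# --max-time bounds the wait when the host is unreachable.
if curl -sf --max-time 5 "http://${MASTER_HOST}:8080" >/dev/null 2>&1; then
  echo "web UI reachable"
else
  echo "web UI not reachable"
fi
```

If the web UI is not reachable, check that start-master.sh ran without errors and that no firewall blocks ports 8080 and 7077.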
Next, download the Scala IDE for Eclipse from the Scala website.
Open the IDE, create a new Maven project, and fill in pom.xml as follows:
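    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>spark.test</groupId>
      <artifactId>FirstTrySpark</artifactId>
      <version>0.0.1-SNAPSHOT</version>

      <properties>
        <hadoop.version>2.6.0</hadoop.version>
        <spark.version>1.4.0</spark.version>
      </properties>

      <dependencies>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>${hadoop.version}</version>
          <scope>provided</scope>
          <exclusions>
            <exclusion>
              <groupId>javax.servlet</groupId>
              <artifactId>*</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
          <version>2.6.0</version>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
          <version>2.6.0</version>
        </dependency>
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.10</artifactId>
          <version>${spark.version}</version>
        </dependency>
      </dependencies>

      <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <plugins>
          <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.0</version>
            <executions>
              <execution>
                <goals>
                  <goal>compile</goal>
                  <goal>testCompile</goal>
                </goals>
              </execution>
            </executions>
            <configuration>
              <scalaCompatVersion>2.10</scalaCompatVersion>
            </configuration>
          </plugin>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.5.5</version>
            <configuration>
              <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
              </descriptorRefs>
            </configuration>
            <executions>
              <execution>
                <phase>package</phase>
                <goals>
                  <goal>single</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
              <source>1.7</source>
              <target>1.7</target>
            </configuration>
          </plugin>
        </plugins>
        <resources>
          <resource>
            <directory>src/main/resources</directory>
          </resource>
        </resources>
      </build>
    </project>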
Create the following source folders:

    src/main/java        # Java code
    src/main/scala       # Scala code
    src/main/resources   # resource files
    src/test/java        # Java test code
    src/test/scala       # Scala test code
    src/test/resources   # test resource files
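If you prefer the command line to the Eclipse "New Source Folder" dialog, the same layout can be created from the project root with a short loop. This is only a sketch of one way to do it; the folders may still need to be refreshed in Eclipse and added to the build path as source folders.

```shell
# Create the main and test trees for Java, Scala, and resources.
for scope in main test; do
  for kind in java scala resources; do
    mkdir -p "src/$scope/$kind"
  done
done

# List the resulting directories.
find src -type d | sort
```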
The environment is now fully set up.
Test code:
    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    /**
     * @author clebeg
     */
    object FirstTry {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf
        // Point the driver at the standalone master started earlier.
        conf.setMaster("spark://yourip:7077")
        conf.set("spark.app.name", "first-tryspark")
        val sc = new SparkContext(conf)
        // Read the linkage data set from HDFS and print its first line.
        val rawblocks = sc.textFile("hdfs://yourip:9000/user/hadoop/linkage")
        println(rawblocks.first)
      }
    }
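The first error you may run into when running the test code:

    Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Analysis: opening the run log for the corresponding application ID reveals the following error: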
    15/10/10 08:49:01 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
    15/10/10 08:49:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/10/10 08:49:02 INFO spark.SecurityManager: Changing view acls to: hadoop,Administrator
    15/10/10 08:49:02 INFO spark.SecurityManager: Changing modify acls to: hadoop,Administrator
    15/10/10 08:49:02 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop, Administrator); users with modify permissions: Set(hadoop, Administrator)
    15/10/10 08:49:02 INFO slf4j.Slf4jLogger: Slf4jLogger started
    15/10/10 08:49:02 INFO Remoting: Starting remoting
    15/10/10 08:49:02 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@10.16.112.121:58708]
    15/10/10 08:49:02 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 58708.
    Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
    Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:97)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:159)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        ... 4 more
    15/10/10 08:51:02 INFO util.Utils: Shutdown hook called
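On closer inspection this turned out to be a permissions problem. Shut down Hadoop and add the following to etc/hadoop/core-site.xml:

    <property>
      <name>hadoop.security.authorization</name>
      <value>false</value>
    </property>

This disables authorization checks so that any user can read the data. After Hadoop was restarted, the problem went away.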
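The second error, seen when running from Eclipse on Windows:

    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

Download a rebuilt Hadoop 2.6 distribution that includes winutils.exe from http://www.barik.net/archive/2015/01/19/172716/ and make sure the build you download matches your own Hadoop version.
Extract it to a location of your choice and set the HADOOP_HOME environment variable to that directory. Be sure to restart Eclipse afterwards so that it picks up the new variable. Done!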
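The "null" in the error path suggests that HADOOP_HOME was not set when the program started. As a quick sanity check before restarting Eclipse, a small helper can confirm that winutils.exe sits under the bin directory of HADOOP_HOME. This sketch assumes a Unix-like shell on Windows such as Git Bash or Cygwin, and check_winutils is a hypothetical helper name, not part of Hadoop.

```shell
# check_winutils DIR: succeed only if DIR/bin/winutils.exe exists.
check_winutils() {
  hadoop_home="$1"
  if [ -z "$hadoop_home" ]; then
    echo "HADOOP_HOME is not set"
    return 1
  fi
  if [ -f "$hadoop_home/bin/winutils.exe" ]; then
    echo "found $hadoop_home/bin/winutils.exe"
  else
    echo "missing $hadoop_home/bin/winutils.exe"
    return 1
  fi
}

# Check the current environment; "|| true" keeps the script going on failure.
check_winutils "${HADOOP_HOME:-}" || true
```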
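Where does the data used in this article come from? It can be downloaded from http://bit.ly/1Aoywaq and loaded into HDFS with the following commands:

    mkdir linkage
    cd linkage/
    # -L follows the redirect issued by the short URL.
    curl -L -o donation.zip http://bit.ly/1Aoywaq
    unzip donation.zip
    unzip "block_*.zip"
    # -p also creates missing parent directories.
    hdfs dfs -mkdir -p /user/hadoop/linkage
    hdfs dfs -put block_*.csv /user/hadoop/linkage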