Chapter 2: Spark Cluster Setup  Ver1.4-20230515

Building a fully distributed Spark environment

Learning videos
  - Starting and checking the Hadoop cluster: starting the VMs and the hadoop cluster
  - Software installation: installing spark & scala
  - Spark configuration: python3 & Spark configuration
  - Starting and testing Spark: startup & the Pi example
  - Other reference videos: video 1

Software preparation
  spark
    Official downloads:
      http://spark.apache.org
      https://spark.apache.org/downloads.html
      https://mirrors.tuna.tsinghua.edu.cn/apa ...
    Domestic mirrors:
      https://mirrors.tuna.tsinghua.edu.cn/apa ...
      https://mirrors.tuna.tsinghua.edu.cn/apa ...
      wget -c https://dlcdn.apache.org/spark/s ...
    HDD download:
      http://home.hddly.cn:90/soft/hadoop/spar ...
      wget -c http://home.hddly.cn:90/soft/had ...
    Unpack: tar zxvf ./spark-3.2.4-bin-hadoop3.2.tgz
  scala
    Per the compatibility matrix, Spark 3.1.2 -> Scala 2.12
    Official download: https://www.scala-lang.org/download ...
      wget https://downloads.lightbend.com/sca ...
  python3
    Official download: wget https://www.python.org/ftp/pyt ...
    Mirror download: wget https://cdn.npmmirror.com/bina ...
  pyspark
    pip install pyspark -i http://mirrors.al ...
    Official download: https://pypi.org/project/pyspark/3. ...
    Mirror download: https://mirrors.tuna.tsinghua.edu.c ...
  mongodb
    Official docs: https://docs.mongoing.com/mongodb-spark
    Official docs: https://www.mongodb.com/docs/spark-conne ...
    Version pairing:
      MongoDB Connector for Spark : 10.0.3
      Spa ...
  mysql
    Prepare the mysql-connector:
      mysql-connector-java-5.1.45-bin.jar
        wget -c http://bigdata.hddly.cn/b46488/ ...
      mysql-connector-java-5.1.48-bin.jar
        wget -c https://mirrors.tuna.tsinghua.ed ...
  Software version compatibility
    Reference: https://spark.apache.org/docs/3.1.2/
    "Spark runs on Java 8/11, Scala 2.12.x, ..."

Software installation
  1. Log in to the Hadoop cluster via SecureCRT.
  2. Run the batch commands to install Spark:
       wget https://mirrors.tuna.tsinghua.edu.c ...
  3. Run the batch commands to install Scala:
       cd /root/hadoop
       wget https://downloads.l ...
  4. Install pyspark.
     Method 1:
       yum install -y python3
       pip3 install pysp ...
     Method 2:
       wget https://mirrors.tuna.tsinghua.edu.c ...
  5. Configure the Scala & Spark environment variables:
     Run vi /etc/profile, press i to enter edit mode, then paste the following lines at the end of the file:
       export SCALA_HOME=/usr/local/scala
       expor ...
     Apply the configuration: source /etc/profile
  6. Verify the scala version:
       [hadoop@master ~]$ java -version
       openjdk ...

Spark configuration
  1. Edit the configuration files under conf:
       cd /usr/local/spark/conf
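Before editing the conf files, note that the /etc/profile additions from step 5 are truncated above. A minimal sketch of what they typically contain is shown below; only the SCALA_HOME line appears verbatim in the source, so treat the SPARK_HOME and PATH lines as assumptions based on the install paths used in this chapter.

```shell
export SCALA_HOME=/usr/local/scala
# Assumption: the source truncates after SCALA_HOME; SPARK_HOME and PATH
# are plausible companions given the /usr/local install paths used here.
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin
```

After `source /etc/profile`, the spark-shell and scala commands become available without absolute paths.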
       cp ./spark-env. ...
     In spark-env.sh set:
       export JAVA_HOME=/usr/lib/jvm/java-1.8.0 ...
     Configure the worker list:
       cp ./workers.template ./workers
       vi ./wor ...
     (In hadoop3 the file is workers; in hadoop2 it is slaves.) List the worker hosts:
       c22
       c23
     Adjust this to your cluster's workers, e.g.:
       slave1
       slave2
     Configure the defaults:
       cp ./spark-defaults.conf.template ./spar ...
       spark.master    spark:/ ...
  2. Copy spark to the workers:
       scp -r /usr/local/scala/ c22:/usr/local ...
       scp -r /usr/local/spark/ c22:/usr/loca ...
  3. Copy the system environment to the workers:
       scp /etc/profile c22:/etc/
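As an aside, the three conf files edited in step 1 are truncated above. A sketch of plausible final contents follows; the master hostname `master` matches the web UI URL used later in this chapter, and port 7077 is the Spark standalone default, but treat the spark.master value as an assumption, and keep the truncated JAVA_HOME path as it appears in the source.

```
# conf/spark-env.sh (sketch)
export JAVA_HOME=/usr/lib/jvm/java-1.8.0 ...

# conf/workers (one worker hostname per line)
c22
c23

# conf/spark-defaults.conf (sketch; spark://master:7077 is an assumption)
spark.master    spark://master:7077
```

These files must be identical on every node, which is why step 2 distributes the whole spark directory with scp.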
       scp /etc/ ...
     Then on each worker, apply the configuration:
       source /etc/profile
  4. Create the spark log directory on hdfs:
       hdfs dfs -mkdir /spark-logs
  5. Common issues: for changing the default ssh port, see the FAQ section.

Starting Spark
  Master:
    /usr/local/spark/sbin/start-all.sh
    Verify with jps; an extra Master process should appear:
      [hadoop@master ~]$ jp ...
  Workers: verify with jps; an extra Worker process should appear:
      20374 Jps
      19897 DataN ...
    If jps does not show a Worker process, run this on the worker:
      /usr/local/spar ...

Testing Spark
  Check the version (the version number should be displayed):
    spark-shell
    (screenshot of the result)
  Verify that the web UI opens:
    http://master:8080
    (screenshot of the result)
  Run the Pi example:
    /usr/local/spark/bin/run-example SparkPi ...
    (screenshot of the result)

Other references
  Installing python3.8
    Install dependencies:
      sudo yum -y install gcc zlib zlib- ...
    Test python3:
      [root@master Python-3.8.13]# python3
      Pyt ...

Spark reading from MongoDB
  Official references:
    https://www.mongodb.com/docs/spark-conne ...
    https://spark.apache.org/third-party-pro ...
    https://www.javadoc.io/doc/org.mongodb.s ...
    https://github.com/mongodb/mongo-spark#d ...
  Using pyspark in PyCharm
    Install pyspark from the Windows cmd prompt:
      pip3 install py ...
  Installing the mongo-connector on the cluster
    Download the connector:
      cd /usr/local/spark/jars/
      wget http://bi ...
      wget https://repo1.maven.org/maven2/org/ ...
    Create a working directory:
      mkdir -p /home/hadoop/python
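The mon.py script created in the next step is truncated in the source. The sketch below writes a minimal placeholder version so the submit step has something runnable; the connection URI, database, and collection names are invented placeholders, and format("mongo") follows the FAQ note further down that recommends mongo-spark-connector_2.12-3.0.2. Adapt every value to your cluster.

```shell
# Sketch only: writes a minimal mon.py. Every value inside is a placeholder,
# not a value taken from this chapter.
cat > ./mon.py <<'EOF'
from pyspark.sql import SparkSession

# URI, database and collection are placeholders.
spark = (SparkSession.builder
         .appName("mongo-read-sketch")
         .config("spark.mongodb.input.uri",
                 "mongodb://127.0.0.1/testdb.testcoll")
         .getOrCreate())

# "mongo" is the format name used by mongo-spark-connector 3.x
# (see the Unspecialised MongoConfig item in the FAQ below).
df = spark.read.format("mongo").load()
df.printSchema()
spark.stop()
EOF
```

The file can then be submitted with spark-submit as shown in the next step.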
    Create the script:
      vi ./mon.py ...
      from pyspark.sql import SparkSession
      fro ...
    Submit it:
      cd /usr/local/spark/bin
      spark-submit /ho ...
  Reference: command-line approach with pyspark:
      pyspark --conf "spark.mongodb.read.conne ...
      (see figure)
  Spark supports multiple run modes:
    - Local mode (single machine)
    - Local pseudo-cluster mode (a single machine simulating a cluster)
    - Standalone Client mode (cluster)
    - Standalone Cluster mode (cluster)
    - YARN Client mode (cluster)
    - YARN Cluster mode (cluster)
  Running Spark on YARN:
    https://spark.apache.org/docs/3.1.2/runn ...

Common issues
  Worker fails to start
    Abnormal worker log:
      cd /usr/local/spark/logs
      more ./spark-ha ...
      WARN Utils: Service 'sparkWorker' could ...
    Cause: ./spark-env.sh was misconfigured with a localip setting; after it was distributed to the workers via scp ...
    Fix:
      Edit spark-env.sh and delete the localip entry.
      In spark-defaults.conf add spark.ui.port ...
  start-all.sh on the master fails to bring up the workers
    start-worker.sh on a worker fails to start the worker process, with no error message.
    The log files under spark->logs are unusually short, with only one or two lines:
      [root@c22 logs]# more ./spark-root-org.a ...
  Spark connecting to MongoDB reports an Unspecialised MongoConfig exception
    Error message:
      java.lang.UnsupportedOperationException: ...
      (screenshot)
    Fix: use mongo-spark-connector_2.12-3.0.2.jar
      # use format "mongo":
      device_statis_df = m ...
  ModuleNotFoundError: No module named '_c ...
    (error screenshot)
    Fix:
      yum install libffi-devel -y    ---- then switch back to the default ...
  java.lang.NoSuchFieldError: DECIMAL128
    (error screenshot)

Version history
  Ver1.3-20220907  Initial version.
  Ver1.4-20230515  Updated the spark version.
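As a closing aside on the SparkPi job used in the testing section: run-example SparkPi estimates pi by random sampling. The same computation can be sketched locally with awk, with no cluster needed, to see what the job actually does.

```shell
# Estimate pi by sampling random points in the unit square and counting
# how many fall inside the quarter circle -- the same idea SparkPi runs
# in parallel across the cluster.
pi_est=$(awk 'BEGIN {
  srand(1); n = 200000; hits = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1.0) hits++
  }
  printf "%.3f", 4.0 * hits / n
}')
echo "Pi is roughly $pi_est"
```

With 200000 samples the estimate lands close to 3.14; SparkPi simply distributes this sampling across the workers.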