Versions on the existing big data platform:

| Component | Version |
| --- | --- |
| hadoop | 2.7.3 |
| hive | 1.2.1 |
A business requirement called for adding the Spark component, and the latest version at the time, Spark 3.0.1, was introduced.
As a test of the business flow, we write a DataFrame into Hive.
Scala code:
package com.shenyun.scala.exchange

import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SparkSession}

object DfToHive {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "root")
    // Build the session
    val builder = SparkSession.builder()
    // Local mode is only needed when running in development;
    // it can also be set as a VM option: -Dspark.master=local
    if (args != null && args.length > 0 && args(0).equals("dev")) {
      builder.master("local[2]")
    }
    builder.appName(this.getClass.getName)
    // Enable Hive support
    builder.enableHiveSupport()
    // Warehouse directory; no need to set this on the production cluster
    // builder.config("spark.sql.warehouse.dir", "D:\\tmp\\spark")
    val spark = builder.getOrCreate()
    import spark.implicits._

    // Read the pipe-delimited source file; the commented-out local path is for development
    // val studentRDD = spark.sparkContext.textFile("file:///D:\\workplace\\scala\\scala-spark\\src\\main\\resources\\exchange\\student.data")
    val studentRDD = spark.sparkContext.textFile("/user/luoyong/data/student.data")
    // Drop the header row and split each line into columns
    val head = studentRDD.first()
    val studentRDDData = studentRDD
      .filter(row => row != head)
      .map(x => x.split("\\|"))
      .map(x => Row(x(0), x(1), x(2), x(3)))

    val schema = StructType(Array(
      StructField("id", StringType, nullable = true),
      StructField("name", StringType, nullable = true),
      StructField("phone", StringType, nullable = true),
      StructField("email", StringType, nullable = true)
    ))

    val studentDf = spark.createDataFrame(studentRDDData, schema)
    studentDf.createOrReplaceTempView("student_tmp")
    spark.sql("insert into luoyong.student_ly select id,name,phone,email from student_tmp")

    // Shut down
    spark.stop()
  }
}
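For reference, once the DataFrame exists, the same write can be expressed through the DataFrameWriter API instead of a temp view plus INSERT ... SELECT. A minimal sketch, assuming the target table luoyong.student_ly already exists and its column order matches the DataFrame:

// Equivalent write using DataFrameWriter (sketch, not part of the job above);
// insertInto appends rows by position into an existing Hive table.
studentDf.write
  .mode("append")
  .insertInto("luoyong.student_ly")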
pom.xml
Note: with the dependencies below, accessing a newer Hive version should not be a problem (this was not tested, but in theory it should work).
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.0.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.12</artifactId>
<version>3.0.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive-thriftserver_2.12</artifactId>
<version>3.0.1</version>
<scope>provided</scope>
</dependency>
Running the job against the cluster fails. The key error:

WARN metadata.Hive: Failed to register all functions.
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'

Full log:
20/12/17 11:19:56 INFO hive.metastore: Connected to metastore.
20/12/17 11:19:56 WARN metadata.Hive: Failed to register all functions.
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3897)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
at org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:260)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:286)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:389)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:225)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:225)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:137)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:127)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:157)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:155)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$2(HiveSessionStateBuilder.scala:60)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:93)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:93)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupGlobalTempView(SessionCatalog.scala:789)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$.lookupTempView(Analyzer.scala:858)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$$anonfun$apply$7.applyOrElse(Analyzer.scala:840)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$$anonfun$apply$7.applyOrElse(Analyzer.scala:836)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$5(AnalysisHelper.scala:94)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$1(AnalysisHelper.scala:94)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:84)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$.apply(Analyzer.scala:836)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:962)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:934)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:149)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:146)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:138)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:138)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:176)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:170)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:130)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:116)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:116)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:154)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:153)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:68)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:133)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:133)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:68)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:66)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:58)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
at com.shenyun.scala.exchange.DfToHive$.main(DfToHive.scala:43)
at com.shenyun.scala.exchange.DfToHiveTest$.main(DfToHiveTest.scala:9)
at com.shenyun.scala.exchange.DfToHiveTest.main(DfToHiveTest.scala)
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_all_functions(ThriftHiveMetastore.java:3845)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_functions(ThriftHiveMetastore.java:3833)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllFunctions(HiveMetaStoreClient.java:2399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
at com.sun.proxy.$Proxy33.getAllFunctions(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2336)
at com.sun.proxy.$Proxy33.getAllFunctions(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894)
... 71 more
20/12/17 11:19:56 WARN client.HiveClientImpl: HiveClient got thrift exception, destroying client and retrying (23 tries remaining)
The root cause is a compatibility problem between the Spark and Hive versions:
Spark 3.0.1 is compiled against Hive 2.3.7 by default, so the Hive client jars it bundles are 2.3.7.
The Hive deployed on the cluster is 1.2.1, so the Thrift call the 2.3.7 client makes while registering functions (get_all_functions) is rejected by the older metastore.
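As an aside, Spark also exposes configuration for pointing its built-in Hive client at a different metastore version (spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars, documented in the Spark SQL Hive tables guide). That is not the route taken below; the sketch here is only to illustrate the version coupling, and with "maven" Spark resolves the matching client jars at runtime, which needs network access on first start:

import org.apache.spark.sql.SparkSession

// Sketch only: point the built-in Hive client at a 1.2.1 metastore via configuration
// instead of swapping jars. Verify against your own cluster before relying on it.
val spark = SparkSession.builder()
  .appName("hive-1.2.1-metastore-client")
  .enableHiveSupport()
  .config("spark.sql.hive.metastore.version", "1.2.1")
  .config("spark.sql.hive.metastore.jars", "maven")
  .getOrCreate()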
The fix chosen here: make the pom depend on the lower Hive version.
The final pom dependencies are listed below. Note that the scope is provided or test: the Spark 3 jars directory on the production cluster already contains these jars (or they will be placed there), so they should not be bundled into the application jar.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.0.1</version>
<scope>provided</scope>
</dependency>
<!-- The dependencies on these two jars are removed (kept commented out below for reference) -->
<!-- <dependency>-->
<!-- <groupId>org.apache.spark</groupId>-->
<!-- <artifactId>spark-hive_2.12</artifactId>-->
<!-- <version>3.0.1</version>-->
<!-- <scope>provided</scope>-->
<!-- </dependency>-->
<!-- <dependency>-->
<!-- <groupId>org.apache.spark</groupId>-->
<!-- <artifactId>spark-hive-thriftserver_2.12</artifactId>-->
<!-- <version>3.0.1</version>-->
<!-- <scope>provided</scope>-->
<!-- </dependency>-->
<!-- Re-declare spark-hive_2.12 and spark-hive-thriftserver_2.12, excluding their transitive Hive dependencies -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.12</artifactId>
<version>3.0.1</version>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-metastore</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-serde</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-shims</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive-thriftserver_2.12</artifactId>
<version>3.0.1</version>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-cli</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-beeline</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- Downgrade Hive to 1.2.1; Spark 3 depends on 2.3.7 by default -->
<!-- Transitive dependencies of spark-hive-thriftserver_2.12 -->
<dependency>
<groupId>org.spark-project.hive</groupId>
<artifactId>hive-cli</artifactId>
<version>1.2.1.spark2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.spark-project.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1.spark2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.spark-project.hive</groupId>
<artifactId>hive-beeline</artifactId>
<version>1.2.1.spark2</version>
<scope>test</scope>
</dependency>
<!-- Transitive dependencies of spark-hive_2.12 -->
<dependency>
<groupId>org.spark-project.hive</groupId>
<artifactId>hive-common</artifactId>
<version>1.2.1.spark2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.spark-project.hive</groupId>
<artifactId>hive-metastore</artifactId>
<version>1.2.1.spark2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.spark-project.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.1.spark2</version>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- The commons-lang3 pulled in transitively by hive-exec needs to be brought back up to version 3.9 -->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.9</version>
<scope>test</scope>
</dependency>
Replace the jars under ${SPARK3_HOME}/jars.
Replace the Hive-related jars there with the following versions:

| Jar | groupId | artifactId | version |
| --- | --- | --- | --- |
| hive-cli-1.2.1.spark2.jar | org.spark-project.hive | hive-cli | 1.2.1.spark2 |
| hive-jdbc-1.2.1.spark2.jar | org.spark-project.hive | hive-jdbc | 1.2.1.spark2 |
| hive-beeline-1.2.1.spark2.jar | org.spark-project.hive | hive-beeline | 1.2.1.spark2 |
| hive-common-1.2.1.spark2.jar | org.spark-project.hive | hive-common | 1.2.1.spark2 |
| hive-metastore-1.2.1.spark2.jar | org.spark-project.hive | hive-metastore | 1.2.1.spark2 |
| hive-exec-1.2.1.spark2.jar | org.spark-project.hive | hive-exec | 1.2.1.spark2 |
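Once the jars have been swapped, a quick sanity check is to open spark-shell on the cluster and run a statement that touches the metastore; if it completes without the get_all_functions warning above, the client and the 1.2.1 metastore agree on the protocol. A minimal check (the database and table are simply the ones used earlier):

// In spark-shell, where the `spark` session is pre-created:
spark.sql("show databases").show()                             // forces a metastore call
spark.sql("select count(*) from luoyong.student_ly").show()    // reads the table written by the job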