Hive查询题(上)_斯特凡今天也很帅的博客-程序员宅基地

技术标签: hive  HIVE  mysql  数据库  大数据  

《Hive高级查询》作业答案

一、每个店铺top3(分组求topN)

有50w个京东店铺,每个顾客访问任何一个店铺的任何一个商品时,都会产生一条访问日志,访问日志存储的表名为visit,访客用户id为user_id,被访问的店铺名称为shop。请统计:
数据:

u1 a
u2 b
u1 b
u1 a
u3 c
u4 b
u1 a
u2 c
u5 b
u4 b
u6 c
u2 c
u1 a
u2 a
u2 a
u3 a
u5 a
u5 a
u5 a

需求:
1.每个店铺UV(访客数)
2.每个店铺访问次数top3的访客信息。输出店铺名、访客id、访问次数。

建表:

create table visit(user_id string,shop string) row format delimited fields terminated by ' ';

加载数据:

load data local inpath '/opt/soft/files/jd.txt' into table visit;

需求1:
方法1:

select shop,count(distinct user_id) UV from visit group by shop;

方法2:

select shop,count(*) UV from (select shop,user_id from visit group by shop,user_id)t1 group by shop;

需求2:

select shop, user_id, ct from (select shop, user_id, ct, row_number() over(partition by shop order by ct desc) rk from (select shop, user_id, count(*) ct from visit group by shop,user_id)t1) t2 where rk<=3;

二、求月销售额和总销售额
1、数据说明
(1)数据格式

a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250

(2)字段含义
店铺,月份,金额
3、需求
编写Hive的HQL语句求出每个店铺的当月销售额和累计到当月的总销售额
2、数据准备
(1)创建数据库表t_store

create table t_store(
name string,
months int,money int
) 
row format delimited fields terminated by ",";

(2)导入数据

load data local inpath "/home/hadoop/store.txt" into table t_store;


with
r1 as
(
select
name,
months,
sum(money) money
from t_store
group by name,months
)
select
name,
months,
sum(money) over(partition by name,months order by months),
sum(money) over(partition by name order by months rows between unbounded preceding and current row)
from r1;

三、自关联练习

name,month,pv
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,11

数据的字段意义是:
用户,月份,访问量
需求:
每个用户截止到每月为止的最大单月访问次数和累计到该月的总访问次数。

参考答案:可以参考上一题的解题思考,使用窗口函数。也可以使用自连接。
具体的最终实现的SQL:

select a.name as aname, a.month as amonth, a.pv as apv, 
max(b.pv) as maxpv, sum(b.pv) as sumpv 
from 
(select a.name, a.month, sum(a.pv) as pv  from exercise01 a  group by a.name, a.month) a 
join 
(select a.name, a.month, sum(a.pv) as pv  from exercise01 a  group by a.name, a.month) b 
on a.name = b.name 
where a.month >= b.month 
group by a.name, a.month, a.pv;

实现思路:
第一步:由于每个用户在每个月份有多条数据访问记录,所以根据题意,首先得汇总每个用户在每个月份的总访问次数
SQL实现:

select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month;

结果数据:

A       2015-01 33
A       2015-02 10
A       2015-03 38
B       2015-01 30
B       2015-02 15
B       2015-03 44

第二步:由于要求得这种格式的数据:
用户 月份 当月访问次数 最大访问次数 总访问次数

A       2015-01        33        33        33
A       2015-02        10        33        43
A       2015-03        38        38        81
B       2015-01        30        30        30
B       2015-02        15        30        45
B       2015-03        44        44        89

需要如下这种格式的数据才能求出:

A       2015-01 33      A       2015-01 33
A       2015-02 10      A       2015-01 33
A       2015-02 10      A       2015-02 10
A       2015-03 38      A       2015-01 33
A       2015-03 38      A       2015-02 10
A       2015-03 38      A       2015-03 38
B       2015-01 30      B       2015-01 30
B       2015-02 15      B       2015-01 30
B       2015-02 15      B       2015-02 15
B       2015-03 44      B       2015-01 30
B       2015-03 44      B       2015-02 15
B       2015-03 44      B       2015-03 44

那如何得到这样的数据呢?执行如下的SQL:

select a.name as aname, a.month as amonth, a.pv as apv, 
b.name as bname, b.month as bmonth, b.pv as bpv 
from 
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) a 
join 
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) b 
on a.name = b.name  
where a.month >= b.month;

第三步:在得到上述数据的基础之上,然后直接进行聚合即可
SQL实现:

select a.aname, a.amonth, a.apv,  max(a.bpv) as maxpv, sum(a.bpv) as sumpv 
from 
(
select a.name as aname, a.month as amonth, a.pv as apv, 
b.name as bname, b.month as bmonth, b.pv as bpv 
from 
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) a 
join 
(select a.name, a.month, sum(a.pv) as pv from exercise01 a group by a.name, a.month) b 
on a.name = b.name  
where a.month >= b.month
) a 
group by a.aname, a.amonth, a.apv;

第四步:获得执行结果:

A       2015-01 33      33      33
A       2015-02 10      33      43
A       2015-03 38      38      81
B       2015-01 30      30      30
B       2015-02 15      30      45
B       2015-03 44      44      89

第五步:对于SQL语句进行一些优化得到最终SQL:

select a.name as aname, a.month as amonth, a.pv as apv, 
max(b.pv) as maxpv, sum(b.pv) as sumpv 
from 
(select a.name, a.month, sum(a.pv) as pv  from exercise01 a  group by a.name, a.month) a 
join 
(select a.name, a.month, sum(a.pv) as pv  from exercise01 a  group by a.name, a.month) b 
on a.name = b.name 
where a.month >= b.month 
group by a.name, a.month, a.pv;

四、综合练习

至少完成其中的十题。
1、查询"01"课程比"02"课程成绩高的学生的信息及课程分数:

select student.s_id,student.s_name,s1_score,s2_score from (
select s_id,s_score as s1_score from score where c_id = 01
)s1
join( 
select s_id,s_score as s2_score from score where c_id = 02  
)s2 on s1.s_id = s2.s_id
left join student on s1.s_id = student.s_id
where s1.s1_score > s2.s2_score

2、查询"01"课程比"02"课程成绩低的学生的信息及课程分数:

select student.s_id,student.s_name,s1_score,s2_score from (
select s_id,s_score as s1_score from score where c_id = 01
)s1
 join( 
select s_id,s_score as s2_score from score where c_id = 02  
)s2 on s1.s_id = s2.s_id
left join student on s1.s_id = student.s_id
where s1.s1_score < s2.s2_score

3、查询平均成绩大于等于60分的同学的学生编号和学生姓名和平均成绩:

select score.s_id,student.s_name,avg(score.s_score) from score 
left join student on score.s_id = student.s_id
group by score.s_id,student.s_name
having avg(score.s_score) >= 60

4、查询平均成绩小于60分的同学的学生编号和学生姓名和平均成绩: (包括有成绩的和无成绩的)

select * from (
select student.s_id,student.s_name,
case when avg(score.s_score) is null then 0 
else  avg(score.s_score)
end as avgscore
from student 
left join score on  student.s_id = score.s_id
group by student.s_id,student.s_name
)ss where ss.avgscore < 60

5、查询所有同学的学生编号、学生姓名、选课总数、所有课程的总成绩:

select student.s_id,student.s_name,
count(score.s_score) as countscore,
sum(score.s_score) as total from student 
left join score on student.s_id = score.s_id
group by student.s_id,student.s_name

6、查询"李"姓老师的数量:

select count(1) from teacher where t_name like '李%'

7、查询学过"张三"老师授课的同学的信息:

select student.* from student 
join score on  student.s_id = score.s_id
join course on course.c_id = score.c_id
join teacher on teacher.t_id = course.t_id
where teacher.t_name = '张三';

8、查询没学过"张三"老师授课的同学的信息:

select * from student where s_id not in (
select student.s_id from student 
join score on  student.s_id = score.s_id
join course on course.c_id = score.c_id
join teacher on teacher.t_id = course.t_id
where teacher.t_name = '张三'
);
 

select student.* from student 
join teacher on  teacher.t_name = '张三'
join course on course.c_id = teacher.t_id
left join score on  student.s_id = score.s_id and score.c_id = course.c_id
where 1=1 
and score.s_score is null;

9、查询学过编号为"01"并且也学过编号为"02"的课程的同学的信息:

select student.* from (
select s_id,s_score as s1_score from score where c_id = 01
)s1
join( 
select s_id,s_score as s2_score from score where c_id = 02  
)s2 on s1.s_id = s2.s_id
left join student on s1.s_id = student.s_id

10、查询学过编号为"01"但是没有学过编号为"02"的课程的同学的信息:

select student.* from (
select s_id,s_score as s1_score from score where c_id = 01
)s1
left join( 
select s_id,s_score as s2_score from score where c_id = 02  
)s2 on s1.s_id = s2.s_id
left join  student on s1.s_id = student.s_id
where s2.s_id is null

11、查询没有学全所有课程的同学的信息:

select distinct stu.*
from student stu
join course a 
left join score b on b.s_id = stu.s_id and b.c_id = a.c_id 
where b.s_score is null
;

12、查询至少有一门课与学号为"01"的同学所学相同的同学的信息:

select distinct stu.*
from student stu
join score a on a.s_id = stu.s_id
where stu.s_id <> '01' and a.c_id in (
select c_id from score where s_id = '01')
;

13、查询和"01"号的同学学习的课程完全相同的其他同学的信息:

select student.*,tmp1.course_id 
from student
join (
select s_id ,concat_ws('|', collect_set(c_id)) course_id 
from score
where s_id <> '01'
group by s_id)tmp1
  on student.s_id = tmp1.s_id
join (
select concat_ws('|', collect_set(c_id)) course_id2
from score  
where s_id='01' and c_id <> '01')tmp2
on tmp1.course_id = tmp2.course_id2;

14、查询没学过"张三"老师讲授的任一门课程的学生姓名:

select *
from student stu
where stu.s_id not in (
select c.s_id 
from course a,teacher b,score c 
where b.t_name = '张三' and a.t_id = b.t_id and c.s_id = stu.s_id and c.c_id = a.c_id
);

15、查询两门及其以上不及格课程的同学的学号,姓名及其平均成绩:

select 
*
from student stu 
join score a on a.s_id = stu.s_id
where a.s_score < 60;

16、检索"01"课程分数小于60,按分数降序排列的学生信息:

select stu.*
from student stu
join score a on a.s_id = stu.s_id
where a.c_id = '01' and a.s_score < 60
order by a.s_score desc;

17、按平均成绩从高到低显示所有学生的所有课程的成绩以及平均成绩:

select * from 
score left join 
(
select s_id,avg(score.s_score) as avgscore
from score 
group by s_id
order by avgscore
)ss on ss.s_id = score.s_id;

18.查询各科成绩最高分、最低分和平均分:以如下形式显示:课程ID,课程name,最高分,最低分,平均分,及格率,中等率,优良率,优秀率:

select c_id,
course.c_course,
max(s_score),
min(s_score),
round(avg(s_score),2),
round(avg(jige),2),
round(avg(zhongdeng),2),
round(avg(youliang),2),
round(avg(youxiu),2)
from (
select score.*,
case when score.s_score >= 60 then 1 else 0 end as jige,
case when score.s_score >= 70 and score.s_score < 80 then 1 else 0 end as zhongdeng,
case when score.s_score >= 80 and score.s_score < 90 then 1 else 0 end as youliang,
case when score.s_score >= 90 then 1 else 0 end as youxiu
from score
)score
left join course on score.c_id = course.c_id
group by score.c_id,course.c_course

19、按各科成绩进行排序,并显示排名:– row_number() over()分组排序功能

select *,
row_number() over(distribute by c_id sort by s_score desc) as rn
from score ;

20、查询学生的总成绩并进行排名:

select s_id,sum(s_score) as sumScores
from score
group by s_id
order by sumScores desc;

21、查询不同老师所教不同课程平均分从高到低显示:
select t_id,a.c_id,round(avg(s_score),2) as avgscore
from score a
join course b on b.c_id = a.c_id
group by t_id,a.c_id
order by t_id,avgscore desc ;
22、查询所有课程的成绩第2名到第3名的学生信息及该课程成绩:

select student.,ss.s_score from (
select score.
,
row_number() over(distribute by c_id sort by s_score desc) as rn
from score
)ss
left join student on ss.s_id = student.s_id
where ss.rn between 2 and 3;
23、统计各科成绩各分数段人数:课程编号,课程名称,[100-85],[85-70],[70-60],[0-60]及所占百分比
select c_id,
round(sum(case when s_score >= 85 then 1 else 0 end)/count(1),2) as 85score,
round(sum(case when s_score between 70 and 84 then 1 else 0 end)/count(1),2) as 70score,
round(sum(case when s_score between 60 and 69 then 1 else 0 end)/count(1),2) as 60score,
round(sum(case when s_score < 60 then 1 else 0 end)/count(1),2) as 59score,
count(1) as totalstu
from score
group by c_id
;
24、查询学生平均成绩及其名次:
select *,
row_number() over(sort by a.avgscore desc) as rm
from (
select
s_id,
round(avg(s_score),2) as avgscore
from score
group by s_id ) a;
25、查询各科成绩前三名的记录三个语句

select * from
(
select *,
row_number() over(distribute by c_id sort by s_score desc) as rm,
rank() over(distribute by c_id sort by s_score desc) as rk,
dense_rank() over(distribute by c_id sort by s_score desc) as drk
from score
) a
where a.rm < 4;
26、查询每门课程被选修的学生数:
select c_id,count(1) as totalstu
from score
group by c_id;
27、查询出只有两门课程的全部学生的学号和姓名:
select stu.s_id,stu.s_name
from score
left join student stu on stu.s_id = score.s_id
group by stu.s_id,stu.s_name
having count(1) =2;
28、查询男生、女生人数:

select s_sex,count(1) as totalstu
from student
group by s_sex
;
29、查询名字中含有"风"字的学生信息:
select *
from student
where s_name like ‘%风%’;
30、查询同名同性学生名单,并统计同名人数:
select
s_name,
s_sex,
count(1) as totalstu
from
student
group by s_name,s_sex
having totalstu >1;

版权声明:本文为博主原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。
本文链接:https://blog.csdn.net/weixin_41311528/article/details/111186177

智能推荐

MongoDB学习笔记--数据管理_mongoexport -h 192.168.1.233 --port 27018 -d yourd-程序员宅基地

MongoDB数据管理转自:http://www.caiyiting.com/blog/2015/mongodb-data-management.html数据库备份 – mongodump备份本地所有MongoDB数据库: # mongodump -h 127.0.0.1 --port 27017 -o /root/db/alldb备份远程指定数据库:# mongodump -h 192._mongoexport -h 192.168.1.233 --port 27018 -d yourdb -c yourcoll -o /root/you

模拟cmos集成电路(2)_vgs vth vds三者的关系_steven_ifv的博客-程序员宅基地

模拟受PVT的影响;P 工艺 V 电压 T 温度参杂浓度和温度都会影响阈值即VTH 可以将要求匹配的管子放在近的地方 参杂浓度和温度较为接近会有更相近的阈值工艺提供了不同沟道参杂浓度的晶体管 以获得不同的阈值漏极电流在线性区(linear region /triode region)的表达式(推导过程见书)(对于nmos)前半部分又工艺决定 设计的部分是后半部分电路的尺寸和偏置电压ID同时受Vgs和Vds影响 W/L定义为宽长比(aspect ratio设计是考虑 改..._vgs vth vds三者的关系

GitLab 查看上次git commit的记录-程序员宅基地

命令: git log -n *q@s:~/Documents/Practice/sume/test$gitlog-1commitd08b9af30d387102299c533bd3c9f92b505398Author:sume<***.**@qq.com>Date:TueAug2111:59:122018+0800a...

c语言程序设计第五版实验报告九,C语言程序设计实验报告-实验九.doc-程序员宅基地

C语言程序设计实验报告-实验九.doc下载提示(请认真阅读)1.请仔细阅读文档,确保文档完整性,对于不预览、不比对内容而直接下载带来的问题本站不予受理。2.下载的文档,不会出现我们的网址水印。3、该文档所得收入(下载+内容+预览)归上传者、原创作者;如果您是本文档原作者,请点此认领!既往收益都归您。文档包含非法信息?点此举报后获取现金奖励!下载文档到电脑,查找使用更方便9.9积分还剩页未读,继续阅..._第五版c语言指针实验九实验报告总结

Python入门教程100天:Day10-图形用户界面和游戏开发-程序员宅基地

基于tkinter模块的GUIGUI是图形用户界面的缩写,图形化的用户界面对使用过计算机的人来说应该都不陌生,在此也无需进行赘述。Python默认的GUI开发模块是tkinter(在Python 3以前的版本中名为Tkinter),从这个名字就可以看出它是基于Tk的,Tk是一个工具包,最初是为Tcl设计的,后来被移植到很多其他的脚本语言中,它提供了跨平台的GUI控件。当然Tk并不是最新和最好的选择...

随便推点

基于nginx负载均衡下 Tomcat 集群的 Session 共享-程序员宅基地

一、前言  nginx 作为目前最流行的开源反向代理HTTP Server,用于实现资源缓存、web server负载均衡等功能,由于其轻量级、高性能、高可靠等特点在互联网项目中有着非常普遍的应用,相关概念网上有丰富的介绍。分布式web server集群部署后需要实现session共享,针对 tomcat 服务器的实现方案多种多样,比如 tomcat cluster session 广播、

C语言课程设计报告输出杨辉三角,C语言学习:在屏幕上输出杨辉三角_海蒂的万花镜的博客-程序员宅基地

杨辉三角的规律是:它的两条斜边都是由数字1组成的,而其余的数则是等于它肩上的两个数之和。 代码如下:#include#includeint main(){int i,j,k,arr[10][10]={0};/*arr[11][11]必须初始化,初始化为{0}*/printf("打印出杨辉三角:n");for(i=0;i<10;i++){/*先打印出第一列和对角线的数,均为1,同时第一二行已..._杨辉三角课程设计

QT操作文件夹(创建、复制、重命名、移除)_qt 重命名文件夹_~南柯一梦~的博客-程序员宅基地

头文件#include <QDir>#include <QFileInfo>#include <QFile>创建文件夹//************************************// 方法名称: CreateFolder// 概要: 创建文件夹// 返回值: void// 参数: QString folderPath 文件夹路径//************************************void CreateF_qt 重命名文件夹

git(管理项目代码工具)学习笔记总结——是什么有什么用及相关基本常识_git编辑器有什么用_不想想了的博客-程序员宅基地

前言提示:这类工具,像git、maven、IDEA等等,需要理解的知识不是太多。要想熟练使用,理解命令的含义即作用是不够的,需要我们不断的去使用它。 所以本篇作为git学习笔记总结将大概讲解git是什么有什么用及相关基本常识,其他实操还得靠平时。文章目录前言基本介绍(是什么)使用流程(有哪些作用)安装Git创建版本库版本利用管理(类似日志管理)工作区和暂存区(概念了解)撤销修改删除文件添加远程仓库(推上网)从远程库克隆(拉到本地)分支管理创建与合并分支解决冲突多人协作Rebase标签管理(版本号代替_git编辑器有什么用

几种古典密码学算法_古典密码算法有哪些-程序员宅基地

古典密码学可以分为代替密码(也叫做移位密码)和置换密码(也叫做换位密码)两种,其中代替密码典型的有Caesar密码,仿射变换等,置换密码有单表置换和多表置换等。下面是几种常见古典密码算法的实现。1.Caesar密码void encrypt(char* text,int k,char cipher[1024]){ int a[26];int A[26]; int m; for(in_古典密码算法有哪些

数字信号处理(DTFT、DFT、FFT、各种时域到频域的转换的相互关系)_时域到频域转换-程序员宅基地

DTFT、DFT、各种时域到频域的转换的相互关系时域信号与频域的信号的关系采样与奈奎斯特采样定理奈奎斯特采样定理与归一化角频率离散时间信号的傅里叶变换(DTFT)傅里叶变换、拉普拉斯变换、Z变换的联系是什么?离散时间系统时域信号与频域的信号的关系无论是连续的还是非连续的,周期信号用傅里叶级数来表示,非周期信号用傅里叶变换来表示时域信号是连续非周期的,则傅里叶变换后频域信号是连续非周期的时域信号是连续周期的,则傅里叶级数变换后频域信号是离散非周期的时域信号是离散的非周期时间信号,则DTFT之后,其频_时域到频域转换

推荐文章

热门文章

相关标签