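In this project, the tracking (event-logging) data has to be resolved to each user's geographic location. When latitude and longitude are available, they can be reverse-geocoded through the Baidu Maps or AMap (Gaode) API, but such APIs are generally rate-limited or paid. This article instead derives the location of the user (visitor) from the IP address.
From the material I surveyed, looking the address up in a local IP-to-region mapping library is generally the best approach: lookups are fast because no network call is involved, and the results are accurate enough for this purpose. This article uses the ip2region project, described in detail below.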
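To illustrate why a local mapping library is fast, here is a minimal sketch of range-based IP lookup: each IPv4 address is converted to a 32-bit number, and the containing range is found with a binary search over sorted range starts. This is a hypothetical illustration only. It is not ip2region's actual file format or search code, and the class name `IpRangeLookupSketch`, the sample ranges, and the region strings are all made up.

```java
import java.util.Arrays;

/** Hypothetical illustration of range-based IP lookup; NOT ip2region's real format. */
final class IpRangeLookupSketch {
    private final long[] starts;    // range start addresses, sorted ascending
    private final long[] ends;      // inclusive range end addresses
    private final String[] regions; // region string for each range

    IpRangeLookupSketch(long[] starts, long[] ends, String[] regions) {
        this.starts = starts;
        this.ends = ends;
        this.regions = regions;
    }

    /** Converts a dotted-quad IPv4 string to its unsigned 32-bit value. */
    static long ipToLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Returns the region whose range contains ip, or null when no range matches. */
    String lookup(String ip) {
        long target = ipToLong(ip);
        int idx = Arrays.binarySearch(starts, target);
        if (idx < 0) {
            idx = -idx - 2; // index of the last range start <= target
        }
        if (idx < 0 || target > ends[idx]) {
            return null;
        }
        return regions[idx];
    }

    /** Builds a tiny table with made-up sample data. */
    static IpRangeLookupSketch sample() {
        long[] starts = { ipToLong("10.0.0.0"), ipToLong("221.226.0.0") };
        long[] ends = { ipToLong("10.255.255.255"), ipToLong("221.226.255.255") };
        String[] regions = { "0|0|0|LAN|LAN", "CN|0|Jiangsu|Nanjing|Telecom" };
        return new IpRangeLookupSketch(starts, ends, regions);
    }

    public static void main(String[] args) {
        IpRangeLookupSketch table = sample();
        System.out.println(table.lookup("221.226.1.30"));
        System.out.println(table.lookup("8.8.8.8"));
    }
}
```

Each lookup costs one binary search over an in-memory array, so its cost grows only logarithmically with the number of ranges and involves no network round trip.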
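- Reference project:
- The ip2region project: [https://github.com/lionsoul2014/ip2region](https://github.com/lionsoul2014/ip2region)
- ip2region describes itself as "the freest IP address lookup library", an IP-to-region mapping library;
- Its data aggregates several well-known IP-to-location providers (Taobao IP library, GeoIP, Chunzhen (纯真) IP library)
- Usage
- Depends on the ip2region-1.7.2.jar package
- IP database resource file: ip2region.db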
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.build.targetJdk>1.7</project.build.targetJdk>
<project.report.outputEncoding>UTF-8</project.report.outputEncoding>
<project.report.inputEncoding>UTF-8</project.report.inputEncoding>
<hive.version>0.11.0</hive.version>
<hadoop.version>2.2.0</hadoop.version>
<junit.version>4.12</junit.version>
<commons-codec.version>1.10</commons-codec.version>
</properties>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>${junit.version}</version>
</dependency>
<dependency>
<groupId>org.lionsoul</groupId>
<artifactId>ip2region</artifactId>
<version>1.7.2</version>
</dependency>
<dependency>
<groupId>nl.basjes.parse.useragent</groupId>
<artifactId>yauaa-hive</artifactId>
<classifier>udf</classifier>
<version>5.11</version>
</dependency>
<dependency>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
<version>${commons-codec.version}</version>
</dependency>
<dependency>
<groupId>net.sourceforge.javacsv</groupId>
<artifactId>javacsv</artifactId>
<version>2.0</version>
</dependency>
<dependency>
<groupId>com.github.codesorcery</groupId>
<artifactId>juan</artifactId>
<version>0.2.0</version>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>commons-codec</groupId>
<artifactId>commons-codec</artifactId>
</dependency>
<dependency>
<groupId>org.lionsoul</groupId>
<artifactId>ip2region</artifactId>
</dependency>
<dependency>
<groupId>nl.basjes.parse.useragent</groupId>
<artifactId>yauaa-hive</artifactId>
<classifier>udf</classifier>
</dependency>
<dependency>
<groupId>net.sourceforge.javacsv</groupId>
<artifactId>javacsv</artifactId>
</dependency>
<dependency>
<groupId>com.github.codesorcery</groupId>
<artifactId>juan</artifactId>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>jdk.tools</groupId>
<artifactId>jdk.tools</artifactId>
<version>1.6</version>
<scope>system</scope>
<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/java</sourceDirectory>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<exclude></exclude>
</excludes>
</artifactSet>
<shadedArtifactAttached>true</shadedArtifactAttached>
</configuration>
</execution>
</executions>
</plugin>
<!-- Tweak the compiler to use more memory and use UTF-8 for the source code. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>${project.build.targetJdk}</source>
<target>${project.build.targetJdk}</target>
<encoding>${project.build.sourceEncoding}</encoding>
<showWarnings>true</showWarnings>
</configuration>
</plugin>
<!-- Resource plugins should always use UTF-8 -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
<version>2.6</version>
<configuration>
<encoding>${project.build.sourceEncoding}</encoding>
</configuration>
</plugin>
</plugins>
</build>
<profiles>
<profile>
<id>release</id> <!-- needed for deployment -->
<build>
<plugins>
<!-- Source -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>jar-no-fork</goal>
</goals>
</execution>
</executions>
</plugin>
<!-- Javadoc -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>2.9.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
<!-- GPG -->
<plugin> <!-- sign the artifacts with GPG -->
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-gpg-plugin</artifactId>
<version>1.6</version>
<executions>
<execution>
<phase>verify</phase>
<goals>
<goal>sign</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<distributionManagement>
<snapshotRepository>
<id>snapshots</id>
<url>https://oss.sonatype.org/content/repositories/snapshots/</url>
</snapshotRepository>
<repository>
<id>releases</id>
<url>https://oss.sonatype.org/service/local/staging/deploy/maven2/</url>
</repository>
</distributionManagement>
</profile>
</profiles>
package com.my.hive.udf.ipgeo;
public class ConstantsGeoIp {
public static final String GEOLITE2_CITY_FILE= "GeoLite2-City.mmdb";
public static final String SEP= "\\|";// field separator (regex-escaped)
public static final String SEP_IP2REGION= "\\|";// field separator in ip2region results (regex-escaped)
public static final String FILE_IP2REGION= "ip2region.db";// ip2region database file name
public static final String KEY_COUNTRY_ID= "countryID"; // country
public static final String KEY_COUNTRY_NAME= "countryName";
public static final String KEY_COUNTRY_NAME_EN= "countryNameEn";
public static final String KEY_PROVINCE_ID= "provinceID"; // province
public static final String KEY_PROVINCE_NAME= "provinceName";
public static final String KEY_PROVINCE_NAME_EN= "provinceNameEn";
public static final String KEY_CITY_ID= "cityID"; // city
public static final String KEY_CITY_NAME= "cityName";
public static final String KEY_CITY_NAME_EN= "cityNameEn";
public static final String KEY_ISP_ID= "ispID"; // ISP, e.g. China Telecom
public static final String KEY_ISP_NAME= "ispName";
public static final String KEY_ISP_NAME_EN= "ispNameEn";
public static final String KEY_REGION_ID= "regionID"; // region, e.g. South China
public static final String KEY_REGION_NAME= "regionName";
public static final String KEY_REGION_NAME_EN= "regionNameEn";
public static final String KEY_CONTINENT_ID= "continentID"; // continent
public static final String KEY_CONTINENT_NAME= "continentName";
public static final String KEY_CONTINENT_NAME_EN= "continentNameEn";
public static final String KEY_LATITUDE= "latitude"; // latitude and longitude
public static final String KEY_LONGITUDE= "longitude";
}
package com.my.utils.ip;
import java.util.regex.Pattern;
public class IpUtils {
// Compiled once and reused: the first octet must be 1-255, the remaining octets 0-255.
private static final Pattern IPV4_PATTERN = Pattern.compile(
    "([1-9]|[1-9]\\d|1\\d{2}|2[0-4]\\d|25[0-5])(\\.(\\d|[1-9]\\d|1\\d{2}|2[0-4]\\d|25[0-5])){3}");
public static boolean isIpV4(String ipAddress) {
    // Treat null as "not an IPv4 address" instead of throwing.
    return ipAddress != null && IPV4_PATTERN.matcher(ipAddress).matches();
}
}
package com.my.hive.udf.ipgeo;
import java.io.*;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
import org.apache.hadoop.io.Text;
import org.lionsoul.ip2region.DataBlock;
import org.lionsoul.ip2region.DbConfig;
import org.lionsoul.ip2region.DbMakerConfigException;
import org.lionsoul.ip2region.DbSearcher;
import com.my.utils.ip.IpUtils;
import org.apache.hadoop.hive.ql.session.SessionState;
@Description(name = "ip2geo", value = "_FUNC_(ip) - Returns a map of location fields for the given IPv4 address.\n"
+ "Based on https://github.com/lionsoul2014/ip2region.\nThe IP address database is loaded from HDFS. \n"
+ " > Para1: IP address\n"
+ "Example:\n"
+ " > CREATE TEMPORARY FUNCTION ip2geo AS 'com.my.hive.udf.ipgeo.UDFIp2Region' \n"
+ " > SELECT ip2geo('221.226.1.30' ),ip2geo('221.226.1.30' )['provinceName'] \n"
+ "")
public class UDFIp2Region extends GenericUDF{
PrimitiveObjectInspector inputOI;
private static DbSearcher searcher=null;
private static boolean isNonInit = true; // true until initialization has run
private static List<String> fieldNames = null;
private static byte[] data;
private static Configuration conf;
private static FileSystem fs;
private static InputStream in;
static {
    // Load the database once, when the class is loaded.
    ByteArrayOutputStream out = null;
    try {
        // Change this to the HDFS path of your ip2region.db; no host or port is needed.
        String uri = "hdfs:///warehouse/dd/auxlib/ip2region.db";
        conf = new Configuration();
        fs = FileSystem.get(URI.create(uri), conf);
        in = fs.open(new Path(uri));
        out = new ByteArrayOutputStream();
        byte[] b = new byte[1024];
        int len;
        // Write only the bytes actually read; a read may fill just part of the buffer.
        while ((len = in.read(b)) != -1) {
            out.write(b, 0, len);
        }
        // For performance, read ip2region.db from HDFS once and cache it in the data
        // byte array, instead of re-reading the file for every row.
        data = out.toByteArray();
    } catch (Exception e){
        e.printStackTrace();
    }
    finally {
        // Close both streams on the success path and the failure path alike.
        try {
            if(out != null) {
                out.close();
            }
            if(in != null) {
                in.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
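/**
* Builds the shared searcher from the cached database bytes.
* @throws UDFArgumentException if the searcher cannot be created from the loaded data
*/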
private static synchronized void constructUDFIp2RegionByOut() throws UDFArgumentException {
try {
if(searcher==null){
DbConfig config = new DbConfig();
searcher = new DbSearcher(config, data);
}
isNonInit=false;
}catch (Exception e) {
throw new UDFArgumentException("Error: read file:"+ConstantsGeoIp.FILE_IP2REGION);
}
}
@Override
public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
if(isNonInit){
constructUDFIp2RegionByOut();
if(fieldNames==null){
fieldNames=new ArrayList<String>();
fieldNames.add(ConstantsGeoIp.KEY_COUNTRY_NAME);
fieldNames.add(ConstantsGeoIp.KEY_REGION_NAME);
fieldNames.add(ConstantsGeoIp.KEY_PROVINCE_NAME);
fieldNames.add(ConstantsGeoIp.KEY_CITY_NAME);
fieldNames.add(ConstantsGeoIp.KEY_ISP_NAME);
}
isNonInit=false;
}
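// Keep the ObjectInspector that Hive passes for the input argument, so that evaluate()
// can read the value whether it arrives as a writable Text or a Java String.
if (arguments.length != 1 || !(arguments[0] instanceof PrimitiveObjectInspector)) {
    throw new UDFArgumentException("ip2geo expects exactly one primitive (string) argument");
}
inputOI = (PrimitiveObjectInspector) arguments[0];
// Declare the return type: map<string,string>.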
return ObjectInspectorFactory.getStandardMapObjectInspector(
PrimitiveObjectInspectorFactory.writableStringObjectInspector,
PrimitiveObjectInspectorFactory.writableStringObjectInspector);
}
/**
* Resolves an IP address and returns its location information as a map.
* @param arguments the IP address
* @return location information as a map
* @throws HiveException
*/
@Override
public Object evaluate(DeferredObject[] arguments) throws HiveException {
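Object arg = arguments[0].get();
// Check for null before trimming; a NULL column value must not throw.
String raw = arg == null ? null : PrimitiveObjectInspectorUtils.getString(arg, inputOI);
String ip = raw == null ? null : raw.trim();
// "0|0|0|0|0" is the fallback for null, malformed, or unresolvable addresses.
String reString="0|0|0|0|0";
if(ip!=null && IpUtils.isIpV4(ip)){
    try {
        DataBlock dataBlock = searcher.memorySearch(ip);
        if(dataBlock!=null){
            reString=dataBlock.getRegion();
        }
    } catch (IOException e) {
        reString="0|0|0|0|0";
    }
}
String[] ipArray = reString.split(ConstantsGeoIp.SEP_IP2REGION);
Map<Text, Text> reMap = new HashMap<Text, Text>();
for (int i = 0; i < fieldNames.size(); i++) {
    // A "0" or missing field means "unknown" and is returned as NULL.
    Text t = (i >= ipArray.length || ipArray[i].equals("0"))?null:new Text(ipArray[i]);
    reMap.put(new Text(fieldNames.get(i)), t);
}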
return reMap;
}
@Override
public String getDisplayString(String[] arg0) {
return arg0[0];
}
}
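## Upload the ip2region.db file to a directory on HDFS
For example, upload it to /warehouse/dd/auxlib/ip2region.db. If you use a different location, change the line
String uri = "hdfs:///warehouse/dd/auxlib/ip2region.db"; in UDFIp2Region.java accordingly.
The latest ip2region.db file can be downloaded from: ip2region.db (CSDN download, Hive document resources)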
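The UDF caches the whole ip2region.db in a byte array, so the copy loop has to write only the bytes each `read` call actually returned. The sketch below shows that pattern using only the standard library, with a `ByteArrayInputStream` standing in for the HDFS stream. The class name `StreamReadSketch` and the 2500-byte payload are hypothetical, chosen for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: reads an InputStream fully into a byte array. */
final class StreamReadSketch {
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int n;
        // read() returns how many bytes it placed in the buffer, or -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            // Write exactly n bytes; writing the whole buffer would append stale data.
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 2500 bytes is deliberately not a multiple of the 1024-byte buffer.
        byte[] source = new byte[2500];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) i;
        }
        byte[] copy = readFully(new ByteArrayInputStream(source));
        System.out.println(copy.length); // prints 2500
    }
}
```

A loop that calls `out.write(buffer)` without the length would produce 3072 bytes for this input, because the last read fills only 452 bytes of the buffer. A database file padded that way may be misread by the searcher.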
cd ${project_home}
mvn clean package -DskipTests
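When the command finishes, two files are generated in the target directory: [A=my-hive-udf-\${version}-shaded.jar,
B=my-hive-udf-\${version}.jar]. A bundles all dependencies; B is the minimal jar containing only the compiled project classes.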
-- dd_database_bigdata is the database name and ipgeo is the function name
-- CREATE TEMPORARY FUNCTION creates a function that is valid only for the current session;
-- Hive does not accept a database-qualified name for a temporary function
create temporary function ipgeo as 'com.my.hive.udf.ipgeo.UDFIp2Region' USING JAR 'hdfs:///warehouse/dd/auxlib/my-hive-udf-1.0.0-shaded.jar';
-- CREATE FUNCTION creates a permanent function (supported since Hive 0.13)
create function dd_database_bigdata.ipgeo as 'com.my.hive.udf.ipgeo.UDFIp2Region' USING JAR 'hdfs:///warehouse/dd/auxlib/my-hive-udf-1.0.0-shaded.jar';
select ipgeo('221.226.1.11'), ipgeo('221.226.1.11')['provinceName']
## Result
{"cityName":"南京市","countryName":"中国","ispName":"电信","regionName":null,"provinceName":"江苏省"} 江苏省
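The map above comes from the pipe-separated region string that the lookup returns, in the order country, region, province, city, ISP; a `0` field becomes NULL. For this result the string was presumably `中国|0|江苏省|南京市|电信`, which is inferred from the output rather than captured from a run. The sketch below reproduces that mapping with plain Java collections. The class name `RegionParseSketch` is hypothetical, and the code uses ASCII stand-in values instead of the real region text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of how a region string is turned into the UDF's result map. */
final class RegionParseSketch {
    // Same keys and order as the fieldNames list built in UDFIp2Region.initialize().
    static final String[] FIELD_NAMES = {
        "countryName", "regionName", "provinceName", "cityName", "ispName"
    };

    static Map<String, String> parse(String region) {
        String[] parts = region.split("\\|");
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (int i = 0; i < FIELD_NAMES.length; i++) {
            // A missing field is treated like "0", that is, unknown.
            String value = i < parts.length ? parts[i] : "0";
            result.put(FIELD_NAMES[i], "0".equals(value) ? null : value);
        }
        return result;
    }

    public static void main(String[] args) {
        // ASCII stand-in for a real region string (assumed sample data).
        Map<String, String> geo = parse("CN|0|Jiangsu|Nanjing|Telecom");
        System.out.println(geo);
        System.out.println(geo.get("provinceName"));
    }
}
```

In a query, indexing the map (for example `['provinceName']`) picks out a single field, and fields that were `0` in the region string come back as NULL.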