python处理出租车轨迹数据_1-出租车数据的基础处理,由gps生成OD(pandas).ipynb...-程序员宅基地

技术标签: python处理出租车轨迹数据  

{

"cells": [

{

"cell_type": "markdown",

"metadata": {},

"source": [

"在这个教程中,你将会学到如何使用python的pandas包对出租车GPS数据进行数据清洗,识别出行OD\n",

"\n",

"

提供的基础数据是:

数据:
\n",

" 1.出租车原始GPS数据(在data-sample文件夹下,原始数据集的抽样500辆车的数据)

"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"[pandas包的简介](https://baike.baidu.com/item/pandas/17209606?fr=aladdin)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"# 读取数据"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"首先,读取出租车数据。"

]

},

{

"cell_type": "code",

"execution_count": 2,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:51:53.552930Z",

"start_time": "2020-01-18T04:51:52.397018Z"

}

},

"outputs": [],

"source": [

"import pandas as pd\n",

"#读取数据\n",

"data = pd.read_csv(r'data-sample/TaxiData-Sample',header = None)\n",

"#给数据命名列\n",

"data.columns = ['VehicleNum', 'Stime', 'Lng', 'Lat', 'OpenStatus', 'Speed']"

]

},

{

"cell_type": "code",

"execution_count": 3,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:51:58.299239Z",

"start_time": "2020-01-18T04:51:58.271312Z"

}

},

"outputs": [

{

"data": {

"text/html": [

"

\n",

"

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"\n",

"

" \n",

"

\n",

"

\n",

"

VehicleNum\n",

"

Stime\n",

"

Lng\n",

"

Lat\n",

"

OpenStatus\n",

"

Speed\n",

"

\n",

"

\n",

"

\n",

"

\n",

"

0\n",

"

22271\n",

"

22:54:04\n",

"

114.167000\n",

"

22.718399\n",

"

0\n",

"

0\n",

"

\n",

"

\n",

"

1\n",

"

22271\n",

"

18:26:26\n",

"

114.190598\n",

"

22.647800\n",

"

0\n",

"

4\n",

"

\n",

"

\n",

"

2\n",

"

22271\n",

"

18:35:18\n",

"

114.201401\n",

"

22.649700\n",

"

0\n",

"

0\n",

"

\n",

"

\n",

"

3\n",

"

22271\n",

"

16:02:46\n",

"

114.233498\n",

"

22.725901\n",

"

0\n",

"

24\n",

"

\n",

"

\n",

"

4\n",

"

22271\n",

"

21:41:17\n",

"

114.233597\n",

"

22.720900\n",

"

0\n",

"

19\n",

"

\n",

"

\n",

"

\n",

"

"

],

"text/plain": [

" VehicleNum Stime Lng Lat OpenStatus Speed\n",

"0 22271 22:54:04 114.167000 22.718399 0 0\n",

"1 22271 18:26:26 114.190598 22.647800 0 4\n",

"2 22271 18:35:18 114.201401 22.649700 0 0\n",

"3 22271 16:02:46 114.233498 22.725901 0 24\n",

"4 22271 21:41:17 114.233597 22.720900 0 19"

]

},

"execution_count": 3,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"#显示数据的前5行\n",

"data.head(5)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"数据的格式:\n",

"\n",

">VehicleNum —— 车牌 \n",

"Stime —— 时间 \n",

"Lng —— 经度 \n",

"Lat —— 纬度 \n",

"OpenStatus —— 是否有乘客(0没乘客,1有乘客) \n",

"Speed —— 速度 "

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"# 基础的数据操作"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"## DataFrame和Series"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"DataFrame和Series\n",

"\n",

" > 当我们读一个数据的时候,我们读进来的就是DataFrame格式的数据表,而一个DataFrame中的每一列,则为一个Series \n",

" 也就是说,DataFrame由多个Series组成\n"

]

},

{

"cell_type": "code",

"execution_count": 87,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:25.713432Z",

"start_time": "2020-01-18T04:52:25.708450Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.frame.DataFrame"

]

},

"execution_count": 87,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"如果我们想取DataFrame的某一列,想得到的是Series,那么直接用以下代码\n",

"\n",

" > data[列名]"

]

},

{

"cell_type": "code",

"execution_count": 88,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:32.097575Z",

"start_time": "2020-01-18T04:52:32.090592Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.series.Series"

]

},

"execution_count": 88,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data['Lng'])"

]

},

{

"cell_type": "markdown",

"metadata": {

"ExecuteTime": {

"end_time": "2019-09-06T09:22:43.642625Z",

"start_time": "2019-09-06T09:22:43.638487Z"

}

},

"source": [

"如果我们想取DataFrame的某一列或者某几列,想得到的是DataFrame,那么直接用以下代码\n",

"\n",

"> data2[[列名,列名]]"

]

},

{

"cell_type": "code",

"execution_count": 89,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:33.013124Z",

"start_time": "2020-01-18T04:52:32.990186Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.frame.DataFrame"

]

},

"execution_count": 89,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data[['Lng']])"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"## 数据的筛选"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"数据的筛选:\n",

"\n",

" 在筛选数据的时候,我们一般用data[条件]的格式\n",

" 其中的条件,是对data每一行数据的true和false布尔变量的Series"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

" 例如,我们想得到车牌照为22271的所有数据\n",

" 首先我们要获得一个布尔变量的Series,这个Series对应的是data的每一行,如果车牌照为\"粤B4H2K8\"则为true,不是则为false\n",

" 这样子的Series很容易获得,只需要\n",

" data['VehicleNum']==22271"

]

},

{

"cell_type": "code",

"execution_count": 90,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:44.078571Z",

"start_time": "2020-01-18T04:52:44.049646Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"0 True\n",

"1 True\n",

"2 True\n",

"3 True\n",

"4 True\n",

"Name: VehicleNum, dtype: bool"

]

},

"execution_count": 90,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"(data['VehicleNum']==22271).head(5)"

]

},

{

"cell_type": "code",

"execution_count": 92,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:51.723416Z",

"start_time": "2020-01-18T04:52:51.688510Z"

}

},

"outputs": [

{

"data": {

"text/html": [

"

\n",

"

" .dataframe tbody tr th:only-of-type {\n",

" vertical-align: middle;\n",

" }\n",

"\n",

" .dataframe tbody tr th {\n",

" vertical-align: top;\n",

" }\n",

"\n",

" .dataframe thead th {\n",

" text-align: right;\n",

" }\n",

"\n",

"

" \n",

"

\n",

"

\n",

"

VehicleNum\n",

"

Stime\n",

"

Lng\n",

"

Lat\n",

"

OpenStatus\n",

"

Speed\n",

"

\n",

"

\n",

"

\n",

"

\n",

"

0\n",

"

22271\n",

"

22:54:04\n",

"

114.167000\n",

"

22.718399\n",

"

0\n",

"

0\n",

"

\n",

"

\n",

"

1\n",

"

22271\n",

"

18:26:26\n",

"

114.190598\n",

"

22.647800\n",

"

0\n",

"

4\n",

"

\n",

"

\n",

"

2\n",

"

22271\n",

"

18:35:18\n",

"

114.201401\n",

"

22.649700\n",

"

0\n",

"

0\n",

"

\n",

"

\n",

"

3\n",

"

22271\n",

"

16:02:46\n",

"

114.233498\n",

"

22.725901\n",

"

0\n",

"

24\n",

"

\n",

"

\n",

"

4\n",

"

22271\n",

"

21:41:17\n",

"

114.233597\n",

"

22.720900\n",

"

0\n",

"

19\n",

"

\n",

"

\n",

"

\n",

"

"

],

"text/plain": [

" VehicleNum Stime Lng Lat OpenStatus Speed\n",

"0 22271 22:54:04 114.167000 22.718399 0 0\n",

"1 22271 18:26:26 114.190598 22.647800 0 4\n",

"2 22271 18:35:18 114.201401 22.649700 0 0\n",

"3 22271 16:02:46 114.233498 22.725901 0 24\n",

"4 22271 21:41:17 114.233597 22.720900 0 19"

]

},

"execution_count": 92,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"#得到车牌照为22271的所有数据\n",

"data[data['VehicleNum']==22271].head(5)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"sou

版权声明:本文为博主原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。
本文链接:https://blog.csdn.net/weixin_39765840/article/details/110111915

智能推荐

【Linux篇】认识冯诺依曼体系结构-程序员宅基地

文章浏览阅读1.2k次,点赞35次,收藏22次。还记得我们说内存是连续的,一般来说,加载某些数据的时候,其周围的数据也会被加载进来。因此,操作系统的作用就发挥了,操作系统会把可能用到的数据提前加载到内存中,这样CPU就直接可以去内存中找数据啦。小红的网卡作为输入设备,接收到消息,然后将消息加载到内存,CPU去内存读取数据,把消息解包,得到消息里的内容,重新写回内存。当小明给小红在输入框中输入消息之后,就已经被写入了内存,然后当小明点击发送的时候,CPU读取内存中的数据,对消息进行打包(如指定发送到哪里),然后输出到网卡。如果要运行必须要加载到内存,

配置GOPRIVATE引用私有仓库_go_private-程序员宅基地

文章浏览阅读9.2k次,点赞2次,收藏8次。在使用gomod模式管理golang包的时候,下载开源的公共包还可以,但是一旦使用内部或者私有的包,就可能会出现如下所示的问题:server response: not found: git.xxx.com/xxxxxx/[email protected]: unrecognized import path "git.xxx.com/xxxxxx/king": https fetch: Get "https://git.xxx.com/xxxxxx/king?go-get=1": dial tcp xx.xx.xx._go_private

腾讯云4核8G服务器性能如何?轻量12M够用吗?_4核8g12m 与 2核8g8m 差距-程序员宅基地

文章浏览阅读593次。腾讯云轻量4核8G12M服务器配置具有100%CPU性能,系统盘为180GB SSD盘,12M带宽下载速度1536KB/秒,月流量2000GB,折合每天66.6GB流量,超出月流量包的流量另外计费,地域节点可选广州/上海/北京,腾讯云百科分享腾讯云轻量4核8G12M应用服务器CPU性能、限制条件_4核8g12m 与 2核8g8m 差距

Socket编程------TCP文件传输(文档、声音、图片、视频和压缩包等)_tcp能实现各种格式的文件传输吗-程序员宅基地

文章浏览阅读8.3k次,点赞5次,收藏38次。本程序是基于TCP稳定传输的文件传输,可以兼容任何类型任何格式的文件传输。☆基本思路(客户端)客户端需要明确服务器的ip地址以及端口,这样才可以去试着建立连接,如果连接失败,会出现异常。连接成功,说明客户端与服务端建立了通道,那么通过IO流就可以进行数据的传输,而Socket对象已经提供了输入流和输出流对象,通过getInputStream(), getOutputStream()_tcp能实现各种格式的文件传输吗

五年的Java程序员成长自述。-程序员宅基地

文章浏览阅读137次。恍然间,发现自己在这个行业里已经摸爬滚打了五年了,原以为自己就凭已有的项目经验和工作经历怎么着也应该算得上是一个业内比较资历的人士了,但是今年在换工作的过程中却遭到了重大的挫折。详细过程我就不再叙述,在此,只想给大家说一说被拒绝的原因,看看大家有没有相似的经历,和类似的感悟。面试官对我的答复大致是这样的,我们不需要熟练工,我们需要在某领域拥有超过常人的积累认知,和拥有整套完整思维模式和优秀认知事物..._java程序自述报告

蓝牙mesh协议的架构讲解_mesh 协议详解-程序员宅基地

文章浏览阅读4.8k次,点赞3次,收藏10次。BLE Mesh 的基础架构  BLE Mesh的架构一共可以分成8层,如图所示1.蓝牙低功耗(Bluetooth Low Energy Core Specification)  最底下的 蓝牙低功耗 这一层,我将它标成了浅蓝色与上面几层进行了区分,原因是 蓝牙低功耗 并非仅是mesh架构的其中一层,而是完整的蓝牙低功耗协议栈,是提供基础无线通信功能所必需的组件,这些功能可为位于其上的mes..._mesh 协议详解

随便推点

如何理解时钟(STM32)_stm32为什么时钟很重要-程序员宅基地

文章浏览阅读1.8k次,点赞45次,收藏43次。时钟基本概念要是想学习时钟,我们首先需要了解,什么是时钟1) 时钟是嵌入式系统的脉搏,处理器内核在时钟驱动下完成指令执行,状态变换等动作,外设部件在时钟的驱动下完成各种工作,例如:串口数据的发送、AD转换、定时器计数等因此时钟对于计算机系统是至关重要的,通常时钟系统出现问题也是致命的,比如振荡器不起振、振荡不稳、停振等。时钟信号推动单片机内各个部分执行相应的指令,时钟就像人的心跳一样。2)时钟系统的组成:时钟源(振荡源)、唤醒定时器、倍频器、分频器。_stm32为什么时钟很重要

全志of_get_named_gpio_flags参数不同-程序员宅基地

文章浏览阅读1.5k次。struct gpio_config config;10431044 rst_pin = of_get_named_gpio_flags(np, "reset-gpio", 0, (enum of_gpio_flags *)&config);1045 if (!gpio_is_valid(rst_pin)) {1046 printk("can rst_pin: %d is invalid\n", rst_pin); return -ENODEV;..._of_get_named_gpio_flags

git拉取代码报错:error:RPC failed; curl 56 OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 10054_git error: rpc failed; curl 56 openssl ssl_read: s-程序员宅基地

文章浏览阅读652次。百度了好多文章才找到的答案,分享一下并记录一下,避免自己忘记:终端输入:$ git config http.sslVerify “false”报错信息:fatal: not in a git directory执行下列命令解决:$ git config --global http.sslVerify “false”最后:再次克隆即可!..._git error: rpc failed; curl 56 openssl ssl_read: ssl_error_syscall, errno 0

【金融风险管理】python进行动态波动率的计算和时间序列的预测_python 已实现波动率计算-程序员宅基地

文章浏览阅读2k次,点赞9次,收藏26次。根据前一篇文章算计算出来的股票对数收益率,我们在这一篇文章在前文的基础上,分别用朴素法(平均法)简单移动平均法5日简单移动平均法10日移动平均法15日移动平均法来一次指数平滑法二次指数平滑法三次指数平滑法来预测。并且用RMSEADF检验对数据进行平稳性检验。简单平均法就是用过去所有的值的平均值来作为我们的预测值,也就是所谓的期望均值,简单易计算,但是由于方法简单,限制较少,对数据的敏感度较大,预测效果一般。_python 已实现波动率计算

获取OSS上视频的第一帧_获取oss的第一帧-程序员宅基地

文章浏览阅读8.7k次,点赞2次,收藏3次。视频地址+?x-oss-process=video/snapshot,t_10000,m_fast例:https://video.aliyunoss.com/1.mp4?x-oss-process=video/snapshot,t_10000,m_fast_获取oss的第一帧

Linux svn 导出diff修改文件命令_svn如何批量生成diff-程序员宅基地

文章浏览阅读1.1k次。svn st -q 查看在当前目录下的所有修改过的文件。svn diff ./ 查看文件修改内容svn diff > modify.dff ./ 导出当前目录下的所有修改文件diff文件, modify.diff : 自定义命名文件。就是修改的内容文件。svn diff a...._svn如何批量生成diff

推荐文章

热门文章

相关标签