
Processing taxi trajectory data with Python: 1 - Basic processing of taxi data, generating OD from GPS (pandas).ipynb...

Point-trace data processing in Python

{

"cells": [

{

"cell_type": "markdown",

"metadata": {},

"source": [

"In this tutorial, you will learn how to use Python's pandas package to clean taxi GPS data and identify trip origin-destination (OD) pairs\n",

"\n",

"The base data provided:\n",

"\n",

"1. Raw taxi GPS data (in the data-sample folder; a sample of 500 vehicles drawn from the original dataset)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"[Introduction to the pandas package](https://baike.baidu.com/item/pandas/17209606?fr=aladdin)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"# Reading the data"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"First, read the taxi data."

]

},

{

"cell_type": "code",

"execution_count": 2,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:51:53.552930Z",

"start_time": "2020-01-18T04:51:52.397018Z"

}

},

"outputs": [],

"source": [

"import pandas as pd\n",

"# Read the data (the file has no header row)\n",

"data = pd.read_csv(r'data-sample/TaxiData-Sample', header=None)\n",

"# Name the columns\n",

"data.columns = ['VehicleNum', 'Stime', 'Lng', 'Lat', 'OpenStatus', 'Speed']"

]

},
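A minimal sketch of the same load that runs without the data file: it feeds the five rows printed by `data.head(5)` in this notebook through `io.StringIO`, and passes `names=` to `read_csv` as an alternative to assigning `data.columns` afterwards. The inline `sample_csv` string and the `columns` variable are illustrative stand-ins, not part of the original notebook, and the sketch assumes the real file is comma-separated like the sample.

```python
import io

import pandas as pd

# Five rows copied from the data.head(5) output shown in this notebook;
# a stand-in for data-sample/TaxiData-Sample (assumed comma-separated).
sample_csv = """22271,22:54:04,114.167000,22.718399,0,0
22271,18:26:26,114.190598,22.647800,0,4
22271,18:35:18,114.201401,22.649700,0,0
22271,16:02:46,114.233498,22.725901,0,24
22271,21:41:17,114.233597,22.720900,0,19
"""

columns = ['VehicleNum', 'Stime', 'Lng', 'Lat', 'OpenStatus', 'Speed']

# header=None: the file has no header row; names=: assign column names at read time.
data = pd.read_csv(io.StringIO(sample_csv), header=None, names=columns)

print(data.shape)
print(data.dtypes)
```

Naming the columns at read time keeps the load in a single step; assigning `data.columns` afterwards, as the notebook does, gives the same column labels.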

{

"cell_type": "code",

"execution_count": 3,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:51:58.299239Z",

"start_time": "2020-01-18T04:51:58.271312Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"   VehicleNum     Stime         Lng        Lat  OpenStatus  Speed\n",

"0       22271  22:54:04  114.167000  22.718399           0      0\n",

"1       22271  18:26:26  114.190598  22.647800           0      4\n",

"2       22271  18:35:18  114.201401  22.649700           0      0\n",

"3       22271  16:02:46  114.233498  22.725901           0     24\n",

"4       22271  21:41:17  114.233597  22.720900           0     19"

]

},

"execution_count": 3,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"# Show the first 5 rows of the data\n",

"data.head(5)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"Data format:\n",

"\n",

">VehicleNum: license plate  \n",

"Stime: time  \n",

"Lng: longitude  \n",

"Lat: latitude  \n",

"OpenStatus: whether a passenger is on board (0 = no passenger, 1 = passenger)  \n",

"Speed: speed  "

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"# Basic data operations"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"## DataFrame and Series"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"DataFrame and Series\n",

"\n",

" > When we read a dataset, what we get back is a table in DataFrame format, and each column of a DataFrame is a Series  \n",

" In other words, a DataFrame is made up of multiple Series\n"

]

},

{

"cell_type": "code",

"execution_count": 87,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:25.713432Z",

"start_time": "2020-01-18T04:52:25.708450Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.frame.DataFrame"

]

},

"execution_count": 87,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data)"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"To take a single column of a DataFrame and get a Series back, use the following code\n",

"\n",

" > data[column_name]"

]

},

{

"cell_type": "code",

"execution_count": 88,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:32.097575Z",

"start_time": "2020-01-18T04:52:32.090592Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.series.Series"

]

},

"execution_count": 88,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data['Lng'])"

]

},

{

"cell_type": "markdown",

"metadata": {

"ExecuteTime": {

"end_time": "2019-09-06T09:22:43.642625Z",

"start_time": "2019-09-06T09:22:43.638487Z"

}

},

"source": [

"To take one or more columns of a DataFrame and get a DataFrame back, use the following code\n",

"\n",

"> data[[column_name, column_name]]"

]

},

{

"cell_type": "code",

"execution_count": 89,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:33.013124Z",

"start_time": "2020-01-18T04:52:32.990186Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"pandas.core.frame.DataFrame"

]

},

"execution_count": 89,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"type(data[['Lng']])"

]

},
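The difference between single and double brackets can be sketched on a tiny table. The two-row DataFrame below is an illustrative stand-in built from values shown in this notebook's output, and the variable names `lng_series`, `lng_frame`, and `coords` are hypothetical.

```python
import pandas as pd

# Small stand-in table; the values are taken from the head() output in this notebook.
data = pd.DataFrame({
    'VehicleNum': [22271, 22271],
    'Lng': [114.167000, 114.190598],
    'Lat': [22.718399, 22.647800],
})

lng_series = data['Lng']        # single brackets with one name -> Series
lng_frame = data[['Lng']]       # a list with one name -> one-column DataFrame
coords = data[['Lng', 'Lat']]   # a list with several names -> DataFrame

print(type(lng_series).__name__)
print(type(lng_frame).__name__)
print(coords.shape)
```

A Series is one-dimensional, so `lng_series.shape` has a single entry, while `lng_frame.shape` keeps both a row count and a column count even though there is only one column.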

{

"cell_type": "markdown",

"metadata": {},

"source": [

"## Filtering data"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

"Filtering data:\n",

"\n",

" When filtering data, we generally use the form data[condition]\n",

" where the condition is a Series of boolean True/False values, one for each row of data"

]

},

{

"cell_type": "markdown",

"metadata": {},

"source": [

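" For example, suppose we want all the rows whose license plate is 22271\n",

" First we need a boolean Series with one entry per row of data: True if the license plate is 22271, False otherwise\n",

" Such a Series is easy to obtain; all it takes is\n",

" data['VehicleNum']==22271"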

]

},

{

"cell_type": "code",

"execution_count": 90,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:44.078571Z",

"start_time": "2020-01-18T04:52:44.049646Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"0    True\n",

"1    True\n",

"2    True\n",

"3    True\n",

"4    True\n",

"Name: VehicleNum, dtype: bool"

]

},

"execution_count": 90,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"(data['VehicleNum']==22271).head(5)"

]

},

{

"cell_type": "code",

"execution_count": 92,

"metadata": {

"ExecuteTime": {

"end_time": "2020-01-18T04:52:51.723416Z",

"start_time": "2020-01-18T04:52:51.688510Z"

}

},

"outputs": [

{

"data": {

"text/plain": [

"   VehicleNum     Stime         Lng        Lat  OpenStatus  Speed\n",

"0       22271  22:54:04  114.167000  22.718399           0      0\n",

"1       22271  18:26:26  114.190598  22.647800           0      4\n",

"2       22271  18:35:18  114.201401  22.649700           0      0\n",

"3       22271  16:02:46  114.233498  22.725901           0     24\n",

"4       22271  21:41:17  114.233597  22.720900           0     19"

]

},

"execution_count": 92,

"metadata": {},

"output_type": "execute_result"

}

],

"source": [

"# Get all rows whose license plate is 22271\n",

"data[data['VehicleNum']==22271].head(5)"

]

},
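The `data[condition]` pattern can be sketched on a small stand-in table, including how two conditions might be combined. The vehicle ID 99999 and its row are made up for contrast, and the `Speed > 0` condition is an illustrative assumption rather than a step taken from the notebook.

```python
import pandas as pd

# Stand-in table: three rows from the notebook's output plus one made-up row.
data = pd.DataFrame({
    'VehicleNum': [22271, 22271, 22271, 99999],  # 99999 is a hypothetical ID
    'Stime': ['22:54:04', '18:26:26', '16:02:46', '08:00:00'],
    'OpenStatus': [0, 0, 0, 1],
    'Speed': [0, 4, 24, 30],
})

# Step 1: build a boolean Series with one True/False entry per row.
mask = data['VehicleNum'] == 22271

# Step 2: index with the mask to keep only the rows where it is True.
one_vehicle = data[mask]

# Conditions combine with & (and) or | (or); each comparison needs its own
# parentheses because & binds more tightly than == and >.
moving = data[(data['VehicleNum'] == 22271) & (data['Speed'] > 0)]

print(mask.tolist())
print(len(one_vehicle), len(moving))
```

Filtering returns a new DataFrame that keeps the original row labels, so `moving.index` here still refers to rows 1 and 2 of `data`.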

{

"cell_type": "markdown",

"metadata": {},

"sou
