{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this tutorial, you will learn how to use Python's pandas package to clean taxi GPS data and identify trip origins and destinations (OD)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Introduction to the pandas package](https://baike.baidu.com/item/pandas/17209606?fr=aladdin)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reading the data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, read the taxi data."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-18T04:51:53.552930Z",
"start_time": "2020-01-18T04:51:52.397018Z"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"# Read the data; header=None because the file is read without a header row\n",
"data = pd.read_csv(r'data-sample/TaxiData-Sample', header=None)\n",
"# Assign column names\n",
"data.columns = ['VehicleNum', 'Stime', 'Lng', 'Lat', 'OpenStatus', 'Speed']"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-18T04:51:58.299239Z",
"start_time": "2020-01-18T04:51:58.271312Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
" VehicleNum Stime Lng Lat OpenStatus Speed\n",
"0 22271 22:54:04 114.167000 22.718399 0 0\n",
"1 22271 18:26:26 114.190598 22.647800 0 4\n",
"2 22271 18:35:18 114.201401 22.649700 0 0\n",
"3 22271 16:02:46 114.233498 22.725901 0 24\n",
"4 22271 21:41:17 114.233597 22.720900 0 19"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Show the first 5 rows of the data\n",
"data.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data format:\n",
"\n",
">VehicleNum: license plate number  \n",
"Stime: time  \n",
"Lng: longitude  \n",
"Lat: latitude  \n",
"OpenStatus: whether a passenger is on board (0 = no passenger, 1 = passenger)  \n",
"Speed: speed  "
]
},
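{
"cell_type": "markdown",
"metadata": {},
"source": [
"The column layout listed above can be sketched on a tiny hand-built table. This is only an illustration: the name `sample` is introduced here and is not part of the tutorial data, its three rows are copied from the preview shown earlier, and parsing `Stime` with the `%H:%M:%S` format is an assumption based on how the values look in that preview."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical mini-sample: the first three rows of the preview above\n",
"sample = pd.DataFrame(\n",
"    [[22271, '22:54:04', 114.167000, 22.718399, 0, 0],\n",
"     [22271, '18:26:26', 114.190598, 22.647800, 0, 4],\n",
"     [22271, '18:35:18', 114.201401, 22.649700, 0, 0]],\n",
"    columns=['VehicleNum', 'Stime', 'Lng', 'Lat', 'OpenStatus', 'Speed'])\n",
"\n",
"# Inspect the data type pandas inferred for each column\n",
"print(sample.dtypes)\n",
"\n",
"# Assumption: Stime is a time of day written as HH:MM:SS\n",
"parsed = pd.to_datetime(sample['Stime'], format='%H:%M:%S')\n",
"print(parsed.dt.hour.tolist())  # [22, 18, 18]"
]
},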
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Basic data operations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DataFrame and Series"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"DataFrame and Series\n",
"\n",
" > When we read a dataset, what we get is a table in DataFrame form, and each column of a DataFrame is a Series.  \n",
" In other words, a DataFrame is made up of multiple Series.\n"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-18T04:52:25.713432Z",
"start_time": "2020-01-18T04:52:25.708450Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To take a single column of a DataFrame as a Series, use:\n",
"\n",
" > data[column_name]"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-18T04:52:32.097575Z",
"start_time": "2020-01-18T04:52:32.090592Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(data['Lng'])"
]
},
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2019-09-06T09:22:43.642625Z",
"start_time": "2019-09-06T09:22:43.638487Z"
}
},
"source": [
"To take one or more columns of a DataFrame and get a DataFrame back, use:\n",
"\n",
"> data[[column_name, column_name]]"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-18T04:52:33.013124Z",
"start_time": "2020-01-18T04:52:32.990186Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(data[['Lng']])"
]
},
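{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between single and double brackets can be checked side by side. The cell below is a minimal sketch on a hand-built two-row table; the names `sample`, `single`, `frame_one` and `frame_two` are introduced here for illustration only, and the coordinate values are copied from the preview shown earlier."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical two-row table with values taken from the preview above\n",
"sample = pd.DataFrame({'Lng': [114.167000, 114.190598],\n",
"                       'Lat': [22.718399, 22.647800]})\n",
"\n",
"single = sample['Lng']              # single brackets: one column as a Series\n",
"frame_one = sample[['Lng']]         # double brackets: a one-column DataFrame\n",
"frame_two = sample[['Lng', 'Lat']]  # double brackets: several columns\n",
"\n",
"print(type(single).__name__, single.shape)        # Series (2,)\n",
"print(type(frame_one).__name__, frame_one.shape)  # DataFrame (2, 1)\n",
"print(type(frame_two).__name__, frame_two.shape)  # DataFrame (2, 2)"
]
},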
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filtering data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Filtering data:\n",
"\n",
" When filtering data, we generally use the form data[condition],\n",
" where the condition is a boolean Series holding True or False for each row of data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" For example, suppose we want all records for the vehicle with plate number 22271.\n",
" First we need a boolean Series aligned with the rows of data: True where the plate number is 22271, False otherwise.\n",
" Such a Series is easy to obtain:\n",
" data['VehicleNum']==22271"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-18T04:52:44.078571Z",
"start_time": "2020-01-18T04:52:44.049646Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 True\n",
"2 True\n",
"3 True\n",
"4 True\n",
"Name: VehicleNum, dtype: bool"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(data['VehicleNum']==22271).head(5)"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"ExecuteTime": {
"end_time": "2020-01-18T04:52:51.723416Z",
"start_time": "2020-01-18T04:52:51.688510Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
" VehicleNum Stime Lng Lat OpenStatus Speed\n",
"0 22271 22:54:04 114.167000 22.718399 0 0\n",
"1 22271 18:26:26 114.190598 22.647800 0 4\n",
"2 22271 18:35:18 114.201401 22.649700 0 0\n",
"3 22271 16:02:46 114.233498 22.725901 0 24\n",
"4 22271 21:41:17 114.233597 22.720900 0 19"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get all records whose plate number is 22271\n",
"data[data['VehicleNum']==22271].head(5)"
]
},
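{
"cell_type": "markdown",
"metadata": {},
"source": [
"Boolean Series can also be combined before they are used as a filter. The cell below is a minimal sketch, not part of the original tutorial: `sample` and `mask` are names introduced here, the rows are copied from the preview shown earlier, and the `Speed > 0` condition is an arbitrary example. Each condition needs its own parentheses, and `&`, `|` and `~` stand for element-wise and, or and not."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical mini-sample with rows taken from the preview above\n",
"sample = pd.DataFrame(\n",
"    [[22271, '22:54:04', 0, 0],\n",
"     [22271, '18:26:26', 0, 4],\n",
"     [22271, '16:02:46', 0, 24]],\n",
"    columns=['VehicleNum', 'Stime', 'OpenStatus', 'Speed'])\n",
"\n",
"# Combine two conditions element-wise; each one is wrapped in parentheses\n",
"mask = (sample['VehicleNum'] == 22271) & (sample['Speed'] > 0)\n",
"print(mask.tolist())       # [False, True, True]\n",
"\n",
"# Rows where the mask is True\n",
"print(sample[mask])\n",
"\n",
"# ~mask inverts the condition and keeps the remaining rows\n",
"print(len(sample[~mask]))  # 1"
]
},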
{
"cell_type": "markdown",
"metadata": {},
"sou