
Taxi GPS Data Processing in Python (TransBigData)

Processing taxi GPS data with the TransBigData package

The sample dataset used in this example is available in the GitHub repository: https://github.com/ni1o1/transbigdata/tree/main/docs/source/gallery

This walkthrough shows how to use the functions in the TransBigData package to process taxi GPS data quickly.

First, import TransBigData and read the data:

    import transbigdata as tbd
    import pandas as pd
    import geopandas as gpd
    # Read the data (the CSV file has no header row)
    data = pd.read_csv('TaxiData-Sample.csv', header=None)
    data.columns = ['VehicleNum', 'Time', 'Lng', 'Lat', 'OpenStatus', 'Speed']
    data
        VehicleNum      Time         Lng        Lat  OpenStatus  Speed
0            34745  20:27:43  113.806847  22.623249           1     27
1            34745  20:24:07  113.809898  22.627399           0      0
2            34745  20:24:27  113.809898  22.627399           0      0
3            34745  20:22:07  113.811348  22.628067           0      0
4            34745  20:10:06  113.819885  22.647800           0     54
...            ...       ...         ...        ...         ...    ...
544994       28265  21:35:13  114.321503  22.709499           0     18
544995       28265  09:08:02  114.322701  22.681700           0      0
544996       28265  09:14:31  114.336700  22.690100           0      0
544997       28265  21:19:12  114.352600  22.728399           0      0
544998       28265  19:08:06  114.137703  22.621700           0      0

544999 rows × 6 columns

    # Read the study-area polygons
    import geopandas as gpd
    sz = gpd.read_file(r'sz/sz.shp')
    # Clear the coordinate reference system attached to the shapefile
    sz.crs = None
    sz.plot()

../_images/output_3_1.png

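Data preprocessing

TransBigData also bundles common preprocessing methods. tbd.clean_outofshape takes the data and the study-area geometry and removes records that fall outside the study area. tbd.clean_taxi_status removes records in which the passenger status changes only momentarily. Both methods need the relevant column names passed in through col:

    # Data preprocessing
    # Remove records outside the study area
    data = tbd.clean_outofshape(data, sz, col=['Lng', 'Lat'], accuracy=500)
    # Remove records in which the passenger status changes only momentarily
    data = tbd.clean_taxi_status(data, col=['VehicleNum', 'Time', 'OpenStatus'])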
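
To make "momentary status change" concrete, here is a minimal pure-Python sketch of one plausible rule: a record is dropped when the previous and next records of the same vehicle share a status that differs from its own. This is an assumption about the idea, not TransBigData's actual implementation; the function name drop_status_flickers and the sample records are made up for illustration.

```python
def drop_status_flickers(records):
    """records: list of (vehicle, time, status) tuples, status 1 = occupied."""
    # Sort by vehicle, then time (HH:MM:SS strings sort chronologically within a day)
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    kept = []
    for i, (veh, t, status) in enumerate(ordered):
        prev_rec = ordered[i - 1] if i > 0 else None
        next_rec = ordered[i + 1] if i + 1 < len(ordered) else None
        # A flicker: both neighbours belong to the same vehicle, agree with
        # each other, and disagree with the current record
        is_flicker = (
            prev_rec is not None and next_rec is not None
            and prev_rec[0] == veh == next_rec[0]
            and prev_rec[2] == next_rec[2] != status
        )
        if not is_flicker:
            kept.append((veh, t, status))
    return kept

# Made-up sample: the 20:10:26 record is a lone "occupied" blip
sample = [
    (34745, '20:10:06', 0),
    (34745, '20:10:26', 1),
    (34745, '20:10:46', 0),
    (34745, '20:22:07', 0),
    (34745, '20:27:43', 1),
]
cleaned = drop_status_flickers(sample)
print(cleaned)
```

The last record is kept even though its status differs from the one before it, because a change that persists (or has no following record to contradict it) may be a real pickup.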

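Data gridding

Representing the data distribution on a grid is the most basic form of expression. Once GPS data has been gridded, every data point carries the information of the grid cell it falls in, and the distribution expressed by the grid stays close to the real one. To divide a study area into grids with TransBigData, first determine the gridding parameters (which can be understood as defining a grid coordinate system). With these parameters, gridding becomes a quick operation:

    # Gridding
    # Define the bounds and obtain the gridding parameters
    bounds = [113.6, 22.4, 114.8, 22.9]
    params = tbd.grid_params(bounds, accuracy=500)
    params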

(113.6, 22.4, 0.004872390756896538, 0.004496605206422906)
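
The four values are the longitude and latitude of the grid origin followed by the cell width and height in degrees. The sketch below shows one way to arrive at the same numbers by hand. It is not the library's source: it assumes a spherical Earth with a radius of 6371004 m (chosen here because it reproduces the printed output) and converts the 500 m cell size to degrees at the mid-latitude of the bounds. The names grid_params_sketch and EARTH_RADIUS_M are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6371004  # assumed spherical Earth radius in metres

def grid_params_sketch(bounds, accuracy=500):
    """Return (lon_start, lat_start, delta_lon, delta_lat) for a cell size in metres."""
    lon1, lat1, lon2, lat2 = bounds
    lon_start, lat_start = min(lon1, lon2), min(lat1, lat2)
    circumference = 2 * math.pi * EARTH_RADIUS_M
    # A degree of longitude shrinks with cos(latitude); use the mid-latitude
    mid_lat_rad = math.radians((lat1 + lat2) / 2)
    delta_lon = accuracy * 360 / (circumference * math.cos(mid_lat_rad))
    delta_lat = accuracy * 360 / circumference
    return lon_start, lat_start, delta_lon, delta_lat

params_sketch = grid_params_sketch([113.6, 22.4, 114.8, 22.9], accuracy=500)
print(params_sketch)
```

Because longitude lines converge toward the poles, a 500 m cell spans more degrees of longitude than of latitude, which is why the third value is larger than the fourth.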

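With the gridding parameters in hand, map each GPS point to a grid cell. The two columns LONCOL and LATCOL together identify one cell:

    # Map the GPS points to grid cells
    data['LONCOL'], data['LATCOL'] = tbd.GPS_to_grids(data['Lng'], data['Lat'], params)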
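
The mapping from coordinates to cell indices is simple arithmetic. The following sketch assumes that cell (0, 0) is centred on the grid origin, so coordinates are shifted by half a cell before flooring. That assumption is not taken from the library's source, but it agrees with the LONCOL and LATCOL values listed for the trajectory points later in this article. The function name gps_to_grid_sketch is hypothetical.

```python
import math

def gps_to_grid_sketch(lon, lat, params):
    """Return the (LONCOL, LATCOL) cell indices of one point."""
    lon_start, lat_start, delta_lon, delta_lat = params
    # Shift by half a cell so that cell (0, 0) is centred on the origin
    loncol = math.floor((lon - (lon_start - delta_lon / 2)) / delta_lon)
    latcol = math.floor((lat - (lat_start - delta_lat / 2)) / delta_lat)
    return loncol, latcol

# Gridding parameters as printed above
params = (113.6, 22.4, 0.004872390756896538, 0.004496605206422906)
cell = gps_to_grid_sketch(114.013016, 22.664818, params)
print(cell)  # (85, 59)
```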

Count the number of records in each grid cell:

    # Aggregate the record count per grid cell
    datatest = data.groupby(['LONCOL', 'LATCOL'])['VehicleNum'].count().reset_index()

Generate the geometry of each grid cell and convert the result to a GeoDataFrame:

    # Generate the grid cell geometries
    datatest['geometry'] = tbd.gridid_to_polygon(datatest['LONCOL'], datatest['LATCOL'], params)
    # Convert to a GeoDataFrame
    import geopandas as gpd
    datatest = gpd.GeoDataFrame(datatest)

Plot the grid to check that the gridding worked:

    # Plot
    datatest.plot(column='VehicleNum')

../_images/output_17_1.png

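Trip OD extraction and aggregation

Use tbd.taxigps_to_od, passing in the corresponding column names, to extract the origin-destination (OD) pairs of trips:

    # Extract ODs from the GPS data
    oddata = tbd.taxigps_to_od(data, col=['VehicleNum', 'Time', 'Lng', 'Lat', 'OpenStatus'])
    oddata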
        VehicleNum     stime        slon       slat     etime        elon       elat    ID
427075       22396  00:19:41  114.013016  22.664818  00:23:01  114.021400  22.663918     0
131301       22396  00:41:51  114.021767  22.640200  00:43:44  114.026070  22.640266     1
417417       22396  00:45:44  114.028099  22.645082  00:47:44  114.030380  22.650017     2
376160       22396  01:08:26  114.034897  22.616301  01:16:34  114.035614  22.646717     3
21768        22396  01:26:06  114.046021  22.641251  01:34:48  114.066048  22.636183     4
...            ...       ...         ...        ...       ...         ...        ...   ...
57666        36805  22:37:42  114.113403  22.534767  22:48:01  114.114365  22.550632  5332
175519       36805  22:49:12  114.114365  22.550632  22:50:40  114.115501  22.557983  5333
212092       36805  22:52:07  114.115402  22.558083  23:03:27  114.118484  22.547867  5334
119041       36805  23:03:45  114.118484  22.547867  23:20:09  114.133286  22.617750  5335
224103       36805  23:36:19  114.112968  22.549601  23:43:12  114.089485  22.538918  5336

5337 rows × 8 columns
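
The idea behind OD extraction can be sketched in plain Python: within one vehicle's time-ordered records, a change of OpenStatus from 0 to 1 marks a pickup (origin) and a change from 1 to 0 marks a drop-off (destination). This is an illustrative assumption about the approach, not the code of tbd.taxigps_to_od; the function name extract_od_sketch is hypothetical, and the first sample record is made up to provide an idle point before the pickup.

```python
def extract_od_sketch(records):
    """records: (vehicle, time, lon, lat, status) tuples, status 1 = occupied.
    Returns (vehicle, stime, slon, slat, etime, elon, elat) tuples."""
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    trips, origin = [], None
    for prev, cur in zip(ordered, ordered[1:]):
        if prev[0] != cur[0]:
            origin = None  # never pair records from different vehicles
            continue
        if prev[4] == 0 and cur[4] == 1:
            origin = cur  # first occupied record: the trip starts here
        elif prev[4] == 1 and cur[4] == 0 and origin is not None:
            # first idle record after an occupied run: the trip ends here
            trips.append((cur[0], origin[1], origin[2], origin[3],
                          cur[1], cur[2], cur[3]))
            origin = None
    return trips

records = [
    (22396, '00:18:00', 114.012000, 22.664000, 0),  # made-up idle record
    (22396, '00:19:41', 114.013016, 22.664818, 1),
    (22396, '00:19:49', 114.014030, 22.665483, 1),
    (22396, '00:23:01', 114.021400, 22.663918, 0),
]
trips = extract_od_sketch(records)
print(trips)
```

With these records the sketch yields one trip from 00:19:41 to 00:23:01, the same start and end as the first row of the table above.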

Aggregate the extracted ODs by grid cell and generate a GeoDataFrame:

    # Grid the ODs and aggregate them
    od_gdf = tbd.odagg_grid(oddata, params)
    od_gdf.plot(column='count')

../_images/output_22_1.png
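
Conceptually, grid aggregation maps the origin and the destination of every trip to a cell and counts how many trips share the same pair of cells. The sketch below illustrates that with a Counter. It is a simplified assumption, not the implementation of tbd.odagg_grid (which also builds geometries); the names to_cell and aggregate_od_sketch are hypothetical, and the second trip is duplicated on purpose so that one pair has a count above 1.

```python
import math
from collections import Counter

# Gridding parameters as printed earlier in the article
PARAMS = (113.6, 22.4, 0.004872390756896538, 0.004496605206422906)

def to_cell(lon, lat, params=PARAMS):
    """Map a point to (LONCOL, LATCOL), assuming cell (0, 0) is centred on the origin."""
    lon_start, lat_start, delta_lon, delta_lat = params
    return (math.floor((lon - lon_start) / delta_lon + 0.5),
            math.floor((lat - lat_start) / delta_lat + 0.5))

def aggregate_od_sketch(trips):
    """trips: (slon, slat, elon, elat) tuples. Returns a Counter keyed by cell pairs."""
    return Counter((to_cell(slon, slat), to_cell(elon, elat))
                   for slon, slat, elon, elat in trips)

trips = [
    (114.013016, 22.664818, 114.021400, 22.663918),  # first row of the OD table
    (114.013016, 22.664818, 114.021400, 22.663918),  # duplicated for the demo
    (114.021767, 22.640200, 114.026070, 22.640266),  # second row of the OD table
]
counts = aggregate_od_sketch(trips)
for (origin_cell, dest_cell), n in counts.items():
    print(origin_cell, '->', dest_cell, n)
```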

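Aggregating trip ODs to zones

TransBigData also provides a method for aggregating ODs directly to zones:

    # Aggregate ODs to zones (without gridding parameters, points are matched directly by longitude and latitude)
    od_gdf = tbd.odagg_shape(oddata, sz, round_accuracy=6)
    od_gdf.plot(column='count')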

../_images/output_25_1.png

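    # Aggregate ODs to zones (with gridding parameters, points are gridded first and then matched,
    # which speeds up matching and is recommended for large datasets)
    od_gdf = tbd.odagg_shape(oddata, sz, params=params)
    od_gdf.plot(column='count')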

../_images/output_26_1.png

Map plotting with matplotlib

tbd provides functions for loading a basemap and for drawing a scale bar and compass. Use plot_map to add the basemap and plotscale to add the scale bar and compass:

    # Create the figure
    import matplotlib.pyplot as plt
    fig = plt.figure(1, (8, 8), dpi=80)
    ax = plt.subplot(111)
    plt.sca(ax)
    # Add the basemap
    tbd.plot_map(plt, bounds, zoom=12, style=4)
    # Create the axes for the colorbar and title them
    cax = plt.axes([0.05, 0.33, 0.02, 0.3])
    plt.title('count')
    plt.sca(ax)
    # Plot the ODs
    od_gdf.plot(ax=ax, vmax=100, column='count', cax=cax, legend=True)
    # Plot the zone boundaries
    sz.plot(ax=ax, edgecolor=(0, 0, 0, 1), facecolor=(0, 0, 0, 0.2), linewidths=0.5)
    # Add the scale bar and compass
    tbd.plotscale(ax, bounds=bounds, textsize=10, compasssize=1, accuracy=2000, rect=[0.06, 0.03], zorder=10)
    plt.axis('off')
    plt.xlim(bounds[0], bounds[2])
    plt.ylim(bounds[1], bounds[3])
    plt.show()

../_images/output_29_0.png

Extracting taxi trajectories

Use tbd.taxigps_traj_point, passing in the GPS data and the OD data, to extract the trajectory points:

    data_deliver, data_idle = tbd.taxigps_traj_point(data, oddata, col=['VehicleNum', 'Time', 'Lng', 'Lat', 'OpenStatus'])
    data_deliver
        VehicleNum      Time         Lng        Lat  OpenStatus  Speed  LONCOL  LATCOL      ID  flag
427075       22396  00:19:41  114.013016  22.664818           1   63.0    85.0    59.0     0.0   1.0
427085       22396  00:19:49  114.014030  22.665483           1   55.0    85.0    59.0     0.0   1.0
416622       22396  00:21:01  114.018898  22.662500           1    1.0    86.0    58.0     0.0   1.0
427480       22396  00:21:41  114.019348  22.662300           1    7.0    86.0    58.0     0.0   1.0
416623       22396  00:22:21  114.020615  22.663366           1    0.0    86.0    59.0     0.0   1.0
...            ...       ...         ...        ...         ...    ...     ...     ...     ...   ...
170960       36805  23:42:31  114.092766  22.538317           1   66.0   101.0    31.0  5336.0   1.0
170958       36805  23:42:37  114.091721  22.538349           1   65.0   101.0    31.0  5336.0   1.0
170974       36805  23:42:43  114.090752  22.538300           1   60.0   101.0    31.0  5336.0   1.0
170973       36805  23:42:49  114.089813  22.538099           1   62.0   101.0    31.0  5336.0   1.0
253064       36805  23:42:55  114.089500  22.538067           1   51.0   100.0    31.0  5336.0   1.0

190492 rows × 10 columns

    data_idle

        VehicleNum      Time         Lng        Lat  OpenStatus  Speed  LONCOL  LATCOL      ID  flag
416628       22396  00:23:01  114.021400  22.663918           0   25.0    86.0    59.0     0.0   0.0
401744       22396  00:25:01  114.027115  22.662100           0   25.0    88.0    58.0     0.0   0.0
394630       22396  00:25:41  114.024551  22.659834           0   21.0    87.0    58.0     0.0   0.0
394671       22396  00:26:21  114.022797  22.658367           0    0.0    87.0    57.0     0.0   0.0
394672       22396  00:26:29  114.022797  22.658367           0    0.0    87.0    57.0     0.0   0.0
...            ...       ...         ...        ...         ...    ...     ...     ...     ...   ...
64411        36805  23:53:09  114.120354  22.544300           1    2.0   107.0    32.0  5336.0   0.0
64405        36805  23:53:15  114.120354  22.544300           1    1.0   107.0    32.0  5336.0   0.0
64390        36805  23:53:21  114.120354  22.544300           1    0.0   107.0    32.0  5336.0   0.0
64406        36805  23:53:27  114.120354  22.544300           1    0.0   107.0    32.0  5336.0   0.0
64393        36805  23:53:33  114.120354  22.544300           1    0.0   107.0    32.0  5336.0   0.0

312779 rows × 10 columns

Generate the occupied and idle trajectories from the trajectory points:

    traj_deliver = tbd.points_to_traj(data_deliver)
    traj_deliver.plot()

../_images/output_36_1.png

    traj_idle = tbd.points_to_traj(data_idle)
    traj_idle.plot()

../_images/output_37_1.png
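
Turning points into trajectories amounts to grouping the points by vehicle and trip ID, ordering each group by time, and joining the coordinates into a line. The sketch below shows that grouping with plain Python data structures. It is an assumption about the general idea rather than the implementation of tbd.points_to_traj, and the function name points_to_lines_sketch is hypothetical. The sample points are the first rows of data_deliver shown earlier.

```python
from collections import defaultdict

def points_to_lines_sketch(points):
    """points: (vehicle, trip_id, time, lon, lat) tuples.
    Returns {(vehicle, trip_id): [(lon, lat), ...]} with vertices in time order."""
    grouped = defaultdict(list)
    for vehicle, trip_id, time, lon, lat in points:
        grouped[(vehicle, trip_id)].append((time, lon, lat))
    lines = {}
    for key, pts in grouped.items():
        pts.sort()  # HH:MM:SS strings sort chronologically within one day
        if len(pts) >= 2:  # a line needs at least two vertices
            lines[key] = [(lon, lat) for _, lon, lat in pts]
    return lines

points = [
    (22396, 0, '00:21:01', 114.018898, 22.662500),
    (22396, 0, '00:19:41', 114.013016, 22.664818),
    (22396, 0, '00:19:49', 114.014030, 22.665483),
    (36805, 5336, '23:42:55', 114.089500, 22.538067),  # single point: no line
]
lines = points_to_lines_sketch(points)
print(lines)
```

Each coordinate list could then be passed to a geometry library to build a line object for plotting.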

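Trajectory visualization

Building on the visualization plugin provided by kepler.gl, TransBigData also offers one-step data preparation and visualization.

To use this feature, first install the Python keplergl package:

    pip install keplergl

Visualize the trajectory data:

    tbd.visualization_trip(data_deliver)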

../_images/kepler-traj.png
