
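Computer Science Graduation Project: Hadoop+Spark+Hive Rental Recommendation System (Beike rental data analysis, rental crawler, rental visualization, rental big data, big-data graduation project, machine learning)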


Graduation Project Technical Direction Survey

Name: 李昌福 (Li Changfu)

Project topic: 房无忧 (Fangwuyou) house rental platform

Development language: Java

Front-end framework: Vue

Database: MySQL

Server-side framework: Spring Cloud

Other technologies: Hadoop, HDFS

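Significance of the topic

Drawing on four years of coursework, this project investigates the growing demand for housing and uses Java, Vue, Spring Cloud and related technologies to build the XX house rental platform. The platform addresses the difficulty of finding a home and of letting one, and provides data-analysis results that help users plan listings and lease terms sensibly.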

Planned design

Business logic

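Module 1: Tenant module

   Feature 1: User registration and login

   Feature 2: View details of properties available for rent (time, location, size, etc.)

   Feature 3: Send a viewing request to the landlord

   Feature 4: Send the landlord a move-out request for a rented property

   Feature 5: View rental history, with create, read, update and delete (CRUD) operations on it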

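Module 2: Landlord module

   Feature 1: User registration and login

   Feature 2: Publish listing details (pictures, text, video, etc.)

   Feature 3: Review viewing requests (the requesting tenant's information, the time, and the requested listing)

   Feature 4: Manage viewing requests (accept or reject them) and move-out requests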
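Module 2's request handling (Features 3 and 4) amounts to a small lifecycle: a request starts out pending and is answered exactly once. A minimal sketch, assuming an in-memory list in place of the MySQL tables the platform would actually use; the names `ViewingRequest`, `decide` and `pending_for_listing` are hypothetical. The same pattern would presumably apply to move-out requests.

```python
from dataclasses import dataclass
from enum import Enum


class RequestStatus(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class ViewingRequest:
    # Hypothetical record holding what Feature 3 lists: tenant, time, listing.
    request_id: int
    tenant_name: str
    listing_id: int
    requested_time: str
    status: RequestStatus = RequestStatus.PENDING


def pending_for_listing(requests, listing_id):
    """Feature 3: open requests the landlord still has to answer for one listing."""
    return [r for r in requests
            if r.listing_id == listing_id and r.status is RequestStatus.PENDING]


def decide(request, accept):
    """Feature 4: accept or reject; an answered request cannot be changed again."""
    if request.status is not RequestStatus.PENDING:
        raise ValueError(f"request {request.request_id} is already {request.status.value}")
    request.status = RequestStatus.ACCEPTED if accept else RequestStatus.REJECTED
    return request


if __name__ == "__main__":
    inbox = [
        ViewingRequest(1, "tenant-a", 101, "2024-05-01 14:00"),
        ViewingRequest(2, "tenant-b", 101, "2024-05-02 10:00"),
    ]
    decide(inbox[0], accept=True)
    print([r.request_id for r in pending_for_listing(inbox, 101)])  # [2]
```

Refusing to change an answered request keeps a landlord from silently turning an accepted viewing into a rejection; in a database-backed version the same rule could be enforced with a conditional `UPDATE ... WHERE status = 'pending'`.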

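Module 3: Administrator module

   Feature 1: Administrator registration and login

   Feature 2: View and manage the permissions and information of tenants and landlords on the platform

   Feature 3: View the listings on the platform, with permission to create, read, update and delete them

   Feature 4: Publish platform announcements and receive confirmation that they have been acknowledged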

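Module 4: Fault-reporting module

   Feature 1: A tenant who discovers a fault submits a fault report

   Feature 2: The landlord views the unresolved faults reported by their tenants

   Feature 3: On receiving a fault report, the landlord starts handling it

   Feature 4: Publish the handling process and feedback on the result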
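The fault-reporting flow in Module 4 moves each report through three states: reported, in progress, resolved. The sketch below models that flow in memory under assumed names (`FaultReport`, `unresolved_for_landlord`, `start_handling` and `resolve` are all hypothetical), with a `history` list standing in for the published handling process and a `feedback` field for the result.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class FaultStatus(Enum):
    REPORTED = "reported"        # Feature 1: tenant has submitted the report
    IN_PROGRESS = "in_progress"  # Feature 3: landlord has started handling it
    RESOLVED = "resolved"        # Feature 4: result published with feedback


@dataclass
class FaultReport:
    fault_id: int
    tenant_id: int
    landlord_id: int
    description: str
    status: FaultStatus = FaultStatus.REPORTED
    history: List[str] = field(default_factory=list)  # published handling steps
    feedback: Optional[str] = None                     # final result for the tenant


def unresolved_for_landlord(reports, landlord_id):
    """Feature 2: faults of this landlord's tenants that are not resolved yet."""
    return [r for r in reports
            if r.landlord_id == landlord_id and r.status is not FaultStatus.RESOLVED]


def start_handling(report, note):
    """Feature 3: move a new report to in-progress and record the first step."""
    if report.status is not FaultStatus.REPORTED:
        raise ValueError("only newly reported faults can be started")
    report.status = FaultStatus.IN_PROGRESS
    report.history.append(note)


def resolve(report, feedback):
    """Feature 4: close an in-progress report and attach the result feedback."""
    if report.status is not FaultStatus.IN_PROGRESS:
        raise ValueError("a fault must be in progress before it can be resolved")
    report.status = FaultStatus.RESOLVED
    report.feedback = feedback


if __name__ == "__main__":
    reports = [
        FaultReport(1, tenant_id=7, landlord_id=3, description="water heater broken"),
        FaultReport(2, tenant_id=8, landlord_id=3, description="door lock jammed"),
    ]
    start_handling(reports[0], "repair booked")
    resolve(reports[0], "heater replaced")
    print([r.fault_id for r in unresolved_for_landlord(reports, 3)])  # [2]
```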

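Module 5: Data export and analysis module

   Feature 1: Export user data as data files for the MapReduce (MR) platform

   Feature 2: Let users manage data on the HDFS distributed file system

   Feature 3: Let users process data on a Hadoop cluster

   Feature 4: Process the data to find the peak number of viewing requests and how request volume varies over time

   Feature 5: Process and analyze tenants' ages to compare the rental preferences of different age groups

   Feature 6: Write the results to database files for use by the data-display platform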
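Features 4 and 5 are counting jobs that fit the MapReduce model this module targets: a mapper emits key/value pairs, the framework sorts them by key, and a reducer sums each group. With Hadoop Streaming, the mapper and reducer can be any executables that read lines from stdin and write tab-separated pairs to stdout. The sketch below runs the same three stages locally in one Python process so the logic can be checked before deployment; the comma-separated export format (`request_id,tenant_age,rent_way,request_time`), the hourly and ten-year buckets, and the sample rows are all assumptions.

```python
from itertools import groupby
from operator import itemgetter

# Assumed export format, one viewing request per line:
# request_id,tenant_age,rent_way,request_time


def map_requests_by_hour(line):
    """Mapper for Feature 4: emit (hour_of_day, 1) for each viewing request."""
    _, _, _, request_time = line.strip().split(",")
    hour = request_time.split(" ")[1][:2]  # "2024-05-01 14:20" -> "14"
    yield hour, 1


def map_preference_by_age(line):
    """Mapper for Feature 5: emit ((age_group, rent_way), 1)."""
    _, age, rent_way, _ = line.strip().split(",")
    decade = int(age) // 10 * 10
    yield (f"{decade}-{decade + 9}", rent_way), 1


def reduce_sum(pairs):
    """Reducer: pairs arrive sorted by key, as they do after Hadoop's shuffle."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_job(lines, mapper):
    """Local stand-in for map -> shuffle/sort -> reduce."""
    mapped = [pair for line in lines for pair in mapper(line)]
    mapped.sort(key=itemgetter(0))
    return dict(reduce_sum(mapped))


if __name__ == "__main__":
    sample = [  # illustrative rows, not real data
        "1,24,whole,2024-05-01 14:20",
        "2,31,shared,2024-05-01 14:45",
        "3,22,shared,2024-05-01 19:05",
        "4,27,shared,2024-05-02 14:10",
    ]
    per_hour = run_job(sample, map_requests_by_hour)
    peak_hour = max(per_hour, key=per_hour.get)
    print(per_hour, "peak:", peak_hour, per_hour[peak_hour])  # {'14': 3, '19': 1} peak: 14 3
    print(run_job(sample, map_preference_by_age))
```

Keeping the mapper a pure function of one line is what lets the same logic move into a Streaming script later: only the stdin/stdout plumbing changes, while the sort between the stages is done by the framework.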

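Technical or business-logic highlights

The core code, a Selenium crawler that scrapes rental listings and stores them in MySQL, is shared below: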

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from lxml import etree
import time
import pymysql
import re
import json

# Rental listings for the first-tier cities
# cities = ['bj', 'sh', 'gz', 'sz']
cities = ['sz']

# Connection settings; note that MySQL's default port is 3306, this setup uses 3396
DB_CONFIG = dict(host='localhost', user='root', password='123456', port=3396, db='model')
TABLE = 'house_info'
LIST_XPATH = '//ul[@class="house-list"]/li[@ep-log]'
NEXT_XPATH = '//*[@id="pager_wrap"]/div[@class="pager"]/a[@class="next"]'

options = Options()
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(r'chromedriver.exe'), options=options)


def get_url_info(url):
    driver.get(url)
    # driver.set_page_load_timeout(60)
    time.sleep(40)  # long wait, presumably for the page and any manual verification
    driver.refresh()
    driver.minimize_window()
    # The "租房" (rentals) link opens the listing pages in a new window
    zufang = driver.find_element(By.XPATH, '/html/body/div[3]/div[1]/div[1]/div/div[1]/div[1]/span[1]/a')
    zufang.click()
    driver.switch_to.window(driver.window_handles[-1])
    time.sleep(1)

    # The last page number sits right before the "next" button; fall back to 20 pages
    nums = driver.find_elements(By.XPATH, NEXT_XPATH + '/preceding-sibling::a[1]/span')
    end = int(nums[0].text) if nums and nums[0].text.strip().isdigit() else 20

    for page in range(end):
        # Re-parse on every page so each pass reads the newly loaded listings
        hs = etree.HTML(driver.page_source)
        ep_logs = hs.xpath(LIST_XPATH + '/@ep-log')
        imgs = hs.xpath(LIST_XPATH + '/div[@class="img-list"]/a/img/@src')
        urls = hs.xpath(LIST_XPATH + '/div[@class="des"]/h2/a/@href')
        decs = hs.xpath(LIST_XPATH + '/div[@class="des"]/h2/a/text()')
        prices = hs.xpath(LIST_XPATH + '/div[@class="list-li-right"]/div[@class="money"]/b/text()')
        danweis = hs.xpath(LIST_XPATH + '/div[@class="list-li-right"]/div[@class="money"]/b/following-sibling::text()')
        for i in range(len(ep_logs)):
            house_id = json.loads(ep_logs[i])['houseid']
            print(house_id)
            if not not_exists(houseid=house_id):
                continue
            dec = decs[i].split('|')
            data = {}
            data['id'] = house_id                               # house ID; not_exists() looks rows up by it
            data['pic'] = imgs[i]                               # listing image URL
            data['url'] = urls[i]                               # listing URL
            data['house_title'] = dec[1].strip()                # listing title
            data['rent_way'] = dec[0].strip()                   # rental mode
            data['house_pay'] = prices[i] + danweis[i].strip()  # price with its unit
            time.sleep(3)
            driver.get(data['url'])                             # open the detail page
            hs_inner = etree.HTML(driver.page_source)
            pay_way = hs_inner.xpath('//span[@class="instructions"]/text()')
            type_str = hs_inner.xpath('//ul[@class="f14"]/li[2]/span[2]/text()')
            floor_str = hs_inner.xpath('//ul[@class="f14"]/li[3]/span[2]/text()')
            estate = hs_inner.xpath('//ul[@class="f14"]/li[4]/span[2]/a/text()')
            areas = hs_inner.xpath('//ul[@class="f14"]/li[5]/span[2]/a/text()')
            addresses = hs_inner.xpath('//span[@class="dz"]/text()')
            times = hs_inner.xpath('//div[@class="house-title"]/p/text()')
            agents = hs_inner.xpath('//*[@id="vipAgent"]/div[1]/p[1]/a/text()')
            disposals = hs_inner.xpath('//ul[@class="house-disposal"]/li[not(@class="no-config")]/text()')
            spots = hs_inner.xpath('//ul[@class="introduce-item"]/li[1]/span[2]/em/text()')
            descs = hs_inner.xpath('//ul[@class="introduce-item"]//li[3]/span[2]/text()')
            if pay_way:
                data['house_pay_way'] = pay_way[0]
            if type_str:
                # Layout line: "type  area  decoration", separated by two non-breaking spaces
                types = type_str[0].split('\xa0\xa0')
                if len(types) in (1, 2, 3):
                    data['house_type'] = types[0]
                if len(types) == 3:
                    data['house_area'] = types[1].split(' ')[0] + '平'  # 平 = square metres
                    data['house_decora'] = types[2]
                elif len(types) == 2:
                    data['house_area'] = types[1]
            if floor_str:
                # Floor line: "orientation  floor/total"
                floors = floor_str[0].split('\xa0\xa0')
                if len(floors) in (1, 2):
                    data['toward'] = floors[0]
                if len(floors) == 2:
                    f = floors[1].split('/')
                    if len(f) == 2:
                        data['floor'], data['floor_height'] = f[0], f[1]
                    elif len(f) == 1:
                        digits = re.findall(r'\d{1,2}', f[0])
                        if digits:
                            data['floor'] = digits[0] + '层'  # 层 = floor
            if estate:
                data['house_estate'] = estate[0].strip()
            if areas:
                data['area'] = areas[0]
            if addresses:
                data['address'] = addresses[0].strip()
            if times:
                data['time'] = times[-1].strip().split('\xa0')[0]
                print('Time: ' + data['time'])
            if agents:
                data['agent_name'] = agents[0].strip()
            if disposals:
                data['house_disposal'] = ' '.join(disposals).strip()
            if spots:
                data['house_spot'] = ' '.join(spots)
            if descs:
                data['house_desc'] = descs[0]
            print(data)
            to_mysql(data)
            driver.back()
            time.sleep(1)
        # Find "next" only after returning from the detail pages: an element
        # located before driver.get()/driver.back() would be stale by now
        next_links = driver.find_elements(By.XPATH, NEXT_XPATH)
        if not next_links:
            break  # last page reached
        next_links[0].click()
        time.sleep(1)


def not_exists(houseid):
    """Return True if house_info has no row with this house ID."""
    db = pymysql.connect(**DB_CONFIG)
    try:
        with db.cursor() as cursor:
            # The ID is passed as a query parameter, not pasted into the SQL text
            cursor.execute('SELECT COUNT(1) FROM {} WHERE id = %s'.format(TABLE), (houseid,))
            count = cursor.fetchone()[0]
    finally:
        db.close()
    if count > 0:
        print('exists')
        return False
    return True


def to_mysql(data):
    """Insert one listing into house_info; column names come from the dict keys."""
    keys = ', '.join(data.keys())
    values = ', '.join(['%s'] * len(data))
    sql = 'INSERT INTO {table}({keys}) VALUES ({values})'.format(table=TABLE, keys=keys, values=values)
    db = pymysql.connect(**DB_CONFIG)
    try:
        with db.cursor() as cursor:
            if cursor.execute(sql, tuple(data.values())):
                print('Successful')
                db.commit()
    except pymysql.MySQLError as e:
        print('Failed:', e)
        db.rollback()
    finally:
        db.close()


if __name__ == '__main__':
    try:
        for city in cities:
            # The URL template is elided ('XXXXX'); it must contain a %s
            # placeholder that receives the city code
            url = 'XXXXX' % city
            get_url_info(url)
    finally:
        driver.quit()
```
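The detail-page parsing in the crawler is tangled with browser navigation, which makes it hard to test on its own. A minimal sketch that lifts the layout and floor rules into pure functions with the same branching; `parse_house_type` and `parse_floor` are hypothetical names, and the sample strings are invented to match the `\xa0\xa0`-separated shape the crawler splits on, not copied from the site.

```python
import re


def parse_house_type(type_str):
    """Split the layout line into type / area / decoration, as the crawler does."""
    parts = type_str.split("\xa0\xa0")
    data = {}
    if len(parts) in (1, 2, 3):
        data["house_type"] = parts[0]
    if len(parts) == 3:
        data["house_area"] = parts[1].split(" ")[0] + "平"  # 平 = square metres
        data["house_decora"] = parts[2]
    elif len(parts) == 2:
        data["house_area"] = parts[1]
    return data


def parse_floor(floor_str):
    """Split the floor line into orientation and floor / total height."""
    parts = floor_str.split("\xa0\xa0")
    data = {}
    if len(parts) in (1, 2):
        data["toward"] = parts[0]
    if len(parts) == 2:
        f = parts[1].split("/")
        if len(f) == 2:
            data["floor"], data["floor_height"] = f
        elif len(f) == 1:
            digits = re.findall(r"\d{1,2}", f[0])
            if digits:  # skip the field when no floor number is present
                data["floor"] = digits[0] + "层"  # 层 = floor
    return data


if __name__ == "__main__":
    # Invented sample strings in the shape the crawler expects
    print(parse_house_type("2室1厅1卫\xa0\xa089 平\xa0\xa0精装修"))
    print(parse_floor("朝南\xa0\xa0中层/共18层"))
```

With the rules isolated like this, `get_url_info` would only extract the raw strings and merge the returned dicts into `data`, and a layout change on the site shows up as a failing unit test rather than as silently missing columns.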

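Calling `not_exists()` and then `to_mysql()` opens two connections per listing and still lets two runs insert the same house if they interleave. An alternative, assuming `id` is declared as the primary key of `house_info`, is to let the database reject duplicates in a single statement. The sketch uses the standard-library `sqlite3` module as a stand-in so it runs without a MySQL server; with pymysql the equivalent statement would be `INSERT IGNORE INTO house_info (...) VALUES (%s, ...)`. The helper names `open_db` and `save_house` and the three-column table are hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    # The PRIMARY KEY on id is what makes a duplicate listing impossible.
    db.execute("CREATE TABLE IF NOT EXISTS house_info "
               "(id TEXT PRIMARY KEY, house_title TEXT, house_pay TEXT)")
    return db


def save_house(db, data):
    """Insert one listing unless its id exists; return True if a row was written."""
    # Column names come from the crawler's own dict keys, never from scraped
    # text; the scraped values travel separately as query parameters.
    columns = ", ".join(data)
    placeholders = ", ".join(["?"] * len(data))
    before = db.total_changes
    db.execute(f"INSERT OR IGNORE INTO house_info ({columns}) VALUES ({placeholders})",
               tuple(data.values()))
    db.commit()
    return db.total_changes > before


if __name__ == "__main__":
    db = open_db()
    listing = {"id": "h1", "house_title": "sample listing", "house_pay": "3000元/月"}
    print(save_house(db, listing), save_house(db, listing))  # True False
```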
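Disclaimer: This article was contributed by a site user and does not represent the position of the wpsshop blog (wpsshop博客). Copyright belongs to the original author, and this site assumes no legal liability. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/Cpp五条/article/detail/397374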