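A recommendation: 《Python数据分析从小白到专家》 (roughly "Python Data Analysis: From Beginner to Expert"), published by 工信出版集团.
The content is highly practical and lines up well with what a university student already knows, so it is friendly to beginners. The difficulty ramps up in an orderly way, and the later chapters bring in the probability theory behind numerical analysis, which helps the reader get started quickly on regression problems and a first look at neural networks.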
numpy: numerical operations
pandas: tables and matrices
matplotlib: chart plotting
sklearn and statsmodels: regression analysis and statistical computation
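ndarray.ndim: returns an int, the number of dimensions
ndarray.size: the total number of elements
ndarray.dtype: the element data type
ndarray.shape: returns a tuple, (rows, columns) for a 2-D array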
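To make the four attributes concrete, here is a small sketch on a made-up 2 x 3 array (the values are only for illustration):

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns (illustrative values).
m = np.array([[1, 2, 3], [4, 5, 6]])

print(m.ndim)   # number of dimensions
print(m.shape)  # tuple of (rows, columns)
print(m.size)   # total number of elements
print(m.dtype)  # element type; the exact integer type depends on the platform
```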
reshape
1-D to 2-D
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
b = np.arange(1, 9).reshape(4, 2)  # 8 elements arranged as 4 rows x 2 columns
print(a)
print(b)
reshape(x,x,x)
To 3-D
b = np.arange(1, 9).reshape(2, 2, 2)
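reshape(x,-1)
Gives x rows and lets NumPy work out the number of columns
b = np.arange(1, 9).reshape(x, -1)  # x is a placeholder for the row count; it must divide 8 evenly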
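As a quick sketch of how -1 behaves, the snippet below fills in concrete row counts for x (2, 4 and 3 are my own picks, not from the book) and shows what happens when the element count does not divide evenly:

```python
import numpy as np

values = np.arange(1, 9)            # 8 elements: 1..8

two_rows = values.reshape(2, -1)    # NumPy infers 4 columns
four_rows = values.reshape(4, -1)   # NumPy infers 2 columns
print(two_rows.shape, four_rows.shape)

# 8 elements cannot be split into 3 equal rows, so reshape raises ValueError.
try:
    values.reshape(3, -1)
except ValueError as err:
    print("cannot reshape:", err)
```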
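linspace(start,stop,num)
Evenly spaced floating-point values; the third argument is the number of points, not a step size, and stop is included
import numpy as np

a = np.arange(0, 53, 3).reshape(3, -1)     # step of 3: 18 values, 0..51
b = np.linspace(0, 53, 3).reshape(3, -1)   # 3 points: 0, 26.5, 53
c = np.linspace(0, 53, 18).reshape(3, -1)  # 18 points from 0 to 53 inclusive
print(a)
print(b)
print(c)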
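The difference between the two functions is easiest to see on a tiny range. A minimal sketch, with the range 0 to 10 chosen by me for readability:

```python
import numpy as np

stepped = np.arange(0, 10, 5)    # third argument is a step; stop is excluded
counted = np.linspace(0, 10, 5)  # third argument is a count; stop is included

print(stepped)  # two integers: 0 and 5
print(counted)  # five floats from 0.0 to 10.0
```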
array(,dtype=float)
Convert to float or complex
a = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=complex)
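Some other functions
np.empty((x,y))
Uninitialized float array: the values are whatever was left in memory, not random numbers
np.zeros((x,y))
Array of zeros
np.ones((x,y))
Array of ones
np.pi
np.exp(1)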
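A short sketch of these constructors side by side, using a 2 x 3 shape that I picked arbitrarily. The point about np.empty is that its contents should be overwritten before being read:

```python
import numpy as np

zeros = np.zeros((2, 3))    # every element is 0.0
ones = np.ones((2, 3))      # every element is 1.0
scratch = np.empty((2, 3))  # contents undefined until assigned
scratch[:] = 7.0            # fill it before using it

print(zeros)
print(ones)
print(scratch)
print(np.pi, np.exp(1))     # pi and e as floats
```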
Matrix operations
The details are easy to look up online
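module 'numpy' has no attribute 'array' # Do not name your own .py file numpy. Otherwise the import conflicts: Python picks up the numpy file in the same folder first, so it cannot find the definition of array.
Numpy - module has no attribute 'arrange' # Change arrange to arange.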
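One way to check which of the two problems you have hit is to ask Python where the imported module lives and which names it has. This is only a diagnostic sketch:

```python
import numpy as np

# Path of the module that was actually imported. If it points into your own
# project folder instead of site-packages, a local numpy.py is shadowing the library.
print(np.__file__)

# The real library has array and arange, but no function spelled "arrange".
print(hasattr(np, "array"), hasattr(np, "arange"), hasattr(np, "arrange"))
```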
Regular expressions: the re module, for advanced text pattern matching
import re

# Named text rather than str, so the built-in str type is not shadowed.
text = """
Getting going with Fedora is easier than ever. All you need is a 2GB USB flash drive, and Fedora Media Writer.
Once Fedora Media Writer is installed, it will set up your flash drive to run a "Live" version of Fedora Workstation, meaning that you can boot it from your flash drive and try it out right away without making any permanent changes to your computer. Once you are hooked, installing it to your hard drive is a matter of clicking a few buttons*.
"""
result = re.search('you', text).group()  # first occurrence of 'you'
print(result)
There are a number of regular-expression patterns that are worth learning
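As a taste of what such patterns look like, here is a minimal sketch on one sentence from the Fedora text. The two patterns are my own examples, not ones taken from the book:

```python
import re

text = "All you need is a 2GB USB flash drive, and Fedora Media Writer."

# \d+ matches one or more digits; here it must be followed by the letters GB.
sizes = re.findall(r"\d+GB", text)

# A whole word made of one capital letter followed by lowercase letters.
capitalized = re.findall(r"\b[A-Z][a-z]+\b", text)

# search returns a match object for the first hit, or None if there is none.
first_fedora = re.search(r"Fedora", text)

print(sizes)
print(capitalized)
print(first_fedora.group())
```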
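CSV files separate fields with commas; some data files use whitespace instead
Converting other data files to CSV is convenient
because pandas works so well with CSV
import pandas as pd

file = pd.read_csv('a.txt')  # a.txt is expected to contain comma-separated text
file = file.head(10)         # keep only the first 10 rows
file.to_csv('a.csv')
print(file)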
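For whitespace-separated files, read_csv accepts a sep argument. The sketch below uses two small in-memory texts that I made up (in place of real files) to show that both layouts load into the same table:

```python
import io
import pandas as pd

# Made-up sample data; io.StringIO stands in for a file on disk.
comma_text = "name,score\nann,90\nbob,85\n"
space_text = "name score\nann 90\nbob 85\n"

comma_df = pd.read_csv(io.StringIO(comma_text))              # default separator is ","
space_df = pd.read_csv(io.StringIO(space_text), sep=r"\s+")  # any run of whitespace

print(comma_df.equals(space_df))

# With no path, to_csv returns the CSV text; index=False leaves out the row index.
csv_out = comma_df.to_csv(index=False)
print(csv_out)
```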
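Series and DataFrame
A Series is a pandas type built on top of a NumPy array. NumPy on its own offers little for labeled data analysis and statistics, which is what pandas adds, and pandas can also convert between Series and DataFrame.
import pandas as pd
import numpy as np

a = pd.Series([0, 1, 2, 34, np.nan, 6, 2, 3])  # np.nan marks a missing value
print(a)
You can also build one from key-value pairs, that is, a dict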
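A minimal sketch of the dict form, with made-up names and scores. The keys become the index labels:

```python
import pandas as pd

# Keys become index labels, values become the data (illustrative values).
scores = pd.Series({"ann": 90, "bob": 85, "cara": 77})

print(scores)
print(scores["bob"])   # look up by label
print(scores.mean())   # statistics work directly on the Series
```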
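A neater example
import pandas as pd
import numpy as np

# The third row is shorter than the others; pandas pads the missing cells with NaN.
a = pd.DataFrame([[0, 1, 2, 34, np.nan, 6, 2, 3],
                  [12, 1, 312, 3, 12, 312, 4, 1],
                  [12, 3, 12, 3, np.nan, 12312]])
for col in a.columns:
    print(a[col])  # each column is a Series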
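DataFrame to Series (column by column) and to a NumPy array
import pandas as pd
import numpy as np

a = pd.DataFrame([[0, 1, 2, 34, np.nan, 6, 2, 3],
                  [12, 1, 312, 3, 12, 312, 4, 1],
                  [12, 3, 12, 3, np.nan, 12312]])
for col in a.columns:
    print(a[col])   # selecting one column gives a Series
print('-' * 60)
b = a.to_numpy()    # the whole table as a 2-D NumPy array
print(b)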
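The conversions between Series, DataFrame and NumPy array can be sketched in a few lines. The column name "value" is just a name I chose for the example:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name="value")

frame = s.to_frame()       # Series -> one-column DataFrame
column = frame["value"]    # DataFrame column -> Series
array = frame.to_numpy()   # DataFrame -> 2-D NumPy array

print(type(frame).__name__, type(column).__name__, array.shape)
```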
DataFrame column selection
import pandas as pd
import numpy as np

a = pd.DataFrame([[0, 1, 2, 34, np.nan, 6, 2, 3],
                  [12, 1, 312, 3, 12, 312, 4, 1],
                  [12, 3, 12, 3, np.nan, 123, 1, 2]], columns=list('ABCDEFGH'))
print(a)
print('-' * 30)
print(a['B'])  # select column B by its label
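The loc method selects specific rows and columns
loc can also be used to reduce dimensionality (by selecting only some of the rows and columns)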
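A small sketch of loc on a cut-down table (three columns, my own choice to keep the output short). Note that a label slice with loc includes both endpoints:

```python
import pandas as pd

a = pd.DataFrame([[0, 1, 2], [12, 1, 312], [12, 3, 12]], columns=list("ABC"))

cell = a.loc[1, "C"]             # one row label and one column label -> a scalar
row = a.loc[0]                   # one row label -> a Series
block = a.loc[0:1, ["A", "C"]]   # rows 0 through 1 inclusive, columns A and C

print(cell)
print(row)
print(block)
```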
Boolean lookup
import pandas as pd
import numpy as np

a = pd.DataFrame([[0, 1, 2, 34, np.nan, 6, 2, 3],
                  [12, 1, 312, 3, 12, 312, 4, 1],
                  [12, 3, 12, 3, np.nan, 123, 1, 2]], columns=list('ABCDEFGH'))
print(a)
print('-' * 30)
print(a['A'])  # the column a boolean condition can be built on
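The listing above stops at selecting column A. A boolean lookup on that column could look like the sketch below, where the threshold 5 and the second condition on column E are my own choices:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame([[0, 1, 2, 34, np.nan, 6, 2, 3],
                  [12, 1, 312, 3, 12, 312, 4, 1],
                  [12, 3, 12, 3, np.nan, 123, 1, 2]], columns=list("ABCDEFGH"))

mask = a["A"] > 5   # a Series of True/False, one entry per row
big_a = a[mask]     # keep only the rows where the mask is True

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
both = a[(a["A"] > 5) & (a["E"].isna())]

print(big_a)
print(both)
```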
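Chinese text is not supported by default, so take note
import matplotlib as mpl

mpl.rcParams['font.sans-serif'] = ['SimHei']  # a font with Chinese glyphs; it must be installed
mpl.rcParams['axes.unicode_minus'] = False    # keep the minus sign rendering correctly with that font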
For RGB color palettes, add the seaborn library, which in turn needs the scipy library
Scatter-plot function
import matplotlib.pyplot as plt

def visualModel(x, y, ols, lad):
    fig = plt.figure(figsize=(12, 6), dpi=80)
    ax2 = fig.add_subplot(121)  # left panel: OLS
    ax3 = fig.add_subplot(122)  # right panel: LAD
    ax2.set_xlabel("$x$")
    ax2.set_xticks(range(0, 15000, 1500))
    ax2.set_ylabel("$y$")
    ax2.set_title('OLS')
    ax3.set_xlabel("$x$")
    ax3.set_xticks(range(0, 15000, 1500))
    ax3.set_ylabel("$y$")
    ax3.set_title('LAD')
    ax2.scatter(x, y, color="b", alpha=0.4, label='experimental data')
    ax2.plot(x, ols, label='OLS prediction')
    ax3.scatter(x, y, color="b", alpha=0.4, label='experimental data')
    ax3.plot(x, lad, label='LAD prediction')
    ax2.legend(shadow=True)  # one legend per panel
    ax3.legend(shadow=True)
    plt.show()
Drawing a scatter plot, mainly with the scatter function
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randn(1000)
y = np.random.randn(1000)
sizes = np.abs(np.random.randn(1000)) * 100  # marker sizes must not be negative
plt.scatter(x, y, marker='h', s=sizes, cmap='Blues', c=y, edgecolors='black')
plt.grid(True, linestyle='--')
plt.show()
Now for some probability theory
import numpy as np
import pandas as pd

def generate_data():
    np.random.seed(4889)
    x = np.array([10] + list(range(10, 29)))
    error = np.round(np.random.randn(20), 2)  # standard normal noise
    y = x + error
    # Append one extreme outlier at x = 29.
    x = np.append(x, 29)
    y = np.append(y, 29 * 10)
    return pd.DataFrame({"x": x, "y": y})

print(generate_data())
import matplotlib.pyplot as plt
from sklearn import linear_model
import numpy as np
import pandas as pd

def generate_data():
    np.random.seed(4889)
    x = np.array([10] + list(range(10, 29)))
    error = np.round(np.random.randn(20), 2)
    y = x + error
    # Append one extreme outlier at x = 29.
    x = np.append(x, 29)
    y = np.append(y, 29 * 10)
    return pd.DataFrame({"x": x, "y": y})

def train_OLS(x, y):
    model = linear_model.LinearRegression()
    model.fit(x, y)
    predicted = model.predict(x)  # fitted values on the training inputs
    return predicted

def visualize_model(x, y, ols):
    fig = plt.figure(figsize=(8, 8), dpi=80)
    ax = fig.add_subplot(111)
    ax.set_xlabel("$x$")
    ax.set_xticks(range(10, 31, 5))
    ax.set_ylabel("$y$")
    ax.scatter(x, y, color="b", alpha=0.4)
    ax.plot(x, ols, 'r--', label="OLS")
    ax.legend(shadow=True)
    plt.show()

if __name__ == "__main__":
    data = generate_data()
    features = ["x"]
    label = ["y"]
    ols = train_OLS(data[features], data[label])
    visualize_model(data[features], data[label], ols)
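The OLS model takes every value into account, including wildly off ones, so use it on data without large swings or when every data point has to count
The LAD (least absolute deviations) model suits data that is mostly stable with a few stray points: it is far less sensitive to outliers, so in effect they are discounted automatically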
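To see that contrast in numbers, here is a rough sketch that fits both lines to data with a single outlier, using only NumPy. It is not the book's code: fit_ols and fit_lad are hypothetical helper names, the data is generated with my own seed, and LAD is only approximated by iteratively reweighted least squares rather than solved exactly.

```python
import numpy as np

def fit_ols(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    design = np.column_stack([x, np.ones_like(x)])
    slope, intercept = np.linalg.lstsq(design, y, rcond=None)[0]
    return slope, intercept

def fit_lad(x, y, iterations=100, eps=1e-8):
    """Approximate LAD line via iteratively reweighted least squares."""
    design = np.column_stack([x, np.ones_like(x)])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(iterations):
        residuals = np.abs(y - design @ coef)
        # Weight 1/|r| turns the squared error into (approximately) an absolute error.
        weights = 1.0 / np.maximum(residuals, eps)
        root = np.sqrt(weights)
        coef = np.linalg.lstsq(design * root[:, None], y * root, rcond=None)[0]
    return coef[0], coef[1]

# Points scattered around y = x, plus one extreme outlier at the last x.
rng = np.random.default_rng(0)
x = np.arange(10, 30, dtype=float)
y = x + rng.normal(0, 1, size=x.size)
y[-1] = 290.0

ols_slope, ols_intercept = fit_ols(x, y)
lad_slope, lad_intercept = fit_lad(x, y)
print("OLS slope:", ols_slope)
print("LAD slope:", lad_slope)
```

The underlying trend has slope 1. The OLS slope gets dragged well above that by the single outlier, while the LAD slope stays close to 1.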