Python Data Analysis Tech Stack (《Python数据分析技术栈》), Chapter 07: Python Data Visualization, 03 The Seaborn Library
Seaborn is another Python data visualization library, built on top of Matplotlib. Seaborn changes Matplotlib's default properties, such as the color palettes, and automatically aggregates the data in columns (for instance, countplot counts the observations in each category before drawing the bars). These defaults reduce the amount of code needed to create various plots.
Seaborn also lets you customize these plots, but it offers fewer customization options than Matplotlib.
Seaborn can visualize data in more than two dimensions, for example by mapping additional variables to color (hue), rows, or columns. It works best with data in long (tidy) format, in contrast to Pandas plotting, which generally expects data in wide format.
Let us see how to plot graphs using Seaborn with the Titanic dataset.
We use the functions in Seaborn to create different plots for visualizing different variables in this dataset.
The Seaborn library must be imported before its functions can be used. It is conventionally imported under the alias sns, which is then used to call the plotting functions.
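import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
titanic = pd.read_csv('titanic.csv')  # assumes titanic.csv is in the working directory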
A box plot summarizes the distribution and skewness of a variable using summary statistics (the median and the quartiles) and indicates the presence of outliers (denoted by circles or dots), as shown in Figure 7-5. The boxplot function in Seaborn creates box plots; the column to be visualized is passed as an argument to this function.
sns.boxplot(x=titanic['Age'])
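To see what the box and its whiskers represent, the sketch below computes them with pandas on a handful of made-up ages (not the real Titanic values). It assumes the default whisker rule of Seaborn's boxplot (whis=1.5): the whiskers reach the most extreme points within 1.5 × IQR of the box, and anything beyond those fences is drawn as an individual outlier point.

```python
import pandas as pd

# Made-up ages standing in for titanic['Age']; 80 is meant to be an outlier.
ages = pd.Series([22, 38, 26, 35, 35, 27, 54, 2, 27, 14, 4, 58, 20, 39, 80])

q1 = ages.quantile(0.25)  # lower edge of the box
q3 = ages.quantile(0.75)  # upper edge of the box
iqr = q3 - q1             # interquartile range (height of the box)

# Points outside these fences are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]

print("median:", ages.median(), "Q1:", q1, "Q3:", q3)
print("outliers:", outliers.tolist())
```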
There are two ways to pass arguments to Seaborn plotting functions:
We can pass the column itself, referenced through the DataFrame (for example, titanic['Age']), and skip the data parameter.
sns.boxplot(x=titanic['Age'])
Or, we can pass the column names as strings and use the data parameter to specify the DataFrame.
sns.boxplot(x='Age',data=titanic)
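A minimal sketch of the two calling styles side by side, using a tiny made-up DataFrame in place of the Titanic file. The ax parameter (not discussed in the text, but accepted by Seaborn's axes-level functions) puts each plot on its own subplot; selecting the off-screen Agg backend is an assumption so the code runs without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the Titanic DataFrame (illustrative values only).
titanic = pd.DataFrame({"Age": [22, 38, 26, 35, 54, 2, 27, 14, 58, 20]})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
sns.boxplot(x=titanic["Age"], ax=ax1)       # style 1: pass the column itself
sns.boxplot(x="Age", data=titanic, ax=ax2)  # style 2: column name plus data=
ax1.set_title("x=titanic['Age']")
ax2.set_title("x='Age', data=titanic")
fig.tight_layout()
```

Both subplots show the same box; the only difference is how the column is handed to the function.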
The kernel density estimate is a plot for visualizing the probability distribution of a continuous variable, as shown in Figure 7-6. The kdeplot function in Seaborn is used for plotting a kernel density estimate.
sns.kdeplot(titanic['Age'])
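To show what kdeplot estimates, here is a from-scratch NumPy sketch: the density is the average of Gaussian bumps centred on each observation. The bandwidth uses Scott's rule of thumb, which we assume mirrors Seaborn's default bw_method="scott"; the function name gaussian_kde_1d and the ages are made up for illustration.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)[:, None]  # shape (n, 1)
    z = (grid[None, :] - samples) / bandwidth           # shape (n, m)
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=0)                         # shape (m,)

# Made-up ages standing in for titanic['Age'].
ages = np.array([22, 38, 26, 35, 54, 2, 27, 14, 58, 20], dtype=float)

# Scott's rule of thumb for one variable: std * n ** (-1/5).
bandwidth = ages.std(ddof=1) * len(ages) ** (-1 / 5)

grid = np.linspace(-40, 100, 2001)
density = gaussian_kde_1d(ages, grid, bandwidth)

# A density should integrate to about 1 (simple Riemann sum on the grid).
area = density.sum() * (grid[1] - grid[0])
print(f"bandwidth={bandwidth:.2f}, area={area:.4f}")
```

A larger bandwidth gives a smoother curve; a smaller one follows individual observations more closely.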
A violin plot combines the box plot with the kernel density plot: the width of the violin at each value reflects the frequency distribution, as shown in Figure 7-7. We use the violinplot function in Seaborn to generate violin plots.
sns.violinplot(x='Pclass',y='Age',data=titanic)
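As a variation on the call above, the hue and split parameters of violinplot draw one half-violin per category of a second variable, so two age distributions can be compared within each passenger class. This sketch uses randomly generated stand-in data rather than the Titanic file, and the Agg backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 150
# Randomly generated stand-in for the Titanic columns (not real data).
titanic = pd.DataFrame({
    "Pclass": rng.choice([1, 2, 3], n),
    "Sex": rng.choice(["male", "female"], n),
    "Age": rng.uniform(1, 80, n),
})

# Each violin is split in two: one half per value of "Sex".
ax = sns.violinplot(x="Pclass", y="Age", hue="Sex", split=True, data=titanic)
```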
Count plots are used to plot categorical variables, with the length of the bars representing the number of observations for each unique value of the variable. In Figure 7-8, the two bars are showing the number of passengers who did not survive (corresponding to a value of 0 for the “Survived” variable) and the number of passengers who survived (corresponding to a value of 1 for the “Survived” variable). The countplot function in Seaborn can be used for generating count plots.
sns.countplot(x='Survived', data=titanic)
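Because a count plot simply counts the rows in each category, its bar heights should match what pandas value_counts reports. A small check on a made-up “Survived” column (not the real data), reading the bar heights back from the Matplotlib patches that countplot draws:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up Survived values: 0 = did not survive, 1 = survived.
titanic = pd.DataFrame({"Survived": [0, 1, 1, 0, 0, 0, 1, 0, 1, 0]})

counts = titanic["Survived"].value_counts().sort_index()  # 0 -> 6, 1 -> 4
ax = sns.countplot(x="Survived", data=titanic)
bar_heights = [bar.get_height() for bar in ax.patches]     # one bar per category
print(counts.to_dict(), bar_heights)
```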
A heatmap is a graphical representation of a matrix of values; here it shows a correlation matrix, that is, the correlation between each pair of variables in a dataset, as shown in Figure 7-9. The intensity of the color represents the strength of the correlation, and the values give the correlation coefficient, which ranges from -1 to 1: values close to 1 or -1 indicate a strong positive or negative correlation, and values close to 0 indicate little linear correlation. Note that the values along the diagonal are all 1, since they represent the correlation of each variable with itself.
The heatmap function in Seaborn creates the heatmap. Setting annot=True displays the correlation values in the cells, and the cmap parameter changes the default color palette. The corr method returns a DataFrame containing the correlation between each pair of numeric variables; with recent versions of pandas, pass numeric_only=True so that text columns such as “Sex” are skipped instead of raising an error. The labels of the heatmap are taken from the index and column labels of this correlation DataFrame (titanic.corr() in this example).
sns.heatmap(titanic.corr(numeric_only=True), annot=True, cmap='YlGnBu')  # numeric_only skips text columns (needed in pandas 2.x)
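Since a correlation matrix is symmetric, a common variation hides the redundant upper triangle with the mask parameter of heatmap, and fixes the color scale to the full [-1, 1] range with vmin and vmax. A sketch on a small made-up DataFrame that, like the Titanic file, mixes numeric and text columns:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up mix of numeric and text columns (illustrative values only).
titanic = pd.DataFrame({
    "Age": [22, 38, 26, 35, 54, 2, 27, 14],
    "Fare": [7.25, 71.3, 7.9, 53.1, 51.9, 21.1, 11.1, 30.1],
    "Pclass": [3, 1, 3, 1, 1, 3, 3, 2],
    "Sex": ["male", "female", "female", "female", "male", "male", "female", "female"],
})

corr = titanic.corr(numeric_only=True)          # the text column "Sex" is skipped
mask = np.triu(np.ones_like(corr, dtype=bool))  # True = hide diagonal and upper triangle
ax = sns.heatmap(corr, mask=mask, annot=True, cmap="YlGnBu", vmin=-1, vmax=1)
```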
A facet grid shows the distribution of a single variable, or the relationship between variables, in a grid of subplots whose rows, columns, and colors (hue) are defined by other variables, as shown in Figure 7-10. In the first step, we create a grid object (the row, col, and hue parameters are optional); in the second step, we use this grid object to draw a plot of our choice by passing the plotting function and the names of the variables to be plotted to the map method. The FacetGrid class in Seaborn is used to create a facet grid.
g = sns.FacetGrid(titanic, col='Sex', row='Survived')  # step 1: initialize the grid
g.map(plt.hist, 'Age')  # step 2: plot a histogram of Age in each facet
A regression plot fits a linear regression model to two continuous variables and draws the regression line through their data points, as shown in Figure 7-11. The Seaborn function regplot creates this plot.
sns.regplot(x='Age',y='Fare',data=titanic)
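The straight line that regplot draws with its default order=1 is an ordinary least-squares fit. As a rough illustration of what that fit computes, np.polyfit finds the same kind of line for a few made-up (Age, Fare) points; the shaded band around regplot's line is a bootstrapped confidence interval (ci=95 by default) and is not reproduced here.

```python
import numpy as np

# Made-up points that lie close to Fare = 2 * Age + 5.
age = np.array([10, 20, 30, 40, 50], dtype=float)
fare = np.array([25.5, 44.0, 65.5, 84.0, 105.0])

# Least-squares straight line: fare ≈ slope * age + intercept.
slope, intercept = np.polyfit(age, fare, deg=1)
print(round(slope, 3), round(intercept, 3))
```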
An lmplot combines a regplot with a facet grid, as shown in Figure 7-12. Using the lmplot function, we can see how the relationship between two continuous variables changes across the values of other variables.
In the following example, we plot two numeric variables (“Age” and “Fare”) in a grid whose rows correspond to the values of “Survived” and whose columns correspond to the values of “Sex”.
sns.lmplot(x='Age',y='Fare',row='Survived',data=titanic,col='Sex')
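The main differences between regplot and lmplot are as follows. regplot is an axes-level function: it draws a single plot on one Matplotlib Axes and accepts x and y as column names, pandas Series, or arrays. lmplot is a figure-level function that combines regplot with a FacetGrid: it requires the data parameter, with x and y given as column names (strings), and supports the row, col, and hue parameters for splitting the plot into facets.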
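A minimal sketch contrasting the two functions on randomly generated stand-in data: regplot draws into an Axes that we create and returns that same Axes, whereas lmplot builds its own figure and returns a FacetGrid with one subplot per row/column combination.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
n = 120
# Randomly generated stand-in data (not the real Titanic values).
titanic = pd.DataFrame({
    "Age": rng.uniform(1, 80, n),
    "Sex": rng.choice(["male", "female"], n),
    "Survived": rng.choice([0, 1], n),
})
titanic["Fare"] = 0.5 * titanic["Age"] + rng.normal(0, 10, n)

# Axes-level: regplot accepts Series and draws into the Axes passed as ax.
fig, ax = plt.subplots()
returned_ax = sns.regplot(x=titanic["Age"], y=titanic["Fare"], ax=ax)

# Figure-level: lmplot needs data= plus column names, and creates its own grid.
g = sns.lmplot(x="Age", y="Fare", row="Survived", col="Sex", data=titanic)
```

Because regplot targets a single Axes, it is the one to use when combining a regression plot with other plots in a hand-built Matplotlib layout.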
A strip plot is similar to a scatter plot; the difference lies in the types of variables plotted. Whereas both variables in a scatter plot are continuous, a strip plot plots one categorical variable against one continuous variable, as shown in Figure 7-13. The Seaborn function stripplot generates a strip plot.
Consider the following example, where the “Age” variable is continuous, while the “Survived” variable is categorical.
sns.stripplot(x='Survived',y='Age',data=titanic)
A swarm plot is similar to a strip plot, except that the points in a swarm plot do not overlap as they can in a strip plot. Because the points are more spread out, we get a better idea of the distribution of the continuous variable, as shown in Figure 7-14. The Seaborn function swarmplot generates a swarm plot.
sns.swarmplot(x='Survived',y='Age',data=titanic)
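To compare the two layouts directly, this sketch draws a strip plot and a swarm plot of the same randomly generated stand-in data on adjacent subplots. The jitter value of 0.2 is an arbitrary choice for illustration; it controls how far stripplot randomly shifts points sideways, while swarmplot instead positions every point so that none overlap.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)
n = 60
# Randomly generated stand-in data (not the real Titanic values).
titanic = pd.DataFrame({
    "Survived": rng.choice([0, 1], n),
    "Age": rng.uniform(1, 80, n).round(),
})

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
# Strip plot: random horizontal jitter, so nearby points may still overlap.
sns.stripplot(x="Survived", y="Age", data=titanic, jitter=0.2, ax=ax1)
# Swarm plot: points are laid out side by side so that none overlap.
sns.swarmplot(x="Survived", y="Age", data=titanic, ax=ax2)
ax1.set_title("stripplot")
ax2.set_title("swarmplot")
```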
A catplot combines a strip plot with a facet grid. We can plot one continuous variable against various categorical variables by specifying the row, col, or hue parameters, as shown in Figure 7-15. Note that although the strip plot is the default plot generated by the catplot function, it can generate other plots too; the type of plot is selected with the kind parameter (for example, kind='box' or kind='violin').
sns.catplot(x='Survived',y='Age',col='Survived',row='Sex',data=titanic)
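Changing kind switches the plot drawn in every facet. A sketch that draws box plots of “Age” for each passenger class, with one column of facets per value of “Sex”, on randomly generated stand-in data:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
n = 150
# Randomly generated stand-in data (not the real Titanic values).
titanic = pd.DataFrame({
    "Pclass": rng.choice([1, 2, 3], n),
    "Sex": rng.choice(["male", "female"], n),
    "Age": rng.uniform(1, 80, n),
})

# kind="box" replaces the default strip plot with a box plot in each facet.
g = sns.catplot(x="Pclass", y="Age", col="Sex", kind="box", data=titanic)
```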
A pair plot shows the bivariate relationships between all possible pairs of variables in the dataset, as shown in Figure 7-16. The Seaborn function pairplot creates a pair plot. You do not have to supply any column names as arguments, since all the numeric variables in the dataset are plotted automatically; the only parameter you need to pass is the DataFrame. Along the diagonal, where a variable would be plotted against itself, the pair plot instead shows the distribution of that single variable (a histogram by default when hue is not set).
sns.pairplot(data=titanic)
The joint plot displays the relationship between two variables as well as the individual distribution of the variables, as shown in Figure 7-17. The jointplot function takes the names of the two variables to be plotted as arguments.
sns.jointplot(x='Fare',y='Age',data=titanic)
Copyright © 2003-2013 www.wpsshop.cn. All rights reserved.