Titanic Data Analysis
My goal was to get a better understanding of how to work with tabular data, so I challenged myself and started with the Titanic project. I think it was an excellent way to learn the basics of data analysis with Python.
The competition is on Kaggle at https://www.kaggle.com/c/titanic and I really recommend trying it yourself if you want to learn how to analyze data and build machine learning models.
I started by importing the packages:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Pandas is a great package for tabular data analysis. NumPy provides a high-performance multidimensional array object and tools for working with these arrays. Matplotlib helps you generate plots, histograms, power spectra, bar charts, and more with just a few lines of code. Seaborn is built on top of Matplotlib and can be used to create attractive and informative statistical graphics.
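To make the division of labour between the four packages concrete, here is a minimal sketch. The ages and classes are invented for illustration (they are not taken from the Titanic data), and the variable names are my own assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Invented values, for illustration only
ages = np.array([22.0, 38.0, 26.0, 35.0])
mean_age = ages.mean()  # NumPy: fast math on a plain array

# pandas: the same numbers as a labelled table
toy = pd.DataFrame({"Age": ages, "Pclass": [3, 1, 3, 1]})
mean_by_class = toy.groupby("Pclass")["Age"].mean()

# Seaborn draws a statistical plot onto a Matplotlib Axes
ax = sns.barplot(data=toy, x="Pclass", y="Age")
ax.set_title("Mean age per class (toy data)")  # Matplotlib API on the same Axes
plt.close(ax.figure)  # release the figure once done
```

In a notebook you would usually drop the `matplotlib.use("Agg")` line and let the plot render inline.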
After importing these packages, I loaded the data:
df = pd.read_csv("train.csv")
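If you do not have train.csv at hand yet, the following sketch shows what pd.read_csv does with this kind of file. It reads from an in-memory string instead of disk. The header matches the column names of Kaggle's train.csv, but the two rows are invented placeholders, not real passengers.

```python
import io
import pandas as pd

# Header as in Kaggle's train.csv; the two data rows are invented placeholders
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Doe, Mr. John",male,30,0,0,T1,8.05,,S\n'
    '2,1,1,"Roe, Mrs. Jane",female,,1,0,T2,71.0,C10,C\n'
)

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(sample_csv)

n_rows, n_cols = sample.shape    # (rows, columns)
columns = list(sample.columns)   # column names taken from the header line
```

Quoted fields such as the names keep their inner comma, and empty fields (the missing age and cabin) are read as NaN.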
Then I had a quick look at the data:
df.head()
# This prints the first 5 rows of the table
# If you want to print 10 rows of the table instead of 5, then use
df.head(10)
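Besides head(), a few other pandas calls are handy for a first look. This is a small sketch on an invented six-row table (not the Titanic data). The same calls should work on df.

```python
import pandas as pd

# Invented six-row table, for illustration only
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Age": [22.0, 38.0, None, 35.0, None, 54.0],
})

first_five = toy.head()     # default is n=5
first_two = toy.head(2)     # pass n to change the number of rows
last_two = toy.tail(2)      # same idea, from the end of the table

missing_per_column = toy.isna().sum()  # count of missing values per column
summary = toy.describe()               # count, mean, std, min, quartiles, max
```

describe() skips missing values, so its count row also tells you how many entries each numeric column actually has.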