3. Grouping (groupby)

Pandas' grouping functionality splits the rows of a table into groups according to the contents of one column, then applies a statistical function to each group, such as sum, mean, median, or standard deviation.

For example, with the .groupby() method we can group the data table below by the 'Company' column, then call .mean() to compute each group's average:

In [1]:
import pandas as pd
import numpy as np
d={'Company':['Google','Google','Google','Oracle','Oracle','Oracle','Twitter','Twitter','Twitter'],
   'Person':['Peter','Sam','Ann','Cindy','Tom','Lee','Gim','Will','Bob'], 'Sale':[200,300,275,250,270,300,350,220,250],
   'Wage':[4000,3870,3900,3500,3600,3870,3650,3220,3660]}
df=pd.DataFrame(d)
df
Out[1]:
Company Person Sale Wage
0 Google Peter 200 4000
1 Google Sam 300 3870
2 Google Ann 275 3900
3 Oracle Cindy 250 3500
4 Oracle Tom 270 3600
5 Oracle Lee 300 3870
6 Twitter Gim 350 3650
7 Twitter Will 220 3220
8 Twitter Bob 250 3660

Grouping the companies ('Company') and taking the mean of each numeric column (e.g. mean of 'Sale'). Note that this groups rather than sorts by value: the string index is simply arranged alphabetically.

In [2]:
df.groupby('Company').mean() # in the result, 'Sale' and 'Wage' become per-company group means
Out[2]:
Sale Wage
Company
Google 258.333333 3923.333333
Oracle 273.333333 3656.666667
Twitter 273.333333 3510.000000
In [3]:
type(df.groupby('Company'))
Out[3]:
pandas.core.groupby.groupby.DataFrameGroupBy
In [4]:
df.groupby('Company')['Wage'].mean()
Out[4]:
Company
Google     3923.333333
Oracle     3656.666667
Twitter    3510.000000
Name: Wage, dtype: float64
In [5]:
df.groupby('Person').mean()
Out[5]:
Sale Wage
Person
Ann 275 3900
Bob 250 3660
Cindy 250 3500
Gim 350 3650
Lee 300 3870
Peter 200 4000
Sam 300 3870
Tom 270 3600
Will 220 3220
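
Beyond a single statistic, the standard GroupBy API also offers .agg(), which applies several aggregation functions in one call (this method is not shown in the cells above). A minimal sketch on a smaller version of the same data:

```python
import pandas as pd

# A smaller version of the company data used above
df = pd.DataFrame({'Company': ['Google', 'Google', 'Oracle', 'Oracle'],
                   'Sale': [200, 300, 250, 270]})

# .agg() applies several aggregation functions to each group at once
result = df.groupby('Company')['Sale'].agg(['mean', 'min', 'max'])
print(result)
```

Each requested function becomes a column of the result, indexed by group.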

4. Counting

The .count(axis, level, numeric_only) method counts the non-null values in a DataFrame.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.count.html

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

If 0 or ‘index’ counts are generated for each column. If 1 or ‘columns’ counts are generated for each row.

level : int or str, optional

If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a DataFrame. A str specifies the level name.

numeric_only : boolean, default False

Include only float, int or boolean data.
In [6]:
df.set_index(["Company", "Person"]) # build a MultiIndex
Out[6]:
Sale Wage
Company Person
Google Peter 200 4000
Sam 300 3870
Ann 275 3900
Oracle Cindy 250 3500
Tom 270 3600
Lee 300 3870
Twitter Gim 350 3650
Will 220 3220
Bob 250 3660
In [7]:
df.set_index(["Company", "Person"]).count(level="Company") # count within one level of the MultiIndex
Out[7]:
Sale Wage
Company
Google 3 3
Oracle 3 3
Twitter 3 3
In [8]:
df.set_index(["Person", "Company"])
Out[8]:
Sale Wage
Person Company
Peter Google 200 4000
Sam Google 300 3870
Ann Google 275 3900
Cindy Oracle 250 3500
Tom Oracle 270 3600
Lee Oracle 300 3870
Gim Twitter 350 3650
Will Twitter 220 3220
Bob Twitter 250 3660
In [9]:
df.set_index(["Person", "Company"]).count(level="Person")
Out[9]:
Sale Wage
Person
Ann 1 1
Bob 1 1
Cindy 1 1
Gim 1 1
Lee 1 1
Peter 1 1
Sam 1 1
Tom 1 1
Will 1 1
In [10]:
df
Out[10]:
Company Person Sale Wage
0 Google Peter 200 4000
1 Google Sam 300 3870
2 Google Ann 275 3900
3 Oracle Cindy 250 3500
4 Oracle Tom 270 3600
5 Oracle Lee 300 3870
6 Twitter Gim 350 3650
7 Twitter Will 220 3220
8 Twitter Bob 250 3660
In [11]:
df.count() # non-null values in each column
Out[11]:
Company    9
Person     9
Sale       9
Wage       9
dtype: int64
In [12]:
df.count(axis=1) # non-null values in each row
Out[12]:
0    4
1    4
2    4
3    4
4    4
5    4
6    4
7    4
8    4
dtype: int64
In [13]:
df['Company']
Out[13]:
0     Google
1     Google
2     Google
3     Oracle
4     Oracle
5     Oracle
6    Twitter
7    Twitter
8    Twitter
Name: Company, dtype: object
In [14]:
df['Company'].count()
Out[14]:
9
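
Since the sample table contains no missing values, every count above equals the full row count. A small sketch with missing entries shows that .count() really counts non-null values, not non-zero ones:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [0, np.nan, 3],      # contains a zero and a NaN
                   'B': ['x', 'y', None]})   # None also counts as null
counts = df.count()  # non-null values per column
print(counts)
```

The zero in 'A' is counted; only the NaN and None entries are skipped, so both columns count 2.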

5. Describing the Data

Pandas' .describe() method analyzes the data in a DataFrame and produces a whole set of descriptive statistics in one call, giving you a quick intuitive picture of the data.

From left to right, the statistics are: count, mean, standard deviation, minimum, the values at the 25%/50%/75% percentiles, and maximum.

In [15]:
df.groupby('Company').describe()
Out[15]:
Sale Wage
count mean std min 25% 50% 75% max count mean std min 25% 50% 75% max
Company
Google 3.0 258.333333 52.041650 200.0 237.5 275.0 287.5 300.0 3.0 3923.333333 68.068593 3870.0 3885.0 3900.0 3950.0 4000.0
Oracle 3.0 273.333333 25.166115 250.0 260.0 270.0 285.0 300.0 3.0 3656.666667 191.398363 3500.0 3550.0 3600.0 3735.0 3870.0
Twitter 3.0 273.333333 68.068593 220.0 235.0 250.0 300.0 350.0 3.0 3510.000000 251.197134 3220.0 3435.0 3650.0 3655.0 3660.0
In [16]:
df.groupby('Company')['Sale'].describe() # equivalent to df.groupby('Company').describe()['Sale']
Out[16]:
count mean std min 25% 50% 75% max
Company
Google 3.0 258.333333 52.041650 200.0 237.5 275.0 287.5 300.0
Oracle 3.0 273.333333 25.166115 250.0 260.0 270.0 285.0 300.0
Twitter 3.0 273.333333 68.068593 220.0 235.0 250.0 300.0 350.0
In [17]:
df.groupby('Company').describe()['Sale']
Out[17]:
count mean std min 25% 50% 75% max
Company
Google 3.0 258.333333 52.041650 200.0 237.5 275.0 287.5 300.0
Oracle 3.0 273.333333 25.166115 250.0 260.0 270.0 285.0 300.0
Twitter 3.0 273.333333 68.068593 220.0 235.0 250.0 300.0 350.0

If you don't like this layout, the .transpose() method gives you a vertical format:

In [18]:
df.groupby('Company').describe().transpose()
Out[18]:
Company Google Oracle Twitter
Sale count 3.000000 3.000000 3.000000
mean 258.333333 273.333333 273.333333
std 52.041650 25.166115 68.068593
min 200.000000 250.000000 220.000000
25% 237.500000 260.000000 235.000000
50% 275.000000 270.000000 250.000000
75% 287.500000 285.000000 300.000000
max 300.000000 300.000000 350.000000
Wage count 3.000000 3.000000 3.000000
mean 3923.333333 3656.666667 3510.000000
std 68.068593 191.398363 251.197134
min 3870.000000 3500.000000 3220.000000
25% 3885.000000 3550.000000 3435.000000
50% 3900.000000 3600.000000 3650.000000
75% 3950.000000 3735.000000 3655.000000
max 4000.000000 3870.000000 3660.000000
In [19]:
df.groupby('Company')['Sale'].describe().transpose()
Out[19]:
Company Google Oracle Twitter
count 3.000000 3.000000 3.000000
mean 258.333333 273.333333 273.333333
std 52.041650 25.166115 68.068593
min 200.000000 250.000000 220.000000
25% 237.500000 260.000000 235.000000
50% 275.000000 270.000000 250.000000
75% 287.500000 285.000000 300.000000
max 300.000000 300.000000 350.000000

And if you only want to see Google's data, you can do this:

In [20]:
df.groupby('Company').describe().transpose()['Google']
Out[20]:
Sale  count       3.000000
      mean      258.333333
      std        52.041650
      min       200.000000
      25%       237.500000
      50%       275.000000
      75%       287.500000
      max       300.000000
Wage  count       3.000000
      mean     3923.333333
      std        68.068593
      min      3870.000000
      25%      3885.000000
      50%      3900.000000
      75%      3950.000000
      max      4000.000000
Name: Google, dtype: float64
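
.describe() also accepts a percentiles parameter when the default 25%/50%/75% cut points are not what you need. A minimal sketch:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
# Request the 10% and 90% percentiles instead of the defaults
desc = s.describe(percentiles=[0.1, 0.9])
print(desc)
```

The median (50%) is always included alongside the requested percentiles.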

6. Concatenation (Concat)

Concatenation simply stacks several DataFrames into one larger DataFrame. When you concatenate, pay close attention to the direction in which your tables' indexes and columns extend, and stack along the matching axis.

For example, take these 3 DataFrames:

In [21]:
df1=pd.DataFrame({'A':['A0','A1','A2'],'B':['B0','B1','B2'],'C':['C1','C2','C3']},index=[0,1,2])
df2=pd.DataFrame({'A':['A3','A4','A5'],'B':['B3','B4','B5'],'C':['C3','C4','C5']},index=[3,4,5])
df3=pd.DataFrame({'A':['A6','A7','A8'],'B':['B6','B7','B8'],'C':['C6','C7','C8']},index=[6,7,8])
pd.concat([df1,df2,df3])
Out[21]:
A B C
0 A0 B0 C1
1 A1 B1 C2
2 A2 B2 C3
3 A3 B3 C3
4 A4 B4 C4
5 A5 B5 C5
6 A6 B6 C6
7 A7 B7 C7
8 A8 B8 C8
In [22]:
pd.concat([df1,df2,df3],axis=0)
Out[22]:
A B C
0 A0 B0 C1
1 A1 B1 C2
2 A2 B2 C3
3 A3 B3 C3
4 A4 B4 C4
5 A5 B5 C5
6 A6 B6 C6
7 A7 B7 C7
8 A8 B8 C8
In [23]:
pd.concat([df1,df2,df3],axis=1) 
# Note the large number of NaN values: among the 3 DataFrames being stacked, many index labels have no corresponding data.
# So when you use pd.concat(), make sure the axis you stack along (rows or columns) contains all the data you need.
Out[23]:
A B C A B C A B C
0 A0 B0 C1 NaN NaN NaN NaN NaN NaN
1 A1 B1 C2 NaN NaN NaN NaN NaN NaN
2 A2 B2 C3 NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN A3 B3 C3 NaN NaN NaN
4 NaN NaN NaN A4 B4 C4 NaN NaN NaN
5 NaN NaN NaN A5 B5 C5 NaN NaN NaN
6 NaN NaN NaN NaN NaN NaN A6 B6 C6
7 NaN NaN NaN NaN NaN NaN A7 B7 C7
8 NaN NaN NaN NaN NaN NaN A8 B8 C8

With the default RangeIndex, all three tables share the same index labels, so they can also be stacked horizontally:

In [24]:
df1=pd.DataFrame({'A':['A0','A1','A2'],'B':['B0','B1','B2'],'C':['C1','C2','C3']})
df2=pd.DataFrame({'A':['A3','A4','A5'],'B':['B3','B4','B5'],'C':['C3','C4','C5']})
df3=pd.DataFrame({'A':['A6','A7','A8'],'B':['B6','B7','B8'],'C':['C6','C7','C8']})
pd.concat([df1,df2,df3],axis=0)
Out[24]:
A B C
0 A0 B0 C1
1 A1 B1 C2
2 A2 B2 C3
0 A3 B3 C3
1 A4 B4 C4
2 A5 B5 C5
0 A6 B6 C6
1 A7 B7 C7
2 A8 B8 C8
In [25]:
pd.concat([df1,df2,df3],axis=1)
Out[25]:
A B C A B C A B C
0 A0 B0 C1 A3 B3 C3 A6 B6 C6
1 A1 B1 C2 A4 B4 C4 A7 B7 C7
2 A2 B2 C3 A5 B5 C5 A8 B8 C8
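
The repeated 0, 1, 2 index labels in the vertical stack above can be avoided with ignore_index=True, which gives the result a fresh RangeIndex. A minimal sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1']})
df2 = pd.DataFrame({'A': ['A2', 'A3']})

# ignore_index=True discards the original labels and renumbers the rows
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
```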

7. Merging (Merge)

The pd.merge() function combines several DataFrames; the merge works much like joining SQL tables.

The basic syntax of a merge is pd.merge(left_dataframe, right_dataframe, how='inner', on='Key'), where:

  • left_dataframe is the DataFrame placed on the left
  • right_dataframe is the DataFrame placed on the right
  • how='inner' controls what happens when the two DataFrames contain keys that don't overlap: 'inner' keeps only the intersection (the default), 'outer' keeps the union
  • on='Key' names the column holding the keys to merge on; the whole table is aligned on that column.
In [26]:
merge_df1=pd.DataFrame({'key':['k1','k2','k3'],'A':['A1','A2','A3'],'B':['B1','B2','B3']})
merge_df2=pd.DataFrame({'key':['k1','k2','k3'],'C':['C1','C2','C3'],'D':['D1','D2','D3']})
pd.merge(merge_df1,merge_df2,how='inner',on='key')
Out[26]:
key A B C D
0 k1 A1 B1 C1 D1
1 k2 A2 B2 C2 D2
2 k3 A3 B3 C3 D3

We can also pass several columns via the on parameter to merge on multiple keys at once:

In [27]:
merge_df1=pd.DataFrame({'key1':['k1','k2','k3'],'key2':['k4','k5','k6'],'A':['A1','A2','A3'],'B':['B1','B2','B3']})
merge_df2=pd.DataFrame({'key1':['k1','k2','k3'],'key2':['k4','k5','k6'],'C':['C1','C2','C3'],'D':['D1','D2','D3']})
pd.merge(merge_df1,merge_df2,how='inner',on=['key1','key2'])
Out[27]:
key1 key2 A B C D
0 k1 k4 A1 B1 C1 D1
1 k2 k5 A2 B2 C2 D2
2 k3 k6 A3 B3 C3 D3

Use how='outer' when the keys don't fully match:

In [28]:
merge_df1=pd.DataFrame({'key1':['k1','k3','k4'],'A':['A1','A2','A3'],'B':['B1','B2','B3']})
merge_df2=pd.DataFrame({'key1':['k1','k2','k3'],'C':['C1','C2','C3'],'D':['D1','D2','D3']})
pd.merge(merge_df1,merge_df2,how='outer',on='key1') 
Out[28]:
key1 A B C D
0 k1 A1 B1 C1 D1
1 k3 A2 B2 C3 D3
2 k4 A3 B3 NaN NaN
3 k2 NaN NaN C2 D2
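
Besides 'inner' and 'outer', how also accepts 'left' and 'right', which keep all the keys from one side only. A minimal sketch of a left merge:

```python
import pandas as pd

left = pd.DataFrame({'key': ['k1', 'k2'], 'A': ['A1', 'A2']})
right = pd.DataFrame({'key': ['k1', 'k3'], 'B': ['B1', 'B3']})

# how='left' keeps every key of the left table; unmatched rows get NaN
merged = pd.merge(left, right, how='left', on='key')
print(merged)
```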

8. Joining (Join)

If you want to join two tables that share few common columns, try the .join() method.

!! Unlike .merge(), a join uses the index as the common key rather than a column.

It also accepts a how parameter ('left', 'right', 'inner', 'outer'); the default is 'left', which keeps all the labels of the calling DataFrame's index (as the output below shows).

In [29]:
join_df1=pd.DataFrame({'A':['A1','A2','A3'],'B':['B1','B2','B3']})
join_df2=pd.DataFrame({'C':['C1','C2','C3'],'D':['D1','D2','D3']})
join_df1.join(join_df2)
Out[29]:
A B C D
0 A1 B1 C1 D1
1 A2 B2 C2 D2
2 A3 B3 C3 D3
In [30]:
join_df1=pd.DataFrame({'A':['A1','A2','A3'],'B':['B1','B2','B3']},index=['key1','key2','key3'])
join_df2=pd.DataFrame({'C':['C1','C2','C3'],'D':['D1','D2','D3']},index=['key1','key3','key4'])
join_df1.join(join_df2)
Out[30]:
A B C D
key1 A1 B1 C1 D1
key2 A2 B2 NaN NaN
key3 A3 B3 C2 D2
In [31]:
join_df1=pd.DataFrame({'A':['A1','A2','A3'],'B':['B1','B2','B3']},index=['key1','key2','key3'])
join_df2=pd.DataFrame({'C':['C1','C2','C3'],'D':['D1','D2','D3']},index=['key1','key3','key4'])
join_df1.join(join_df2,how='outer')
Out[31]:
A B C D
key1 A1 B1 C1 D1
key2 A2 B2 NaN NaN
key3 A3 B3 C2 D2
key4 NaN NaN C3 D3
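
If the two DataFrames share a column name, .join() raises an error unless you supply lsuffix/rsuffix to disambiguate. A minimal sketch:

```python
import pandas as pd

df_a = pd.DataFrame({'X': [1, 2]}, index=['k1', 'k2'])
df_b = pd.DataFrame({'X': [3, 4]}, index=['k1', 'k2'])

# Both tables have a column 'X'; the suffixes keep the names distinct
joined = df_a.join(df_b, lsuffix='_left', rsuffix='_right')
print(joined)
```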

9. Data Processing

9.1 Finding Unique Values

Unique values are the distinct values that appear in a DataFrame. Identifying them during data analysis helps you avoid sample bias. Pandas offers 3 main methods:

  • .unique() returns all the distinct values;
  • .nunique() returns the number of distinct values;
  • .value_counts() returns the distinct values together with their counts.
In [32]:
join_df1=pd.DataFrame({'A':['A1','A2','A3','A1'],'B':['B1','B2','B3','B4']})
join_df1.head()
Out[32]:
A B
0 A1 B1
1 A2 B2
2 A3 B3
3 A1 B4
In [33]:
join_df1['A'].unique()
Out[33]:
array(['A1', 'A2', 'A3'], dtype=object)
In [34]:
join_df1['B'].nunique()
Out[34]:
4
In [35]:
join_df1['A'].nunique()
Out[35]:
3
In [36]:
join_df1['A'].value_counts()
Out[36]:
A1    2
A2    1
A3    1
Name: A, dtype: int64
In [37]:
join_df1.value_counts() # this pandas version has no DataFrame.value_counts(); only Series does (it was added to DataFrame in pandas 1.1)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-37-f4aa70f5183d> in <module>
----> 1 join_df1.value_counts()

AttributeError: 'DataFrame' object has no attribute 'value_counts'
In [ ]:
type(join_df1['A'])
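
.value_counts() also takes a normalize=True argument that returns proportions instead of raw counts. A minimal sketch on the same kind of data:

```python
import pandas as pd

s = pd.Series(['A1', 'A2', 'A3', 'A1'])
freq = s.value_counts(normalize=True)  # fraction of rows per distinct value
print(freq)
```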

9.2 The apply() Method

With the .apply() method, you can run a custom function over the data in a DataFrame to process it.

For example, let's first define a square() function, then apply it to the 'Sale' column of the table:

In [38]:
# suppose we have a function to calculate the square of a given value

def square(x):
    return x*x


# we can call the self-defined function with .apply()

df_apply=pd.DataFrame({'Company':['Google','Google','Google','Oracle','Oracle','Oracle','Twitter','Twitter','Twitter'],
   'Person':['Peter','Sam','Ann','Cindy','Tom','Lee','Gim','Will','Bob'], 'Sale':[20,30,27.5,25,27,30,35,22,25]})
df_apply['Sale'].apply(square)
Out[38]:
0     400.00
1     900.00
2     756.25
3     625.00
4     729.00
5     900.00
6    1225.00
7     484.00
8     625.00
Name: Sale, dtype: float64
In [39]:
df_apply['Company'].apply(len) # character length of each entry in the 'Company' column
Out[39]:
0    6
1    6
2    6
3    6
4    6
5    6
6    7
7    7
8    7
Name: Company, dtype: int64

Sometimes a function you define is only ever used once. In that case, you can replace the function definition with a lambda expression and simplify the code. For example, this lambda replaces the square() definition above:

In [40]:
df_apply['Sale'].apply(lambda x: x*x)
Out[40]:
0     400.00
1     900.00
2     756.25
3     625.00
4     729.00
5     900.00
6    1225.00
7     484.00
8     625.00
Name: Sale, dtype: float64
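
.apply() also works on whole DataFrames: with axis=1 the function receives each row as a Series, so it can combine several columns. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Sale': [20, 30], 'Wage': [4000, 3870]})

# axis=1 passes one row at a time; here we compute Wage per unit of Sale
ratio = df.apply(lambda row: row['Wage'] / row['Sale'], axis=1)
print(ratio)
```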

9.3 DataFrame Attributes

A DataFrame's attributes include the names of its columns and index. If you're not sure whether a column name contains stray characters such as spaces, you can read .columns to see the exact names.

In [41]:
df_apply.index
Out[41]:
RangeIndex(start=0, stop=9, step=1)
In [42]:
df_apply.columns
Out[42]:
Index(['Company', 'Person', 'Sale'], dtype='object')
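
If .columns reveals stray whitespace in the names, the .str accessor on the Index can clean them up. A minimal sketch (the padded column name is a made-up example):

```python
import pandas as pd

df = pd.DataFrame({' Company ': ['Google'], 'Sale': [200]})
df.columns = df.columns.str.strip()  # remove leading/trailing spaces
print(list(df.columns))
```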

9.4 Sorting

To sort the whole table by the values in one column, use .sort_values('Column_Name'):

The table below is sorted by the 'Sale' column in ascending order. Note that the index still follows the rows from before the sort; sorting does not discard the original index labels.

In [43]:
df_apply.sort_values('Sale')
Out[43]:
Company Person Sale
0 Google Peter 20.0
7 Twitter Will 22.0
3 Oracle Cindy 25.0
8 Twitter Bob 25.0
4 Oracle Tom 27.0
2 Google Ann 27.5
1 Google Sam 30.0
5 Oracle Lee 30.0
6 Twitter Gim 35.0
In [44]:
df_apply#df_apply.set_index('Person')
Out[44]:
Company Person Sale
0 Google Peter 20.0
1 Google Sam 30.0
2 Google Ann 27.5
3 Oracle Cindy 25.0
4 Oracle Tom 27.0
5 Oracle Lee 30.0
6 Twitter Gim 35.0
7 Twitter Will 22.0
8 Twitter Bob 25.0
In [45]:
df_apply.sort_values('Sale')
Out[45]:
Company Person Sale
0 Google Peter 20.0
7 Twitter Will 22.0
3 Oracle Cindy 25.0
8 Twitter Bob 25.0
4 Oracle Tom 27.0
2 Google Ann 27.5
1 Google Sam 30.0
5 Oracle Lee 30.0
6 Twitter Gim 35.0
In [46]:
type(df_apply.sort_values('Sale'))
Out[46]:
pandas.core.frame.DataFrame
In [47]:
df_apply.sort_values('Sale').loc['Will']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-47-e99213bf1e72> in <module>
----> 1 df_apply.sort_values('Sale').loc['Will']

KeyError: 'the label [Will] is not in the [index]'
In [48]:
df_apply.sort_values('Sale').loc['Will','Company']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-48-373f7b5d5e4f> in <module>
----> 1 df_apply.sort_values('Sale').loc['Will','Company']

KeyError: 'the label [Will] is not in the [index]'
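
Both KeyErrors above have the same cause: 'Person' is an ordinary column, so 'Will' is not an index label. Setting 'Person' as the index first (as the commented-out .set_index('Person') in In [44] hints) makes the label lookup work. A minimal sketch on a two-row subset of the data:

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Twitter', 'Google'],
                   'Person': ['Will', 'Peter'],
                   'Sale': [22.0, 20.0]})

# Make 'Person' the index so its values become valid .loc labels
by_person = df.set_index('Person').sort_values('Sale')
row = by_person.loc['Will']
print(row)
```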

9.5 Finding Null Values

If you have a large dataset, Pandas' .isnull() method is a quick and convenient way to find the null values in a table.

It returns a new DataFrame of booleans (True/False) marking whether each position of the original DataFrame holds a null value.

In [49]:
df_apply.isnull()
Out[49]:
Company Person Sale
0 False False False
1 False False False
2 False False False
3 False False False
4 False False False
5 False False False
6 False False False
7 False False False
8 False False False
In [50]:
df_apply['Company'].isnull()
Out[50]:
0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
Name: Company, dtype: bool
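
Once .isnull() has located missing values, .fillna() and .dropna() are the usual follow-ups for handling them. A minimal sketch:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1.0, np.nan], 'B': ['x', 'y']})
filled = df.fillna(0)   # replace every NaN with a default value
dropped = df.dropna()   # drop every row that contains a NaN
print(filled)
print(dropped)
```

Both methods return a new DataFrame; the original is left unchanged.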

9.6 Pivot Tables

If you have used Excel, you may already know its pivot table feature. A pivot table is a summary table presenting aggregate statistics of the original data. Pandas' pivot tables can automatically group, slice, filter, sort, count, sum, or average your data, and present the result clearly.

For example, here is a small table of animal records:

In [51]:
data={'A':['Dog','Dog','Dog','Goat','Goat','Goat'],
      'B':['Brown','Brown','Black','Brown','Brown','Black'],
      'C':['x','y','x','y','x','y'],
      'D':[1,3,2,4,5,3]}
df_pivot=pd.DataFrame(data)
df_pivot
Out[51]:
A B C D
0 Dog Brown x 1
1 Dog Brown y 3
2 Dog Black x 2
3 Goat Brown y 4
4 Goat Brown x 5
5 Goat Black y 3

The pivot table syntax is .pivot_table(data, values='', index=[''], columns=['']), where:

  • values is the column whose data gets summarized,
  • index is the column (or columns) that becomes the index of the summary table,
  • columns is the column (or columns) that becomes the columns of the summary table.
In [52]:
pd.pivot_table(df_pivot,values='D',index=['A','B'],columns='C')
Out[52]:
C x y
A B
Dog Black 2.0 NaN
Brown 1.0 3.0
Goat Black NaN 3.0
Brown 5.0 4.0

Alternatively, you can call the method directly on the df object:

In [53]:
df_pivot.pivot_table(values='D',index=['A','B'],columns='C')
Out[53]:
C x y
A B
Dog Black 2.0 NaN
Brown 1.0 3.0
Goat Black NaN 3.0
Brown 5.0 4.0
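
By default pivot_table averages the values that fall into the same cell; the aggfunc parameter switches to another aggregation. A minimal sketch using a sum:

```python
import pandas as pd

df = pd.DataFrame({'A': ['Dog', 'Dog', 'Goat'],
                   'C': ['x', 'x', 'y'],
                   'D': [1, 3, 4]})

# Both 'Dog'/'x' rows land in one cell; aggfunc='sum' adds them up
pt = pd.pivot_table(df, values='D', index='A', columns='C', aggfunc='sum')
print(pt)
```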

9.7 Importing and Exporting Data

With methods of the form pd.read_*, Pandas can read data files in many different formats, including Excel spreadsheets, CSV files, SQL databases, and even HTML pages.

9.7.1 Writing a CSV File

In [54]:
df_pivot.to_csv('New_DataFrame.csv') # note what happens later when the saved index is read back in
In [55]:
df_pivot.to_csv('New_DataFrame_2.csv',index=False) # index=False drops the 0, 1, 2, ... index

9.7.2 Reading a CSV File

pd.read_csv() turns the data in a CSV file into a DataFrame object:

In [56]:
import os
os.chdir("/home/junyao/文档/pandas_practice")
In [57]:
new_df=pd.read_csv('New_DataFrame_2.csv')
new_df
Out[57]:
A B C D
0 Dog Brown x 1
1 Dog Brown y 3
2 Dog Black x 2
3 Goat Brown y 4
4 Goat Brown x 5
5 Goat Black y 3
In [58]:
new_df1=pd.read_csv('New_DataFrame.csv')
new_df1
Out[58]:
Unnamed: 0 A B C D
0 0 Dog Brown x 1
1 1 Dog Brown y 3
2 2 Dog Black x 2
3 3 Goat Brown y 4
4 4 Goat Brown x 5
5 5 Goat Black y 3
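
The 'Unnamed: 0' column above is the saved index coming back as data. Instead of rewriting the file, you can tell read_csv to treat the first column as the index with index_col=0. A minimal sketch using an in-memory CSV:

```python
import pandas as pd
from io import StringIO

# Simulates a CSV that was written with its index column included
csv_text = ",A,B\n0,Dog,Brown\n1,Dog,Black\n"
df = pd.read_csv(StringIO(csv_text), index_col=0)  # first column -> index
print(df)
```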

9.7.3 Writing an Excel File

In [59]:
df_pivot_new = df_pivot.set_index('A')
df_pivot_new
Out[59]:
B C D
A
Dog Brown x 1
Dog Brown y 3
Dog Black x 2
Goat Brown y 4
Goat Brown x 5
Goat Black y 3
In [60]:
df_pivot_new.to_excel('New_DataFrame.xlsx',sheet_name='Sheet 1')

9.7.4 Reading an Excel File

Excel files are a handy data source. With pd.read_excel(), we can import the data of an Excel sheet into Pandas. Note that Pandas imports only the tabular data; other objects such as macros, charts, and formulas are not imported, and if the file contains them, pd.read_excel() may fail.

In [61]:
new_df_pivot_new=pd.read_excel('New_DataFrame.xlsx')
In [62]:
new_df_pivot_new
Out[62]:
A B C D
0 Dog Brown x 1
1 Dog Brown y 3
2 Dog Black x 2
3 Goat Brown y 4
4 Goat Brown x 5
5 Goat Black y 3

9.7.5 Reading HTML Data

To read HTML, you need the html5lib, lxml, and BeautifulSoup4 libraries installed. Run the following commands in a terminal or command prompt:

- conda install lxml

- conda install html5lib

- conda install beautifulsoup4

As an example, let's have Pandas read the data on this page: https://en.wikipedia.org/wiki/Udacity . Since one page can contain several different tables, we access the individual tables in the resulting list by subscript [0, ..., len(tables) - 1].

In the example below, we display the first two tables found on the page:

In [64]:
df_html=pd.read_html('https://en.wikipedia.org/wiki/Udacity')
In [65]:
df_html[0]
Out[65]:
0 1
0 NaN NaN
1 Type of site Online education
2 Available in English
3 Founded June 2011[1]
4 Created by Sebastian Thrun, David Stavens, Mike Sokolsky
5 CEO Vishal Makhijani [2]
6 Website www.udacity.com
7 Alexa rank 1,781 (September 2017[3]
8 Commercial Yes
9 Registration Required
10 Users 1.6 Million[4]
11 Launched February 2012
12 Current status Active
In [66]:
df_html[1]
Out[66]:
0 1 2 3
0 ID Course Name Primary Instructor(s) Launch Date
1 BIO110 Tales from the Genome: Introduction to Genetic... Matthew Cook, Lauren Castellano, Joanna Mounta... NaN
2 CS046 Intro to Programming in Java: Learning Java Cay Horstmann 3 June 2013[32]
3 CS101 Introduction to Computer Science: Building a S... David Evans 20 February 2012
4 CS212 Design of Computer Programs: Programming Princ... Peter Norvig 16 April 2012
5 CS215 Algorithms: Crunching Social Networks Michael Littman 25 June 2012
6 CS222 Making Math Matter: Differential Equations in ... Jörn Loviscach 3 September 2012
7 CS253 Web Development: How to Build a Blog Steve Huffman 16 April 2012
8 CS255 HTML5 Game Development: Building High Performa... Colt McAnlis, Peter Lubbers 4 February 2013[30]
9 CS256 Mobile Web Development: Building Mobile Web Ex... Chris Wilson, Peter Lubbers NaN
10 CS258 Software Testing: How to Make Software Fail John Regehr 25 June 2012
11 CS259 Software Debugging: Automating the Boring Tasks Andreas Zeller 3 September 2012
12 CS262 Programming Languages: Building a Web Browser Westley Weimer 16 April 2012
13 CS271 Introduction to Artificial Intelligence: AI-Class Sebastian Thrun, Peter Norvig 28 November 2012[n 1]
14 CS291 Interactive Rendering: Introduction to 3D Comp... Eric Haines 11 March 2013
15 CS313 Intro to Theoretical Computer Science: Dealing... Sebastian Wernicke 1 October 2012
16 CS344 Introduction to Parallel Programming: Using CU... John Owens, David Luebke 4 February 2013[30]
17 CS348 Functional Hardware Verification: How to Verif... Axel Scherer, Hannes Fröhlich 12 March 2013[30]
18 CS373 Artificial Intelligence: Programming A Robotic... Sebastian Thrun 20 February 2012
19 CS387 Applied Cryptography: Science of Secrets David Evans 16 April 2012
20 Design101 The Design of Everyday Things Don Norman, Kristian Simsarian NaN
21 EP245 How to Build a Startup: The Lean LaunchPad Steve Blank 14 September 2012
22 MA006 Visualizing Algebra: Patterns and Problems Susan McClory, Sandra DeSousa 30 January 2013[33]
23 MA008 College Algebra: Animals, Architecture, and In... Julie Sliva Spitzer 30 January 2013[33]
24 PH100 Intro to Physics: Landmarks in Physics Andy Brown 25 June 2012
25 PS001 Introduction to Psychology: The Science of Tho... Susan Snycerski, Greg Feist 3 June 2013[32]
26 ST095 Statistics: The Science of Decisions Sean Laraway, Ronald Rogers 30 January 2013[33]
27 ST101 Intro to Statistics: Making Decisions Based on... Sebastian Thrun 25 June 2012