Pandas 中时间序列的处理

1. pandas.Grouper

按年/月/日汇总

# importing modules
import pandas as pd

# creating a dataframe df
df = pd.DataFrame(
	{
		"Date": [
			pd.Timestamp("2000-11-02"),
			pd.Timestamp("2000-01-02"),
			pd.Timestamp("2000-01-09"),
			pd.Timestamp("2000-03-11"),
			pd.Timestamp("2000-01-26"),
			pd.Timestamp("2000-02-16")
		],
		"ID": [1, 2, 3, 4, 5, 6],
		"Price": [140, 120, 230, 40, 100, 450]
	}
)


# applying the groupby function on df

# by month
df.groupby(pd.Grouper(key='Date', axis=0, freq='M')).sum()

# by day
df.groupby(pd.Grouper(key='Date', axis=0, freq='2D', sort=True)).sum()


# by year
df.groupby(pd.Grouper(key='Date', freq='2Y')).sum()

按时/分/秒汇总

# importing module
import pandas as pd

# create an array of 5 dates starting
# at '2015-02-24', one per minute
dates = pd.date_range('2015-02-24', periods=10, freq='T')

# creating dataframe with above array
# of dates
df = pd.DataFrame(
	{
		"Date": dates,
		"ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 
		"Price": [140, 120, 230, 40, 100, 450, 234, 785, 12, 42]
	}
)

# display dataframe
display(df)

# applied groupby function
# by sec.
df.groupby(pd.Grouper(key='Date', freq='30S')).sum()
# by min.
df.groupby(pd.Grouper(key='Date', freq='2T')).sum()
df.groupby(pd.Grouper(key='Date', freq='2min')).sum()
# by hour
df.groupby(pd.Grouper(key='Date', freq='1H')).sum()

2. Dataframe.resample()

构造样例数据:

d = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
      'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2018', periods=8, freq='W')
print(df)

输出:

   price  volume week_starting
0     10      50    2018-01-07
1     11      60    2018-01-14
2      9      40    2018-01-21
3     13     100    2018-01-28
4     14      50    2018-02-04
5     18     100    2018-02-11
6     17      40    2018-02-18
7     19      50    2018-02-25

按月汇总:

print(df.resample('M', on='week_starting').mean())

结果:

				price  volume
week_starting
2018-01-31     10.75    62.5
2018-02-28     17.00    60.0

参考:

  1. How to Group Pandas DataFrame By Date and Time ?
  2. pd.Grouper
  3. pandas.DataFrame.resample
  4. Offset aliases