Pandas 中时间序列的处理
1. pandas.Grouper
按年/月/日汇总
# importing modules
import pandas as pd
# creating a dataframe df
df = pd.DataFrame(
{
"Date": [
pd.Timestamp("2000-11-02"),
pd.Timestamp("2000-01-02"),
pd.Timestamp("2000-01-09"),
pd.Timestamp("2000-03-11"),
pd.Timestamp("2000-01-26"),
pd.Timestamp("2000-02-16")
],
"ID": [1, 2, 3, 4, 5, 6],
"Price": [140, 120, 230, 40, 100, 450]
}
)
# applying the groupby function on df
# by month
df.groupby(pd.Grouper(key='Date', axis=0, freq='M')).sum()
# by day
df.groupby(pd.Grouper(key='Date', axis=0, freq='2D', sort=True)).sum()
# by year
df.groupby(pd.Grouper(key='Date', freq='2Y')).sum()
按时/分/秒汇总
# importing module
import pandas as pd
# create an array of 5 dates starting
# at '2015-02-24', one per minute
dates = pd.date_range('2015-02-24', periods=10, freq='T')
# creating dataframe with above array
# of dates
df = pd.DataFrame(
{
"Date": dates,
"ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"Price": [140, 120, 230, 40, 100, 450, 234, 785, 12, 42]
}
)
# display dataframe
display(df)
# applied groupby function
# by sec.
df.groupby(pd.Grouper(key='Date', freq='30S')).sum()
# by min.
df.groupby(pd.Grouper(key='Date', freq='2T')).sum()
df.groupby(pd.Grouper(key='Date', freq='2min')).sum()
# by hour
df.groupby(pd.Grouper(key='Date', freq='1H')).sum()
2. Dataframe.resample()
构造样例数据:
d = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
df = pd.DataFrame(d)
df['week_starting'] = pd.date_range('01/01/2018', periods=8, freq='W')
print(df)
输出:
price volume week_starting
0 10 50 2018-01-07
1 11 60 2018-01-14
2 9 40 2018-01-21
3 13 100 2018-01-28
4 14 50 2018-02-04
5 18 100 2018-02-11
6 17 40 2018-02-18
7 19 50 2018-02-25
按月汇总:
print(df.resample('M', on='week_starting').mean())
结果:
price volume
week_starting
2018-01-31 10.75 62.5
2018-02-28 17.00 60.0
参考: