Upsample in pandas multi-index

Question

I am trying to upsample within a grouped DataFrame but am unsure how to get it to only upsample within the groups. I have a DataFrame that looks like:

cat      weekstart                  date      
0.0      2016-07-04 00:00:00+00:00  2016-07-04    1
                                    2016-07-06    1
                                    2016-07-07    2
         2016-08-15 00:00:00+00:00  2016-08-16    1
                                    2016-08-19    1
         2016-09-19 00:00:00+00:00  2016-09-20    1
                                    2016-09-21    1
         2016-12-19 00:00:00+00:00  2016-12-19    1
                                    2016-12-21    1

1.0      2016-07-25 00:00:00+00:00  2016-07-26    2
         2016-08-01 00:00:00+00:00  2016-08-03    1
         2016-08-08 00:00:00+00:00  2016-08-12    1

If I do something like df.unstack().fillna(0).stack() leads to:

cat      weekstart                  date      
0.0      2016-07-04 00:00:00+00:00  2016-1-1      0 
                                           .
                                           .
                                           .
                                    2016-07-04    1
                                    2016-07-06    1
                                    2016-07-07    2

because the minimum in the date column is 2016-1-1. What i'm after though is only sampling business days within each 'cat' and 'weekstart', like:

 cat      weekstart                  date      
 0.0      2016-07-04 00:00:00+00:00  2016-07-04    1
                                     2016-07-05    0 
                                     2016-07-06    1
                                     2016-07-07    2
                                     2016-07-8     0
          2016-08-15 00:00:00+00:00  2016-08-15    0
                                     2016-08-16    1
                                     2016-08-17    0
                                     2016-08-18    0
                                    2016-08-19    1

I've tried using:

 level_values = df.index.get_level_values
 df.groupby(
            [level_values(i) for i in [0, 1]] + [pd.Grouper('B', level=-1)]
            )
    .sum()

but it isn't working as expected.

jezrael · Accepted Answer · 2018-02-07T15:38:12.710

I think you need custom function with reindex by MultiIndex created by bdate_range:

def f(x):
    lvl0 = x.index.get_level_values(0)[0]
    lvl1 = x.index.get_level_values(1)[0]
    lvl2 = pd.bdate_range(start=lvl1, periods=5)
    mux = pd.MultiIndex.from_product([[lvl0], [lvl1], lvl2], names=x.index.names)
    return (x.reindex(mux, fill_value=0))

s1 = s.groupby(['cat','weekstart'], group_keys=False).apply(f)

print (s1)

cat  weekstart   date      
0.0  2016-07-04  2016-07-04    1
                 2016-07-05    0
                 2016-07-06    1
                 2016-07-07    2
                 2016-07-08    0
     2016-08-15  2016-08-15    0
                 2016-08-16    1
                 2016-08-17    0
                 2016-08-18    0
                 2016-08-19    1
     2016-09-19  2016-09-19    0
                 2016-09-20    1
                 2016-09-21    1
                 2016-09-22    0
                 2016-09-23    0
     2016-12-19  2016-12-19    1
                 2016-12-20    0
                 2016-12-21    1
                 2016-12-22    0
                 2016-12-23    0
1.0  2016-07-25  2016-07-25    0
                 2016-07-26    2
                 2016-07-27    0
                 2016-07-28    0
                 2016-07-29    0
     2016-08-01  2016-08-01    0
                 2016-08-02    0
                 2016-08-03    1
                 2016-08-04    0
                 2016-08-05    0
     2016-08-08  2016-08-08    0
                 2016-08-09    0
                 2016-08-10    0
                 2016-08-11    0
                 2016-08-12    1
Name: a, dtype: int64

Setup:

d = {(0.0, pd.Timestamp('2016-07-04 00:00:00'), pd.Timestamp('2016-07-07 00:00:00')): 2, (1.0, pd.Timestamp('2016-07-25 00:00:00'), pd.Timestamp('2016-07-26 00:00:00')): 2, (0.0, pd.Timestamp('2016-08-15 00:00:00'), pd.Timestamp('2016-08-16 00:00:00')): 1, (0.0, pd.Timestamp('2016-07-04 00:00:00'), pd.Timestamp('2016-07-04 00:00:00')): 1, (0.0, pd.Timestamp('2016-09-19 00:00:00'), pd.Timestamp('2016-09-20 00:00:00')): 1, (0.0, pd.Timestamp('2016-09-19 00:00:00'), pd.Timestamp('2016-09-21 00:00:00')): 1, (0.0, pd.Timestamp('2016-12-19 00:00:00'), pd.Timestamp('2016-12-19 00:00:00')): 1, (1.0, pd.Timestamp('2016-08-08 00:00:00'), pd.Timestamp('2016-08-12 00:00:00')): 1, (0.0, pd.Timestamp('2016-07-04 00:00:00'), pd.Timestamp('2016-07-06 00:00:00')): 1, (1.0, pd.Timestamp('2016-08-01 00:00:00'), pd.Timestamp('2016-08-03 00:00:00')): 1, (0.0, pd.Timestamp('2016-12-19 00:00:00'), pd.Timestamp('2016-12-21 00:00:00')): 1, (0.0, pd.Timestamp('2016-08-15 00:00:00'), pd.Timestamp('2016-08-19 00:00:00')): 1}
s = pd.Series(d).rename_axis(['cat','weekstart','date'])    
print (s)
cat  weekstart   date      
0.0  2016-07-04  2016-07-04    1
                 2016-07-06    1
                 2016-07-07    2
     2016-08-15  2016-08-16    1
                 2016-08-19    1
     2016-09-19  2016-09-20    1
                 2016-09-21    1
     2016-12-19  2016-12-19    1
                 2016-12-21    1
1.0  2016-07-25  2016-07-26    2
     2016-08-01  2016-08-03    1
     2016-08-08  2016-08-12    1
dtype: int64

did you use `pandas.read_clipboard()` to replicate OP's orignial dataframe or you have to manually create it ? I was struggling to just use `read_clipboard`. thanks. — stucash, Feb 07 '18 at 14:55
no worries thanks for letting me know so I don't need to scratch my head anymore :> — stucash, Feb 07 '18 at 14:58
But maxu create nice solution, but I dont test it - [link](https://stackoverflow.com/a/45741989/2901002) — jezrael, Feb 07 '18 at 14:59
did a quick one with OP's dataframe by saving MaxU's version locally and run as `read_clipboard_mi()`, `read_clipboard_mi(index_names_row=1)`,and `read_clipboard_mi(index_names_row=0)`, they didn't work for me, but it's great work we could improve on :) — stucash, Feb 07 '18 at 15:07
I never knew about read clipboard, already useful for the comments :-) . When I try `df.groupby(['cat','weekstart'], group_keys=False).apply(f)` , it throws an IndexError at `lvl1 = x.index.get_level_values(1)[0]` complaining that it only has one level? — Blazina, Feb 07 '18 at 15:14
@jezrael thanks! I think you would need to use: `s.groupby(level=[0, 1], group_keys=False).apply(f)` rather than `s.groupby(['cat', 'date'], group_keys=False).apply(f)`? Just for my own interest, why is the group_keys=False required? — Blazina, Feb 07 '18 at 15:41
I think you use not last version of pandas. If remove `group_keys=False` then get terrible 5 levels MultiIndex, it remove 2 levels. — jezrael, Feb 07 '18 at 15:42

Upsample in pandas multi-index

1 Answers1

Linked