How to subtract one dataframe from subset intersection of another dataframe in pandas python?

Question

I have the following dataframes in Python:

dataframe 1

             1  2  3  4  5
dog   dog    0  1  1  0  1
      fox    1  0  0  0  0
      jumps  0  0  0  1  0
      over   1  0  1  0  1
      the    0  1  0  0  0
fox   dog    0  0  1  1  1
      fox    0  0  0  0  0
      jumps  0  0  1  0  1
      over   0  1  0  0  0
      the    0  0  0  1  1
jumps dog    0  0  0  0  0
      fox    0  1  0  1  1
      jumps  0  0  0  0  1
      over   1  0  1  0  0
      the    0  0  0  0  0
over  dog    0  0  1  0  0
      fox    0  1  0  1  1
      jumps  0  0  0  0  0
      over   0  1  0  1  0
      the    1  0  1  0  0
the   dog    0  0  1  0  0
      fox    0  0  0  0  1
      jumps  0  1  0  0  0
      over   0  0  1  1  0
      the    0  1  1  0  1

dataframe 2

             1  2  4  5
dog   dog    1  0  0  0
      fox    0  1  0  1
      jumps  0  1  1  0
      the    0  0  0  0
      horse  1  0  1  0
fox   dog    0  0  0  0
      fox    0  1  0  1
      over   0  0  0  0
      the    0  1  0  1
      cat    0  0  1  0

You can see that dataframe 2 contains some of the MultiIndex entries of dataframe 1, but it also contains additional entries such as horse and cat. Dataframe 2 also lacks some of dataframe 1's columns: here it is missing column 3.

I want to subtract dataframe 2 from dataframe 1 so that only the data common to both is subtracted, the rest is ignored, and the result has the shape of dataframe 2.

Does anyone know whether pandas provides a built-in way of doing this, or do I need to write a function myself? If so, can you point me in the right direction? Any suggestions are highly appreciated. Thank you.

NOTE: This question is similar to another question I posted here, except that I do not want to compare the dataframes; I want to perform an arithmetic subtraction instead.

Can you provide a more workable example, i.e. something people could actually copy-paste-run? Otherwise, lots of people will move on to other questions. Admittedly, this can be tricky with multi-indexes. See this: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — juanpa.arrivillaga, Nov 03 '17 at 22:23
@juanpa.arrivillaga, [`d1 = read_clipboard_mi()`](https://stackoverflow.com/a/45741989/5741205) ;-) — MaxU - stand with Ukraine, Nov 03 '17 at 22:31
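
For anyone who wants something to copy, paste, and run, the two frames shown in the question can be rebuilt roughly as follows. This is a sketch transcribed from the printed tables; it assumes the column labels are strings (the accepted answer drops `'3'`, which suggests so) and that the index levels are unnamed.

```python
import pandas as pd

words = ["dog", "fox", "jumps", "over", "the"]

# dataframe 1: every (word, word) pair; column labels assumed to be strings.
df1 = pd.DataFrame(
    [
        [0, 1, 1, 0, 1], [1, 0, 0, 0, 0], [0, 0, 0, 1, 0], [1, 0, 1, 0, 1], [0, 1, 0, 0, 0],  # dog
        [0, 0, 1, 1, 1], [0, 0, 0, 0, 0], [0, 0, 1, 0, 1], [0, 1, 0, 0, 0], [0, 0, 0, 1, 1],  # fox
        [0, 0, 0, 0, 0], [0, 1, 0, 1, 1], [0, 0, 0, 0, 1], [1, 0, 1, 0, 0], [0, 0, 0, 0, 0],  # jumps
        [0, 0, 1, 0, 0], [0, 1, 0, 1, 1], [0, 0, 0, 0, 0], [0, 1, 0, 1, 0], [1, 0, 1, 0, 0],  # over
        [0, 0, 1, 0, 0], [0, 0, 0, 0, 1], [0, 1, 0, 0, 0], [0, 0, 1, 1, 0], [0, 1, 1, 0, 1],  # the
    ],
    index=pd.MultiIndex.from_product([words, words]),
    columns=list("12345"),
)

# dataframe 2: fewer rows, two extra labels (horse, cat), and no column "3".
df2 = pd.DataFrame(
    [
        [1, 0, 0, 0], [0, 1, 0, 1], [0, 1, 1, 0], [0, 0, 0, 0], [1, 0, 1, 0],  # dog
        [0, 0, 0, 0], [0, 1, 0, 1], [0, 0, 0, 0], [0, 1, 0, 1], [0, 0, 1, 0],  # fox
    ],
    index=pd.MultiIndex.from_tuples(
        [("dog", w) for w in ["dog", "fox", "jumps", "the", "horse"]]
        + [("fox", w) for w in ["dog", "fox", "over", "the", "cat"]]
    ),
    columns=list("1245"),
)
```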

score 4 · Answer 1 · answered Nov 03 '17 at 22:37

IIUC:

In [24]: r = d1.sub(d2, axis=0)

In [25]: r.loc[r.index.intersection(d2.index)]
Out[25]:
             1    2   3    4    5
dog dog   -1.0  1.0 NaN  0.0  1.0
    fox    1.0 -1.0 NaN  0.0 -1.0
    horse  NaN  NaN NaN  NaN  NaN
    jumps  0.0 -1.0 NaN  0.0  0.0
    the    0.0  1.0 NaN  0.0  0.0
fox cat    NaN  NaN NaN  NaN  NaN
    dog    0.0  0.0 NaN  1.0  1.0
    fox    0.0 -1.0 NaN  0.0 -1.0
    over   0.0  1.0 NaN  0.0  0.0
    the    0.0 -1.0 NaN  1.0  0.0
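
The result above still carries the all-NaN column 3 and the NaN rows for labels that exist only in the second frame. One possible way to trim it to the second frame's shape is `reindex`; the sketch below uses small made-up frames (not the question's data) and assumes string column labels.

```python
import pandas as pd

# Small stand-ins for d1 and d2 (illustrative data only).
d1 = pd.DataFrame(
    {"1": [0, 1, 0], "2": [1, 0, 0], "3": [1, 0, 1]},
    index=pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "fox"), ("fox", "dog")]),
)
d2 = pd.DataFrame(
    {"1": [1, 1], "2": [0, 0]},
    index=pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "horse")]),
)

r = d1.sub(d2)  # union of rows and columns, NaN where a label is missing

# Keep only d2's rows and columns, in d2's order.
shaped = r.reindex(index=d2.index, columns=d2.columns)

# Optionally drop rows that had no counterpart in d1 at all.
only_common = shaped.dropna(how="all")
```

Whether the NaN rows should be dropped or filled depends on what "ignore the rest" is meant to produce.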

juanpa.arrivillaga · Accepted Answer · 2017-11-03T23:00:28.247

I believe you simply want something like:

In [23]: (df2 - df1.drop('3', axis=1)).fillna(df2).dropna()
Out[23]:
             1    2    4    5
dog dog    1.0 -1.0  0.0 -1.0
    fox   -1.0  1.0  0.0  1.0
    horse  1.0  0.0  1.0  0.0
    jumps  0.0  1.0  0.0  0.0
    the    0.0 -1.0  0.0  0.0
fox cat    0.0  0.0  1.0  0.0
    dog    0.0  0.0 -1.0 -1.0
    fox    0.0  1.0  0.0  1.0
    over   0.0 -1.0  0.0  0.0
    the    0.0  1.0 -1.0  0.0

Pandas already aligns on the index (and columns) automatically, that's part of its magic, but you have to fill and drop NaNs intelligently.
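
To see that alignment in isolation, here is a minimal sketch on two made-up frames (`left` and `right` are illustrative names, not from the question). Labels found in only one operand produce NaN, which is why the fill and drop steps are needed.

```python
import pandas as pd

# Two tiny frames with partly overlapping row and column labels.
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
right = pd.DataFrame({"a": [10, 20]}, index=["y", "z"])

# Subtraction aligns on both axes first: the result covers the union of
# labels, and any cell missing from either operand becomes NaN.
diff = left - right

# Only ("y", "a") exists in both frames, so it is the only non-NaN cell.
print(diff)
```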

Edit

Whoops, you actually want df1 - df2, but with the shape of df2. That is a little trickier, because fillna(df1) would prevent us from dropping the right rows. However, you can simply multiply the previous result by -1:

In [25]: (df2 - df1.drop('3', axis=1)).fillna(df2).dropna() * -1
Out[25]:
             1    2    4    5
dog dog   -1.0  1.0 -0.0  1.0
    fox    1.0 -1.0 -0.0 -1.0
    horse -1.0 -0.0 -1.0 -0.0
    jumps -0.0 -1.0 -0.0 -0.0
    the   -0.0  1.0 -0.0 -0.0
fox cat   -0.0 -0.0 -1.0 -0.0
    dog   -0.0 -0.0  1.0  1.0
    fox   -0.0 -1.0 -0.0 -1.0
    over  -0.0  1.0 -0.0 -0.0
    the   -0.0 -1.0  1.0 -0.0

Or, if those negative zeros bother you:

In [31]: (-df2 + df1.drop('3', axis=1)).fillna(-df2).dropna()
Out[31]:
             1    2    4    5
dog dog   -1.0  1.0  0.0  1.0
    fox    1.0 -1.0  0.0 -1.0
    horse -1.0  0.0 -1.0  0.0
    jumps  0.0 -1.0  0.0  0.0
    the    0.0  1.0  0.0  0.0
fox cat    0.0  0.0 -1.0  0.0
    dog    0.0  0.0  1.0  1.0
    fox    0.0 -1.0  0.0 -1.0
    over   0.0  1.0  0.0  0.0
    the    0.0 -1.0  1.0  0.0

@juanpa.arrivillaga it is not necessarily column '3' that will be missing, since df2 is variable. Any idea how to get around that? — sshussain270, Nov 03 '17 at 23:36
@Elisha512 you can do something like `missing = df1.columns.difference(df2.columns)` and pass that. — juanpa.arrivillaga, Nov 03 '17 at 23:48
@juanpa.arrivillaga and (-df2 + df1.drop(missing, axis=1)).fillna(-df2).dropna() Correct? — sshussain270, Nov 03 '17 at 23:53
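
Putting the two comments above together, a helper along these lines seems to do what is asked. This is only a sketch: the name `subtract_common` and the sample frames are hypothetical, and it assumes df2 contains no NaN of its own (such rows would be lost in the final `dropna`).

```python
import pandas as pd

def subtract_common(df1, df2):
    """Sketch: df1 - df2 in the shape of df2; cells absent from df1 become -df2."""
    # Columns that only df1 has would just add all-NaN columns, so drop them.
    missing = df1.columns.difference(df2.columns)
    trimmed = df1.drop(columns=missing)
    # Aligned arithmetic gives NaN wherever a row exists in only one frame;
    # fillna(-df2) repairs the rows that exist only in df2.
    result = (-df2 + trimmed).fillna(-df2)
    # Rows that exist only in df1 are still NaN here, so drop them.
    return result.dropna()

# Made-up sample data with string column labels.
df1 = pd.DataFrame(
    {"1": [0, 1, 0], "2": [1, 0, 0], "3": [1, 0, 1]},
    index=pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "fox"), ("fox", "dog")]),
)
df2 = pd.DataFrame(
    {"1": [1, 1], "2": [0, 0]},
    index=pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "horse")]),
)

out = subtract_common(df1, df2)
```

Computing `missing` inside the function means the caller never has to know which columns df2 happens to lack.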

score 3 · Answer 3 · answered Nov 03 '17 at 22:49

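Let us do something like this (note that it computes df2 - df1):

common = df2.index.intersection(df1.index)  # rows present in both frames
dd = df1.loc[common]
(df2 - dd).combine_first(df2).dropna(axis=1)  # axis must be a keyword in pandas 2.x; drops the all-NaN column '3'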

             1    2    4    5
dog dog    1.0 -1.0  0.0 -1.0
    fox   -1.0  1.0  0.0  1.0
    horse  1.0  0.0  1.0  0.0
    jumps  0.0  1.0  0.0  0.0
    the    0.0 -1.0  0.0  0.0
fox cat    0.0  0.0  1.0  0.0
    dog    0.0  0.0 -1.0 -1.0
    fox    0.0  1.0  0.0  1.0
    over   0.0 -1.0  0.0  0.0
    the    0.0  1.0 -1.0  0.0

piRSquared · Answer 4 · 2017-11-04T15:05:55.083

score 3

Use pd.DataFrame.align with the join parameter set to 'inner' to reduce both dataframes to only their common row and column labels. Then pass the results to pd.DataFrame.sub.

pd.DataFrame.sub(*df1.align(df2, 'inner'))

           1  2  4  5
dog dog   -1  1  0  1
    fox    1 -1  0 -1
    jumps  0 -1  0  0
    the    0  1  0  0
fox dog    0  0  1  1
    fox    0 -1  0 -1
    over   0  1  0  0
    the    0 -1  1  0

Written in two lines

a, b = df1.align(df2, 'inner')
a - b

edited Nov 04 '17 at 15:05

answered Nov 04 '17 at 07:44

piRSquared

Nice! I’ve never seen “align” method before – MaxU - stand with Ukraine Nov 04 '17 at 08:24
I credit @Psidom for showing me. It’s quite nice. You can specify axes, fill_value, and inner or outer. Not sure if it does left or right. – piRSquared Nov 04 '17 at 14:57
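
On the last point in the comment above: `align` does accept `join='left'` and `join='right'` as well. A sketch with `join='right'`, which reindexes both frames to the second frame's rows and columns, is shown below on made-up data; the fill step is one possible way to handle rows that the first frame lacks, not something the answer prescribes.

```python
import pandas as pd

# Made-up sample data with string column labels.
df1 = pd.DataFrame(
    {"1": [0, 1, 0], "2": [1, 0, 0], "3": [1, 0, 1]},
    index=pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "fox"), ("fox", "dog")]),
)
df2 = pd.DataFrame(
    {"1": [1, 1], "2": [0, 0]},
    index=pd.MultiIndex.from_tuples([("dog", "dog"), ("dog", "horse")]),
)

# join="right" keeps df2's row and column labels for both outputs.
a, b = df1.align(df2, join="right")

diff = a - b              # rows absent from df1 come out as NaN
filled = diff.fillna(-b)  # optionally treat those absent cells as 0 - df2
```

With `join='inner'` the horse row would be dropped instead, so the choice of join decides whether the result has the shape of df2 or only of the overlap.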
