In the process of answering this question for myself, I learned many things, and I wanted to put together a catalog of examples and some explanation.

The specific answer to the point of the `levels`

argument will come towards the end.

`pandas.concat`

: The Missing Manual

Link To Current Documentation

## Imports and defining objects

```
import pandas as pd
d1 = pd.DataFrame(dict(A=.1, B=.2, C=.3), index=[2, 3])
d2 = pd.DataFrame(dict(B=.4, C=.5, D=.6), index=[1, 2])
d3 = pd.DataFrame(dict(A=.7, B=.8, D=.9), index=[1, 3])
s1 = pd.Series([1, 2], index=[2, 3])
s2 = pd.Series([3, 4], index=[1, 2])
s3 = pd.Series([5, 6], index=[1, 3])
```

# Arguments

`objs`

The first argument we come across is `objs`

:

**objs**: a sequence or mapping of Series, DataFrame, or Panel objects
If a dict is passed, the sorted keys will be used as the keys argument, unless it is passed, in which case the values will be selected (see below). Any None objects will be dropped silently unless they are all None in which case a ValueError will be raised

- We typically see this used with a list of
`Series`

or `DataFrame`

objects.
- I'll show that
`dict`

can be very useful as well.
- Generators may also be used and can be useful when using
`map`

as in `map(f, list_of_df)`

For now, we'll stick with a list of some of the `DataFrame`

and `Series`

objects defined above.
I'll show how dictionaries can be leveraged to give very useful `MultiIndex`

results later.

```
pd.concat([d1, d2])
A B C D
2 0.1 0.2 0.3 NaN
3 0.1 0.2 0.3 NaN
1 NaN 0.4 0.5 0.6
2 NaN 0.4 0.5 0.6
```

`axis`

The second argument we encounter is `axis`

whose default value is `0`

:

**axis**: {0/’index’, 1/’columns’}, default 0
The axis to concatenate along.

### Two `DataFrame`

s with `axis=0`

(stacked)

For values of `0`

or `index`

we mean to say: "Align along the columns and add to the index".

As shown above where we used `axis=0`

, because `0`

is the default value, and we see that the index of `d2`

extends the index of `d1`

despite there being overlap of the value `2`

:

```
pd.concat([d1, d2], axis=0)
A B C D
2 0.1 0.2 0.3 NaN
3 0.1 0.2 0.3 NaN
1 NaN 0.4 0.5 0.6
2 NaN 0.4 0.5 0.6
```

### Two `DataFrame`

s with `axis=1`

(side by side)

For values `1`

or `columns`

we mean to say: "Align along the index and add to the columns",

```
pd.concat([d1, d2], axis=1)
A B C B C D
1 NaN NaN NaN 0.4 0.5 0.6
2 0.1 0.2 0.3 0.4 0.5 0.6
3 0.1 0.2 0.3 NaN NaN NaN
```

We can see that the resulting index is the union of indices and the resulting columns are the extension of columns from `d1`

by the columns of `d2`

.

### Two (or Three) `Series`

with `axis=0`

(stacked)

When combining `pandas.Series`

along `axis=0`

, we get back a `pandas.Series`

. The name of the resulting `Series`

will be `None`

unless all `Series`

being combined have the same name. Pay attention to the `'Name: A'`

when we print out the resulting `Series`

. When it isn't present, we can assume the `Series`

name is `None`

.

```
| | | pd.concat(
| pd.concat( | pd.concat( | [s1.rename('A'),
pd.concat( | [s1.rename('A'), | [s1.rename('A'), | s2.rename('B'),
[s1, s2]) | s2]) | s2.rename('A')]) | s3.rename('A')])
-------------- | --------------------- | ---------------------- | ----------------------
2 1 | 2 1 | 2 1 | 2 1
3 2 | 3 2 | 3 2 | 3 2
1 3 | 1 3 | 1 3 | 1 3
2 4 | 2 4 | 2 4 | 2 4
dtype: int64 | dtype: int64 | Name: A, dtype: int64 | 1 5
| | | 3 6
| | | dtype: int64
```

### Two (or Three) `Series`

with `axis=1`

(side by side)

When combining `pandas.Series`

along `axis=1`

, it is the `name`

attribute that we refer to in order to infer a column name in the resulting `pandas.DataFrame`

.

```
| | pd.concat(
| pd.concat( | [s1.rename('X'),
pd.concat( | [s1.rename('X'), | s2.rename('Y'),
[s1, s2], axis=1) | s2], axis=1) | s3.rename('Z')], axis=1)
---------------------- | --------------------- | ------------------------------
0 1 | X 0 | X Y Z
1 NaN 3.0 | 1 NaN 3.0 | 1 NaN 3.0 5.0
2 1.0 4.0 | 2 1.0 4.0 | 2 1.0 4.0 NaN
3 2.0 NaN | 3 2.0 NaN | 3 2.0 NaN 6.0
```

### Mixed `Series`

and `DataFrame`

with `axis=0`

(stacked)

When performing a concatenation of a `Series`

and `DataFrame`

along `axis=0`

, we convert all `Series`

to single column `DataFrame`

s.

Take special note that this is a concatenation along `axis=0`

; that means extending the index (rows) while aligning the columns. In the examples below, we see the index becomes `[2, 3, 2, 3]`

which is an indiscriminate appending of indices. The columns do not overlap unless I force the naming of the `Series`

column with the argument to `to_frame`

:

```
pd.concat( |
[s1.to_frame(), d1]) | pd.concat([s1, d1])
------------------------- | ---------------------
0 A B C | 0 A B C
2 1.0 NaN NaN NaN | 2 1.0 NaN NaN NaN
3 2.0 NaN NaN NaN | 3 2.0 NaN NaN NaN
2 NaN 0.1 0.2 0.3 | 2 NaN 0.1 0.2 0.3
3 NaN 0.1 0.2 0.3 | 3 NaN 0.1 0.2 0.3
```

You can see the results of `pd.concat([s1, d1])`

are the same as if I had perfromed the `to_frame`

myself.

However, I can control the name of the resulting column with a parameter to `to_frame`

. Renaming the `Series`

with the `rename`

method does **not** control the column name in the resulting `DataFrame`

.

```
# Effectively renames | |
# `s1` but does not align | # Does not rename. So | # Renames to something
# with columns in `d1` | # Pandas defaults to `0` | # that does align with `d1`
pd.concat( | pd.concat( | pd.concat(
[s1.to_frame('X'), d1]) | [s1.rename('X'), d1]) | [s1.to_frame('B'), d1])
---------------------------- | -------------------------- | ----------------------------
A B C X | 0 A B C | A B C
2 NaN NaN NaN 1.0 | 2 1.0 NaN NaN NaN | 2 NaN 1.0 NaN
3 NaN NaN NaN 2.0 | 3 2.0 NaN NaN NaN | 3 NaN 2.0 NaN
2 0.1 0.2 0.3 NaN | 2 NaN 0.1 0.2 0.3 | 2 0.1 0.2 0.3
3 0.1 0.2 0.3 NaN | 3 NaN 0.1 0.2 0.3 | 3 0.1 0.2 0.3
```

### Mixed `Series`

and `DataFrame`

with `axis=1`

(side by side)

This is fairly intuitive. `Series`

column name defaults to an enumeration of such `Series`

objects when a `name`

attribute is not available.

```
| pd.concat(
pd.concat( | [s1.rename('X'),
[s1, d1], | s2, s3, d1],
axis=1) | axis=1)
------------------- | -------------------------------
0 A B C | X 0 1 A B C
2 1 0.1 0.2 0.3 | 1 NaN 3.0 5.0 NaN NaN NaN
3 2 0.1 0.2 0.3 | 2 1.0 4.0 NaN 0.1 0.2 0.3
| 3 2.0 NaN 6.0 0.1 0.2 0.3
```

`join`

The third argument is `join`

that describes whether the resulting merge should be an outer merge (default) or an inner merge.

**join**: {‘inner’, ‘outer’}, default ‘outer’

How to handle indexes on other axis(es).

It turns out, there is no `left`

or `right`

option as `pd.concat`

can handle more than just two objects to merge.

In the case of `d1`

and `d2`

, the options look like:

`outer`

```
pd.concat([d1, d2], axis=1, join='outer')
A B C B C D
1 NaN NaN NaN 0.4 0.5 0.6
2 0.1 0.2 0.3 0.4 0.5 0.6
3 0.1 0.2 0.3 NaN NaN NaN
```

`inner`

```
pd.concat([d1, d2], axis=1, join='inner')
A B C B C D
2 0.1 0.2 0.3 0.4 0.5 0.6
```

`join_axes`

Fourth argument is the thing that allows us to do our `left`

merge and more.

**join_axes**: list of Index objects

Specific indexes to use for the other n - 1 axes instead of performing inner/outer set logic.

### Left Merge

```
pd.concat([d1, d2, d3], axis=1, join_axes=[d1.index])
A B C B C D A B D
2 0.1 0.2 0.3 0.4 0.5 0.6 NaN NaN NaN
3 0.1 0.2 0.3 NaN NaN NaN 0.7 0.8 0.9
```

### Right Merge

```
pd.concat([d1, d2, d3], axis=1, join_axes=[d3.index])
A B C B C D A B D
1 NaN NaN NaN 0.4 0.5 0.6 0.7 0.8 0.9
3 0.1 0.2 0.3 NaN NaN NaN 0.7 0.8 0.9
```

`ignore_index`

**ignore_index**: boolean, default False

If True, do not use the index values along the concatenation axis. The resulting axis will be labeled 0, ..., n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information. Note the index values on the other axes are still respected in the join.

Like when I stack `d1`

on top of `d2`

, if I don't care about the index values, I could reset them or ignore them.

```
| pd.concat( | pd.concat(
| [d1, d2], | [d1, d2]
pd.concat([d1, d2]) | ignore_index=True) | ).reset_index(drop=True)
--------------------- | ----------------------- | -------------------------
A B C D | A B C D | A B C D
2 0.1 0.2 0.3 NaN | 0 0.1 0.2 0.3 NaN | 0 0.1 0.2 0.3 NaN
3 0.1 0.2 0.3 NaN | 1 0.1 0.2 0.3 NaN | 1 0.1 0.2 0.3 NaN
1 NaN 0.4 0.5 0.6 | 2 NaN 0.4 0.5 0.6 | 2 NaN 0.4 0.5 0.6
2 NaN 0.4 0.5 0.6 | 3 NaN 0.4 0.5 0.6 | 3 NaN 0.4 0.5 0.6
```

And when using `axis=1`

:

```
| pd.concat(
| [d1, d2], axis=1,
pd.concat([d1, d2], axis=1) | ignore_index=True)
------------------------------- | -------------------------------
A B C B C D | 0 1 2 3 4 5
1 NaN NaN NaN 0.4 0.5 0.6 | 1 NaN NaN NaN 0.4 0.5 0.6
2 0.1 0.2 0.3 0.4 0.5 0.6 | 2 0.1 0.2 0.3 0.4 0.5 0.6
3 0.1 0.2 0.3 NaN NaN NaN | 3 0.1 0.2 0.3 NaN NaN NaN
```

`keys`

We can pass a list of scalar values or tuples in order to assign tuple or scalar values to corresponding MultiIndex. The length of the passed list must be the same length as the number of items being concatenated.

**keys**: sequence, default None

If multiple levels passed, should contain tuples. Construct hierarchical index using the passed keys as the outermost level

`axis=0`

When concatenating `Series`

objects along `axis=0`

(extending the index).

Those keys, become a new initial level of a `MultiIndex`

object in the index attribute.

```
# length 3 length 3 # length 2 length 2
# /--------\ /-----------\ # /----\ /------\
pd.concat([s1, s2, s3], keys=['A', 'B', 'C']) pd.concat([s1, s2], keys=['A', 'B'])
---------------------------------------------- -------------------------------------
A 2 1 A 2 1
3 2 3 2
B 1 3 B 1 3
2 4 2 4
C 1 5 dtype: int64
3 6
dtype: int64
```

However, we can use more than scalar values in the `keys`

argument to create an even deeper `MultiIndex`

. Here we pass `tuples`

of length 2 the prepend two new levels of a `MultiIndex`

:

```
pd.concat(
[s1, s2, s3],
keys=[('A', 'X'), ('A', 'Y'), ('B', 'X')])
-----------------------------------------------
A X 2 1
3 2
Y 1 3
2 4
B X 1 5
3 6
dtype: int64
```

`axis=1`

It's a bit different when extending along columns. When we used `axis=0`

(see above) our `keys`

acted as `MultiIndex`

levels in addition to the existing index. For `axis=1`

, we are referring to an axis that `Series`

objects don't have, namely the `columns`

attribute.

Variations of Two

`Series`

wtih

`axis=1`

Notice that naming the `s1`

and `s2`

matters so long as no `keys`

are passed, but it gets overridden if `keys`

are passed.

```
| | | pd.concat(
| pd.concat( | pd.concat( | [s1.rename('U'),
pd.concat( | [s1, s2], | [s1.rename('U'), | s2.rename('V')],
[s1, s2], | axis=1, | s2.rename('V')], | axis=1,
axis=1) | keys=['X', 'Y']) | axis=1) | keys=['X', 'Y'])
-------------- | --------------------- | ---------------------- | ----------------------
0 1 | X Y | U V | X Y
1 NaN 3.0 | 1 NaN 3.0 | 1 NaN 3.0 | 1 NaN 3.0
2 1.0 4.0 | 2 1.0 4.0 | 2 1.0 4.0 | 2 1.0 4.0
3 2.0 NaN | 3 2.0 NaN | 3 2.0 NaN | 3 2.0 NaN
```

`MultiIndex`

with

`Series`

and

`axis=1`

```
pd.concat(
[s1, s2],
axis=1,
keys=[('W', 'X'), ('W', 'Y')])
-----------------------------------
W
X Y
1 NaN 3.0
2 1.0 4.0
3 2.0 NaN
```

Two

`DataFrame`

with

`axis=1`

As with the `axis=0`

examples, `keys`

add levels to a `MultiIndex`

, but this time to the object stored in the `columns`

attribute.

```
pd.concat( | pd.concat(
[d1, d2], | [d1, d2],
axis=1, | axis=1,
keys=['X', 'Y']) | keys=[('First', 'X'), ('Second', 'X')])
------------------------------- | --------------------------------------------
X Y | First Second
A B C B C D | X X
1 NaN NaN NaN 0.4 0.5 0.6 | A B C B C D
2 0.1 0.2 0.3 0.4 0.5 0.6 | 1 NaN NaN NaN 0.4 0.5 0.6
3 0.1 0.2 0.3 NaN NaN NaN | 2 0.1 0.2 0.3 0.4 0.5 0.6
| 3 0.1 0.2 0.3 NaN NaN NaN
```

`Series`

and

`DataFrame`

with

`axis=1`

This is tricky. In this case, a scalar key value cannot act as the only level of index for the `Series`

object when it becomes a column while also acting as the first level of a `MultiIndex`

for the `DataFrame`

. So Pandas will again use the `name`

attribute of the `Series`

object as the source of the column name.

```
pd.concat( | pd.concat(
[s1, d1], | [s1.rename('Z'), d1],
axis=1, | axis=1,
keys=['X', 'Y']) | keys=['X', 'Y'])
--------------------- | --------------------------
X Y | X Y
0 A B C | Z A B C
2 1 0.1 0.2 0.3 | 2 1 0.1 0.2 0.3
3 2 0.1 0.2 0.3 | 3 2 0.1 0.2 0.3
```

Limitations of

`keys`

and

`MultiIndex`

inferrence.

Pandas only seems to infer column names from `Series`

name, but it will not fill in the blanks when doing an analogous concatenation among data frames with a different number of column levels.

```
d1_ = pd.concat(
[d1], axis=1,
keys=['One'])
d1_
One
A B C
2 0.1 0.2 0.3
3 0.1 0.2 0.3
```

Then concatenate this with another data frame with only one level in the columns object and Pandas will refuse to try and make tuples of the `MultiIndex`

object and combine all data frames as if a single level of objects, scalars and tuples.

```
pd.concat([d1_, d2], axis=1)
(One, A) (One, B) (One, C) B C D
1 NaN NaN NaN 0.4 0.5 0.6
2 0.1 0.2 0.3 0.4 0.5 0.6
3 0.1 0.2 0.3 NaN NaN NaN
```

### Passing a `dict`

instead of a `list`

When passing a dictionary, `pandas.concat`

will use the keys from the dictionary as the `keys`

parameter.

```
# axis=0 | # axis=1
pd.concat( | pd.concat(
{0: d1, 1: d2}) | {0: d1, 1: d2}, axis=1)
----------------------- | -------------------------------
A B C D | 0 1
0 2 0.1 0.2 0.3 NaN | A B C B C D
3 0.1 0.2 0.3 NaN | 1 NaN NaN NaN 0.4 0.5 0.6
1 1 NaN 0.4 0.5 0.6 | 2 0.1 0.2 0.3 0.4 0.5 0.6
2 NaN 0.4 0.5 0.6 | 3 0.1 0.2 0.3 NaN NaN NaN
```

`levels`

This is used in conjunction with the `keys`

argument.When `levels`

is left as its default value of `None`

, Pandas will take the unique values of each level of the resulting `MultiIndex`

and use that as the object used in the resulting `index.levels`

attribute.

**levels**: list of sequences, default None

Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys.

If Pandas already infers what these levels should be, what advantage is there to specify it ourselves? I'll show one example and leave it up to you to think up other reasons why this might be useful.

### Example

Per the documentation, the `levels`

argument is a list of sequences. This means that we can use another `pandas.Index`

as one of those sequences.

Consider the data frame `df`

that is the concatenation of `d1`

, `d2`

and `d3`

:

```
df = pd.concat(
[d1, d2, d3], axis=1,
keys=['First', 'Second', 'Fourth'])
df
First Second Fourth
A B C B C D A B D
1 NaN NaN NaN 0.4 0.5 0.6 0.7 0.8 0.9
2 0.1 0.2 0.3 0.4 0.5 0.6 NaN NaN NaN
3 0.1 0.2 0.3 NaN NaN NaN 0.7 0.8 0.9
```

The levels of the columns object are:

```
print(df, *df.columns.levels, sep='\n')
Index(['First', 'Second', 'Fourth'], dtype='object')
Index(['A', 'B', 'C', 'D'], dtype='object')
```

If we use `sum`

within a `groupby`

we get:

```
df.groupby(axis=1, level=0).sum()
First Fourth Second
1 0.0 2.4 1.5
2 0.6 0.0 1.5
3 0.6 2.4 0.0
```

But what if instead of `['First', 'Second', 'Fourth']`

there were another missing categories named `Third`

and `Fifth`

? And I wanted them included in the results of a `groupby`

aggregation? We can do this if we had a `pandas.CategoricalIndex`

. And we can specify that ahead of time with the `levels`

argument.

So instead, let's define `df`

as:

```
cats = ['First', 'Second', 'Third', 'Fourth', 'Fifth']
lvl = pd.CategoricalIndex(cats, categories=cats, ordered=True)
df = pd.concat(
[d1, d2, d3], axis=1,
keys=['First', 'Second', 'Fourth'],
levels=[lvl]
)
df
First Fourth Second
1 0.0 2.4 1.5
2 0.6 0.0 1.5
3 0.6 2.4 0.0
```

But the first level of the columns object is:

```
df.columns.levels[0]
CategoricalIndex(
['First', 'Second', 'Third', 'Fourth', 'Fifth'],
categories=['First', 'Second', 'Third', 'Fourth', 'Fifth'],
ordered=True, dtype='category')
```

And our `groupby`

summation looks like:

```
df.groupby(axis=1, level=0).sum()
First Second Third Fourth Fifth
1 0.0 1.5 0.0 2.4 0.0
2 0.6 1.5 0.0 0.0 0.0
3 0.6 0.0 0.0 2.4 0.0
```

`names`

This is used to name the levels of a resulting `MultiIndex`

. The length of the `names`

list should match the number of levels in the resulting `MultiIndex`

.

**names**: list, default None

Names for the levels in the resulting hierarchical index

```
# axis=0 | # axis=1
pd.concat( | pd.concat(
[d1, d2], | [d1, d2],
keys=[0, 1], | axis=1, keys=[0, 1],
names=['lvl0', 'lvl1']) | names=['lvl0', 'lvl1'])
----------------------------- | ----------------------------------
A B C D | lvl0 0 1
lvl0 lvl1 | lvl1 A B C B C D
0 2 0.1 0.2 0.3 NaN | 1 NaN NaN NaN 0.4 0.5 0.6
3 0.1 0.2 0.3 NaN | 2 0.1 0.2 0.3 0.4 0.5 0.6
1 1 NaN 0.4 0.5 0.6 | 3 0.1 0.2 0.3 NaN NaN NaN
2 NaN 0.4 0.5 0.6 |
```

`verify_integrity`

Self explanatory documentation

**verify_integrity**: boolean, default False

Check whether the new concatenated axis contains duplicates. This can be very expensive relative to the actual data concatenation.

Because the resulting index from concatenating `d1`

and `d2`

is not unique, it would fail the integrity check.

```
pd.concat([d1, d2])
A B C D
2 0.1 0.2 0.3 NaN
3 0.1 0.2 0.3 NaN
1 NaN 0.4 0.5 0.6
2 NaN 0.4 0.5 0.6
```

And

```
pd.concat([d1, d2], verify_integrity=True)
```

>
ValueError: Indexes have overlapping values: [2]