1829

How do I get the number of rows of a pandas dataframe df?

Mateen Ulhaq
yemu
  • 21
    OK, I found out: I should have called the method, not accessed it as a property, so it should be df.count(), not df.count – yemu Apr 11 '13 at 08:15
  • 99
    ^ Dangerous! Beware that `df.count()` will only return the count of non-NA/NaN rows for each column. You should use `df.shape[0]` instead, which will always correctly tell you the number of rows. – smci Apr 18 '14 at 12:04
  • 7
    Note that df.count will not return an int when the dataframe is empty (e.g., pd.DataFrame(columns=["Blue","Red"]).count is not 0) – Marcelo Bielsa Sep 01 '15 at 03:32
  • 2
    You could use df.info() to get the row count (# entries), the number of non-null entries in each column, dtypes and memory usage. It gives a good, complete picture of the df. If you're looking for a number you can use programmatically, then use df.shape[0]. – MikeB2019x May 04 '22 at 20:06

19 Answers

2715

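For a dataframe df, one can use any of the following:

  • len(df.index)
  • df.shape[0]
  • df[df.columns[0]].count() (slowest; counts only the non-NaN values in the first column)

[Figure: performance plot]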


Code to reproduce the plot:

import numpy as np
import pandas as pd
import perfplot

perfplot.save(
    "out.png",
    setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)),
    n_range=[2**k for k in range(25)],
    kernels=[
        lambda df: len(df.index),
        lambda df: df.shape[0],
        lambda df: df[df.columns[0]].count(),
    ],
    labels=["len(df.index)", "df.shape[0]", "df[df.columns[0]].count()"],
    xlabel="Number of rows",
)
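
For a quick check without plotting, here is a minimal sketch of how the three expressions benchmarked above behave. The 4x2 frame and its column names are made up for illustration; the first column deliberately contains one NaN so that the count-based variant differs from the other two.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x2 frame; column "a" (the first column) holds one NaN.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": list("wxyz")})

rows_by_index = len(df.index)              # number of index entries
rows_by_shape = df.shape[0]                # first element of (rows, cols)
non_nan_first = df[df.columns[0]].count()  # skips the NaN in column "a"

print(rows_by_index, rows_by_shape, non_nan_first)  # 4 4 3
```

The first two always report the true row count; the third matches them only when the first column has no missing values.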
Mateen Ulhaq
root
  • 29
    There's one good reason why to use `shape` in interactive work, instead of len(df): Trying out different filtering, I often need to know how many items remain. With shape I can see that just by adding .shape after my filtering. With len() the editing of the command-line becomes much more cumbersome, going back and forth. – K.-Michael Aye Feb 25 '14 at 04:51
  • 12
    Won't work for OP, but if you just need to know whether the dataframe is empty, `df.empty` is the best option. – jtschoonhoven Mar 16 '16 at 21:26
  • 22
    I know it's been a while, but isn't len(df.index) takes 381 nanoseconds, or 0.381 microseconds, df.shape is 3 times slower, taking 1.17 microseconds. did I miss something? @root – T.G. May 22 '17 at 18:34
  • From my testing df.shape[0] and len(df.index) gave the same performance. df.shape was ever so slightly quicker. – Reddspark Jun 25 '17 at 15:20
  • 14
    (3,3) matrix is bad example as it does not show the order of the shape tuple – xaedes Aug 15 '17 at 16:42
  • Don't forget to answer the actual question; the answer is `df.shape[0]` not `df.shape` which gives a tuple, and as xaedes said, best to pick an example where nrows != ncols – smci Feb 16 '18 at 01:24
  • 9
    How is `df.shape[0]` faster than `len(df)` or `len(df.columns)`? Since **1 ns** (nanosecond) = **1000 µs** (microsecond), therefore 1.17µs = 1170ns, which means it's roughly 3 times slower than 381ns – itsjef Mar 24 '18 at 03:19
  • 5
    @itsjef You have the conversion backward: 1 **μs** = 1000 **ns**. But your point is correct, `len(df.index)` is actually faster. – jared Apr 24 '18 at 19:23
  • 3
    Updated answer to reflect fact that `len(df.index)` is the fastest method. – halloleo Sep 18 '18 at 05:31
  • 2
    looks like len(df) is the fastest. – Decula Jan 22 '19 at 18:59
  • 4
    And what's the difference between `len(df)` and `len(df.index)`? Why type extra for `df.index`? – NoName Feb 03 '20 at 23:06
  • Suspected [`pandas.Index.size`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.size.html) would actually be faster than `len(df.index)` but `timeit` on my computer tells me otherwise (~150 ns slower per loop). – gosuto Feb 24 '20 at 15:08
  • 3
    For the record: `len(df)` is slower than `len(df.index)` (@Decula, @NoName). – gosuto Feb 24 '20 at 15:11
  • 5
    I have tested on Python3, @halloleo is right, len(df.index) is about twice as fast as df.shape[0]. On the other hand, len(df) sometimes returns the number of columns rather than rows, depending on the dataframe's current internal format. – xuancong84 Jul 15 '20 at 05:14
  • @MateenUlhaq Can you explain what "slowest, but avoids counting NaN values in the first column" from your edit mean? Would be helpful to also include in the answer when can one expect the difference compared to other methods (with example). – Karol Zlot Oct 06 '21 at 06:36
  • 2
    @KarolZlot Improved the wording. `df[df.columns[0]]` returns the 0th column. `.count()` measures the number of non-NaN values in the given column. The first two methods are what I would usually use -- the third is only useful in rare cases. – Mateen Ulhaq Oct 06 '21 at 23:55
  • Does len(df.index) work correctly even df has mixed indexes? – justRandomLearner Aug 08 '22 at 10:13
  • 4
    Very complicated answer which does not even state `len(df)` – run_the_race Aug 20 '22 at 20:06
  • My edits were rejected but len(df) should be included and the preferred answer. That and shape seem to be the only ones needed in a live interactive session. All of them finish in under a second at reasonable data sizes so outside of high performance production code there's no point complicating things. I don't think it's reasonable to expect people to interpret "10^-2 seconds" in a timing graph. – Subatomic Tripod Feb 22 '23 at 18:30
479

Suppose df is your dataframe then:

count_row = df.shape[0]  # Gives number of rows
count_col = df.shape[1]  # Gives number of columns

Or, more succinctly,

r, c = df.shape
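
As a small illustration of the tuple order, here is a sketch on a frame where the number of rows and columns differ (the frame and its column names are invented for the example):

```python
import pandas as pd

# Hypothetical frame with 4 rows and 2 columns, so the two counts cannot be confused.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})

n_rows, n_cols = df.shape  # unpack (rows, columns)
print(df.shape)            # (4, 2)
print(n_rows, n_cols)      # 4 2
```

Rows always come first in the tuple, so df.shape[0] is the row count.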
Peter Mortensen
Nasir Shah
  • 20
    If the data set is large, len(df.index) is significantly faster than df.shape[0] if you need only row count. I tested it. – Sumit Pokhrel Jan 02 '20 at 14:47
  • 2
    Why i do not have shape method on my DataFrame? – Ardalan Shahgholi Oct 06 '20 at 20:00
  • 2
    @ArdalanShahgholi it's probably because what was returned is a series, which is always 1 dimensional. Therefore, only `len(df.index)` will work – Connor Aug 01 '21 at 23:54
  • @Connor I need to have Number of rows and number of Columns from my DF. In my DF also i have a select it means i have a table and now the question is why i do not have SHAPE function on my DF? – Ardalan Shahgholi Aug 17 '21 at 18:41
  • Great question, make it a separate question on SO, share what you’ve tried and what you see as a result (give a full working set of code that’s simple for others to replicate) and then share the link to that question here. I’ll see if I can help – Connor Aug 19 '21 at 20:06
  • @ArdalanShahgholi `shape` is not a function is an attribute you can discover that comparing `df.shape` with `df.shape()` in your df. – rubengavidia0x Jan 26 '22 at 21:20
  • @connor `df.A.shape[0]` or `df.loc[:,'A'].shape[0]` work for series. – rubengavidia0x Jan 26 '22 at 21:21
270

Use len(df) :-).

__len__() is documented with "Returns length of index".

Timing info, set up the same way as in root's answer:

In [7]: timeit len(df.index)
1000000 loops, best of 3: 248 ns per loop

In [8]: timeit len(df)
1000000 loops, best of 3: 573 ns per loop

Due to one additional function call, it is of course correct to say that it is a bit slower than calling len(df.index) directly. But this should not matter in most cases. I find len(df) to be quite readable.
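
If you want to repeat the measurement outside IPython, a rough sketch with the standard-library timeit module could look like this. The frame size and repeat counts are arbitrary choices, the lambda wrappers add a little overhead to every candidate, and the absolute numbers will vary with the pandas version and the machine, so treat the output as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Same kind of frame as in the benchmark setup: n rows, 3 columns (n = 1000 here).
df = pd.DataFrame(np.arange(3000).reshape(1000, 3))

candidates = {
    "len(df.index)": lambda: len(df.index),
    "len(df)": lambda: len(df),
    "df.shape[0]": lambda: df.shape[0],
}

calls = 20_000
timings = {}
for label, func in candidates.items():
    # Best of 3 runs, converted to nanoseconds per call.
    best = min(timeit.repeat(func, number=calls, repeat=3))
    timings[label] = best / calls * 1e9
    print(f"{label:15s} {timings[label]:8.1f} ns per call")
```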

Dr. Jan-Philip Gehrcke
160

How do I get the row count of a Pandas DataFrame?

This table summarises the different situations in which you'd want to count something in a DataFrame (or Series, for completeness), along with the recommended method(s).

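[Image: summary table of the recommended counting method(s) for each situation; the same cases are worked through in the examples below]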

Footnotes

  1. DataFrame.count returns counts for each column as a Series since the non-null count varies by column.
  2. DataFrameGroupBy.size returns a Series, since all columns in the same group share the same row-count.
  3. DataFrameGroupBy.count returns a DataFrame, since the non-null count could differ across columns in the same group. To get the group-wise non-null count for a specific column, use df.groupby(...)['x'].count() where "x" is the column to count.

Minimal Code Examples

Below, I show examples of each of the methods described in the table above. First, the setup -

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('aabbc'), 'B': ['x', 'x', np.nan, 'x', np.nan]})
s = df['B'].copy()

df

   A    B
0  a    x
1  a    x
2  b  NaN
3  b    x
4  c  NaN

s

0      x
1      x
2    NaN
3      x
4    NaN
Name: B, dtype: object

Row Count of a DataFrame: len(df), df.shape[0], or len(df.index)

len(df)
# 5

df.shape[0]
# 5

len(df.index)
# 5

It seems silly to compare the performance of constant time operations, especially when the difference is on the level of "seriously, don't worry about it". But this seems to be a trend with other answers, so I'm doing the same for completeness.

Of the three methods above, len(df.index) (as mentioned in other answers) is the fastest.

Note

  • All the methods above are constant time operations as they are simple attribute lookups.
  • df.shape (similar to ndarray.shape) is an attribute that returns a tuple of (# Rows, # Cols). For example, df.shape returns (5, 2) for the example here.

Column Count of a DataFrame: df.shape[1], len(df.columns)

df.shape[1]
# 2

len(df.columns)
# 2

Analogous to len(df.index), len(df.columns) is the faster of the two methods (but takes more characters to type).

Row Count of a Series: len(s), s.size, len(s.index)

len(s)
# 5

s.size
# 5

len(s.index)
# 5

s.size and len(s.index) are about the same in terms of speed. But I recommend len(s).

Note size is an attribute, and it returns the number of elements (=count of rows for any Series). DataFrames also define a size attribute which returns the same result as df.shape[0] * df.shape[1].
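
A short sketch of the size attribute, rebuilding the same example frame so the snippet runs on its own. Note that size counts every cell, including the NaN ones.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("aabbc"), "B": ["x", "x", np.nan, "x", np.nan]})
s = df["B"]

print(s.size)   # 5, one element per row (NaN included)
print(df.size)  # 10, i.e. 5 rows * 2 columns
print(df.size == df.shape[0] * df.shape[1])  # True
```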

Non-Null Row Count: DataFrame.count and Series.count

The methods described here only count non-null values (meaning NaNs are ignored).

Calling DataFrame.count will return non-NaN counts for each column:

df.count()

A    5
B    3
dtype: int64

For Series, use Series.count to similar effect:

s.count()
# 3
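
If what you need is the number of rows with no missing value in any column (rather than a per-column count), one possible sketch is shown below. This is a related technique, not one of the methods from the table above; it rebuilds the same example frame so it can be run on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("aabbc"), "B": ["x", "x", np.nan, "x", np.nan]})

# True for rows where every column is non-null, then count the True values.
complete_rows = int(df.notna().all(axis=1).sum())
print(complete_rows)     # 3

# Equivalent, but builds an intermediate frame first.
print(len(df.dropna()))  # 3
```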

Group-wise Row Count: GroupBy.size

For DataFrames, use DataFrameGroupBy.size to count the number of rows per group.

df.groupby('A').size()

A
a    2
b    2
c    1
dtype: int64

Similarly, for Series, you'll use SeriesGroupBy.size.

s.groupby(df.A).size()

A
a    2
b    2
c    1
Name: B, dtype: int64

In both cases, a Series is returned. This makes sense for DataFrames as well, since all columns in the same group share the same row count.

Group-wise Non-Null Row Count: GroupBy.count

Similar to above, but use GroupBy.count, not GroupBy.size. Note that size always returns a Series, while count returns a Series if called on a specific column, or else a DataFrame.

The following methods return the same counts (the first result is named B, as in the output shown; the second is unnamed):

df.groupby('A')['B'].size()
df.groupby('A').size()

A
a    2
b    2
c    1
Name: B, dtype: int64

Meanwhile, for count, we have

df.groupby('A').count()

   B
A
a  2
b  1
c  0

...called on the entire GroupBy object, vs.,

df.groupby('A')['B'].count()

A
a    2
b    1
c    0
Name: B, dtype: int64

Called on a specific column.

cs95
77

TL;DR use len(df)

len() returns the number of items (the length) of a container such as a list, and it also works for dictionaries, strings, tuples and ranges. A DataFrame supports it too, so to get the row count, simply use len(df). For more about the len function, see the official page.


Alternatively, you can access the row labels and column labels with df.index and df.columns, respectively. Both are Index objects that support len(), so len(df.index) gives the number of rows and len(df.columns) gives the number of columns.

Or, you can use df.shape, which returns the number of rows and columns together as a tuple, and access each item by position: df.shape[0] for the number of rows and df.shape[1] for the number of columns.

Memin
  • 3
    @BrendanMetcalfe, I don't know what might be wrong with your dataframe without seeing its data. You can check the small script at the end to see that `len` indeed works well for getting row counts. Here is the script: https://onecompiler.com/python/3xc9nuvrx – Memin Sep 22 '21 at 19:19
  • I can't wrap my head around, why `df.shape` isn't faster than `len` as it just have to get the `shape` attribute and not call the function `__len__` – CutePoison Nov 01 '22 at 08:49
25

Apart from the previous answers, you can use df.axes to get the tuple with row and column indexes and then use the len() function:

total_rows = len(df.axes[0])
total_cols = len(df.axes[1])
Peter Mortensen
Nik
  • 5
    This returns index objects, which may or may not be copies of the original, which is wasteful if you are just discarding them after checking the length. Unless you intend to do anything else with the index, **DO NOT USE**. – cs95 Mar 30 '19 at 20:13
15

...building on Jan-Philip Gehrcke's answer.

The reason why len(df) or len(df.index) is faster than df.shape[0]:

Look at the code. df.shape is a @property that runs a DataFrame method calling len twice.

df.shape??
Type:        property
String form: <property object at 0x1127b33c0>
Source:
# df.shape.fget
@property
def shape(self):
    """
    Return a tuple representing the dimensionality of the DataFrame.
    """
    return len(self.index), len(self.columns)

And beneath the hood of len(df)

df.__len__??
Signature: df.__len__()
Source:
    def __len__(self):
        """Returns length of info axis, but here we use the index """
        return len(self.index)
File:      ~/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py
Type:      instancemethod

len(df.index) will be slightly faster than len(df) since it has one less function call, but both are always faster than df.shape[0].
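
Outside IPython, the same relationships can be checked with plain Python. This is only a sketch on a made-up 6x2 frame; it does not depend on the exact source shown above, which may differ between pandas versions.

```python
import pandas as pd

# Hypothetical frame with 6 rows and 2 columns.
df = pd.DataFrame({"x": range(6), "y": range(6)})

# `shape` is defined on the class as a property, so each access runs a small function.
print(isinstance(pd.DataFrame.shape, property))      # True

# The tuple it builds matches the lengths of the two axes.
print(df.shape == (len(df.index), len(df.columns)))  # True

# len(df) reports the length of the row index.
print(len(df) == len(df.index))                      # True
```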

Peter Mortensen
debo
  • 2
    The syntax highlighting does not seem quite right. Can you fix it? E.g., is this a mixture of output, code, and annotation (not a rhetorical question)? – Peter Mortensen Feb 08 '21 at 15:22
  • @PeterMortensen This output is from ipython/jupyter. Executing a function name with two question marks and without the parenthesis will show the function definition. ie for function `len()` you would execute `len??` – debo Apr 08 '21 at 04:04
11

I come to Pandas from an R background, and I find Pandas more complicated when it comes to selecting rows or columns.

I had to wrestle with it for a while, and then I found some ways to deal with it:

Getting the number of columns:

len(df.columns)
## Here:
# df is your DataFrame
# df.columns returns an Index holding the column labels of df.
# Then, "len()" gets the length of it.

Getting the number of rows:

len(df.index) # It's similar.
Peter Mortensen
Chau Pham
  • After using *Pandas* for a while, I think we should go with `df.shape`. It returns the number of rows and columns respectively. – Chau Pham Oct 29 '18 at 10:16
9

For a dataframe df:

When you're still writing your code:

  1. len(df)
  2. df.shape[0]

Fastest once your code is done:

  • len(df.index)

At normal data sizes each option will finish in under a second. So the "fastest" option is actually whichever one lets you work the fastest, which can be len(df) or df.shape[0] if you already have a subsetted df and want to just add .shape[0] briefly in an interactive session.

In final optimized code, the fastest runtime is len(df.index).

Performance plot

df[df.columns[0]].count() was omitted in the above discussion because no commenter has identified a case where it is useful. It is the slowest option, since its runtime grows with the number of rows while the others take constant time, and it is long to type. It provides the number of non-NaN values in the first column.

Code to reproduce the plot:

pip install pandas perfplot

import numpy as np
import pandas as pd
import perfplot

perfplot.save(
    "out.png",
    setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)),
    n_range=[2**k for k in range(25)],
    kernels=[
        lambda df: len(df.index),
        lambda df: len(df),
        lambda df: df.shape[0],
        lambda df: df[df.columns[0]].count(),
    ],
    labels=["len(df.index)", "len(df)", "df.shape[0]", "df[df.columns[0]].count()"],
    xlabel="Number of rows",
)
  • I've tried twice to improve the accepted answer and been rejected both times. The accepted answer is unclear and pointlessly verbose, not telling people the fastest option right off the bat. It also doesn't mention `len(df)` nor any purpose for `df[df.columns[0]].count()`. – Subatomic Tripod Feb 22 '23 at 18:25
8

In case you want to get the row count in the middle of a chained operation, you can use:

df.pipe(len)

Example:

row_count = (
      pd.DataFrame(np.random.rand(3,4))
      .reset_index()
      .pipe(len)
)

This can be useful if you don't want to put a long statement inside a len() function.

You could use __len__() instead but __len__() looks a bit weird.

Chris Tang
Allen Qin
  • 2
    It seems pointless to want to "pipe" this operation because there's nothing else you can pipe this into (it returns an integer). I would much rather `count = len(df.reset_index())` than `count = df.reset_index().pipe(len)`. The former is just an attribute lookup without the function call. – cs95 Mar 30 '19 at 20:15
8

You can do this also:

Let's say df is your dataframe. Then df.shape gives you the shape of your dataframe, i.e. (row, col).

Thus, use the assignments below to get the required values:

row = df.shape[0]
col = df.shape[1]
Peter Mortensen
Saurav
  • Or you can directly use `row, col = df.shape` instead if you need to get both at the same them (it's shorter and you do not have to care about indexes). – Nerxis May 17 '21 at 08:46
6

Either of these can do it (df is the name of the DataFrame):

Method 1: Using the len function:

len(df) will give the number of rows in a DataFrame named df.

Method 2: Using the count method:

df[col].count() will count the number of non-null values in a given column col. This equals the number of rows only if the column contains no NaN values.

df.count() will give the number of non-null values for each column.

Peter Mortensen
  • 5
    This is a fine answer, but there are already sufficient answers to this question, so this doesn't really add anything. – John Apr 24 '20 at 18:07
4

For dataframe df, a printed comma formatted row count used while exploring data:

def nrow(df):
    print("{:,}".format(df.shape[0]))

Example:

nrow(my_df)
12,456,789
Vlad
4

When using len(df) or len(df.index) you might encounter this error:

----> 4 df['id'] = np.arange(len(df.index)
TypeError: 'int' object is not callable

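This error usually means the name len has been rebound to an integer earlier in the session (for example, len = 5), so the built-in function is hidden. Deleting that variable (del len) makes the built-in reachable again.

A workaround that avoids len altogether:

length = df.shape[0]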
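
One common way to hit this TypeError is an earlier assignment that rebinds the name len to an integer. The sketch below reproduces that situation inside a helper function (add_ids_broken is a hypothetical name used only for this demonstration), so the shadowing stays local and the built-in is untouched afterwards. Whether this was the cause in the original session is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]})


def add_ids_broken(frame):
    """Hypothetical helper that shadows the built-in `len` with an int."""
    len = frame.shape[0]                       # `len` is now the integer 3
    frame["id"] = np.arange(len(frame.index))  # calling an int raises TypeError
    return frame


try:
    add_ids_broken(df.copy())
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'int' object is not callable

# With the built-in `len` intact, the original line works as expected.
df["id"] = np.arange(len(df.index))
print(df["id"].tolist())  # [0, 1, 2]
```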
Peter Mortensen
Lorenzo Bassetti
0

An alternative way to find the number of rows in a dataframe, which I think is the most readable variant, is pandas.Index.size, i.e. df.index.size.

Do note that, as I commented on the accepted answer,

Suspected pandas.Index.size would actually be faster than len(df.index) but timeit on my computer tells me otherwise (~150 ns slower per loop).

Peter Mortensen
gosuto
0

I'm not sure if this would work in every case (data could be omitted), but this may work:

*dataframe name*.tail(1)

You can then find the number of rows by running the code snippet and looking at the row label it shows. With the default 0-based index, the number of rows is that label plus one.
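
To show where reading the last row label works and where it does not, here is a small sketch on a made-up frame. After filtering, the surviving rows keep their original labels, so the last label no longer reflects the row count.

```python
import pandas as pd

# Hypothetical frame with the default 0-based index.
df = pd.DataFrame({"score": [5, 80, 12, 7, 95]})

last_label = df.tail(1).index[0]
print(last_label + 1, len(df))               # 5 5 (label + 1 matches the row count)

# Boolean filtering keeps the original labels 1 and 4.
high = df[df["score"] > 50]
print(high.tail(1).index[0] + 1, len(high))  # 5 2 (label + 1 no longer matches)
```

For anything beyond a quick visual check, len(df) or df.shape[0] is the safer choice.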

Peter Mortensen
Abhiraam Eranti
0

len(df.index) would work the fastest of all the ways listed

  • 1
    Why would that be? And do you have some performance measurements (incl. conditions, like hardware platform, all with versions)? – Peter Mortensen Oct 19 '22 at 01:38
0

df.index.stop will return the end of the index range, which equals the number of rows if the index is a RangeIndex that starts at 0 with a step of 1.

df.index.size will return the total number of rows.

You can use either one, but preferably the latter.
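
A brief sketch of why df.index.size is the safer of the two. The frames below are invented for the example: one has the default index, one has a RangeIndex that starts at 2, and one has string labels.

```python
import pandas as pd

# Default RangeIndex(0, 3): stop and size agree.
default = pd.DataFrame({"x": [1, 2, 3]})
print(default.index.stop, default.index.size)  # 3 3

# RangeIndex starting at 2: stop is 5, but there are still only 3 rows.
offset = pd.DataFrame({"x": [1, 2, 3]}, index=pd.RangeIndex(start=2, stop=5))
print(offset.index.stop, offset.index.size)    # 5 3

# A string-labelled index has no `stop` attribute at all; `size` still works.
labelled = pd.DataFrame({"x": [1, 2, 3]}, index=list("abc"))
print(hasattr(labelled.index, "stop"), labelled.index.size)  # False 3
```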

-1

Suppose the dataset is in the file data.csv; name the DataFrame data_fr and the number of rows nu_rows.

import pandas as pd

# Read the data into a DataFrame. The extension could be different (csv, xlsx, etc.), with the matching reader function.
data_fr = pd.read_csv('data.csv')

#print the number of rows
nu_rows = data_fr.shape[0]
print(nu_rows)
SamithaP