Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.

The name comes from PANel DAta. pandas is implemented primarily using NumPy and Cython, and it is designed to integrate easily with NumPy-based scientific libraries such as statsmodels.

To create a reproducible Pandas example:

Main Features:

  • Data structures for one- and two-dimensional labeled datasets (Series and DataFrame, respectively). Their main features include:
    • Automatic data alignment and interpolation
    • Handling of missing observations in calculations
    • Convenient slicing and reshaping ("reindexing") functions
    • Categorical data types
    • 'Group by' aggregation and transformation
    • Tools for merging and joining datasets
    • Simple Matplotlib integration for plotting and graphing
    • Hierarchical indexing (MultiIndex), which lets a one- or two-dimensional structure represent data with an arbitrary number of dimensions.
  • Date tools: objects for expressing date offsets and generating date ranges. Dates can be localized to a specific time zone and converted or compared as needed.
  • Statistical models: ordinary least squares and panel OLS implementations for in-sample or rolling time-series and cross-sectional regressions. (These were removed in later pandas releases in favor of statsmodels.)
  • Cython-optimized internals, so that complex computations run quickly.
  • Static and moving-window statistics: mean, standard deviation, correlation, and covariance
  • Extensive user documentation, built with Sphinx
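A short sketch of several features from the list above: index alignment, missing-value handling, group-by, merging, time-zone-aware date ranges, and a moving-window statistic. The data and column names are made up for illustration; this is not an excerpt from the pandas documentation.

```python
import numpy as np
import pandas as pd

# Two Series with partially overlapping labels: arithmetic aligns on the index.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])
aligned = s1 + s2  # "a" and "d" have no partner, so they become NaN

# Missing values are skipped by default in reductions such as sum().
total = aligned.sum()  # 12.0 + 23.0

# Group-by aggregation on a small (illustrative) DataFrame.
df = pd.DataFrame({"key": ["x", "y", "x", "y"], "value": [1, 2, 3, 4]})
grouped = df.groupby("key")["value"].sum()

# Merging two tables on a shared column.
labels = pd.DataFrame({"key": ["x", "y"], "label": ["first", "second"]})
merged = df.merge(labels, on="key", how="left")

# A time-zone-aware date range and a moving (rolling) mean over it.
dates = pd.date_range("2024-01-01", periods=5, freq="D", tz="UTC")
ts = pd.Series(np.arange(5.0), index=dates)
rolling_mean = ts.rolling(window=2).mean()  # first value is NaN (window not yet full)
```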

Asking Questions:

  • Before asking a question, make sure you have worked through the "10 Minutes to pandas" introduction, which covers the basic functionality of pandas.
  • For guidance on writing good questions, see: How to make good reproducible pandas examples
  • Please include your pandas and NumPy versions and platform details (where relevant) in your question.
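A minimal sketch of what a reproducible question might contain, assuming a small made-up DataFrame: sample data defined in code, plus the version details requested above. `pd.show_versions()` prints a longer report (Python, OS, and dependency versions) that can be pasted as-is.

```python
import numpy as np
import pandas as pd

# Sample data defined in code, so anyone can copy, paste, and run it.
df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})
print(df)

# Short version line for the question body.
print("pandas", pd.__version__, "| numpy", np.__version__)

# Fuller report: Python, OS, and versions of pandas dependencies.
pd.show_versions()

# For data you already have, to_dict() gives a literal that can be pasted
# back into pd.DataFrame(...) to recreate a small frame.
print(df.to_dict())
```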

Answering Questions:

Useful Canonicals:

More FAQs are at this link.

Resources and Tutorials:

Books:

282843 questions
3945 votes, 33 answers

How to iterate over rows in a DataFrame in Pandas

I have a pandas dataframe, df:
       c1   c2
    0  10  100
    1  11  110
    2  12  120
How do I iterate over the rows of this dataframe? For every row, I want to access its elements (values in cells) by the name of the columns. For example: for row in…
Roman
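A sketch of the common answers, using the `c1`/`c2` frame from the question: `itertuples()` gives attribute access by column name, `iterrows()` gives a Series per row, and a vectorized expression often removes the need to loop. The `total` column is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# itertuples() yields namedtuples; columns are accessed as attributes by name.
for row in df.itertuples(index=True):
    print(row.Index, row.c1, row.c2)

# iterrows() yields (index, Series) pairs; it is slower and may not
# preserve dtypes, because each row becomes a single Series.
for idx, row in df.iterrows():
    print(idx, row["c1"], row["c2"])

# Often a vectorized column expression avoids the loop entirely.
df["total"] = df["c1"] + df["c2"]
```

`itertuples()` is usually the faster of the two loops; prefer the vectorized form whenever the per-row logic can be written as column operations.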
3394 votes, 17 answers

How do I select rows from a DataFrame based on column values?

How can I select rows from a DataFrame based on values in some column in Pandas? In SQL, I would use: SELECT * FROM table WHERE column_name = some_value
szli
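The SQL `WHERE` clause maps to a boolean mask in pandas. A minimal sketch, assuming a hypothetical frame with columns `column_name` and `other`:

```python
import pandas as pd

df = pd.DataFrame({"column_name": ["a", "b", "a"], "other": [1, 2, 3]})
some_value = "a"

# Boolean indexing: SELECT * FROM table WHERE column_name = some_value
selected = df.loc[df["column_name"] == some_value]

# Equivalent with query(); "@" refers to a local Python variable.
selected_q = df.query("column_name == @some_value")

# Combine conditions with & and |, parenthesizing each comparison.
both = df.loc[(df["column_name"] == "a") & (df["other"] > 1)]
```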
2886 votes, 36 answers

Renaming column names in Pandas

I want to change the column labels of a Pandas DataFrame from ['$a', '$b', '$c', '$d', '$e'] to ['a', 'b', 'c', 'd', 'e']
user1504276
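A sketch of three common ways to relabel the columns from the question: assign a full list, pass a function to `rename()`, or pass a mapping that renames only some columns. The single row of data is made up.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["$a", "$b", "$c", "$d", "$e"])

# Option 1: assign a new list of labels (its length must match the column count).
df1 = df.copy()
df1.columns = ["a", "b", "c", "d", "e"]

# Option 2: rename() with a function applied to every label; returns a new DataFrame.
df2 = df.rename(columns=lambda name: name.lstrip("$"))

# Option 3: rename() with a mapping changes only the listed columns.
df3 = df.rename(columns={"$a": "a"})
```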
2168 votes, 22 answers

Delete a column from a Pandas DataFrame

To delete a column in a DataFrame, I can successfully use:
    del df['column_name']
But why can't I use the following?
    del df.column_name
Since it is possible to access the Series via df.column_name, I expected this to work.
John
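Attribute access such as `df.column_name` is a read convenience; attribute-style deletion is not supported, so removal goes through item syntax or a method. A sketch of the usual alternatives on a hypothetical three-column frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Item-style deletion works in place.
del df["a"]

# drop() returns a new DataFrame by default; it accepts several labels at once.
df_without_b = df.drop(columns=["b"])

# pop() removes the column in place and returns it as a Series.
c_values = df.pop("c")
```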
1829 votes, 19 answers

How do I get the row count of a Pandas DataFrame?

How do I get the number of rows of a pandas dataframe df?
yemu
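A sketch of the usual row-count idioms on a small made-up frame. Note the difference between counting rows and counting non-missing values in a column.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, None], "y": ["a", "b", "c"]})

n_rows = len(df)              # number of rows (length of the index)
n_rows_alt = df.shape[0]      # shape is (rows, columns)
n_non_null = df["x"].count()  # count() excludes missing values, so it can be smaller
```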
1700 votes, 24 answers

Selecting multiple columns in a Pandas dataframe

How do I select columns a and b from df, and save them into a new dataframe df1?
    index  a  b  c
    1      2  3  4
    2      3  4  5
Unsuccessful attempt:
    df1 = df['a':'b']
    df1 = df.ix[:, 'a':'b']
user1234440
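For reference, `df['a':'b']` slices rows rather than columns, and `.ix` was removed in pandas 1.0. A sketch of two working alternatives on the frame from the question:

```python
import pandas as pd

df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# Pass a list of column labels (note the double brackets).
df1 = df[["a", "b"]].copy()  # .copy() makes df1 independent of df

# Label-based column slicing uses .loc; both endpoints are included.
df1_loc = df.loc[:, "a":"b"]
```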
1579 votes, 41 answers

How to change the order of DataFrame columns?

I have the following DataFrame (df):
    import numpy as np
    import pandas as pd
    df = pd.DataFrame(np.random.rand(10, 5))
I add more column(s) by assignment:
    df['mean'] = df.mean(1)
How can I move the column mean to the front, i.e. set it as first…
Timmie
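One common approach, sketched on the setup from the question: build the desired column order as a list and select with it. An in-place alternative is `df.insert(0, "mean", df.pop("mean"))`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))
df["mean"] = df.mean(axis=1)

# Put "mean" first, keeping the other columns in their existing order.
cols = ["mean"] + [c for c in df.columns if c != "mean"]
df = df[cols]
```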
1475 votes, 16 answers

Change column type in pandas

I created a DataFrame from a list of lists:
    table = [
        ['a', '1.2', '4.2' ],
        ['b', '70', '0.03'],
        ['x', '5', '0' ],
    ]
    df = pd.DataFrame(table)
How do I convert the columns to specific types? In this case, I want to convert…
user1642513
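A sketch of two common conversions, assuming the goal is to make columns 1 and 2 of the question's frame numeric: `astype()` with a per-column mapping, and `pd.to_numeric()` for data that may contain unparseable strings. The `"oops"` value is made up to show coercion.

```python
import pandas as pd

table = [
    ["a", "1.2", "4.2"],
    ["b", "70", "0.03"],
    ["x", "5", "0"],
]
df = pd.DataFrame(table)

# astype() with a {column: dtype} mapping; raises if any value cannot be parsed.
df = df.astype({1: float, 2: float})

# to_numeric() is more forgiving: errors="coerce" turns bad values into NaN.
s = pd.to_numeric(pd.Series(["1", "2", "oops"]), errors="coerce")
```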
1402 votes, 15 answers

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

I have this DataFrame and want only the records whose EPS column is not NaN: >>> df STK_ID EPS cash STK_ID RPT_Date 601166 20111231 601166 NaN NaN 600036 20111231 600036 NaN 12 600016 20111231 600016 …
bigbug
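A minimal sketch with made-up `EPS`/`cash` values: `dropna(subset=...)` keeps only rows where the named column is present, and a `notna()` mask does the same thing explicitly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"EPS": [np.nan, np.nan, 1.5], "cash": [np.nan, 12.0, 3.0]})

# Keep only the rows whose EPS is not NaN (NaN in other columns is ignored).
kept = df.dropna(subset=["EPS"])

# Equivalent boolean mask.
kept_mask = df[df["EPS"].notna()]
```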
1358 votes, 32 answers

Create a Pandas Dataframe by appending one row at a time

How do I create an empty DataFrame, then add rows, one by one? I created an empty DataFrame:
    df = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
Then I can add a new row at the end and fill a single field with:
    df = df._set_value(index=len(df),…
PhE
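A sketch of the pattern usually recommended instead of growing a DataFrame row by row: collect rows in a plain list and build the frame once, since each per-row append or concat copies the existing data. The leading underscore in `_set_value` marks it as private API, and `DataFrame.append` was removed in pandas 2.0; `pd.concat` covers the occasional single-row addition. Row values here are hypothetical.

```python
import pandas as pd

# Collect rows as dicts, then construct the DataFrame in one step.
rows = []
for i in range(3):
    rows.append({"lib": f"name{i}", "qty1": i, "qty2": i * 10})
df = pd.DataFrame(rows, columns=["lib", "qty1", "qty2"])

# Adding one extra row to an existing frame with pd.concat.
new_row = pd.DataFrame([{"lib": "name3", "qty1": 3, "qty2": 30}])
df = pd.concat([df, new_row], ignore_index=True)
```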
1354 votes, 21 answers

How to deal with SettingWithCopyWarning in Pandas

Background: I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now the application emits many new warnings. One of them looks like this:
    E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a…
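A sketch of the usual fixes on a hypothetical two-column frame: perform the row selection and the assignment in a single `.loc` call, or take an explicit `.copy()` when an independent subset is intended. Behavior around this warning has changed across pandas versions (newer releases move toward copy-on-write), so treat this as a general pattern rather than version-specific guidance.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained indexing such as df[df["a"] > 1]["b"] = 0 may write to a temporary
# copy and leave df unchanged; that pattern is what triggers the warning.

# Select and assign in one .loc call so the original frame is modified.
df.loc[df["a"] > 1, "b"] = 0

# For an independent subset, take an explicit copy before modifying it.
subset = df[df["a"] > 1].copy()
subset["b"] = -1  # changes only subset
```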
1331 votes, 24 answers

Get a list from Pandas DataFrame column headers

I want to get a list of the column headers from a Pandas DataFrame. The DataFrame will come from user input, so I won't know how many columns there will be or what they will be called. For example, if I'm given a DataFrame like this: y gdp …
natsuki_2002
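A minimal sketch: `df.columns` is an Index, and either `list()` or `.tolist()` turns it into a plain Python list regardless of how many columns there are. The `cap` column is a hypothetical addition to the question's `y`/`gdp` example.

```python
import pandas as pd

df = pd.DataFrame({"y": [1], "gdp": [2], "cap": [3]})

headers = list(df.columns)         # plain Python list
headers_alt = df.columns.tolist()  # same result via the Index method
```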
1296 votes, 8 answers

Use a list of values to select rows from a Pandas dataframe

Let’s say I have the following Pandas dataframe:
    df = DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3, 5]})
    df
       A  B
    0  5  1
    1  6  2
    2  3  3
    3  4  5
I can subset based on a specific value:
    x = df[df['A'] == 3]
    x
       A  B
    2  3 …
zach
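Extending the question's single-value filter to a list of values is usually done with `isin()`. A sketch on the question's data, where `list_of_values` is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 3, 4], "B": [1, 2, 3, 5]})
list_of_values = [3, 6]

# isin() builds a boolean mask that is True where A is in the list.
x = df[df["A"].isin(list_of_values)]

# Negate the mask with ~ to exclude those values instead.
y = df[~df["A"].isin(list_of_values)]
```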
1295 votes, 32 answers

How to add a new column to an existing DataFrame?

I have the following indexed DataFrame with named columns and non-consecutive row numbers:
              a         b         c         d
    2  0.671399  0.101208 -0.181532  0.241273
    3  0.446172 -0.243316  0.051767  1.577318
    5  0.614758  0.075793 -0.451460…
tomasz74
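A sketch of how new columns interact with a non-consecutive index like the one in the question (values rounded and illustrative; `e`, `f`, `g` are hypothetical column names): a list is assigned by position, a Series is aligned by index label, and `assign()` returns a new frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [0.67, 0.45, 0.61], "b": [0.10, -0.24, 0.08]},
    index=[2, 3, 5],  # non-consecutive row labels, as in the question
)

# A list or array is assigned positionally; its length must match.
df["e"] = [1.0, 2.0, 3.0]

# A Series is aligned on index labels, not position: label 3 gets NaN here.
df["f"] = pd.Series([100, 300], index=[2, 5])

# assign() returns a new DataFrame, which is convenient in method chains.
df2 = df.assign(g=df["a"] * 2)
```

The label alignment in the Series case is the usual source of unexpected NaNs when the index is not 0..n-1.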
1178 votes, 16 answers

"Large data" workflows using pandas

I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work and it is great for its out-of-core support. However, SAS is horrible as a piece of software for numerous other…
Zelazny7
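One common technique for data that does not fit in memory is chunked reading with `read_csv(chunksize=...)`, combining partial aggregates as you go. This is a sketch of that single technique, not a full answer to the workflow question; `io.StringIO` stands in for a large CSV file, and the data is made up.

```python
import io

import pandas as pd

# Stand-in for a large CSV file on disk (hypothetical data).
csv_text = "key,value\n" + "\n".join(f"{i % 3},{i}" for i in range(10))

# Read a few rows at a time so only one chunk is in memory at once,
# computing a partial aggregate for each chunk.
partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    partials.append(chunk.groupby("key")["value"].sum())

# Sums are combinable: the total is the sum of the per-chunk sums.
totals = pd.concat(partials).groupby(level=0).sum()
```

This works for aggregates that can be merged (sums, counts, min/max); a mean, for example, needs per-chunk sums and counts combined separately.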