Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.

The name is derived from "panel data" (PAN-el DA-ta). pandas is implemented primarily using NumPy and Cython, and it is designed to integrate easily with NumPy-based scientific libraries such as statsmodels.

To create a reproducible Pandas example:
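
A minimal sketch of what such an example might contain: a small hand-typed input, the single operation in question, and the output you expect. The column names `key` and `value` and the data are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Small, hand-typed input that anyone can paste and run (hypothetical data).
df = pd.DataFrame({
    "key": ["a", "b", "a", "b"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# The operation being asked about, reduced to a single line.
result = df.groupby("key")["value"].sum()

# The output you expect, written out explicitly so answerers can compare.
expected = {"a": 4.0, "b": 6.0}

print(df)
print(result)
print("matches expected:", result.to_dict() == expected)
```

Building the frame inline, rather than reading a private file, is what lets other people run the code unchanged.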

Main Features:

  • Data structures for one- and two-dimensional labeled datasets (Series and DataFrame, respectively). Some of their main features include:
    • Automatic data alignment and interpolation
    • Handling of missing observations in calculations
    • Convenient slicing and reshaping ("reindexing") functions
    • Categorical data types
    • "Group by" aggregation and transformation functionality
    • Tools for merging and joining data sets
    • Simple Matplotlib integration for plotting and graphing
    • Multi-indexing, which gives indices a hierarchical structure so that data with an arbitrary number of dimensions can be represented
  • Date tools: objects for expressing date offsets or generating date ranges. Dates can be aligned to a specific time zone and converted or compared at will
  • Statistical models: early releases included convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series and cross-sectional regressions. These were later removed from pandas; statsmodels is the usual home for such models
  • Intelligent Cython offloading; complex computations are performed rapidly due to these optimizations.
  • Static and moving statistical tools: mean, standard deviation, correlation, and covariance
  • Rich User Documentation, using Sphinx
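
As a rough illustration of a few of the features listed above (label alignment, missing-value handling, group-by, merging, and date ranges), here is a short sketch. All data and the names `s1`, `s2`, `df`, and `labels` are made up for the example.

```python
import numpy as np
import pandas as pd

# Automatic alignment: values are matched on index labels, not on position.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])
aligned_sum = s1 + s2  # labels present on only one side ("a", "d") become NaN

# Missing observations are skipped by default in reductions such as sum().
total = aligned_sum.sum()

# Group-by aggregation on a small DataFrame.
df = pd.DataFrame({"key": ["x", "y", "x", "y"], "value": [1, 2, 3, 4]})
group_totals = df.groupby("key")["value"].sum()

# Merging: attach a label to every row via a left join on "key".
labels = pd.DataFrame({"key": ["x", "y"], "label": ["first", "second"]})
merged = df.merge(labels, on="key", how="left")

# Date tools: a time-zone-aware daily range.
dates = pd.date_range("2024-01-01", periods=3, freq="D", tz="UTC")

print(aligned_sum)
print(total)
print(group_totals)
print(merged)
print(dates)
```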

Asking Questions:

  • Before asking a question, make sure you have gone through the 10 Minutes to pandas introduction. It covers the basic functionality of pandas.
  • See this question on asking good questions: How to make good reproducible pandas examples
  • Please provide your pandas and NumPy versions, and platform details (if appropriate), in your question.
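
One possible way to collect the version and platform details mentioned above is sketched below. The variable name `version_report` is only illustrative.

```python
import platform

import numpy as np
import pandas as pd

# Collect the details the guidance asks for into one pasteable string.
version_report = "\n".join([
    f"pandas: {pd.__version__}",
    f"numpy: {np.__version__}",
    f"python: {platform.python_version()}",
    f"platform: {platform.platform()}",
])
print(version_report)

# pd.show_versions() prints a longer report that also lists optional dependencies.
```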

282843 questions
47 votes • 6 answers

Apply StandardScaler to parts of a data set

I want to use sklearn's StandardScaler. Is it possible to apply it to some feature columns but not others? For instance, say my data is: data = pd.DataFrame({'Name' : [3, 4,6], 'Age' : [18, 92,98], 'Weight' : [68, 59,49]}) Age Name Weight 0 …
mitsi • 1,005 • 2 • 11 • 15

47 votes • 13 answers

Convert month int to month name in Pandas

I want to transform an integer between 1 and 12 into an abbreviated month name. I have a df which looks like: client Month 1 sss 02 2 yyy 12 3 www 06 I want the df to look like this: client Month 1 sss Feb 2 yyy Dec 3 …
Boosted_d16 • 13,340 • 35 • 98 • 158

47 votes • 5 answers

Pandas OHLC aggregation on OHLC data

I understand that OHLC re-sampling of time series data in Pandas, using one column of data, will work perfectly, for example on the following dataframe: >>df ctime openbid 1443654000 1.11700 1443654060 1.11700 ... df['ctime'] =…
user3439187 • 613 • 1 • 7 • 10

47 votes • 12 answers

Reading a file from a private S3 bucket to a pandas dataframe

I'm trying to read a CSV file from a private S3 bucket to a pandas dataframe: df = pandas.read_csv('s3://mybucket/file.csv') I can read a file from a public bucket, but reading a file from a private bucket results in HTTP 403: Forbidden error. I…
IgorK • 483 • 1 • 5 • 8

47 votes • 5 answers

Reading a csv with a timestamp column, with pandas

When doing: import pandas x = pandas.read_csv('data.csv', parse_dates=True, index_col='DateTime', names=['DateTime', 'X'], header=None, sep=';') with this data.csv…
Basj • 41,386 • 99 • 383 • 673

47 votes • 2 answers

"Too many indexers" with DataFrame.loc

I've read the docs about slicers a million times, but have never got my head round it, so I'm still trying to figure out how to use loc to slice a DataFrame with a MultiIndex. I'll start with the DataFrame from this SO answer: …
LondonRob • 73,083 • 37 • 144 • 201

47 votes • 1 answer

How to find the line that is generating a Pandas SettingWithCopyWarning?

I have a large block of code that is, at some point somewhere, generating a setting with copy warning in pandas (this problem). I know how to fix the problem, but I can't find what line number it is! Is there a way to back out the line number (apart…
tim654321 • 2,218 • 2 • 15 • 19

47 votes • 6 answers

pandas left join and update existing column

I am new to pandas and can't seem to get this to work with merge function: >>> left >>> right a b c a c d 0 1 4 9 0 1 7 13 1 2 5 10 1 2 8 14 2 3 6 11 2 3 9 15 3 4 7 12 With a left join on…
iwbabn • 1,275 • 4 • 17 • 32

47 votes • 2 answers

iPython/Jupyter Notebook and Pandas, how to plot multiple graphs in a for loop?

Consider the following code running in iPython/Jupyter Notebook: from pandas import * %matplotlib inline ys = [[0,1,2,3,4],[4,3,2,1,0]] x_ax = [0,1,2,3,4] for y_ax in ys: ts = Series(y_ax,index=x_ax) ts.plot(kind='bar', figsize=(15,5)) I…
alec_djinn • 10,104 • 8 • 46 • 71

47 votes • 2 answers

Absolute value for column in Python

How could I convert the values of column 'count' to absolute values? A summary of my dataframe is this: datetime count 0 2011-01-20 00:00:00 14.565996 1 2011-01-20 01:00:00 10.204177 2 2011-01-20 02:00:00 -1.261569 3 …
Yari • 867 • 3 • 9 • 13

47 votes • 4 answers

Pandas dataframe to json without index

I'm trying to take a dataframe and transform it into a particular json format. Here's my dataframe example: DataFrame name: Stops id location 0 [50, 50] 1 [60, 60] 2 [70, 70] 3 [80, 80] Here's the json format I'd like to transform…
Eric Miller • 1,367 • 4 • 13 • 20

47 votes • 2 answers

Python pandas integer YYYYMMDD to datetime

I have a DataFrame that looks like the following: OrdNo LstInvDt 9 20070620 11 20070830 19 20070719 21 20070719 23 20070719 26 20070911 29 20070918 31 0070816 34 20070925 LstInvDt of dtype int64. As you can…
Rookie • 1,590 • 5 • 20 • 34

47 votes • 8 answers

Multiple histograms in Pandas

I would like to create the following histogram (see image below) taken from the book "Think Stats". However, I cannot get them on the same plot. Each DataFrame takes its own subplot. I have the following code: import nsfg import matplotlib.pyplot…
Rohit • 5,840 • 13 • 42 • 65

47 votes • 2 answers

How to read a file with a semi colon separator in pandas

I am importing a .csv file in Python with pandas. Here is the file format from the .csv: a1;b1;c1;d1;e1;... a2;b2;c2;d2;e2;... ..... Here is how I get it: from pandas import * csv_path = "C:...." data = read_csv(csv_path) Now when I print the…
Jean • 1,707 • 3 • 24 • 43

47 votes • 2 answers

Pandas converting String object to lower case and checking for string

I have the below code import pandas as pd private = pd.read_excel("file.xlsx","Pri") public = pd.read_excel("file.xlsx","Pub") private["ISH"] = private.HolidayName.str.lower().contains("holiday|recess") public["ISH"] =…
user1452759 • 8,810 • 15 • 42 • 58