read_clipboard is truly a saving grace for anyone starting out answering questions in the pandas tag. Unfortunately, pandas veterans also know that the data provided in questions isn't always easy to load into a terminal, because of various complications in the format of the data posted.
Thankfully, read_clipboard has arguments that make handling most of these cases possible (and easy). Here are some common use cases and their corresponding arguments.
Common Use Cases
read_clipboard uses read_csv under the hood with whitespace as the separator (sep=r'\s+' by default), so a lot of the techniques for parsing data from CSV apply here, such as
parsing columns with spaces in the data
use sep with a regex argument. First, ensure there are at least two spaces between columns and at most one consecutive whitespace character inside the column's data itself. Then you can use sep=r'\s{2,}', which means "separate columns wherever there are at least two consecutive whitespace characters" (note: engine='python' is required for multicharacter or regex separators):
df = pd.read_clipboard(..., sep=r'\s{2,}', engine='python')
Also see How do you handle column names having spaces in them when using pd.read_clipboard?.
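As a rough illustration of the two-space rule, here is a minimal sketch. It assumes made-up sample data and calls read_csv on an in-memory buffer instead of the clipboard, so it can be run anywhere; the same sep and engine arguments can be passed to read_clipboard.

```python
import io
import pandas as pd

# Hypothetical sample data: columns are separated by two or more spaces,
# while headers and values contain at most one consecutive space.
text = """\
first name  home city  age
John Smith  New York   32
Jane Doe    San Jose   27
"""

# A regex separator needs the python engine.
df = pd.read_csv(io.StringIO(text), sep=r'\s{2,}', engine='python')
print(df)
```

If any column were separated by only a single space, it would be merged with its neighbour, which is why the spacing has to be fixed in the copied text first.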
reading a series instead of DataFrame
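One possible approach, sketched under assumptions: read the data as a one-column DataFrame and squeeze it into a Series afterwards. Older pandas versions accepted a squeeze=True keyword in read_csv, but it was deprecated in 1.4 and removed in 2.0, so the sketch uses DataFrame.squeeze instead. The sample data is made up, and read_csv on a buffer stands in for read_clipboard.

```python
import io
import pandas as pd

# Hypothetical single-column data with a header.
text = """\
value
10
20
30
"""

# Read as a one-column DataFrame, then collapse the column axis.
df = pd.read_csv(io.StringIO(text), sep=r'\s+')
s = df.squeeze(axis="columns")
print(type(s).__name__, s.name)
```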
loading data with custom header names
use names=[...] in conjunction with header=None and skiprows=[0] to ignore existing headers.
df = pd.read_clipboard(..., header=None, names=['a', 'b', 'c'], skiprows=[0])
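To see what this combination does, here is a small sketch on made-up data, again using read_csv on a buffer as a stand-in for read_clipboard. The original header row (x y z) is skipped and replaced by the supplied names.

```python
import io
import pandas as pd

# Hypothetical data whose existing header we want to replace.
text = """\
x y z
1 2 3
4 5 6
"""

# skiprows=[0] drops the old header line; header=None tells the parser
# that no header remains; names supplies the new column labels.
df = pd.read_csv(
    io.StringIO(text),
    sep=r'\s+',
    header=None,
    names=['a', 'b', 'c'],
    skiprows=[0],
)
print(df)
```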
loading data without any headers
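For header-less data, passing header=None is the usual read_csv option. The sketch below assumes made-up data and uses a buffer in place of the clipboard. Without it, the first data row would be consumed as column names.

```python
import io
import pandas as pd

# Hypothetical data with no header row.
text = """\
1 2 3
4 5 6
"""

# header=None keeps the first line as data; columns get integer labels.
df = pd.read_csv(io.StringIO(text), sep=r'\s+', header=None)
print(df)
```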
set one or more columns as the index
- use index_col=[...] with the appropriate label or index
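A short sketch of index_col with two columns, assuming made-up data and column names (key, grp, val are hypothetical). Passing positions such as index_col=[0, 1] should behave the same way.

```python
import io
import pandas as pd

# Hypothetical data where the first two columns form the index.
text = """\
key grp val
a x 1
b y 2
"""

# Two labels in index_col produce a two-level MultiIndex.
df = pd.read_csv(io.StringIO(text), sep=r'\s+', index_col=['key', 'grp'])
print(df)
```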
parsing dates
- use parse_dates with the appropriate columns. If parsing datetimes (i.e., columns where the date and the time are separated by a space), you will likely also need to use sep=r'\s{2,}' while ensuring your columns are separated by at least two spaces.
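The datetime case can be sketched as follows, assuming made-up data with a hypothetical timestamp column. Because each timestamp contains a single space, the columns are separated by two or more spaces and split with the regex separator.

```python
import io
import pandas as pd

# Hypothetical data: the date and time are separated by one space,
# the columns by at least two.
text = """\
timestamp            value
2021-01-01 10:00:00  1.5
2021-01-02 11:30:00  2.5
"""

df = pd.read_csv(
    io.StringIO(text),
    sep=r'\s{2,}',
    engine='python',
    parse_dates=['timestamp'],
)
print(df.dtypes)
```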
See this answer by me for a more comprehensive list of read_csv arguments covering other cases not listed here.
Caveats
read_clipboard is a Swiss Army knife. However, it
cannot read data in prettytable/tabulate formats (in other words, borders make parsing harder)
cannot correctly parse MultiIndexes unless all elements in the index are specified.
cannot ignore/handle ellipses in data
- my suggested method is to manually remove ellipses before printing
cannot parse columns of lists (or other objects) as anything other than string. The columns will need to be converted separately, as shown in How do you read in a dataframe with lists using pd.read_clipboard?.
cannot read text from images (so please don't use images as a means to share your data with folks!)
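For the caveat about columns of lists, one common way to do the separate conversion is ast.literal_eval from the standard library. This is a sketch under assumptions (made-up data, a buffer instead of the clipboard) and not necessarily the approach taken in the linked question.

```python
import ast
import io
import pandas as pd

# Hypothetical data: list values contain single spaces only, so columns
# are separated by two or more spaces.
text = """\
id  items
1   [1, 2, 3]
2   ['a', 'b']
"""

df = pd.read_csv(io.StringIO(text), sep=r'\s{2,}', engine='python')

# The items column is read as plain strings; literal_eval safely turns
# each string into the Python literal it represents.
df['items'] = df['items'].apply(ast.literal_eval)
print(df['items'].map(type))
```

literal_eval only accepts Python literals, so it raises an error on anything else, which is usually preferable to running eval on text copied from the web.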