Questions tagged [text-processing]

Manipulating or examining text with programs, scripts, etc.

Unix systems tend to favor text files, often consisting of one record per line. Most Unix configuration files are text files. Unix systems come with many tools to manipulate such files. Most tools process the file as a stream: read a line, process it, emit the corresponding output. This makes it possible to chain tools and scripts with pipes.

Use this tag when your question is about processing text files and you're not sure which tool to use. If your question is about a specific tool, use its tag. If your question is about multiple tools, include this tag and the tags for the other tools.

When asking a text-processing question, you should always:

  • explain the task you need to do
  • include a reasonable part of your input file (preformatted by indenting it with four spaces)
  • include the expected output for this input data (also preformatted)
  • show your attempt to solve the problem and what didn't work (this is not to embarrass you; it helps answerers explain the solution, so you'll learn to help yourself next time)

Text processing utilities

  • sed: a simple line-by-line text processor, mostly used for regexp substitutions
  • awk: a scripting language dedicated to text file processing
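
As a rough illustration of the two descriptions above, here is a minimal sketch that assumes the utilities meant are sed and awk. The sample records and the variable names (`sample`, `sed_out`, `awk_out`) are made up for this example.

```shell
# Made-up input: one record per line, fields separated by colons.
sample='alice:30:admin
bob:25:user
carol:41:user'

# sed: regexp substitution applied line by line
# (replace the first "user" on each line with "member").
sed_out=$(printf '%s\n' "$sample" | sed 's/user/member/')

# awk: split each line on ":" and print the first field
# of every record whose second field is greater than 28.
awk_out=$(printf '%s\n' "$sample" | awk -F: '$2 > 28 { print $1 }')

printf '%s\n' "$sed_out" "$awk_out"
```

sed fits best when each line can be rewritten independently with a substitution; awk becomes more convenient once the task depends on fields or on comparisons between values.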

Text processing often involves combining many single-purpose tools, such as:

  • cut: select fields on each line
  • diff: compare two files line by line
  • grep: search for a pattern in text files
  • head: show the first few lines of a file
  • od: display binary files in decimal, octal or hexadecimal
  • sort: sort lines or fields alphabetically
  • split: split a file into fixed-size pieces
  • tail: show the last few lines of a file; tail -f keeps the file open in case more data arrives
  • tee: replicate the output of a command and send it to several destinations
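
A minimal sketch of how such single-purpose tools combine in one pipeline, assuming the standard grep, cut, sort, uniq and head utilities are available (uniq is not in the list above). The log data and the variable names `log` and `top` are hypothetical.

```shell
# Hypothetical log: "<user> <action>" on each line.
log='alice login
bob login
alice logout
carol login
alice login'

# grep keeps only the login lines, cut selects the first field (the user),
# sort brings identical names together, uniq -c counts each name,
# sort -rn orders by count (highest first), head keeps the top line.
top=$(printf '%s\n' "$log" |
  grep ' login$' |
  cut -d' ' -f1 |
  sort |
  uniq -c |
  sort -rn |
  head -n 1)

printf '%s\n' "$top"
```

Each stage reads lines from the previous one and writes lines to the next, so any stage can be swapped or removed without touching the others. Here the result is the count and name for alice, who logs in twice; the exact padding before the count depends on the uniq implementation.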

For a list of many text utilities and more, check out busybox commands or GNU coreutils.

Other related tags

  • shell-script: text processing is usually performed by shell scripts that call the tools described above
  • pipe: many tasks require chaining several tools
  • coreutils: the collection of GNU utilities (text processing and others), for regular Linux systems
  • busybox: a collection of utilities (text processing and others) for embedded Linux systems
  • when the going gets tough, it's better to switch to more general languages

8057 questions
10 answers

How can I replace a string in a file(s)?

Replacing strings in files based on certain search criteria is a very common task. How can I replace string foo with bar in all files in the current directory? do the same recursively for sub directories? replace only if the file name matches…
4 answers

Why is printf better than echo?

I have heard that printf is better than echo. I can recall only one instance from my experience where I had to use printf because echo didn't work for feeding some text into some program on RHEL 5.8 but printf did. But apparently, there are other…
10 answers

How to append multiple lines to a file

I am writing a bash script to look for a file if it doesn't exist then create it and append this to it: Host localhost ForwardAgent yes So "line then new line 'tab' then text" I think it's a sensitive format. I know you can do this: cat…
8 answers

Can grep output only specified groupings that match?

Say I have a file: # file: 'test.txt' foobar bash 1 bash foobar happy foobar I only want to know what words appear after "foobar", so I can use this regex: "foobar \(\w\+\)" The parenthesis indicate that I have a special interest in the word right…
Cory Klein
2 answers

Using 'sed' to find and replace

I know this question has probably been answered before. I have seen many threads about this in various places, but the answers are usually hard to extract for me. I am looking for help with an example usage of the 'sed' command. Say I wanted to…
19 answers

How do I trim leading and trailing whitespace from each line of some output?

I would like to remove all leading and trailing spaces and tabs from each line in an output. Is there a simple tool like trim I could pipe my output into? Example file: test space at back test space at front TAB at end TAB at front sequence…
5 answers

Why is using a shell loop to process text considered bad practice?

Is using a while loop to process text generally considered bad practice in POSIX shells? As Stéphane Chazelas pointed out, some of the reasons for not using shell loops are conceptual, reliability, legibility, performance and security. This answer…
18 answers

How to add a newline to the end of a file?

Using version control systems I get annoyed at the noise when the diff says No newline at end of file. So I was wondering: How to add a newline at the end of a file to get rid of those messages?
18 answers

How do you sort du output by size?

How do you sort du -sh /dir/* by size? I read one site that said use | sort -n but that's obviously not right. Here's an example that is wrong. [~]# du -sh /var/* | sort -n 0 /var/mail 1.2M /var/www 1.8M /var/tmp 1.9G /var/named 2.9M …
9 answers

Looping through files with spaces in the names?

I wrote the following script to diff the outputs of two directories with all the same files in them as such: #!/bin/bash for file in `find . -name "*.csv"` do echo "file = $file"; diff $file /some/other/path/$file; read…
Amir Afghani
10 answers

How to remove duplicate lines inside a text file?

A huge (up to 2 GiB) text file of mine contains about 100 exact duplicates of every line in it (useless in my case, as the file is a CSV-like data table). What I need is to remove all the repetitions while (preferably, but this can be sacrificed for…
6 answers

remove particular characters from a variable using bash

I want to parse a variable (in my case it's development kit version) to make it dot(.) free. If version='2.3.3', desired output is 233. I tried as below, but it requires . to be replaced with another character giving me 2_3_3. It would have been…
5 answers

How can I wrap text at a certain column size?

I know that I can use something like cat test.txt | pr -w 80 to wrap lines to 80 characters wide, but that puts a lot of space on the top and bottom of the printed lines and it does not work right on some systems. What's the best way to force a text…
9 answers

Split string by delimiter and get N-th element

I have a string: one_two_three_four_five I need to save in a variable A value two and in variable B value four from the above string. I am using ksh.
6 answers

Return only the portion of a line after a matching pattern

So pulling open a file with cat and then using grep to get matching lines only gets me so far when I am working with the particular log set that I am dealing with. I need a way to match lines to a pattern, but only to return the portion of the line…