Monday, March 25, 2013

Useful Text Processing Commands In Linux BASH

I've come back to this time and time again. I hope you find it useful!

#replace old with new in-place (modifies file.txt directly)
sed -i 's/old/new/g' file.txt

#remove duplicate lines, keeping the first occurrence and the original order
awk '!x[$0]++' input.txt > output.txt
perl -ne 'print unless $dup{$_}++;' input.txt > output.txt
awk '{if (++dup[$0] == 1) print $0;}' input.txt > output.txt
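A small sketch of why the awk version keeps the original order, using made-up sample data in a temp file. The file contents and variable names are only for illustration.

```shell
# Hypothetical sample data: "b" and "a" each appear twice.
tmp=$(mktemp)
printf 'b\na\nb\nc\na\n' > "$tmp"

# x[$0]++ evaluates to 0 (false) the first time a line is seen, so
# !x[$0]++ is true only for first occurrences; awk's default action
# then prints the line.
deduped=$(awk '!x[$0]++' "$tmp")
echo "$deduped"

# sort -u also drops duplicates, but it reorders the lines.
sorted_unique=$(sort -u "$tmp")
echo "$sorted_unique"

rm -f "$tmp"
```

Use the awk or perl form when line order matters. `sort -u` is fine when it does not.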

#keep only lines containing TXT
sed '/TXT/!d' input.txt > output.txt
grep -F "TXT" input.txt > output.txt

#delete lines matching TXT
sed '/TXT/d' input.txt > output.txt
grep -v -F "TXT" input.txt > output.txt
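The sed and grep -F forms are not quite interchangeable: sed treats TXT as a regular expression, while grep -F treats it as a fixed string. A quick sketch with an invented pattern `a.b` and sample lines shows the difference.

```shell
# Hypothetical sample: one line with a literal dot, one without.
tmp=$(mktemp)
printf 'a.b\naxb\nzzz\n' > "$tmp"

# sed: "." is a regex metacharacter and matches any character.
by_sed=$(sed '/a.b/!d' "$tmp")
echo "$by_sed"

# grep -F: "." is just a dot.
by_grep=$(grep -F 'a.b' "$tmp")
echo "$by_grep"

rm -f "$tmp"
```

If TXT contains characters such as `.`, `*` or `[`, the grep -F form is usually the safer choice.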

#trim trailing whitespace in-place
sed -i 's/[[:space:]]*$//' filename.txt

#delete blank lines
sed '/^[[:space:]]*$/d' input.txt > output.txt

#delete blank lines in-place
sed -i '/^[[:space:]]*$/d' input.txt

#delete empty lines only - lines containing just spaces or tabs are kept
sed '/^$/d' input.txt > output.txt

#delete lines with no fields - a line holding only a carriage return (Windows line endings) may be kept
awk NF input.txt > output.txt
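To see how the blank-line variants differ, here is a rough comparison on invented sample data: one truly empty line, one line of spaces and one line with only a tab. The counts are the number of lines that survive each filter.

```shell
# Hypothetical sample: 2 text lines plus 3 kinds of "blank" line.
tmp=$(mktemp)
printf 'one\n\n   \n\t\ntwo\n' > "$tmp"

# /^$/ matches only lines with no characters at all.
strict=$(sed '/^$/d' "$tmp" | wc -l | tr -d ' ')

# [[:space:]]* also matches lines made up of spaces and tabs.
posix=$(sed '/^[[:space:]]*$/d' "$tmp" | wc -l | tr -d ' ')

# awk NF keeps a line only if it has at least one field.
fields=$(awk NF "$tmp" | wc -l | tr -d ' ')

echo "sed /^\$/d kept $strict, sed [[:space:]] kept $posix, awk NF kept $fields"
rm -f "$tmp"
```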

#append text to lines
sed 's/$/APPEND/' input.txt
cat input.txt | awk '{ print $0 "APPEND" }' > output.txt
awk '{ print $0 "APPEND" }' < input.txt > output.txt

#output IPv4 addresses only (each octet limited to 0-255)
grep -E -o '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' input.txt > output.txt

#output numbered quartet (IP-like)
grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' input.txt
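The two patterns above behave differently on out-of-range numbers. A minimal sketch, assuming an invented input string with one valid address and one impossible one:

```shell
# Hypothetical input: a real-looking address and an out-of-range quartet.
sample='ok 192.168.1.10 bad 999.300.1.1'

octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Loose pattern: any four groups of 1-3 digits.
loose=$(echo "$sample" | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}')
echo "$loose"

# Strict pattern: each octet must be in the range 0-255.
strict=$(echo "$sample" | grep -E -o "$octet\.$octet\.$octet\.$octet")
echo "$strict"
```

Note that the strict pattern is not anchored, so it can still pick a valid-looking substring out of a longer run of digits (for example `99.1.1.1` out of `999.1.1.1`). Adding `-w` to grep may help in that case.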

#count occurrences
sort input.txt | uniq -c > output.txt

#top 10 stats of unique text, where $1 means 1st column, $2 second column, etc
grep sometext input.txt | grep someothertext | awk '{ print $1 }' | sort -n | uniq -c | sort -rn | head > output.txt

#top 20 stats of IPs performing nslookup to ".ru" sites, where input.txt is from a Windows Server DNS log.
grep -F '(2)ru(0)' input.txt | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' | sort -n | uniq -c | sort -rn | head -n 20 > output.txt
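The counting pipelines above all follow the same shape: extract a column, sort so duplicates are adjacent, count with uniq -c, then sort by count. A sketch on a made-up access log (the log format and addresses are invented):

```shell
# Hypothetical log: first column is the client address.
tmp=$(mktemp)
printf '10.0.0.2 GET /a\n10.0.0.1 GET /b\n10.0.0.2 GET /c\n' > "$tmp"
printf '10.0.0.3 GET /a\n10.0.0.2 GET /a\n10.0.0.1 GET /a\n' >> "$tmp"

# uniq -c only merges adjacent lines, hence the first sort.
# sort -rn then orders by the count that uniq -c puts in front.
top=$(awk '{ print $1 }' "$tmp" | sort | uniq -c | sort -rn | head -n 2 \
      | awk '{ print $2 ":" $1 }')
echo "$top"

rm -f "$tmp"
```

The final awk only reshapes `count value` into `value:count` for readability. Drop it to get the raw uniq -c output.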

#grep or
grep -E 'string1|string2' input.txt
grep 'string1\|string2' input.txt

#grep non-comment, non-empty-lines -- see 'grep or' above
grep -v '^#\|^$' /etc/rsyslog.conf
grep -v '^;\|^$' /etc/php.ini

#inverse head
#if "head" is first 10, then
tail -n +11 input.txt

#inverse tail
#if "tail" is last 10, then
head -n -10 input.txt
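The two forms count differently: `tail -n +K` starts printing at line K, while `head -n -N` (a GNU coreutils feature) drops the last N lines. A quick check with seq, assuming a 15-line input:

```shell
# Lines 1..15 stand in for input.txt.

# Everything except the first 10 lines: start at line 11.
after_head=$(seq 1 15 | tail -n +11)
echo "$after_head"

# Everything except the last 10 lines: drop 10 from the end.
before_tail=$(seq 1 15 | head -n -10)
echo "$before_tail"
```

So the complement of a default `head` is `tail -n +11`, and the complement of a default `tail` is `head -n -10`.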

#remove lines in file2.txt from file1.txt
awk 'NR==FNR{a[$0]++;next} !a[$0]' file2.txt file1.txt > filtered.txt
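How the awk filter works, sketched on invented files: while reading the first file NR equals FNR, so its lines are stored in the array; for the second file only lines not in the array are printed. The grep line is an alternative I would expect to give the same result, not something from the original list.

```shell
# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\nbanana\ndate\n' > "$dir/file1.txt"
printf 'banana\ndate\n' > "$dir/file2.txt"

# Pass 1 (file2.txt): remember every line. Pass 2 (file1.txt): print
# only lines that were never remembered.
filtered=$(awk 'NR==FNR{a[$0]++;next} !a[$0]' "$dir/file2.txt" "$dir/file1.txt")
echo "$filtered"

# Alternative: -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
filtered_grep=$(grep -v -x -F -f "$dir/file2.txt" "$dir/file1.txt")
echo "$filtered_grep"

rm -rf "$dir"
```

One caveat: if file2.txt is empty, NR==FNR stays true while awk reads file1.txt, so the awk version prints nothing.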

#find international characters (possibly limited charset) (e.g. finds íóëã)
grep -P '[^\x00-\x7f]' input.txt

#replace international characters with plain-text (e.g. íóëã to ioea)
perl -C -MText::Unidecode -n -e 'print unidecode( $_)' < input.txt

#replace ` with '
sed "s/\`/'/g" input.txt

#de-timestamping (zero time from datestamp)
echo "2015-06-26 07:33:55" | awk '{$0=substr($0,1,11)"00:00:00"; print $0}'
echo "2015-06-26 07:33:55" | sed 's/[0-2][0-9]:[0-5][0-9]:[0-5][0-9]/00:00:00/g'
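A quick check of both approaches with an evening timestamp (an invented value). The hour part of the sed pattern needs to allow a leading 2, otherwise times from 20:00 to 23:59 are left untouched.

```shell
# Hypothetical timestamp with an hour in the 20-23 range.
stamp='2015-06-26 21:33:55'

# awk: keep the first 11 characters ("YYYY-MM-DD ") and append midnight.
by_awk=$(echo "$stamp" | awk '{$0=substr($0,1,11)"00:00:00"; print $0}')
echo "$by_awk"

# sed: [0-2][0-9] covers hours 00-23.
by_sed=$(echo "$stamp" | sed 's/[0-2][0-9]:[0-5][0-9]:[0-5][0-9]/00:00:00/g')
echo "$by_sed"

# With [0-1][0-9] for the hour, the same input passes through unchanged.
narrow=$(echo "$stamp" | sed 's/[0-1][0-9]:[0-5][0-9]:[0-5][0-9]/00:00:00/g')
echo "$narrow"
```

The awk form assumes the date always occupies the first 11 characters of the line. The sed form works wherever the time appears.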

#take a .yml file that has some lines with "income:" <value> and replace the value with 10% of the value
#        income: 30.0
#        experience: 100.0
perl -pe 's/(income:) (\d+.*)/($1)." ".($2*0.10)/ge' jobConfig.yml >


sed one-liners

PERL Regular Expressions

As Always, Good Luck! You can thank me with bitcoin.