March 25, 2013

Useful Text Processing Commands In Linux BASH





I've come back to this time and time again. I hope you find it useful!

#replace in place (edits file.txt directly)
sed -i 's/old/new/g' file.txt
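
Because -i overwrites the file with no undo, it can be worth keeping a backup. Here is a minimal sketch, assuming a sed that accepts a suffix attached to -i (GNU and BSD sed both do). The file name demo.txt and the .bak suffix are made up for the example.

```shell
# work in a throwaway directory so nothing real is touched
tmp=$(mktemp -d)
printf 'old value\nanother old line\n' > "$tmp/demo.txt"

# -i.bak edits demo.txt in place and leaves the original as demo.txt.bak
sed -i.bak 's/old/new/g' "$tmp/demo.txt"

edited=$(cat "$tmp/demo.txt")
backup=$(cat "$tmp/demo.txt.bak")
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
rm -rf "$tmp"
```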

#de-duplicate
awk '!x[$0]++' input.txt > output.txt
#OR
perl -ne 'print unless $dup{$_}++;' input.txt > output.txt
#OR
awk '{if (++dup[$0] == 1) print $0;}' input.txt > output.txt
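
All three de-duplicators above keep the first occurrence of each line and preserve the original order. A small sketch of why the awk idiom works, with sort -u shown for contrast. The fruit names are sample data, not from any real file.

```shell
# sample input with repeats, deliberately not sorted
input=$(printf 'pear\napple\npear\nbanana\napple\n')

# x[$0]++ evaluates to 0 (false) the first time a line is seen, so !x[$0]++
# is true only on first sight; awk's default action then prints the line
first_seen=$(printf '%s\n' "$input" | awk '!x[$0]++')

# sort -u also removes duplicates, but it reorders the lines
sorted_unique=$(printf '%s\n' "$input" | LC_ALL=C sort -u)

printf 'awk:\n%s\nsort -u:\n%s\n' "$first_seen" "$sorted_unique"
```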

#keep only lines containing TXT
sed '/TXT/!d' input.txt > output.txt
#OR
grep -F "TXT" input.txt > output.txt

#delete lines matching TXT
sed '/TXT/d' input.txt > output.txt
#OR
grep -v -F "TXT" input.txt > output.txt

#trim trailing whitespace in place
sed -i 's/[[:space:]]*$//' filename.txt

#delete blank lines
sed '/^[[:space:]]*$/d' input.txt > output.txt

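#delete blank lines in place
sed -i '/^[[:space:]]*$/d' input.txt

#delete empty lines only - misses lines that contain just spaces, tabs, or a carriage return
sed '/^$/d' input.txt > output.txt

#delete blank lines - also drops whitespace-only lines, but may keep lines holding only a carriage return (CRLF files)
awk NF input.txt > output.txt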
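
The differences between these blank-line removers only show up on lines that look empty but are not. A rough sketch that counts surviving lines, assuming a sample file with one empty line, one spaces-only line, and one carriage-return-only line (all made up for the test):

```shell
# five lines: text, empty, spaces only, carriage return only, text
sample=$(mktemp)
printf 'a\n\n   \n\r\nb\n' > "$sample"

# /^$/ matches only truly empty lines, so the spaces-only and CR-only lines survive
kept_empty_only=$(sed '/^$/d' "$sample" | wc -l | tr -d ' ')

# [[:space:]] covers spaces, tabs and carriage returns, so only a and b survive
kept_space_class=$(sed '/^[[:space:]]*$/d' "$sample" | wc -l | tr -d ' ')

# awk NF drops lines with zero fields; whether a CR-only line counts as a
# field depends on the awk implementation, so this number may be 2 or 3
kept_awk_nf=$(awk NF "$sample" | wc -l | tr -d ' ')

echo "sed /^\$/d: $kept_empty_only  sed [[:space:]]: $kept_space_class  awk NF: $kept_awk_nf"
rm -f "$sample"
```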

#trim leading and trailing spaces and squeeze runs of spaces between words (awk rebuilds the record)
echo '         text     txt   info     ' | awk '{$1=$1}1'
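
The {$1=$1}1 trick looks like a no-op, so here is a short sketch of what it does. The OFS=',' variant is my own addition to show that the joiner is configurable.

```shell
# assigning any field (even to itself) makes awk rebuild $0 from its fields,
# joined by the output field separator OFS (a single space by default);
# the trailing 1 is an always-true pattern that prints the rebuilt record
squeezed=$(echo '         text     txt   info     ' | awk '{$1=$1}1')

# setting OFS changes the joiner, turning the same trick into a
# whitespace-to-CSV converter
csv=$(echo '         text     txt   info     ' | awk -v OFS=',' '{$1=$1}1')

printf '[%s]\n[%s]\n' "$squeezed" "$csv"
```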


#append text to lines
sed 's/$/APPEND/' input.txt
#OR
cat input.txt | awk '{ print $0 "APPEND" }' > output.txt
#OR
awk '{ print $0 "APPEND" }' < input.txt > output.txt

#output IPs only (each octet limited to 0-255)
grep -E -o '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' input.txt > output.txt

#output dotted quads (IP-like; octets above 255 are not rejected)
grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' input.txt
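
One catch with -o: the match may start or stop in the middle of a longer run of digits, so even the strict 0-255 pattern can pull a valid-looking address out of an invalid one. A hedged sketch, assuming GNU grep (the \b word-boundary escape is a GNU extension) and three invented log lines:

```shell
# hypothetical log lines: one valid address, one out of range, one embedded in text
log=$(printf '10.0.0.1\n999.1.1.1\nhost 192.168.1.254 ok\n')

# same 0-255 octet pattern as above, stored once and repeated with {3}
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# without boundaries, -o returns the valid-looking tail of 999.1.1.1
unanchored=$(printf '%s\n' "$log" | grep -E -o "($octet\.){3}$octet")

# \b requires a word boundary on both sides, so 999.1.1.1 yields nothing
anchored=$(printf '%s\n' "$log" | grep -E -o "\b($octet\.){3}$octet\b")

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"
```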

#count occurrences
sort input.txt | uniq -c > output.txt

#top 10 most frequent values of a column, where $1 is the 1st column, $2 the 2nd, etc. (head prints 10 lines by default)
grep sometext input.txt | grep someothertext | awk '{ print $1 }' | sort -n | uniq -c | sort -rn | head > output.txt
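
The sort | uniq -c | sort -rn | head chain is the core of that one-liner. A minimal worked example on an invented six-line access log (the names and paths are sample data):

```shell
# hypothetical access log: the first column is the client
log=$(printf 'alice GET /\nbob GET /\nalice GET /a\ncarol GET /\nalice GET /b\nbob GET /c\n')

# uniq -c only merges ADJACENT duplicates, so the first sort groups equal keys;
# the second sort -rn orders the "count key" lines by count, highest first
top=$(printf '%s\n' "$log" | awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n 2)

# strip uniq's leading padding so the result is easy to compare
top_clean=$(printf '%s\n' "$top" | awk '{ print $1, $2 }')
printf '%s\n' "$top_clean"
```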

#top 20 IPs querying ".ru" names, where input.txt is a Windows Server DNS log
grep -F '(2)ru(0)' input.txt | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' | sort -n | uniq -c | sort -rn | head -n 20 > output.txt

#grep or (egrep is obsolescent, use grep -E; \| in basic mode is a GNU extension)
grep -E 'string1|string2' input.txt
grep 'string1\|string2' input.txt
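
For scripts that may run on a non-GNU grep, repeating -e is another way to OR patterns. A small sketch on made-up input:

```shell
lines=$(printf 'string1 here\nnothing\nhas string2\n')

# -E makes | an alternation operator without the backslash
with_E=$(printf '%s\n' "$lines" | grep -E 'string1|string2')

# repeating -e ORs several patterns and works in basic mode too
with_e=$(printf '%s\n' "$lines" | grep -e 'string1' -e 'string2')

printf '%s\n---\n%s\n' "$with_E" "$with_e"
```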

#show non-comment, non-empty lines -- see 'grep or' above
grep -v '^#\|^$' /etc/rsyslog.conf
grep -v '^;\|^$' /etc/php.ini

#inverse head
#if "head" is first 10, then
tail -n +11 input.txt

#inverse tail (needs GNU head)
#if "tail" is last 10, then
head -n -10 input.txt
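
The two offsets are easy to get wrong by one, so here is a quick check on the numbers 1 to 15. It assumes GNU head for the negative count; seq just generates the test lines.

```shell
# tail -n +K starts AT line K, so skipping the first 10 lines needs K=11
after_head=$(seq 1 15 | tail -n +11 | paste -sd ' ' -)

# head -n -K drops the last K lines, so skipping the last 10 needs K=10
before_tail=$(seq 1 15 | head -n -10 | paste -sd ' ' -)

echo "everything after the first 10: $after_head"
echo "everything before the last 10: $before_tail"
```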

#remove lines in file2.txt from file1.txt
awk 'NR==FNR{a[$0]++;next} !a[$0]' file2.txt file1.txt > filtered.txt
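
How the NR==FNR idiom works, plus a grep equivalent for comparison. This is only a sketch. The file contents are invented, and the grep variant is my own suggestion, not part of the original list.

```shell
tmp=$(mktemp -d)
printf 'keep1\ndrop\nkeep2\ndrop\n' > "$tmp/file1.txt"
printf 'drop\nalso-absent\n' > "$tmp/file2.txt"

# NR==FNR is true only while awk reads the FIRST file named (file2.txt):
# its lines are stored as keys of a[]. For the second file the bare
# pattern !a[$0] prints only lines that were never stored.
via_awk=$(awk 'NR==FNR{a[$0]++;next} !a[$0]' "$tmp/file2.txt" "$tmp/file1.txt")

# same result with grep: -f reads patterns from a file, -F treats them as
# fixed strings, -x matches whole lines, -v inverts the selection
via_grep=$(grep -vxFf "$tmp/file2.txt" "$tmp/file1.txt")

printf 'awk:\n%s\ngrep:\n%s\n' "$via_awk" "$via_grep"
rm -rf "$tmp"
```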

#find international characters (possibly limited charset) (e.g. finds íóëã)
grep -P '[^\x00-\x7f]' input.txt

#replace international characters with plain ASCII (e.g. íóëã to ioea); needs the Text::Unidecode module from CPAN
perl -C -MText::Unidecode -n -e 'print unidecode( $_)' < input.txt

#replace ` with '
sed "s/\`/'/g" input.txt

#de-timestamping (zero the time part of a timestamp)
echo "2015-06-26 07:33:55" | awk '{$0=substr($0,1,11)"00:00:00"; print $0}'
#OR
echo "2015-06-26 07:33:55" | sed 's/[0-2][0-9]:[0-5][0-9]:[0-5][0-9]/00:00:00/g'

#list the files, out of a set of same-named files, that do not contain TEXT
find ./ -iname "filename" -exec grep -L TEXT {} \;
# e.g. find src/main/target/ -iname "target.h" -exec grep -L 'USE_.*_EXTI' {} \;
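
A throwaway demo of the grep -L approach. The boardA and boardB directories and their contents are hypothetical. I also end -exec with + instead of \; so grep runs once over a batch of files, which is optional.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/boardA" "$tmp/boardB"
printf '#define USE_GYRO_EXTI\n' > "$tmp/boardA/target.h"
printf '#define USE_LED\n' > "$tmp/boardB/target.h"

# grep -L prints the names of files with NO matching line;
# quoting the pattern keeps the shell from expanding the *
missing=$(find "$tmp" -iname 'target.h' -exec grep -L 'USE_.*_EXTI' {} +)

echo "files without the define: $missing"
rm -rf "$tmp"
```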

#take a .yml file in which some lines read "income: <value>" and replace each value with 10% of itself
#e.g.
#      WATER_WORKER:
#        income: 30.0
#        experience: 100.0
perl -pe 's/(income:) (\d+(?:\.\d+)?)/$1." ".($2*0.10)/ge' jobConfig.yml > jobConfig.yml.new



Grep AND, OR, NOT

sed one-liners

PERL Regular Expressions

~~~
As Always, Good Luck! You can thank me with bitcoin.   

1 comment:

  1. https://www.unix.com/shell-programming-and-scripting/143909-display-lines-file1-not-file2.html

