
March 25, 2013

Useful Text Processing Commands In Linux BASH





I've come back to this time and time again. I hope you find it useful!

#replace text in place (GNU sed; edits file.txt directly)
sed -i 's/old/new/g' file.txt
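On macOS and other BSD systems, `sed -i` expects a backup-suffix argument, so the bare `-i` form above is GNU-specific. Here is a small sketch, assuming GNU or BSD sed. Attaching the suffix directly to the flag (`-i.bak`) should be accepted by both, and it leaves a backup copy behind. The scratch directory and file contents are made up for the demo.

```shell
#!/bin/sh
# Scratch file for the demo (made-up contents)
dir=$(mktemp -d)
printf 'old value\nanother old value\n' > "$dir/file.txt"

# -i.bak (suffix attached, no space) works with both GNU and BSD sed;
# the original contents are kept in file.txt.bak
sed -i.bak 's/old/new/g' "$dir/file.txt"

cat "$dir/file.txt"      # edited file
cat "$dir/file.txt.bak"  # untouched backup
```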

#de-duplicate lines, keeping the first occurrence and the original order
awk '!x[$0]++' input.txt > output.txt
#OR
perl -ne 'print unless $dup{$_}++;' input.txt > output.txt
#OR
awk '{if (++dup[$0] == 1) print $0;}' input.txt > output.txt
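The short `awk` idiom works because `x[$0]++` returns the count *before* incrementing. The first time a line is seen that count is 0 (false), so `!` turns it true and awk's default action prints the line. Here is a quick demo. The `dedup` wrapper and the `seen` array name are just illustrative names.

```shell
#!/bin/sh
# Keep the first occurrence of each line, preserving input order.
# seen[$0]++ evaluates to the old count (0 on first sight), so
# !seen[$0]++ is true exactly once per distinct line; awk then prints it.
dedup() {
    awk '!seen[$0]++' "$@"
}

printf 'b\na\nb\nc\na\n' | dedup   # prints b, a, c
```

Unlike `sort -u`, this keeps the lines in their original order.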

#keep only lines containing TXT
sed '/TXT/!d' input.txt > output.txt
#OR
grep -F "TXT" input.txt > output.txt

#delete lines containing TXT
sed '/TXT/d' input.txt > output.txt
#OR
grep -v -F "TXT" input.txt > output.txt
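The `sed` and `grep -F` forms can disagree when TXT contains regex metacharacters. `sed` treats the pattern as a regular expression, while `-F` makes `grep` treat it as a fixed string. A quick comparison on made-up input:

```shell
#!/bin/sh
sample='abc
a.c
xyz'
# sed: "." is a regex wildcard, so both abc and a.c are kept
printf '%s\n' "$sample" | sed '/a.c/!d'
# grep -F: "." is literal, so only a.c is kept
printf '%s\n' "$sample" | grep -F 'a.c'
```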

#trim trailing whitespace in place
sed -i 's/[[:space:]]*$//' filename.txt

#delete blank lines
sed '/^[[:space:]]*$/d' input.txt > output.txt

#delete blank lines in place
sed -i '/^[[:space:]]*$/d' input.txt

#delete empty lines only (lines containing just spaces or tabs are kept)
sed '/^$/d' input.txt > output.txt

#delete empty and space/tab-only lines (may miss blank lines in CRLF files, which still contain \r)
awk NF input.txt > output.txt
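The blank-line commands differ on lines that only *look* empty. Here is a comparison using made-up input that contains an empty line and a line made of two spaces and a tab. `make_input` is just a helper name for the demo.

```shell
#!/bin/sh
# Made-up input: "a", an empty line, a spaces-and-tab line, "b"
make_input() { printf 'a\n\n  \t\nb\n'; }

# Only the truly empty line is removed: a, the whitespace line, b remain
make_input | sed '/^$/d'
# NF is 0 for empty and blank-only lines, so just a and b remain
make_input | awk NF
# [[:space:]]* also matches the spaces-and-tab line: a and b remain
make_input | sed '/^[[:space:]]*$/d'
```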

#trim leading and trailing spaces and squeeze runs of inner spaces to one (reconstitutes the record)
echo '         text     txt   info     ' | awk '{$1=$1}1'
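Assigning to any field makes awk rebuild `$0` from its fields, joined by `OFS` (a single space by default). That rebuild is what squeezes the whitespace. The trailing `1` is an always-true pattern that prints the record. Setting `OFS` makes the rebuild visible; the comma is just an example separator.

```shell
#!/bin/sh
line='         text     txt   info     '
# Default OFS is one space: outer blanks dropped, inner runs collapsed
echo "$line" | awk '{$1=$1}1'
# Same rebuild with a different output separator
echo "$line" | awk -v OFS=',' '{$1=$1}1'
```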


#append text to the end of every line
sed 's/$/APPEND/' input.txt > output.txt
#OR
cat input.txt | awk '{ print $0 "APPEND" }' > output.txt
#OR
awk '{ print $0 "APPEND" }' < input.txt > output.txt

#output IPv4 addresses only (every octet must be 0-255)
grep -E -o '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' input.txt > output.txt

#output dotted quads (IP-like; also matches invalid values such as 999.999.999.999)
grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' input.txt
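The two IP patterns behave differently when an octet is out of range. Here is a comparison on made-up input. The `octet` variable just avoids repeating the 0-255 alternation four times.

```shell
#!/bin/sh
sample='ok 192.168.1.10
bad 999.300.1.1'
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
# Strict: every octet must be 0-255, so only 192.168.1.10 is printed
printf '%s\n' "$sample" | grep -E -o "$octet\.$octet\.$octet\.$octet"
# Loose: any 1-3 digits per group, so 999.300.1.1 is printed as well
printf '%s\n' "$sample" | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'
```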

#count occurrences of each distinct line
sort input.txt | uniq -c > output.txt
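`uniq` only merges *adjacent* duplicates, which is why the input is sorted first. Here is a comparison on made-up input. The final `awk` only strips the padding that `uniq -c` adds.

```shell
#!/bin/sh
words='apple
pear
apple'
# Without sort, the two "apple" lines are not adjacent and get separate counts
printf '%s\n' "$words" | uniq -c | awk '{ print $1, $2 }'
# Sorted first, each distinct line gets a single count
printf '%s\n' "$words" | sort | uniq -c | awk '{ print $1, $2 }'
```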

#top 10 most frequent values of column 1 among lines containing both sometext and someothertext ($1 is the 1st column, $2 the 2nd, etc.)
grep sometext input.txt | grep someothertext | awk '{ print $1 }' | sort -n | uniq -c | sort -rn | head > output.txt
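The same pipeline can be run on a few made-up log lines where column 1 is a client IP. This sketch uses plain `sort`, which is enough to group identical values before `uniq -c`. `sort -rn` then orders the counts from highest to lowest.

```shell
#!/bin/sh
# Made-up log lines: client IP in column 1
log='10.0.0.1 GET /index.html
10.0.0.2 GET /index.html
10.0.0.1 GET /about.html
10.0.0.1 POST /login
10.0.0.3 GET /index.html'

# GET requests per client, most frequent first
printf '%s\n' "$log" | grep GET | awk '{ print $1 }' | sort | uniq -c | sort -rn | head
```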

#top 20 IPs querying ".ru" domains, where input.txt is a Windows Server DNS log
grep -F '(2)ru(0)' input.txt | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' | sort -n | uniq -c | sort -rn | head -n 20 > output.txt

#grep or (lines matching either string)
grep -E 'string1|string2' input.txt
#OR (basic-regex form; \| is a GNU grep extension)
grep 'string1\|string2' input.txt
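Here is a sketch of the OR forms on made-up text. The last variant passes several `-e` patterns, which POSIX grep ORs together, so it avoids both `-E` and the GNU-only `\|`.

```shell
#!/bin/sh
text='string1 here
string2 here
neither'
# Extended-regex alternation (the old egrep is equivalent to grep -E)
printf '%s\n' "$text" | grep -E 'string1|string2'
# Portable: multiple -e patterns, a line matching any of them is printed
printf '%s\n' "$text" | grep -e string1 -e string2
```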

#show non-comment, non-empty lines -- see 'grep or' above
grep -v '^#\|^$' /etc/rsyslog.conf
grep -v '^;\|^$' /etc/php.ini
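The `'^#\|^$'` pattern keeps indented comments and whitespace-only lines. Here is a possible variant, shown on a made-up config; the `make_conf` helper is just for the demo. `'^[[:space:]]*(#|$)'` allows optional leading whitespace before the comment marker or the end of the line.

```shell
#!/bin/sh
# Made-up config: comment, setting, empty line, indented comment,
# whitespace-only line, setting
make_conf() { printf '# a comment\nsetting=1\n\n    # indented comment\n   \nother=2\n'; }

# Original pattern: the indented comment and the whitespace-only line survive
make_conf | grep -v '^#\|^$'
# Variant: only the two settings remain
make_conf | grep -Ev '^[[:space:]]*(#|$)'
```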

#inverse head
#if "head" is first 10, then
tail -n +11 input.txt

#inverse tail
#if "tail" is last 10, then (GNU head only; a negative count drops that many lines from the end)
head -n -10 input.txt
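A quick check with `seq` shows how the counts work. `tail -n +K` starts output at line K. GNU `head -n -K` drops the last K lines (BSD head does not accept negative counts).

```shell
#!/bin/sh
# 15 numbered lines
seq 15 | tail -n +11   # lines 11-15: everything except the first 10
seq 15 | head -n -10   # lines 1-5: everything except the last 10 (GNU head)
```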

#remove lines in file2.txt from file1.txt
awk 'NR==FNR{a[$0]++;next} !a[$0]' file2.txt file1.txt > filtered.txt
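Here is how the `NR==FNR` trick plays out on small made-up files. `NR` counts records across all input, and `FNR` restarts at 1 for each file. The two are equal only while awk reads the first file, so those lines are stored and skipped. A caveat: if file2.txt is empty, `NR==FNR` stays true for file1.txt and nothing is printed. Comparing `FILENAME` against `ARGV[1]` is one way around that. This is a variant, not the command above.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'a\nb\nc\nd\n' > "$dir/file1.txt"
printf 'b\nd\n' > "$dir/file2.txt"
: > "$dir/empty.txt"

# Remember every line of the first file, then print unseen lines of the second
awk 'NR==FNR{a[$0]++;next} !a[$0]' "$dir/file2.txt" "$dir/file1.txt"   # a, c

# Variant that still works when the first file is empty
awk 'FILENAME==ARGV[1]{a[$0]++;next} !a[$0]' "$dir/empty.txt" "$dir/file1.txt"   # a, b, c, d
```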

#find lines containing non-ASCII characters, e.g. íóëã (needs a grep built with -P/PCRE support)
grep -P '[^\x00-\x7f]' input.txt

#replace international characters with plain ASCII (e.g. íóëã to ioea); requires the Text::Unidecode Perl module
perl -C -MText::Unidecode -n -e 'print unidecode($_)' < input.txt

#replace backticks (`) with single quotes (')
sed "s/\`/'/g" input.txt

#de-timestamping (set the time part of a timestamp to 00:00:00)
echo "2015-06-26 07:33:55" | awk '{$0=substr($0,1,11)"00:00:00"; print $0}'
#OR
echo "2015-06-26 07:33:55" | sed 's/[0-2][0-9]:[0-5][0-9]:[0-5][0-9]/00:00:00/g'
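A late-evening timestamp is a good way to check that the hour pattern covers 20-23. The `awk` form keeps the first 11 characters (`YYYY-MM-DD` plus the space) and appends a zero time. In the `sed` form, a first hour digit of `[0-2]` is needed to match 20-23.

```shell
#!/bin/sh
for ts in "2015-06-26 07:33:55" "2015-06-26 23:59:59"; do
    # awk: keep "YYYY-MM-DD " and append 00:00:00
    echo "$ts" | awk '{$0=substr($0,1,11)"00:00:00"; print $0}'
    # sed: [0-2][0-9] accepts hours 00-29, which covers 00-23
    echo "$ts" | sed 's/[0-2][0-9]:[0-5][0-9]:[0-5][0-9]/00:00:00/g'
done
```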

#list files named "filename" that do NOT contain TEXT
find ./ -iname "filename" -exec grep -L TEXT {} \;
# e.g. find src/main/target/ -iname "target.h" -exec grep -L 'USE_.*_EXTI' {} \;
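Here is a small sketch of the `find` + `grep -L` combination on a made-up directory tree with two `target.h` files. Only the one without a `USE_..._EXTI` line is listed. It also uses `{} +` instead of `{} \;`, so `find` passes many files to a single `grep` call rather than starting one `grep` per file. Quoting the pattern keeps the shell from glob-expanding `.*`.

```shell
#!/bin/sh
dir=$(mktemp -d)
mkdir -p "$dir/a" "$dir/b"
echo '#define USE_FOO_EXTI' > "$dir/a/target.h"
echo '#define USE_BAR'      > "$dir/b/target.h"

# -L lists files with no matching line; prints only .../b/target.h
find "$dir" -iname 'target.h' -exec grep -L 'USE_.*_EXTI' {} +
```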

#in a .yml file, replace the value on each "income: <value>" line with 10% of that value
#e.g.
#      WATER_WORKER:
#        income: 30.0
#        experience: 100.0
perl -pe 's/(income:) (\d+.*)/($1)." ".($2*0.10)/ge' jobConfig.yml > jobConfig.yml.new




~~~
As always, good luck! You can thank me with bitcoin.