Organizing and Cleaning Data
1. The sort Command
sort orders the lines of a file, or of standard input, alphabetically, numerically, or by other criteria.
Basic Alphabetical Sort:
$ cat data.txt | sort
Numeric Sort (-n): Essential when sorting numbers, because the default sort compares lines character by character, so 10 comes before 2 (the character 1 sorts before 2).
$ cat numbers.txt | sort -n
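To make the difference concrete, here is a minimal sketch that sorts the same three numbers both ways. The sample values are made up for illustration, and LC_ALL=C is set only so the default ordering is the same on every system.

```bash
# Hypothetical sample input: three numbers, one per line.
# LC_ALL=C pins the collation order so results do not depend on the locale.
lexical=$(printf '10\n2\n33\n' | LC_ALL=C sort)
numeric=$(printf '10\n2\n33\n' | LC_ALL=C sort -n)

echo "Default sort:"
printf '%s\n' "$lexical"   # 10, 2, 33: compared character by character

echo "Numeric sort (-n):"
printf '%s\n' "$numeric"   # 2, 10, 33: compared by numeric value
```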
Reverse Sort (-r): Sorts in descending order.
$ cat data.txt | sort -r
2. The uniq Command
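uniq removes or reports repeated adjacent lines in a file. Because it only compares each line with the one directly before it, duplicates that are not next to each other go undetected. Sort the input first so that identical lines end up adjacent.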
Removing Duplicates:
$ cat log | sort | uniq
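The effect of skipping the sort step can be seen in a small sketch like the one below. The fruit names are invented sample data, not taken from any real log.

```bash
# Hypothetical input where the duplicate "apple" lines are NOT adjacent.
unsorted=$(printf 'apple\nbanana\napple\n' | uniq)
sorted=$(printf 'apple\nbanana\napple\n' | sort | uniq)

echo "uniq alone (duplicate survives):"
printf '%s\n' "$unsorted"   # apple, banana, apple

echo "sort | uniq (duplicate removed):"
printf '%s\n' "$sorted"     # apple, banana
```

When only the de-duplicated list is needed, sort -u gives the same result in a single command.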
Counting Occurrences (-c): Very useful for generating frequency reports.
$ cat access.log | sort | uniq -c | sort -nr
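As a rough illustration of how that pipeline builds a frequency report, the sketch below runs it on a few made-up request methods standing in for log lines. The sample data and the variable names counts and top are assumptions for this example only.

```bash
# Hypothetical stand-in for a log: one request method per line.
counts=$(printf 'GET\nPOST\nGET\nGET\nPOST\nPUT\n' | sort | uniq -c | sort -nr)

# uniq -c prefixes each line with its count, usually padded with spaces;
# sort -nr then puts the most frequent entry first.
printf '%s\n' "$counts"

# Normalize the padding with awk to pull out the most frequent entry.
top=$(printf '%s\n' "$counts" | head -n 1 | awk '{print $1, $2}')
echo "Most frequent: $top"   # 3 GET
```

The first sort groups identical lines so uniq -c can count them; the second sort orders the counts themselves, numerically (-n) and in descending order (-r).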