56. Filtering Data: The `sort` and `uniq` Commands

Linux Basics: From Zero to CLI Hero

Organizing and Cleaning Data

1. The sort Command

sort rearranges the lines of text files or input alphabetically, numerically, or based on other criteria.

Basic Alphabetical Sort:

bash $ cat data.txt | sort

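Numeric Sort (-n): Essential when sorting numbers. By default, sort compares lines character by character, so 10 comes before 2 because the character 1 sorts before 2. The -n option compares the lines by numeric value instead.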

bash $ cat numbers.txt | sort -n
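To make the difference concrete, here is a small sketch that sorts the same four numbers both ways. The sample values are made up for illustration, and `LC_ALL=C` is set only so the default ordering is the same on every system.

```shell
# Hypothetical sample data: four numbers, one per line.
lexical=$(printf '10\n2\n33\n1\n' | LC_ALL=C sort)
numeric=$(printf '10\n2\n33\n1\n' | LC_ALL=C sort -n)

echo "Default sort:"
echo "$lexical"    # 1, 10, 2, 33 (compared character by character)

echo "Numeric sort:"
echo "$numeric"    # 1, 2, 10, 33 (compared by value)
```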

Reverse Sort (-r): Sorts in descending order.

bash $ cat data.txt | sort -r
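Options can be combined, so -r also works together with -n. The sketch below uses made-up sample data to show a reversed alphabetical sort and a reversed numeric sort side by side.

```shell
# Hypothetical sample data, illustration only.
words_desc=$(printf 'pear\napple\nfig\n' | LC_ALL=C sort -r)
nums_desc=$(printf '10\n2\n33\n' | LC_ALL=C sort -nr)

echo "$words_desc"   # pear, fig, apple
echo "$nums_desc"    # 33, 10, 2
```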

2. The uniq Command

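uniq removes or reports repeated adjacent lines in a file. It only compares each line with the one directly before it, so duplicates that are not next to each other go undetected. Sorting the input first places identical lines together, which is why uniq is almost always used after sort.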

Removing Duplicates:

bash $ cat log | sort | uniq
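The following sketch shows why the sort step matters. It runs uniq on the same made-up input with and without sorting first. The last line uses `sort -u`, a standard option that should give the same result as `sort | uniq` in one step.

```shell
# Hypothetical sample data: "apple" appears three times, but not all adjacent.
input=$(printf 'apple\nbanana\napple\napple\n')

# Without sorting, only the adjacent pair of "apple" lines is collapsed.
unsorted=$(printf '%s\n' "$input" | uniq)
echo "$unsorted"     # apple, banana, apple

# After sorting, all duplicates are adjacent and uniq removes them.
deduped=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)
echo "$deduped"      # apple, banana

# Same result in a single command.
deduped_u=$(printf '%s\n' "$input" | LC_ALL=C sort -u)
echo "$deduped_u"    # apple, banana
```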

Counting Occurrences (-c): Very useful for generating frequency reports.

bash $ cat access.log | sort | uniq -c | sort -nr

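This pipeline sorts the log entries so identical lines are adjacent, prefixes each distinct line with the number of times it occurs, and then sorts by that count in descending numeric order, so the most frequent entries appear first.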
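Here is a minimal sketch of the same pipeline on a few made-up request lines standing in for a real access.log. The amount of leading padding that uniq -c puts in front of each count varies between implementations, so the final awk step is added here only to normalize the spacing.

```shell
# Hypothetical log lines, illustration only.
report=$(printf 'GET /home\nGET /about\nGET /home\nGET /home\nGET /about\nGET /login\n' \
  | LC_ALL=C sort \
  | uniq -c \
  | LC_ALL=C sort -nr \
  | awk '{$1=$1; print}')   # collapse padding to single spaces

echo "$report"
# 3 GET /home
# 2 GET /about
# 1 GET /login
```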