Clean Up Bash History at Shell Startup

This blog post describes a simple technique that you can use to clean your Bash shell command history every time that you start a Bash shell. I use this with Windows Subsystem for Linux (WSL).

To simplify repeating complex commands between sessions, the Bash shell records the commands that you invoke. When an interactive session ends, Bash writes at most the number of commands specified by the HISTSIZE variable from its in-memory history list to the ~/.bash_history file. It either overwrites the file or, if the histappend shell option is set, appends to it. Bash then truncates the file to at most HISTFILESIZE lines, which prevents unlimited growth of the .bash_history file. Truncation drops the oldest lines first.
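To see which limits apply in your own shell, a minimal sketch like the following may help. The function name show_history_limits is only a hypothetical helper for illustration, and either variable can be unset in a non-interactive shell.

```shell
# Hypothetical helper: print the history limits in effect for this shell.
# Either variable may be unset, for example in a non-interactive shell.
show_history_limits() {
  echo "HISTSIZE=${HISTSIZE:-unset} HISTFILESIZE=${HISTFILESIZE:-unset}"
}

show_history_limits
```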

Unfortunately, Bash does not remove duplicate entries from the history by default, so enough “cd” commands could push more complex commands out of the history.

To remove duplicate entries, you can add commands like these to your Bash initialization process:

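# Number each line, sort by command so that duplicates become adjacent,
# drop the duplicates, then restore the original order and strip the numbers.
# Note: -k2 replaces the obsolete "+1" key syntax, which current sort
# implementations may treat as a file name.
nl ~/.bash_history \
  | sort -f -b -k2 -r \
  | sed 's/[[:blank:]]*$//' \
  | uniq -f 1 -i \
  | sort -n \
  | cut -c8- > "/tmp/.bash_history.$LOGNAME"
mv "/tmp/.bash_history.$LOGNAME" ~/.bash_history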

While this can be achieved more efficiently in a few lines of many languages, it is a good example of how capable a few well-known Unix commands can be for text processing, and it does not depend on inline perl or even [efr]?grep. Each pipe (|) character feeds the output of one command into the next command, which transforms it and streams it onward. At the end of the pipeline, the greater-than (>) character redirects the final output to a file.

The uniq command can only remove duplicate lines that are adjacent, so the history has to be sorted first. Sorting alone, however, would lose the command sequence, which can be relevant to the user and which allows the oldest commands to disappear from the history first. To get around this, prepend each line with its line number, sort by command, remove the duplicates, sort by the number, and then remove the number.
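The adjacency limitation is easy to see on a small sample. This is only an illustrative sketch; the sample function and its commands are made up for the demonstration.

```shell
# Made-up sample "history" in which "ls -l" appears three times,
# but only the last two occurrences are adjacent.
sample() {
  printf '%s\n' 'ls -l' 'cd /tmp' 'ls -l' 'ls -l'
}

# uniq alone collapses only the adjacent pair at the end.
sample | uniq
# ls -l
# cd /tmp
# ls -l

# Sorting first makes every duplicate adjacent, but the order is lost.
sample | sort | uniq
# cd /tmp
# ls -l
```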

The nl (number lines) command prepends the line number to the commands in the history so that the second sort can restore that order after invoking uniq (unique). The first sort command ignores the number from nl and sorts by the command values in reverse order. Reversing the order before removing duplicates will cause older instances to be removed, placing commands used more recently later in the history. The sed (stream editor, in this case performing a substitution) command strips any trailing whitespace from all lines. The uniq command removes duplicate commands, ignoring character case (individuals may choose to change this) and the first field that contains the number that nl added. The second sort command restores the original history order. The cut (trim characters) command removes the number that nl had added. The output goes to a temporary file that then replaces the original ~/.bash_history.
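To try the steps described above without touching your real history, the pipeline can be wrapped in a function and run against a scratch file. This is a sketch under assumptions: the name dedupe_history is hypothetical, the key is written as -k2, and LC_ALL=C is added to the first sort only so that ties between identical commands are broken the same way in every locale.

```shell
# Hypothetical helper: print FILE with older duplicate lines removed.
dedupe_history() {
  nl "$1" \
    | LC_ALL=C sort -f -b -k2 -r \
    | sed 's/[[:blank:]]*$//' \
    | uniq -f 1 -i \
    | sort -n \
    | cut -c8-
}

# Scratch file with made-up commands; "cd /tmp" and "ls -l" repeat.
demo=$(mktemp)
printf '%s\n' 'cd /tmp' 'ls -l' 'cd /tmp' 'make test' 'ls -l' > "$demo"

dedupe_history "$demo"
# cd /tmp
# make test
# ls -l

rm -f "$demo"
```

Only the most recent copy of each command survives, in its original position relative to the other survivors.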

Back up your .bash_history before running this for the first time. After running it, compare the new .bash_history file with the backup, potentially using WinMerge. Speaking of WinMerge, this is the alias that I use to launch WinMerge:

alias winmerge='cmd=`wslpath "C:\Program Files (x86)\WinMerge\WinMergeU.exe"` ; "$cmd" &'
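If you prefer to stay on the command line for the backup and comparison, a sketch with cp and diff could look like this. The function names and the .bak suffix are arbitrary choices for illustration, not part of the original technique.

```shell
# Hypothetical helper: keep a copy of FILE next to it as FILE.bak.
backup_history() {
  cp -p "$1" "$1.bak"
}

# Hypothetical helper: list the differences between the backup and FILE.
# Lines prefixed with "<" exist only in the backup, so they were removed.
# diff exits with status 1 when the files differ.
compare_history() {
  diff "$1.bak" "$1"
}

# Typical use:
#   backup_history ~/.bash_history
#   ... run the cleanup ...
#   compare_history ~/.bash_history
```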

You may want to review .bash_history periodically, especially to remove the oldest lines that have likely become irrelevant.
