I wrote a Bash script that finds duplicate files. It compares file sizes and checksums to decide whether files are (probably) duplicates:
Program
#!/usr/bin/env bash
## Summary: find duplicate files
## Meng Lu <lumeng.dev@gmail.com>
DIR=${1:-$(pwd)} ## use provided path if available, otherwise the current path
FILENAME=$(basename "$0")
TMPFILE=$(mktemp "/tmp/${FILENAME}.XXXXXX") || exit 1
trap 'rm -f "$TMPFILE"' EXIT ## remove the temporary file when the script exits
## one-line version
#find -P "$DIR" -type f -exec cksum '{}' \; | sort | tee "$TMPFILE" | cut -f 1-2 -d ' ' | uniq -d | sed 's/.*/^& /' | grep -f - "$TMPFILE" | sort -nr -t' ' -k2,2 | cut -f 3- -d ' ' | while IFS= read -r line; do ls -lhta -- "$line"; done
## multi-line version with comments
find -P "$DIR" -type f -exec cksum '{}' \; | # find regular files under $DIR and compute their checksums; -P: never follow symbolic links
sort | # sort the lines so that identical {checksum, file size} pairs become adjacent
tee "$TMPFILE" | # save a copy in a temporary file and pass the stream along
cut -f 1-2 -d ' ' | # keep only the checksum and file size
uniq -d | # print one copy of each repeated pair, dropping the unique ones
sed 's/.*/^& /' | # anchor each pair at the start of the line so it cannot match inside a file name or a longer number
grep -f - "$TMPFILE" | # grep from the saved list the lines of duplicate files, identified by having the same checksum and file size; -f - reads the patterns from stdin
sort -nr -t' ' -k2,2 | # sort by descending file size
cut -f 3- -d ' ' | # keep only the file name
while IFS= read -r line; do ls -lhta -- "$line"; done # do an informative ls on every duplicate file found
Notes
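The first stage is the `find` command. `-P` never follows symbolic links; `-type f` selects regular files only, so directories (and symbolic links themselves) are skipped; `-exec cksum '{}' \;` computes a checksum for each file found, with `'{}'` standing for that file's path. `cksum` prints three space-separated fields: checksum, file size, and file name, where the file size is the number of octets (bytes) in the file.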
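To make the `cksum` output format concrete, here is a minimal sketch. The scratch directory and the file names `a.txt` and `b.txt` are hypothetical and exist only for illustration. It writes the same content to two files and extracts the first two fields of each line:

```shell
#!/usr/bin/env bash
# Hypothetical scratch files, used only for illustration.
workdir=$(mktemp -d) || exit 1
printf 'hello\n' > "$workdir/a.txt"
printf 'hello\n' > "$workdir/b.txt"

# cksum prints one line per file: <checksum> <size in bytes> <file name>
cksum "$workdir/a.txt" "$workdir/b.txt"

# Identical content yields an identical "checksum size" prefix.
key_a=$(cksum "$workdir/a.txt" | cut -f 1-2 -d ' ')
key_b=$(cksum "$workdir/b.txt" | cut -f 1-2 -d ' ')
size_a=$(cksum "$workdir/a.txt" | cut -f 2 -d ' ')
printf 'a: %s\nb: %s\nsize of a: %s bytes\n' "$key_a" "$key_b" "$size_a"

rm -rf "$workdir"
```

The two keys are equal because the files hold the same bytes, and the size field is 6 because `hello` plus a newline is six bytes.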
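- `sort`: sorts the lines so that identical {checksum, file size} pairs become adjacent, which `uniq` requires.
- `tee $TMPFILE`: saves a copy of the stream to the temporary file while passing it further down the pipe.
- `cut -f 1-2 -d ' '`: keeps only fields 1 and 2, with fields separated by spaces.
- `uniq -d`: prints one copy of each repeated line and drops the lines that occur only once.
- `grep` with `-f - $TMPFILE`: `-f -` reads the search patterns from standard input, so the duplicated {checksum, file size} pairs are looked up in the saved file list. Note that the matching lines include the file names.
- `sort -nr -t' ' -k2,2`: sorts the duplicate files by size, in descending order.
- `cut -f 3- -d ' '`: keeps only the file name.
- The final `while read` loop runs `ls -lhta` on each file to print its details.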
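The filtering stages can be tried without touching the file system. The following is a sketch under stated assumptions: the checksums, sizes, and file names are made up, and the list stands in for sorted `cksum` output. Each pattern is anchored with `sed` so that it only matches the two leading fields.

```shell
#!/usr/bin/env bash
# Made-up "checksum size name" lines standing in for sorted cksum output.
list=$(mktemp) || exit 1
cat > "$list" <<'EOF'
1111 10 ./a.txt
1111 10 ./copy of a.txt
2222 300 ./b.bin
3333 20 ./c.txt
3333 20 ./d.txt
EOF

# Keep the two key fields, keep only keys that repeat, anchor each key at
# the start of the line, pull the matching full lines from the saved list,
# sort by descending size, and keep only the file names.
dups=$(cut -f 1-2 -d ' ' "$list" | uniq -d | sed 's/.*/^& /' |
  grep -f - "$list" | sort -nr -t' ' -k2,2 | cut -f 3- -d ' ')

printf '%s\n' "$dups"
rm -f "$list"
```

Only the four files that share a key with another file are printed, with the 20-byte pair ahead of the 10-byte pair; `./b.bin` is unique and is dropped. File names containing spaces survive because `cut -f 3-` keeps everything from the third field onward.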
Related articles
- find usage examples
- grep usage examples