X-Git-Url: https://fleuret.org/cgi-bin/gitweb/gitweb.cgi?p=finddup.git;a=blobdiff_plain;f=finddup.1;h=46a4326fb9f2e1244aef1bfe9a93588f43505880;hp=9cc21b4f13e9f2c5536f0a89423ab9c37bc0a240;hb=e4133d06373b48e8509afd0811bb0a726d74f8a8;hpb=a61c9478f31b957e0d4007df9feddd6f0139ccf8

diff --git a/finddup.1 b/finddup.1
index 9cc21b4..46a4326 100644
--- a/finddup.1
+++ b/finddup.1
@@ -69,10 +69,19 @@ use MD5 hashing
 
 None known, probably many. Valgrind does not complain though.
 
-The MD5 hashing often hurts more than it helps, hence it is off by
-default. The only case when it should really be useful is when you
-have plenty of different files of same size, which does not happen
-often.
+The MD5 hashing is not satisfactory. It is computed for a file only if
+the said file has to be read fully for a comparison (i.e. two files
+match and we have to read them completely).
+
+Hence, in practice lot of partial MD5s are computed, which costs a lot
+of cpu and is useless. This often hurts more than it helps, hence it
+is off by default. The only case when it should really be useful is
+when you have plenty of different files of same size, and lot of
+similar ones, which does not happen often.
+
+Forcing the files to be read fully so that the MD5s are properly
+computed is not okay neither, since it would fully read certain files,
+even if we will never need their MD5s.
 
 .SH "WISH LIST"
 