X-Git-Url: https://fleuret.org/cgi-bin/gitweb/gitweb.cgi?p=finddup.git;a=blobdiff_plain;f=finddup.1;h=46a4326fb9f2e1244aef1bfe9a93588f43505880;hp=9cc21b4f13e9f2c5536f0a89423ab9c37bc0a240;hb=e4133d06373b48e8509afd0811bb0a726d74f8a8;hpb=a61c9478f31b957e0d4007df9feddd6f0139ccf8

diff --git a/finddup.1 b/finddup.1
index 9cc21b4..46a4326 100644
--- a/finddup.1
+++ b/finddup.1
@@ -69,10 +69,19 @@ use MD5 hashing
 
 None known, probably many. Valgrind does not complain though.
 
-The MD5 hashing often hurts more than it helps, hence it is off by
-default. The only case when it should really be useful is when you
-have plenty of different files of same size, which does not happen
-often.
+The MD5 hashing is not satisfactory. The MD5 of a file is only
+complete if that file has to be read fully for a comparison (i.e. two
+files match and both have to be read to the end).
+
+Hence, in practice, many partial MD5s are computed for files whose
+comparison stops early, which wastes a lot of CPU. This often hurts
+more than it helps, which is why it is off by default. It should only
+be really useful when you have plenty of different files of the same
+size, and many similar ones, which does not happen often.
+
+Forcing the files to be read fully so that the MD5s are properly
+computed is not satisfactory either, since it would fully read some
+files even if their MD5s are never needed.
 
 .SH "WISH LIST"