X-Git-Url: https://fleuret.org/cgi-bin/gitweb/gitweb.cgi?p=finddup.git;a=blobdiff_plain;f=finddup.1;h=e262386a1ec035946a50828f69cf6582b3320429;hp=faaef4de3dc324f2786f6ebb38cf48608c5a926f;hb=ab7b6e26f35ac1dfc88d9bf1e09dd289a30ea782;hpb=ba0a93e5d70103f2aee5395eac74b05792aaa3a8

diff --git a/finddup.1 b/finddup.1
index faaef4d..e262386 100644
--- a/finddup.1
+++ b/finddup.1
@@ -61,30 +61,17 @@ show the real path of the files
 .TP
 \fB-i\fR, \fB--same-inodes-are-different\fR
 files with same inode are considered as different
-.TP
-\fB-m\fR, \fB--md5\fR
-use MD5 hashing (if compiled with the option)
 
 .SH "BUGS"
 
 None known, probably many. Valgrind does not complain though.
 
-The MD5 hashing is not satisfactory. It is computed for a file only if
-the said file has to be read fully for a comparison (i.e. two files
-match and we have to read them completely).
-
-Hence, in practice lot of partial MD5s are computed, which costs a lot
-of cpu and is useless. This often hurts more than it helps. The only
-case when it should really be useful is when you have plenty of
-different files of same size, and lot of similar ones, which does not
-happen often.
-
-Forcing the files to be read fully so that the MD5s are properly
-computed is not okay neither, since it would fully read certain files,
-even if we will never need their MD5s.
-
-Anyway, it has to be compiled in with 'make WITH_MD5=yes', and even in
-that case it will be off by default
+The current algorithm is dumb, that is it does not use any hashing of
+the file content. I tried md5 on the whole file, which is not
+satisfactory because files are often never read entirely hence the md5
+can not be properly computed. I also tried XOR of the first 4, 16 and
+256 bytes with rejection as soon as one does not match. Did not help
+either.
 
 .SH "WISH LIST"
 