# Why does the tar file show a different file count and size than the directory?



## robert wild (May 28, 2017)

Hi all,

Why am I getting different file counts and different sizes for the same directory and tar file? The tar was made from that very directory, so I really don't get it.


```
tar -tf /vol/cha-archive/audio/Even\ When\ I\ Fall\ -\ January\ 2017\ Project.tar | wc -l
    8536

root@archive:/ # find /vol/cha-work/_ARCHIVE/to_be_archived/audio/chad_r/2017-05-17/Even\ When\ I\ Fall\ -\ January\ 2017\ Project/ -type f | wc -l
    8464
  
137G   /vol/cha-archive/audio/Even When I Fall - January 2017 Project.tar

138G   /vol/cha-work/_ARCHIVE/to_be_archived/audio/chad_r/2017-05-17/Even When I Fall - January 2017 Project/
```
many thanks,

rob


----------



## ralphbsz (May 28, 2017)

We need a lot more detail to debug this. To begin with: in your tar listing you are counting everything that got tarred, while in your find you are only counting regular files. The difference might be directories, or other things like soft links and FIFOs. Next question: which entries are missing? Make a list of the tar contents and a list of the files, sort both, then look for the differences (`join -v` is convenient for that), and then look at what the missing entries have in common. Maybe there was a permission problem when creating the tar, and not everything got tarred?
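A minimal sketch of that list-and-compare step, assuming the archive was created from the directory's parent (`cd PARENT && tar -cf ARCHIVE NAME`), so member names start with the directory name exactly as `find NAME` prints them. `compare_tar_dir` is a hypothetical helper name. It swaps `join -v` for `comm`, because `join` matches only on the first whitespace-separated field, and names like these contain spaces.

```shell
#!/bin/sh
# compare_tar_dir ARCHIVE PARENT NAME  (hypothetical helper, a sketch only)
# Assumes the archive was made with:  cd PARENT && tar -cf ARCHIVE NAME
# so archive member names look like "NAME/sub/file", the same form find prints.
compare_tar_dir() {
    tmp=$(mktemp -d) || return 1
    # Archive members, minus directory entries (tar lists those with a trailing "/").
    tar -tf "$1" | grep -v '/$' | LC_ALL=C sort > "$tmp/tar.lst"
    # Regular files on disk, listed relative to PARENT.
    (cd "$2" && find "$3" -type f) | LC_ALL=C sort > "$tmp/dir.lst"
    # comm compares whole lines, so names containing spaces are handled.
    echo "only in archive:"
    LC_ALL=C comm -23 "$tmp/tar.lst" "$tmp/dir.lst"
    echo "only on disk:"
    LC_ALL=C comm -13 "$tmp/tar.lst" "$tmp/dir.lst"
    rm -rf "$tmp"
}
```

Under that assumption it would be called as `compare_tar_dir "/vol/cha-archive/audio/Even When I Fall - January 2017 Project.tar" /vol/cha-work/_ARCHIVE/to_be_archived/audio/chad_r/2017-05-17 "Even When I Fall - January 2017 Project"`. Names listed as "only in archive" that still exist on disk are probably symlinks, FIFOs or other non-regular files that `find -type f` skips.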

On the sizes: how did you calculate them (you are not showing the commands)? Depending on rounding, 137G and 138G might be identical. Also, if you use `du` to determine the "size" of a file, you'll get a misleading answer, since it measures the disk usage of a file, which is different from the "size" reported by `ls` or `tar`: the disk usage is the number of blocks allocated to the file, while the size is the offset just past the last byte (the file's length). Because of rounding up to block boundaries and sparse files, those can easily differ. And depending on how you measure, you might also be counting the size of directories (which is non-zero!) and, in the case of tar, the size of the headers.
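A sparse file is a quick way to see size and disk usage diverge. The sketch below uses a hypothetical helper, `apparent_vs_usage`, which prints a file's length in bytes (via `wc -c`) next to its allocation (via `du -k`, scaled to bytes).

```shell
#!/bin/sh
# apparent_vs_usage FILE  (hypothetical helper, a sketch only)
# Prints two numbers: the file size in bytes (what ls -l and tar report)
# and the disk usage in bytes (allocated blocks, what du reports).
apparent_vs_usage() {
    size=$(wc -c < "$1")                 # logical length of the file
    usage_kib=$(du -k "$1" | cut -f1)    # allocated space in KiB
    echo "$((size)) $((usage_kib * 1024))"
}
```

For a file created with `truncate -s 10M sparse.bin`, the first number is 10485760, while the second is typically far smaller, often 0. For whole trees, FreeBSD `du -A` (GNU: `du --apparent-size`) reports apparent size instead of allocated blocks, which also matters on compressed filesystems. As for headers: tar writes a 512-byte header per entry and pads each file's data to a multiple of 512 bytes, so for roughly 8,500 entries that overhead is on the order of megabytes, not gigabytes.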

Here is what I would do: run `tar -tv ...`, then use an awk script to pull out only regular files, and for those only the size and the name. Then use a find command to again find only regular files and print the size and the name (the `-ls` option of BSD find can be used for that, although the `-printf` option of GNU find is more convenient), and again extract just a sorted list of sizes and file names. Then compare the two lists.
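A sketch of that size-and-name comparison in plain sh and awk. `tar_sizes` and `dir_sizes` are hypothetical helper names. BSD tar (bsdtar) and GNU tar format `tar -tv` differently, so the awk code assumes one of the two layouts described in its comments and chooses between them based on whether the second column contains a `/` (GNU tar's `user/group`). In place of `-printf` or `-ls`, it uses `find ... -exec wc -c {} +`, which works with both BSD and GNU find and keeps spaces in names intact.

```shell
#!/bin/sh
# Sketch only: emit "SIZE NAME" for every regular file, sorted by name.

# tar_sizes ARCHIVE: regular files from `tar -tv`. Assumed layouts:
#   GNU tar: perms user/group size date time name             (size in $3, name after 5 fields)
#   bsdtar:  perms links user group size mon day time name    (size in $5, name after 8 fields)
tar_sizes() {
    tar -tvf "$1" | awk '
        /^-/ {                                   # "-" in column 1 means a regular file
            if ($2 ~ /\//) { size = $3; skip = 5 } else { size = $5; skip = 8 }
            line = $0
            for (i = 0; i < skip; i++) sub(/^[^ ]+ +/, "", line)   # drop leading fields, keep spaces in the name
            print size, line
        }' | LC_ALL=C sort -k 2
}

# dir_sizes PARENT NAME: regular files on disk, named relative to PARENT like tar names them.
# wc -c prints "SIZE NAME" per file plus "total" lines, which are dropped.
dir_sizes() {
    (cd "$1" && find "$2" -type f -exec wc -c {} +) | awk '
        {
            line = $0
            sub(/^ *[0-9]+ /, "", line)          # strip the (possibly padded) byte count
            if (line != "total") print $1, line
        }' | LC_ALL=C sort -k 2
}
```

Write each list to a file, for example `tar_sizes ARCHIVE > tar.lst` and `dir_sizes PARENT NAME > dir.lst`, then `diff tar.lst dir.lst` shows files that are missing or have different sizes. Names containing newlines or leading spaces will not round-trip, and GNU tar escapes some unusual characters (such as backslashes) in its listings, so those names may appear as spurious differences.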


----------



## ShelLuser (May 28, 2017)

Look into mtree(8); it is probably the best way to verify whether everything was fully archived.


----------

