# Cannot Delete a Directory



## John Wright (Sep 21, 2017)

I have a directory that will not delete.  When I attempt `rm -rf camera` it just sits there and does nothing.  I let it sit for over 8 hours and still nothing.  I try to list the contents `ls camera` and again nothing, just sits there with a blinking cursor.  If I try `du` same outcome.

Anyone ever see this kind of an issue before and know how to fix it?

Thanks.


----------



## SirDice (Sep 21, 2017)

That sounds like filesystem corruption. Is this on UFS or ZFS?


----------



## CyberCr33p (Sep 21, 2017)

Did you try to fsck the disk?


----------



## SirDice (Sep 21, 2017)

CyberCr33p said:


> Did you try to fsck the disk ?


That would have been my suggestion too. But fsck(8) only works on UFS, not on ZFS. Hence my question.


----------



## John Wright (Sep 21, 2017)

It's UFS, and in fact it is NAS4Free.  It's been a while since I have used fsck, so I will have to review my procedures, but that is something I hadn't thought of.  I use Fedora on my primary systems, so I am not too familiar with FreeBSD.  I will try fsck, thanks.


----------



## Minbari (Sep 21, 2017)

`zpool scrub pool`.


----------



## John Wright (Sep 21, 2017)

Correction, it is a ZFS file system.


----------



## SirDice (Sep 21, 2017)

Make sure you have some backups beforehand. If this really is a filesystem corruption "fixing" it may inadvertently remove a bunch of stuff. Preferably run the fsck(8) from single user mode. Alternatively you can unmount the filesystem and run fsck(8) (but unmounting would be impossible if the filesystem happens to be /).

Edit: Right, ZFS. In that case try `zpool scrub <poolname>`. But this may not fix things and it's possible it's so corrupt even ZFS's self-healing can't deal with it any more. In which case there's nothing else to do but restore from backups.


----------



## John Wright (Sep 21, 2017)

I have never used zpool, so I will need to do some research.  From the man pages it looks like that may be what I need.


----------



## John Wright (Sep 21, 2017)

I found the issue while running rsync on the drive.  I have everything backed up and am in the process of making a second backup just in case.  Once that is done I will begin to play around a little.


----------



## John Wright (Sep 21, 2017)

SirDice said:


> Make sure you have some backups beforehand. If this really is a filesystem corruption "fixing" it may inadvertently remove a bunch of stuff. Preferably run the fsck(8) from single user mode. Alternatively you can unmount the filesystem and run fsck(8) (but unmounting would be impossible if the filesystem happens to be /).
> 
> Edit: Right, ZFS. In that case try `zpool scrub <poolname>`. But this may not fix things and it's possible it's so corrupt even ZFS's self-healing can't deal with it any more. In which case there's nothing else to do but restore from backups.



OK, understood.  I was actually trying to delete the directory, so since I can't delete it, my only other option may be to rebuild the pool somehow.  I am in way over my head and will have to do a lot of research here.  I have never actually fiddled with ZFS, only RAIDs on large servers.  I used ZFS on my home NAS because my reading indicated it was a very fault tolerant system, and so far it has been.  This NAS has been running for about 5 years now without a hiccup, and I have been so pleased that I am going to start converting some of my customers to the same system instead of Windows file servers.


----------



## SirDice (Sep 21, 2017)

John Wright said:


> I have never actually fiddled with ZFS,


ZFS is great, easy to use, etc. But if you have filesystem corruptions all bets are off. Then ZFS is anything but easy. 



John Wright said:


> because my reading indicated it was a very fault tolerant system, and so far it has been.


It absolutely is. But there's only so much the self-healing can fix. It's definitely not bulletproof.


----------



## John Wright (Sep 21, 2017)

SirDice said:


> ZFS is great, easy to use, etc. But if you have filesystem corruptions all bets are off. Then ZFS is anything but easy.
> 
> 
> It absolutely is. But there's only so much the self-healing can fix. It's definitely not bulletproof.



Pretty much like every other OS I have used.  Great while they are running, but when things go wrong, it can be difficult.  But that's why we need computer engineers and technicians.  If everything worked perfectly all the time I would be out of a job and life would be very boring!!


----------



## John Wright (Sep 22, 2017)

OK, `zpool scrub` found no errors in the pool.  So there must be something else wrong with the file.


----------



## ralphbsz (Sep 22, 2017)

OK, time to diagnose some more.  Just knowing that you have some commands that don't work, but zpool scrub found no problems, will not help us give any advice for how to proceed.

You say you have a directory called "camera", which is stored somewhere in ZFS.  First question: Make sure that directory is actually what you think it is.  Go to the parent of the directory and do `ls -lF`, which will show you whether that directory is in reality a link, and it shows you the link count (second field), which for a directory is normally the number of subdirectories in "camera" plus two (one for its own "." entry and one for its name in the parent).  Second step: Do `stat -x camera`, which will show you a lot of interesting statistics about it.  Anything unusual?  Is it on the same device as its parent directory?  How about permissions, ACLs, and such?
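
If you would rather script those two checks, here is a minimal sketch in Python.  The function name `describe_directory` and the dictionary keys are made up for illustration, and it only reports what `lstat()` returns; it does not look at ACLs.  The "2 + subdirectories" reading of the link count is the usual convention, but some filesystems report it differently.

```python
import os
import stat


def describe_directory(path):
    """Collect the basics worth checking for a suspicious directory.

    Uses lstat() so a symbolic link is reported as a link instead of
    being followed into whatever it points at.
    """
    info = os.lstat(path)
    parent = os.lstat(os.path.dirname(os.path.abspath(path)))
    return {
        "is_symlink": stat.S_ISLNK(info.st_mode),
        "is_directory": stat.S_ISDIR(info.st_mode),
        # For a directory this is usually 2 + number of subdirectories,
        # but some filesystems report it differently.
        "link_count": info.st_nlink,
        # A different device number than the parent suggests a mount point.
        "same_device_as_parent": info.st_dev == parent.st_dev,
        # Mode string in the same style as the first column of `ls -l`.
        "permissions": stat.filemode(info.st_mode),
    }
```

Calling `describe_directory("camera")` from the parent directory only touches the directory's own metadata, not its contents, so it should return even when listing the directory hangs (no guarantee, of course, if the metadata itself is the problem).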

Next thing: You say that `rm -rf`, `ls` and `du` all hang.  Do you know anything about the content of that directory?  If you could get a listing of all things in that directory (not via using `ls`, which will hang, but for example from prior knowledge), you might be able to see whether the problem affects every entry in that directory, or only one.

When these commands hang, do you know what state they are in?  Do they use CPU time?  Can you interrupt them with Control-C?  Are there any console messages about hardware problems or ZFS internals at that time?  Ideally, you could drill down further and see exactly what they are doing: Either run `ls` under a debugger and see how far it gets, or use dtrace to see what system calls it makes, and what system call it hangs up on.
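
To test commands for hanging without tying up a terminal for hours, something like the following sketch may help.  It is plain Python using `subprocess.run()` with a timeout; the helper name `probe` and the 10 second default are arbitrary choices of mine, not anything ZFS-specific.

```python
import subprocess


def probe(command, timeout_seconds=10.0):
    """Run command, giving up after timeout_seconds.

    Returns "ok", "failed" or "timed out". Output is discarded; only
    whether the command finishes matters here.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return "timed out"
    return "ok" if result.returncode == 0 else "failed"
```

For example, `probe(["ls", "camera"])` and `probe(["du", "-s", "camera"])` would show which commands finish.  One caveat: if the process is stuck somewhere it cannot be killed, `probe` itself will block while waiting for the child to exit, which at least answers the Control-C question.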


----------



## k.jacker (Sep 22, 2017)

Have you been using autofs to mount a camera on that directory?
Or could it be an auto-created directory by /etc/autofs/special_hosts, so related to a hostname on your network?
It would show the same behaviour if the device/filesystem to be mounted is not available.
In that case all autofs related services have to be stopped/disabled first.
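
A quick way to see whether the directory is a mount point at all is to walk up from it until `os.path.ismount()` says yes.  This is only a sketch, the name `nearest_mount_point` is made up, and I am not sure whether the `lstat()` calls involved could themselves block on an autofs trigger whose target is unavailable.

```python
import os


def nearest_mount_point(path):
    """Return the closest ancestor of path (or path itself) that is a mount point."""
    current = os.path.abspath(path)
    # The root directory is always a mount point, so this loop terminates.
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current
```

If `nearest_mount_point("camera")` returns the `camera` directory itself rather than the dataset's mount point, something is mounted on top of it.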


----------



## John Wright (Sep 22, 2017)

ralphbsz said:


> OK, time to diagnose some more.  Just knowing that you have some commands that don't work, but zpool scrub found no problems, will not help us give any advice for how to proceed.
> 
> You say you have a directory called "camera", which is stored somewhere in ZFS.  First question: Make sure that directory is actually what you think it is: Go to the parent of the directory, and do a "ls -lF", which will show you whether that directory is in reality a link, and it shows you the link count (second field), which is the number of subdirectories in "camera".  Second step: Do `stat -x camera`, which will show you a lot of interesting statistics about it.  Anything unusual?  Is it on the same device as its parent directory?  How about permissions, ACLs, and such?
> 
> ...



I inserted some answers after the appropriate para above:

When I run `ls -al` on LHouse and then `top`, here is the output, including the ls line:


```
  PID USERNAME      THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 5387 root            1  89    0 32256K  2760K RUN     1  12:04  58.98% rsync
 2747 root           13  20    0   229M 14944K nanslp  1 648:40  51.95% fuppesd
 5385 root            1  52    0 32256K  3028K select  0  10:46  51.37% rsync
 5425 jwright         1  22    0 16872K  3380K zio->i  1   0:02   3.37% ls
```

So it appears ls is running, not sure what the state zio->i means.  I can kill the process with ^C.  But there are no console messages about anything.  I will research how to use a debugger and dtrace and try those.

Many thanks for your advice and assistance.


----------



## ralphbsz (Sep 22, 2017)

It seems that the directory is on the correct file system, and the stat output doesn't show anything extraordinary.  The directory LHouse has 98 subdirectories (a link count of 100), which is not particularly high (if it had a million, that would have raised eyebrows, but on a modern system 100 is uninteresting).

The thing which is really odd: The ls command is doing something (it is using some CPU time, 3.37% in the example above), and it is waiting for IO (that's what zio->i means; top truncates the wait channel name).  Either that subdirectory is gigantic and the ls just takes a horrendously long time (seems very unlikely, since an ls cannot take hours in practice), or the IOs are ludicrously slow so the little bit of work the ls has to do takes very long (same argument: IOs can't take longer than ~30s apiece without raising error messages on the console), or something else is wrong.
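
One way to tell the "gigantic directory" case apart from the "stuck IO" case is to read the entries one at a time and report progress, instead of waiting for `ls` to collect and sort everything before it prints.  A minimal sketch in Python, where the name `count_entries` and the reporting interval are just illustrative choices:

```python
import os
import time


def count_entries(path, report_every=10000):
    """Count directory entries one at a time, reporting progress.

    os.scandir() yields entries as they are read, with no sorting,
    so steady progress reports point at a huge directory, while
    silence points at reads that are stuck or very slow.
    """
    started = time.monotonic()
    count = 0
    with os.scandir(path) as entries:
        for _ in entries:
            count += 1
            if count % report_every == 0:
                elapsed = time.monotonic() - started
                print(f"{count} entries after {elapsed:.1f}s")
    return count
```

If `count_entries("camera")` keeps printing, the directory is simply enormous.  If it prints nothing for minutes, the reads themselves are not completing.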

My only guess is a bug in ZFS; I bet that something is stuck in a loop.  This would be a question for ZFS internals developers.


----------

