# Speed up search in large directories



## alexsc13 (Nov 30, 2011)

Is there a way to speed up searches in directories with tens of thousands of files? I often have to search for something with find, and it always takes around 10 minutes or more.


----------



## Martillo1 (Nov 30, 2011)

locate(1)


----------



## gpw928 (Dec 1, 2011)

Hi,

You follow a well trodden path.

Unix directories are linear by design, and thus, must be searched in a linear way (mostly).

It's a bad design decision to have tens of thousands of entries in a directory, because linear searching of this much stuff is slow.

It's probably trite to suggest, but don't make large directories...

If you have no control over the directory structures, then the suggestion of "locate" may be useful.  However, it has two serious limitations:

- the directories of interest must be readable by the user "nobody"; and
- it relies on a weekly cron job ("periodic weekly") to run the script /etc/periodic/weekly/310.locate, which populates a database that may be up to 7 days out of date.

You could run the 310.locate script more often (e.g. nightly -- see the periodic entries in /etc/crontab).  This will also run slowly, but hopefully at a time when nobody cares.
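For instance (the script path is the stock FreeBSD one; the crontab line is only a sketch of what such an entry could look like):

```shell
# Rebuild the locate database immediately instead of waiting
# for the next weekly periodic run:
/etc/periodic/weekly/310.locate

# Or rebuild nightly by adding an entry to /etc/crontab, e.g.:
#   30 3 * * *   root   /etc/periodic/weekly/310.locate >/dev/null 2>&1
```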

Cheers,

-- 
Phil


----------



## wblock@ (Dec 1, 2011)

/etc/periodic/weekly/310.locate can also be run manually any time you think it will be needed.


----------



## fluca1978 (Dec 1, 2011)

I had a similar issue once, and the solution was to organize the files into a deeper directory tree (this can be automated with scripts). In particular, you can leave the flat space to the users and have a periodic script rearrange the files, say, every night.

locate does not check that files actually exist; it reports what was present when its database was last built, so it can give you stale results for recently deleted or recently added files.

Depending on your aim, you could even consider placing meta-information about the files in a database, but that becomes a fairly complex solution (not difficult, just complex), so my first suggestion is to (automatically) rearrange the directory tree.


----------



## alexsc13 (Dec 1, 2011)

I usually also try to use locate whenever possible, but since files are being changed all the time it is not as reliable as I would like, and running /etc/periodic/weekly/310.locate a few times a day doesn't seem like an ideal solution either.

I do not have any influence over the directory structure and cannot modify it, so I guess there is nothing that can be done to improve it then.


----------



## fluca1978 (Dec 1, 2011)

What is the exact aim of your search? To check whether a file exists? To find all files that match a name? Or a modification date? You can always build your own _indexing solution_ if you have clear selection criteria.


----------



## alexsc13 (Dec 1, 2011)

My aim is to find files by name or part of the name. I do not need the date, owner, permissions, or anything else.


----------



## fluca1978 (Dec 1, 2011)

Test whether a simple `$ ls` is faster than a full `$ find`.
Another solution that comes to mind, and that I've applied once, is to keep the files organized in subtrees and provide users with a flat space containing links to the files. But if you cannot operate on the space, this does not apply to you either.
You could probably do something like this (and see if it is faster):
- keep a text file with the names of the files stored in the directory;
- when searching for a file, grep that text file for the part of the name you have;
- access the found names directly by their exact paths.

Of course the text file (the index) can be updated as often as you need. It is a kind of poor man's self-made locate; I'm not sure it is worth implementing (but it's simple, ten minutes of scripting).
The advantages are that grepping a single file could be faster than an opendir() scan of the entries, and that accessing a file by its exact name could again be faster than an opendir() plus name scan. Even assuming this is correct and gives you faster results, I wouldn't expect blindingly fast access times. And in any case you are just postponing the real problem, which is an overly large, unorganized flat space.
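A minimal sketch of that index idea (the tree and index filename below are throwaway examples created just for the demonstration):

```shell
#!/bin/sh
# Build a small demonstration tree; in practice $TREE would be
# the real directory you have to search.
TREE=$(mktemp -d)
mkdir -p "$TREE/a" "$TREE/b"
touch "$TREE/a/report-2011.txt" "$TREE/b/notes.txt"

INDEX="$TREE/.file-index"

# 1. Keep a text file listing every file in the tree
#    (re-run this step from cron as often as the tree changes).
find "$TREE" -type f > "$INDEX"

# 2. When searching, grep the index instead of walking the tree.
grep 'report' "$INDEX"

# 3. Access the matched path directly by its exact name.
ls -l "$(grep 'report' "$INDEX")"

rm -rf "$TREE"
```

Whether this beats a plain find depends entirely on how expensive the directory walk is compared to one sequential read of the index.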


----------



## alexsc13 (Dec 1, 2011)

I am sorry, I think I did not explain it in enough detail. I do not have one folder with all the files in it; I have one folder containing many subfolders, each with around 20 files in it. That is why a simple `ls` won't cut it, sadly.


----------



## gpw928 (Dec 1, 2011)

Hi,

It sounds like your directories are not huge.  This is encouraging, as it may be possible to optimise the search.

1.  What method are you currently using to locate files of interest?

2.  Does the name of each directory assist in any way to find the file(s) you want?

Cheers,

-- 
Phil


----------



## alexsc13 (Dec 1, 2011)

The directories are not huge at all, there are just so many of them.

To find the files I am looking for I use either find or locate right now.

And no, the directory names are just randomly generated by the software behind this.

What I was thinking about: in this structure there are a few folders, with many sub-folders and files in them, which I never have to search. Would it be possible to permanently exclude them from searching?


----------



## User23 (Dec 1, 2011)

If you have a lot of directories and files on UFS, keep an eye on:

vfs.ufs.dirhash_mem vs. vfs.ufs.dirhash_maxmem

and increase the vfs.ufs.dirhash_maxmem limit if vfs.ufs.dirhash_mem has reached it or comes close to it.
This can improve performance under some circumstances.


```
sysctl -a | grep dirhash
vfs.ufs.dirhash_reclaimage: 5
vfs.ufs.dirhash_lowmemcount: 18905
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 8430
vfs.ufs.dirhash_maxmem: 2097152
vfs.ufs.dirhash_minsize: 2560
```
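If dirhash_mem were pressing against the limit (in the output above it is not), the ceiling could be raised at runtime; the 8 MB value below is only an example:

```shell
# Raise the dirhash memory ceiling (value in bytes):
sysctl vfs.ufs.dirhash_maxmem=8388608

# And make the change persistent across reboots:
echo 'vfs.ufs.dirhash_maxmem=8388608' >> /etc/sysctl.conf
```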


----------



## wblock@ (Dec 1, 2011)

alexsc13 said:

> What I was thinking about, in this structure of folders there are a few folders with many sub-folders and files in them which I never have to search. Would it be possible to permanently exclude them from searching.



Yes, see find(1)'s -depth n and -prune options.


----------



## alexsc13 (Dec 1, 2011)

Hmm, doesn't look like that would help:


```
vfs.ufs.dirhash_reclaimage: 5
vfs.ufs.dirhash_lowmemcount: 224
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 231646
vfs.ufs.dirhash_maxmem: 2097152
vfs.ufs.dirhash_minsize: 2560
```


----------



## gpw928 (Dec 2, 2011)

Hi,

I concur with wblock.

You can optimise the search by excluding the directories you know are not of interest.

e.g. provided all directory names are unique, here is how to find a file named f1 that is not underneath a directory named d2 or d3:


```
mkdir d1 d2 d3
touch d1/f1 d2/f1 d3/f1
ls -lad d?/*
-rw-r--r--  1 phil  wheel  0 Dec  2 14:33 d1/f1
-rw-r--r--  1 phil  wheel  0 Dec  2 14:33 d2/f1
-rw-r--r--  1 phil  wheel  0 Dec  2 14:33 d3/f1
find . -name d2 -prune -o -name d3 -prune -o -name f1 -print
./d1/f1
```

Cheers,

-- 
Phil


----------

