# grepping a grep result file using a wildcard is rubbish



## kenorb (Dec 13, 2010)

```
> ll | wc -l
     169
> grep "download" * | grep href > list.txt
load: 1.27  cmd: grep 9946 [wdrain] 681.27r 156.45u 104.06s 37% 1120k
load: 1.25  cmd: grep 9946 [wdrain] 686.78r 157.55u 104.84s 34% 1120k
load: 1.15  cmd: grep 9946 [wdrain] 693.05r 158.74u 105.69s 33% 1120k
^C
> time grep "download" * > list.txt
^C44.297u 32.942s 3:08.01 41.0%	107+1515k 220+106081io 0pf+0w
```
I've already spent 15 minutes grepping 169 text files (around 30k each) for one word, then cancelled to check what's going on. I've tried 3-4 times already, and during this time I can't use my desktop because all 4 cores are at almost 100%. WTF?!

How do I install GNU grep?
UPDATE: I found it.

```
> sudo portinstall gnugrep
```


----------



## kenorb (Dec 13, 2010)

See:
http://www.mail-archive.com/freebsd-current@freebsd.org/msg124281.html
http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

Looks like it's more than 5 times slower?;/
OMG


----------



## kenorb (Dec 13, 2010)

```
> time grep "download" * > list.txt
/usr/local/bin/grep: writing output: No space left on device
/usr/local/bin/grep: writing output: No space left on device
/usr/local/bin/grep: write error
115.987u 121.518s 10:25.48 37.9%	216+1418k 1641+393603io 1pf+0w
-rw-r--r--  1 kenorb  kenorb    48G Dec 13 15:12 list.txt
```
48G????
Ah, I forgot, it's BSD! Star is an alias for your whole drive, wherever you are. Very intuitive.


----------



## wblock@ (Dec 13, 2010)

kenorb said:

> ```
> > ll | wc -l
> 169
> > grep "download" * | grep href > list.txt
> ...



That's ridiculously slow, and obviously broken.  Running that grep sequence on /usr/src here (538M) takes only a few seconds the first time, and even less when the files are in cache.  wdrain implies something else is wrong.  Post an archive of the files and the exact commands used, and I'll test it on my machine.


----------



## kenorb (Dec 13, 2010)

I don't know how, but this works fine:

```
> grep -R "download" * | cat > list.txt
```
Without the pipe through cat, does FreeBSD assume by default that I want to grep my whole drive, even though I'm in my folder with 129 files?

*OR*
It's a big loop caused by grep reading the same file it is appending the matches to.


----------



## wblock@ (Dec 13, 2010)

kenorb said:

> ```
> > time grep "download" * > list.txt
> /usr/local/bin/grep: writing output: No space left on device
> /usr/local/bin/grep: writing output: No space left on device
> ...



No.
`% man -P'less +5/"Filename substitution"' csh`

But it sounds like you've proven that grep wasn't at fault.


----------



## wblock@ (Dec 13, 2010)

kenorb said:

> I don't know how, but this works fine:
> 
> ```
> > grep -R "download" * | cat > list.txt
> ...



Your grep command is changing (but that -R was in the original problem, wasn't it?).  -R (or -r) is a recursive grep.  * expands to every file *and* directory in the current directory, and grep searches them all recursively.
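
To make that concrete, here is a minimal sketch in POSIX sh (not the csh used earlier in the thread). The scratch directory and the file names `a.html` and `subdir/b.html` are made up for illustration. It shows that the shell, not grep, expands `*`, and that the expansion includes directories, which `-R` then descends into.

```sh
#!/bin/sh
# Hypothetical scratch directory; all names here are illustrative only.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir subdir
echo "download here" > a.html
echo "download there" > subdir/b.html

# The shell expands "*" before grep starts; grep just receives the names.
expanded=$(echo *)
echo "glob expands to: $expanded"

# With -R, grep also searches inside every directory the glob produced.
recursive=$(grep -R "download" *)
printf '%s\n' "$recursive"
```

Without `-R`, the same glob still hands the directory name to grep, but grep does not walk into it.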


----------



## kenorb (Dec 13, 2010)

SUCCESS TEST ON PLAIN FILES:

```
> perl -e '$i = 0; while($i++ < 100) { system("echo xx test xx > file$i.txt"); }'
> grep test * > zz.txt
> grep test * > zz.txt
> grep test * > zz.txt
> grep test * > zz.txt
```
No problems at all.

FAIL TEST parsing html files:

```
> perl -e '$i = 1; while($i++ < 5) { system("wget -nc \"http://ai-contest.com/rankings.php?page=$i\""); }'
> grep "td" * > list.txt
# WORKS
> grep "td" * > list.txt
# WORKS
> grep "td" * > zz.txt
# BIG FREEZE UNTIL YOU RUN OUT OF SPACE!
load: 0.66  cmd: grep 39619 [biord] 68.52r 34.09u 15.73s 72% 1156k
load: 0.74  cmd: grep 39619 [running] 78.90r 39.41u 18.58s 80% 1156k
load: 0.96  cmd: grep 39619 [running] 118.59r 60.64u 28.60s 82% 1156k
load: 0.75  cmd: grep 39619 [running] 267.90r 122.22u 62.36s 53% 1156k
```
For sure there is a bug.
I don't know what the difference between list.txt and zz.txt is, but with zz.txt it always freezes and with list.txt it doesn't ;/
It always freezes when the output file name is the last one in alphabetical order.
It works when you grep "table", but not when you grep "td". Crazy!


----------



## kenorb (Dec 13, 2010)

wblock said:

> Your grep command is changing (but that -R was in the original problem, wasn't it?).  -R (or -r) is a recursive grep.  * expands to every file *and* directory in the current directory, and grep searches them all recursively.



I tried -R only once; the rest of the examples are without -R.


----------



## wblock@ (Dec 13, 2010)

kenorb said:

> FAIL TEST parsing html files:
> 
> ```
> > perl -e '$i = 1; while($i++ < 5) { system("wget -nc \"http://ai-contest.com/rankings.php?page=$i\""); }'
> ...



No problem here.  Create an empty directory, put just those files in it, and try again.


----------



## kenorb (Dec 13, 2010)

Trying to debug grep gives some weird output:

```
39668: read(3,"xt:zz.txt:zz.txt:zz.txt:zz.txt:z"...,24576) = 24576 (0x6000)
39668: write(1,"zz.txt:zz.txt:zz.txt:zz.txt:zz.t"...,16384) = 16384 (0x4000)
39668: read(3,"t:zz.txt:zz.txt:zz.txt:zz.txt:zz"...,24576) = 24576 (0x6000)
39668: write(1,":zz.txt:zz.txt:zz.txt:zz.txt:zz."...,16384) = 16384 (0x4000)
39668: write(1,"z.txt:zz.txt:zz.txt:zz.txt:zz.tx"...,16384) = 16384 (0x4000)
39668: read(3,"z.txt:zz.txt:zz.txt:zz.txt:zz.tx"...,24576) = 24576 (0x6000)
39668: write(1,"txt:zz.txt:zz.txt:zz.txt:zz.txt:"...,16384) = 16384 (0x4000)
39668: read(3,"t:zz.txt:zz.txt:zz.txt:zz.txt:zz"...,24576) = 24576 (0x6000)
39668: write(1,":zz.txt:zz.txt:zz.txt:zz.txt:zz."...,16384) = 16384 (0x4000)
39668: write(1,"z.txt:zz.txt:zz.txt:zz.txt:zz.tx"...,16384) = 16384 (0x4000)
39668: read(3,":zz.txt:zz.txt:zz.txt:zz.txt:zz."...,24576) = 24576 (0x6000)
39668: write(1,".txt:zz.txt:zz.txt:zz.txt:zz.txt"...,16384) = 16384 (0x4000)
39668: read(3,"xt:zz.txt:zz.txt:zz.txt:zz.txt:z"...,24576) = 24576 (0x6000)
39668: write(1,"xt:zz.txt:zz.txt:zz.txt:zz.txt:z"...,16384) = 16384 (0x4000)
```
For sure it's a bug with a loop.

This one:
http://savannah.gnu.org/bugs/?17457
Four years after it was reported, somebody decided that it can't be fixed, LOL!


----------



## wblock@ (Dec 13, 2010)

Huh.  So GNU grep at least does that, where the output file is read as input.  I thought it might be that, but couldn't duplicate it.  This is more a bug of expectations than anything else.  You see the way to avoid this, right?  Oh, and are you going to change the title of the thread to something more accurate?


----------



## kenorb (Dec 13, 2010)

Reported the bug here:
http://www.freebsd.org/cgi/query-pr.cgi?pr=153124


```
> mkdir test3 && cd test3
> perl -e '$i = 1; while($i++ < 5) { system("wget -qnc \"http://ai-contest.com/rankings.php?page=$i\""); }'
> time grep "td" * > zz.txt
^T
load: 0.63  cmd: grep 39810 [wdrain] 28.95r 15.76u 6.99s 69% 1176k
load: 0.66  cmd: grep 39810 [running] 35.93r 18.80u 8.63s 68% 1176k
load: 0.66  cmd: grep 39810 [wdrain] 39.71r 20.64u 9.51s 73% 1176k
load: 0.68  cmd: grep 39810 [running] 43.32r 22.53u 10.36s 72% 1176k
```
Freeze.

On another console:

```
> truss -fp `pidof grep`

39810: write(1,".txt:zz.txt:zz.txt:zz.txt:zz.txt"...,16384) = 16384 (0x4000)
39810: write(1,"t:zz.txt:zz.txt:zz.txt:zz.txt:zz"...,16384) = 16384 (0x4000)
39810: read(3,".txt:zz.txt:zz.txt:zz.txt:zz.txt"...,28672) = 28672 (0x7000)
39810: write(1,".txt:zz.txt:zz.txt:zz.txt:zz.txt"...,16384) = 16384 (0x4000)
39810: read(3,"t:zz.txt:zz.txt:zz.txt:zz.txt:zz"...,24576) = 24576 (0x6000)
39810: write(1,"txt:zz.t^C(0x7000)
^C^C^C^C^C^C^C^C^C^Z
Suspended
> sudo killall -9 truss
```


```
> grep --version
grep (GNU grep) 2.5.1-FreeBSD
> uname -a
FreeBSD kenorb 8.1-STABLE FreeBSD 8.1-STABLE #4: Mon Nov 15 14:40:15 GMT 2010     root@kenorb:/usr/obj/usr/src/sys/BRO  amd64
```


----------



## wblock@ (Dec 13, 2010)

Put the output file in another directory--not a subdir if you're using *-r*--so that it isn't read as input, then written as output, then read as input, then written as output, then read as input, then written as output...
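
A minimal sketch of that advice in POSIX sh, assuming two hypothetical scratch directories (`input_dir` for the pages, `output_dir` for the result) and made-up file names:

```sh
#!/bin/sh
# Hypothetical layout: inputs in one scratch directory, results in another.
input_dir=$(mktemp -d)
output_dir=$(mktemp -d)
echo '<td>row one</td>' > "$input_dir/page1.html"
echo '<td>row two</td>' > "$input_dir/page2.html"

cd "$input_dir" || exit 1
# The result file lives outside the searched directory, so "*" cannot match it.
grep "td" * > "$output_dir/list.txt"
# Running the command again is just as safe: the input set has not changed.
grep "td" * > "$output_dir/list.txt"

wc -l < "$output_dir/list.txt"
```

Because the output never appears among the inputs, repeated runs see the same set of files each time.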


----------



## DutchDaemon (Dec 13, 2010)

The thread title should now look more informed.


----------



## phoenix (Dec 13, 2010)

Redirection occurs before shell expansion.  Thus, your command is creating the list.txt file *first*, then it is expanding `*` to include all the files in the current directory *including* your output file.

I'm guessing, list.txt is listed alphabetically before any files that match the search string, thus it's empty when grep gets to it, so there's no problem.  zz.txt will be listed alphabetically at the end of the list of files, so grep will have written a bunch of lines to it already.  Once grep opens it for reading, you get into a loop, since every line matches, so every line is written out to the file, and grep never reaches the end of the file.
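
The feedback loop described here can be imitated safely without grep. The following is only a simulation in POSIX sh, under stated assumptions: a plain `while read` loop stands in for grep, the file name `zz.txt` and its first line are made up, and a counter caps the loop at five rounds so it cannot fill the disk.

```sh
#!/bin/sh
# Simulation only: a bounded loop that mimics a reader consuming a file
# while matches are appended to that same file. Names are hypothetical.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'page1:<td>x</td>\n' > zz.txt   # the output already holds one match

count=0
# Read zz.txt line by line and append each "match" back to zz.txt,
# prefixed with the file name, the way grep labels multi-file output.
# The counter stops the loop after 5 rounds; without it the reader
# would never reach the end of the file.
while IFS= read -r line && [ "$count" -lt 5 ]; do
    printf 'zz.txt:%s\n' "$line" >> zz.txt
    count=$((count + 1))
done < zz.txt

wc -l < zz.txt
tail -n 1 zz.txt
```

Every appended line gains another `zz.txt:` prefix, which is the same repeating pattern visible in the truss output earlier in the thread.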

This is not a grep issue.  It's an "*I've written a stupid command that does exactly what I tell it to, but that's not what I want, therefore it's a bug*" error.  More commonly known as *PEBKAC*.

Re-do your command so that the output file is not in the same directory as your input files, or use a more restrictive wildcard search than just `*`, or any number of other things that will avoid this issue.
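
As one possible sketch of the "more restrictive wildcard" option, in POSIX sh: the input files are assumed to share an `.html` suffix (the files in this thread were not named that way, so this is purely illustrative), and the output file deliberately uses a different suffix.

```sh
#!/bin/sh
# Hypothetical scratch directory with inputs that share a common suffix.
work=$(mktemp -d)
cd "$work" || exit 1
echo '<td>row one</td>' > page1.html
echo '<td>row two</td>' > page2.html

# "*.html" can never expand to list.txt, so the output stays out of the
# input set even though it sits in the same directory.
grep "td" *.html > list.txt
grep "td" *.html > list.txt   # a second run sees the same two inputs

inputs=$(echo *.html)
echo "inputs: $inputs"
wc -l < list.txt
```

This keeps everything in one directory at the cost of relying on a naming convention for the inputs.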

Reading the man page for your shell of choice would also be helpful.  This is covered in there.

Oh, and you can close your PR.  It's not a bug in grep.


----------



## DutchDaemon (Dec 14, 2010)

And I'm closing this thread, because it *is* a bug. And rubbish


----------

