# strange grep greedy behaviour



## fedya (May 6, 2011)

Hi All,

How would you explain this:


```
# echo aabb1ccdg1hsfsdf | grep -o "^[^1]*1"
aabb1
ccdg1
```

Also:

```
# echo aabb1ccdg1hsfsdf | grep --color "^[^1]*1"
[color="Red"]aabb1ccdg1[/color]hsfsdf
```

It seems wrong to me: the regexp's selection of the string should stop at the first "1".

What do you think? Is it a bug?

--fedya


----------



## SirDice (May 6, 2011)

The * is always 'greedy', it will parse the string from the back to the front.


----------



## fedya (May 6, 2011)

Look, awk gives us a different result, which seems correct to me: the split happens at the first "1":


```
# echo aabb1ccdg1hsfsdf | awk -F "^[^1]*1" '{print $2}'
ccdg1hsfsdf
```


----------



## fedya (May 6, 2011)

SirDice: but how can a greedy [^1]* select a string that contains "1"?


----------



## SirDice (May 6, 2011)

Oh, doh... Hehe.. It's actually simpler. You are correct, it should match the first 1.

But.. Your example matches twice:

```
echo aabb1ccdg1hsfsdf | grep --color "^[^1]*1"
```
This matches both aabb1 and ccdg1, but since they're both on the same line it'll look like it matched aabb1ccdg1. So it's not one match but two, as shown with the -o option.


----------



## fedya (May 6, 2011)

Thanks, SirDice, you're absolutely right. 

But then... FreeBSD's awk is wrong!  

Actually, I discovered this behaviour with gawk on CentOS, when I used a similar regex as a word separator; the results seemed wrong to me, while the classic awk seemed right. Now it looks like the opposite is true:

FreeBSD awk:


```
echo aabb1ccdg1hsfsdf | awk -F "^[^1]*1" '{print $1 "==" $2 "==" $3}'
==ccdg1hsfsdf==
```

gawk (on CentOS):

```
# echo aabb1ccdg1hsfsdf | awk -F "^[^1]*1" '{print $1 "==" $2 "==" $3}'
====hsfsdf
```

But this probably deserves a separate thread.

--fedya
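
For anyone who needs the split itself rather than an answer about regex semantics, here is a minimal sketch that splits a line right after its first "1" without using an anchored regex as the field separator, so it should not depend on how a particular awk treats ^ in -F. It assumes a POSIX awk; the function name `split_at_first_1` is made up for illustration.

```shell
# Hypothetical helper: split each input line right after its first "1".
# Only index() and substr() are used, so no regex anchoring is involved.
split_at_first_1() {
    awk '{
        i = index($0, "1")               # position of the first "1", 0 if none
        if (i == 0) { print $0; next }   # no "1" on the line: print it unchanged
        print substr($0, 1, i) "==" substr($0, i + 1)
    }'
}

echo aabb1ccdg1hsfsdf | split_at_first_1
# prints: aabb1==ccdg1hsfsdf
```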


----------



## fedya (May 6, 2011)

After some consideration, I now think again that GNU grep and gawk are wrong in this case, and classic awk is right. See:



			
SirDice said:

> Your example matches twice:
> 
> ```
> echo aabb1ccdg1hsfsdf | grep --color "^[^1]*1"
> ```
> ...



Yes, the regexp ^[^1]*1 matches "aabb1", but no, it does not match "ccdg1", because there is a ^ beginning-of-line anchor. As I understand it, the anchor makes it impossible to get multiple matches inside the string.

So my original question about GNU grep's sanity is still valid.
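
To make the point about the anchor concrete, here is a rough sketch of one way the two-match output could arise. This is only an assumption about the mechanism, not something taken from grep's source: if a tool looks for further matches by re-running the regex on the remainder of the line as though it were a fresh string, ^ matches again at the cut point. The function name `naive_all_matches` is hypothetical, and a POSIX awk is assumed.

```shell
# Hypothetical illustration: find "all matches" by re-running the anchored
# regex on whatever is left of the line after the previous match.
naive_all_matches() {
    awk '{
        s = $0
        while (match(s, /^[^1]*1/)) {
            print substr(s, RSTART, RLENGTH)   # the piece that matched
            s = substr(s, RSTART + RLENGTH)    # drop it and start over
        }
    }'
}

echo aabb1ccdg1hsfsdf | naive_all_matches
# prints:
# aabb1
# ccdg1
```

Each pass sees a string that begins where the previous match ended, so the anchor is satisfied again even though that position is not the start of the original line.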


----------



## Alt (May 6, 2011)

SirDice said:

> But.. Your example matches twice:
> 
> ```
> echo aabb1ccdg1hsfsdf | grep --color "^[^1]*1"
> ```
> ...



That's not true, as we can check with this:


```
> echo aabb12ccdg1hsfsdf | grep --color "^[^12]*1"
[color="red"]aabb1[/color]2ccdg1hsfsdf
```

It does not include ccdg1. It looks like a grep bug... or not?


----------



## SirDice (May 7, 2011)

Yep, you are both right, it should only match "aabb1".

This stuff is tricky and I still trip on it after dealing with it for years.
It's no wonder you can write entire books on the subject of regex :e
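
If the goal is just to get the single anchored match regardless of how a given grep build handles -o, a minimal sketch with awk's `match()` could look like the following. It assumes a POSIX awk, and the function name `first_anchored_match` is invented for this example.

```shell
# Hypothetical helper: print only the text matched by ^[^1]*1 on each line.
# match() is called once per line, so at most one match is reported.
first_anchored_match() {
    awk 'match($0, /^[^1]*1/) { print substr($0, RSTART, RLENGTH) }'
}

echo aabb1ccdg1hsfsdf | first_anchored_match
# prints: aabb1
```

Lines that contain no "1" produce no output, since the pattern cannot match them.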


----------

