# find replace question -- kind of



## kclark (Oct 25, 2012)

Weird question.  I have a file with several thousand lines in it.  Each line is a path to a file.  Is there a simple way to remove all lines that don't include "/a/b/c/" in them?

After I do this I want to compare a.txt against b.txt and output only the lines that are not in both files.


----------



## UNIXgod (Oct 25, 2012)

Not that weird. If you need to edit in place you can use ed(1) or ex(1). If you want to script it, use sed(1). Even vi(1) can help if a visual editor is needed.

Read the respective man pages to find out how to invert your search with a regular expression.

To compare the files, diff(1) will help you here.


----------



## gpw928 (Oct 25, 2012)

Hi,

I suggest you abandon that screen editor, and apply yourself to ed(1) for every editing session during the next month.

Your slow initial progress will be rewarded.  You may even join the club for what Ritchie and Thompson described as "salvation through suffering".

When your stint is finished, you'll be an expert in ed(1), sed(1), and regular expressions, not to mention the ":" operator in vi(1).  You'll also discover that the arrow keys are for monkeys.

Failing that:


```
sed -n -e '\;^/a/b/c;p' <file1 >file1a
```
The caret ("^") anchors to the left, so leave it out if the "/a/b/c/" is not at the start of the line.
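
To illustrate the anchoring point, here is a minimal sketch with made-up paths in a scratch directory. It assumes the goal is to keep matching lines, and uses `sed -n` with the `p` command and `;` as the address delimiter so the slashes need no escaping. The file name and sample paths are hypothetical.

```shell
# Scratch directory with hypothetical sample data.
dir=$(mktemp -d "${TMPDIR:-/tmp}/sedtest.XXXXXX") || exit 1
printf '%s\n' /a/b/c/start /home/a/b/c/middle /x/y/z/none > "$dir/file1"

# Anchored: only lines that BEGIN with /a/b/c/ are printed.
anchored=$(sed -n -e '\;^/a/b/c/;p' < "$dir/file1")

# Unanchored: lines containing /a/b/c/ anywhere are printed.
unanchored=$(sed -n -e '\;/a/b/c/;p' < "$dir/file1")

printf 'anchored:\n%s\n' "$anchored"
printf 'unanchored:\n%s\n' "$unanchored"
rm -r "$dir"
```

With this data the anchored form prints one line and the unanchored form prints two.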

Cheers,


----------



## jb_fvwm2 (Oct 25, 2012)

I'm not entirely sure, but 

```
cat a.txt a.txt b.txt | sort | uniq -u # if -u is correct, sort correct ...
```
is a trick I stumbled upon a while back.  Unsure if it is an answer to the latter part of the first post in this question, no time to re-test.  But I used it extensively... to maybe show lines in b.txt that exist but not in a.txt.
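
A small sketch of why the trick works, using hypothetical sample files. It assumes neither file contains the same line twice; if one does, `uniq -u` will drop that line too.

```shell
# Scratch directory with hypothetical sample data.
dir=$(mktemp -d "${TMPDIR:-/tmp}/uniqtest.XXXXXX") || exit 1
printf '%s\n' /a/b/c/one /a/b/c/two > "$dir/a.txt"
printf '%s\n' /a/b/c/two /a/b/c/three > "$dir/b.txt"

# a.txt is fed twice, so every one of its lines is repeated and
# uniq -u (print only non-repeated lines) discards them all.
only_in_b=$(cat "$dir/a.txt" "$dir/a.txt" "$dir/b.txt" | sort | uniq -u)

# Feeding each file once leaves the lines that are not in both files.
not_in_both=$(sort "$dir/a.txt" "$dir/b.txt" | uniq -u)

printf 'only in b.txt:\n%s\n' "$only_in_b"
printf 'not in both:\n%s\n' "$not_in_both"
rm -r "$dir"
```

The second form is the one that matches "lines that are not in both files"; the first only reports lines unique to b.txt.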


----------



## jalla (Oct 25, 2012)

kclark said:

> Weird question.  I have a file with several thousand lines in it.  Each line is a path to a file.  Is there a simple way to remove all lines that don't include "/a/b/c/" in them?



No need to muck around with ed, sed, etc.


```
grep '/a/b/c' originalfile > otherfile
```
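
As a rough sketch of what this keeps, and what the inverted `-v` form would keep instead, here is a run against a hypothetical three-line file. The `-F` flag is an addition of mine: it makes grep treat the pattern as a fixed string, which is all a literal path needs.

```shell
# Scratch directory with hypothetical sample data.
dir=$(mktemp -d "${TMPDIR:-/tmp}/greptest.XXXXXX") || exit 1
printf '%s\n' /a/b/c/keep.txt /x/y/z/drop.txt /home/a/b/c/also-keep.txt \
    > "$dir/originalfile"

# Lines that contain the path: this removes the lines that do not.
kept=$(grep -F '/a/b/c/' "$dir/originalfile")

# -v inverts the match: only lines WITHOUT the path survive.
dropped=$(grep -F -v '/a/b/c/' "$dir/originalfile")

printf 'kept:\n%s\n' "$kept"
printf 'dropped by the filter:\n%s\n' "$dropped"
rm -r "$dir"
```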


----------



## fluca1978 (Oct 25, 2012)

jalla said:

> No need to muck around with ed, sed, etc.
>
> ...



Should be:


```
grep -v '/a/b/c' originalfile > otherfile
```

to get the lines that *do not* contain _/a/b/c_ as asked in the original post.

Anyway, this is of course possible with pretty much any text editor available on Unix (Emacs for instance), but the command line is usually the right and most automated way of doing such text manipulation.
For more complex text manipulation Perl can come in handy.


----------



## jalla (Oct 25, 2012)

fluca1978 said:

> Should be:
>
> ...



You may want to read the original post again.


----------



## fluca1978 (Oct 25, 2012)

jalla said:

> You may want to read the original post again



Oops... you are right. Sorry.


----------



## PugTsurani (Oct 27, 2012)

Use comm(1) to show lines in one file but not the other. This command will output two tab-separated columns: lines only in a.txt and lines only in b.txt. Column 3 contains lines in both files and is suppressed by the *-3* option. I have a hard time remembering it's subtractive, not additive. The only requirement is that both files need to be sorted.

`comm -3 a.txt b.txt`
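
To make the subtractive options concrete, here is a minimal sketch on two hypothetical, already sorted files. Each digit passed to *comm* suppresses that column, so `-23` leaves only column 1, `-13` only column 2, and `-12` only column 3.

```shell
# Scratch directory with hypothetical sample data (both files sorted).
dir=$(mktemp -d "${TMPDIR:-/tmp}/commtest.XXXXXX") || exit 1
printf '%s\n' /a/b/c/one /a/b/c/two > "$dir/a.txt"
printf '%s\n' /a/b/c/three /a/b/c/two > "$dir/b.txt"

only_a=$(comm -23 "$dir/a.txt" "$dir/b.txt")   # suppress columns 2 and 3
only_b=$(comm -13 "$dir/a.txt" "$dir/b.txt")   # suppress columns 1 and 3
common=$(comm -12 "$dir/a.txt" "$dir/b.txt")   # suppress columns 1 and 2

printf 'only in a.txt: %s\n' "$only_a"
printf 'only in b.txt: %s\n' "$only_b"
printf 'in both:       %s\n' "$common"
rm -r "$dir"
```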

Process substitution in bash (shells/bash) can be used to combine *grep* with *comm* in a single command. The first Google hit for "bash process substitution" contains an example using *comm*: http://tldp.org/LDP/abs/html/process-sub.html. Remember, both inputs need to be sorted.

`comm -3 <(grep '/a/b/c' a.txt | sort) <(sort b.txt)`

Since *comm* uses a tab to separate the columns, lines that are only in b.txt are printed with a leading tab. *sed* can be used to strip it; the second substitution removes a trailing tab, should one appear. Use ctrl-v <tab> to enter a literal tab in the command line. The space in the first command is actually a tab, shown as TAB in the second command.

`comm -3 <(grep '/a/b/c' a.txt | sort) <(sort b.txt) | sed -e 's/^	//g; s/	$//g'`

`comm -3 <(grep '/a/b/c' a.txt | sort) <(sort b.txt) | sed -e 's/^TAB//g; s/TAB$//g'`
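
If typing a literal tab is awkward, one alternative (a sketch, not the only way) is to let `printf` produce the tab and keep it in a variable. The variable name `tab` and the sample files are hypothetical.

```shell
# Scratch directory with hypothetical sample data (both files sorted).
dir=$(mktemp -d "${TMPDIR:-/tmp}/tabtest.XXXXXX") || exit 1
printf '%s\n' /a/b/c/one /a/b/c/two > "$dir/a.txt"
printf '%s\n' /a/b/c/three /a/b/c/two > "$dir/b.txt"

# Command substitution strips trailing newlines only, so the tab survives.
tab=$(printf '\t')

# Double quotes let the shell expand ${tab} inside the sed script.
flat=$(comm -3 "$dir/a.txt" "$dir/b.txt" | sed -e "s/^${tab}//")

printf '%s\n' "$flat"
rm -r "$dir"
```

The result is a single flat list of the lines that are not in both files.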

Cheers!


----------



## UNIXgod (Oct 28, 2012)

PugTsurani said:

> Use comm(1) to show lines in one file but not the other. This command will output two tab-separated columns: lines only in a.txt and lines only in b.txt. Column 3 contains lines in both files and is suppressed by the *-3* option. I have a hard time remembering it's subtractive, not additive. The only requirement is that both files need to be sorted.
> `comm -3 a.txt b.txt`
> 
> shells/bash's process substitution can be used to combine *grep* with *comm* in a single command. The first Google hit for "bash process substitution" contains an example using *comm*: http://tldp.org/LDP/abs/html/process-sub.html. Remember, both inputs need to be sorted.
> ...



The OP PMed me and explained that he is creating a port for his work, so he won't be able to rely on bashisms. That isn't explained in the post (it should have been). It's very nice of you to sign up here to help a fellow user. Welcome to the FreeBSD forums!

There is a comparison of your suggestion above with a portable alternative at this link:

http://mywiki.wooledge.org/ProcessSubstitution

The example used is this syntax in bash:

```
diff <(sort list1) <(sort list2)
```
would be this in sh:

```
mkfifo /var/tmp/fifo1
mkfifo /var/tmp/fifo2
sort list1 >/var/tmp/fifo1 &
sort list2 >/var/tmp/fifo2 &
diff /var/tmp/fifo1 /var/tmp/fifo2
rm /var/tmp/fifo1 /var/tmp/fifo2
```

Though the second version is more verbose, it is portable across all Bourne-derived shells:



> Process substitution is definitely not portable. You may use NamedPipes to accomplish the same things.
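
For the OP's actual task, a plain sh sketch along the same lines might use ordinary temporary files instead of named pipes. This is only an illustration: the sample data is made up, the `mktemp -d` template form is assumed to be available (it is not part of POSIX), and `tr -d '\t'` assumes no path contains a tab.

```shell
#!/bin/sh
# Use one collation order for both sort and comm.
LC_ALL=C; export LC_ALL

# Scratch directory with hypothetical sample data.
dir=$(mktemp -d "${TMPDIR:-/tmp}/cmp.XXXXXX") || exit 1
printf '%s\n' /a/b/c/two /x/y/z/skip /a/b/c/one > "$dir/a.txt"
printf '%s\n' /a/b/c/two /a/b/c/three > "$dir/b.txt"

# Step 1: keep only the lines of a.txt that contain /a/b/c/, sorted.
grep -F '/a/b/c/' "$dir/a.txt" | sort > "$dir/a.sorted"
sort "$dir/b.txt" > "$dir/b.sorted"

# Step 2: lines not in both files; drop the tab comm puts before column 2.
result=$(comm -3 "$dir/a.sorted" "$dir/b.sorted" | tr -d '\t')

printf '%s\n' "$result"
rm -r "$dir"
```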


----------

