# Need help with sed and regexps



## edhunter (Oct 16, 2009)

Hello guys
I need help with sed and regular expressions.
I have an input file containing text with HTML formatting.
I have to import this file into another program that recognizes only the `</br>` tag.
Before importing, I need to strip all HTML tags except `<br>` and its variants.
All kinds of br tags have to become `</br>`.

Something like this:
1. `<br>` and its variants (e.g. `</br>`) ...  =>  `</br>`
2. `<any other tag>`  =>  `""` (removed)

How can I do this with sed?


----------



## dennylin93 (Oct 16, 2009)

`$ sed 's/<br>/<\/br>/g'` should turn `<br>` into `</br>`. The other changes should work with similar variations.
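That command only matches the exact string `<br>`. As a minimal sketch, assuming the variants differ only by optional spaces and slashes around `br` (e.g. `<br>`, `<br/>`, `</br>`, `< br />`), a single extended regex can normalize all of them. `-E` is accepted by both GNU sed and BSD/macOS sed; the function name `normalize_br` is hypothetical and only for illustration.

```shell
#!/bin/sh
# normalize_br (hypothetical helper): rewrite every br-tag variant to </br>.
# Assumption: a variant is "<", optional spaces, optional "/", optional spaces,
# "br", optional spaces, optional "/", optional spaces, ">" (lowercase only).
# '#' is used as the s-command delimiter so '/' needs no escaping.
normalize_br() {
  sed -E 's#<[[:space:]]*/?[[:space:]]*br[[:space:]]*/?[[:space:]]*>#</br>#g'
}

printf '%s\n' 'a<br>b<br/>c</br>d< br />e' | normalize_br
# prints: a</br>b</br>c</br>d</br>e
```

The pattern is case-sensitive and does not cover tags with attributes such as `<br class="x">`; those would need a wider pattern.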


----------



## Zare (Oct 16, 2009)

```
sed 's@<\([^
][^<>]*\)>\([^<>]*\)</\1>@\2@g'
```

Pipe the line into this and it should strip off all HTML tags, leaving the content between the tags intact; `<br>` tags will remain too.
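The expression above relies on the backreference `\1`, so a tag is only removed when the same name closes it later on the same line, with no other tag in between. The sketch below is a simplified version of that idea (it assumes tag names are lowercase letters and digits and omits the br exclusion); the name `strip_paired` is hypothetical.

```shell
#!/bin/sh
# strip_paired (hypothetical helper): simplified backreference-based tag stripper.
# \1 must repeat the opening tag's name, so only <x>...</x> pairs on one line match;
# [^<>]* between them means a pair containing another tag is left alone.
strip_paired() {
  sed 's@<\([a-z0-9]*\)>\([^<>]*\)</\1>@\2@g'
}

printf '%s\n' '<b>bold</b> and <i>italic</i>' | strip_paired
# prints: bold and italic
printf '%s\n' 'line1<tag1>alabala' | strip_paired
# prints the line unchanged: <tag1> has no matching </tag1> on this line
```

This is why input with unpaired or mismatched tags passes through such a command untouched.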

P.S.
Up The Irons!


----------



## edhunter (Oct 19, 2009)

10x \m/
but it didn't work.
Here is a sample file:

```
line1<tag1>alabala
blabla</tag2>
line2<tag>blabla
<tag3>text<tag4>blabla


</br>
</br>
< br />
```

Here is the sed output:

```
sed 's@<\([^
][^<>]*\)>\([^<>]*\)</\1>@\2@g' test.txt
line1<tag1>alabala
blabla</tag2>
line2<tag>blabla
<tag3>text<tag4>blabla


</br>
</br>
< br />
```


I managed to do what I wanted with 3 seds:

```
sed -e "s:<[^<>]*br[^<>]*>:uniqstring123:g" Export.TXT > out1.txt
sed -e "s:<[^<>]*>::g" out1.txt > out2.txt
sed -e "s:uniqstring123:</br>:g" out2.txt > FINAL.TXT
```

but my way seems very lame... that's why I need another solution.
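The three passes can run in a single sed invocation: sed applies multiple `-e` expressions in order to each line, so no intermediate files are needed. The sketch below keeps the placeholder idea but, as an assumption, tightens the first pattern to the br variants only, because `<[^<>]*br[^<>]*>` also matches any tag that merely contains "br" (such as `<abbr>`). The name `clean_html` is hypothetical, and it assumes the string `uniqstring123` never occurs in the input.

```shell
#!/bin/sh
# clean_html (hypothetical helper): the same three passes, in one sed run.
#   1. protect br variants by replacing them with a placeholder
#   2. delete every remaining tag
#   3. turn the placeholder into </br>
# Assumption: "uniqstring123" never appears in the real text.
clean_html() {
  sed -E \
    -e 's#<[[:space:]]*/?[[:space:]]*br[[:space:]]*/?[[:space:]]*>#uniqstring123#g' \
    -e 's#<[^<>]*>##g' \
    -e 's#uniqstring123#</br>#g' \
    "$@"
}

printf '%s\n' 'line1<tag1>alabala' '<tag3>text<tag4>blabla< br />' '</br>' | clean_html
# prints:
# line1alabala
# textblabla</br>
# </br>
```

Usage on a file would be `clean_html Export.TXT > FINAL.TXT`. Like the original commands, this works line by line, so a tag split across two lines is not removed.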


----------

