# Get specified section/table/content from file using sed/awk/perl, etc.



## kenorb (Dec 6, 2010)

Each file has the same structure as follows:


```
<html>
<head></head>
<body><p><table>My table here!</table></p>
</body>
</html>
```

I'm looking to dump only the section between `<table>` and `</table>` (including those tags).
I've spent a while looking for a solution, but it still isn't clear to me what the easiest way to achieve that is.


I tried the following solutions:
http://austinmatzko.com/2008/04/26/sed-multi-line-search-and-replace/
http://www.unix.com/shell-programming-scripting/147347-how-get-one-particular-section-using-awk.html
http://www.unix.com/shell-programming-scripting/66251-remove-html-tags-bash.html
http://www.unix.com/shell-programming-scripting/58479-multiple-line-match-using-sed.html


A good start:

```
lynx --base --source http://ai-contest.com/rankings.php | less "+/table"
```


```
sed -n '1h;1!H;${;g;s/<h2.*/No title here/g;p;}' sample.php
```


```
perl -0777 -pe 's/\A[^\{]*\{//s; s/\}.*?\{/\n/sg; s/\}[^\}]*\Z//s'
```
http://www.grymoire.com/Unix/Sed.html#uh-47


----------



## wblock@ (Dec 6, 2010)

kenorb said:

> ```
> perl -0777 -pe 's/\A[^\{]*\{//s; s/\}.*?\{/\n/sg; s/\}[^\}]*\Z//s'
> ```
> http://www.grymoire.com/Unix/Sed.html#uh-47



Aaah!  My eyes!


```
perl -0777 -ne 'print $1 if /(<table>.*<\/table>)/' myfile.html
```

But properly parsing HTML is done with Perl modules, not raw regexes.


----------



## qsecofr (Dec 6, 2010)

A Perl solution might use /usr/ports/www/p5-HTML-TableExtract, or search the ports tree for the "p5-HTML-Table" keyword.


----------



## kenorb (Dec 14, 2010)

wblock:
Thank you for the great example. It looks very simple, and I like simple solutions, but even so, something is missing.
I tried this command and got an empty result.
I also tried:

```
perl -0777 -ne 'print $1' *
```
Empty output.

```
> echo test | perl -0777 -ne 'print \$1'
SCALAR(0x80123fde0)>
```
What am I missing?


----------



## wblock@ (Dec 14, 2010)

kenorb said:

> wblock:
> Thank you for the great example. It looks very simple, and I like simple solutions, but even so, something is missing.
> I tried this command and got an empty result.
> I also tried:
> ...



The entire regex, for a start.  A regex match to fill in $1.
`% man perlre | less +/Capture`
The "if" is also important.


----------

