# Script to retrieve data from web page



## balanga (Jan 12, 2020)

Any suggestions as to how to go about retrieving data - specifically a single string - from a web page...

I'm thinking of using www/lynx to retrieve a page and then trying to parse the result, but I'm not sure whether lynx can be scripted.  
Any advice welcome...


----------



## kpedersen (Jan 12, 2020)

You could consider fetch(1).
It is in FreeBSD base and allows you to download a single web page to a file or stdout. Then you could grep/sed for the string?
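
For anyone who would rather do the same download-then-search step in one script, here is a rough Python sketch using only the standard library. It is an illustration, not a replacement for fetch(1): the helper names `fetch_page` and `first_match`, the sample markup, and the pattern are all made up, and the demo runs on an inline string so it works without network access.

```python
import re
import urllib.request


def fetch_page(url, timeout=10):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def first_match(text, pattern):
    """Return the first capture group of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


# Offline demo on a made-up snippet. For a real page, something like
# fetch_page("https://example.org/") would supply the text instead.
sample = "<p>Current version: <b>12.1-RELEASE</b></p>"
version = first_match(sample, r"Current version: <b>([^<]+)</b>")
print(version)
```

A regular expression like this is fine for a single, stable string, but it breaks as soon as the surrounding markup changes.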


----------



## drhowarddrfine (Jan 12, 2020)

I see a lot of people using www/py-scrapy if you mean to scrape web sites.


----------



## msplsh (Jan 21, 2020)

Use something with a libxml binding.  I use PHP.


----------



## SirDice (Jan 21, 2020)

Use Perl, Python, Ruby, Lua, whatever. Almost all scripting languages have something for this. My personal favorite is still www/p5-libwww (yes, I'm still a Perl monger).


----------



## trev (Jan 22, 2020)

Lynx or Wget (especially useful for reloading needed cookies for some websites) in a Bourne shell script with grep and sed. (I've done it hundreds of times.)


----------



## msplsh (Jan 22, 2020)

So...

1. Get the HTML using
   - lynx
   - wget
   - perl & p5-libwww
   - curl
   - python & scrapy
   - python & something way simpler like the requests library
   - php & libcurl
   - fetch
2. Then parse the HTML for the string using
   - php & libxml
   - python & lxml (via libxml output that scrapy vends)
   - python & beautifulsoup
   - REXX & whatever it uses
   - grep (DON'T do this)
   - sed (DON'T do this)

I wrote a Python script for tweets that uses requests and BeautifulSoup. It just needs two packages:

`pkg install py36-requests py36-beautifulsoup`
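
To give a feel for why a real parser beats grep/sed in step 2, here is a minimal sketch that pulls the text out of one element by its `id`. It deliberately uses Python's standard-library `html.parser` instead of BeautifulSoup or lxml, so nothing needs installing; it is not the script mentioned above. The class `TextById`, the helper `text_by_id`, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the first element carrying a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while inside the target element
        self.done = False    # True once the target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html, target_id):
    """Return the stripped text of the element with target_id, or None."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip() or None


# Made-up page fragment standing in for whatever step 1 downloaded.
page = '<div id="other">no</div><span id="price">42 <b>EUR</b></span><p>tail</p>'
price = text_by_id(page, "price")
print(price)
```

With BeautifulSoup installed, the same lookup is roughly a one-liner (`soup.find(id="price").get_text()`), which is the main reason to prefer it for anything beyond a quick script.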


----------

