# How do you view a webpage without a browser in a console?



## a6h (Nov 21, 2021)

How do you view a webpage without a browser in a console?


----------



## eternal_noob (Nov 21, 2021)

How do you fly to the next continent without using a plane?


----------



## hardworkingnewbie (Nov 21, 2021)

What's the purpose of that question? First of all, there are enough browsers that work in a console, like Lynx, w3m and others. Lynx also has `-dump`.

Second, you could just use your programming language of choice with an HTML parsing library of your choice: request the page, parse it and save its content as plain text.

Either way, it's no rocket science, and there are enough premade scripts around for that.
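A sketch of that parsing-library route, using nothing but Python's bundled html.parser (the `TextExtractor`/`html_to_text` names are made up for illustration):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping <script>/<style>."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    """Rough equivalent of a text-mode browser's dump, minus the link list."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)
```

Feed it the bytes you got from fetch(1) or urllib and the result is close to what `lynx -dump` prints.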


----------



## Crivens (Nov 21, 2021)

eternal_noob said:


> How do you fly to the next continent without using a plane?


A really strong catapult may get you from Europe to Africa. Other than that, falling down the right mountain in the Urals may land you in Asia, or tripping on the right border crossing in Central America may send you airborne to the next continent. See? Easy. And yes, that last clown I ate was over its best-before date.


----------



## eternal_noob (Nov 21, 2021)

`man telnet`


> falling down the right mountain in the Urals may help you view a webpage without using the proper tool (browser).
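For the curious: `telnet host 80` works because you type the HTTP request lines yourself. A raw-socket sketch in Python of what that session amounts to (plain HTTP only, no TLS, so HTTPS sites won't answer; the function names are made up):

```python
import socket

def build_get_request(host, path="/"):
    """The exact bytes you would type into `telnet host 80` by hand."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(host),
        "Connection: close",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch_raw(host, path="/", port=80):
    """Send the request over a plain TCP socket and return the raw response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        data = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data
```

The response comes back with status line and headers still attached, which is exactly what you see in a telnet session.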


----------



## George (Nov 21, 2021)

True IT-masochists use elinks/lynx/w3m.


----------



## eternal_noob (Nov 21, 2021)

Way better than telnet.


----------



## Alain De Vos (Nov 21, 2021)

wget,lynx,elinks


----------



## drhowarddrfine (Nov 21, 2021)

I don't. Unless you mean view the markup. And then I still don't.


----------



## covacat (Nov 21, 2021)

```
fetch -o - https://www.freebsd.org | xmllint --xpath '//*/text()' --html - 2>/dev/null | grep . | more
```


----------



## Alain De Vos (Nov 21, 2021)

This trick does not work on random sites like

```
https://www.theguardian.com/us-news
```
I think many pages are "too complex".


----------



## ralphbsz (Nov 22, 2021)

Used to use wget, changed to curl. But as Alain said, in practice modern web pages are no longer human-readable and don't consist of a single HTTP download.


----------



## a6h (Nov 22, 2021)

hardworkingnewbie said:


> Either way is no rocket science


As the title of the poll says, i.e. "without a browser".

E.g. you need a few hints from a handbook/FAQ/howto, enough to set some options right, in order to configure an intro(4)/man(4) device correctly.


----------



## richardtoohey2 (Nov 22, 2021)

I'm with covacat - just use fetch - it's built-in, nothing to install.  That's usually enough to see what's in the page.


----------



## hardworkingnewbie (Nov 22, 2021)

vigole said:


> As the title of the poll, i.e. "without a browser".
> 
> e.g. you need few hints from a handbook/faq/howto, enough to set some options right, in order to config a intro(4)/man(4) device correctly.


If you had read my whole post - which you did not - you would also have known that you could just use your language of choice with a pre-bundled HTML parser.


----------



## drhowarddrfine (Nov 22, 2021)

ralphbsz said:


> Used to use wget, changed to curl.



But how do you display the pages using those? I'm not talking "with additional tools" like the fetch example earlier.


----------



## ralphbsz (Nov 22, 2021)

drhowarddrfine said:


> But how do you display the pages using those? I'm not talking "with additional tools" like the fetch example earlier.


Emacs, more, ... but for modern complex web pages, that's impractical. For simple stuff, it works great.
Often I actually use it to download PDF files or images.


----------



## a6h (Nov 22, 2021)

drhowarddrfine said:


> But how do you display the pages using those?


For simple operations it works. For example, I sometimes use it to grep(1) out the FTP mirrors from the FreeBSD Handbook, A.2. FTP Sites.


----------



## zirias@ (Nov 22, 2021)

`fetch -o- <url> | less`


----------



## D-FENS (Nov 22, 2021)

George said:


> True IT-masochists use elinks/lynx/w3m.


nc

lynx is a fine piece of software. The problem is that almost all websites today are infected with JavaScript, and they simply do not work.


----------



## gpw928 (Nov 22, 2021)

Currently, I don't (but have in the past):
	
	



```
[strand.312] $ grep "text/html;" ~/.mailcap
#text/html; w3m -I %{charset} -T text/html; copiousoutput;
#text/html;lynx -dump %s; nametemplate=%s.html; copiousoutput
#text/html;firefox %s; nametemplate=%s.html
#text/html;elinks -dump %s; nametemplate=%s.html; copiousoutput
#@#Sun Jan 19 14:57:59 AEDT 2020#text/html;chrome %s; nametemplate=%s.html
#@#text/html;iridium %s; nametemplate=%s.html
#@#text/html;w3m -cols 72 -I %{charset} -T text/html -s | sed -e 's:^[[:blank:]]*$::' | cat -s | less; copiousoutput
#@@text/html; lynx -dump -force_html -stdin | sed -e 's:^[[:blank:]]*$::' | less -s
#@#text/html;luakit %s >/dev/null 2>&1; nametemplate=%s.html
#@#text/html;midori %s >/dev/null 2>&1; nametemplate=%s.html
text/html;firefox %s >/dev/null 2>&1; nametemplate=%s.html
```


----------



## drhowarddrfine (Nov 22, 2021)

Zirias said:


> `fetch -o- <url> | less`


No, that only fetches the markup but doesn't display the page. That's also not on the list.


----------



## Alain De Vos (Nov 22, 2021)

When I as a European look at U.S. pages, I first need to agree on the applicable law before I can even see the first page. This is interactive ...
Not all internet pages are as simple as FreshPorts.


----------



## zirias@ (Nov 23, 2021)

drhowarddrfine said:


> No, that only fetches the markup but doesn't display the page. That's also not on the list.


The markup is just a representation of the document, so sure that's a display. As curl and telnet are on the list, I'm pretty sure that's fine.


----------



## drhowarddrfine (Nov 23, 2021)

Zirias No. He asks "How do you view a webpage..." Looking at or downloading the markup using curl or telnet is not viewing a web page. I take that to mean NOT just wanting to look at the source markup.


----------



## hardworkingnewbie (Nov 23, 2021)

Aaron Swartz wrote THE ASCIINATOR for it: http://www.aaronsw.com/2002/html2text/

And there are many scripts around like this one.


----------



## SirDice (Nov 23, 2021)

If I recall correctly you're fond of Perl; www/p5-libwww is useful. Besides adding some useful Perl modules, it also comes with a couple of command-line utilities, GET(1) and HEAD(1) for example.


----------



## a6h (Nov 23, 2021)

drhowarddrfine said:


> Zirias No. He asks "How do you view a webpage..." Looking at or downloading the markup using curl or telnet is not viewing a web page. I take that to mean NOT just wanting to look at the source markup.


I think I should have said "read" or "fetch" instead of "view".



SirDice said:


> If I recall correctly you're fond of Perl, www/p5-libwww is useful. Besides adding some useful Perl modules it also comes with a couple of command line utilities GET(1) and HEAD(1) for example.



If Perl were an option, then the LWP::Simple module from libwww-perl-6.58 would work great.


----------



## zirias@ (Nov 23, 2021)

drhowarddrfine said:


> No.


No.

Please look up the definition of a "representation". In a nutshell, a representation is a format in which the data is presented or transported. There are human-readable ones and non-human-readable ones. HTML clearly belongs to the first category (although you can obfuscate it like crazy...)


----------



## drhowarddrfine (Nov 23, 2021)

Zirias Please look up the definition of "view" which is what he asked for. However, now he says he meant "fetch" and not "view".

One views a web page through a browser or software that interprets the supplied markup. Few have any reason to look at that markup by downloading it.


----------



## zirias@ (Nov 23, 2021)

"View" does not imply a specific representation.

What was meant here was easy to deduce from the given options.


----------



## SirDice (Nov 23, 2021)

I've made a few 'web scrapers' for work. Needed to download some specific software, and it wasn't available in a 'regular' repository. So I had to scan the web pages for a specific link to a downloadable file. As long as nothing major changes on that particular page the downloader does what it's supposed to do. Used a fairly basic shell script for that to wget(1) the page, parse it somewhat with grep and then fire off another wget(1) to download the latest version of that software. 

Now I've used wget(1) in that case because that's what was available to me. On FreeBSD I would probably just use fetch(1) for this.
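The wget-then-grep pattern described above can also be done with a small parser instead of grep. A sketch with Python's bundled html.parser (the names and the sample regex are hypothetical, not from any particular scraper):

```python
import re
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href attribute found in <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_download_links(html, pattern):
    """Return the hrefs whose URL matches the given regex."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.links if re.search(pattern, href)]
```

Run the first wget(1)/fetch(1) output through something like `find_download_links(page, r"tool-[\d.]+\.tar\.gz$")` and hand the surviving URLs to the second download step.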


----------



## a6h (Nov 23, 2021)

drhowarddrfine and Zirias:
I think I'm the main cause of this mess here. Sorry about that.


----------



## drhowarddrfine (Nov 23, 2021)

YOU are the cause of ALL our problems vigole !


----------



## a6h (Nov 24, 2021)

drhowarddrfine said:


> YOU are the cause of ALL our problems vigole !


Indeed.


----------



## Hakaba (Nov 24, 2021)

To get the content with JavaScript inside a page, you could use Node.
To inspect the content, you can use a test library like Mocha. To export what you find into a readable format, you can use Babel and Istanbul (nyc, you see the logic?).

After that you have more libs, dependencies and unknown code than the crappiest browser, and you still have no clear view of the webpage...


----------



## MeowMan (Dec 5, 2021)

Alain De Vos said:


> When i as a European look at U.S. pages i first need to agree on the applicable law, before i can even see the first page. This is interactive ...
> Not all internet pages are as simple as freshports .


Same for me. And I do not know why... I mean, I can understand situations where viewer discretion is advised and there is a question "are you 18?" However, what about situations when I just want to watch another season of Chernobyl on HBO, and no, I am not an old Soviet spy.

God damn it, I think sometimes we have a lot of things to agree with.

P.S. I am from AU


----------



## drhowarddrfine (Dec 6, 2021)

Hakaba said:


> To get the content with JavaScript inside a page, you could use node.


You can just download it or inspect it and download it with the browser's inspector.


----------



## Hakaba (Dec 6, 2021)

drhowarddrfine said:


> You can just download it or inspect it and download it with the browser's inspector.


I mean to see the result of the JavaScript execution, not the file.
And to have an inspector, you need a browser.


----------



## drhowarddrfine (Dec 7, 2021)

Hakaba Same thing. The result of JavaScript execution is typically displayed in a web browser. Node is not going to do that for you in place of a browser.


----------



## trev (Dec 7, 2021)

SirDice said:


> I've made a few 'web scrapers' for work. Needed to download some specific software, and it wasn't available in a 'regular' repository. So I had to scan the web pages for a specific link to a downloadable file. As long as nothing major changes on that particular page the downloader does what it's supposed to do. Used a fairly basic shell script for that to wget(1) the page, parse it somewhat with grep and then fire off another wget(1) to download the latest version of that software.
> 
> Now I've used wget(1) in that case because that's what was available to me. On FreeBSD I would probably just use fetch(1) for this.


I too have needed to make "web scrapers" for work and used a combination of wget(1), fetch(1), lynx and w3m. From memory (it was a few years ago, since I've retired), wget was preferred over fetch when I needed to "save state" so as to be able to retrieve images from some pages.


----------



## astyle (Dec 8, 2021)

Alain De Vos said:


> When i as a European look at U.S. pages i first need to agree on the applicable law, before i can even see the first page. This is interactive ...
> Not all internet pages are as simple as freshports .


That never happened to me. I was able to view espn.com (a Las Vegas-based site, BTW, even with offices in Connecticut (East Coast US)) just fine. Well, that info is from 2005, which is when I was in the EU last time. REALLY need to go back at some point, but there's a LOT of ducks to get in a row for that to happen.


----------

