# Firefox is getting more intelligent (than us)



## hruodr (Sep 18, 2021)

See:

Text Encoding no longer available in the Firefox menu panel | Firefox Help

The Firefox menu panel no longer has a Text Encoding submenu. It is still available in the Menu bar View menu and as an optional toolbar button.

support.mozilla.org

Try to get the right encoding of:

Kerberos: An Authentication Service for Computer Networks


----------



## Deleted member 30996 (Sep 18, 2021)

I know that if you use the right-click option to "View Page Source" in Firefox-ESR, it will highlight in red XHTML errors I've made, the ones the W3C validator points out, that I can't readily see by glancing over the page of markup.


----------



## eternal_noob (Sep 18, 2021)

"I'm sorry, Dave. I'm afraid I can't do that."


----------



## memreflect (Sep 19, 2021)

Well, that's certainly interesting.  Neither the server nor the HTML declares a character encoding for the document, so nobody can blame Firefox for incorrectly guessing it's ISO-8859-2 when you "Repair Text Encoding".  In my opinion, the right option would have been to retain the menu and add the new menu item for old pages like this.
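To make "neither the server nor the HTML declares a character encoding" concrete, here is a rough Python sketch of that check. The helper name `declared_charset` and both regular expressions are my own illustration, not Firefox's actual logic, and a real browser's prescan handles more cases (byte order marks, for example). The 1024-byte window mirrors the limit the HTML specification places on where the declaration may appear.

```python
import re

# Hypothetical helper, not Firefox's actual logic: report the encoding a page
# declares, or None when neither the HTTP header nor the markup names one.
HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([A-Za-z0-9._:-]+)', re.IGNORECASE)
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def declared_charset(content_type, body):
    """Return the declared charset (lowercased), or None if nothing is declared."""
    # 1. The Content-Type response header, e.g. "text/html; charset=ISO-8859-1".
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # 2. A <meta> declaration near the start of the document (first 1024 bytes).
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Nothing declared: the browser is left to guess.
    return None
```

When this returns `None`, as it would for a page with a bare `text/html` header and no `<meta>` declaration, any encoding the browser picks is a heuristic guess.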


----------



## Geezer (Sep 19, 2021)

_View: https://www.youtube.com/watch?v=97b6FfQbibM_


----------



## grahamperrin@ (Sep 19, 2021)

hruodr said:


> Try to get the right encoding of:
> 
> Kerberos: An Authentication Service for Computer Networks



Treated as UTF-8:




There's the menu option – *Repair Text Encoding* – however I don't expect it to be a panacea in cases such as this:







… _used to replace an incoming character whose value is unknown or unrepresentable in Unicode_ …
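A small Python sketch of why treating such a page as UTF-8 ends in replacement characters. The byte string is illustrative only (not copied from the page); it assumes a copyright sign stored as the single byte 0xA9, as in the Latin-1 family of encodings.

```python
# Illustrative bytes only: a copyright sign stored as the single byte 0xA9.
raw = b"Copyright \xa9"

# 0xA9 is not a valid UTF-8 sequence on its own, so a lenient UTF-8 decoder
# substitutes U+FFFD REPLACEMENT CHARACTER for it.
as_utf8 = raw.decode("utf-8", errors="replace")

# The substitution is lossy: re-encoding yields the bytes of U+FFFD, not 0xA9.
round_trip = as_utf8.encode("utf-8")
```

Once the text has been decoded this way, the original byte is gone from the decoded string, so any repair has to start again from the raw bytes.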



memreflect said:


> … the right option would have been to retain the menu …



Which encoding would you have chosen for <http://gost.isi.edu/publications/kerberos-neuman-tso.html>?


----------



## grahamperrin@ (Sep 19, 2021)

Via The Text Encoding Submenu Is Gone (2021-08-24):

chardetng: A More Compact Character Encoding Detector for the Legacy Web (2020-06-08)

For example, <https://www.cs.cornell.edu/courses/cs614/1999sp/notes99/Kerberos.html>:

- appears wrong in Chromium
- appears wrong in Falkon
- appears wrong in Firefox ESR
- appears OK following repair by Firefox 92 ☑


----------



## memreflect (Sep 19, 2021)

grahamperrin said:


> Which encoding would you have chosen for <http://gost.isi.edu/publications/kerberos-neuman-tso.html>?


ISO-8859-1, ISO-8859-15, or Windows-1252.  The text is in English, and many HTML pages written in English were published in one of those three encodings prior to the ubiquity of UTF-8 from what I've experienced.  On my system, Firefox's "Repair Text Encoding" happened to choose ISO-8859-2 instead, rendering © as Š in the Copyright line.  That's why I feel the menu should have been kept—in case Firefox guesses incorrectly.  On the other hand, if it works for 95% of pages, and newer pages/servers declare the character encoding, then I could see why the menu might have been removed, so perhaps those pages with no character encoding should simply be considered incompatible with the modern web.  After all, there ain't no such thing as plain text.
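The © versus Š difference comes down to one byte. A minimal Python sketch, assuming the copyright sign on the page is stored as the byte 0xA9 (which is what the Š rendering suggests), shows how each candidate encoding reads it:

```python
import unicodedata

# Assumption: the copyright sign on the page is the single byte 0xA9.
raw = b"\xa9"

candidates = ["iso-8859-1", "iso-8859-15", "windows-1252", "iso-8859-2"]

# Decode the same byte under each candidate encoding.
decoded = {name: raw.decode(name) for name in candidates}

for name, char in decoded.items():
    # Print code point and Unicode name (ASCII-only output).
    print(f"{name:>12}: U+{ord(char):04X} {unicodedata.name(char)}")
```

The three Western encodings agree on COPYRIGHT SIGN for this byte, while ISO-8859-2 maps it to LATIN CAPITAL LETTER S WITH CARON, so a single wrong guess is enough to produce the Š.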


----------



## grahamperrin@ (Sep 19, 2021)

Mozilla bug 1731482 - Repair Text Encoding: page(s) not properly repaired (compared to e.g. Firefox ESR)

Incidentally:



memreflect said:


> ISO-8859-1, ISO-8859-15, or Windows-1252.



– off-topic from Firefox, none of those have the required effect, for the given page, in Falkon.


```
% pkg info -x falkon ; uname -aKU ; freebsd-version -kru
falkon-3.1.0_1
FreeBSD mowa219-gjp4-8570p-freebsd 14.0-CURRENT FreeBSD 14.0-CURRENT #109 main-n249408-ff33e5c83fa: Thu Sep 16 01:11:04  2021     root@mowa219-gjp4-8570p-freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG  amd64 1400033 1400033
14.0-CURRENT
14.0-CURRENT
14.0-CURRENT
%
```


----------



## Deleted member 30996 (Sep 19, 2021)

memreflect said:


> Well, that's certainly interesting.  Neither the server nor the HTML declares a character encoding for the document, so nobody can blame Firefox for incorrectly guessing it's ISO-8859-2 when you "Repair Text Encoding".  In my opinion, the right option would have been to retain the menu and add the new menu item for old pages like this.


I deleted the closing bracket for the title of the index.html page on my site,  then opened it as a file in Firefox-ESR and clicked the "View Page Source" option.

It highlights the beginning of the error in red and the metatag underneath the error is highlighted in red. Encoding is charset=utf-8 and it's valid XHTML 1.0 Transitional:


----------



## grahamperrin@ (Sep 20, 2021)

hruodr said:


> … Try to get the right encoding of: …



Firefox: Text Encoding menu functionality - Add-ons / Development - Mozilla Discourse



> … is an extension feasible? …



I sent an e-mail to the developer of the extension for Thunderbird.


----------



## hruodr (Sep 20, 2021)

grahamperrin said:


> I sent an e-mail to the developer of the extension for Thunderbird.


Thanks. 

I never use add-ons; I do not trust them. I wonder how such elementary functionality
disappeared while so much bloat remains and keeps growing.

It is terrible that there are few alternative browsers.


----------



## memreflect (Sep 20, 2021)

grahamperrin said:


> Incidentally:
> 
> 
> memreflect said:
> ...


I just tried Falkon and Otter Browser, and changing the character encoding does not affect the rendering for me on any web pages I've tried.  While they both appear to refresh the page view, the page info still shows UTF-8 or unknown while the encoding menu indicates the character encoding I selected is active.

Selecting "Western" in www/firefox-esr renders the pages correctly, and the pages also display correctly in a terminal emulator with W3M when I change the encoding (`=` key to view page info where the character encoding of the page can be changed).

I took a look at the pages on a Chromebook because I don't feel like waiting for Chromium to build, and the pages rendered incorrectly there as well.  That was easily fixed by installing the Set Character Encoding extension and selecting one of the encodings I mentioned (ISO-8859-1 is noticeably missing, but it was succeeded by ISO-8859-15 and Windows-1252 anyway).


----------



## memreflect (Sep 20, 2021)

Trihexagonal said:


> I deleted the closing bracket for the title of the index.html page on my site,  then opened it as a file in Firefox-ESR and clicked the "View Page Source" option.
> 
> It highlights the beginning of the error in red and the metatag underneath the error is highlighted in red. Encoding is charset=utf-8 and it's valid XHTML 1.0 Transitional:


In SGML definitions of HTML (anything before XHTML 1.0 and "ISO HTML"), that would be an error as well, so I'm not sure what your point is here.  Are you suggesting that invalid markup is the cause of the character encoding trouble being discussed in this thread?

Off-topic:





You could shorten things to `<title>Your title here</><meta ...>` and it would still be valid HTML, but most HTML parsers would have trouble with that and such usage is discouraged by the W3C and the W3C HTML validator anyway.  The shortest valid HTML 4.01 Strict document (if you ignore the lack of a doctype) is `<title//<p>`.  For more information about these SGML features that few browsers (if any) have implemented, SGML - Markup minimization (Wikipedia) and Understanding HTML and SGML (W3C) are two useful resources.  I am glad XML, and consequently XHTML, simplified things significantly, considering crazy features like those!


----------



## astyle (Sep 20, 2021)

If Firefox were in fact intelligent, it wouldn't be so bloated that just one tab takes up 900 MB. I'm really grateful that the FreeBSD forums are not riddled with ads like other sites often are.


----------



## grahamperrin@ (Sep 20, 2021)

This thread is a point of reference in the bug report. It might help to keep things on topic: text encoding edge cases that are not properly repaired by the repair feature.


----------



## hruodr (Sep 20, 2021)

grahamperrin said:


> not properly repaired by the repair feature


It is not possible to repair anything, it is only heuristics. At best they should bring back the menu.


----------



## astyle (Sep 20, 2021)

hruodr said:


> It is not possible to repair anything, it is only heuristics. At best they should bring back the menu.


Sometimes, they just hide the menu under some cute-looking button.  Happens during nearly every update, and I have to play hide-and-seek all over again.


----------



## Deleted member 30996 (Sep 21, 2021)

memreflect said:


> In SGML definitions of HTML (anything before XHTML 1.0 and "ISO HTML"), that would be an error as well, so I'm not sure what your point is here.


Let me try to explain it so you can understand, memreflect.

The topic of the Thread is "Firefox is getting more intelligent (than us)".

I followed that up in the second post to this thread with:



Trihexagonal said:


> I know that if you use the right-click option to "View Page Source" in Firefox-ESR, it will highlight in red XHTML errors I've made, the ones the W3C validator points out, that I can't readily see by glancing over the page of markup.



My point was to show how it was getting more intelligent: the "View Page Source" option, which shows the raw XHTML markup, can identify markup errors I might not readily see.

You followed that up with:



memreflect said:


> Well, that's certainly interesting.  Neither the server nor the HTML declares a character encoding for the document, so nobody can blame Firefox for incorrectly guessing it's ISO-8859-2 when you "Repair Text Encoding".  In my opinion, the right option would have been to retain the menu and add the new menu item for old pages like this.


Which was an erroneous statement on your part. It _does_ make a declaration of character encoding of "utf-8" in the xml version declaration preceding the DocType and in the metatag shown in my "View Page Source" screenshot.


```
<?xml version='1.1' encoding='utf-8'?>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=utf-8" />
```

You are the one who said it couldn't be blamed for incorrectly guessing ISO-8859-2 as the character encoding. From what exactly did you draw the conclusion that it incorrectly "guessed" the character encoding?



memreflect said:


> Are you suggesting that invalid markup is the cause of the character encoding trouble being discussed in this thread?


I purposely deleted the closing bracket of the Title of my index.html page, loaded it in Firefox-ESR as a file and took a screen shot to document the claim I made of the ability of Firefox-ESR to "highlight in red XHTML errors".

Is that clear to you now? All my markup is valid XHTML 1.0 Transitional, and FYI, my CSS is valid CSS level 3 + SVG.



memreflect said:


> You could shorten things to `<title>Your title here</><meta ...>` and it would still be valid HTML, but most HTML parsers would have trouble with that and such usage is discouraged by the W3C and the W3C HTML validator anyway. The shortest valid HTML 4.01 Strict document (if you ignore the lack of a doctype) is `<title//<p>`.


It would not be valid XHTML (and if it's not valid XHTML it's not considered to be XHTML at all), the validation abilities of which was my addition to the thread topic of how Firefox is getting more intelligent (than us).




memreflect said:


> For more information about these SGML features that few browsers (if any) have implemented, SGML - Markup minimization (Wikipedia) and Understanding HTML and SGML (W3C) are two useful resources.  I am glad XML, and consequently XHTML, simplified things significantly, considering crazy features like those!


For more information on the differences: XHTML Versus HTML.


----------

