# Don't some system utilities stagnate?



## mkru (Aug 17, 2021)

I will provide an example: the `grep` program. It is one of the most famous Unix utilities and comes with the system by default. But let's face the truth: its performance and user experience are poor compared to ripgrep or The Silver Searcher. I can't remember the last time I used grep without passing extra options such as `-E`, `-r` or `-n`. It has no multithreading support, and it cannot use files such as .gitignore to exclude directories.

Some of its drawbacks, such as frequently used options not being enabled by default, can easily be solved with aliases. However, some functionality is simply missing, and one has to use a different program. I am aware that grep has to stay backward compatible and that one can't simply change its user interface. But one could add options for multithreading and for parsing .gitignore files that would be disabled by default. The first thing I do after installing the system is install ripgrep. I really do not like this. It could be avoided if system utilities received backward-compatible improvements. Is there any rule forbidding improving standard utilities?
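A minimal Python sketch of the opt-in behavior described above: a recursive, `grep -rn`-style search where `.gitignore` handling stays off unless explicitly requested, so the default output does not change. The names `search_tree`, `load_ignore_patterns` and `is_ignored` are hypothetical, and the pattern handling is deliberately simplified (plain globs only; negation, anchoring and `**` from real .gitignore syntax are not handled). This is an illustration, not how ripgrep or any grep implementation actually does it.

```python
import fnmatch
import os
import re


def load_ignore_patterns(root):
    """Read simple glob patterns from root/.gitignore, skipping blanks and comments.

    Simplification: real .gitignore syntax also has negation (!), anchoring
    and '**'; none of that is handled here.
    """
    path = os.path.join(root, ".gitignore")
    patterns = []
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#"):
                    patterns.append(line.rstrip("/"))
    return patterns


def is_ignored(name, patterns):
    """True if a file or directory name matches any ignore pattern."""
    return any(fnmatch.fnmatch(name, p) for p in patterns)


def search_tree(pattern, root, use_gitignore=False):
    """Recursive search in the spirit of `grep -rn`.

    .gitignore handling is opt-in (off by default), so existing callers
    see exactly the same results as before.
    Returns (relative_path, line_number) pairs in a deterministic order.
    """
    regex = re.compile(pattern)
    ignored = load_ignore_patterns(root) if use_gitignore else []
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        dirnames[:] = sorted(d for d in dirnames if not is_ignored(d, ignored))
        for name in sorted(filenames):
            if is_ignored(name, ignored):
                continue
            full = os.path.join(dirpath, name)
            with open(full, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if regex.search(line):
                        hits.append((os.path.relpath(full, root), lineno))
    return hits
```

Keeping the new behavior behind a flag that defaults to off is what makes such a change backward compatible: scripts that never pass `use_gitignore=True` get the old output.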

I see a trend toward re-implementing tools with only slight differences. In my humble opinion this is insane. It is like neglecting several decades of history, a bit like improving via revolution instead of evolution. I understand that such an approach might be preferred in Linux, where there is simply a kernel and multiple distributions. However, does FreeBSD also have to follow this path?


----------



## richardtoohey2 (Aug 17, 2021)

mkru said:


> Is there any rule forbidding improving standard utilities?


The risk of introducing regressions, I'd imagine.


----------



## mer (Aug 17, 2021)

POLA is likely a factor.
Utilities that have become a standard have interfaces that, if changed, would make a lot of folks complain.
If you completely rip out the guts, but don't change anything a user would see and don't introduce regressions, then no one would complain. But that is a non-trivial task; lots of people have scripts expecting the output to be a certain way. Change the way something is spelled and you break a lot of scripts.

Sometimes the slight differences in a reimplementation are because adding those to the original is not easy.


----------



## Phishfry (Aug 17, 2021)

The base size of FreeBSD is growing very large, so much so that disc1.iso will no longer fit on a CD.
At some point someone has to say no.
Instead of bloating FreeBSD why not include these ports you prefer in a custom image?

Some people hate change.
I would like to see a FreeBSDX where no new features are added until 95% of the current PRs are fixed.
Then add in newer hardware support, newer ciphers and critical newer software.
Not somebody's Google Summer of Code project which upended the bootloader for no good reason.


----------



## ralphbsz (Aug 18, 2021)

Welcome to open source, and openly administered systems.

Don't like the grep that ships with the system? Install another one. Or take the source for the existing one and modify it. Or write a new one and install it as /usr/local/bin/grep (that's actually not such a good idea, but it would work).

It's not just that some people hate change. But there is a lot of value in the smallest bit of code that takes care of 90% (or 70% or 99%) of the use cases. Every bit of code that isn't there isn't broken, doesn't need maintenance when the environment changes, doesn't need to be argued over. One of the reasons I like the *BSDs (in particular OpenBSD) is that they are just smaller than competing solutions, which makes them easier to use and understand.

If you want to have some fun, try agrep and the glimpse program. That's a game changer.


----------



## mkru (Aug 18, 2021)

Please note that I am aware that introducing backward-incompatible changes is unacceptable. I mean small, backward-compatible, incremental improvements. I just feel like keeping some utilities the way they are and installing more and more similar tools is not in line with the Unix philosophy. In the long term one might end up with a bunch of legacy tools installed by default and a bunch of newer, "better" versions of tools doing the same stuff installed "by hand". And in the end, the final size of the system is even larger.


----------



## Beastie7 (Aug 18, 2021)

mkru said:


> I just feel like keeping some utilities the way they are and installing more and more similar tools is not in line with the Unix philosophy



You must be new to BSD Unix engineering principles. FreeBSD isn’t just some toy to be altered for whimsical nonsense. It’s a research and development project.

Take these kinds of propositions back to Linux.


----------



## sko (Aug 18, 2021)

mkru said:


> However, some of the functionalities are simply missing and one has to use a different program



Then just use awk?
grep was designed to do only one job: fast, simple searches in text strings. For this it is still by far the best, simplest, fastest and most widely available tool, and therefore also THE first choice for scripting (if we ignore the incompatibilities GNU introduced with their variant...). If you need more functionality, just use the fully-fledged awk language, which comes from the same pattern-matching tradition as grep and is still faster than parsing with any modern/new/hip language or tool.
I still use awk to convert parts and price datasets with nearly 1 million entries for import into our DMS, including all price calculations, discounts and tax (we only get the end price and matrices with discount codes). The import tool of the DMS (written in .NET) takes around half an hour and can't do all the price calculations; awk does the complete job in ~2-3 seconds on a 4-year-old mid-range desktop.
And because you mentioned it in regard to performance: multithreading isn't the solution to anything - it was at first merely a workaround for OSes that can't fork properly and have insane overhead on forking. Or better said: there is this one OS that still hasn't figured this out properly and hence needs multithreading for everything to keep up. MT introduces a HUGE can of worms that you definitely don't want in simple, small tools that are supposed to 'just work'™
Split the work and fork multiple instances; this will beat MT in speed and especially complexity (and therefore stability) 99% of the time.
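In the shell, "split the work and fork multiple instances" is usually spelled with `xargs -P` running several grep processes at once. A rough Python analogue, assuming a Unix-like system where the `fork` start method is available: the file list is spread across separate worker processes, so the workers share no state and need no locking. `parallel_search` and `search_file` are illustrative names, not an existing tool.

```python
import multiprocessing
import re
from concurrent.futures import ProcessPoolExecutor


def search_file(job):
    """Worker: scan one file and return (path, line_number, line) for each match."""
    pattern, path = job
    regex = re.compile(pattern)
    hits = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if regex.search(line):
                hits.append((path, lineno, line.rstrip("\n")))
    return hits


def parallel_search(pattern, paths, workers=4):
    """Spread the files across forked worker processes (processes, not threads)."""
    # Assumption: a Unix-like OS, where the "fork" start method exists.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        # map() yields results in the same order as `paths`.
        per_file = pool.map(search_file, [(pattern, p) for p in paths])
        return [hit for hits in per_file for hit in hits]
```

Because each worker only returns its own hits and `map()` preserves input order, the merged output is deterministic, which matters if a script parses it.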

Why is every tool that works, and is feature-complete for its intended and well-defined use case, nowadays called "outdated" and "backwards" just because it doesn't get bloated and break compatibility (or break completely) every few days because of some hip new language and moronic release cycles (i.e. "we're too lazy to test, we just release betas every 2 days...")? I don't want such software in my workhorse OS, and I think that's true for most users of BSD and/or "real" UNIXes; everyone who wants something that constantly changes and breaks is free to use a Linux distribution of his choice...


----------



## mkru (Aug 18, 2021)

I feel deeply misunderstood. I have highlighted several times that backward compatibility is the priority and that most of the problems can easily be solved with aliases. The main concern is performance, which is much worse. You do not care when one tool takes 10 ms to do the job and the second one takes 1 s. However, you start caring when the first one takes 1 s and the second takes 100 s. Replacing grep with awk is not a good choice when you simply want to search for word occurrences. Splitting the work and forking multiple instances is conceptually fine, but it requires you to write and remember relatively long commands. Of course, one can write custom scripts.


----------



## mer (Aug 18, 2021)

mkru said:


> The main concern is performance, which is much worse. You do not care when one tool takes 10 ms to do the job and the second one takes 1 s. However, you start caring when the first one takes 1 s and the second takes 100 s.


That is exactly the point I was trying to address. If you make that change without changing the interface (POLA, Principle Of Least Astonishment) and without any regressions, it's likely that no one will mind that the change was made, and people will support it.

As others have said, there is also a big advantage to the "Don't fix it if it isn't broken" methodology.  Stagnant or "it's not broken"?

But one needs to truly understand the root cause of why the performance is bad.

Some tools work perfectly fine, with good performance, up to some limit. Is it worth rewriting them to make them work better under a heavier load, or do you create a new tool, targeted at the new load, from scratch?

Go look at the freebsd-hackers mailing list. There was recently a thread about "sysctl being slow if you have thousands of ZFS datasets". There was a bit of root cause analysis that got to a possible cause, and ideas were tossed out as to how to change it for the better, but then the question was raised: "Is this actually the appropriate method for me to get this data, or is there a better way?"

That email thread I think demonstrates your point and the point I've been trying to make.
Link to the first post in the email thread if you want to follow it:

"sysctl is too slow" (lists.freebsd.org)

----------



## Hakaba (Aug 18, 2021)

`grep` changed from the GNU version to a BSD version in 13.0-RELEASE.
(See the 13.0 release notes.)
So, this is not a good example.

As far as I know, there is regular news about changes in FreeBSD. A popular one is the network stack optimized for (by?) Netflix.
And each time I update the system, there are changes inside the FreeBSD tools.

Maybe you have that impression because you are focusing on the places where there is no change?


----------



## chungy (Aug 18, 2021)

Backwards compatibility can really be a huge deal, it's why grep doesn't default to incompatible modes accessible by switches. Sysadmins don't like it when they have to constantly change their scripts to keep up with new opinions by developers; almost any script written on FreeBSD 25 years ago, using only the base programs, should still function the same on current systems.

Scriptability itself is a pretty big deal: programs like ripgrep and The Silver Searcher are oriented toward an interactive environment, displaying all kinds of colors and formatting that only make sense to an eyeball. Such features only make it harder for scripts to parse.

All in all, the base utilities are being improved, but an emphasis on backwards compatibility and the cost to maintain it in the future are taken into account. Swapping out grep with rg would be a huge and breaking change and make a lot of angry sysadmins. Keeping grep as it is, while having the option to install rg from ports is the more sensible option. As a user, you can even alias grep=rg if you really want.


----------



## astyle (Aug 18, 2021)

chungy said:


> Backwards compatibility can really be a huge deal, it's why grep doesn't default to incompatible modes accessible by switches. Sysadmins don't like it when they have to constantly change their scripts to keep up with new opinions by developers; almost any script written on FreeBSD 25 years ago, using only the base programs, should still function the same on current systems.
> 
> Scriptability itself is a pretty big deal: programs like ripgrep and The Silver Searcher are oriented toward an interactive environment, displaying all kinds of colors and formatting that only make sense to an eyeball. Such features only make it harder for scripts to parse.
> 
> All in all, the base utilities are being improved, but an emphasis on backwards compatibility and the cost to maintain it in the future are taken into account. Swapping out grep with rg would be a huge and breaking change and make a lot of angry sysadmins. Keeping grep as it is, while having the option to install rg from ports is the more sensible option. As a user, you can even alias grep=rg if you really want.


I completely agree. I don't want to have to remember to tack an 'a' character onto `grep`. I'm all for performance improvements, and maybe a few new args, but not for name changes. As Hakaba pointed out, FreeBSD switched from a GNU implementation of `grep` to a FreeBSD one; I don't really care either way, as long as the name and basic args stay the same. Isn't that why we have POSIX standards?


----------



## richardtoohey2 (Aug 18, 2021)

mkru said:


> I feel deeply misunderstood. I have highlighted several times that backward compatibility is the priority


I think most people understand that, but I think the point is that you might be underestimating the amount of effort it would take to be sure that any changes were 100% free of regressions. Yes, you might make some things and use cases 10x faster - great - but if the changes accidentally break hundreds or thousands of production servers - not so great.

So do the improvements outweigh the risk of unintended consequences?  Often not.


----------



## mkru (Aug 19, 2021)

Reading the comments one might get an impression that the whole system is extremely fragile.


----------



## Geezer (Aug 19, 2021)

mkru said:


> Reading the comments one might get an impression that the whole system is extremely fragile.


I don't get that.


----------



## Menelkir (Aug 19, 2021)

mkru said:


> Reading the comments one might get an impression that the whole system is extremely fragile.


It would be fragile if it were a constant moving target. You don't change something that has worked the same way for years just for a minor version bump, a single feature, or the sake of change. That's exactly where the ports system shines: the base system is, and should be, as immutable as possible and rock solid.


----------



## kpedersen (Aug 19, 2021)

mkru said:


> Reading the comments one might get an impression that the whole system is extremely fragile.


Not the system as such, but certainly people's scripts. They often make many assumptions about which features their version of grep provides.

The busybox grep, GNU grep and BSD grep are all subtly different, which can potentially break scripts. People would rather make progress in this world than revisit the same old work over and over again, fixing breakages and regressions.

In some ways I am not convinced that "modern" developers even have the discipline to be able to develop a 100% compatible implementation of grep. In any language. Those days have passed.


----------



## sko (Aug 19, 2021)

kpedersen said:


> In some ways I am not convinced that "modern" developers even have the discipline to be able to develop a 100% compatible implementation of grep. In any language. Those days have passed.



Primarily because such a plan would, at a very early stage, end in a total war about licensing and ideology BS, because nowadays, in many software/OS-related communities, those questions seem to be much more important than good, working code...


----------



## ShelLuser (Aug 19, 2021)

If it isn't broke, why try to fix it? 

I don't see any advantages here, other than catering to a possible small group of people who are dying to go reinvent the wheel. And maybe plaster their own name onto the project? 

Talk is cheap. If you think grep can be improved, then fork it and improve it. Provide a port, see how often it's going to be used (I seriously doubt this part) and then you have something to show for it. Who knows, it might even end up in the base system in 5 to 8 years, provided it's actually successful.


----------



## mkru (Aug 19, 2021)

Geezer said:


> I don't get that.


It is about the mindset. There is no room for improvements, and there are probably no regression tests. People are afraid of touching anything because something can break. And I will highlight it once again: I mean backward compatible improvements, not breaking changes.


----------



## Geezer (Aug 19, 2021)

I don't get that  "_the whole system is extremely fragile._" It isn't.


----------



## Argentum (Aug 19, 2021)

mkru said:


> I will provide an example: the `grep` program. It is one of the most famous Unix utilities and comes with the system by default. But let's face the truth: its performance and user experience are poor compared to ripgrep or The Silver Searcher. I can't remember the last time I used grep without passing extra options such as `-E`, `-r` or `-n`. It has no multithreading support, and it cannot use files such as .gitignore to exclude directories.


You can use FreeBSD ports for exotic things - textproc/ugrep, for example. I have it. See ugrep(1). I think there are more. I also agree that the base should not be polluted with too many short-lived things.


----------



## mer (Aug 19, 2021)

mkru said:


> It is about the mindset. There is no room for improvements, and there are probably no regression tests. People are afraid of touching anything because something can break. And I will highlight it once again: I mean backward compatible improvements, not breaking changes.


If this were true, there would be no FreeBSD-CURRENT, FreeBSD-STABLE or FreeBSD-RELEASE.
Folks are not afraid of "touching something because it will break", and yes, there are plenty of tests around, including regression tests.

A lot of folks get confused between "base" and "not-base".  Give me a reason why "base" should not be as stable as possible, with documented and designed changes from version to version or release to release.
Ports are the place to demonstrate improvements in applications.


----------



## Menelkir (Aug 19, 2021)

Argentum said:


> You can use FreeBSD ports for exotic things - textproc/ugrep, for example. I have it. See ugrep(1). I think there are more. I also agree that the base should not be polluted with too many short-lived things.


Some time ago (I don't remember what I was doing) I needed a flag that grep didn't have. I installed textproc/ugrep and that was all.
Also, if you really miss something that grep does in the Linux userland, just install textproc/gnugrep.


----------



## Hakaba (Aug 19, 2021)

Again, `grep` changed in the base system. This is concrete proof that change happens.
End of the argument, no?

If you argue that the system seems fragile, I see zero bugs related to the `grep` change... That sounds like a rock-solid base, no?
And finally, the base system is a base... A lot of specific uses build their edifices on top of the base tools, which is why breaking changes for the user are kept to major releases.

Maybe you have a better example than grep ?


----------



## msplsh (Aug 19, 2021)

mkru said:


> I mean backward compatible improvements, not breaking changes.


This is another one of those "why don't people do stuff" questions where the people in position to answer them are not here on the forum, will not come here, and everybody gets to make a strawman that looks like them and knock other people's strawmen over.

That being said:

1. Fixing other people's stuff isn't always fun.
2. Starting over with a new tool is often "easier".
3. Making something faster sometimes requires an architecture change that will make preventing user-facing changes way more work than the programmer wants to do.
4. This whole thing is mostly a volunteer effort.
5. Installing some other program is easy. Eventually, if it's super popular and mostly backward compatible, it will go into base. If it wholly replaces the other program, the old program will eventually get removed. This process takes an incredible amount of time, and the removal makes a small number of noisy people who hate change angry.


----------



## kpedersen (Aug 19, 2021)

Hakaba said:


> Maybe you have a better example than grep ?


One I could suggest is the switch of the system compiler from the old GCC 4.x to Clang. Granted, it is a much more extreme example.

I am personally shocked at either how little this broke in terms of the ports or how efficient the ports collection was at supporting this large change. Many patches must have been made behind the scenes.

Clang is better than the old gcc which stagnated for a while. So it certainly shows that the FreeBSD project isn't afraid of change when it is actually a big benefit. Otherwise I think most engineers prefer to iterate rather than rewrite.

So if anyone did have any improvements to grep, I would certainly encourage them to implement them in our grep. However, rewriting grep entirely doesn't seem like the correct solution, no matter how much backwards compatibility they try to go for. It will never be 100% (quirks and all).


----------



## msplsh (Aug 19, 2021)

Clang took a long time, and this timeline doesn't include the work of bringing Clang up on BSD, which was a couple of years of patching beforehand.


----------



## msplsh (Aug 19, 2021)

mer said:


> Is it worth rewriting them to make them work better under a heavier load, or do you create a new tool, targeted at the new load, from scratch?
> 
> Go look in the hackers mailing list. There was recently something about "sysctl being slow if you have thousands of ZFS datasets". There was a bit of root cause analysis that got to a possible cause, ideas were tossed out as to how to change it for the better, but then the question was raised "Is this actually the appropriate method for me to get this data or is there a better way?".


In this instance, replacing the linked list with a tree structure is actually a really good idea instead of doing it another way, pretty "easy" to do in isolation, and apparently NetBSD already does this. 1 & 4 from my list apply, however.
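To illustrate why the data structure matters here, a toy Python comparison (not FreeBSD's or NetBSD's sysctl code): a lookup that walks a list touches every entry in the worst case, while a search over ordered keys needs only about log2(n) comparisons. A sorted list with binary search stands in for the tree; both class names are hypothetical, and unlike a balanced tree the sorted list still pays O(n) per insertion because entries have to be shifted.

```python
import bisect


class LinearRegistry:
    """Hypothetical name -> value table scanned front to back, like a linked list."""

    def __init__(self):
        self.items = []

    def add(self, name, value):
        self.items.append((name, value))

    def find(self, name):
        """Return (value or None, number of entries examined)."""
        steps = 0
        for key, value in self.items:
            steps += 1
            if key == name:
                return value, steps
        return None, steps


class SortedRegistry:
    """Same table with keys kept in order; binary search stands in for a tree lookup."""

    def __init__(self):
        self.keys = []
        self.values = []

    def add(self, name, value):
        i = bisect.bisect_left(self.keys, name)
        self.keys.insert(i, name)  # O(n) shifting; a balanced tree would avoid this
        self.values.insert(i, value)

    def find(self, name):
        """Return (value or None, number of comparisons made)."""
        lo, hi, steps = 0, len(self.keys), 0
        while lo < hi:
            steps += 1
            mid = (lo + hi) // 2
            if self.keys[mid] < name:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(self.keys) and self.keys[lo] == name:
            return self.values[lo], steps
        return None, steps
```

With 10,000 entries named `pool/ds00000` through `pool/ds09999`, finding the last one takes 10,000 steps in `LinearRegistry` and at most 14 comparisons in `SortedRegistry`. The lookup interface is identical in both, which is why such a change can be made "in isolation" without anything user-visible changing.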


----------



## mer (Aug 19, 2021)

msplsh said:


> In this instance, replacing the linked list with a tree structure is actually a really good idea instead of doing it another way, pretty "easy" to do in isolation, and apparently NetBSD already does this. 1 & 4 from my list apply, however.


Yep, sometimes the technical change is isolated and could be done without impact, but it will boil down to priorities and desire.
If one has the desire and prioritizes it, go for it.

mkru, you do realize that even without being a committer you could file a bug/enhancement request with a patch, don't you? If this specific example of grep bothers you so much, why not do so?


----------

