# How to delete a file whose name begins with -



## hruodr (Mar 26, 2021)

I have a file named -tmp-3. Command-line commands interpret the leading - as a flag, even if I write ls *tmp-3.

Is this a bug in (Free)BSD?


----------



## im (Mar 26, 2021)

`rm ./-filename` works for me.

Read the manual rm(1)

```
NOTES
     The rm command uses getopt(3) to parse its arguments, which allows it to
     accept the `--' option which will cause it to stop processing flag
     options at that point.  This will allow the removal of file names that
     begin with a dash (`-').  For example:

           rm -- -filename

     The same behavior can be obtained by using an absolute or relative path
     reference.  For example:

           rm /home/user/-filename
           rm ./-filename
```
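
Both forms from the manual page can be tried safely in a scratch directory. This is only a minimal sketch; the directory and the file names `-tmp-3` and `-tmp-4` are made up for illustration.

```shell
# Scratch directory so nothing real is touched (names are illustrative).
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Create two files whose names begin with a dash.
touch -- -tmp-3 -tmp-4

# Form 1: "--" ends option parsing, so -tmp-3 is taken as a file name.
rm -- -tmp-3

# Form 2: with a path prefix the argument no longer starts with "-".
rm ./-tmp-4

# Count what is left; both files should be gone.
remaining=$(ls -A | wc -l | tr -d ' ')
echo "files left: $remaining"

cd / && rm -rf -- "$workdir"
```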


----------



## hruodr (Mar 26, 2021)

So simple, im, thanks. Full path.

But I still wonder why names obtained by globbing are still seen as options. That may be risky.


----------



## zirias@ (Mar 26, 2021)

Well, the full path is a workaround. The canonical marker for "end of options" in a Unix command is `--`, so everything after it is treated as a positional argument, even if it starts with a dash.


hruodr said:


> But I still wonder why names obtained by globbing are still seen as options. That may be risky.


What do you mean with "got by glob"? It's on the commandline and `-` is the special character for an option. It was always that way on any Unix-like system.


----------



## hruodr (Mar 26, 2021)

Zirias said:


> What do you mean with "got by glob"?


The file was named -tmp-3. Also `ls *tmp-3` does not work.


----------



## zirias@ (Mar 26, 2021)

hruodr said:


> The file was named -tmp-3. Also `ls *tmp-3` does not work.


So, you know the `*` is expanded by your shell and not by `ls`?


----------



## hruodr (Mar 26, 2021)

Zirias said:


> So, you know the `*` is expanded by your shell and not by `ls`?



OK, I see, it is a consistent, undesired behaviour.


----------



## SirDice (Mar 26, 2021)

hruodr said:


> It is a bug in my opinion.


It's not a bug, it's expected behavior.


----------



## zirias@ (Mar 26, 2021)

hruodr said:


> Yes, but tcsh and sh are doing that. It is a bug in my opinion.


It is specified in POSIX that globs are expanded by the shell (and if you _want_ to pass them unexpanded to a command, you have to escape them). Your "opinion" doesn't count here.
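
To see what a command actually receives, the expansion can be captured with `printf`. A small sketch, assuming a scratch directory that contains only the file `-tmp-3` from the original question:

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch ./-tmp-3

# The shell replaces the pattern with the matching names before the command
# runs. printf's first argument is the format, so "-tmp-3" is only data here.
expanded=$(printf '%s\n' *tmp-3)
echo "unquoted: $expanded"

# Quoting suppresses expansion: the command receives the pattern itself.
literal=$(printf '%s\n' '*tmp-3')
echo "quoted:   $literal"

# Prefixing the pattern with ./ keeps every match from looking like an option.
prefixed=$(printf '%s\n' ./*tmp-3)
echo "prefixed: $prefixed"

cd / && rm -rf -- "$workdir"
```

With `ls *tmp-3`, `ls` is therefore started with the single argument `-tmp-3` and has no way of knowing that a glob was involved.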


----------



## hruodr (Mar 26, 2021)

Zirias said:


> you have to escape them


How?


----------



## zirias@ (Mar 26, 2021)

Methods for escaping something in a shell are putting it in double quotes or single quotes, or (for single characters) putting a backslash in front of it. For details read sh(1).

Be aware that neither `ls` nor `rm` understands globs.


----------



## hruodr (Mar 26, 2021)

Well, I see the rationale, and doing something else would make things "unnecessarily" complicated.
Undesired behaviour is also not the right word. It remains risky.


----------



## SirDice (Mar 26, 2021)

You escape characters using the backslash `\`. To get a _literal_ question mark for example: `touch \?`. Or use single quotes, `touch '?'`. Single and double quotes are treated differently too. There's a difference between `touch "$foo"` and `touch '$foo'`.
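
The three quoting styles can be compared side by side. A minimal sketch; the variable `foo` and its value are placeholders:

```shell
foo='some value'

# Double quotes: variable expansion still happens, globbing does not.
double="$foo ?"
# Single quotes: everything is literal, including the dollar sign.
single='$foo ?'
# Backslash: removes the special meaning of the one character after it.
escaped=\$foo\ \?

printf '%s\n' "$double" "$single" "$escaped"
```

Note that none of these change how a command treats a leading dash: quoting is handled by the shell, option parsing by the command.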


----------



## hruodr (Mar 26, 2021)

Zirias said:


> Methods for escaping something in a shell are putting it in double quotes or single quotes, or (for single characters) putting a backslash in front of it.


All that does not work in the case treated here. Perhaps it is you who have to learn something?


----------



## hruodr (Mar 26, 2021)

SirDice said:


> You escape characters using the backslash `\`.


Yes, but even with quoting, the commands will still treat the argument as an option.


----------



## SirDice (Mar 26, 2021)

That's what the `--` is for. `rm -- "${file}"`


----------



## zirias@ (Mar 26, 2021)

Are you kidding? You're nagging about "original Unix behavior" pretty often, but obviously have no clue how things work. I *TOLD* you in the same post, that neither `ls` nor `rm` understand globs!

The behavior of FreeBSD is POSIX compliant, and of course you will find the same behavior in any other Unix-like system (if it isn't broken).

Specifically, the double-dash is documented here (guideline 10):


Utility Conventions


----------



## hruodr (Mar 26, 2021)

Zirias said:


> but obviously have no clue how things work.


I am not as perfect as you are. That's all.

BTW, your proposal of quoting seems to be based on my own lapsus. But anyway, you are surely better; I am
not a computer scientist and also do not want to be one.


----------



## zirias@ (Mar 26, 2021)

Not knowing things isn't a problem at all, but basing strong opinions ("bug", "you have to learn something", etc…) on a wrong understanding is.


----------



## hruodr (Mar 26, 2021)

Zirias said:


> but basing strong opinions ("bug", "you have to learn something", etc…) on a wrong understanding is.


Your strong opinions are always based on right understanding, because you always understand things right
and your understanding will never appear later to be wrong.


----------



## SirDice (Mar 26, 2021)

Don't claim something is a bug just because you misunderstood something.


----------



## hruodr (Mar 26, 2021)

Zirias said:


> The behavior of FreeBSD is POSIX compliant, and of course you will find the same behavior in any other Unix-like system (if it isn't broken).
> 
> Specifically, the double-dash is documented here (guideline 10):
> Utility Conventions


Of all your interventions, this is the only one I appreciate, and I thank you for it. Thanks also for reminding me that the
glob is done by the shell (lapsus) and hence the command would still behave in the same way. Note that
by recommending quoting you fell into the very same lapsus.


----------



## hruodr (Mar 26, 2021)

SirDice said:


> Don't claim something is a bug just because you misunderstood something.


But to avoid doing that, you must know that you misunderstood something, and then it would not
be a misunderstanding.

And I did not claim it is a bug, I wrote "in my opinion it is a bug", and that is a call to contradict it.

Also not a strong opinion was the question: "Perhaps it is you who have to learn something?".
Reasons: (1) there is always something to learn, (2) Zirias do not need to learn anything, but
I wrote for that reason "perhaps".


----------



## zirias@ (Mar 26, 2021)

hruodr said:


> Note that by recommending quoting you fell into the very same lapsus.


No:


Zirias said:


> Methods for escape  [..]
> 
> Be aware neither `ls` nor `rm` understand globs.


----------



## hruodr (Mar 26, 2021)

Quoting is not globbing. BTW, why were you teaching methods for escaping totally out of context?


----------



## zirias@ (Mar 26, 2021)

You asked


----------



## hruodr (Mar 26, 2021)

Zirias said:


> You asked


Where?


----------



## olli@ (Mar 26, 2021)

hruodr said:


> Where?


You wrote: _“It is a bug in my opinion.”_
Then Zirias replied: _“It is specified in POSIX that globs are expanded by the shell (and if you want to pass them unexpanded to a command, you have to escape them).”_
And then you asked: _“How?”_


----------



## olli@ (Mar 26, 2021)

You shouldn’t claim something is a bug or "risky" if you're not familiar with it at all. That will provoke exactly the reactions that can be seen in this thread.
 
The way globbing and parsing of command line options works is POSIX-compliant, and it works the same for every operating system that strives to be POSIX-compliant. And it is only risky if you don't know how it works. And the way it works is documented in the manual pages.
 
BTW, asking how to remove a file starting with a dash is a typical beginner's question. A beginner should never claim something is a bug, but instead try to learn why things are the way they are. Also, I can’t help but notice that quite a lot of threads you are involved with turn into non-technical arguments and finger-pointing. You need to work on your attitude.


----------



## hruodr (Mar 26, 2021)

Olli, yes, understanding things literally, you are right. I reacted fast and understood the unexpected hint of quoting as a solution to the original problem.

UPDATE: And I am sure that experts also make beginners' errors from time to time. It is POSIX, it is consistent
behaviour, but in my opinion risky. There is a reason why rm has a flag -i.


----------



## zirias@ (Mar 26, 2021)

hruodr said:


> UPDATE: And I am sure that experts also make beginners' errors from time to time. It is POSIX, it is consistent
> behaviour, but in my opinion risky.


An "expert" in the meaning you seem to imply will, when confronted with "unexpected" behavior, do some research, or just ask, instead of prematurely claiming something is a "bug" or "wrong".

And there's absolutely nothing "risky" with POSIX specifying that options start with a dash, and a double-dash marks the end of options. If you're doing it wrong, it will just not work.


hruodr said:


> There is a reason why rm has a flag -i.


There is, but that's a whole other topic not related here.


----------



## SirDice (Mar 26, 2021)

Yeah, the `-i` option is nice but not for automated scripts. With scripts you really need to handle filenames properly, especially if you store those filenames in variables that are dynamically filled. Most sysadmins take care of naming files properly (they know certain characters are difficult to handle, like the aforementioned `-`), but leave it to users to come up with some really funky filenames. Filenames that can royally screw up your scripts if you don't account for them.
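
A minimal sketch of that kind of defensive handling in a script. The file names are invented examples of what users come up with; the point is the combination of quoting and `--`:

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# A few awkward names: leading dash, embedded space, trailing space.
touch -- '-rf' 'two words' 'trailing '

removed=0
for file in *; do
    # Skip the unexpanded pattern if the directory happens to be empty.
    [ -e "./$file" ] || continue
    # Quotes keep the name as one word; "--" keeps it from being an option.
    rm -- "$file" && removed=$((removed + 1))
done
echo "removed $removed files"

cd / && rm -rf -- "$workdir"
```

Looping over `./*` instead of `*` would work as well, since every name then starts with `./`.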


----------



## hruodr (Mar 26, 2021)

Zirias said:


> An "expert" in the meaning you seem to imply will, when confronted with "unexpected" behavior, do some research, or just ask,


I just asked, in this informal forum. I think this is what makes the forum a nice distraction, a little small talk,
a little battlefield. Of course, before asking in the mailing list, I would do some, or more, research.



Zirias said:


> instead of prematurely claiming something is a "bug" or "wrong".



I did not claim, I said my opinion, that in your opinion does not count. Then why is that so a big offense?!



Zirias said:


> And there's absolutely nothing "risky" with POSIX specifying that options start with a dash, and a double-dash marks the end of options. If you're doing it wrong, it will just not work.



I knew the double dashes on only few programs. I never needed them before. The fact that this double
dash mechanism is necessary, tells that it is not a trivial problem.


----------



## olli@ (Mar 26, 2021)

hruodr said:


> UPDATE: And I am sure that experts also make beginners' errors from time to time. It is POSIX, it is consistent
> behaviour, but in my opinion risky. There is a reason why rm has a flag -i.


Let me answer by giving three quotes from my collection of usenet signatures.

"UNIX was not designed to stop you from doing stupid things,
because that would also stop you from doing clever things."
 ― Doug Gwyn, UNIX developer

"If you aim the gun at your foot and pull the trigger, it's
UNIX's job to ensure reliable delivery of the bullet to
where you aimed the gun (in this case, Mr. Foot)."
 ― Terry Lambert, FreeBSD-hackers mailing list

"Unix gives you just enough rope to hang yourself --
and then a couple of more feet, just to be sure."
 ― Eric Allman, author of BSD sendmail and syslog

And finally: Recommended reading: Unix philosophy


----------



## zirias@ (Mar 26, 2021)

hruodr said:


> I knew the double dashes on only few programs. I never needed them before. The fact that this double
> dash mechanism is necessary, tells that it is not a trivial problem.


It is _very_ trivial, but of course, you have to understand the concept. For the commandline invocation of a conforming utility, there are a few simple rules:

- options start with a dash and must come first
- the first non-option (positional parameter) marks the end of options
- in case the first(!) non-option _does_ start with a dash, insert a double-dash before, which _explicitly_ marks the end of options

I just noticed this forum has a nice "ignore" feature…
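
The rules listed in this post can be observed with the shell's built-in `getopts`. This is only a sketch: `parse` is a hypothetical helper that accepts the made-up options `-a` and `-b` and reports what it treated as options and what as operands.

```shell
# parse is a hypothetical helper, not a standard utility.
parse() {
    OPTIND=1            # restart option parsing on every call
    opts=
    while getopts ab opt; do
        case $opt in
            a|b) opts="$opts$opt" ;;
            *)   return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    printf 'options=[%s] operands=[%s]\n' "$opts" "$*"
}

r1=$(parse -a file -b)      # the first non-option ends option parsing
r2=$(parse -a -- -b file)   # "--" ends it explicitly
printf '%s\n' "$r1" "$r2"
```

In the first call `-b` comes after an operand and is therefore an operand itself; in the second call `--` makes `-b` an operand even though it comes first.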


----------



## olli@ (Mar 26, 2021)

hruodr said:


> I did not claim, I said my opinion, that in your opinion does not count. Then why is that so a big offense?!


Saying “in my opinion” is redundant. Of course it is your opinion, even if you don’t mention it.
But still, in this case it is a _wrong_ opinion because it is based on ignorance.
 


> I knew the double dashes on only few programs. I never needed them before. The fact that this double
> dash mechanism is necessary, tells that it is not a trivial problem.


Actually it _is_ trivial, as soon as you are aware of how the parsing of command line arguments works in UNIX.
 
By the way, you don’t necessarily have to use double dashes. You can also give the relative file name (`./-file`) or the absolute file name (`/dir/-file` or `$PWD/-file`). Of course, this won’t work if the argument is not an actual file, for example when doing things like `expr -- -100 + 300` or `grep -- -strike- somefile` – in these cases you have to use double dashes, indeed.
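
The `grep` case can be sketched like this. The file `somefile` and its contents are invented for the example; `-e` is shown as a second way to mark the pattern that `grep` offers.

```shell
workdir=$(mktemp -d) || exit 1
printf '%s\n' 'plain line' 'a -strike- line' > "$workdir/somefile"

# The pattern starts with a dash and is not a file name, so ./ cannot help;
# "--" tells grep that the next argument is the pattern, not an option.
match=$(grep -- -strike- "$workdir/somefile")
echo "$match"

# grep's -e option names the pattern explicitly and has the same effect.
same=$(grep -e -strike- "$workdir/somefile")

rm -rf -- "$workdir"
```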


----------



## shkhln (Mar 26, 2021)

hruodr said:


> I knew the double dashes on only few programs. I never needed them before. The fact that this double
> dash mechanism is necessary, tells that it is not a trivial problem.


It is. You can't have a text interface without some meta-chars at least.


----------



## hruodr (Mar 26, 2021)

Zirias said:


> It is _very_ trivial, but of course, you have to understand the concept. For the commandline invocation of a conforming utility, there are a few simple rules:



Well, perhaps basic programs follow those rules. I know a lot of programs that do not.



olli@ said:


> Saying “in my opinion” is redundant. Of course it is your opinion, even if you don’t mention it.
> But still, it is a _wrong_ opinion because it is based on ignorance.



Excuse me, but I cannot follow your absolute relativism.



shkhln said:


> It is. You can't have a text interface without some meta-chars at least.



Yes, it is trivial that it needs an extra mechanism. Does this mean that the original problem is trivial?


----------



## shkhln (Mar 26, 2021)

hruodr said:


> Yes, it is trivial that it needs an extra mechanism. Does this mean that the original problem is trivial?


For comparison, writing sed (grep, etc) snippets for FreeBSD ports requires people to juggle 3 levels of escaping. That's make, shell and sed rules. You never know whether you need $, $$ or $$$.


----------



## olli@ (Mar 26, 2021)

hruodr said:


> Well, perhaps basic programs follow those rules. I know a lot of programs that do not.


No, all standard UNIX utilities follow those rules. They use getopt(3) or similar library functions for parsing of command line options, or getopt(1) for shell scripts, or the shell’s built-in function for that purpose.
It is true that some 3rd-party programs do it differently, though. But rm(1) is clearly a standard UNIX utility.
 


> Yes, it is trivial that it needs an extra mechanism. Does this mean that the original problem is trivial?


Yes, in this case it is trivial. If you think about it, if there are characters with special meaning (like “`-`”), then there _must_ be a way to remove that special meaning. Like `&lt;` in HTML when you want to display the character “`<`” which has a special meaning.


----------



## hruodr (Mar 26, 2021)

shkhln said:


> For comparison, writing sed (grep, etc) snippets for FreeBSD ports requires people to juggle 3 levels of escaping. That's make, shell and sed rules. You never know whether you need $, $$ or $$$.



You understood "trivial" as "easy", I meant it as "demanding an extra mechanism", and the real meaning
of trivial is: "there are many ways".



olli@ said:


> then there _must_ be a way to remove that special meaning.


That is, using your vocabulary, an opinion based on ignorance and hence wrong: that something is needed 
does not mean that it must exist.


----------



## SirDice (Mar 26, 2021)

shkhln said:


> For comparison, writing sed (grep, etc) snippets for FreeBSD ports requires people to juggle 3 levels of escaping. That's make, shell and sed rules. You never know whether you need $, $$ or $$$.


Don't get me started on the backslash hell you can find yourself in with certain utilities. Dealing with Windows shares for example, `\\servername\share` -> `\\\\servername\\share`. And sometimes you need to 'escape' the escape so you end up with `\\\\\\servername\\\share`. Or how about the infamous 'wave' escape; `s/\/usr\/local\//\//`. It'll drive you completely bonkers if you don't pay enough attention.


----------



## olli@ (Mar 26, 2021)

hruodr said:


> That is, using your vocabulary, an opinion based on ignorance and hence wrong: that something is needed
> does not mean that it must exist.


Since UNIX supports file names that start with a dash, obviously a way exists to handle them.


----------



## shkhln (Mar 26, 2021)

hruodr said:


> and the real meaning of trivial is: "there are many ways".


A trivial problem is a problem that can not be reduced (simplified) further. You can look up the meaning in a dictionary if you want.


----------



## hruodr (Mar 26, 2021)

olli@ said:


> Since UNIX supports file names that start with a dash, obviously a way exists to handle them.



Under the supposition that Unix has a solution for everything. Yes, for example with a little of C code.



shkhln said:


> A trivial problem is the one that could not be reduced (simplified) further. You can look up the meaning in a dictionary if you want.


Do not need a dictionary. Tri=three, via=way. It hints that there are many ways.


----------



## _martin (Mar 26, 2021)

I guess the conversation has been steered to something different now, but this thread may well be thrown at some user who's actually googling the problem. The explanations of why this happens have already been given, along with ways to work around it. There's also another way (one of a few others) to delete such files, using find:


```
# the numbers on left are the inode numbers
$ ls -lai
total 10
2011 -rw-r--r--   1 martin  martin   0 Mar 26 12:59  --what is?this
2001 drwxr-xr-x   2 martin  martin   3 Mar 26 12:59 .
   2 drwx------  16 martin  martin  41 Mar 26 12:59 ..
$

# find the inode number in question
$ find . -xdev -type f -inum 2011
./ --what is�this
$

# see what really is in the name of the file
$ find . -xdev -type f -inum 2011 -print0 | hd
00000000  2e 2f 20 2d 2d 77 68 61  74 20 69 73 88 74 68 69  |./ --what is.thi|
00000010  73 20 00                                          |s .|
00000013
$
# remove
$ find . -xdev -type f -inum 2011 -exec rm {} \;
```
The -xdev is important, as you want to search only the current file system. Inode numbers are unique only within one file system, so two files on different file systems can have the same inum.


----------



## shkhln (Mar 26, 2021)

hruodr said:


> Do not need a dictionary. Tri=three, via=way. It hints that there are many ways.


One has to wonder how many other words you have arbitrarily redefined.


----------



## zirias@ (Mar 26, 2021)

SirDice said:


> Or how about the infamous 'wave' escape; `s/\/usr\/local\//\//`.


Most implementations (e.g. sed(1)) will allow any delimiter for e.g. replace commands. So, just write `s:/usr/local/:/:`
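
Both spellings side by side, as a quick sketch (the path `/usr/local/bin/foo` is just sample input):

```shell
path=/usr/local/bin/foo

# Slash as delimiter: every / in the pattern needs a backslash.
with_slash=$(printf '%s\n' "$path" | sed 's/\/usr\/local\//\//')

# Colon as delimiter: same substitution, no escaping needed.
with_colon=$(printf '%s\n' "$path" | sed 's:/usr/local/:/:')

printf '%s\n' "$with_slash" "$with_colon"
```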


----------



## hruodr (Mar 26, 2021)

shkhln said:


> One has to wonder how many other words you have arbitrarily redefined.



From a classical, well respected dictionary (Georges, Ausführliches lateinisch-deutsches Handwörterbuch, 
1913-1918):



> triviālis, e (trivium), I) dreifältig, dreifach, germanitas, Arnob. 3, 34. – II) allgemein zugänglich, allbekannt, gewöhnlich, gemein, scientia, Quint.: ludii ex circo, Suet.: verba, Suet.: carmen, Iuven.: mos, Calp.


allgemein zugänglich = generally accessible.

It is you or your dictionary what is redefining the word.


----------



## SirDice (Mar 26, 2021)

Zirias said:


> Most implementations (e.g. sed(1)) will allow any delimiter for e.g. replace commands. So, just write `s:/usr/local/:/:`


I know. It was just an example of how weird things can get.


----------



## SirDice (Mar 26, 2021)

hruodr said:


> Do not need a dictionary. Tri=three, via=way. It hints that there are many ways.


That's not the correct way to interpret the word 'trivial', not the correct etymology either: https://www.merriam-webster.com/dictionary/trivial



hruodr said:


> allgemein zugänglich = generally accessible.


The correct translation would be 'commonly accessible'. _im Allgemeinen_ does translate to _In general_ or _generally_, but in this context it is translated to 'common' as in _im allgemeinen Interesse_ -> 'in the common interest'.


----------



## olli@ (Mar 26, 2021)

hruodr said:


> Under the supposition that Unix has a solution for everything. Yes, for example with a little of C code.


Did you seriously believe there might be no way to handle file names that begin with a dash? I don’t think so.
This is just a rearguard action on your part.
 
In an attempt to keep this thread on topic, a loosely related remark:
 
By the way, UNIX (more exactly: POSIX) file names are allowed to contain almost _any_ character, with only two obvious exceptions: the slash (`/`) that is used for separating components within a directory path, and the NUL byte that is used to terminate strings in the UNIX API. The first can be worked around by using Unicode characters that look similar, for example “`∕`” (division slash, U+2215):

```
$ cd /tmp
$ mkdir test
$ cd test
$ touch this∕is∕allowed
$ ls -l
total 0
-rw-------  1 olli  wheel  0 Mar 26 14:49 this∕is∕allowed
```
(You should use an appropriate locale, of course, e.g. UTF-8.)


----------



## hruodr (Mar 26, 2021)

SirDice said:


> The correct translation would be 'commonly accessible'.


Yes, commonly is much better translation.

About the etymology: it is trivial. It is trivial how it got the present meaning, as early as in antiquity.
Perhaps shkhln needs a dictionary because he lives far away. It is a Latin word, it surely lives on in a
lot of Romance languages, and it surely came to English with the French-speaking Norman invasion of
William the Bastard.


----------



## zirias@ (Mar 26, 2021)

olli@ said:


> The first can be worked around by using unicode characters that look similar, for example “`∕`” (division slash, U+2215):


In _that_ context, it might be worthwhile to also mention that Unix filenames and filesystems by design know nothing about encoding, they are just `char` sequences. So, using anything outside US-ASCII:


olli@ said:


> (You should use an appropriate locale, of course, e.g. UTF-8.)


you should always use the _same_ locale that was in effect when naming the file


----------



## olli@ (Mar 26, 2021)

hruodr said:


> Yes, commonly is much better translation.
> 
> About the etymology: it is trivial. It is trivial how it got the present meaning, as early as in antiquity.
> Perhaps shkhln needs a dictionary because he lives far away.


Please try not to get personal.

Yes, the word “trivial” comes from Latin and means “three ways”. But _no_, it does _not_ mean that a problem has three (or multiple) solutions, as you tried to guess. Rather, it relates to a place where three roads meet, so it is a public place where people gather together. That’s why “trivial” literally means “a public place”, and hence the meaning “commonplace”.


----------



## olli@ (Mar 26, 2021)

Zirias said:


> you should always use the _same_ locale that was in effect when naming the file


Yes, of course, that’s what I meant. You need to use UTF-8 for _both_ creating and using the file. That should go without saying.
Personally I use UTF-8 everywhere nowadays, so it’s not a problem.

Edit: By the way, UTF-8 is also compatible with POSIX semantics, because the bytes 0x2F (`/`) and 0x00 (NUL) never occur inside a multi-byte UTF-8 sequence, so a UTF-8-encoded file name can only contain them if it contains those characters themselves. This does _not_ hold true for most other encodings, e.g. you cannot use UTF-16 for file names. Therefore UTF-8 is really the best choice.
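
This can be checked for the division slash from the earlier example. A small sketch that writes U+2215 as its three UTF-8 bytes in octal, so it does not depend on the terminal's locale:

```shell
# U+2215 DIVISION SLASH as raw UTF-8 bytes (octal escapes for e2 88 95).
division_slash=$(printf '\342\210\225')

# Dump the bytes in hex: none of them is 2f (/) or 00 (NUL).
bytes=$(printf '%s' "$division_slash" | od -An -tx1 | tr -d ' \n')
echo "$bytes"
```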


----------



## zirias@ (Mar 26, 2021)

olli@ said:


> That should go without saying.


Not necessarily, e.g. NTFS on Windows always uses UTF-16 internally and the system does conversion to/from your current locale. So I think it's important to know Unix design is much simpler here 


olli@ said:


> This does not hold true for most other encodings


Except for UTF-16 and UCS-4, which other encodings would pose that problem? Definitely not those in ISO8859


----------



## hruodr (Mar 26, 2021)

olli@ said:


> But _no_, it does _not_ mean that a problem has three (or multiple) solutions, as you tried to guess.


Not in that strength. Not three, and may be not many. The place where the roads meet. The ways that
bring you to the solution. And even if it is only one long, steep way with a lot of obstacles, it remains:
trivial. And I did not try to guess: I do know what trivial means. I do not need to search the word in a
dictionary as I also do not search the word chair. And, using your own words, your opinion is wrong due to
ignorance.


----------



## olli@ (Mar 26, 2021)

Zirias said:


> Not necessarily, e.g. NTFS on Windows always uses UTF-16 internally and the system does conversion to/from your current locale. So I think it's important to know Unix design is much simpler here


Don’t get me started about NTFS … 
Windows 10 still disallows creating files named “`con:`” or “`<foo>`” (like MS-DOS, 40 years ago).


----------



## olli@ (Mar 26, 2021)

hruodr said:


> Not in that strength. Not three, and may be not many. The place where the roads meet. The ways that
> bring you to the solution.


That is your personal interpretation, but it is not the actual etymology of the word “trivial”. If you don’t believe SirDice, shkhln or me, _please_ look it up in a dictionary or on Wikipedia. I'm not trying to insult you, I'm just trying to explain what the correct facts are.


----------



## zirias@ (Mar 26, 2021)

olli@ said:


> Don’t get me started about NTFS …
> Windows 10 still disallows creating files named “`con:`” or “`<foo>`” (like MS-DOS, 40 years ago).


Well, _THAT_'s a different story and not really NTFS' fault. Interesting read about it:








threadreaderapp.com


----------



## olli@ (Mar 26, 2021)

Zirias said:


> Well, _THAT_'s a different story and not really NTFS' fault.


Actually, it’s two unrelated issues. The first are reserved file names (like con, aux, lpt), the second are invalid characters (e.g. :, <, >, ?, * and a few others).
It’s true that it is not NTFS’ fault, it’s rather Windows’ fault, but NTFS suffers from it.

You _can_ create such file names when you attach an NTFS disk to a POSIX system. But when you move it back to a Windows system, it won’t be able to handle them.

Another source of trouble is the case-insensitivity of NTFS. But this really starts to get off-topic.


----------



## zirias@ (Mar 26, 2021)

olli@ said:


> It’s true that it is not NTFS’ fault, it’s rather Windows’ fault, but NTFS suffers from it.


Well, only when used in combination with win32. I didn't try, but would just assume you can happily name a file "con" when using the NT Kernel's interface instead of win32 (or maybe a different subsystem, if there were any…)

Anyways, seeing how this stems from the Unix idea of "device files", poorly adapted first to a very simplistic OS (CP/M), then just kept forever in an evolving OS (MS-DOS → Windows), is kind of amusing. I generally like when unnecessary breaking changes are avoided (one reason to prefer FreeBSD over Linux), but this seems a bit over the top 


olli@ said:


> Another source of trouble is the case-insensitivity of NTFS. But this really starts to get off-topic.


Yes, sorry. I guess I should leave here, don't see anything _on_ topic that wasn't already explained


----------



## hruodr (Mar 26, 2021)

olli@ said:


> That is your personal interpretation, but it is not the actual etymology of the word “trivial”.


No, it is not my personal interpretation. I have it from someone else. And my source is not some dictionary
or wikipedia.

I also know how etymology is done. It is difficult to speak about facts. But in your discourse there are absolute
facts or opinions according to your arbitrariness.


----------



## SirDice (Mar 26, 2021)

Sigh, seriously. You pointed to a description from a dictionary from 1918. Languages evolve. I'm pretty sure that dictionary from 1918 has words in it that are not in use any more. And it probably doesn't have a whole bunch of new words that are used nowadays. There are a lot of words in _any_ language that changed meaning in the past 100 years.


----------



## ralphbsz (Mar 26, 2021)

hruodr said:


> So simple, im, thanks. Full path.
> 
> But I still wonder why names obtained by globbing are still seen as options. That may be risky.



It is risky. It is a bad thing. But since it has been standardized and gone over for about 30 or 40 years, it is considered correct behavior. As a matter of fact, in the early 90s someone published a book called "The UNIX-HATERS Handbook", and the example of having a file called "-Rf" in your directory and then doing "rm *" is chapter 1 of that book.

I consider it to be a nasty design flaw. In a well-designed system, this should not happen; there should be a clear distinction between options/flags/parameters and arguments to commands. Commands should be able to read what the command line was *before* globbing if they want to, and perform globbing only if appropriate, and how. But that's not the way Ritchie and Thompson wrote the first version, and their bad decision and shortcut has now become enshrined. Sad.

Other examples where this goes wrong: "find . -name foo*" fails if there are multiple files whose names start with foo in the current directory, while the intent of the find command is to find all files whose names start with foo. Or "echo Should we go to lunch ?" will spit out gibberish if the current directory contains files that have single-character file names, while "echo Should we go to lunch?" only does if the current directory contains files that are named "lunch" plus a single character. In the presence of unicode, all that goes completely insane, because the definition of "single character" becomes complex: what looks on the screen like a single character might not actually be one. And getting back: This command

```
echo "Should we go to lunch ?"
```
will work, independent of what files exist in the current directory. But that's not intuitive. Why do users need to learn how globbing and the shell work internally, and complicated quoting rules?
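
The `find` case behaves the same way. A sketch with invented file names, showing the quoted form only:

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir sub
touch foo1 foo2 sub/foo3

# Unquoted, the shell would turn "find . -name foo*" into
# "find . -name foo1 foo2", which find rejects.
# Quoted, find itself receives the pattern and matches it at every depth.
count=$(find . -name 'foo*' | wc -l | tr -d ' ')
echo "matches: $count"

cd / && rm -rf -- "$workdir"
```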

Well-designed operating systems (such as VMS or MVS/TSO) do not have these problems in their shells. But for Unix, this is water under the bridge.


----------



## ralphbsz (Mar 26, 2021)

olli@ said:


> Don’t get me started about NTFS …
> Windows 10 still disallows creating files named “`con:`” or “`<foo>`” (like MS-DOS, 40 years ago).


And I actually think this is a good thing. I think file system standards should disallow creating file names that are likely to cause confusion. Like ones that contain newlines, invisible special characters, or non-normalized unicode sequences. For example, I can create a file whose name is a single space. You do ls, you don't see it, but the count of files is off by one. I don't think this should be allowed.

One of my favorite examples: It is possible to create two files in a directory, one whose name is "a with acute accent", and the second one has a name that is two characters, namely "a" and "overstrike acute accent". Those are separate files, but both file names look the same on the screen, even if you quote them. The even more insidious version of this is to use the two-character sequence "Turkish i without dot" and "overstrike accent dot", which looks exactly like an "i", and it doesn't even have any accents. So now you have two files that look like they have the same name, and the only way to distinguish them is "ls -1 | hexdump -C". This is just insane.
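
The first example can be reproduced as a sketch like this. It assumes a file system that stores names byte for byte; file systems that normalize Unicode names may treat both as the same file.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# "a with acute" precomposed (U+00E1), and "a" followed by the combining
# acute accent (U+0301), both written as raw UTF-8 bytes.
precomposed=$(printf '\303\241')
decomposed=$(printf 'a\314\201')
touch -- "$precomposed" "$decomposed"

# Two directory entries, although both names render the same on screen.
count=$(ls -A | wc -l | tr -d ' ')
echo "distinct files: $count"

# A byte dump is what tells them apart.
ls -A | od -c

cd / && rm -rf -- "$workdir"
```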

But then, Unix has been this way for 50 years, it can't be changed now, it has such a high market share that no replacement will ever be created, and we'll live with it.


----------



## SirDice (Mar 26, 2021)

ralphbsz said:


> For example, I can create a file whose name is a single space.


I recently ran into something similar. Some id-ten-t developer had created a file with a space at the end of the filename. And some of my colleagues couldn't figure out why they couldn't get rid of it.


----------



## scottro (Mar 26, 2021)

In the early 2000s I worked at a fashion company. One big problem was that designers would accidentally create files ending with jpg<space>. So, we'd wind up with directories having files like 123.jpg 123.jpg. Eventually, I think I did cat -e on the list and renamed the oddly named files. It was funny but not funny. They created these files on Windows and/or Mac, but as has been pointed out, Unix and Unix-like systems allow it too.


----------



## SirDice (Mar 26, 2021)

I'm going to add this to my list of things to do for new junior admins. Create a bunch of weird files then ask them to delete some of them. See how long it takes them to figure it out. Like we used to send out newbies to reproduction to fetch a stack of 'landscape' paper.


----------



## _martin (Mar 26, 2021)

SirDice said:


> I'm going to add this to my list of things to do for new junior admins.


We used to have a small test for our junior admins at uni at campus. We did `touch asd && chflags schg asd`, gave them root and told them to remove the asd file. It used to be kinda fun.


----------



## hruodr (Mar 26, 2021)

ralphbsz said:


> I consider it to be a nasty design flaw.


I would not condemn it so harshly; it is the price of a simplicity that I like, but awareness of the risk
is necessary. As said, it is not only beginners who can do stupid things.


----------



## hruodr (Mar 26, 2021)

SirDice said:


> Sigh, seriously. You pointed to a description from an dictionary from 1918. Languages evolve. I'm pretty sure that dictionary from 1918 has words in it that are not in use any more. And it probably doesn't have a whole bunch of new words that are used nowadays. There are a lot of words in _any_ language that changed meaning in the past 100 years.


It is one of the best Latin-German dictionaries. Classical Latin, not even Middle Latin. Should classical
Latin have changed in the past 100 years?! Yes, the word "trivial" perhaps also has a new pejorative
meaning. It is clearly a metaphor; as soon as you speak of a "way" when solving problems, you
have a metaphor. Perhaps it is not so clear exactly how it arose, whether it is a truth from the street (trivium)
as @oli wants to see it or a more comprehensive metaphor, but the meaning is clear and coincides
with what Georges says about the old word.

I wonder whether @oli and all those computer scientists who take words literally are able to see the
relation between "ready" and "to ride" (bereit und reiten, fertig und fahren), or have never noticed it.


----------



## hruodr (Mar 26, 2021)

ralphbsz said:


> But then, Unix has been this way for 50 years, it can't be changed now, it has such a high market share that no replacement will ever be created, and we'll live with it.


Yes. This is a strange thing. Unix is good enough that there is no room for something better.


----------



## SirDice (Mar 26, 2021)

hruodr said:


> Classical Latin, not even Middle Latin. Should classical
> Latin have changed in the past 100 years?!


Classical Latin is a dead language. While lots of European Romance languages (French, Spanish, Italian, etc) are certainly based on that (and by extension other languages that picked up Latin words) their exact meaning or representation evolves over time. I'm pretty sure you wouldn't be able to understand _Althochdeutsch_ (Old High German) just as I can't understand _Oudnederlands_ (Old Dutch). Heck, even in modern times I sometimes barely understand what someone from Limburg or Brabant says, and I certainly don't know any Frisian (which is a whole different language of its own).


----------



## hruodr (Mar 26, 2021)

SirDice said:


> I'm pretty sure you wouldn't be able to understand _Althochdeutsch_


Yes. I can read that. With more effort, Old Saxon. I do not know about Old Low Franconian.
But definitely not Old English, Old Norse, or Frisian.


----------



## SirDice (Mar 26, 2021)

hruodr said:


> Yes. I can read that. With more effort, Old Saxon. I do not know about Old Low Franconian.


Ok, well I assumed you couldn't because the majority of people don't regularly read old texts. But if you do understand it you should be able to understand it's quite different from modern German because the German language evolved over time. English is an even weirder language because it's basically three or four different languages rolled into one which then got influenced by a lot of other European languages (including but not limited to the Romance and Latin languages). And in modern times even American English and British English are slightly different, not only in spelling (color vs. colour for example) but also in words and meaning (trainers vs. sneakers).


----------



## hruodr (Mar 26, 2021)

SirDice said:


> But if you do understand it you should be able to understand it's quite different from modern German because the German language evolved over time.


I have the impression that the oldest large surviving text, the Old High German Tatian, is the easiest to read. Of course
it has a German column and a Latin column, and it is a known text (a gospel harmony), but its dialect, ostfränkisch,
also helps. And Notker, alemannisch, is much easier to understand than modern Swiss German. But yes, one needs
a dictionary from time to time, and one must know a little of the grammar.


----------



## mefizto (Mar 27, 2021)

Hi SirDice, scottro,

could you please clarify what the problem is with a filename/folder with a space at the end?  I would like to clean files inherited from a Windows machine, with spaces both at the end and within the filename.  To test deletion/renaming, I created (as a user) a file whose name ends with a space, and had no problem deleting it or renaming it (`mv`).

Kindest regards,

M


----------



## Deleted member 30996 (Mar 27, 2021)

hruodr said:


> Your strong opinions are always based on right understanding, because you always understand things right
> and your understanding will never appear later to be wrong.


You can't be offended when you're wrong and confronted with the facts or it falls on you.


----------



## hruodr (Mar 27, 2021)

Trihexagonal said:


> You can't be offended when you're wrong and confronted with the facts or it falls on you.


I do not understand what you mean. Perhaps you did not understand the context?

If the latter is true, what would you think if someone reproached you for having voiced the above strong opinion
while being wrong about your understanding of the context?

That was exactly the context. Omniscient people were reproaching me for not being omniscient as they are.

The facts of the matter became clear almost at the beginning of this thread. Only ralphbsz brought more
clarity later. I was not and am not offended, only amused by the computer-scientific bigotry.

Well, perhaps you understand now that you did not understand the context. But as you said your "strong
opinion", you did not know it. You learn, hence you are also not omniscient.


----------



## Deleted member 30996 (Mar 27, 2021)

hruodr said:


> Well, perhaps you understand now that you did not understand the context. But as you said your "strong
> opinion", you did not know it. You learn, hence you are also not omniscient.


Well perhaps you didn't understand what I meant. 

I meant that when you're wrong and bitch about being confronted with the facts it's you that looks the part.


----------



## hruodr (Mar 27, 2021)

Trihexagonal said:


> I meant that when you're wrong and bitch about being confronted with the facts it's you that looks the part.


I still do not understand. I have absolutely no problem with being confronted with the facts. It is absolutely no
offense. The essence of a dialog is contradiction: thesis-antithesis-synthesis.

Or were you only enunciating a general principle, just so, out of the blue?


----------



## ralphbsz (Mar 28, 2021)

mefizto said:


> could you please clarify what is the problem with a filename/folder with a space at the end?


I'm not the people you asked, but here are the two problems I experience.

First, confusion. Imagine I create two files, one named "a", the other named "a_space_". I do ls, I see two files, both look like their names are just a. That can't be, there can't be two files with the same name! Or I create three files, named "a", "b", and "_space_". I do ls, all I see is two files.

The other problem is in scripts: when parsing script lines (which are after all just clear text, where strings are mixed with key words and values), file names with spaces get parsed as two separate words. Try this (in an empty directory), and I'm using sh/bash syntax for loops (not csh/tcsh syntax), and ">" is the shell prompt:

```
> touch a b "c d"
# You have created three files: "a", "b", and "c d"

> stat -r *
16777221 91145641 0100644 1 568476 89939 0 0 1616905731 1616905731 1616905731 1616905731 4096 0 0 a
16777221 91145642 0100644 1 568476 89939 0 0 1616905731 1616905731 1616905731 1616905731 4096 0 0 b
16777221 91145643 0100644 1 568476 89939 0 0 1616905731 1616905731 1616905731 1616905731 4096 0 0 c d
# This works, because the shell expands * to three strings, and passes the three strings to the stat command.

> for f in *
> do
>   stat $f
> done
16777221 91145641 0100644 1 568476 89939 0 0 1616905731 1616905731 1616905731 1616905731 4096 0 0 a
16777221 91145642 0100644 1 568476 89939 0 0 1616905731 1616905731 1616905731 1616905731 4096 0 0 b
stat: c: stat: No such file or directory
stat: d: stat: No such file or directory
# This doesn't work, because the "*" in shell was expanded into three strings, but when the string "c d" was passed
# to stat, the shell parsed it as "stat c d", which means: stat the files c and d.

> for f in *
> do
>   stat "$f"
> done
16777221 91145641 0100644 1 568476 89939 0 0 1616905731 1616905731 1616905731 1616905731 4096 0 0 a
16777221 91145642 0100644 1 568476 89939 0 0 1616905731 1616905731 1616905731 1616905731 4096 0 0 b
16777221 91145643 0100644 1 568476 89939 0 0 1616905731 1616905731 1616905731 1616905731 4096 0 0 c d
# Great, now it works.
```

So you always have to remember to quote all strings that have been passed around, to preserve the spaces. But careful: every time you use a string that is quoted, the quotes get stripped off. So you always add quotes back in. This gets particularly tedious when you pass strings that are file names to other shell invocations (for example running commands via ssh on other nodes). It's terribly time-consuming, hard to debug, and error-prone.
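
To make the splitting visible without touching any files, here is a minimal sketch. `count_args` is a made-up helper that only reports how many arguments it received, and the default IFS is assumed. The last line shows one way around re-quoting for a nested shell: hand the name over as a positional parameter instead of pasting it into the command string.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

f='c d'

unquoted=$(count_args $f)      # expansion is split at the space
quoted=$(count_args "$f")      # expansion stays a single word

# Nested shell: the name travels as $1 and is never re-parsed as code.
nested=$(sh -c 'echo "$#"' sh "$f")

echo "unquoted: $unquoted, quoted: $quoted, nested: $nested"
```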

For that reason, I actually think that file names should not be allowed to contain spaces (or other invisible or undisplayable characters), nor special characters like quotes of all types, $-signs, redirect characters like ">". Personally, I would not even allow two files to exist in the same directory whose only difference is the case (so file names should be case blind but case preserving). So if you create a file called "Elephant", it should always be returned as "Elephant", but it should be illegal to create files whose names are "elephant" or "elePhant". All that would prevent so many little annoying problems. But I know that Unix diehards disagree with my opinion.


----------



## mefizto (Mar 28, 2021)

Hi ralphbsz,

thank you very much for the examples and detailed explanation.

I have been reading up on that because I have received (useful) data, but with directory and file names riddled with non-UTF-8 characters, spaces and other characters, a mixture of capital and lower-case letters, etc.  Because there are literally thousands or perhaps even hundreds of thousands of directories/files, it would be impossible to sort it out by hand.

So I started writing a script whose main goal is to correct the above, but it is not as simple as some of the script snippets make it seem.  Cf. https://dwheeler.com/essays/filenames-in-shell.html.

So, I am really unsure how to approach it.

Kindest regards,

M


----------



## scottro (Mar 28, 2021)

Hi mefizto,
I think that ralphbsz explained the problem well. My specific problem was that designers were creating files, and we would have two files, one called 123.jpg and one called 123.jpg<space>.  A week or so later, they needed to pull the file (they were using software designed for the fashion industry, which had records), and if their boss said, I need # 123.jpg, they wouldn't know which style to pull and show their boss. Sometimes (often, if I remember correctly), they would be duplicates.  This was the only time I ran into this, and it was over 10 years ago, so I don't clearly remember the details. But the trouble is, for me, that if trying to track down a file, I wouldn't be able to distinguish between one with the space at the end and the one without the space.
I also vaguely remember having a script to take created images, resize them, and put them somewhere, and those images with the space at the end would get missed.


----------



## mefizto (Mar 28, 2021)

Hi scottro,

thank you for the reply.  Yes, ralphbsz did a great job; I originally addressed you and SirDice because you had dealt with the problem, and I know that at least some of the directories have a space at the end, so I was hoping that you might remember how you detected it and removed it, as it does not seem to be as easy as my naive brain thought.
Kindest regards,
M


----------



## scottro (Mar 28, 2021)

I think I put it in my post, I did something like cd into the directory and did `ls * |cat -e`. This puts a $ at the end of each file name so I would get things like 

```
123.jpg$
123.jpg $
```
Then I was able to remove the offending files.
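
For a whole directory tree, the same cleanup could probably be scripted without going through ls. This is only a rough sketch, not what I ran back then: `fix_trailing_spaces` is a made-up name, it only deals with trailing spaces, it assumes the starting directory itself does not need renaming, and it refuses to overwrite an existing name. Try it on a copy of the data first.

```shell
# Hypothetical helper: strip trailing spaces from every name below "$1".
fix_trailing_spaces() {
    # -depth visits children before their parent, so a directory is only
    # renamed after everything inside it has been handled.
    find "$1" -depth -name '* ' -exec sh -c '
        for path do
            dir=${path%/*}
            name=${path##*/}
            # Remove trailing spaces one at a time.
            while [ "${name% }" != "$name" ]; do
                name=${name% }
            done
            target=$dir/$name
            # Skip names that would become empty; never overwrite a file.
            if [ -n "$name" ] && [ ! -e "$target" ]; then
                mv -- "$path" "$target"
            else
                printf "skipped: %s\n" "$path" >&2
            fi
        done
    ' sh {} +
}

# Usage (the path is a placeholder):
#   fix_trailing_spaces /path/to/copy/of/data
```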


----------



## mefizto (Mar 28, 2021)

Hi scottro,

you indeed did.  However, a few articles that I read, _e.g._, https://mywiki.wooledge.org/ParsingLs, advise against parsing `ls` output.  Since this topic is new to me, and the entire data set I have is already a mess, I do not want to make it worse.

So, more reading on my part.

Kindest regards,

M


----------



## Deleted member 30996 (Mar 29, 2021)

One of the first things I learned was not to use spaces in a file name.

I use underscores or hyphens, letters and numbers all saved with UTF-8 encoding.


----------



## hruodr (Mar 29, 2021)

And in addition I avoid non-ASCII. I never consciously put a hyphen at the beginning. If filenames
were restricted, I would not miss much.


----------



## decuser (Mar 30, 2021)

SirDice said:


> That's what the `--` is for. `rm -- "${file}"`


I had to read this thread CAREFULLY before I understood it. Thanks to im's answer, and this clarification, I have a new tool for my arsenal --, duh, RTFM .


----------



## Deleted member 30996 (Mar 30, 2021)

hruodr said:


> And in addition I avoid non-ASCII. I never consciously put a hyphen at the beginning. If filenames
> were restricted, I would not miss much.


All the saved styles I copy from USB stick to /local/share/fluxbox/styles/ are saved as US-ASCII.
I never use a space or hyphen to begin a file name though.


----------



## memreflect (Mar 30, 2021)

To answer the original question, expressions that begin with wildcards should start with ./, as in ./*tmp-3 or ./???.  This even works in shells like zsh that support non-standard wildcards like **.
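
A quick sketch of that, using a scratch directory from `mktemp -d` and the file name from the original post:

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
touch ./-tmp-3

# The shell expands ./*tmp-3 to ./-tmp-3, which no longer looks like an
# option to ls or rm.
listed=$(ls ./*tmp-3)
echo "$listed"

rm ./*tmp-3

# Check the result without relying on ls.
if [ -e ./-tmp-3 ]; then
    status="still there"
else
    status="removed"
fi
echo "$status"
```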

As for listing files containing non-graphical characters (e.g. ASCII blank/space and control characters like horizontal-tab and line-feed), `env ls -q ./*[![:graph:]]*` should work (env(1) is necessary to avoid shell aliases/functions).  If you need to handle file names in a machine-parsable way, just use an expression containing wildcards like ./*.

Of course, if you disabled pathname expansion using `set -f` in your script, wildcards are unavailable anyway, and you'll need to find an alternative solution that works for you (e.g. temporarily enabling pathname expansion, saving the results of a wildcard expression to a bash array, and using that array as necessary).
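
If bash arrays are not available, the positional parameters can play the same role in plain sh. A sketch under that assumption; note that `set --` overwrites the script's own arguments, so save them first if they are still needed.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
touch a b 'c d'

set -f                  # pathname expansion off
set -- ./*
disabled=$1             # still the literal string ./*

set +f                  # enable briefly
set -- ./*              # positional parameters now hold the matches
set -f                  # and off again
count=$#

# "$@" keeps each name as one word, spaces included.
for f in "$@"; do
    printf '[%s]\n' "$f"
done
```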

Aside from the advice about not parsing ls(1) output, there are many other bits of advice about shell scripting that basically say "don't do this" or "it's best to do <X> this way when possible."  For example, `cd bin` has two different behaviors, depending on whether CDPATH is set or unset:

CDPATH unset
./bin exists - `cd ./bin`
else - "No such file or directory" error

CDPATH=$HOME:. (note: . must be explicitly included for the current working directory to be searched)
$HOME/bin exists - `cd $HOME/bin`
./bin exists - `cd ./bin`
else - "No such file or directory" error

Executing `cd ./bin` limits the potential for accessing the incorrect directory, though you may want to use `cd ./bin >/dev/null` to avoid unnecessary output when CDPATH is set.  devel/hs-ShellCheck is useful for detecting many possible issues, but I'm pretty certain this isn't one of the things it catches; I don't think there are a lot of scripts that get tripped up by CDPATH because it doesn't seem to be used much.  Depending on what your script does, you probably can also just `unset -v CDPATH`.  I guess that's yet another thing to include before you actually start writing a script.
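
Here is a sketch of the difference, with a stand-in "home" directory created under `mktemp -d` so nothing real is touched (the directory names are made up):

```shell
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/home/bin" "$demo/work/bin"

# The stand-in home directory is searched first, then the current directory.
CDPATH=$demo/home:.

cd "$demo/work" || exit 1
cd bin >/dev/null || exit 1      # resolved through CDPATH
via_cdpath=$(pwd)

cd "$demo/work" || exit 1
cd ./bin >/dev/null || exit 1    # always relative to the current directory
explicit=$(pwd)

echo "cd bin   -> $via_cdpath"
echo "cd ./bin -> $explicit"
```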

Welcome to shell—we don't have nasal demons, but you can still get burned


----------



## hruodr (Mar 31, 2021)

memreflect said:


> `env ls -q ./*[![:graph:]]*` should work


Does shell support real regular expressions?! The wildcard would need a ".".


----------



## SirDice (Mar 31, 2021)

hruodr said:


> Does shell support real regular expressions?!


It's not a regular expression, it's a character class. Regular expressions can use character classes too, that's true. But a character class doesn't mean it's a regular expression.


```
An asterisk (`*') matches any string of characters.  A question mark
     (`?') matches any single character.  A left bracket (`[') introduces a
     character class.  The end of the character class is indicated by a `]';
     if the `]' is missing then the `[' matches a `[' rather than introducing
     a character class.  A character class matches any of the characters
     between the square brackets.  A locale-dependent range of characters may
     be specified using a minus sign.  A named class of characters (see
     wctype(3)) may be specified by surrounding the name with `[:' and `:]'.
     For example, `[[:alpha:]]' is a shell pattern that matches a single
     letter.  The character class may be complemented by making an exclamation
     point (`!') the first character of the character class.  A caret (`^')
     has the same effect but is non-standard.
```
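
The same patterns work in `case`, so a script can test a name without running any external command. A small sketch; `has_nongraph` is a made-up helper, and what counts as graphical depends on the locale.

```shell
# Hypothetical helper: succeeds if the name contains a character outside
# the [:graph:] class (a space, a tab, a newline, and so on).
has_nongraph() {
    case $1 in
        *[![:graph:]]*) return 0 ;;
        *) return 1 ;;
    esac
}

for name in 'plain.txt' 'with space.txt' 'trailing '; do
    if has_nongraph "$name"; then
        printf 'non-graphical: [%s]\n' "$name"
    fi
done
```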


----------



## olli@ (Apr 6, 2021)

Personally I use all kinds of characters in file names, including spaces and UTF-8 characters. I really don’t see a reason to restrict myself in this regard. I mean, we’re not in the 80s anymore, and we’re not using MS-DOS anymore. FreeBSD has great support for Unicode, wide characters and UTF-8. For example, a friend of mine has an “ü” in his name, so when I have a file name that contains his name, well, then I need UTF-8 for that. I’m certainly not going to cripple his name by restricting myself to US-ASCII like in the previous century. UTF-8 has existed for 25 years, there is no excuse not to support it.

Regarding spaces in file names – I think this is just normal. Windows and Mac people have used spaces in file names for ages. Why shouldn’t we? Is UNIX too dumb to handle it properly? No, BSD and Linux (and others) have zero problems with that. There sometimes is 3rd-party software that trips over it, but that’s clearly a bug in that software that needs to be fixed. In most programming languages, a file name is just a string, and it doesn’t care what kind of characters it contains. The only problems are macro languages and batch languages like the Bourne shell, but when you take a course in shell programming, using proper quoting is one of the very first things that you’re taught. My own shell scripts are always space-safe (they _have_ to be, as I am not hesitant to use spaces in file names myself).

Having spaces at the _end_ of file names might be a different matter. Personally I haven’t done that so far, just because I didn’t have a reason to, but I don’t think it should be forbidden. Someone might come up with a use for it. It’s not the UNIX philosophy to forbid things. As Doug Gwyn phrased it: “UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever things.”

By the way, when handling file names with spaces in the shell, you don’t have to care about the quoting yourself most of the time. Modern shells (zsh, bash) automatically add quotes or escapes when necessary, when using the file name completion features (usually bound to the `Tab` key). For example (zsh):

```
$ touch foo "foo "
$ ls foo<Tab>
foo    foo\
```
Pressing the `<Tab>` key repeatedly cycles between the “`foo`” and “`foo\ `” completions (the backslash escapes the trailing space).


----------

