# Unix philosophy, new ideas and brainstorming



## hamad_al_marri (Aug 14, 2020)

Hi everyone,

As most of us know, the Unix philosophy is one of the fascinating ideas in computing: simple, well-made small programs work together to accomplish one large task. However, as far as I know, one problem with the Unix philosophy is that each program deals with plain text input/output, for example `cat a.file | grep something`. Maybe in the 70s there weren't good structured data formats such as XML, JSON, YAML and others; I don't know.

I was thinking: what if the OS (say FreeBSD) added one more standard file descriptor alongside STDIN, STDOUT, and STDERR? Let's call it STDSTRUCT, for structured data. The kernel would open this file descriptor (FD) by default for every new process, just like STDIN, STDOUT, and STDERR, so a process could read and write structured data through it. Let's assume we only deal with the JSON format. Process A can print the following to STDOUT:


```
1  Edward the Elder  United Kingdom  House of Wessex  899-925
2  Athelstan         United Kingdom  House of Wessex  925-940
3  Edmund            United Kingdom  House of Wessex  940-946
4  Edred             United Kingdom  House of Wessex  946-955
5  Edwy              United Kingdom  House of Wessex  955-959
```

That is fine, but process A could also write the same data to STDSTRUCT, this time in JSON format:


```
[
  {
    "ID": 1,
    "Name": "Edward the Elder",
    "Country": "United Kingdom",
    "House": "House of Wessex",
    "Reign": "899-925"
  },
  {
    "ID": 2,
    "Name": "Athelstan",
    "Country": "United Kingdom",
    "House": "House of Wessex",
    "Reign": "925-940"
  },
  {
    "ID": 3,
    "Name": "Edmund",
    "Country": "United Kingdom",
    "House": "House of Wessex",
    "Reign": "940-946"
  },
  {
    "ID": 4,
    "Name": "Edred",
    "Country": "United Kingdom",
    "House": "House of Wessex",
    "Reign": "946-955"
  },
  {
    "ID": 5,
    "Name": "Edwy",
    "Country": "United Kingdom",
    "House": "House of Wessex",
    "Reign": "955-959"
  }
]
```

This JSON is not printed to STDOUT; it is only written to the STDSTRUCT FD (let's say fd=3). Then, in a pipeline A | B, process B can handle both streams, with STDSTRUCT being optional. Process B doesn't have to deal with STDSTRUCT input; it can simply discard it. Or B can process JSON or any other structured data if it wants to. Wouldn't it be great to make this kind of change at the kernel level, so that it becomes easy for developers to adopt structured data processing? I believe this technique is possible with shm_open, but it is extra work for program developers. Regarding the format, maybe it needs a standard, such as the first 64 bytes (or whatever) written to STDSTRUCT indicating which format the rest of the stream will be in.

Please let me know your opinions and ideas about dealing with structured data the Unix way.
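To make the "first 64 bytes" idea a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the NUL-padded 64-byte header, the use of a MIME-style name such as `application/json`, and the helper names `write_structured` and `read_structured`. An in-memory buffer stands in for the extra descriptor.

```python
import io
import json

HEADER_SIZE = 64  # assumed fixed header length, from the "first 64 bytes" idea


def write_structured(stream, fmt, payload):
    """Write a fixed-size format header followed by the payload bytes."""
    name = fmt.encode("ascii")
    if len(name) > HEADER_SIZE:
        raise ValueError("format name does not fit in the header")
    stream.write(name.ljust(HEADER_SIZE, b"\0"))  # NUL-padded format name
    stream.write(payload)


def read_structured(stream):
    """Return (format name, payload bytes) from a stream written as above."""
    header = stream.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        raise ValueError("stream too short to contain a header")
    fmt = header.rstrip(b"\0").decode("ascii")
    return fmt, stream.read()


if __name__ == "__main__":
    kings = [{"ID": 1, "Name": "Edward the Elder", "Reign": "899-925"}]
    buf = io.BytesIO()  # stands in for the hypothetical extra descriptor
    write_structured(buf, "application/json", json.dumps(kings).encode("utf-8"))
    buf.seek(0)
    fmt, body = read_structured(buf)
    print(fmt, json.loads(body)[0]["Name"])
```

A reader that does not recognise the format name can simply stop after the header and ignore the rest of the stream.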

Thank you.


----------



## SirDice (Aug 14, 2020)

hamad_al_marri said:


> However, as far I know that one problem with unix philosophy is that each program deals with plain text input/output. For example cat a.file | grep something. Maybe in 70s there wasn't good structured data such as xml, json, yaml and others, I don't know.


XML, JSON, YAML and others are all _text_ based for a reason. Why would it need to be output to a different handle than STDOUT? Most applications have a simple switch to change the output format, if they support any other type of structured output.

This is a solution looking for a problem that isn't there.


----------



## George (Aug 14, 2020)

stdin, stdout, stderr are for text, yaah.
For structures, you can use /dev/ files via ioctls.
Or use other forms of interprocess communication (maybe a pipe? Or dbus?).


----------



## Mjölnir (Aug 14, 2020)

SirDice said:


> XML, JSON, YAML and others are all _text_ based for a reason. Why would it need to be output to a different handle than STDOUT? Most applications have a simple switch to change the output format, if they support any other type of structured output.


Because then the program reading from the pipe does not know about this switch and must guess the data format. E.g. the HFS filesystem records the data format of a file in its metadata. Doing something similar for the file descriptor mentioned by the OP would be possible, i.e. have three metadata fields `fd.formats_available[]`, `fd.format_default`, `fd.format`. The clipboard in KDE works this way: you can copy text in a word processing app, and when you paste it into a console app it's plain text, whereas an app that supports formatted text can take that from the clipboard.
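A minimal sketch of how such a negotiation could look, written in Python rather than as kernel code. The class name `StructuredFd` and the `negotiate` method are hypothetical; only the three field names come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredFd:
    """Hypothetical per-descriptor metadata, modelled on the three fields above."""
    formats_available: list = field(default_factory=list)
    format_default: str = "text/plain"
    format: str = ""  # the format actually selected for this connection

    def negotiate(self, reader_accepts):
        """Pick the first format the reader accepts, else fall back to the default."""
        for candidate in reader_accepts:
            if candidate in self.formats_available:
                self.format = candidate
                return self.format
        self.format = self.format_default
        return self.format


if __name__ == "__main__":
    fd = StructuredFd(formats_available=["text/plain", "application/json"])
    print(fd.negotiate(["application/json", "text/plain"]))  # application/json
    print(fd.negotiate(["application/cbor"]))                # text/plain
```

This mirrors the clipboard case: the writer offers several formats, and a reader that accepts none of the richer ones ends up with the default.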


> This is a solution looking for a problem that isn't there.


But it could be an enhancement.


----------



## ljboiler (Aug 14, 2020)

mjollnir said:


> Because then the program reading from the pipe does not know about this switch and must guess the data format.


Why would that 'reading' program need to know about the switch? That 'reading' program should not be guessing the data format; it should be written and documented to understand, by default, format X for input; use switch Y if the input is in format Y, switch Z if the input is in format Z...

Then you chain all the small, simple programs together with the appropriate switches to get the job done.
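As a sketch of what such a reading program might look like, assuming a hypothetical `--format` switch with tab-separated input as the documented default and JSON as the alternative (the switch name, the two formats and the `Name` field are only for illustration):

```python
import argparse
import csv
import io
import json
import sys


def read_records(stream, fmt):
    """Parse records from a text stream in the format named on the command line."""
    if fmt == "json":
        return json.load(stream)
    if fmt == "tsv":
        return list(csv.DictReader(stream, delimiter="\t"))
    raise ValueError(f"unsupported format: {fmt}")


def main(argv=None, stream=None):
    parser = argparse.ArgumentParser(description="print the Name field of each record")
    parser.add_argument("--format", choices=["tsv", "json"], default="tsv",
                        help="input format (default: tsv)")
    args = parser.parse_args(argv)
    source = stream if stream is not None else sys.stdin
    for record in read_records(source, args.format):
        print(record["Name"])


if __name__ == "__main__":
    # Demo with in-memory input; calling main() with no arguments reads stdin instead.
    main(["--format", "json"], io.StringIO('[{"ID": 2, "Name": "Athelstan"}]'))
    main([], io.StringIO("ID\tName\n3\tEdmund\n"))
```

The upstream program never needs to know what the reader expects; whoever writes the pipeline picks matching switches on both sides.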


----------



## SirDice (Aug 14, 2020)

hamad_al_marri said:


> Then if piped A | B, process B can handle both (optionally STDSTRUCT).


There's only one STDIN. How is it supposed to switch between two outputs from the previous command? A pipe connects one output to one input, more specifically, it connects the STDOUT of the first to STDIN of the second. While you can certainly create a pipe(2) between two arbitrary filehandles in C or other languages, the shell specifically connects STDOUT to STDIN.
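For illustration, here is roughly what wiring up an extra pipe yourself looks like, sketched in Python on a POSIX system. The child program, the choice to pass the descriptor number as an argument, and the function name `run_with_extra_fd` are all assumptions made for the example; nothing here is something the shell does for you.

```python
import json
import os
import subprocess
import sys

# Child program: plain text on stdout, JSON on an extra descriptor whose
# number is passed as the first argument (an assumption for this sketch).
CHILD = """
import json, os, sys
print("1  Edward the Elder  899-925")
with os.fdopen(int(sys.argv[1]), "w") as extra:
    json.dump([{"ID": 1, "Name": "Edward the Elder"}], extra)
"""


def run_with_extra_fd():
    """Run the child with a second pipe besides the usual stdout pipe."""
    read_end, write_end = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(write_end)],
        stdout=subprocess.PIPE,
        pass_fds=(write_end,),  # keep the write end open in the child
        text=True,
    )
    os.close(write_end)  # parent keeps only the read end, so it sees EOF
    # Reading one stream after the other is only safe for small outputs;
    # larger ones would need select/poll or threads to avoid blocking.
    text = proc.stdout.read()
    with os.fdopen(read_end) as extra:
        structured = json.load(extra)
    proc.stdout.close()
    proc.wait()
    return text, structured


if __name__ == "__main__":
    text, structured = run_with_extra_fd()
    print(text, end="")
    print(structured)
```

Note that both sides have to agree on the extra descriptor out of band, which is exactly the part a plain `A | B` in the shell does not cover.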


----------



## TracyTiger (Aug 14, 2020)

hamad_al_marri said:


> However, as far I know that one problem with unix philosophy is that each program deals with *plain* text input/output.
> 
> Please let me know about your opinions and ideas regarding dealing with structured data in unix way.



SirDice and others have given a good response to your suggestion and request for ideas.

I believe the primary issue here is that you've phrased the "philosophy" in a way that is misleading.  UNIX utilities have been historically built to deal with text, not restricted to "plain text" as you put it.

Most (all?) programs expect input to be "formatted/structured/arranged/..." in some manner so that they can do their job.  For example a program may look for a newline to help parse the input.

Another part of the UNIX "philosophy" or just program development in general, is the idea of creating general purpose tools.  Utilities/programs handle text.  You can "structure" the text any way you want (JSON, XML) and write utilities to handle those structures.  Those here-today gone-tomorrow structures don't need to be cluttering up the kernel.


----------



## TracyTiger (Aug 14, 2020)

mjollnir said:


> But it could be an enhancement.



There is room for improvement in the UNIX magic numbers "system" (file(1)), but it would probably be very difficult or impossible to overhaul in a practical way.


----------



## msplsh (Aug 14, 2020)

If you were able to "tag" the pipe or device via whatever method with a MIME type, that's going to get real tricky real fast.

* You'll have to set it when you open it
* People will want to change it once it's set
* Race conditions on checking the type if you can change it
* Building parsers for those types into the kernel (this will probably be the worst)

Maybe if you could essentially "subclass" things into stdout.json / stdout.ASN1 / stdout.CBOR and have a userspace parser kick into action, but then all those parsers are going to need to match the same standard API, and good luck getting anybody to agree on that. Functionally, that's not going to be much different than `csv_program | csv2bdb | bdb_input_program`
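A converter stage like the `csv2bdb` step is just an ordinary filter. As a rough sketch, here is a hypothetical CSV-to-JSON converter in Python (the function name `csv_to_json` and the choice of JSON as the target are assumptions for the example):

```python
import csv
import io
import json


def csv_to_json(src, dst):
    """Read CSV with a header row from src and write a JSON array to dst."""
    # CSV carries no type information, so every value comes out as a string.
    json.dump(list(csv.DictReader(src)), dst, indent=2)
    dst.write("\n")


if __name__ == "__main__":
    sample = io.StringIO("ID,Name,Reign\n1,Edward the Elder,899-925\n")
    out = io.StringIO()
    csv_to_json(sample, out)
    print(out.getvalue(), end="")
```

Used as a real filter it would be called with `sys.stdin` and `sys.stdout`. The loss of types on the way through is a small example of why one format rarely maps cleanly onto another.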

Your "structured" data is basically a rowset. What about hierarchical or tagged data?

Different formats exist because people have different needs.


----------



## hruodr (Aug 14, 2020)

Elazar said:


> stdin, stdout, stderr are for text, yaah.



And also for non-text.


----------

