Binary Representation of Source Code

The binary format represents the source code itself. It is not any kind of executable code (assembly/bytecode), and it is not an IR.
Take any valid text source file, convert it to the binary representation and back again, and you end up with a byte-for-byte identical file.
Individual tokens are not stored (i.e. no LEFT_BRACE), but things like whitespace and comments must be kept.
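A minimal sketch of the round-trip property follows. It assumes a hypothetical toy language where each line is either `name = number` (with an optional `#` comment) or pure trivia (a blank or comment-only line). The `=` token is never stored because it is implied by the node kind, while whitespace and comments are kept as trivia fields so the text can be regenerated byte for byte. The node kinds and the length-prefixed encoding are invented for illustration.

```python
import re
import struct

# Hypothetical node kinds for a toy language of "name = number" lines.
ASSIGN, TRIVIA = 1, 2

# Groups: leading ws, name, ws before '=', ws after '=', digits, trailing ws + optional comment.
LINE = re.compile(r"([ \t]*)([A-Za-z_]\w*)([ \t]*)=([ \t]*)(\d+)([ \t]*(?:#.*)?)")


def parse(text):
    """Split source into nodes; '=' is implied by ASSIGN and never stored."""
    nodes = []
    for line in text.split("\n"):
        m = LINE.fullmatch(line)
        if m:
            nodes.append((ASSIGN, m.groups()))
        elif line.strip() == "" or line.lstrip().startswith("#"):
            nodes.append((TRIVIA, (line,)))  # blank or comment-only line
        else:
            raise SyntaxError(f"invalid line: {line!r}")
    return nodes


def encode(nodes):
    """Binary form: kind byte, field-count byte, then length-prefixed UTF-8 fields."""
    out = bytearray()
    for kind, fields in nodes:
        out += struct.pack("<BB", kind, len(fields))
        for field in fields:
            data = field.encode("utf-8")
            out += struct.pack("<I", len(data)) + data
    return bytes(out)


def decode(blob):
    """Inverse of encode()."""
    nodes, pos = [], 0
    while pos < len(blob):
        kind, count = struct.unpack_from("<BB", blob, pos)
        pos += 2
        fields = []
        for _ in range(count):
            (size,) = struct.unpack_from("<I", blob, pos)
            pos += 4
            fields.append(blob[pos:pos + size].decode("utf-8"))
            pos += size
        nodes.append((kind, tuple(fields)))
    return nodes


def render(nodes):
    """Regenerate the text; the '=' token comes from the node kind."""
    lines = []
    for kind, fields in nodes:
        if kind == ASSIGN:
            lead, name, ws1, ws2, value, tail = fields
            lines.append(f"{lead}{name}{ws1}={ws2}{value}{tail}")
        else:
            lines.append(fields[0])
    return "\n".join(lines)
```

For example, `render(decode(encode(parse(src))))` returns `src` unchanged, including odd spacing and trailing comments. The digits are kept as text rather than an integer so that values like `007` survive the round trip.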
You edit source code, not text.
But people can still use standard text editors.
It also allows for dedicated, non-text source code editors.
- Quick and efficient editing of the binary format (e.g. the quickgo/quickrust concept programs).
- Graphical representation of source code (not the same as a graphical programming language like Blockly, just an easier way to read code).
- Frames around things like data structures and function definitions.
- UML-like representations (not advocating for UML specifically, but it's a possibility).
- Easy, quick navigation of source code. Features like go-to-definition become much easier to implement.
Tooling becomes much easier. Libraries for manipulating the code can be written once and shared by all tools.
Downside: any time you have invalid syntax, everything breaks. But that happens anyway with normal code...
A virtual filesystem could automatically convert the stored binary to text and vice versa.
The text you edit could use basically any syntax you like, although a standardised syntax would obviously be best.
This could also make syntax changes possible.
It could allow special keywords for editing with a basic text editor (e.g. 'def myfunctionname' could be hooked to insert an actual function definition nearby on file save, with the 'def' keyword removed).
This would be easier with a well-defined syntax for the source code (e.g. one that defines tabs vs. spaces and the number of newlines between functions).
But it might be better to just store tabs/spaces and newlines in the binary format.
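The 'def' keyword hook could look roughly like the sketch below, run on file save. The language's real syntax isn't defined here, so the generated stub (`func name() {` ... `}`) is purely an assumption, as is the rule that the hook only fires on a line containing nothing but `def <name>`.

```python
import re

# Hypothetical editing keyword: a line consisting only of "def <name>".
DEF_LINE = re.compile(r"^([ \t]*)def[ \t]+([A-Za-z_]\w*)[ \t]*$", re.MULTILINE)


def expand_keywords(text):
    """On save, replace each 'def name' line with a function stub.

    The stub syntax is an assumption, not the language's actual syntax.
    The original indentation is reused for the stub.
    """
    def stub(m):
        indent, name = m.group(1), m.group(2)
        return f"{indent}func {name}() {{\n{indent}}}"

    return DEF_LINE.sub(stub, text)
```

Requiring whitespace after `def` keeps ordinary identifiers such as `define` from triggering the hook.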

Schemas not 'data structures'

Struct definitions are normally mixed in with the procedural source code.
Structures are a binding of **data types** to **variable names**.
Separate the **representation** from the **implementation**.
- Standard native in-memory storage, with the same performance as today.
  - Allow for a separate memory layout. Some architectures (for example, the Cell processor) require memory padding.
  - In-memory ordering.
  - Endianness?
- Serialization.
- Database-backed storage.
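A minimal sketch of one schema with several implementations. The `Schema` class and helper names are hypothetical: the schema only binds field names to types, and each backend decides the layout. The native backend uses Python's `struct` module, where the byte-order prefix picks endianness (`<` little-endian, `>` big-endian, `@` native order with native alignment padding) and the field order is a backend parameter; a JSON serializer stands in for the serialization case.

```python
import json
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """Hypothetical schema: binds field names to data types, nothing else."""
    fields: tuple  # (name, struct type code) pairs, e.g. ("x", "i")


def _layout(schema, byteorder, order):
    # The backend, not the schema, picks endianness and in-memory field order.
    names = list(order) if order else [name for name, _ in schema.fields]
    types = dict(schema.fields)
    return names, byteorder + "".join(types[n] for n in names)


def pack_native(schema, record, byteorder="<", order=None):
    names, fmt = _layout(schema, byteorder, order)
    return struct.pack(fmt, *(record[n] for n in names))


def unpack_native(schema, blob, byteorder="<", order=None):
    names, fmt = _layout(schema, byteorder, order)
    return dict(zip(names, struct.unpack(fmt, blob)))


def to_json(schema, record):
    """A second implementation of the same schema: text serialization."""
    return json.dumps({name: record[name] for name, _ in schema.fields})


POINT = Schema(fields=(("x", "i"), ("y", "i"), ("weight", "d")))
point = {"x": 1, "y": -2, "weight": 0.5}
little = pack_native(POINT, point)              # "<": little-endian, no padding
big = pack_native(POINT, point, byteorder=">")  # same schema, big-endian layout
```

The record itself never changes; only the chosen backend parameters do, which is the representation/implementation split described above.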
Older OOP languages like C++ and Java also bind **methods/member functions** to **data structures**.
Newer languages like Rust and Go move away from classical OOP and primarily use interfaces (traits, in Rust).
Conceptually, design these as **APIs**, not bound functions.
- Allow for standard native function calls, or RPC calls over IPC or the network.
- Some kind of distributed backend (Raft, blockchain).
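A sketch of the "APIs, not bound functions" idea using Python's `typing.Protocol`: callers depend only on the API, and the same calls can be served natively or through a proxy. `RpcCounter`, `serve` and the JSON message format are hypothetical; the transport is a plain function standing in for an IPC or network hop.

```python
import json
from typing import Callable, Protocol


class Counter(Protocol):
    """The API: what callers depend on, with no implementation bound to it."""
    def add(self, amount: int) -> int: ...


class LocalCounter:
    """Native implementation: an ordinary in-process method call."""
    def __init__(self):
        self.total = 0

    def add(self, amount: int) -> int:
        self.total += amount
        return self.total


class RpcCounter:
    """Hypothetical RPC proxy: encodes each call as a message for a transport."""
    def __init__(self, transport: Callable[[str], str]):
        self.transport = transport

    def add(self, amount: int) -> int:
        request = json.dumps({"method": "add", "args": [amount]})
        return json.loads(self.transport(request))["result"]


def serve(target: Counter) -> Callable[[str], str]:
    """Server side: decode a message and dispatch it to any Counter."""
    def handle(request: str) -> str:
        msg = json.loads(request)
        result = getattr(target, msg["method"])(*msg["args"])
        return json.dumps({"result": result})
    return handle


def use(counter: Counter) -> int:
    # Caller code is identical for the native and RPC implementations.
    counter.add(2)
    return counter.add(3)
```

Both `use(LocalCounter())` and `use(RpcCounter(serve(LocalCounter())))` run the same caller code; swapping `handle` for a real socket or a Raft-replicated backend would not change `use`.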
The schema could also carry security definitions, e.g. who can edit this variable. This separates the security implementation from the data structure.
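A minimal sketch, assuming each schema field lists the roles allowed to edit it (`writable_by`, the role names and `set_field` are all hypothetical). The record stays a plain data structure; enforcement lives in a separate function driven by the schema.

```python
# Hypothetical schema: field -> roles allowed to write it.
USER_SCHEMA = {
    "display_name": {"writable_by": {"owner", "admin"}},
    "is_admin": {"writable_by": {"admin"}},
}


class PermissionDenied(Exception):
    pass


def set_field(schema, record, field, value, role):
    """Security check driven by the schema, kept out of the data itself."""
    if role not in schema[field]["writable_by"]:
        raise PermissionDenied(f"{role!r} may not edit {field!r}")
    record[field] = value
```

The same check could sit in front of a native, RPC or database-backed implementation without any of them knowing about roles.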

API Versioning

API version as a hash of the binary representation of the API?
- Needs to handle non-breaking changes, like changing the order of functions.
Function definitions and the like could be tracked, and breaking changes to the interface flagged automatically.
- Allow adding fields with defaults without an API change.
- Allow optional named arguments.
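A sketch of an order-insensitive API hash, assuming an API is described as a mapping from function names to parameter lists (a made-up stand-in for the binary representation). Sorting makes reordering functions a non-breaking change, and parameters with defaults are left out of the hash, so adding optional named arguments keeps the version stable. One consequence of this particular choice: removing an optional argument would also go unnoticed, so a real scheme would need more than this.

```python
import hashlib
import json


def api_version(api):
    """Hash the required part of each signature, independent of declaration order.

    `api` maps function name -> list of (param_name, type_name, has_default).
    """
    canonical = sorted(
        (name, [(p, t) for p, t, has_default in params if not has_default])
        for name, params in api.items()
    )
    return hashlib.sha256(json.dumps(canonical).encode("utf-8")).hexdigest()[:16]
```

Reordering `load` and `save`, or giving `load` an extra defaulted `mode` argument, yields the same version; changing the type of a required parameter yields a new one.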
Implementation changes are harder (e.g. we changed the format of the string this function returns, but the signature is the same).
- Functions that have no source code changes can be safely ignored.
- Changing the implementation doesn't necessarily mean the result is different (e.g. an optimisation).
- Changing the implementation of a function could accidentally change the result (a bug). Being told when that happens is handy.
- Allow specifying functions for specific API versions, so that if you do change the implementation you can keep backwards compatibility.
- How do consumers choose a version (the specific version they used, or 'latest')? Compiled binaries could keep a list of the API versions they used.
- Unit tests could provide a hint (e.g. if this unit test changed...), but adding an extra test or changing the test order doesn't mean the implementation's result is different.
  - Automatic 'quickcheck' when possible? The compiler could implement a unit test with no effort from the programmer and log the results. But you won't know when it's possible (halting problem, use of globals/statics, side effects, etc.). Maybe just best effort (e.g. if it didn't finish in 1 second and/or used more than 512 KB of RAM, kill the test). Don't store the results of tests that return a lot of data. Do store the metadata about killed tests and the number of items returned (or, even better, a hash of the items returned, though pointers would be a pain...).
  - 'quickbench' to benchmark performance? There are obvious problems with differing hardware, but it could still be useful. Probably not for API versioning, though.
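The automatic 'quickcheck' and per-version implementations can be sketched together. Everything here is hypothetical: a registry keyed by (function name, API version), a `resolve` that lets a consumer pin a version or ask for 'latest', and a best-effort `fingerprint` that runs a function on seeded random inputs and keeps only a hash plus metadata. The time budget is only checked between calls; a real tool would kill a runaway call and cap memory, which this sketch does not do.

```python
import hashlib
import random
import time

_registry = {}  # hypothetical: (function name, api version) -> implementation


def implements(name, version):
    """Register an implementation of `name` for a specific API version."""
    def register(func):
        _registry[(name, version)] = func
        return func
    return register


def resolve(name, version="latest"):
    """A consumer pins the version it was built against, or asks for 'latest'."""
    if version == "latest":
        version = max(v for n, v in _registry if n == name)
    return _registry[(name, version)]


def fingerprint(func, gen_args, runs=200, seed=0, budget_s=1.0):
    """Best-effort quickcheck: hash outputs over generated inputs.

    Only the hash and run count are stored, never the outputs themselves.
    """
    rng = random.Random(seed)  # fixed seed: every version sees the same inputs
    digest = hashlib.sha256()
    start = time.monotonic()
    for i in range(runs):
        if time.monotonic() - start > budget_s:
            return {"killed": True, "runs": i}
        digest.update(repr(func(*gen_args(rng))).encode("utf-8"))
    return {"killed": False, "runs": runs, "hash": digest.hexdigest()}


def random_date(rng):
    return rng.randint(1970, 2100), rng.randint(1, 12), rng.randint(1, 28)


@implements("format_date", 1)
def format_date_v1(year, month, day):
    return f"{day:02d}/{month:02d}/{year}"


def format_date_v1_optimised(year, month, day):
    # Different implementation, same results: the fingerprint should not change.
    return "%02d/%02d/%d" % (day, month, year)


@implements("format_date", 2)
def format_date_v2(year, month, day):
    # Same signature, different output format: the fingerprint changes.
    return f"{year}-{month:02d}-{day:02d}"
```

Comparing `fingerprint(format_date_v1, random_date)` with the optimised version reports no change, while version 2 is flagged; consumers that recorded version 1 can keep calling `resolve("format_date", 1)`.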

Everything works as a module/library

Main is just a function that gets passed os.args, stdin, stdout, etc.
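A minimal sketch of main as an ordinary function, assuming the arguments are a list of strings and the streams are file-like objects. Only a thin outer wrapper, e.g. `sys.exit(main(sys.argv, sys.stdin, sys.stdout))`, would touch the real process; everything else can be called like any library function.

```python
import io


def main(args, stdin, stdout):
    """Entry point as an ordinary function: every OS resource is passed in."""
    name = args[1] if len(args) > 1 else "world"
    for line in stdin:
        stdout.write(f"{name}: {line.strip()}\n")
    return 0  # exit status for the caller to hand to the OS


# Usage: a test (or another program) supplies in-memory streams instead of the OS.
fake_out = io.StringIO()
status = main(["prog", "bob"], io.StringIO("hi\nthere\n"), fake_out)
```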

Everything as an interface

For example, file access.
- No global fopen("filename");
- Instead, open(os::filesystem, "filename"). An os.open wrapper could be provided for the lazy, but maybe it should be avoided, since its use should be discouraged, especially in libraries. You don't want to use a library that forces its config file to be stored in a specific location when you want to use a database as the backend store for configuration files.
- In many ways a file over a network connection is the same as a file on a hard disk.
  - A hard disk can die, become full or be removed. A network cable can be unplugged (plus the files on the other end are stored on a hard disk anyway, which has the same problems).
  - Differences?
    - Metadata stuff.
      - Renaming files: you can't normally rename an HTTP resource. Renaming doesn't affect the actual file, it affects the filesystem's index.
      - Linking files: you can't link an HTTP file locally (well, not without the OS doing it, or a virtual filesystem library, but that's out of the scope of a programming language). Once again, this isn't about files but about the underlying filesystem.
      - Timestamps, ditto.
      - Synchronisation and atomic operations. A file stored on a disk can be fsynced so you know it's stored. An atomic operation can allow a file to be replaced in place. A database, on the other hand, might be holding the file in RAM.
      - Files are lockable, to prevent multiple processes from writing to the same file at the same time.
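A sketch of file access through an interface rather than a global open. `Filesystem`, `OsFilesystem`, `MemoryFilesystem` and the config helpers are hypothetical names; the in-memory backend stands in for a database or network store. The interface deliberately covers only reading and writing file contents: metadata operations such as rename, link, fsync and locking, which the list above attributes to the filesystem rather than the file, are left out.

```python
import os
import tempfile
from typing import Protocol


class Filesystem(Protocol):
    """The interface a library depends on instead of a global fopen()."""
    def read(self, path: str) -> bytes: ...
    def write(self, path: str, data: bytes) -> None: ...


class OsFilesystem:
    """Backed by the real OS filesystem, rooted at a directory."""
    def __init__(self, root: str):
        self.root = root

    def read(self, path: str) -> bytes:
        with open(os.path.join(self.root, path), "rb") as f:
            return f.read()

    def write(self, path: str, data: bytes) -> None:
        with open(os.path.join(self.root, path), "wb") as f:
            f.write(data)


class MemoryFilesystem:
    """Stand-in for a database or network store: same interface, different backend."""
    def __init__(self):
        self.files = {}

    def read(self, path: str) -> bytes:
        return self.files[path]

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data


def save_config(fs: Filesystem, text: str) -> None:
    # Library code: the caller decides where the config actually lives.
    fs.write("app.conf", text.encode("utf-8"))


def load_config(fs: Filesystem) -> str:
    return fs.read("app.conf").decode("utf-8")


# Usage: the same library calls against two backends.
memory = MemoryFilesystem()
save_config(memory, "colour=blue")
with tempfile.TemporaryDirectory() as tmp:
    disk = OsFilesystem(tmp)
    save_config(disk, "colour=blue")
    from_disk = load_config(disk)
```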
println("This is bad, as it's a pain to override");
- fprintln(stdout, "This is better, but more typing");
- Monkey patching can work, but it deals with globals, which adds threading issues...
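A small sketch of the println/fprintln trade-off. `fprintln` and `greet` are hypothetical names echoing the text: output always goes through an explicit stream, and a default argument recovers most of println's convenience without a global that needs monkey patching.

```python
import io
import sys


def fprintln(out, *args):
    """Write the arguments, space-separated, plus a newline to any writable stream."""
    out.write(" ".join(str(a) for a in args) + "\n")


def greet(name, out=None):
    # Default to stdout (looked up at call time), but any caller can override it.
    if out is None:
        out = sys.stdout
    fprintln(out, "hello,", name)


# Usage: redirect one call's output without touching any global.
buffer = io.StringIO()
greet("ada", out=buffer)
```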

Programming Language
