• Welcome to Battlezone Universe.
 

News:

Welcome to the BZU Archive dated December 24, 2009. Topics and posts are in read-only mode. Those with accounts will be able to log in and browse anything the account had access to at the time. No permission changes will be made to grant access to particular content. If you have any questions, please reach out to squirrelof09/Rapazzini.

OLang

Started by OvermindDL1, August 25, 2008, 05:19:54 PM

OvermindDL1

Was just going to ask a few questions of GSH over PM, but figured it would be nice to get some thoughts from other (up'n'coming) programmers around here.

I have been making a new language with a C-style syntax, but following the Actor model rather than the OO model. I am horrible at coming up with names, so I have been asking for thoughts while designing the syntax of the language. Right now I am trying to decide whether I should force the programmer to state whether a function is actor-based (a continuation, in a sense) or not. I can easily make it optional, but forcing them to state it helps with documentation. If I do choose a keyword, what word would be good to use?
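
If the keyword is left optional, the compiler still has to work out which functions can unhook, since only those need the split, heap-stack calling convention described later in this thread; everything else can use plain C-style calls. A minimal sketch of that inference in C, assuming a hypothetical adjacency-matrix call graph (`CallGraph`, `infer_may_suspend` and `MAX_FUNCS` are made-up names): a function may suspend if it suspends directly or calls anything that may, iterated until nothing changes.

```c
#define MAX_FUNCS 16

/* Hypothetical call graph: calls[f][c] != 0 means function f calls function c. */
typedef struct {
    int count;
    unsigned char calls[MAX_FUNCS][MAX_FUNCS];
    unsigned char suspends_directly[MAX_FUNCS]; /* e.g. waits on a channel */
} CallGraph;

/* Sets may_suspend[f] = 1 for every function that can unhook, either directly
   or through any chain of callees. Iterating to a fixed point also handles
   recursion and call cycles. */
void infer_may_suspend(const CallGraph *g, unsigned char *may_suspend) {
    for (int f = 0; f < g->count; f++)
        may_suspend[f] = g->suspends_directly[f];

    int changed = 1;
    while (changed) {
        changed = 0;
        for (int f = 0; f < g->count; f++) {
            if (may_suspend[f])
                continue;
            for (int c = 0; c < g->count; c++) {
                if (g->calls[f][c] && may_suspend[c]) {
                    may_suspend[f] = 1; /* a callee can suspend, so f can too */
                    changed = 1;
                    break;
                }
            }
        }
    }
}
```

With inference like this, an explicit keyword would mostly serve as documentation that could be checked against the computed result.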

Either way, I intend to make another scriptor for BZ2, not in the sense that I expect people to use it, but because it is a perfect scenario to test it in: speed is a concern and it uses an event-based model (although I wish far more events were generated instead of just 'testing' for things...). So as it is designed, people could play with it...

Either way, the link to the discussion is currently:  http://www.overminddl1.com/forum/index.php/topic,845.0.html


Generated by OvermindDL1's Signature Auto-Add Script via GreaseMonkey


Red Devil

What box???

Nielk1


Click on the image...

OvermindDL1

Actually, I am asking more about syntax design than the name. :P



Red Devil

I don't care so much about syntax, because that can be learned easily enough. What I absolutely detest are cryptic error messages that don't point to what the actual problem is.

OvermindDL1

I am trying to integrate as much of the lexer and parser together as possible, which will make error messages easier to create as well as make the parsing nice and fast (it is designed to be loaded in time-sensitive things, like game engines or other scripting things). As such, I think I am going to have to make a compromise on a few things, like using header files (I will probably call them something else; 'header' is not quite right).

I am actually kind of split on some of the grammar right now as well. Some constructs, like "if", "while", "for", and so forth, are very explicit: I *know* that certain things are coming next, which makes it much easier to create error messages for them. But some things, like how you define a C-style function (retType funcName(argType argName, ...)), are nearly impossible to parse until you have just about all of it, whereas the Python method of putting "def", or the Lua method of putting "function", in front of it defines it well so you know what to expect. The problem is that the C style is so well known and so prevalent (everything from Java to C# copies it) that departing from it may make the language harder for others to learn. I am half-tempted to just throw out such ideas and do what I want; the problem is that it would then become this weird combination of C, Erlang, Ada, Pascal, and a few other things no doubt, just because they each do something 'correct'. One nice thing about keeping it near-pure C-style is that existing IDEs would work well with it. I am rather addicted to Visual Assist and cannot really stand programming without it anymore; it just makes things so much faster and easier, and keeping the C syntax would allow it to still work with this language.

The thing is, if someone defined a C-style function but something was wrong with their syntax, the parser would just bail out of that entire grammar path and try something else to fit; if nothing fit, it would report invalid grammar for that whole section. I can do a few workarounds, such as forbidding backtracking once it hits the first open parenthesis, so it would error out right there if something inside was wrong, but that still would not help if something was wrong in the declaration itself, like in the return type, since that is specified first.
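
The commit-at-the-first-parenthesis workaround can be sketched as a tiny hand-written recursive-descent parser in C. This is an illustration only, not OLang's parser: the toy grammar (`type name(params);` versus `type name;`), the `Parser` struct and all function names are assumptions made up for the example. Alternatives backtrack freely until `(` is consumed; after that point, any mismatch is reported as a hard error at the exact offset instead of the whole section being rejected.

```c
#include <ctype.h>
#include <string.h>

typedef enum { PARSE_OK, PARSE_NO_MATCH, PARSE_ERROR } ParseStatus;

typedef struct {
    const char *src;   /* source text */
    size_t pos;        /* current offset into src */
    size_t err_pos;    /* offset of a committed error */
    const char *err;   /* message for PARSE_ERROR */
} Parser;

static void skip_ws(Parser *p) {
    while (isspace((unsigned char)p->src[p->pos]))
        p->pos++;
}

/* Consumes one identifier (letter or '_' first, then letters/digits/'_'). */
static int ident(Parser *p) {
    skip_ws(p);
    size_t start = p->pos;
    while (isalpha((unsigned char)p->src[p->pos]) || p->src[p->pos] == '_' ||
           (p->pos > start && isdigit((unsigned char)p->src[p->pos])))
        p->pos++;
    return p->pos > start;
}

/* Consumes punctuation character c if it comes next. */
static int punct(Parser *p, char c) {
    skip_ws(p);
    if (p->src[p->pos] != c)
        return 0;
    p->pos++;
    return 1;
}

static ParseStatus fail(Parser *p, const char *msg) {
    p->err_pos = p->pos;
    p->err = msg;
    return PARSE_ERROR;
}

/* funcdecl := type name '(' [type name {',' type name}] ')' ';' */
static ParseStatus func_decl(Parser *p) {
    size_t start = p->pos;
    if (!ident(p) || !ident(p) || !punct(p, '(')) {
        p->pos = start;            /* not a function: let another rule try */
        return PARSE_NO_MATCH;
    }
    /* Commit point: from here on a mismatch is a precise error, not a backtrack. */
    if (!punct(p, ')')) {
        do {
            if (!ident(p)) return fail(p, "expected parameter type");
            if (!ident(p)) return fail(p, "expected parameter name");
        } while (punct(p, ','));
        if (!punct(p, ')')) return fail(p, "expected ',' or ')'");
    }
    if (!punct(p, ';')) return fail(p, "expected ';' after declaration");
    return PARSE_OK;
}

/* vardecl := type name ';' */
static ParseStatus var_decl(Parser *p) {
    size_t start = p->pos;
    if (ident(p) && ident(p) && punct(p, ';'))
        return PARSE_OK;
    p->pos = start;
    return PARSE_NO_MATCH;
}

/* Tries each alternative in turn; a committed error stops the search at once. */
ParseStatus parse_decl(Parser *p) {
    ParseStatus s = func_decl(p);
    if (s != PARSE_NO_MATCH)
        return s;
    return var_decl(p);
}
```

This only helps after the `(`: a mistake in the return type or name still falls through to "no rule matched", which is exactly the limitation described above. A leading keyword such as `def` or `function` would move the commit point to the very first token.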

So, as you can see, it is not so much of an issue of 'what' the grammar should be, but how best to create it so it parses clean while giving sensible results for both good and bad input.



Red Devil

Keeping the C syntax might be convenient, but then it'd still be C, basically. Don't worry about people adjusting. If it's intuitive, people will learn it easily enough.

I want something as powerful as Assembly, as simple as BASIC, and as widespread as C/C++.

OvermindDL1

As stated though, I am basing it on an Actor model, meaning the syntax of the language is going to change, a lot, but the part I am talking about is the grammar.  Here are some major differences between this and C so far (all subject to change of course):
OLang (working title) compared with C:

1. OLang: Does not allow global mutable variables (constants are fine since they do not change).
   C: Has global mutable variables.

2. OLang: Pointer-integer math is disallowed.
   C: Pointer-integer math is allowed. This is widely considered a design fault and a huge source of bugs. (Note that this is not all pointer math, just math involving direct conversion between pointers and integers.)

3. OLang: Pointers, thus far, are only allowed in function signatures, in the argument list to be exact, so they cannot be used to declare an arbitrary type. (I am still wondering whether to allow direct pointer access or to treat them like references; I will probably treat them like references.)
   C: Pointers are allowed just about anywhere a type can be declared.

4. OLang: C-style function calls do exist, but only for functions that contain no point at all where they can be suspended (described in a moment).
   C: Function calls are pretty basic: push the return address and arguments on the stack, then call/jump to the function.

5. OLang: Functions are designed to be tail-called heavily.
   C: I have not yet met a C compiler that really knows what tail recursion is, which makes sense since it is not part of the standard.

6. OLang: A callstack can be 'unhooked' from the current execution queue. Any function that can unhook the callstack, or that calls any function that can, does not actually use the CPU's stack (well, it will for variables that do not need to be persisted). Instead, the compiler determines the maximum 'stack' space a callstack will need when a function is called as a new 'tasklet' (to use Stackless Python terms), allocates that from the heap (maybe from a pool or a GC, I have not decided yet), and passes it implicitly along the callstack of function calls as an initial parameter. I was also thinking of making it 'stackless' in a different way: each new function call would allocate a small stack just large enough for that function to operate. That would work well, and the performance cost of malloc'ing from the heap could be minimized with a pool or GC, but I am heavily leaning toward the memory trade-off of allocating it all at once: there will be fewer internal pointers, it will be slightly faster overall, and it allows inlining opportunities that the per-function method really works against. In effect, this language is fully coroutinable; that will just be mostly hidden from the user due to the event system.
   C: You can 'unhook' a callstack in C by using things like longjmp, but you tend to lose that callstack as well, eventually overwriting it. Doing what OLang does in C involves a lot of extra code and a rather large performance hit (and in C++ it becomes near impossible to do well because the exception mechanism does not know about it).

7. OLang: Function callstacks are the main form of data storage (yes, yes, very 'functional' in design, but I am not taking it to the extreme that *every* mutable object is a function). To change data in another callstack, you call something that the other callstack is listening to. Rarely (if ever, as I am leaning toward) will you hold a channel directly to another callstack; instead you will hold a more generic communication channel without knowing what is on the other end, but with a more well-defined pipeline of communication (kind of a combination of Erlang and Stackless Python). This makes it possible to spread any number of these 'tasklets' across any number of cores, or even any number of machines. I am not going to hide the distinction between tasklets on other cores and tasklets on other machines; it will be visible to the end user. I found a system modeling language that does this well and will model this area after it a bit. The purpose of not hiding the 'length' of the channel is that you may want to communicate asynchronously on a channel if it is remote and synchronously if it is local, or whatever else. If you do not know which is which, your call could take a few instructions, or it could take a few seconds if the network connection is slow, so it is better to let the programmer decide on a case-by-case basis.
   C: The main method of data storage in C is structs (or classes in C++). Anyone can alter anything at any time, which makes multi-threading in C (and other C-style languages like Java and the .NET languages) very unwieldy, ugly, and hard to debug. You have to take care to use synchronization primitives such as mutexes and semaphores, or atomic operations if the variables are small enough, along with read/write barriers, and if you forget to use the same style *everywhere* a certain piece of memory is accessed, you can run into pretty hard-to-diagnose bugs and other errors (C and C-style languages are *not* made for multi-threading).

8. OLang: The main way to communicate between 'objects' is over a channel, so you never have to worry about synchronization (quite frankly, it can never be enforced in the Actor model anyway, so not having to worry about it is a boon, and games lend themselves perfectly to this design).
   C: The main way to communicate between 'objects' in C or C++ is to call 'member' functions or just set variables directly; again, synchronization is an issue, as you have to remember to do it everywhere.

9. OLang: Synchronization is not an issue: if you do anything that can unhook a tasklet, and it does unhook while waiting for some callback or message, another tasklet goes ahead and runs instantly. This is cooperative multi-threading.
   C: There is no real threading library in C (and even in C++, only the upcoming standard, C++0x, has one). The two usual approaches are to 'fork' a process into multiple processes (which Windows does not support, only POSIX-based systems), or to create a new thread that wraps an arbitrary function with a certain signature. Synchronization is costly: it is difficult to program properly, race conditions and deadlocks can easily occur even when it is programmed properly, and when a thread blocks on a synchronization primitive it 'freezes'. Another thread may or may not run, depending on the time taken and on when the operating system decides to switch threads (usually pretty fast on POSIX-based systems; it is bad on Windows XP and earlier, which Vista and later fixed pretty well, though POSIX systems are still faster due to their design). Even an active thread can be switched away when it is only a couple of instructions from finishing what it is doing. This is pre-emptive multi-threading: things can change and switch at any time. It is slower, but generally easier to program for than cooperative multi-threading, except for the Actor design, which is built around cooperative multi-threading and is even easier to work with than normal pre-emptive threading. Honestly, pre-emptive threading is not too hard to write; it is horrible to synchronize.

10. OLang: Each tasklet, when created, has a set amount of memory, only enough to do what it needs (as stated, I have not completely settled between allocating all the memory up front or on a per-function basis; I would love discussion on this). Each one can therefore be rather small, potentially allowing thousands, if not hundreds of thousands or more, to run simultaneously. They will still transparently run on operating system threads, but at a ratio of only one thread per CPU core (or whatever the embedder of the language sets).
    C: Pure operating system threads, as used by C and C-style languages, are subject to vast restrictions, especially in memory. The memory a thread allocates on creation can be set, but (at least on Windows; I do not know about POSIX systems) by default it is the stack size the program was compiled with (usually 2 or 4 megs). That means every thread takes *at least* 2-4 megs of RAM, and, at least on my system, creating upwards of a thousand threads crashes the application, vastly limiting how many you can have. That is not what threads were designed for anyway: they were designed for a real 1-to-1 mapping to the number of cores in a computer. Yet many people insist on having, say, three threads: one for rendering, one for game logic, and one for everything else. That causes switching on single- or dual-core systems and leaves at least one core always idle on quad-core or larger systems, which is very inefficient, especially as some threads sit idle waiting for others to finish, adding even more CPU downtime. It is easy to program, but, quite frankly, a piss-poor design.
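
To make the cooperative model in this comparison more concrete, here is a single-threaded C sketch: each tasklet is a resume function plus a small fixed-size state block, a run queue decides who runs next, and a receiver with nothing to read parks itself on a channel and simply returns, so the scheduler runs another tasklet. Everything here (`Tasklet`, `Channel`, `chan_send`, `chan_recv`, `sum_via_channel`) is a hypothetical illustration, not OLang's runtime; it leaves out the one-OS-thread-per-core layer and remote channels, and assumes the fixed-size buffers never overflow.

```c
#include <stddef.h>

/* A tasklet: where to resume next, plus a small state block sized up front. */
typedef struct Tasklet Tasklet;
typedef void (*Step)(Tasklet *self);
struct Tasklet {
    Step step;
    int state[4];
};

/* Run queue: drained by one thread; tasklets only switch when they yield. */
#define QCAP 64
static Tasklet *run_queue[QCAP];
static size_t q_head, q_len;

static void schedule(Tasklet *t) {
    run_queue[(q_head + q_len) % QCAP] = t;
    q_len++;
}

static void run(void) {
    while (q_len > 0) {
        Tasklet *t = run_queue[q_head];
        q_head = (q_head + 1) % QCAP;
        q_len--;
        t->step(t); /* runs until it finishes, yields, or parks on a channel */
    }
}

/* A buffered channel with at most one parked receiver. */
#define CHAN_CAP 8
typedef struct {
    int buf[CHAN_CAP];
    size_t head, count;
    Tasklet *waiter;
} Channel;

/* Never blocks; wakes the parked receiver, if any. */
static void chan_send(Channel *c, int v) {
    c->buf[(c->head + c->count) % CHAN_CAP] = v;
    c->count++;
    if (c->waiter) {
        schedule(c->waiter);
        c->waiter = NULL;
    }
}

/* Returns 1 with a value, or parks `self` (to resume at `resume`) and returns 0. */
static int chan_recv(Channel *c, Tasklet *self, Step resume, int *out) {
    if (c->count == 0) {
        self->step = resume;
        c->waiter = self;
        return 0;
    }
    *out = c->buf[c->head];
    c->head = (c->head + 1) % CHAN_CAP;
    c->count--;
    return 1;
}

static Channel numbers;
static int total;

/* Receiver: state[0] is the running sum, kept across suspensions. */
static void summer_step(Tasklet *self) {
    int v;
    while (chan_recv(&numbers, self, summer_step, &v)) {
        if (v == 0) {            /* 0 terminates the stream */
            total = self->state[0];
            return;              /* finished: never rescheduled */
        }
        self->state[0] += v;
    }
}

/* Sender: state[0] = next value, state[1] = last value; one send per turn. */
static void producer_step(Tasklet *self) {
    int i = self->state[0]++;
    chan_send(&numbers, i > self->state[1] ? 0 : i);
    if (i <= self->state[1])
        schedule(self);          /* yield so other tasklets can run */
}

/* Demo: 1 + 2 + ... + last, computed by two cooperating tasklets. */
int sum_via_channel(int last) {
    Tasklet summer = { summer_step, {0} };
    Tasklet producer = { producer_step, {1, last} };
    Channel empty = { {0}, 0, 0, NULL };
    numbers = empty;
    total = -1;
    schedule(&summer);
    schedule(&producer);
    run();
    return total;
}
```

No locks appear anywhere: the only points where control changes hands are the explicit `schedule`/`return` after a send and the park inside `chan_recv`.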

I would add more, but I need to leave. Let me finish by clearing something up first. The functions that are capable of 'unhooking' in my language do not use any persistent stack (they only use it for very temporary values that do not need to persist across function calls). Because they do not need any real stack, tail-calling becomes really easy: I can have functions that call functions that call functions ad infinitum without ever using any stack. For example, when one 'callstack' is 'unhooked' (remember that its 'stack' is actually a heap/pool/GC-allocated memory chunk whose top-most used location is passed as the first hidden argument to each function), I do not need to longjmp or raise an exception or anything to unwind back down and call something else; all I need to do is tail-call the next resume point. This works because I split every function at every point where it can be unhooked, so a function like this (in pseudo-C):
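#include <string.h>

int sendSomewhereMightUnhookReturnsInt(const char *s); // May unhook (suspend) the callstack

int myFunc(float f, const char *s)
{
    int offset = 4;
    if (strcmp(s, "something") == 0)
        offset = sendSomewhereMightUnhookReturnsInt(s);
    return (int)f + offset;
}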


Would actually be split kind of like this:

#include <string.h>

// A continuation: resumes a split function with the int "returned" to it
typedef void (*IntCont)(char *stackTop, int returnValue);

// Never returns normally: when done, it pops the continuation on top of
// stackTop and passes its int result to it
void sendSomewhereMightUnhookReturnsInt(char *stackTop, const char *s);
void myFunc_SecondPoint(char *stackTop, int returnValue);

// The caller has already pushed its own continuation before calling this
void myFunc_entry(char *stackTop, float f, const char *s)
{
    // The int offset = 4 does not need to be placed here since it is not used yet
    // s does not need to be persisted since it is only used here
    // f does need to persist across the split, so push it onto the tasklet's stack
    memcpy(stackTop, &f, sizeof f);
    stackTop += sizeof f;
    if (strcmp(s, "something") == 0)
    {
        IntCont resume = myFunc_SecondPoint;
        memcpy(stackTop, &resume, sizeof resume);
        stackTop += sizeof resume;
        sendSomewhereMightUnhookReturnsInt(stackTop, s); // Passes its return value to the continuation it pops
        return; // Not reached in generated code: the call above is a tail call that never returns
    }
    myFunc_SecondPoint(stackTop, 4); // This is the only place the offset value is used, so it can be inlined here
}

void myFunc_SecondPoint(char *stackTop, int returnValue) // Accepts the return value of the call that caused the split
{
    float f;
    IntCont cb;
    // Only the previous 'f' was persisted on the callstack; pop it:
    stackTop -= sizeof f;
    memcpy(&f, stackTop, sizeof f);
    // Whoever called myFunc wanted an int back, so pop its continuation and pass the result:
    stackTop -= sizeof cb;
    memcpy(&cb, stackTop, sizeof cb); // And yes, many of these function pointers will disappear if the callstack is explicitly known by the compiler
    cb(stackTop, (int)f + returnValue);
    // Never gets here
}
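
The split above relies on tail calls, which C does not guarantee, so here is a runnable C sketch of the same idea with a small driver. The persistent state (`f` and the continuations) lives in a fixed-size `TaskStack` rather than on the C stack, so when the might-unhook call parks the tasklet, the ordinary C calls simply return to the driver with no longjmp, and the driver later resumes the tasklet by popping its continuation. `TaskStack`, `push`, `pop`, `resume_with`, `parked`, `store_result` and `run_myFunc` are hypothetical names, and the might-unhook call is faked: it only records the parked stack instead of sending anything.

```c
#include <stddef.h>
#include <string.h>

/* Fixed-size "stack" a tasklet carries instead of using the C stack. */
typedef struct {
    unsigned char mem[256];
    size_t top; /* bytes in use */
} TaskStack;

/* A continuation: resumes split code with the int "returned" to it. */
typedef void (*IntCont)(TaskStack *st, int value);

static void push(TaskStack *st, const void *src, size_t n) {
    memcpy(st->mem + st->top, src, n);
    st->top += n;
}

static void pop(TaskStack *st, void *dst, size_t n) {
    st->top -= n;
    memcpy(dst, st->mem + st->top, n);
}

/* Pops the continuation on top of the stack and "returns" value to it. */
static void resume_with(TaskStack *st, int value) {
    IntCont k;
    pop(st, &k, sizeof k);
    k(st, value);
}

/* Fake might-unhook call: parks the tasklet by recording its stack and
   returning. The resume point is already saved on that stack. */
static TaskStack *parked;
static void sendSomewhereMightUnhookReturnsInt(TaskStack *st, const char *s) {
    (void)s;
    parked = st;
}

static void myFunc_SecondPoint(TaskStack *st, int returnValue) {
    float f;
    pop(st, &f, sizeof f);                 /* the persisted 'f' */
    resume_with(st, (int)f + returnValue); /* return to myFunc's caller */
}

static void myFunc_entry(TaskStack *st, float f, const char *s) {
    push(st, &f, sizeof f);                /* only 'f' must survive the split */
    if (strcmp(s, "something") == 0) {
        IntCont k = myFunc_SecondPoint;
        push(st, &k, sizeof k);
        sendSomewhereMightUnhookReturnsInt(st, s);
        return;                            /* nothing on the C stack must survive */
    }
    myFunc_SecondPoint(st, 4);             /* default offset */
}

static int result;
static void store_result(TaskStack *st, int value) { (void)st; result = value; }

/* Driver: calls myFunc(f, s); if it unhooks, resumes it later with `reply`
   (standing in for the awaited message arriving). */
int run_myFunc(float f, const char *s, int reply) {
    static TaskStack st;
    IntCont k = store_result;
    st.top = 0;
    parked = NULL;
    result = -1;
    push(&st, &k, sizeof k);               /* the caller's continuation */
    myFunc_entry(&st, f, s);
    if (parked != NULL) {                  /* unhooked: C stack already unwound */
        TaskStack *t = parked;
        parked = NULL;
        resume_with(t, reply);
    }
    return result;
}
```

For example, `run_myFunc(10.0f, "something", 7)` parks inside `myFunc_entry`, is resumed with 7, and produces `(int)10.0f + 7`, that is 17, while a non-matching string takes the default offset of 4 and produces 14.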


It actually looks a little better when generated (I have a couple more implicit things, register usage is definite, etc.); this is just to get the style across.

I am also still debating whether to use a C++-style exception mechanism that knows about the embedded stack, or the Erlang method... I could do both... They would work well together...





OvermindDL1

Also, note that tail calls are different from normal function calls. As mentioned above with call/jump, a normal function is "call"ed, whereas a tail-call-capable function is one you just "jump" to without altering the stack (no return function pointer, no exception information, etc.). You cannot easily jump back, though, which is why you split a large function into separate sub-functions, so execution can jump to a different spot again without worrying about state. Consequently, dynamic stack allocation (the alloca function) will be disallowed in functions that are tail-called in my language; they will have to allocate on the heap, or just use a static array big enough for what they want. alloca will be perfectly allowed in normal function calls, though (which is what most calls will probably be anyway).
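
A small C illustration of "call" versus "jump": `gcd_tail` ends in a tail call, and `gcd_jump` is what that call amounts to once it becomes a jump (rebind the arguments and go back to the top, reusing the same frame). The C standard does not require compilers to perform this transformation, so the second form is written out by hand here. If the function had allocated memory with alloca, its frame could not simply be reused like this, which is the conflict described above. The function names are made up for the example.

```c
/* Tail-recursive: the recursive call is the last thing this function does,
   so nothing in the current frame is needed afterwards. */
unsigned gcd_tail(unsigned a, unsigned b) {
    if (b == 0)
        return a;
    return gcd_tail(b, a % b); /* tail call: can become a jump */
}

/* The same function with the tail call written as a jump: overwrite the
   arguments in place and go back to the top. The frame is reused every time. */
unsigned gcd_jump(unsigned a, unsigned b) {
    for (;;) {
        if (b == 0)
            return a;
        unsigned r = a % b;
        a = b;
        b = r;
    }
}
```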

Also, I was thinking during lab class today: I could do without 'headers' if I just enforce that there can be no cyclic dependencies. If someone needs a cyclic dependency, they could just move that part to a different file and import it separately into each of the other ones, so no biggie. Would that be a good compromise?
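
Dropping headers in favour of "no cyclic imports" means the compiler needs an order in which every file is processed after everything it imports, plus a clear error when there is a cycle. A depth-first search with three colours does both. This is an illustrative sketch only; `ModuleGraph`, `compile_order` and the fixed `MAX_MODULES` limit are assumptions for the example.

```c
#define MAX_MODULES 16

/* Hypothetical import graph: imports[i][j] != 0 means module i imports module j. */
typedef struct {
    int count;
    unsigned char imports[MAX_MODULES][MAX_MODULES];
} ModuleGraph;

enum { WHITE, GREY, BLACK }; /* unvisited, on the current DFS path, finished */

static int visit(const ModuleGraph *g, int m, int *color, int *order, int *n) {
    if (color[m] == GREY)
        return 0;                  /* back edge: cyclic import */
    if (color[m] == BLACK)
        return 1;                  /* already placed in the order */
    color[m] = GREY;
    for (int d = 0; d < g->count; d++)
        if (g->imports[m][d] && !visit(g, d, color, order, n))
            return 0;
    color[m] = BLACK;
    order[(*n)++] = m;             /* all of m's imports are already in `order` */
    return 1;
}

/* Fills `order` so every module comes after the modules it imports.
   Returns 0 if the imports contain a cycle. */
int compile_order(const ModuleGraph *g, int *order) {
    int color[MAX_MODULES] = {0}; /* all WHITE */
    int n = 0;
    for (int m = 0; m < g->count; m++)
        if (!visit(g, m, color, order, &n))
            return 0;
    return 1;
}
```

If module 2 also imported module 0 in a chain 0 → 1 → 2, `compile_order` would return 0, which is where the compiler could report the cycle and suggest moving the shared part into its own file.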

Also, as for things like 'array' types, I got to thinking that I could streamline the language a little more if I did away with built-in array types and just had people use an array template. If a size is passed in, it would be static; if not, it would be dynamic (like a C++ vector). Either way it would have a size member and all that (wonderful for template use). Defining an array would be a little wordier, but it is more explicit, and it means fewer special cases for operator overloading, since the operators would just be members of the array template anyway.
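
C has no templates, so this can only approximate the idea: one array type that always carries its size and is either fixed or growable, chosen at construction time. `IntArray` and the `array_*` functions are hypothetical; in the proposed template the element type would be a parameter rather than hard-coded `int`, and the size choice would be a template argument rather than a runtime flag.

```c
#include <stdlib.h>

/* One array type for both cases. fixed != 0: static size; otherwise it grows
   like a C++ vector. Either way the size travels with the data. */
typedef struct {
    int *data;
    size_t size;      /* elements in use */
    size_t capacity;  /* elements allocated */
    int fixed;        /* 1 = static size, 0 = dynamic */
} IntArray;

/* Static array of n zero-initialised elements. */
IntArray array_fixed(size_t n) {
    IntArray a = { calloc(n ? n : 1, sizeof(int)), n, n, 1 };
    if (a.data == NULL)
        a.size = a.capacity = 0;
    return a;
}

/* Empty dynamic array. */
IntArray array_dynamic(void) {
    IntArray a = { NULL, 0, 0, 0 };
    return a;
}

/* Appends v. Returns 0 if the array is static and full, or allocation fails. */
int array_push(IntArray *a, int v) {
    if (a->size == a->capacity) {
        if (a->fixed)
            return 0;
        size_t cap = a->capacity ? a->capacity * 2 : 4;
        int *p = realloc(a->data, cap * sizeof *p);
        if (p == NULL)
            return 0;
        a->data = p;
        a->capacity = cap;
    }
    a->data[a->size++] = v;
    return 1;
}

void array_free(IntArray *a) {
    free(a->data);
    a->data = NULL;
    a->size = a->capacity = 0;
}
```

`array_fixed(3)` behaves like a sized declaration (pushing onto it fails), while `array_dynamic()` grows on demand; both expose `.size`.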



Red Devil

I think the more flexibility/options you give programmers, the better, but, at the same time, it may leave the door open to hackers - like you. :p

I'm starting to think that you're the guy that makes Skynet...

CmptrWz

No, he doesn't make Skynet. He just makes the tools that someone else uses to make Skynet. :P

Red Devil

But doesn't he get killed by one of John's men after that chess match? :-o You don't play chess, do you, OM??

Nielk1

One thing I liked from VB (and there is little) was how in a function's parameters you could write ByRef and ByVal. It was a nice bit of plain English.


OvermindDL1

I used to play chess, but no one around here could beat me except one person, and he beat everyone (I never saw or heard of him losing, and he played probably thousands of games while he was here, a few a day for years). Nowadays I play Go; there is a lot more thinking involved. Chess got too repetitive, and Go is a lot more fun in my opinion, but I still suck at it. I am up for a game if anyone wants; Google "Panda Go" for a good computer Go multiplayer client.

And Nielk1, that is what the & means in C++: without it a parameter is ByVal, with it ByRef, and it is a lot easier to type. I like to write and read code fast, and adding more characters makes that take longer by forcing my eyes to scan over more text. And no, I do not like terseness to the extent Perl takes it, so do not worry about that. :P
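
Since the code in this thread is C: C itself has no reference parameters (the `void f(int &x)` form is C++), so ByRef is spelled by passing a pointer, with `&` taking the variable's address at the call site. A minimal illustration; the function names are made up.

```c
/* ByVal: x is a private copy, so the caller's variable is not changed. */
void increment_by_value(int x) {
    x++;        /* changes only the local copy */
    (void)x;
}

/* ByRef, C style: the caller passes &n, and the function writes through it. */
void increment_by_pointer(int *x) {
    (*x)++;
}
```

Calling `increment_by_value(n)` leaves `n` unchanged, while `increment_by_pointer(&n)` increments it.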

EDIT: And technically, a language like this (or Erlang, or what-not) would be perfect for something like SkyNet to be made in, since it can be near perfectly distributed. :)



Nielk1

I knew how C++ uses the &; I just wanted to point out that I liked the plain English. And I'm sure the value/reference and value/address methods have some internal differences too.

My bro was in the high school chess club and won a PC CD chess game there once, but I don't think he ever defeated the teacher, who is actually internationally rated (not the best by far).
