Euphoria To C Translator


1. Introduction
2. Installation
3. C Compilers Supported
4. How to Run the Translator
5. Dynamic Link Libraries (Shared Libraries)
6. Executable Size and Compression
7. Interpreter vs. Translator
8. Legal Restrictions
9. The Complete Edition Translator
10. Frequently Asked Questions
11. Common Problems



1. Introduction

The Euphoria to C Translator will translate any Euphoria program into equivalent C source code.

There are versions of the translator for Windows, DOS, Linux and FreeBSD. After translating a Euphoria program to C, you can compile and link using one of the supported C compilers. This will give you an executable file that will typically run much faster than if you used the Euphoria interpreter.


2. Installation

It is assumed that you have already installed the Euphoria 2.4 Interpreter package on your system, and that your EUDIR and PATH environment variables are set correctly.

For each supported C compiler, on Windows, DOS, Linux or FreeBSD, you'll find a .ZIP file on the RDS site containing these files:

     1. the translator (one per platform)
           
           ec.exe   - DOS 
           ecw.exe  - Windows 
           ecu      - Linux 
           ecu      - FreeBSD
           
     2. a run-time library (one per C compiler)
           
           ec.lib   - DOS (Watcom)
           ec.a     - DOS (DJGPP)
           ecw.lib  - Windows (Watcom)
           ecwl.lib - Windows (Lcc)
           ecwb.lib - Windows (Borland)
           ecu.a    - Linux (GNU)
           ecu.a    - FreeBSD (GNU)
            
     3. (Watcom only) files to support the CauseWay DOS extender

           cwc.exe    - file compressor
           le23p.exe  - format converter
           cwstub.exe - the DOS extender
   
     4. (DJGPP only) the Allegro graphics library, compiled specially for
                     the translator. Download liballeg.zip

To install the Euphoria To C Translator, put the required .exe's and library files into your euphoria\bin directory. The euphoria\include directory already contains euphoria.h, a C include file needed by all translated programs.

Note:
The liballeg.zip file is special. After installing the DJGPP C compiler, you should unzip liballeg.zip and put liballeg.a into your DJGPP\LIB directory.


3. C Compilers Supported

The Translator currently works with GNU C on Linux or FreeBSD, with either Watcom C or DJGPP C on DOS, and with either Watcom C, Lcc or Borland 5.5 on Windows. The Watcom and GNU C implementations are 100% compatible with the Euphoria Interpreter. The others are about 99% compatible. We recommend Borland over Lcc: Borland compiles faster, produces better code, and has fewer bugs than Lcc.

The Translator has been tested with GNU C and the ncurses library available with Red Hat Linux 5.2 or later, and FreeBSD 4.5 or later.

It has been tested with Watcom C/C++ 9.5, 10.6 and 11.0. Watcom 11.0 is open source and free. Look for it at: http://www.openwatcom.org

The Watcom DOS32 package includes the CauseWay DOS extender and file compressor. CauseWay is now open source and free. You can find out more about it at: http://www.devoresoftware.com

emake.bat and objfiles.lnk will link in the CauseWay extender automatically. Other DOS extenders, such as DOS4GW, do not work well with the Translator.

The Translator looks for "WATCOM", "LCC", "BORLAND" or "DJGPP" as either environment variables or directories on your PATH. It will generate an emake.bat file that invokes the appropriate compiler and linker.

Notes:

  • Unlike Watcom, DJGPP does not map DOS low memory into the same segment as other memory. Machine code routines written for the Watcom-based Euphoria Interpreter or Translator will not work with DJGPP, and will likely crash if they try to access low memory, such as video memory. Euphoria's peek(), poke(), mem_copy(), mem_set() etc. will work correctly, as the Translator uses a special DJGPP macro to access low memory. You can port these machine code routines to DJGPP, but you'll need to consult the DJGPP docs for the possible ways of accessing low memory.

  • DJGPP fully supports long filenames for reading, writing and creation. Watcom supports reading and writing files with long names, but not creating them.

  • DJGPP supports a few more text modes, e.g. 35-line mode.

  • DJGPP lets the user abort a program at any time, by typing control-c.

  • The Lcc implementation ignores lock_file() and unlock_file(). They do nothing.

  • The Translator does not use Lcc's -O optimization flag in emake.bat. This flag is currently too unreliable. If you want to boost your speed, you can experiment by adding -O for some .c files. It will work ok in most cases. If it doesn't, please report the error to the Lcc developers, not to RDS.

  • Warnings are turned off when compiling with emake.bat. If you turn them on, you may see some harmless messages about variables declared but not used, labels defined but not used, function prototypes not declared etc.

  • On Windows, the Watcom linker issues a warning that it can't open graph.lib. You can ignore this. graph.lib is not used. There doesn't seem to be an easy way to suppress this message.

  • The Microsoft C++ compiler for Windows is not yet supported. However, you can probably import the C files generated by ecw.exe, and the run-time library file for Borland, Lcc or Watcom, into a Microsoft project, and compile/link with only minor glitches.


4. How to Run the Translator

Running the Translator is similar to running the Interpreter. On DOS you would type:

       ec shell.ex
    or 
       ec shell
 
but instead of running the shell.ex program, the Translator will create several C source files. It will also create a file called emake.bat that will compile and link the C files for you. Just type:

       emake
 
When the C compiling and linking is finished, you will have a file called: shell.exe

When you run shell.exe, it should run the same as if you had typed "ex shell" to run it with the Interpreter, except that it should run faster, performing more "sorts per second".

Note to Linux and FreeBSD users:
The files will be called emake and shell, and you type ./emake to perform the compiles and link, and ./shell to run the shell sort program.

Command-Line Options

If you happen to have more than one C compiler for a given platform, you can select the one you want by adding one of these options to the ec or ecw command line:

      -bor
      -lcc
      -wat
      -djg

For example:

      ecw -bor pretend.exw
 
To make a Windows .dll file, or Linux or FreeBSD .so file, just add -dll to the command line. e.g.

      ecw -bor -dll mylib.ew
 
Note:
For Lcc and Borland there is no standard environment variable, so the Translator will search your PATH variable looking for a likely compiler directory. It looks in standard places such as: ..\LCC, ..\BCC.., ..\Borland.. etc. If you've installed in a nonstandard place you might have to rename your installation directory.


5. Dynamic Link Libraries (Shared Libraries)

If you add -dll to the command line, the Translator will build a Windows .dll (Linux/FreeBSD .so) file instead of an executable program.

You can translate and compile a set of useful Euphoria routines, and share them with other people, without giving them your source. Furthermore, your routines will likely run much faster when translated and compiled. Both translated/compiled and interpreted programs will be able to use your library.

Only the global Euphoria procedures and functions, i.e. those declared with the "global" keyword, will be exported from the .dll (.so).

Any Euphoria program, whether translated/compiled or interpreted, can link with a Euphoria .dll (.so) using the same mechanism that lets you link with a .dll (.so) written in C. The program first calls open_dll() to open the .dll or .so file, then it calls define_c_func() or define_c_proc() for any routines that it wants to call. It calls these routines using c_func() and c_proc(). See library.doc for the details.

The routine names exported from a Euphoria .dll will vary depending on which C compiler you use.

GNU C on Linux or FreeBSD exports the names exactly as they appear in the C code produced by the Translator, e.g. a Euphoria routine

      global procedure foo(integer x, integer y)

would be exported as "_0foo" or maybe "_1foo" etc. The underscore and digit are added to prevent naming conflicts. The digit refers to the Euphoria file where the symbol is defined. The main file is numbered as 0. The include files are numbered in the order they are encountered by the compiler. You should check the C source to be sure.

Lcc would export foo() as "__0foo@8", where 8 is the number of parameters (2) times 4. You can check the .def file created by the Translator to see all the exported names.

For Borland the Translator also creates a .def file, but this .def file renames the exported symbols back into the same names that you used in your Euphoria source, so foo() would be exported as "foo".

For Watcom the same renaming as with Borland occurs, but instead of a .def file, an EXPORT command is added to objfiles.lnk for each exported symbol.

With Borland and Watcom you can edit the .def or objfiles.lnk file, and rerun emake.bat, to rename the exported symbols, or remove ones that you don't want to export. With Lcc you can remove symbols but you can't rename them.
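
The naming conventions described above can be summarized in a short C sketch. This is only an illustration of the rules as stated in this section, not code taken from the Translator. The type compiler_t and the function exported_name() are hypothetical names introduced here, and the sketch assumes the file number is simply printed in decimal. Always check the generated C source, .def file or objfiles.lnk for the real names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical labels for the four compiler families discussed above. */
typedef enum { CC_GNU, CC_LCC, CC_BORLAND, CC_WATCOM } compiler_t;

/*
 * Build the name under which a global Euphoria routine would be exported,
 * following the conventions described in the text:
 *   GNU C:           _<file><name>               e.g. _0foo
 *   Lcc:             __<file><name>@<4*params>   e.g. __0foo@8
 *   Borland/Watcom:  <name>                      e.g. foo
 * file_no is 0 for the main file, 1 and up for include files.
 * Returns the number of characters needed, as snprintf() does.
 */
int exported_name(char *out, size_t size, compiler_t cc,
                  const char *name, int file_no, int n_params)
{
    assert(out != NULL && name != NULL && strlen(name) > 0);
    assert(file_no >= 0 && n_params >= 0);

    switch (cc) {
    case CC_GNU:
        return snprintf(out, size, "_%d%s", file_no, name);
    case CC_LCC:
        return snprintf(out, size, "__%d%s@%d", file_no, name, 4 * n_params);
    default:
        /* Borland and Watcom: the symbol is renamed back to the
           original Euphoria name. */
        return snprintf(out, size, "%s", name);
    }
}
```

For the routine foo(integer x, integer y) defined in the main file, this sketch yields "_0foo" for GNU C, "__0foo@8" for Lcc, and "foo" for Borland and Watcom, matching the examples given above.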

Having nice exported names is not critical, since the name need only appear once in each Euphoria program that uses the .dll, i.e. in a single define_c_func() or define_c_proc() statement. The author of a .dll should probably provide users with a Euphoria include file containing the necessary define_c_func() and define_c_proc() statements, and might even provide a set of Euphoria "wrapper" routines to call the routines in the .dll.

When you call open_dll(), any top-level Euphoria statements in the .dll or .so will be executed automatically, just like a normal program. This gives the library a chance to initialize its data structures prior to the first call to a library routine. For many libraries no initialization is required.

To pass Euphoria data (atoms and sequences) as arguments, or to receive a Euphoria object as a result, you will need to add the following new data types to euphoria\include\dll.e:

      -- New Euphoria types for .dll (.so) arguments and return values:
      global constant
          E_INTEGER  = #06000004,
          E_ATOM     = #07000004,
          E_SEQUENCE = #08000004,
          E_OBJECT   = #09000004

Use these in define_c_proc() and define_c_func() just as you currently use C_INT, C_UINT etc. to call C .dll's and .so's.

Currently, file numbers returned by open(), and routine id's returned by routine_id(), can be passed and returned, but the library and the main program each have their own separate ideas of what these numbers mean. Instead of passing the file number of an open file, you could instead pass the file name and let the .dll (.so) open it. Unfortunately there is no simple solution for passing routine id's. This might be fixed in the future.

Euphoria .dlls (.so's) can also be used by C programs as long as only 31-bit integer values are exchanged. If a 32-bit pointer or integer must be passed, and you have the source to the C program, you could pass the value in two separate 16-bit integer arguments (upper 16 bits and lower 16 bits), and then combine the values in the Euphoria routine into the desired 32-bit atom.


6. Executable Size and Compression

On DOS32 with Watcom, if the Translator finds the CauseWay files cwc.exe and le23p.exe in euphoria\bin, it will add commands to emake.bat that will compress your executable file. If you don't want compression, you can edit emake.bat, or remove or rename cwc.exe and/or le23p.exe.

On Linux, FreeBSD, Windows, and DOS32 with DJGPP, emake does not include a command to compress your executable file. If you want to do this we suggest you try the free UPX compressor. You can get UPX from: http://upx.sourceforge.net

The Translator deletes routines that are not used, including those from the standard Euphoria include files. After deleting unused routines, it checks again for more routines that have now become unused, and so on. This can make a big difference, especially with Win32Lib-based programs where a large file is included, but many of the included routines are not used in a given program.
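
The repeated "delete, then check again" process can be pictured with a small C sketch. This is not the Translator's actual algorithm or data structure. The program_t type and prune_unused() are hypothetical, and the sketch assumes a simple table recording which routine calls which, with routine 0 standing for the top-level code that is always kept.

```c
#include <assert.h>
#include <string.h>

#define MAX_ROUTINES 16

/* calls[i][j] != 0 means routine i contains a call to routine j. */
typedef struct {
    int n;                                  /* number of routines */
    int calls[MAX_ROUTINES][MAX_ROUTINES];
    int deleted[MAX_ROUTINES];              /* 1 once a routine is removed */
} program_t;

/*
 * Repeatedly delete routines that have no remaining caller, until a full
 * pass deletes nothing. Routine 0 (top-level code) is never deleted.
 * A routine calling itself does not count as a caller.
 * Returns the total number of routines deleted.
 */
int prune_unused(program_t *p)
{
    int total = 0;
    int changed = 1;

    assert(p != NULL && p->n > 0 && p->n <= MAX_ROUTINES);

    while (changed) {
        changed = 0;
        for (int j = 1; j < p->n; j++) {
            int has_caller = 0;
            if (p->deleted[j])
                continue;
            for (int i = 0; i < p->n; i++) {
                if (i != j && !p->deleted[i] && p->calls[i][j]) {
                    has_caller = 1;
                    break;
                }
            }
            if (!has_caller) {
                p->deleted[j] = 1;   /* may leave its callees unused too */
                total++;
                changed = 1;
            }
        }
    }
    return total;
}
```

If an include file defines routine 3, which calls routine 2, and nothing calls routine 3, the first pass removes routine 3 and the second pass removes routine 2. A simple caller-count scheme like this one would not remove a group of unused routines that only call each other; whether the Translator handles that case is not stated here.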

Nevertheless, your compiled executable file will likely be larger than the same Euphoria program bound with the Interpreter. This is partly due to the Interpreter being a compressed executable. Also, Euphoria statements are extremely compact when stored in a bound file. They need more space after being translated to C, and compiled into machine code. Future versions of the Translator will produce faster and smaller executables.


7. Interpreter vs. Translator

All Euphoria programs can be translated to C, and with just a few exceptions noted below, will run the same as with the Interpreter (but hopefully faster).

The Interpreter and Translator share the same parser, so you will get the same syntax errors, variable not declared errors etc. with either one.

The Translator reads your whole program before trying to do any translation. Occasionally it might catch a syntax error that the Interpreter doesn't see, because the Interpreter starts executing top-level statements immediately without waiting for the end of your program. This also means that if you have top-level statements in your program that modify a file that is later included, you will get a different result with the Translator. Very few programs use this "dynamic include" technique.

The Interpreter automatically expands the call stack (until memory is exhausted), so you can have a huge number of levels of nested calls. Most C compilers, on most systems, have a pre-set limit on the size of the stack. Consult your compiler or linker manual if you want to increase the limit, for example if you have a recursive routine that might need thousands of levels of recursion. Modify the link command in emake.bat. For Watcom C, use OPTION STACK=nnnn, where nnnn is the number of bytes of stack space.

Note:
The Translator assumes that your program has no run-time errors in it that would be caught by the Interpreter. The Translator does not check for: subscript out of bounds, variable not initialized, assigning the wrong type of data to a variable, etc.

You should debug your program with the Interpreter. When C code crashes you'll typically get a very cryptic machine exception. If your translated .exe program does crash, the first thing you should do is run your program with the Interpreter, using the same inputs, and preferably with type_check turned on.

Some of the run-time routines are still capable of catching an error and reporting it to the file ex.err.
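
To make the difference concrete, here is a minimal C sketch contrasting a checked subscript, of the kind the Interpreter performs, with an unchecked one, which is roughly what compiled code without run-time checks amounts to. The int_seq_t type and both functions are hypothetical stand-ins. They are not taken from euphoria.h or from the C code the Translator generates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sequence of integers. */
typedef struct {
    const int *data;
    size_t length;
} int_seq_t;

/* Checked access with a 1-based index, as in Euphoria.
   Returns 1 and stores the element on success, 0 if out of bounds. */
int seq_get_checked(const int_seq_t *s, size_t index, int *out)
{
    assert(s != NULL && out != NULL);
    if (index < 1 || index > s->length)
        return 0;                 /* "subscript out of bounds" */
    *out = s->data[index - 1];
    return 1;
}

/* Unchecked access: faster, but an out-of-range index reads whatever
   memory happens to be there, or crashes with a machine exception. */
int seq_get_unchecked(const int_seq_t *s, size_t index)
{
    return s->data[index - 1];
}
```

With valid subscripts both versions return the same element. Only the checked version can report a bad subscript, which is why a crashing translated program should be rerun under the Interpreter.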


8. Legal Restrictions

As far as RDS is concerned, any executable programs that you create with this Translator may be distributed royalty-free. You are free to incorporate any Euphoria files provided by RDS into your application.

You may not distribute any Complete Edition Translator, or any run-time library file that comes with the Complete Edition for any platform.

You may not distribute any hacked or cracked (reverse engineered) versions of any library or Translator, whether it's part of the Public Domain or Complete Edition.

In January 2000, the CauseWay DOS extender was donated to the public domain by Michael Devore. He has surrendered his copyright, and encourages anyone to use it freely, including for commercial use.

In general, if you wish to use Euphoria code written by 3rd parties, you had better honor any restrictions that apply. If in doubt, you should ask for permission.

On Linux, FreeBSD and DJGPP for DOS32, the GNU Library licence will normally not affect programs created with this Translator. Simply compiling with GNU C does not give the Free Software Foundation any jurisdiction over your program. If you statically link their libraries you will be subject to their Library licence, but the standard compile/link procedure in emake does not statically link any FSF libraries, so there should be no problem. The ncurses library is the only one statically linked, and although the Free Software Foundation now holds the copyright, ncurses is not subject to the GNU Library licence, since it was donated to FSF by authors who did not wish the GNU licence to apply to it. See ncurses.h for the copyright notice.

Disclaimer:
This is what we believe to be the case. We are not lawyers. If it's important to you, you should read the GNU Library licence, the ncurses licence, the legal comments in DJGPP, Lcc and Borland, and Michael Devore's read.me file on his site, to form your own judgement.


9. The Complete Edition Translator

Programs (including .dll's and .so's) created using the free Public Domain Translator will display a brief message on the screen prior to execution.

When you register for the Complete Edition Translator, you'll get:

  • The Complete Edition Translator for all platforms and all supported C compilers.

  • No more message and delay at the start of execution.

  • "with trace" and trace(3) will let you see the Euphoria statement that was being performed (by the C code) when a crash occurred. You'll also see a trace of the (up to) 500 preceding Euphoria statements. Look for a file called ctrace.out (produced by a main program) or ctrace-d.out (produced by a .dll or .so) in the current directory. ctrace.out is used as a circular buffer to hold traced statements. Once 500 statements are written to ctrace.out, the file pointer is reset to the start. This prevents the file from getting too large. The Euphoria statement that was being executed when the program terminated is the one just prior to the line: "=== THE END ===". If you get a run-time error message, this line might be displayed for you automatically.

  • With most of the C compilers, pressing control-C will stop a program. ctrace.out will show where the program was executing when it was stopped. This is a good way to diagnose infinite loops.

  • Your program will run much slower when ctrace.out is being written. You can help the situation by calling trace(3) and trace(0) to select the portions of your program that you want to trace. You can also use "without trace" to eliminate tracing in certain parts of your code.

  • Euphoria statements will be inserted as comments in the C source. You'll have a clearer idea of how each Euphoria statement has been implemented in C.

  • The usual benefits of being a registered user:

    • free upgrades for at least 12 months

    • discounts on future upgrades

    • 3 months free tech support from RDS

    • ability to vote $3.00 per month in the Micro-Economy
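
The circular-buffer behaviour of ctrace.out described in the list above can be pictured with a small in-memory sketch in C. The real trace is written to a file whose file pointer is reset after 500 statements. This sketch keeps the lines in an array instead, and the trace_ring_t type, its functions and the 80-character line width are assumptions made for illustration only.

```c
#include <assert.h>
#include <string.h>

#define TRACE_SLOTS 500   /* the text says up to 500 statements are kept */
#define TRACE_WIDTH 80    /* assumed maximum length of one traced line */

typedef struct {
    char line[TRACE_SLOTS][TRACE_WIDTH];
    long written;         /* total number of statements recorded so far */
} trace_ring_t;

void trace_init(trace_ring_t *r)
{
    assert(r != NULL);
    r->written = 0;
}

/* Record one statement, overwriting the oldest entry once the
   buffer is full (the analogue of resetting the file pointer). */
void trace_record(trace_ring_t *r, const char *stmt)
{
    char *slot;
    assert(r != NULL && stmt != NULL);
    slot = r->line[r->written % TRACE_SLOTS];
    strncpy(slot, stmt, TRACE_WIDTH - 1);
    slot[TRACE_WIDTH - 1] = '\0';
    r->written++;
}

/* The most recently recorded statement, or NULL if there is none.
   In ctrace.out this is the line just before "=== THE END ===". */
const char *trace_last(const trace_ring_t *r)
{
    assert(r != NULL);
    if (r->written == 0)
        return NULL;
    return r->line[(r->written - 1) % TRACE_SLOTS];
}

/* Number of statements currently held, never more than TRACE_SLOTS. */
long trace_count(const trace_ring_t *r)
{
    assert(r != NULL);
    return r->written < TRACE_SLOTS ? r->written : TRACE_SLOTS;
}
```

Because old entries are overwritten in place, the storage never grows beyond 500 lines, which is the same reason ctrace.out stays small however long the program runs.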


10. Frequently Asked Questions

Q - How much of a speed-up should I expect?
A - It all depends on what your program spends its time doing. Programs that use mainly integer calculations, don't call run-time routines very often, and don't do much I/O will see the greatest improvement, currently up to about 5x faster. Other programs may see only a few percent improvement.

The various C compilers are not equal in optimization ability. Watcom, GNU C and DJGPP produce the fastest code. Borland is fairly good. Lcc lags slightly behind the others, even when its -O flag is used.

Borland compiles the fastest. Watcom compiles the slowest.

Q - Should I register for the Translator without first registering for the Interpreter package?
A - Debugging large programs will be easier if you have the Complete Edition Interpreter. Ideally you should develop and debug your code thoroughly with the Interpreter, and use the Translator as the final step to gain speed. You can, however, buy the Interpreter or Translator package separately, and buy the other package later.

Q - If I buy the Interpreter Source Code, will I be able to modify the Translator run-time library?
A - No. You will, however, be able to view the source to most of the routines in the Translator run-time library, and include modified versions of some of the routines in your program. You will also be able to build your own interpreter.

Q - What if I want to change the compile or link options in emake.bat?
A - Feel free to do so. However, you should copy emake.bat to your own file called (say) mymake.bat, then run mymake.bat after running the Translator. Occasionally the number of .c files produced by the Translator could change, so you may need to update your copy.

Q - How can I make my program run even faster?
A - It's important to declare variables as integer where possible. In general, it helps if you choose the most restrictive type possible when declaring a variable.

Typical user-defined types will not slow you down. Since your program is supposed to be free of type_check errors, types are ignored by the Translator, unless you call them directly with normal function calls. The one exception is when a user-defined type routine has side-effects (i.e. it sets a global variable, performs pokes into memory, I/O etc.). In that case, if "with type_check" is in effect, the Translator will emit code to call the type routine and report any type_check failure that results.

On Windows and DOS we have left out the /ol loop optimization for Watcom's wcc386. We found in a couple of rare cases that this option led to incorrect machine code being emitted by the Watcom C compiler. If you add it back in to your own version of emake.bat you might get a slight improvement in speed, with a slight risk of buggy code. For DJGPP you might try -O6 instead of -O2.

For DOS we use the Watcom /fpc option which generates calls to run-time routines to perform floating-point operations. If the machine has floating-point hardware it will be used by the routine, otherwise software emulation will be used. This slows things down somewhat, and isn't needed on Pentiums, but it guarantees that your program will run on all 386 and 486 machines, even if they lack floating-point hardware. The DOS run-time library, ec.lib, was built this way, so you can't simply remove this option.

On Linux or FreeBSD you could try the -O3 option of gcc instead of -O2. It will "in-line" small routines, improving speed slightly, but creating a larger executable.


11. Common Problems

Many large programs have been successfully translated and compiled using each of the supported C compilers, and the Translator is now quite stable.

Note:
On Windows, if you call a C routine that uses the cdecl calling convention (instead of stdcall), you must specify a '+' character at the start of the routine's name in define_c_proc() and define_c_func(). If you don't, the call may work when running the exw Interpreter, but will probably fail (crash) when you translate and compile with Borland or Lcc.

In some cases a huge Euphoria routine is translated to C, and it proves to be too large for the C compiler to process. If you run into this problem, make your Euphoria routine smaller and simpler. You can also try turning off C optimization in emake.bat for just the .c file that fails. Breaking up a single constant declaration of many variables into separate constant declarations of a single variable each may also help. Euphoria has no limits on the size of a routine, or the size of a file, but most C compilers do. The Translator will automatically produce multiple small .c files from a large Euphoria file to avoid stressing the C compiler. It won't, however, break a large routine into smaller routines.

Send bug reports to rds@RapidEuphoria.com
In particular, let us know if any translated program does not run the same when compiled as it does when interpreted.