Initial import of l2c

[snipe.git] / README
diff --git a/README b/README

index 0e3bf3da60d20ade4382a58905e607f906ee0afc..66a2e239e1769abfc350a7c1424d9808f9015427 100644 (file)
--- a/README
+++ b/README
@@ -1,144 +1,69 @@
-(* README
- * Author: Frank Pfenning <fp@cs.cmu.edu>
- *)
-
------------------------------------------------------------------------
-Welcome to 15-411 F08!
------------------------------------------------------------------------
-
-This is some starter code for the L1 compiler you have to build for
-the Lab1.  It contains a lexer, parser, translator, and even a code
-generator, except that the code generator creates pseudo assembly
-language with fictitious instructions and an unlimited number of
-registers.  We took some care to use good style (according to the
-instructor); you may consider this a model for your own coding.  Feel
-free to modify any and all of this code as you see fit.
-
-Bug reports to the instructor Frank Pfenning <fp@cs.cmu.edu> are
-particularly welcome and will be noted in the extra credit category.
-
------------------------------------------------------------------------
-SML Notes
------------------------------------------------------------------------
-There are many different compilers for SML, perhaps the most
-popular ones are
-
-  SML/NJ  -- http://www.smlnj.org/
-  MLton   -- http://www.mlton.org/
-  Poly/ML -- http://www.polyml.org/
-
-In this class we will be using SML/NJ v110.59.  Please make sure your
-code compiles under specifically this version on the lab machines
-where it is the default and can be invoked simply with "sml" in a
-shell.
-
-If you develop your implementation on other machines, similar versions
-of SML/NJ are likely to be compatible, but you should certainly check
-your code on the lab machines.
-
-For (almost universal) Standard Basis Libraries, see
-http://www.standardml.org/Basis/index.html.  Further resources, such
-as documentation for ML-Lex and ML-Yacc, and documentation for the SML/NJ
-specific libraries which are used in the starter code, can be found at
-
-  http://www.cs.cmu.edu/~fp/courses/15411-f08/resources.html
-
-------------------------------------------------------------------------
-Source Files
-------------------------------------------------------------------------
-The following are the source files for the L1 compiler
-
-README               -- this file
-
-Makefile             -- makefile for the compiler
-                        For a quick test
-
-    % make l1c          (generates file bin/l1c.heap.<os-tag>)
-    % bin/l1c --verbose ../tests/test1.c
-
-                       should generate ../tests/test1.s in pseudo assembly
-
-    % make clean        (removes generated files)
-    % make TAGS         (creates file TAGS, for Emacs tags commands)
-
-compile-l1c.sml      -- SML commands that will create bin/l1c.heap.<os-tag>
-bin/l1c              -- the script that will run the exported SML heap
-
-sources.cm           -- lists all source files, including libraries,
-                        and lexer and grammar specifications
-                       For a quick test
-
-    % sml
-    - CM.make "sources.cm";
-    - Top.test "--verbose ../tests/test1.c";
-
-                        should generate ../tests/test1.s in pseudo assembly
-
-parse/ast.sml        -- definition and printer for abstract syntax trees (AST's)
-parse/l1.lex         -- L1 lexer
-parse/l1.grm         -- L1 grammar
-parse/parse.sml      -- L1 parser
-parse/parsestate.sml -- L1 parser support for error messages
-
-type/typechecker.sml -- (trivial) type-checker for AST
-
-trans/temp.sml       -- functions to generate and track temp's
-trans/tree.sml       -- definition and pretty printer for IR trees
-trans/trans.sml      -- translation from AST to IR trees
-
-codegen/assem.sml    -- pseudo assembly format for this starter code
-codegen/codegen.sml  -- pseudo code generator
-
-util/errormsg.sml    -- error message utilities
-util/flag.sml        -- library for defining flags
-util/mark.sml        -- library for tracking source file positions
-util/safe-io.sml     -- I/O utilities
-util/symbol.sml      -- symbol table library
-util/word32.sml      -- machine word utilities for two's complement interpretation
-
-top/top.sml          -- top level function for export to binary and testing
-
-------------------------------------------------------------------------
-Debugging Hints
-------------------------------------------------------------------------
-You can use
-
-  - Top.test "--verbose --dump-ast --dump-ir --dump-assem file.l1";
-
-to print information from all the phases of the current compiler.
-
-If you want to see the internal representations, you can call directly
-on SML's top level:
-
-  - val ast = Parse.parse "file.l1";
-  - val ir = Trans.translate ast;
-  - val assem = Codegen.codgen ir;
-
-This will use SML's internal printing function to print the data
-structures.  However, not everything will show.
-
-"-" means that the type is opaque.  Sometimes you can replace an opaque
-    signature ascription ":>" with a transparent one ":" to see the info.
-    For reasons of general hygiene, however, you should change it back
-    before handing in.
-
-"#" means that the printing depth is exceeded.  Use
-
-      - Control.Print.printDepth := 100;
-
-    to increase the depth if you need to see more.
-
-"..." means that the printing length is exceeded.  Use
-
-      - Control.Print.printLength := 1000;
-
-    to increase the length if you need to see more.
-
-------------------------------------------------------------------------
-Library Hints
-------------------------------------------------------------------------
-See util/symbol.sml for some uses of libraries provided with SML/NJ
-(and some other SML implementations).  BinaryMapFn and
-BinarySetFn are likely of general use.  To see their interface,
-you can check http://www.smlnj.org/doc/smlnj-lib/Manual/toc.html.
-I found binary maps and binary sets to be occasionally helpful.
+README
+------
+
+This compiler is a big long chain of modules that transform l2 code into
+x86_64 assembly.
+
+These modules include:
+
+  * The parser.  The parser was mainly brought in from lab 1, and mainly
+    just a straight-forward extension of the l1 parser.  We continued to
+    mark expressions, and pass marking through as needed so that we could
+    produce reasonable error messages up through translation stage.  We
+    introduced all needed grammar with no shift/reduce conflicts, but for
+    one in the IF/ELSE stage, with a construct such as:
+      if (x)
+      if (z)
+      a
+      else
+      b
+    (indentation intentionally omitted; there are at least two legitimate
+    ways to parse that!)
+  * The typechecker.  This module was completely rewritten since lab1.  Three
+    checks are instituted: a check to see if the program has misplaced break
+    or continue statements, a check to see that the program returns in all
+    control paths, and a check that all variables are initialized in all
+    control paths before usage.
+    
+    The return and break check is essentially implemented per the rules; the
+    only thing of interest for the variable initialization routine is that
+    there is a helper that computes all assigns to extend contexts from
+    block contents.  It was determined that returning 2 accumulators from
+    varcheck would lead to returning 17 accumulators, which would lead to
+    1984193248148132 accumulators; and 238547854478 accumulators leads to
+    the foldl, and foldl leads to anger, anger leads to hate, and hate leads
+    to the Dark Side.
+  * The translator is mainly intact; it was determined that the IR will have
+    basic control flow instructions of labels, jumps, and jump if not
+    conditional, which we deemed sufficient to implement all forms of l2
+    control.
+  * The munch module was fully rewritten; we now munch directly to
+    pseudo-x86_64, in that it has temporaries allowed in it as well.  We
+    believe that this allows us to generate much more optimal code than
+    munching into three op, converting from three to two, then converting
+    two to x86_64; in particular, we can run liveness on the x86_64
+    directions directly, which makes translation significantly easier (we do
+    not have to worry about mashing necessary registers).
+  * The liveness analyzer was also fully rewritten; it is now fully
+    def-use-succ, giving us very pretty rules, and a lot of very ugly code
+    to mash them together.  Luckily, the ugly code need not be touched ever
+    again.
+  * The grapher had about 4 characters of inconsequential change that had
+    the useful property of speeding it up by two orders of magnitude.  You
+    need not worry about it.
+  * The orderer and colorer had no changes.
+  * A new module was introduced -- in particular, the solidifier.  The
+    solidifier takes pseudo-x86_64 that is annotated with register locations
+    and emits needed spill and unspill operations to get everything into
+    real registers that the x86_64 chips can access.
+  * The peepholer remains pretty simple; redundant moves are optimized out,
+    and hence the code size drops by a factor of 1.5 or so.
+  * The stringifier is of no interest to you, for it does real things that
+    interact with the real world, and that is not of interest to people who
+    write in ML.
+
+We believe that it's fully functional; we have not had a case in quite some
+time that caused us to generate incorrect code (at least, when we should
+generate code).  The internal debug mechanisms are very useful; often a
+line-by-line examination of dumps after each translation phase can narrow
+bugs down into single lines of ML code.