X-Git-Url: http://git.joshuawise.com/snipe.git/blobdiff_plain/12aa4087bee3e70f170d7457794921de4e385227..0a24e44d4e9f82f8d3d83de8e58c83c8cf2868b6:/README diff --git a/README b/README index 0e3bf3d..66a2e23 100644 --- a/README +++ b/README @@ -1,144 +1,69 @@ -(* README - * Author: Frank Pfenning - *) - ------------------------------------------------------------------------ -Welcome to 15-411 F08! ------------------------------------------------------------------------ - -This is some starter code for the L1 compiler you have to build for -the Lab1. It contains a lexer, parser, translator, and even a code -generator, except that the code generator creates pseudo assembly -language with fictitious instructions and an unlimited number of -registers. We took some care to use good style (according to the -instructor); you may consider this a model for your own coding. Feel -free to modify any and all of this code as you see fit. - -Bug reports to the instructor Frank Pfenning are -particularly welcome and will be noted in the extra credit category. - ------------------------------------------------------------------------ -SML Notes ------------------------------------------------------------------------ -There are many different compilers for SML, perhaps the most -popular ones are - - SML/NJ -- http://www.smlnj.org/ - MLton -- http://www.mlton.org/ - Poly/ML -- http://www.polyml.org/ - -In this class we will be using SML/NJ v110.59. Please make sure your -code compiles under specifically this version on the lab machines -where it is the default and can be invoked simply with "sml" in a -shell. - -If you develop your implementation on other machines, similar versions -of SML/NJ are likely to be compatible, but you should certainly check -your code on the lab machines. - -For (almost universal) Standard Basis Libraries, see -http://www.standardml.org/Basis/index.html. Further resources, such -as documentation for ML-Lex and ML-Yacc, and documentation for the SML/NJ -specific libraries which are used in the starter code, can be found at - - http://www.cs.cmu.edu/~fp/courses/15411-f08/resources.html - ------------------------------------------------------------------------- -Source Files ------------------------------------------------------------------------- -The following are the source files for the L1 compiler - -README -- this file - -Makefile -- makefile for the compiler - For a quick test - - % make l1c (generates file bin/l1c.heap.) - % bin/l1c --verbose ../tests/test1.c - - should generate ../tests/test1.s in pseudo assembly - - % make clean (removes generated files) - % make TAGS (creates file TAGS, for Emacs tags commands) - -compile-l1c.sml -- SML commands that will create bin/l1c.heap. -bin/l1c -- the script that will run the exported SML heap - -sources.cm -- lists all source files, including libraries, - and lexer and grammar specifications - For a quick test - - % sml - - CM.make "sources.cm"; - - Top.test "--verbose ../tests/test1.c"; - - should generate ../tests/test1.s in pseudo assembly - -parse/ast.sml -- definition and printer for abstract syntax trees (AST's) -parse/l1.lex -- L1 lexer -parse/l1.grm -- L1 grammar -parse/parse.sml -- L1 parser -parse/parsestate.sml -- L1 parser support for error messages - -type/typechecker.sml -- (trivial) type-checker for AST - -trans/temp.sml -- functions to generate and track temp's -trans/tree.sml -- definition and pretty printer for IR trees -trans/trans.sml -- translation from AST to IR trees - -codegen/assem.sml -- pseudo assembly format for this starter code -codegen/codegen.sml -- pseudo code generator - -util/errormsg.sml -- error message utilities -util/flag.sml -- library for defining flags -util/mark.sml -- library for tracking source file positions -util/safe-io.sml -- I/O utilities -util/symbol.sml -- symbol table library -util/word32.sml -- machine word utilities for two's complement interpretation - -top/top.sml -- top level function for export to binary and testing - ------------------------------------------------------------------------- -Debugging Hints ------------------------------------------------------------------------- -You can use - - - Top.test "--verbose --dump-ast --dump-ir --dump-assem file.l1"; - -to print information from all the phases of the current compiler. - -If you want to see the internal representations, you can call directly -on SML's top level: - - - val ast = Parse.parse "file.l1"; - - val ir = Trans.translate ast; - - val assem = Codegen.codgen ir; - -This will use SML's internal printing function to print the data -structures. However, not everything will show. - -"-" means that the type is opaque. Sometimes you can replace an opaque - signature ascription ":>" with a transparent one ":" to see the info. - For reasons of general hygiene, however, you should change it back - before handing in. - -"#" means that the printing depth is exceeded. Use - - - Control.Print.printDepth := 100; - - to increase the depth if you need to see more. - -"..." means that the printing length is exceeded. Use - - - Control.Print.printLength := 1000; - - to increase the length if you need to see more. - ------------------------------------------------------------------------- -Library Hints ------------------------------------------------------------------------- -See util/symbol.sml for some uses of libraries provided with SML/NJ -(and some other SML implementations). BinaryMapFn and -BinarySetFn are likely of general use. To see their interface, -you can check http://www.smlnj.org/doc/smlnj-lib/Manual/toc.html. -I found binary maps and binary sets to be occasionally helpful. +README +------ + +This compiler is a big long chain of modules that transform l2 code into +x86_64 assembly. + +These modules include: + + * The parser. The parser was mainly brought in from lab 1, and mainly + just a straight-forward extension of the l1 parser. We continued to + mark expressions, and pass marking through as needed so that we could + produce reasonable error messages up through translation stage. We + introduced all needed grammar with no shift/reduce conflicts, but for + one in the IF/ELSE stage, with a construct such as: + if (x) + if (z) + a + else + b + (indentation intentionally omitted; there are at least two legitimate + ways to parse that!) + * The typechecker. This module was completely rewritten since lab1. Three + checks are instituted: a check to see if the program has misplaced break + or continue statements, a check to see that the program returns in all + control paths, and a check that all variables are initialized in all + control paths before usage. + + The return and break check is essentially implemented per the rules; the + only thing of interest for the variable initialization routine is that + there is a helper that computes all assigns to extend contexts from + block contents. It was determined that returning 2 accumulators from + varcheck would lead to returning 17 accumulators, which would lead to + 1984193248148132 accumulators; and 238547854478 accumulators leads to + the foldl, and foldl leads to anger, anger leads to hate, and hate leads + to the Dark Side. + * The translator is mainly intact; it was determined that the IR will have + basic control flow instructions of labels, jumps, and jump if not + conditional, which we deemed sufficient to implement all forms of l2 + control. + * The munch module was fully rewritten; we now munch directly to + pseudo-x86_64, in that it has temporaries allowed in it as well. We + believe that this allows us to generate much more optimal code than + munching into three op, converting from three to two, then converting + two to x86_64; in particular, we can run liveness on the x86_64 + directions directly, which makes translation significantly easier (we do + not have to worry about mashing necessary registers). + * The liveness analyzer was also fully rewritten; it is now fully + def-use-succ, giving us very pretty rules, and a lot of very ugly code + to mash them together. Luckily, the ugly code need not be touched ever + again. + * The grapher had about 4 characters of inconsequential change that had + the useful property of speeding it up by two orders of magnitude. You + need not worry about it. + * The orderer and colorer had no changes. + * A new module was introduced -- in particular, the solidifier. The + solidifier takes pseudo-x86_64 that is annotated with register locations + and emits needed spill and unspill operations to get everything into + real registers that the x86_64 chips can access. + * The peepholer remains pretty simple; redundant moves are optimized out, + and hence the code size drops by a factor of 1.5 or so. + * The stringifier is of no interest to you, for it does real things that + interact with the real world, and that is not of interest to people who + write in ML. + +We believe that it's fully functional; we have not had a case in quite some +time that caused us to generate incorrect code (at least, when we should +generate code). The internal debug mechanisms are very useful; often a +line-by-line examination of dumps after each translation phase can narrow +bugs down into single lines of ML code.