diff --git a/flang/documentation/mod-files.md b/flang/documentation/mod-files.md
new file mode 100644
index 000000000000..95a190da04b0
--- /dev/null
+++ b/flang/documentation/mod-files.md
@@ -0,0 +1,124 @@
+# Module Files
+
+Module files hold information from a module that is necessary to compile
+program units that depend on the module.
+
+## Name
+
+Module files must be searchable by module name. They are typically named
+`<modulename>.mod`. The advantage of using `.mod` is that it is consistent with
+other compilers, so users will know what they are. Also, makefiles and scripts
+often use `rm *.mod` to clean up.
+
+The disadvantage of using the same name as other compilers is that it is not
+clear which compiler created a `.mod` file, and files from multiple compilers
+cannot be in the same directory. This could be solved by adding something
+between the module name and the extension, e.g. `<modulename>-f18.mod`.
+
+## Format
+
+The proposed format for module files is Fortran source.
+Declarations of all visible entities will be included, along with the private
+entities they depend on. Executable statements will be omitted.
+
+### Header
+
+There will be a header containing extra information that cannot be expressed
+in Fortran. This will take the form of a comment or directive
+at the beginning of the file.
+
+If it's a comment, the module file reader would have to strip it out and
+perform *ad hoc* parsing on it. If it's a directive, the compiler could
+parse it like other directives as part of the grammar.
+Processing the header before parsing might result in better error messages
+when the `.mod` file is invalid.
+
+Regardless of whether the header is a comment or a directive, we can use the
+same string to introduce it: `!mod$`.
+
+Information in the header:
+- Magic string to confirm it is an f18 `.mod` file
+- Version information: the version of the file format, in case it changes,
+  and the version of the compiler that wrote the file, for diagnostics
+- Checksum of the body of the current file
+- Modules the current module depends on, and the checksums of their module files
+  at the time the current module file is created
+- Source file dependency information?
+- Compilation options?
+
+### Body
+
+The body will consist of minimal Fortran source for the required declarations.
+Declarations will appear in the order in which they first appeared in the source.
+
+Some normalization will take place:
+- extraneous spaces will be removed
+- implicit types will be made explicit
+- attributes will be written in a consistent order
+- entity declarations will be combined into a single declaration
+- function return types specified in a *prefix-spec* will be replaced by
+  an entity declaration
+- etc.
+
+#### Symbols included
+
+All public symbols from the module need to be included.
+
+In addition, some private symbols are needed:
+- private types that appear in the public API
+- private components of non-private derived types
+- private parameters used in non-private declarations (initial values, kind parameters)
+- others?
+
+It might be possible to anonymize private names if users don't want them exposed
+in the `.mod` file. (Currently they are readable in PGI `.mod` files.)
+
+#### USE association
+
+A module that contains `USE` statements needs them represented in the
+`.mod` file.
+Each use-associated symbol will be written as a separate *use-only* statement,
+possibly with renaming.
+
+Alternatives:
+- Emit a single `USE` for each module, listing all of the symbols that were
+  use-associated in the *only-list*.
+- Detect when all of the symbols from a module are imported (either by a *use-stmt*
+  without an *only-list* or because all of the public symbols of the module
+  have been listed in *only-list*s). In that case, collapse them into a single *use-stmt*.
+- Emit the *use-stmt*s that appeared in the original source.
+
+## Reading and writing module files
+
+A command-line option (e.g. `-module`) will specify a directory to
+search for `.mod` files and to write them to.
+If not specified, it defaults to the current directory.
+
+### Writing module files
+
+When writing a module file, if the existing one matches what would be written,
+the existing file is left as is, so its timestamp is not updated.
+
+Module files will be written after semantic analysis, i.e. after the compiler has
+determined the module is valid Fortran.
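+
+The rule that an unchanged module file keeps its old timestamp could be
+implemented along the lines of this C++ sketch. It is only an illustration:
+`WriteResult`, `HasContents` and `WriteModFile` are hypothetical names, not
+part of f18, and the caller is assumed to invoke `WriteModFile` only after
+semantic analysis has accepted the module.
+
+```cpp
+// Hypothetical sketch: rewrite a module file only when its contents change,
+// so that an unchanged module keeps its old timestamp.
+#include <fstream>
+#include <sstream>
+#include <string>
+
+enum class WriteResult { Unchanged, Written, Error };
+
+// True if `path` can be opened and holds exactly `contents`.
+bool HasContents(const std::string &path, const std::string &contents) {
+  std::ifstream in{path, std::ios::binary};
+  if (!in) {
+    return false;  // no existing module file to compare against
+  }
+  std::ostringstream existing;
+  existing << in.rdbuf();  // read the whole existing file
+  return existing.str() == contents;
+}
+
+// Assumed to be called only after semantic analysis accepts the module.
+WriteResult WriteModFile(const std::string &path, const std::string &contents) {
+  if (HasContents(path, contents)) {
+    return WriteResult::Unchanged;  // file and timestamp are left untouched
+  }
+  std::ofstream out{path, std::ios::binary | std::ios::trunc};
+  if (!out) {
+    return WriteResult::Error;
+  }
+  out << contents;
+  out.close();
+  return out ? WriteResult::Written : WriteResult::Error;
+}
+```
+
+Comparing contents instead of always rewriting means that a timestamp-based
+build tool such as `make` sees no change to the `.mod` file when recompiling
+a module produces an identical module file.
+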
+**NOTE:** PGI sometimes creates `.mod` files even when the module has a
+compilation error.
+
+When the compiler gets far enough to determine that it is compiling a module
+but then encounters an error, it will delete the existing `.mod` file,
+if present.
+
+### Reading module files
+
+When the compiler finds a `.mod` file it needs to read, it first checks the first
+line and verifies that it is a valid module file. It can also verify the checksums of
+modules it depends on and report if they are out of date.
+
+If the header is valid, the module file will be run through the parser and name
+resolution to recreate the symbols from the module. Once the symbol table is
+populated, the parse tree can be discarded.
+
+When processing `.mod` files, we know they are valid Fortran with these properties:
+1. The input (without the header) is already in the "cooked input" format.
+2. No preprocessing is necessary.
+3. No errors can occur.
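+
+The first checks a reader makes could look roughly like this C++ sketch.
+The header layout `!mod$ v<version> sum:<hex>`, the use of 64-bit FNV-1a as the
+body checksum, and the names `Checksum`, `ParseHeader` and `IsValidModFile` are
+assumptions made for illustration; this document does not define them. Checking
+the recorded checksums of the modules it depends on would follow the same pattern.
+
+```cpp
+// Hypothetical sketch of validating a module file before parsing its body.
+// Assumed header line: "!mod$ v<version> sum:<hex checksum of the body>".
+#include <cstdint>
+#include <optional>
+#include <sstream>
+#include <stdexcept>
+#include <string>
+
+// 64-bit FNV-1a hash, standing in for whatever checksum f18 adopts.
+std::uint64_t Checksum(const std::string &text) {
+  std::uint64_t hash = 14695981039346656037ull;  // FNV offset basis
+  for (unsigned char c : text) {
+    hash ^= c;
+    hash *= 1099511628211ull;  // FNV prime
+  }
+  return hash;
+}
+
+struct ModHeader {
+  int version;
+  std::uint64_t bodyChecksum;
+};
+
+// Parses the first line; returns std::nullopt if it is not a valid header.
+std::optional<ModHeader> ParseHeader(const std::string &line) {
+  std::istringstream in{line};
+  std::string magic, version, sum;
+  if (!(in >> magic >> version >> sum) || magic != "!mod$" ||
+      version.size() < 2 || version[0] != 'v' || sum.rfind("sum:", 0) != 0) {
+    return std::nullopt;
+  }
+  try {
+    ModHeader header;
+    header.version = std::stoi(version.substr(1));
+    header.bodyChecksum = std::stoull(sum.substr(4), nullptr, 16);
+    return header;
+  } catch (const std::exception &) {
+    return std::nullopt;  // version or checksum is not a number
+  }
+}
+
+// True if `text` begins with a header for `expectedVersion` whose checksum
+// matches everything after the first line.
+bool IsValidModFile(const std::string &text, int expectedVersion) {
+  std::size_t eol = text.find('\n');
+  if (eol == std::string::npos) {
+    return false;  // no header line followed by a body
+  }
+  std::optional<ModHeader> header = ParseHeader(text.substr(0, eol));
+  return header && header->version == expectedVersion &&
+         header->bodyChecksum == Checksum(text.substr(eol + 1));
+}
+```
+
+A fuller version would report which check failed (magic string, version, or
+checksum), so the compiler could emit a specific diagnostic before the parser
+sees the body; that is the benefit of processing the header first that the
+Header section mentions.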