
Update TypeTree chapter to reflect newer understanding of it#2911

Open
ZuseZ4 wants to merge 1 commit into main from typetree-updates

Conversation

ZuseZ4 (Member) commented Jun 26, 2026

@scottmcm @workingjubilee does that match your understanding after the last discussions?

rustbot (Collaborator) commented Jun 26, 2026

Thanks for the PR. If you have write access, feel free to merge this PR if it does not need reviews. You can request a review using r? rustc-dev-guide or r? <username>.

rustbot added the S-waiting-on-review label (Status: this PR is waiting for a reviewer to verify its content) on Jun 26, 2026

workingjubilee (Member) left a comment

some typos and some concept-level thoughts


Memory layout descriptors for Enzyme. Tell Enzyme exactly how types are structured in memory so it can compute derivatives efficiently.
Memory layout descriptors for Enzyme. They tell Enzyme what "type" bytes are, with the main categories being Float, Integer, or Pointer. In Rust, memory is conceptually untyped, so it is possible to store a float into 4 bytes, and later read the bytes back as an integer. This is generally true in Rust even in the absence of `enum` or `union` types. We therefore can not directly put typetree metadata on allocations. We can also not accept Enzyme's default behaviour, which incorrectly assumes that LLVM-IR follows `strict aliasing` rules (known from C/C++). As a solution, we disable Enzyme's strict-aliasing behaviour and only generate TypeTree metadata in selected locations.

## Where we generate TypeTree

Suggested change
## Where we generate TypeTree
## Where we generate TypeTrees


## What are TypeTrees?
Memory layout descriptors for Enzyme. Tell Enzyme exactly how types are structured in memory so it can compute derivatives efficiently.
Memory layout descriptors for Enzyme. They tell Enzyme what "type" bytes are, with the main categories being Float, Integer, or Pointer. In Rust, memory is conceptually untyped, so it is possible to store a float into 4 bytes, and later read the bytes back as an integer. This is generally true in Rust even in the absence of `enum` or `union` types. We therefore can not directly put typetree metadata on allocations. We can also not accept Enzyme's default behaviour, which incorrectly assumes that LLVM-IR follows `strict aliasing` rules (known from C/C++). As a solution, we disable Enzyme's strict-aliasing behaviour and only generate TypeTree metadata in selected locations.

This can be split across multiple lines since Markdown (proper Markdown, like the book's) gets concatenated across newlines (but not double-newlines). This allows diffing individual sentences of a paragraph.

You do not have to do this if you do not like how it looks, it is just a suggestion.

Suggested change
Memory layout descriptors for Enzyme. They tell Enzyme what "type" bytes are, with the main categories being Float, Integer, or Pointer. In Rust, memory is conceptually untyped, so it is possible to store a float into 4 bytes, and later read the bytes back as an integer. This is generally true in Rust even in the absence of `enum` or `union` types. We therefore can not directly put typetree metadata on allocations. We can also not accept Enzyme's default behaviour, which incorrectly assumes that LLVM-IR follows `strict aliasing` rules (known from C/C++). As a solution, we disable Enzyme's strict-aliasing behaviour and only generate TypeTree metadata in selected locations.
Memory layout descriptors for Enzyme. They tell Enzyme what "type" bytes are, with the main categories being Float, Integer, or Pointer. In Rust, memory is conceptually untyped, so it is possible to store a float into 4 bytes, and later read the bytes back as an integer. This is generally true in Rust even in the absence of `enum` or `union` types. We therefore can not directly put typetree metadata on allocations. We can also not accept Enzyme's default behaviour, which incorrectly assumes that LLVM-IR follows `strict aliasing` rules (known from C/C++).
As a solution, we disable Enzyme's strict-aliasing behaviour and only generate TypeTree metadata where Rust actively asserts a type.

Memory layout descriptors for Enzyme. They tell Enzyme what "type" bytes are, with the main categories being Float, Integer, or Pointer. In Rust, memory is conceptually untyped, so it is possible to store a float into 4 bytes, and later read the bytes back as an integer. This is generally true in Rust even in the absence of `enum` or `union` types. We therefore can not directly put typetree metadata on allocations. We can also not accept Enzyme's default behaviour, which incorrectly assumes that LLVM-IR follows `strict aliasing` rules (known from C/C++). As a solution, we disable Enzyme's strict-aliasing behaviour and only generate TypeTree metadata in selected locations.
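
To make the "untyped memory" point concrete, here is a minimal sketch in which a float is stored into four bytes and the same bytes are then read back as an integer. The helper name `store_float_read_int` is made up for illustration and is not part of rustc or Enzyme.

```rust
/// Stores an `f32` into a 4-byte buffer and reads the same bytes back as a `u32`.
/// Hypothetical helper, used only to illustrate that the bytes carry no type.
fn store_float_read_int(value: f32) -> u32 {
    // Four plain bytes: nothing about this place says "float" or "integer".
    let mut storage = [0u8; 4];
    let place = storage.as_mut_ptr();
    unsafe {
        // SAFETY: `place` points to 4 writable bytes, the unaligned accessors
        // impose no alignment requirement, and every 4-byte bit pattern is a
        // valid `f32` and a valid `u32`.
        place.cast::<f32>().write_unaligned(value);
        place.cast::<u32>().read_unaligned()
    }
}
```

Calling `store_float_read_int(1.0)` returns `0x3f800000`, the bit pattern of `1.0f32`. No `enum` or `union` is involved, which is why a TypeTree attached to the allocation itself could be wrong for one of the two accesses.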

## Where we generate TypeTree
The underlying idea is that memory "at rest" is untyped, but plenty of usages interprete bytes in a way that we can communicate to Enzyme. For example, when we call a function, the memory passed to it is interpreted according to the function's signature, so we can add TypeTrees to the LLVM-IR function definitions. We currently only do that for the outermost functions differentiated (those that have a `#[autodiff]` macro on them), but we plan to extend it to all functions which are called from them. We currently also generate TypeTree information for all calls to mem{cpy|move|set}. Finally, we started to add TypeTrees to the input or return values of certain instructions, for now that mainly is `extractvalue`.

Suggested change
The underlying idea is that memory "at rest" is untyped, but plenty of usages interprete bytes in a way that we can communicate to Enzyme. For example, when we call a function, the memory passed to it is interpreted according to the function's signature, so we can add TypeTrees to the LLVM-IR function definitions. We currently only do that for the outermost functions differentiated (those that have a `#[autodiff]` macro on them), but we plan to extend it to all functions which are called from them. We currently also generate TypeTree information for all calls to mem{cpy|move|set}. Finally, we started to add TypeTrees to the input or return values of certain instructions, for now that mainly is `extractvalue`.
The underlying idea is that while the memory of a place is untyped, plenty of usages impose a type assertion on bytes in ways that we can communicate to Enzyme.
For example, when we call a function, its arguments and return values are passed by typed copies matching the function's signature, so we can add TypeTrees to the LLVM-IR function definitions.
We currently only do that for the outermost functions differentiated (those that have a `#[autodiff]` macro on them), but we plan to extend it to all functions which are called from them. We currently also generate TypeTree information for all calls to mem{cpy|move|set}. Finally, we started to add TypeTrees to the input or return values of certain instructions, for now that mainly is `extractvalue`.

So this one is not just nitpicking the spelling of "interpret"... the interpretation is a very specific kind, and it applies to the values that receive what are often referred to as "typed copies".

Saying "memory" risks being vague for people because "memory" can mean both the values that receive typed copies and the memory in a place, and an argument can be a pointer to a place.

The validity requirements here largely match those documented for `transmute`: https://doc.rust-lang.org/std/mem/fn.transmute.html

The underlying idea is that memory "at rest" is untyped, but plenty of usages interprete bytes in a way that we can communicate to Enzyme. For example, when we call a function, the memory passed to it is interpreted according to the function's signature, so we can add TypeTrees to the LLVM-IR function definitions. We currently only do that for the outermost functions differentiated (those that have a `#[autodiff]` macro on them), but we plan to extend it to all functions which are called from them. We currently also generate TypeTree information for all calls to mem{cpy|move|set}. Finally, we started to add TypeTrees to the input or return values of certain instructions, for now that mainly is `extractvalue`.

## How we add TypeTrees
If we determined that a value has a meaningfull type, then we walk the MIR `Ty` of that value in the middle-end and generate a Rust TypeTree out of it. In the codegen\_llvm backend we lower our Rust TypeTree to LLVM/Enzyme TypeTrees. We then attach them to one of three locations:

Suggested change
If we determined that a value has a meaningfull type, then we walk the MIR `Ty` of that value in the middle-end and generate a Rust TypeTree out of it. In the codegen\_llvm backend we lower our Rust TypeTree to LLVM/Enzyme TypeTrees. We then attach them to one of three locations:
If we determine that a value has a meaningful type, then we walk the MIR `Ty` of that value in the middle-end and generate a Rust TypeTree out of it. In the `codegen_llvm` backend we lower our Rust TypeTree to Enzyme TypeTrees. We then attach them to one of three locations:

Hm. Calling them LLVM/Enzyme TypeTrees confuses the matter, I think? What makes them "LLVM/Enzyme TypeTrees"? Is it because they are directly embedded in the LLVM IR? I think it could just say that, then?

This should probably explain that first, actually. "A TypeTree is an Enzyme concept that gets smuggled through LLVM IR metadata" https://llvm.org/docs/LangRef.html#metadata

```
define internal void @_RNvCs7tI50jyFEig_3foo1f(ptr align 8 "enzyme_type"="{[-1]:Pointer, [-1,-1]:Float@double}" %0, ptr align 8 "enzyme_type"="{[-1]:Pointer, [-1,-1]:Float@double}" %1, ptr align 8 "enzyme_type"="{[-1]:Pointer, [-1,-1]:Float@double}" %2) unnamed_addr #0 !dbg !1089 {
```
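
The walk from a type to an `enzyme_type` string can be sketched as follows. This is a toy model under stated assumptions, not the rustc implementation: `ToyTy`, `Entry`, `walk`, and `render` are hypothetical names, `ToyTy` is a stand-in for the real MIR `Ty`, and the sketch assumes that `-1` in a path denotes "every offset" at that level of indirection.

```rust
/// Hypothetical, simplified stand-in for a Rust type; not rustc's `Ty`.
enum ToyTy {
    F64,
    I64,
    /// A reference or raw pointer to a pointee type.
    Ptr(Box<ToyTy>),
}

/// One TypeTree entry: an offset path plus the kind found at that path.
struct Entry {
    path: Vec<i64>,
    kind: &'static str,
}

/// Walks `ty` and records one entry per level of indirection.
/// Every level uses offset `-1`, as in the attribute strings shown above.
fn walk(ty: &ToyTy, path: &mut Vec<i64>, out: &mut Vec<Entry>) {
    path.push(-1);
    match ty {
        ToyTy::F64 => out.push(Entry { path: path.clone(), kind: "Float@double" }),
        ToyTy::I64 => out.push(Entry { path: path.clone(), kind: "Integer" }),
        ToyTy::Ptr(inner) => {
            out.push(Entry { path: path.clone(), kind: "Pointer" });
            // Descend into the pointee with a path that is one level deeper.
            walk(inner, path, out);
        }
    }
    path.pop();
}

/// Renders the collected entries in the `{[path]:Kind, ...}` attribute syntax.
fn render(ty: &ToyTy) -> String {
    let mut out = Vec::new();
    walk(ty, &mut Vec::new(), &mut out);
    let entries: Vec<String> = out
        .iter()
        .map(|e| {
            let path: Vec<String> = e.path.iter().map(|o| o.to_string()).collect();
            format!("[{}]:{}", path.join(","), e.kind)
        })
        .collect();
    format!("{{{}}}", entries.join(", "))
}
```

For a pointer to `f64`, `render(&ToyTy::Ptr(Box::new(ToyTy::F64)))` produces `{[-1]:Pointer, [-1,-1]:Float@double}`, the same string that appears on each argument of the function definition above. The real lowering has to handle structs, arrays, and concrete byte offsets, which this sketch leaves out.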

Argument to calls:

Suggested change
Argument to calls:
Arguments to calls:

- Tells Enzyme which bytes are differentiable vs metadata
## Why are they needed?
- Plenty of LLVM types are opaque (e.g. `ptr`), but types are needed to compute the correct derivatives.
- They tell Enzyme which bytes are differentiable (e.g. the pointer to float within a slice) vs metadata (e.g. the integer length of a slice)

workingjubilee (Member) commented Jul 3, 2026

or "a float", but the pointer of a slice can also be to zero floats, so...

Suggested change
- They tell Enzyme which bytes are differentiable (e.g. the pointer to float within a slice) vs metadata (e.g. the integer length of a slice)
- They tell Enzyme which bytes are differentiable (e.g. the pointer to floats within a slice) vs. metadata (e.g. the integer length of a slice)

## Why are they needed?
- Plenty of LLVM types are opaque (e.g. `ptr`), but types are needed to compute the correct derivatives.
- They tell Enzyme which bytes are differentiable (e.g. the pointer to float within a slice) vs metadata (e.g. the integer length of a slice)
- Enzyme can't deduce all types from LLVM IR, but can (to some extend) deduce them from usage (Type Analysis).

Suggested change
- Enzyme can't deduce all types from LLVM IR, but can (to some extend) deduce them from usage (Type Analysis).
- Enzyme can't deduce all types from LLVM IR, but can (to some extent) deduce them from usage (Type Analysis).
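
The slice case from the list above can be illustrated with a short sketch. The function names `slice_parts` and `slice_ref_is_two_words` are hypothetical, and the two-word size of `&[f64]` is what current rustc produces rather than something this chapter guarantees.

```rust
use std::mem::size_of;

/// Splits a slice reference into the two parts a caller passes along.
/// Hypothetical helper for illustration only.
fn slice_parts(data: &[f64]) -> (*const f64, usize) {
    // The data pointer leads to zero or more floats that derivatives flow through.
    // The length is integer metadata and is never differentiated.
    (data.as_ptr(), data.len())
}

/// Checks that a slice reference occupies two pointer-sized words:
/// one for the data pointer and one for the length.
fn slice_ref_is_two_words() -> bool {
    size_of::<&[f64]>() == 2 * size_of::<usize>()
}
```

In LLVM IR both words can show up as opaque values (`ptr` and an integer), so without a TypeTree Enzyme would have to recover from usage alone which one leads to floats.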

```
call void @llvm.memcpy.p0.p0.i64(ptr align 8 "enzyme_type"="{[0]:Pointer, [0,0]:Pointer, [0,0,-1]:Float@double}" %6, ptr align 8 "enzyme_type"="{[0]:Pointer, [0,0]:Pointer, [0,0,-1]:Float@double}" %0, i64 24, i1 false), !dbg !669
```

Input or return values of instructions, via debug metadata:

that's what these are, right? https://llvm.org/docs/LangRef.html#metadata-nodes-mdnode

Suggested change
Input or return values of instructions, via debug metadata:
Input or return values of instructions rustc uses for typed copies, via metadata nodes:
