CXL Collab Sync
CXL Linux Sync: Ground Rules
- Do not share confidential information
- Do not share confidential product details
- Do not disclose CXL consortium confidential information
- Do discuss any Linux questions about released CXL specifications
- Do use IRC as a supplement for this sync meeting for quick questions
#cxl on irc.oftc.net
- Do follow up on linux-cxl@vger.kernel.org for longer questions / debug
- https://pmem.io/ndctl/collab/
September 2023
- Opens:
- John: CXL memory comes online by default; memhp_default_state=offline not working?
- QEMU
- cxl-cli
- v6.6 Fixes
- v6.7 Queue
QEMU
- Merge window induced slowness
- Round-up of fixlets sent up
- Multiple HDM Decoder support for endpoints posted
- Serial number update
- Maintainer feedback administrivia cleanups
- Sort out revision numbers for spec version comments
- advocate with your rep about caching old copies at spec-landing
- MCTP I2C from NVME
- Single Aspeed i2c controller driver has support
- POC quality / out-of-tree support until server class driver arrives
- DCD Update
- waiting for kernel-side code resolution
- Get Extent List for unaccepted memory, track pending state in the implementation
- cxl-test may need updates too
- MHD Update
- Joint effort with SK Hynix, custom command set
- Proto-DCD
- Single logical device
- Software Development Vehicle
- CPMU, ARM, Compliance, Type-2
- SPDM Interest
- WDC looking at library-izing it, still looking to support an external agent
- FM API (MCTP Mailboxes + Switch CCI + MHD Mailbox)
v6.6 Fixes
- CXL RAS Enabling
- Region Granularity Setup
- Region Decoder Discover
v6.7 Queue
- RCH EH (under)
- Kernel SPDM
- WDC showing up to help
- Invite to CXL sync? Invited to “devsec”
August 2023
- Opens:
- Linux Plumbers CXL Microconference CFP
- uConf proposals close at the end of August
QEMU Update
- Not a huge amount going in this merge: docs and fixes. Multiple HDM decoders should be going in this merge.
- A lot of work is backed up behind the mailbox rework
- Jonathan's GitLab tree has a DCD preview queued up.
- Ira did some testing and fixes were merged in latest version
- Jonathan might have broken it while rebasing, so a reminder that this is a work in progress.
- MCTP support over I2C… Support is coming from NVMe-MI; this work is similar to FM-API
cxl-cli update:
v6.5 Fixes Queue
v6.6 Queue
- RCH Error handling
- Terry is working on it right now. Was waiting on a response from Dan, which should have arrived yesterday.
- Will pick that work back up
- Type2
- Davidlohr to submit the fix for type2 init collision. (Merged)
- Dan rebasing patches. There is conflict here with the Switch CCI work. See below.
- DCD
- Ira is reworking the patch set quite a bit.
- Fan’s QEMU DCD work is being used
- cxl-test being added for better regression testing
- cxl-test event processing was changed
- New DAX device work needed to handle sparse extents within the dax region
- Interleaving is in the back of his head and Navneet has been looking into this. However, interleaving is not slated for this initial work
- Jonathan - concerned that interleaving should not be precluded
- Leave in comments about where interleaving would fit in.
- Interleaving is the next major feature…
- QEMU - DCD merge would be at least 6.7 aligned.
- Switch CCI (Jonathan)
- Opens around what we do for user space – almost every command is destructive
- Maybe just CXL raw commands are required?
- Patch set has been a pain to rebase on type 2 from Dan
- Would really like review / feedback
- Davidlohr would like to merge the ‘moving around code’ sooner
- Would help with the type 2 conflicts
- It is hard to generalize the code without this second user
- Not critical for 6.6
- would like to see an early merge slated for 6.7
- In the end – Security questions are major gating factor
- Memory tiering in general
- CDAT vs HMAT
- ‘Distance’ calculations vary
- Patch set: ‘Mem tiering calculating abstract distance from ACPI’ (v6.7 material)
FM general topics
- We said we would talk about FM things in this meeting…
- Is there something at plumbers? Yes there is.
- Plumbers BoF for FM stuff?
Question from discord:
- John: “numa ratio policy patch?”
- Jonathan will try and dig in to see where the patches are
- We are talking at a VMA level.
QEMU Update
cxl-cli update
v6.5 Fixes Queue
- Region autodiscovery fixes
- x1 granularity calculation fix: minor fixups requested
- switch decoder allocation: minor fixups requested
- Hotplug fixes
- Cleanup softreserve on takeover: awaiting review
- Reuse SRAT proximity domain: pinged x86
- CXL _OSC AER Fixup: minor fixups requested
v6.6 Queue
- Queue closes August 18th
- RCH Error handling: fixes requested
- QTG enabling
- ACPI HMAT Generic Port support: awaiting merge
- Surface QTG ID info: awaiting merge
- CDAT Parsing: awaiting merge
- Finish Type2 enabling
- Fix security init collision: different approach requested
- Rebase remaining Type2 HDM API
- DCD: awaiting next rev
- Switch CCI: awaiting review
July 2023
ndctl / cxl-cli update
- v78 - minor fixups only - will go out this week
- v79
- Firmware update (no outstanding comments)
- Poison injection (awaiting new rev)
- Others?
…further notes not captured.
June 2023
- Opens:
- OpenBMC collaboration
- Labels / Persistent Naming (6.3 issue?)
- Add a CXL-CLI Item to the Agenda
- QEMU Update
- v6.4 Fixes
- v6.5 Merge Queue
- Post v6.5 material
QEMU Update
- QEMU DCD Support?
- MLD Support
- CCI layering work for OpenBMC collab
- I2C ACPI aspeed controller (upstream questionable)
v6.4 Fixes
- DAX Use After Free
- SRAT vs CFMWS Fixup (pending next rev and x86 review)
- Cache Management Discussion
v6.5 Merge Queue
- RCH Error Handling (awaiting v6 posting)
- Follow-up: RDPAS vs Root Port Scanning?
- Background command support (baseline pushed, awaiting consumer)
- Sanitization (pending review)
- Firmware update (awaiting final review)
- CXL perf monitoring (awaiting push to cxl-next)
Post v6.5
- QoS Class support (pre-reqs heading for v6.5)
- CDAT + QTG _DSM integration (pending review)
- Standalone CXL IDE
- Switch CCI
- memory_failure() for CXL events
- Type-2 Region Creation (awaiting review)
- Scan Media
- background dependency
- Dynamic Capacity Device support (awaiting next rev)
- Sparse DAX Region infrastructure
- DCD event plumbing
May 2023
- Opens:
- rasdaemon patches need review
- LSF/MM takeaways
- QEMU Update
- v6.4 pull summary
- v6.5 Queue
LSF/MM takeaways
- CXL 3.0 specification update review well received
- Discussed nodes vs zones and mempolicy vs mmap flags, nodes+mempolicy continues as the path forward
- Fabric manager: several efforts in flight (one in Rust, one in Go, OCP and OFA efforts as well)
- Live migration: CXL as a transport for migration, opportunity for migrate in place
QEMU Update
- Several patchkits ready and awaiting final merge:
- volatile memory
- poison handling
- events
- DCD support starting to surface
- Initial test results of the pre-RFC implementation look good
- QMP based interface
v6.4 pull summary
- DOE rework (queued)
- Poison retrieval (pending review)
- Forward and reverse address translation (DPA <==> HPA)
- Poison inject and clear (awaiting next rev)
v6.5 queue
- Background command support (pending review)
- QoS Class support (pending review)
- CDAT + QTG _DSM integration (pending review)
- CXL perf monitoring (awaiting perf acks)
- Dynamic Capacity Device support (awaiting next rev)
- Sparse DAX Region infrastructure
- DCD event plumbing
- Firmware update (pending review)
- v2 posted with review feedback incorporated
- man page added to the cxl-cli patchkit
- RAS Capability Tracing on RCH AER events (awaiting next rev)
- Standalone CXL IDE
- PCIE SPDM pre-requisite
- KEYP table enabling
- Switch CCI
- memory_failure() for CXL events
- Type-2 Region Creation (awaiting first rev)
- Scan Media
- background dependency
April 2023
- Opens:
- QEMU Update
- v6.3 Fixes
- v6.4 Queue
- v6.5 Queue
v6.3 Fixes
- Decoder Enumeration Fixes (queued)
v6.4 Queue
- DOE rework (queued)
- Poison retrieval (pending review)
- Forward and reverse address translation (DPA <==> HPA)
- Poison inject and clear (awaiting next rev)
- CXL perf monitoring (awaiting perf acks)
v6.5 Queue
- CDAT + QTG _DSM integration (review pending)
- Dynamic Capacity Device support (awaiting next rev)
- Sparse DAX Region infrastructure
- DCD event plumbing
- Firmware Update (awaiting first rev)
- RAS Capability Tracing on RCH AER events (awaiting next rev)
- Standalone CXL IDE
- PCIE SPDM pre-requisite
- KEYP table enabling
- Switch CCI
- memory_failure() for CXL events
- Type-2 Region Creation (awaiting first rev)
- Scan Media
- background dependency
March 2023
- Opens:
- cxl/hdm: Fix hdm decoder init by adding COMMIT field check
- HDM-D/DB Kernel-internal region creation
- QEMU Update
- v6.4 Queue
v6.4 Queue
- DOE rework
- Poison retrieval
- Forward and reverse address translation (DPA <==> HPA)
- Poison inject and clear
- Scan Media
- background dependency
- Background command support
- Dynamic Capacity Device support
- Sparse DAX Region infrastructure
- DCD event plumbing
- Firmware Update
- CDAT + QTG _DSM integration
- CXL perf monitoring
- RAS Capability Tracing on RCH AER events
- Standalone CXL IDE
- PCIE SPDM pre-requisite
- Switch CCI
- memory_failure() for CXL events
- Maintenance Feature Support (DRAM PPR) (BMC only?)
Notes
- Question about kernel code modularity for accelerator drivers
- Expectation is that it is a bug if CXL core code cannot be reused for devices outside of the class-device definition
- DCD Sharing may be the first user of HDM-DB functionality in the kernel; a QEMU model for this is being scoped
- Multi-head (not yet MLD) device support in the works for QEMU
- QEMU gaining a fix for clearing the HDM decoder COMMITTED bit when deactivating decoders
- Poison
- Poison inject can be done unconditionally, rely on “injected” indication to delineate real vs simulated hardware problems
- open question: should the driver taint the kernel on inject? No, ACPI EINJ does not
- Poison list: emit trace event on inject event? Maybe already covered by another event record
February 2023
- Opens:
- CXL DVSEC emulation fixes
- QEMU Update
- v6.3 Merge Window
- v6.4 Queue
v6.3 Merge Window
- Move tracepoints to cxl_core
- Export CXL _OSC error control result
- CXL Events to Linux Trace Events (including interrupts)
- HDM decoder emulation
- Default “Soft Reserved” (EFI_MEMORY_SP) handling policy (kernel)
- Volatile Region Discovery
- Volatile Region Provisioning
- Set timestamp
v6.4 Queue
- Poison inject and clear
- Forward and reverse address translation (DPA <==> HPA)
- Poison retrieval
- memory_failure() for CXL events
- Dynamic Capacity Device support
- Sparse DAX Region infrastructure
- DCD event plumbing
- CDAT + QTG _DSM integration
- DOE rework
- Standalone CXL IDE
- PCIE SPDM pre-requisite
- RAS Capability Tracing on RCH AER events
- Maintenance Feature Support (DRAM PPR)
- CXL perf monitoring
- Switch CCI
Notes:
- QEMU:
- Several patch kits in flight: https://gitlab.com/jic23/qemu/-/commits/cxl-2023-02-21/
- AER Discussion:
- What about CXL Reset for recovery?
- May be more relevant for future Type-2 devices than Type-3
- Add another PCI error recovery reset type?
- Map FLR => CXL Reset?
- PCI core supports per-device reset methods
- DCD
- Look at MLD support before Switch CCI support
- CXL perf monitoring
- https://lore.kernel.org/r/20221018121318.22385-1-Jonathan.Cameron@huawei.com
- FW Update
- depends on background command support
- revisit for v6.4
- Scan Media
- revisit for v6.4
January 2023
Agenda 01/24
- Opens:
- DAX-page request API rework
- FM Project? LSF/MM topic
- Type-3 volatile
- QEMU Update
- v6.2 Merge Window
- v6.2-rc Fixes
- v6.3 Status
- v6.3+ Future Work
v6.2 Merge Window
- Cache invalidation for region physical invalidation scenarios
- DOE kernel/user access collision detection
- RCH preparation patches
- RCH Support (including DVSEC Range Register enumeration)
- Security commands (including background commands)
- RAS Capability Tracing on VH AER events
- XOR Interleave Math support
- cxl_pmem_wq removal
- EFI CPER record parsing for CXL error records
v6.2-rc Fixes
Merged in cxl/fixes:
- RAS UE addr mis-assignment
Pending merge:
- Fix nvdimm unregistration
v6.3 Status
Merged in cxl/next:
- Move tracepoints to cxl_core
- Export CXL _OSC error control result
Pending merge:
- CXL Events to Linux Trace Events (including interrupts)
- Poison inject and clear
- Forward and reverse address translation (DPA <==> HPA)
- Poison retrieval
- HDM decoder emulation
Awaiting next (or first) posting:
- RAS Capability Tracing on RCH AER events
- Volatile Region Discovery
- Volatile Region Provisioning
- CDAT + QTG _DSM integration
- Set timestamp
- memory_failure() for CXL events
- DOE rework
v6.3+ Future Work
- Default “Soft Reserved” (EFI_MEMORY_SP) handling policy (cxl-cli + daxctl)
- Dynamic Capacity Device support
- Sparse DAX Region infrastructure
- DCD event plumbing
- Standalone CXL IDE
- PCIE SPDM pre-requisite
- Maintenance Feature Support (DRAM PPR)
- CXL perf monitoring
FM Future
- MLD Mailbox support for DCD event injection
- Switch mailbox CCI
- Multi-head device mailbox tunneling
QEMU
- Start new threads for debug issues not on patches
- Greg’s volatile region setup testing
- Passthrough decoder checks
- SPDM still pending
November 2022
Agenda 11/29
- Opens:
- FSDAX ->notify_failure() regression work still pending
- Others?
- Fixes merged for v6.1-rc4
- v6.2 merge window status
- Post v6.2 Features
v6.1-rc4 Fixes
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tag/?h=v6.1-rc4
Merged:
- Mailbox input payload fix
- Decoder commit crash
- LSA payload handling fix
- CFMWS NUMA Node setup
- Fix switch attached to single-port host-bridge
- BUG in create-region when no more intermediate port decoders available
- Fix region object memory leak
- Fix memdev object memory leak
- cxl_pmem static analysis fix
v6.2 Merge Window Status
Merged:
- Cache invalidation for region physical invalidation scenarios
- DOE kernel/user access collision detection
- RCH preparation patches
In the queue (has review):
- RCH Support (including DVSEC Range Register enumeration)
- Security commands (including background commands)
- CXL Events to Linux Trace Events (including interrupts)
- RAS Capability Tracing on RCH and VH AER events
In the queue (needs review):
- XOR Interleave Math support
- Forward and reverse address translation (DPA <==> HPA)
- Poison retrieval
- cxl_pmem_wq removal
- EFI CPER record parsing for CXL error records
At risk:
- Volatile Region Discovery
- Volatile Region Provisioning
- CDAT + QTG _DSM integration
- Poison inject and clear
- CXL perf monitoring
Post v6.2 Features
- MLD Mailbox support for DCD event injection
- Dynamic Capacity Device support
- Sparse DAX Region infrastructure
- DCD event plumbing
- Switch mailbox CCI
- Multi-head device mailbox tunneling
- Standalone CXL IDE
- PCIE SPDM pre-requisite
- Maintenance Feature Support (DRAM PPR)
- Default “Soft Reserved” (EFI_MEMORY_SP) handling policy (cxl-cli + daxctl)
October 2022
Agenda 10/25
- Opens:
- FSDAX page reference counting rework (merged in mm-unstable)
- FSDAX ->notify_failure() regression work still pending
- Code First ECR: ‘SP’ attribute in SRAT
- QEMU emulation status update
- Others?
- Fixes pending for v6.1-rc
- Features in flight for v6.2
- Rough plans for post v6.2 work
v6.1 Fixes
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=fixes
Queued:
- Mailbox input payload fix
- Decoder commit crash
- LSA payload handling fix
- CFMWS NUMA Node setup
Pending:
- Fix switch attached to single-port host-bridge
- BUG in create-region when no more intermediate port decoders available
v6.2 Features
In rough priority order, feedback welcome:
- RCH Support (including DVSEC Range Register enumeration)
- Cache invalidation for region physical invalidation scenarios
- RAS Capability Tracing on RCH and VH AER events
- CXL Events to Linux Trace Events (including interrupts)
- EFI CPER record parsing for CXL error records
- Forward and reverse address translation (DPA <==> HPA)
- Volatile Region Discovery
- Volatile Region Provisioning
- Security commands (including background commands)
- CXL perf monitoring
- Miscellaneous cleanups and renames
Post v6.2 Features
- Dynamic Capacity Device support
- Sparse DAX Region infrastructure
- DCD event plumbing
- Maintenance Feature Support (DRAM PPR)
- Switch mailbox CCI
- Multi-head device mailbox tunneling
- Default “Soft Reserved” (EFI_MEMORY_SP) handling policy (cxl-cli + daxctl)
August 2022
Agenda 8/30
- Opens:
- FSDAX ->notify_failure() fixes
- FSDAX page reference counting rework
- Linux v6.0-rc1 and ndctl (ndctl, daxctl, cxl-cli) v74 released
- Fix and Feature queue for v6.0-rc, v6.1 and ndctl-v75
- Rough plans for post v6.1 work for CXL 3.0 enabling
Recently released
- Kernel:
- DPA Space Accounting
- PMEM Region Provisioning
- DOE Support in PCI core
- CDAT retrieval (for debug)
- User tooling:
- cxl create-region
- cxl reserve/free-dpa
- cxl list -vvv
Next fixes and features
- ‘arch_flush_memregion()’
- Fix validation of x1 switch topologies
- Volatile region provisioning
- Region labels
- Security commands support
- Trace events for CXL events (including interrupts)
- ‘cxl monitor’ command
- CXL AER handling
- Address translation
Future work
- Performance monitoring
- Maintenance Feature Support (DRAM PPR)
- Dynamic Capacity Device support
- Default “Soft Reserved” (EFI_MEMORY_SP) handling policy
July 2022
Agenda 7/26
- Opens:
- FSDAX page reference counting rework
- What is queued for v6.0 (and ndctl-v74)?
- Late v6.0 updates
- Post v6.0 work
Queued for v6.0
- DOE Support in PCI core
- CDAT retrieval (for debug)
- DPA Space Accounting
- PMEM Region Provisioning
In review for v6.0
- Interleave granularity fixes
- Fix host-bridge x1 interleave constraint
- Fix region granularity > host-bridge granularity handling (scale factors must match)
Post v6.0 material
- Pre-existing region enumeration
- Volatile region provisioning
- XORMAP interleave support
- Trace Events for CXL Events
- List Poison
- Scan Media
- Address translation
- Region persistence in labels
- Region enumeration via labels
June 2022
Agenda: 6/28
- Opens:
- CXL Device Tree Support
- MEM_HWINIT_MODE=0
- QEMU mainline CXL support is live
- What is in review for v5.20 (and ndctl-v74)
- What else might make v5.20?
- What is post v5.20 material?
v5.20 in review
v5.20 on deck
- Pre-existing region enumeration
- Region persistence in labels
- Region enumeration via labels
- Address translation foundation
Post v5.20 material
- List Poison
- Scan Media
- XORMAP interleave support
- Trace Events for CXL Events
- Address translation (in cxl-cli) for all kernel supported Events, List Poison, and Scan Media
May 2022
Agenda: 5/31
- What is in v5.19?
- What is on deck for v5.20?
- What is post v5.20 material?
- Opens
v5.19 / ndctl-v73
- Kernel
- lockdep annotations
- CXL _OSC (native CXL hotplug + error “handling”)
- Disable suspend
- Mem_enable fixes
v5.20 / ndctl-v74
- Kernel
- Region Provisioning
- DOE Core
- CXL CDAT Retrieval
- Event record handling core
- Scan Media records
- Event Interrupts
- Background command timesharing
- Userspace
- ‘cxl create-region’
- Region listing support
- Scan media / Event records to json
- Address translation
Post v5.20 / v6.0
- Kernel
- SPDM Attestation
- IDE
- Security commands
- Userspace
- Attestation helper process
- CXL Device-DAX Policy