mirror/ryujinx - An experimental Nintendo Switch Emulator written in C#

Age	Commit message (Collapse)	Author
2023-01-12	Arm64: Cpu feature detection (#4264)1.1.544	merry
	* Arm64: Cpu feature detection * Ptc: Add Arm64 feature info * nits * simplify CheckSysctlName * restore some macos flags * feedback
2023-01-10	Implement JIT Arm64 backend (#4114)1.1.536	gdkchan
	* Implement JIT Arm64 backend * PPTC version bump * Address some feedback from Arm64 JIT PR * Address even more PR feedback * Remove unused IsPageAligned function * Sync Qc flag before calls * Fix comment and remove unused enum * Address riperiperi PR feedback * Delete Breakpoint IR instruction that was only implemented for Arm64
2022-12-21	Fix CPU FCVTN instruction implementation (slow path) (#4159)1.1.487	gdkchan
	* Fix CPU FCVTN instruction implementation (slow path) * PPTC version bump
2022-12-18	Revert "ARMeilleure: Add initial support for AVX512(EVEX encoding) (#3663)" ↵1.1.479	gdkchan
	(#4145) This reverts commit 295fbd0542a93ac50e558054a3f0c8c64286b764.
2022-12-18	ARMeilleure: Add initial support for AVX512(EVEX encoding) (#3663)1.1.478	Wunk
	* ARMeilleure: Add AVX512{F,VL,DQ,BW} detection Add `UseAvx512Ortho` and `UseAvx512OrthoFloat` optimization flags as short-hands for `F+VL` and `F+VL+DQ`. * ARMeilleure: Add initial support for EVEX instruction encoding Does not implement rounding, or exception controls. * ARMeilleure: Add `X86Vpternlogd` Accelerates the vector-`Not` instruction. * ARMeilleure: Add check for `OSXSAVE` for AVX{2,512} * ARMeilleure: Add check for `XCR0` flags Add XCR0 register checks for AVX and AVX512F, following the guidelines from section 14.3 and 15.2 from the Intel Architecture Software Developer's Manual. * ARMeilleure: Increment InternalVersion * ARMeilleure: Remove redundant `ReProtect` and `Dispose`, formatting * ARMeilleure: Move XCR0 procedure to GetXcr0Eax * ARMeilleure: Add `XCR0` to `FeatureInfo` structure * ARMeilleure: Utilize `ReadOnlySpan` for Xcr0 assembly Avoids an additional allocation * ARMeilleure: Formatting fixes
2022-10-19	Do not clear the rejit queue when overlaps count is equal to 0. (#3721)1.1.319	LDj3SNuD
	* Do not clear the rejit queue when overlaps count is equal to 0. * Ptc and PtcProfiler must be invalidated. * Revert "Ptc and PtcProfiler must be invalidated." This reverts commit f5b0ad9d7dc3c0b3a0da184de4d04d7234939c81. * Fix #3710 slow path due to #3701.
2022-10-19	A32: Implement VCVTT, VCVTB (#3710)1.1.315	merry
	* A32: Implement VCVTT, VCVTB * A32: F16C implementation of VCVTT/VCVTB
2022-10-19	A64: Add fast path for Fcvtas_Gp/S/V, Fcvtau_Gp/S/V and Frinta_S/V in… (#3712)1.1.314	LDj3SNuD
	* A64: Add fast path for Fcvtas_Gp/S/V, Fcvtau_Gp/S/V and Frinta_S/V instructions; they use "Round to Nearest with Ties to Away" rounding mode not supported in x86. All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. The titles Mario Strikers and Super Smash Bros. U. use these instructions intensively. * Update Ptc.cs * A32: Add fast path for Vcvta_RM, Vrinta_RM and Vrinta_V instructions aswell.
2022-10-02	ARMeilleure: Add `gfni` acceleration (#3669)1.1.286	Wunk
	* ARMeilleure: Add `GFNI` detection This is intended for utilizing the `gf2p8affineqb` instruction * ARMeilleure: Add `gf2p8affineqb` Not using the VEX or EVEX-form of this instruction is intentional. There are `GFNI`-chips that do not support AVX(so no VEX encoding) such as Tremont(Lakefield) chips as well as Jasper Lake. https://github.com/InstLatx64/InstLatx64/blob/13df339fe7150b114929f71b19a6b2fe72fc751e/GenuineIntel/GenuineIntel00806A1_Lakefield_LC_InstLatX64.txt#L1297-L1299 https://github.com/InstLatx64/InstLatx64/blob/13df339fe7150b114929f71b19a6b2fe72fc751e/GenuineIntel/GenuineIntel00906C0_JasperLake_InstLatX64.txt#L1252-L1254 * ARMeilleure: Add `gfni` acceleration of `Rbit_V` Passes all `Rbit_V` unit tests on my `i9-11900k` ARMeilleure: Add `gfni` acceleration of `S{l,r}i_V` Also added a fast-path for when the shift amount is greater than the size of the element. * ARMeilleure: Add `gfni` acceleration of `Shl_V` and `Sshr_V` * ARMeilleure: Increment InternalVersion * ARMeilleure: Fix Intrinsic and Assembler Table alignment `gf2p8affineqb` is the longest instruction name I know of. It shouldn't get any wider than this. * ARMeilleure: Remove SSE2+SHA requirement for GFNI * ARMeilleure Add `X86GetGf2p8LogicalShiftLeft` Used to generate GF(2^8) 8x8 bit-matrices for bit-shifting for the `gf2p8affineqb` instruction. * ARMeilleure: Append `FeatureInfo7Ecx` to `FeatureInfo`
2022-09-20	Fpsr and Fpcr freed. (#3701)1.1.279	LDj3SNuD
	* Implemented in IR the managed methods of the Saturating region ... ... of the SoftFallback class (the SatQ ones). The need to natively manage the Fpcr and Fpsr system registers is still a fact. Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Ptc.InternalVersion = 3665 * Addressed PR feedback. * Implemented in IR the managed methods of the ShlReg region of the SoftFallback class. It also includes the last two SatQ ones (following up on https://github.com/Ryujinx/Ryujinx/pull/3665). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Fpsr and Fpcr freed. Handling/isolation of Fpsr and Fpcr via register for IR and via memory for Tests and Threads, with synchronization to context exchanges (explicit for SoftFloat); without having to call managed methods. Thanks to the inlining work of the previous two PRs and others in this. Tests performed locally in both release and debug modes, in both lowcq and highcq, with FastFP to true and false (explicit FP tests included). Tested with the title Tony Hawk's PS. Depends on shlreg. * Update InstEmitSimdHelper.cs * De-magic Masks. Remove the Stride and Len flags; Fpsr.NZCV are A32 only, then moved to Fpscr: this leads to emitting less IR in reference to Get/Set Fpsr/Fpcr/Fpscr methods in reference to Mrs/Msr (A64) and Vmrs/Vmsr (A32) instructions. * Addressed PR feedback.
2022-09-19	Implemented in IR the managed methods of the ShlReg region of the ↵1.1.273	LDj3SNuD
	SoftFallback class. (#3700) * Implemented in IR the managed methods of the Saturating region ... ... of the SoftFallback class (the SatQ ones). The need to natively manage the Fpcr and Fpsr system registers is still a fact. Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Ptc.InternalVersion = 3665 * Addressed PR feedback. * Implemented in IR the managed methods of the ShlReg region of the SoftFallback class. It also includes the last two SatQ ones (following up on https://github.com/Ryujinx/Ryujinx/pull/3665). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Update InstEmitSimdHelper.cs
2022-09-14	A32/T32/A64: Implement Hint instructions (CSDB, SEV, SEVL, WFE, WFI, YIELD) ↵1.1.272	merry
	(#3694) * OpCodeTable: Implement Hint instructions (CSDB, SEV, SEVL, WFE, WFI, YIELD) * A64: Remove catch-all Hint instruction * T16: Handle unallocated hint instructions Some thumb tests execute these assuming that they're nops. * T32: Fill out other Hint instructions * A32: Fill out other hint instructions
2022-09-13	Implement PLD and SUB (imm16) on T32, plus UADD8, SADD8, USUB8 and SSUB8 on ↵1.1.269	gdkchan
	both A32 and T32 (#3693)
2022-09-13	Fix increment on Arm32 NEON VLDn/VSTn instructions with regs > 1 (#3695)1.1.266	gdkchan
	* Fix increment on Arm32 NEON VLDn/VSTn instructions with regs > 1 * PPTC version bump * PR feedback
2022-09-11	Implement VRINT (vector) Arm32 NEON instructions (#3691)1.1.263	gdkchan

2022-09-10	T32: Add Vfp instructions (#3690)1.1.262	merry

2022-09-10	Implement Thumb (32-bit) memory (ordered), multiply, extension and bitfield ↵1.1.261	gdkchan
	instructions (#3687) * Implement Thumb (32-bit) memory (ordered), multiply and bitfield instructions * Remove public from interface * Fix T32 BL immediate and implement signed and unsigned extend instructions
2022-09-09	Add ADD (zx imm12), NOP, MOV (rs), LDA, TBB, TBH, MOV (zx imm16) and CLZ ↵1.1.256	gdkchan
	thumb instructions (#3683) * Add ADD (zx imm12), NOP, MOV (register shifted), LDA, TBB, TBH, MOV (zx imm16) and CLZ thumb instructions, fix LDRD, STRD, CBZ, CBNZ and BLX (reg) * Bump PPTC version
2022-09-09	Implement VRSRA, VRSHRN, VQSHRUN, VQMOVN, VQMOVUN, VQADD, VQSUB, VRHADD, ↵1.1.255	gdkchan
	VPADDL, VSUBL, VQDMULH and VMLAL Arm32 NEON instructions (#3677) * Implement VRSRA, VRSHRN, VQSHRUN, VQMOVN, VQMOVUN, VQADD, VQSUB, VRHADD, VPADDL, VSUBL, VQDMULH and VMLAL Arm32 NEON instructions * PPTC version * Fix VQADD/VQSUB * Improve MRC/MCR handling and exception messages In case data is being recompiled as code, we don't want to throw at emit stage, instead we should only throw if it actually tries to execute
2022-09-08	Implemented in IR the managed methods of the Saturating region ... (#3665)1.1.252	LDj3SNuD
	* Implemented in IR the managed methods of the Saturating region ... ... of the SoftFallback class (the SatQ ones). The need to natively manage the Fpcr and Fpsr system registers is still a fact. Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Ptc.InternalVersion = 3665 * Addressed PR feedback.
2022-08-25	ARMeilleure: Hardware accelerate SHA256 (#3585)1.1.230	merry
	* ARMeilleure/HardwareCapabilities: Add Sha * ARMeilleure/Intrinsic: Add X86Sha256Rnds2 * ARmeilleure: Hardware accelerate SHA256H/SHA256H2 * ARMeilleure/Intrinsic: Add X86Sha256Msg1, X86Sha256Msg2 * ARMeilleure/Intrinsic: Add X86Palignr * ARMeilleure: Hardware accelerate SHA256SU0, SHA256SU1 * PTC: Bump InternalVersion
2022-08-25	Implement some 32-bit Thumb instructions (#3614)1.1.229	gdkchan
	* Implement some 32-bit Thumb instructions * Optimize OpCode32MemMult using PopCount
2022-08-18	Removed unused usings. (#3593)1.1.223	Nicholas Rodine
	* Removed unused usings. * Added back using, now that it's used. * Removed extra whitespace.
2022-08-05	Implement Arm32 Sha256 and MRS Rd, CPSR instructions (#3544)1.1.208	gdkchan
	* Implement Arm32 Sha256 and MRS Rd, CPSR instructions * Add tests using Arm64 outputs
2022-07-06	Implement CPU FCVT Half <-> Double conversion variants (#3439)1.1.165	gdkchan
	* Half <-> Double conversion support * Add tests, fast path and deduplicate SoftFloat code * PPTC version
2022-05-31	Refactor CPU interface to allow the implementation of other CPU emulators ↵1.1.134	gdkchan
	(#3362) * Refactor CPU interface * Use IExecutionContext interface on SVC handler, change how CPU interrupts invokes the handlers * Make CpuEngine take a ITickSource rather than returning one The previous implementation had the scenario where the CPU engine had to implement the tick source in mind, like for example, when we have a hypervisor and the game can read CNTPCT on the host directly. However given that we need to do conversion due to different frequencies anyway, it's not worth it. It's better to just let the user pass the tick source and redirect any reads to CNTPCT to the user tick source * XML docs for the public interfaces * PPTC invalidation due to NativeInterface function name changes * Fix build of the CPU tests * PR feedback
2022-03-19	InstEmitMemoryEx: Barrier after write on ordered store (#3193)1.1.77	merry
	* InstEmitMemoryEx: Barrier after write on ordered store * increment ptc version * 32
2022-03-05	A32: Fix ALU immediate instructions (#3179)1.1.60	merry
	* Tests: Add A32 tests for immediate ADC/ADCS/RSC/RSCS/SBC/SBCS * A32: Fix bug in ADC/ADCS/RSC/RSCS/SBC/SBCS * CpuTestAluImm32: Add more opcodes * Increment PTC version
2022-03-04	Decoder: Exit on trapping instructions, and resume execution at trapping ↵1.1.58	merry
	instruction (#3153) * Decoder: Exit on trapping instructions, and resume execution at trapping instruction * Resume at trapping address * remove mustExit
2022-03-04	T32: Implement B, B.cond, BL, BLX (#3155)1.1.57	merry
	* Decoders: Make IsThumb a function of OpCode32 * OpCode32: Fix GetPc * T32: Implement B, B.cond, BL, BLX * rm usings
2022-02-22	T32: Implement ALU (shifted register) instructions (#3135)1.1.53	merry
	* T32: Implement ADC, ADD, AND, BIC, CMN, CMP, EOR, MOV, MVN, ORN, ORR, RSB, SBC, SUB, TEQ, TST (shifted register) * OpCodeTable: Sort T32 list * Tests: Rename RandomTestCase to PrecomputedThumbTestCase * T32: Tests for AluRsImm instructions * fix nit * fix nit 2
2022-02-22	A32: Fix BLX and BXWritePC (#3151)1.1.48	merry

2022-02-18	Enable CPU JIT cache invalidation (#2965)1.1.44	gdkchan
	* Enable CPU JIT cache invalidation * Invalidate cache on IC IVAU
2022-02-18	Decoders: Add IOpCode32HasSetFlags (#3136)1.1.39	merry

2022-02-17	ARMeilleure: Thumb support (All T16 instructions) (#3105)1.1.36	merry
	* Decoders: Add InITBlock argument * OpCodeTable: Minor cleanup * OpCodeTable: Remove existing thumb instruction implementations * OpCodeTable: Prepare for thumb instructions * OpCodeTables: Improve thumb fast lookup * Tests: Prepare for thumb tests * T16: Implement BX * T16: Implement LSL/LSR/ASR (imm) * T16: Implement ADDS, SUBS (reg) * T16: Implement ADDS, SUBS (3-bit immediate) * T16: Implement MOVS, CMP, ADDS, SUBS (8-bit immediate) * T16: Implement ANDS, EORS, LSLS, LSRS, ASRS, ADCS, SBCS, RORS, TST, NEGS, CMP, CMN, ORRS, MULS, BICS, MVNS (low registers) * T16: Implement ADD, CMP, MOV (high reg) * T16: Implement BLX (reg) * T16: Implement LDR (literal) * T16: Implement {LDR,STR}{,H,B,SB,SH} (register) * T16: Implement {LDR,STR}{,B,H} (immediate) * T16: Implement LDR/STR (SP) * T16: Implement ADR * T16: Implement Add to SP (immediate) * T16: Implement ADD/SUB (SP) * T16: Implement SXTH, SXTB, UXTH, UTXB * T16: Implement CBZ, CBNZ * T16: Implement PUSH, POP * T16: Implement REV, REV16, REVSH * T16: Implement NOP * T16: Implement LDM, STM * T16: Implement SVC * T16: Implement B (conditional) * T16: Implement B (unconditional) * T16: Implement IT * fixup! T16: Implement ADD/SUB (SP) * fixup! T16: Implement Add to SP (immediate) * fixup! T16: Implement IT * CpuTestThumb: Add randomized tests * Remove inITBlock argument * Address nits * Use index to handle IfThenBlockState * Reduce line noise * fixup * nit
2022-02-17	Use ReadOnlySpan<byte> compiler optimization for static data (#3130)1.1.34	Berkan Diler

2022-02-11	InstEmitMemory32: Literal loads always have word-aligned PC (#3104)1.1.26	merry

2022-02-08	ARMeilleure: A32: Implement SHSUB8 and UHSUB8 (#3089)1.1.21	merry
	* ARMeilleure: A32: Implement UHSUB8 * ARMeilleure: A32: Implement SHSUB8
2022-02-06	ARMeilleure: A32: Implement SHADD8 (#3086)1.1.18	merry

2022-01-29	Fix small precision error on CPU reciprocal estimate instructions (#3061)1.1.13	gdkchan
	* Fix small precision error on CPU reciprocal estimate instructions * PPTC version bump
2022-01-21	Add host CPU memory barriers for DMB/DSB and ordered load/store (#3015)	gdkchan
	* Add host CPU memory barriers for DMB/DSB and ordered load/store * PPTC version bump * Revert to old barrier order
2022-01-19	Implement FCVTNS (Scalar GP) (#2953)	sharmander
	* Implement FCVTNS (Scalar GP) * Update Ptc Version
2022-01-16	Fix return type mismatch on 32-bit titles (#3000)	gdkchan

2022-01-04	CPU - Implement FCVTMS (Vector) (#2937)	sharmander
	* Add FCVTMS_V Implementation to Armeilleure * Fix opcode designation * Add tests * Amend Ptc version * Fix OpCode / Tests * Create Math.Floor helper method + Update implementation * Address gdk comments * Re-address gdk comments * Update ARMeilleure/Decoders/OpCodeTable.cs Co-authored-by: gdkchan <gab.dark.100@gmail.com> * Update Tests to use 2S (4S) and 2D Co-authored-by: gdkchan <gab.dark.100@gmail.com>
2021-12-19	Implement CSDB instruction (#2927)	gdkchan

2021-12-08	Implement UHADD8 instruction (#2908)	Piyachet Kanda
	* Implement UHADD8 instruction along with a test unit * Update PTC revision number
2021-09-29	Use normal memory store path for DC ZVA (#2693)	riperiperi
	Seems like this is used as an optimized way to clear memory in homebrew applications. Unfortunately, calling the software fallback method every 8 bytes was not very optimal. The existing EmitStore is used by passing in ZR as the register to get a 0 write.
2021-09-14	Refactor `PtcInfo` (#2625)	FICTURE7
	* Refactor `PtcInfo` This change reduces the coupling of `PtcInfo` by moving relocation tracking to the backend. `RelocEntry`s remains as `RelocEntry`s through out the pipeline until it actually needs to be written to the PTC streams. Keeping this representation makes inspecting and manipulating relocations after compilations less painful. This is something I needed to do to patch relocations to 0 to diff dumps. Contributes to #1125. * Turn `Symbol` & `RelocInfo` into readonly structs * Add documentation to `CompiledFunction` * Remove `Compiler.Compile<T>` Remove `Compiler.Compile<T>` and replace it by `Map<T>` of the `CompiledFunction` returned.
2021-08-27	Implement MSR instruction for A32 (#2585)	Mary
	* Implement MSR instruction Fix #1342. Now Pocket Rumble is playable. * Address gdkchan's comments * Address gdkchan's comments * Address gdkchan's comment
2021-08-17	Reduce JIT GC allocations (#2515)	FICTURE7
	* Turn `MemoryOperand` into a struct * Remove `IntrinsicOperation` * Remove `PhiNode` * Remove `Node` * Turn `Operand` into a struct * Turn `Operation` into a struct * Clean up pool management methods * Add `Arena` allocator * Move `OperationHelper` to `Operation.Factory` * Move `OperandHelper` to `Operand.Factory` * Optimize `Operation` a bit * Fix `Arena` initialization * Rename `NativeList<T>` to `ArenaList<T>` * Reduce `Operand` size from 88 to 56 bytes * Reduce `Operation` size from 56 to 40 bytes * Add optimistic interning of Register & Constant operands * Optimize `RegisterUsage` pass a bit * Optimize `RemoveUnusedNodes` pass a bit Iterating in reverse-order allows killing dependency chains in a single pass. * Fix PPTC symbols * Optimize `BasicBlock` a bit Reduce allocations from `_successor` & `DominanceFrontiers` * Fix `Operation` resize * Make `Arena` expandable Change the arena allocator to be expandable by allocating in pages, with some of them being pooled. Currently 32 pages are pooled. An LRU removal mechanism should probably be added to it. Apparently MHR can allocate bitmaps large enough to exceed the 16MB limit for the type. * Move `Arena` & `ArenaList` to `Common` * Remove `ThreadStaticPool` & co * Add `PhiOperation` * Reduce `Operand` size from 56 from 48 bytes * Add linear-probing to `Operand` intern table * Optimize `HybridAllocator` a bit * Add `Allocators` class * Tune `ArenaAllocator` sizes * Add page removal mechanism to `ArenaAllocator` Remove pages which have not been used for more than 5s after each reset. I am on fence if this would be better using a Gen2 callback object like the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right now if a large translation happens, the pages will be freed only after a reset. This reset may not happen for a while because no new translation is hit, but the arena base sizes are rather small. * Fix `OOM` when allocating larger than page size in `ArenaAllocator` Tweak resizing mechanism for Operand.Uses and Assignemnts. * Optimize `Optimizer` a bit * Optimize `Operand.Add<T>/Remove<T>` a bit * Clean up `PreAllocator` * Fix phi insertion order Reduce codegen diffs. * Fix code alignment * Use new heuristics for degree of parallelism * Suppress warnings * Address gdkchan's feedback Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that `Operand.Value` should usually not be modified directly. * Add fast path to `ArenaAllocator` * Assembly for `ArenaAllocator.Allocate(ulong)`: .L0: mov rax, [rcx+0x18] lea r8, [rax+rdx] cmp r8, [rcx+0x10] ja short .L2 .L1: mov rdx, [rcx+8] add rax, [rdx+8] mov [rcx+0x18], r8 ret .L2: jmp ArenaAllocator.AllocateSlow(UInt64) A few variable/field had to be changed to ulong so that RyuJIT avoids emitting zero-extends. * Implement a new heuristic to free pooled pages. If an arena is used often, it is more likely that its pages will be needed, so the pages are kept for longer (e.g: during PPTC rebuild or burst sof compilations). If is not used often, then it is more likely that its pages will not be needed (e.g: after PPTC rebuild or bursts of compilations). * Address riperiperi's feedback * Use `EqualityComparer<T>` in `IntrusiveList<T>` Avoids a potential GC hole in `Equals(T, T)`.