offsets for Tlds
Formatting
on Linux.
Reimplement the buffer cache using cached bindings and page-level
granularity for modification tracking. This also drops the usage of
shared pointers and virtual functions from the cache.
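A minimal sketch of page-level modification tracking, assuming a hypothetical `PageModificationTracker` class and 4 KiB pages (the message does not state the page size). Each page of a buffer gets its own flag, so a write only marks the pages it overlaps instead of the whole buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: one "modified" flag per page of a buffer.
class PageModificationTracker {
public:
    static constexpr std::uint64_t PAGE_BITS = 12; // assumption: 4 KiB pages
    static constexpr std::uint64_t PAGE_SIZE = std::uint64_t{1} << PAGE_BITS;

    explicit PageModificationTracker(std::uint64_t buffer_size)
        : modified((buffer_size + PAGE_SIZE - 1) / PAGE_SIZE, false) {}

    // Flags every page overlapping [offset, offset + size) as modified.
    void MarkModified(std::uint64_t offset, std::uint64_t size) {
        ForEachPage(offset, size, [this](std::size_t page) { modified[page] = true; });
    }

    // Clears every page overlapping the range (e.g. after uploading it).
    void ClearModified(std::uint64_t offset, std::uint64_t size) {
        ForEachPage(offset, size, [this](std::size_t page) { modified[page] = false; });
    }

    // True if any page overlapping the range has been modified.
    bool IsRegionModified(std::uint64_t offset, std::uint64_t size) const {
        bool result = false;
        ForEachPage(offset, size, [&](std::size_t page) { result = result || modified[page]; });
        return result;
    }

private:
    template <typename Func>
    void ForEachPage(std::uint64_t offset, std::uint64_t size, Func&& func) const {
        if (size == 0) {
            return;
        }
        const std::uint64_t first = offset >> PAGE_BITS;
        const std::uint64_t last = (offset + size - 1) >> PAGE_BITS;
        for (std::uint64_t page = first; page <= last && page < modified.size(); ++page) {
            func(static_cast<std::size_t>(page));
        }
    }

    std::vector<bool> modified;
};
```

A production version might pack the flags into 64-bit words; `std::vector<bool>` keeps the sketch short while showing the granularity idea.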
- Bindings are cached, allowing work to be skipped when the game changes
only a few bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified
from the GPU to emulate constant buffers; instead, GL_EXT_memory_object
is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using
glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In
theory this should save one hash table resolve inside the driver
compared to glBufferSubData.
- A new fence-based OpenGL stream buffer is used on drivers other than
Nvidia's proprietary one, because those drivers perform poorly on partial
glBufferSubData calls synchronized with 3D rendering (which some games
use a lot).
- Most optimizations are now shared between APIs, allowing Vulkan to
cache more bindings than before and skip unnecessary work.
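The binding caching described in the list above can be sketched as follows, assuming a hypothetical `BindingCache` that remembers the last buffer bound to each slot and skips the backend call when a draw reuses it. The slot count and the counter standing in for the real API call are assumptions:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical description of a buffer bound to one shader slot.
struct BufferBinding {
    std::uint64_t gpu_addr = 0;
    std::uint32_t size = 0;
    bool operator==(const BufferBinding& other) const {
        return gpu_addr == other.gpu_addr && size == other.size;
    }
};

class BindingCache {
public:
    static constexpr std::size_t NUM_SLOTS = 16; // assumption

    // Returns true if the slot actually had to be rebound.
    bool Bind(std::size_t slot, const BufferBinding& binding) {
        if (cached[slot] == binding) {
            return false; // unchanged since the last draw: skip the work
        }
        cached[slot] = binding;
        ++api_calls; // stand-in for the real OpenGL/Vulkan binding call
        return true;
    }

    std::size_t ApiCalls() const { return api_calls; }

private:
    // std::optional so an empty slot never compares equal to a real binding.
    std::array<std::optional<BufferBinding>, NUM_SLOTS> cached{};
    std::size_t api_calls = 0;
};
```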
This commit adds the necessary infrastructure to use Vulkan objects from
OpenGL. Overall, it improves performance and fixes some bugs present in
the old cache. There are still some edge cases hit by some games that
harm performance on some vendors; these are planned to be fixed in later
commits.
This reverts #4713. The implementation in that PR is not accurate.
It does not reflect the behavior seen in hardware.
The "VK" prefix predates the "Vulkan" namespace. It was carried around
the codebase for consistency. "VKDevice" is currently easy to confuse with
"VkDevice" (the two differ by only one uppercase letter). Rename all
instances of it.
The current texture cache has several points that hurt maintainability
and performance. It's easy to break unrelated parts of the cache
when doing minor changes. The cache can easily forget valuable
information about the cached textures due to CPU writes or simply through
its normal usage.
This commit aims to address those issues.
video_core: Make use of ordered container contains() where applicable
With C++20, we can use the more concise contains() member function
instead of comparing the result of the find() call with the end
iterator.
Provides an in-place format string to make it more pleasant to read.
shader_ir: std::move node within DeclareAmend()
Same behavior, but elides an unnecessary atomic reference count
increment and decrement.
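To see where the saved atomic operations come from, here is a sketch with a hypothetical `Node` alias for a `std::shared_ptr` (the function names are illustrative, not the real `DeclareAmend()`). Copying a by-value `shared_ptr` parameter into a container bumps the reference count, and destroying the parameter drops it again; moving transfers ownership without touching the count:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct NodeData { int value; };
using Node = std::shared_ptr<NodeData>; // hypothetical alias

// Copy: +1 when pushed, -1 when the parameter is destroyed.
void AppendCopy(std::vector<Node>& code, Node node) {
    code.push_back(node);
}

// Move: the parameter's ownership is handed over, no count change.
void AppendMove(std::vector<Node>& code, Node node) {
    code.push_back(std::move(node));
}
```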
fmt now automatically prints the numeric value of an enum class member
by default, so we don't need to use casts any more.
Reduces the line noise a bit.
Cleans out the rest of the occurrences of variable shadowing and makes
any further occurrences of shadowing compiler errors.
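A hypothetical illustration of the kind of bug that shadowing warnings catch: the inner `total` hides the outer one, so the loop never updates the value that is returned. GCC and Clang report this under `-Wshadow`, and adding `-Werror` turns it into a compile error:

```cpp
#include <cassert>
#include <vector>

// Buggy: the inner declaration shadows the outer `total`.
int SumShadowed(const std::vector<int>& values) {
    int total = 0;
    for (const int value : values) {
        int total = 0; // shadows the outer `total`
        total += value;
    }
    return total; // still 0
}

// Fixed: a single variable accumulates every value.
int Sum(const std::vector<int>& values) {
    int total = 0;
    for (const int value : values) {
        total += value;
    }
    return total;
}
```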
Migrates the video core code closer to enabling variable shadowing
warnings as errors.
This primarily sorts out shadowing occurrences within the Vulkan code.
Prevents logic bugs caused by accidentally ignoring the return value.
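The message does not name the mechanism; a common one in C++17 and later is the `[[nodiscard]]` attribute, so the sketch below assumes that. `TryAllocate` and `Pool` are hypothetical. With the attribute, a call like `TryAllocate(pool, 64);` whose result is dropped makes the compiler emit a warning:

```cpp
#include <cassert>
#include <cstddef>

struct Pool {
    std::size_t free_bytes;
};

// [[nodiscard]]: callers must look at whether the allocation succeeded.
[[nodiscard]] bool TryAllocate(Pool& pool, std::size_t size) {
    if (size > pool.free_bytes) {
        return false;
    }
    pool.free_bytes -= size;
    return true;
}
```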
decoder/image: Fix incorrect G24R8 component sizes in GetComponentSize()
Same behavior, but constructs the threads in place instead of moving
them.
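The difference between pushing a constructed object and constructing it in place can be made observable with a hypothetical move-counting `Worker` type standing in for `std::thread` (which cannot report its own moves):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for std::thread that counts how often it is moved.
struct Worker {
    static inline int move_count = 0;
    std::string name;

    explicit Worker(std::string worker_name) : name(std::move(worker_name)) {}
    Worker(Worker&& other) noexcept : name(std::move(other.name)) {
        ++move_count;
    }
};
```

`workers.push_back(Worker("a"))` builds a temporary and moves it into the vector, while `workers.emplace_back("b")` forwards the constructor arguments and builds the element directly in the vector's storage.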
This is equivalent to moving all the contents and then clearing the
vector. This avoids a redundant allocation.
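A sketch of the pattern, assuming a hypothetical `TakeAll` helper over a `std::vector<int>`. Move-constructing a vector takes over the source's heap buffer, whereas building a new vector from move iterators allocates a second buffer and moves element by element:

```cpp
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hands all pending results to the caller and leaves `pending` empty.
std::vector<int> TakeAll(std::vector<int>& pending) {
    std::vector<int> results = std::move(pending); // steals the buffer
    pending.clear(); // moved-from vector is valid but unspecified; make it empty
    return results;
}

// Element-wise alternative for comparison: allocates a new buffer.
std::vector<int> TakeAllElementWise(std::vector<int>& pending) {
    std::vector<int> results(std::make_move_iterator(pending.begin()),
                             std::make_move_iterator(pending.end()));
    pending.clear();
    return results;
}
```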
Same behavior, but avoids redundant copies.
While we're at it, we can simplify the pushing of the parameters into
the pending queue.
shader: Partially implement texture cube array shadow
async_shaders: Increase Async worker thread count for >8 thread cpus
Adds 1 async worker thread for every 2 available threads above 8
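The scaling rule can be written as a small function. `AsyncWorkerCount` and its `base_workers` parameter are hypothetical: the message only gives the increment (one extra worker per two hardware threads above 8), not the count used at or below 8 threads, and rounding down odd leftovers is also an assumption:

```cpp
#include <cassert>
#include <cstdint>

std::uint32_t AsyncWorkerCount(std::uint32_t hardware_threads, std::uint32_t base_workers) {
    if (hardware_threads <= 8) {
        return base_workers;
    }
    // Integer division: an odd leftover thread does not add a worker.
    return base_workers + (hardware_threads - 8) / 2;
}
```

A caller would typically feed it `std::thread::hardware_concurrency()`, which may return 0 when the count is unknown, so a fallback value is needed in that case.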
This implements texture cube arrays with shadow comparisons but doesn't
fix the asserts related to them.
Fixes out-of-bounds reads in swizzle constructors and makes them use
the bounds-checked ::at instead of the unsafe operator[].
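The difference between the two accessors, shown on a hypothetical swizzle lookup table: `at()` throws `std::out_of_range` for a bad index, while `operator[]` would silently read past the end (undefined behavior):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical swizzle lookup: component index -> component name.
char SwizzleComponent(std::size_t index) {
    static constexpr std::array<char, 4> components{'x', 'y', 'z', 'w'};
    return components.at(index); // throws std::out_of_range if index >= 4
}
```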
Trivially add the encoding for this.
TMML takes an array argument of unknown meaning; it appears as the
first component in gpr8, followed by s, t and r. Skip this component
when arrays are being used. Also implement CUBE texture types.
- Used by Pikmin 3: Deluxe Demo.
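A hypothetical sketch of the register layout described above: when the texture is an array, the coordinates start one register after gpr8. The enum, the per-type coordinate counts and the function names are assumptions for illustration, not the decoder's actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class TextureType { Texture1D, Texture2D, TextureCube };

// Number of coordinates (s, t, r) read for each texture type.
std::uint32_t CoordCount(TextureType type) {
    switch (type) {
    case TextureType::Texture1D:
        return 1;
    case TextureType::Texture2D:
        return 2;
    case TextureType::TextureCube:
        return 3;
    }
    return 0;
}

// Registers holding the coordinates, skipping the leading array component.
std::vector<std::uint32_t> CoordRegisters(std::uint32_t gpr8, TextureType type, bool is_array) {
    const std::uint32_t first = gpr8 + (is_array ? 1u : 0u);
    std::vector<std::uint32_t> registers;
    for (std::uint32_t i = 0; i < CoordCount(type); ++i) {
        registers.push_back(first + i);
    }
    return registers;
}
```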
shader/registry: Make use of designated initializers where applicable
arithmetic_integer_immediate: Make use of std::move where applicable
Same behavior, minus any redundant atomic reference count increments and
decrements.
renderer_vulkan: Make unconditional use of VK_KHR_timeline_semaphore
Using statements already make these unnecessary.
Same behavior, less repetition.
Places data structures where they will eventually be moved to, so that
they never need to be moved in the first place.
Avoids unnecessary atomic increments and decrements.
Allows some implementations to avoid completely zeroing out the internal
buffer of the optional, and instead only set the validity byte within
the structure.
This also makes the way we return empty optionals consistent.
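Returning an explicit `std::nullopt` for the empty case looks like this in a hypothetical lookup function (`FindBinding` is illustrative only):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the index of `addr`, or an empty optional when it is not present.
std::optional<std::size_t> FindBinding(const std::vector<std::uint64_t>& addrs, std::uint64_t addr) {
    for (std::size_t i = 0; i < addrs.size(); ++i) {
        if (addrs[i] == addr) {
            return i;
        }
    }
    return std::nullopt; // explicit empty optional instead of `return {};`
}
```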
This reworks how host<->device synchronization works on the Vulkan
backend. Instead of "protecting" resources with a fence and signalling
these as free when the fence is known to be signalled by the host GPU,
use timeline semaphores.
Vulkan timeline semaphores allow us to use a subset of the functionality
of D3D12 fences. As far as we are concerned, a timeline semaphore is a
value set by the host or the device that can be waited on by either of
them.
Taking advantage of this, we can have a monotonically increasing
atomic value for each submission to the graphics queue. Instead of
protecting resources with a fence, we simply store the current logical
tick (the atomic value stored in CPU memory). When we want to know if a
resource is free, it can be compared to the current GPU tick.
This greatly simplifies resource management code, and the free status of
resources should have fewer false negatives.
To work around bugs in the validation layers, a thread waits on the
timeline semaphores when the layers are attached.
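The tick scheme described above can be modeled on the CPU alone. In this sketch the class and method names are hypothetical, and the GPU side is simulated; a real Vulkan backend would instead read the GPU tick from a semaphore created with `VK_SEMAPHORE_TYPE_TIMELINE` (for example with `vkGetSemaphoreCounterValue`):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class TickTracker {
public:
    // Tick that resources used by the next submission should record.
    std::uint64_t CurrentTick() const {
        return current_tick.load(std::memory_order_acquire);
    }

    // One call per submission: returns the value the submission signals on
    // the timeline semaphore and advances the logical tick.
    std::uint64_t Submit() {
        return current_tick.fetch_add(1, std::memory_order_acq_rel);
    }

    // Stand-in for the GPU signalling the timeline semaphore with `tick`.
    void SimulateGpuSignal(std::uint64_t tick) {
        gpu_tick.store(tick, std::memory_order_release);
    }

    // A resource recorded at `tick` is free once the GPU has reached it.
    bool IsFree(std::uint64_t tick) const {
        return gpu_tick.load(std::memory_order_acquire) >= tick;
    }

private:
    std::atomic<std::uint64_t> current_tick{1}; // monotonically increasing
    std::atomic<std::uint64_t> gpu_tick{0};     // last value the GPU signalled
};
```

A resource stores `CurrentTick()` when it is used, and reusing it only requires `IsFree(stored_tick)`; no per-resource fence object is needed.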
decoder/texture: Eliminate narrowing conversion in GetTldCode()