Data Movement Optimizations for GPU-Based Non-Uniform Processing-In-Memory Systems