Improved worst-case regret bounds for randomized least-squares value iteration