Anything involving batch calculations or high-dimensional linear algebra typically runs faster on a GPU.
That's true if the software is coded to take advantage of one or more GPUs. TWS itself probably doesn't support GPUs. But it's likely possible to use TWS and write software that does use GPUs for some calculations.
It has nothing to do with TWS being coded to support GPUs. You can easily write your own code to use a GPU. No matter the software, the data has to be moved from the TCP buffer to RAM, then to the GPU, and back again. I don't know of (haven't looked for) any mechanism that reads a TCP buffer directly into CUDA memory. The question becomes whether using the GPU is advantageous... for smaller models, it won't be during real-time execution, due to the CPU-to-GPU-to-CPU overhead.
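To put rough numbers on that round-trip argument, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names are made up, and the defaults (about 10 microseconds of fixed cost per copy or kernel launch, about 12 GB/s of host-to-device bandwidth) are placeholders rather than measurements. Profile your own hardware before trusting any of it.

```python
def gpu_round_trip_us(payload_bytes, gpu_compute_us,
                      fixed_overhead_us=10.0, bandwidth_gb_per_s=12.0):
    """Estimate RAM -> GPU -> RAM latency in microseconds.

    Illustrative cost model only; the default overhead and bandwidth
    are assumed values, not measurements.
    """
    # 1 GB/s equals 1e3 bytes per microsecond.
    copy_us = payload_bytes / (bandwidth_gb_per_s * 1e3)
    # Copy in, kernel launch, copy out: three fixed costs and two transfers.
    return 3 * fixed_overhead_us + 2 * copy_us + gpu_compute_us


def gpu_is_faster(cpu_compute_us, gpu_compute_us, payload_bytes,
                  fixed_overhead_us=10.0, bandwidth_gb_per_s=12.0):
    """True if the estimated GPU round trip beats computing on the CPU."""
    total_gpu_us = gpu_round_trip_us(payload_bytes, gpu_compute_us,
                                     fixed_overhead_us, bandwidth_gb_per_s)
    return total_gpu_us < cpu_compute_us


# Small real-time model: 4 KB of inputs, 20 us on the CPU, 5 us on the GPU.
small_model = gpu_is_faster(cpu_compute_us=20, gpu_compute_us=5,
                            payload_bytes=4096)          # False here

# Large batch job: 8 MB of inputs, 50 ms on the CPU, 1 ms on the GPU.
large_batch = gpu_is_faster(cpu_compute_us=50_000, gpu_compute_us=1_000,
                            payload_bytes=8_000_000)     # True here
```

With these assumed numbers the small model loses on the GPU because the fixed per-transfer cost alone (30 us) exceeds the whole CPU run, while the large batch wins comfortably because the compute savings dwarf the copies.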
Try looking into PAC cards [Accelerator Functional Unit (AFU)] running on FPGAs. I can't say much about it, as I am currently developing the unit while working for the manufacturer.