Controlling a network camera through a reference point in every frame


I am just playing around with a network camera and I am really stuck, so I would really appreciate any help:

I have a network camera which has PTZ control and, for a start, I would like to control the
camera left/right based on some sort of region of interest (ROI):

You can see the region of interest in the attached file. Now if I define a point to the right of the region of interest, I would like the camera to turn left until that point is within
the ROI. Once that point is there, I send a stop command to the camera.

My problem is this: take a point located at (800, 362). As the camera turns right, for every frame I get from the camera, how do I compute the new coordinates of that point so I can
check whether it is in the ROI or not? I remember that doing something similar with OpenCV was quite easy, because for every frame I would get the real screen coordinates of
the point/object.
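
To make the question concrete, here is a rough sketch of the kind of calculation I have in mind. It is not something I have working against the camera. It assumes a simple pinhole model with no lens distortion, a pure pan about the optical centre, and that I know how far the camera has panned since the point was defined. The 1280x720 frame size, the 60 degree horizontal field of view, the ROI bounds and all function names are made-up values for illustration only. Positive pan means the camera turns right, so the point moves left in the image.

```python
import math

def focal_length_px(image_width, hfov_deg):
    # Focal length in pixels, derived from the horizontal field of view.
    return (image_width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)

def point_after_pan(x, y, pan_deg, image_width, image_height, hfov_deg):
    """Predict where pixel (x, y) ends up after the camera pans right by pan_deg.

    Returns None if the point would no longer be in front of the camera.
    """
    f = focal_length_px(image_width, hfov_deg)
    cx, cy = image_width / 2.0, image_height / 2.0
    alpha = math.atan2(x - cx, f)            # angle of the point off the optical axis
    new_alpha = alpha - math.radians(pan_deg)
    if abs(new_alpha) >= math.pi / 2:
        return None
    new_x = cx + f * math.tan(new_alpha)
    # A pan also shifts the row slightly, because the depth along the axis changes.
    new_y = cy + (y - cy) * math.cos(alpha) / math.cos(new_alpha)
    return new_x, new_y

def in_roi(x, roi_left, roi_right):
    # Only the horizontal extent of the ROI matters for left/right control.
    return roi_left <= x <= roi_right

def pan_until_in_roi(x, y, roi_left, roi_right, step_deg=0.5, max_deg=90.0,
                     image_width=1280, image_height=720, hfov_deg=60.0):
    """Simulate panning right in small steps; return the pan angle at which to stop."""
    pan = 0.0
    while pan <= max_deg:
        moved = point_after_pan(x, y, pan, image_width, image_height, hfov_deg)
        if moved is None:
            return None
        if in_roi(moved[0], roi_left, roi_right):
            return pan                       # this is where the stop command would go
        pan += step_deg
    return None

if __name__ == "__main__":
    # Hypothetical ROI spanning columns 540..740 of a 1280-pixel-wide frame.
    print(pan_until_in_roi(800, 362, roi_left=540, roi_right=740))
```

In the real loop the pan angle would have to come from the camera itself (if it reports its position) or from the pan speed multiplied by the elapsed time, which is another assumption. If neither is available, I suppose the alternative is to track the image content around the point from frame to frame, for example with OpenCV's `calcOpticalFlowPyrLK`, instead of predicting the position from geometry.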

Thank you in advance.