Using AR ofxAruco to find camera position in a room

I have a bit of a reverse use case for Aruco that I would like some advice on.

I have a physical setup with a camera strapped to a person's knee, pointing at the floor at a slight angle. The floor is covered by a big carpet printed with Aruco markers, which together make up one Aruco board.

From this I want to be able to track the position of the camera and thus the person in the room.

I took the getCameraLocation method and made it public for this purpose.

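/**
 * Returns the camera position in the board's coordinate system,
 * given the board pose (Rvec, Tvec) expressed in the camera frame.
 */
cv::Point3f CameraParameters::getCameraLocation(cv::Mat Rvec,cv::Mat Tvec)
{
    //rotation vector -> 3x3 rotation matrix
    cv::Mat m33(3,3,CV_32FC1);
    cv::Rodrigues(Rvec, m33);

    //build the 4x4 board-to-camera transform
    cv::Mat m44=cv::Mat::eye(4,4,CV_32FC1);
    for (int i=0;i<3;i++)
        for (int j=0;j<3;j++)
            m44.at<float>(i,j)=m33.at<float>(i,j);

    //now, add translation information
    //(ptr access works for a continuous 1x3 or 3x1 CV_32F Tvec)
    for (int i=0;i<3;i++)
        m44.at<float>(i,3)=Tvec.ptr<float>(0)[i];
    //invert the matrix; inv() returns the inverse and leaves m44 unchanged
    cv::Mat inv=m44.inv();
    //the camera position is the translation column of the inverse
    return cv::Point3f(inv.at<float>(0,3),inv.at<float>(1,3),inv.at<float>(2,3));
}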

Using it like this:

    ofxCv::Mat rvec = aruco.getBoard().Rvec;
    ofxCv::Mat tvec = aruco.getBoard().Tvec;

    ofxCv::Point3f pos = aruco.camParams.getCameraLocation(rvec, tvec);
    ofDrawBox(pos.x*scale, pos.y*scale, pos.z*scale, 20, 20, 20);
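
For reference, the geometry involved can be sketched without OpenCV. Assuming Rvec and Tvec describe the board pose in the camera frame (X_cam = R * X_board + t), the camera position in board coordinates is C = -R^T * t, so no 4x4 inverse is needed. The names Vec3, Mat3, rodrigues and cameraPositionInBoard below are hypothetical helpers for illustration, not part of aruco or ofxAruco.

```cpp
#include <cmath>

// Hypothetical minimal types, used only for this sketch.
struct Vec3 { double x, y, z; };
struct Mat3 { double m[3][3]; };

// Rodrigues' formula: axis-angle rotation vector -> 3x3 rotation matrix.
Mat3 rodrigues(const Vec3& r) {
    const double theta = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z);
    if (theta < 1e-12) {
        return Mat3{{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};  // no rotation
    }
    const double x = r.x / theta, y = r.y / theta, z = r.z / theta;
    const double c = std::cos(theta), s = std::sin(theta), v = 1.0 - c;
    return Mat3{{{c + x * x * v,     x * y * v - z * s, x * z * v + y * s},
                 {y * x * v + z * s, c + y * y * v,     y * z * v - x * s},
                 {z * x * v - y * s, z * y * v + x * s, c + z * z * v}}};
}

// Assumes X_cam = R * X_board + t, so the camera centre is C = -R^T * t.
Vec3 cameraPositionInBoard(const Vec3& rvec, const Vec3& tvec) {
    const Mat3 R = rodrigues(rvec);
    const double t[3] = {tvec.x, tvec.y, tvec.z};
    double c[3] = {0.0, 0.0, 0.0};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            c[i] -= R.m[j][i] * t[j];  // R.m[j][i] is element (i, j) of R^T
    return Vec3{c[0], c[1], c[2]};
}
```

With no rotation and t = (1, 2, 3), this puts the camera at (-1, -2, -3) in board coordinates, which is a quick sanity check to compare against the values drawn by ofDrawBox.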

My results are very shaky, though. The box to the right moves around the room in a way that roughly makes sense, but it often breaks down completely. I am not sure whether I am doing something fundamentally wrong here or whether it's a question of calibration, filtering, a better camera …?
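
As a reference point for the filtering option, here is a minimal sketch of exponential smoothing applied to the estimated position. It is only an illustration: Point3, PositionSmoother and alpha are hypothetical names, not part of aruco or ofxAruco, and smoothing like this can only damp frame-to-frame jitter. It would not repair a pose that is wrong to begin with.

```cpp
// Hypothetical minimal point type, used only for this sketch.
struct Point3 { float x, y, z; };

// Exponential smoothing: state += alpha * (measurement - state).
// alpha near 1 follows the measurements closely; alpha near 0 smooths more
// but lags further behind the real motion.
class PositionSmoother {
public:
    explicit PositionSmoother(float alpha) : alpha_(alpha) {}

    Point3 update(const Point3& p) {
        if (!initialized_) {
            state_ = p;  // the first sample initializes the filter
            initialized_ = true;
        } else {
            state_.x += alpha_ * (p.x - state_.x);
            state_.y += alpha_ * (p.y - state_.y);
            state_.z += alpha_ * (p.z - state_.z);
        }
        return state_;
    }

private:
    float alpha_;
    bool initialized_ = false;
    Point3 state_{0.0f, 0.0f, 0.0f};
};
```

Each new camera position would be passed through update() once per frame, and the returned value drawn instead of the raw one.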

If someone else has experience using AR with Aruco, or with something else, for a similar use case, I would love to get your advice.

Full work in progress source code here: https://github.com/jchillerup/rokoko/tree/master/aruco