LQY's Blog
Robot calligraphy at human level: Experiments
2023-05-13 | robot | Project

See also: Main article

Problem-guided experiments

Joint learning of a trajectory-optimization model and a brush model.

Assume coarse median trajectories (stroke-median sequences) are fixed and known (they can be generated in batch, see this). Different styles in the training images then correspond to different trajectory-skew models and brush models.

The goal is to fine-tune the brush neural-net model of one style using images of another style.

The stroke-median action sequence should also be fine-tuned, in terms of the Bezier-curve control points and the stroke width.

At test time, we feed in an unseen character's stroke-median sequence.

  • Drawback: the number of strokes is fixed.
  • Imperfect mitigation: add interpolated points to represent the median curves in a more fine-grained manner.
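A minimal sketch of this representation and mitigation, assuming each stroke median is a cubic Bezier with a width at each end; the function names and the (x, y, width) action encoding are illustrative assumptions, not the project's actual interface:

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n_points=16):
    """Sample n_points along a cubic Bezier curve.

    Each p* is a 2D control point; returns an (n_points, 2) array.
    """
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def median_to_actions(control_points, widths, n_points=16):
    """Expand one stroke median (4 control points + per-end widths)
    into a fine-grained sequence of (x, y, width) actions."""
    xy = cubic_bezier(*control_points, n_points=n_points)
    w = np.linspace(widths[0], widths[1], n_points)[:, None]
    return np.concatenate([xy, w], axis=1)  # shape (n_points, 3)
```

Increasing `n_points` is exactly the mitigation above: the stroke count stays fixed, but each stroke is represented by more interpolated points.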

Can neural nets approximate the dynamics of the calligraphy-writing environment? How should we model a 3D brush? Is that necessary / viable?

Unlike a parametric painting program, where the parameters→image mapping is stateless and independent of the parameters of previous steps (we can simply superpose the previous-step canvas and the current-step stroke to obtain the current canvas), the calligraphy-writing environment has a brush inner state (tuft shape) affected by previous steps.
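The contrast can be sketched as follows. `BrushEnv`, its toy tuft-state update, and the placeholder renderer are illustrative assumptions, not the actual environment:

```python
import numpy as np

# Stateless painting program: the current canvas is just the superposition
# of the previous canvas and the current stroke.
def stateless_step(canvas, stroke_image):
    return np.maximum(canvas, stroke_image)  # max-compositing of ink

# Calligraphy environment (toy sketch): the brush carries hidden state
# (tuft shape, ink load) deformed by previous actions, so the same action
# can leave a different mark depending on history.
class BrushEnv:
    def __init__(self, size=8):
        self.size = size
        self.tuft_state = np.zeros(4)  # toy encoding of tuft deformation

    def render(self, action, tuft_state):
        # Placeholder for a learned neural brush model: ink intensity at
        # the pen position decays as the tuft deforms.
        img = np.zeros((self.size, self.size))
        x = int(np.clip(action[0], 0, 1) * (self.size - 1))
        y = int(np.clip(action[1], 0, 1) * (self.size - 1))
        img[y, x] = 1.0 - 0.5 * np.tanh(np.abs(tuft_state).sum())
        return img

    def step(self, canvas, action):
        stroke_image = self.render(action, self.tuft_state)
        # Toy state update: the tuft drifts toward the applied action.
        self.tuft_state = 0.9 * self.tuft_state + 0.1 * np.asarray(action)
        return stateless_step(canvas, stroke_image)
```

The point of the sketch: a learned dynamics model for this environment must predict not just the mark but also the hidden tuft-state transition.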

Setup: a simplified brush model, an actor-critic algorithm, and a reward function based on the mean squared error (MSE) between the rendered canvas and the training image:

$$\operatorname{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_{i}-\hat{Y}_{i}\right)^{2}$$
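As a sketch, with the sign flip and the per-step shaping being common choices assumed here rather than taken from the experiments:

```python
import numpy as np

def mse(canvas, target):
    """MSE between the rendered canvas and the training image,
    both given as arrays of pixel intensities."""
    canvas = np.asarray(canvas, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((canvas - target) ** 2))

def step_reward(prev_canvas, canvas, target):
    """One common shaping: reward the per-step *decrease* in MSE, so
    per-step rewards telescope to the total improvement over the episode."""
    return mse(prev_canvas, target) - mse(canvas, target)
```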
The figure below shows that although the agent does learn to do inverse graphics, the quality is low, and the strokes are out of order and differ from how humans write them.

Writing 仍 (from left to right: final canvas, training sample, canvas after the 1st step, canvas after the 2nd step, ...)

How to inject knowledge of the coarse trajectory (in the form of a sequence of vectors)? Can we utilize language models? Can we feed in videos/animations?

See Dibya Ghosh et al. Reinforcement Learning from Passive Data via Latent Intentions

How to do visual track following? Can we train an image-based policy guided by a location-aware (cheating) policy?
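A minimal sketch of such guided training, assuming a DAgger-style setup in which a privileged teacher that sees the true pen location supervises an image-based student; the linear student and all names here are illustrative, not a worked-out method:

```python
import numpy as np

def cheating_policy(true_location, target_point):
    """Privileged teacher: given the true pen location, track following
    reduces to moving toward the next reference point."""
    return target_point - true_location

def distill_step(student, image, true_location, target_point, lr=0.1):
    """One behavior-cloning step: regress the student's image-based action
    onto the teacher's action, on states the student itself visits."""
    teacher_action = cheating_policy(true_location, target_point)
    student_action = student["W"] @ image.ravel() + student["b"]
    err = student_action - teacher_action
    student["W"] -= lr * np.outer(err, image.ravel())  # gradient step on MSE
    student["b"] -= lr * err
    return float(np.mean(err ** 2))
```

At test time the cheating policy is discarded and only the image-based student runs, so the privileged location is used for training but never deployed.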

Is hierarchical RL viable? Can we make it multi-episode (every stroke corresponds to an episode)?

The agent would then learn in a continual, non-episodic setting without relying on extrinsic interventions for training.
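One way to sketch the per-stroke decomposition, assuming a gym-style environment in which the canvas persists across per-stroke episodes; every interface here (the `goal=`/`canvas=` reset arguments, the two policy levels) is hypothetical:

```python
# Sketch: one episode per stroke. A high-level policy picks a goal from the
# stroke median; a low-level policy draws until the episode ends. The canvas
# is threaded through resets so strokes accumulate on the same sheet.
def write_character(env, high_level_policy, low_level_policy, strokes):
    canvas = None
    for stroke_median in strokes:                  # high level: per-stroke episodes
        goal = high_level_policy(canvas, stroke_median)
        obs = env.reset(goal=goal, canvas=canvas)  # carry the canvas over
        done = False
        while not done:                            # low level: draw one stroke
            action = low_level_policy(obs, goal)
            obs, reward, done, info = env.step(action)
        canvas = info["canvas"]
    return canvas
```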

Can we integrate a GAN over local perception fields to achieve finer texture?

Can we change perspective and study the inverse problem, i.e., wiping ink off?

Considering there are plenty of similarities in writing similar strokes, can we leverage the common structure and use meta-RL?

