Test-driven development → Problem-driven experiments
Can neural nets approximate the dynamics of a calligraphy-writing environment? How should we model a 3D brush? Is that necessary / viable?
Unlike a parametric painting program, whose parameters→image mapping is stateless and independent of previous steps (we can simply superpose the previous-step canvas and the current-step stroke to obtain the current-step canvas), the calligraphy-writing environment has an internal brush state (tuft shape) that is affected by previous steps.
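To make the distinction concrete, here is a minimal sketch (all names, shapes, and dynamics are assumptions for illustration, not the actual environment code): the parametric painter is a pure function of the previous canvas and the current action, while the brush environment carries tuft state across steps.

```python
import numpy as np

CANVAS = (128, 128)

def render_stroke(params: np.ndarray) -> np.ndarray:
    """Toy stateless renderer: params = (x, y, radius) -> a disk-shaped mark."""
    x, y, r = params
    yy, xx = np.mgrid[0:CANVAS[0], 0:CANVAS[1]]
    return ((xx - x) ** 2 + (yy - y) ** 2 <= r ** 2).astype(np.float32)

def painter_step(canvas: np.ndarray, params: np.ndarray) -> np.ndarray:
    # Parametric painting: the current canvas depends only on the previous canvas
    # and the current stroke, so steps superpose independently.
    return np.clip(canvas + render_stroke(params), 0.0, 1.0)

class BrushEnv:
    """Calligraphy environment: the brush carries internal state (tuft shape, ink),
    so the same action leaves different marks depending on earlier actions."""
    def __init__(self):
        self.canvas = np.zeros(CANVAS, dtype=np.float32)
        self.tuft_state = np.zeros(16, dtype=np.float32)  # assumed latent tuft descriptor

    def step(self, action: np.ndarray) -> np.ndarray:
        stroke = render_stroke(action) * self._ink_transfer()
        self.tuft_state = self._deform_tuft(action)        # state carried to the next step
        self.canvas = np.clip(self.canvas + stroke, 0.0, 1.0)
        return self.canvas

    def _ink_transfer(self) -> float:
        # Placeholder dynamics: less ink as the tuft deforms.
        return float(np.exp(-np.linalg.norm(self.tuft_state)))

    def _deform_tuft(self, action: np.ndarray) -> np.ndarray:
        # Placeholder dynamics: the tuft slowly drifts toward the latest action.
        return 0.9 * self.tuft_state + 0.1 * np.resize(action, 16)
```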
Is it feasible to learn a trajectory optimization model and brush model jointly?
Assume the coarse median trajectories (stroke-median sequences) are fixed, known knowledge (they can be generated in batch, see this); different styles in the training images then correspond to different trajectory-skew models and brush models.
The goal is to fine-tune the brush neural-net model of one style using images of another style.
The stroke-median action sequence should also be fine-tuned, in terms of Bézier-curve control points and stroke width.
At test time, we feed in an unseen character's stroke-median sequence.
- Drawback: the number of strokes is fixed.
- Imperfect mitigation: add interpolated points so that median curves are represented in a more fine-grained manner (see the sketch below).
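A minimal sketch of that mitigation, assuming each stroke median is a quadratic Bézier curve with start/end widths (the actual median representation may differ):

```python
import numpy as np

def quadratic_bezier(p0, p1, p2, t):
    """Evaluate a quadratic Bézier curve at parameter t in [0, 1]."""
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2

def densify_median(p0, p1, p2, w_start, w_end, n_points=16):
    """Return (x, y, width) waypoints sampled along the stroke median."""
    ts = np.linspace(0.0, 1.0, n_points)
    points = [quadratic_bezier(np.asarray(p0, float), np.asarray(p1, float),
                               np.asarray(p2, float), t) for t in ts]
    widths = np.linspace(w_start, w_end, n_points)  # linear width interpolation (assumption)
    return [(float(x), float(y), float(w)) for (x, y), w in zip(points, widths)]

# Example: one median with 3 control points expanded into 16 finer-grained waypoints.
waypoints = densify_median(p0=(10, 80), p1=(60, 20), p2=(110, 70), w_start=6.0, w_end=2.0)
```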
Using a simplified brush model, an actor-critic algorithm, and a reward function based on mean squared error (MSE):
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2
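A minimal sketch of the reward computation, assuming grayscale canvas/target images in [0, 1]; whether the reward is the raw negative MSE or the step-wise reduction in MSE is a design choice, shown here as the latter:

```python
import numpy as np

def mse(canvas: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over all pixels (both images in [0, 1], same shape)."""
    return float(np.mean((target - canvas) ** 2))

def reward(prev_canvas: np.ndarray, canvas: np.ndarray, target: np.ndarray) -> float:
    # Reward the reduction in MSE achieved by the current action (assumed shaping).
    return mse(prev_canvas, target) - mse(canvas, target)
```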
(Q function curve during training)
<Return (expected sum of reward) curve during training>
The figures below show that although the agent does learn to do inverse graphics, the quality is low, and the strokes are out of order and differ from how humans write them.
<canvas image at each timesteps (from left to right)>
<comparison between shape written (left) and ground-truth(right)>
But this method fails on very complex characters.
(failed writing process)
(comparison)
How to inject knowledge of coarse trajectory (in the form of a sequence of vectors)? Can we utilize language models? Can we feed in videos/animations?
See Dibya Ghosh et al. Reinforcement Learning from Passive Data via Latent Intentions
How to do visual track following? Can we train an image-based policy guided by a location-aware (cheating) policy?
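One possible reading of this, sketched below under assumed model and data interfaces: distill a privileged, location-aware teacher into an image-based student with a simple behavior-cloning loss. This is an illustration of the idea, not the project's implementation.

```python
import torch
import torch.nn as nn

class StudentPolicy(nn.Module):
    """Maps canvas + target images to an action; sees no ground-truth brush location."""
    def __init__(self, action_dim: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, action_dim),
        )

    def forward(self, obs_images: torch.Tensor) -> torch.Tensor:
        # obs_images: (batch, 2, H, W) with canvas and target stacked as channels.
        return self.net(obs_images)

def distill_step(student, teacher_action, obs_images, optimizer):
    """One behavior-cloning step: regress the student's action onto the teacher's."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(student(obs_images), teacher_action)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage (hypothetical batch from teacher rollouts):
# student = StudentPolicy()
# opt = torch.optim.Adam(student.parameters(), lr=1e-4)
# loss = distill_step(student, teacher_action_batch, obs_image_batch, opt)
```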
Is hierarchical RL viable? Can we make it multi-episode (each stroke corresponds to an episode)?
Considering there is plenty of shared structure among similar strokes, can we leverage that common structure with meta-RL?
the agent learns in a continual non-episodic setting without relying on extrinsic interventions for training
What is the edge over GAN methods?
Which method will have finer texture?
Can we flip the perspective and study the inverse problem: wiping ink off?
Similar to Towards ML-enabled cleaning robots – Google Research Blog
Overall strategy: from easy to hard
(Meta) (Offline) (Goal-conditioned) (Hierarchical) RL
RL environment simulation: 2D → 3D
Experiment: single stroke → single character → different characters → different styles