In essence, robot calligraphy manipulation can be reframed as a multi-task learning problem. Considering how humans learn calligraphy, with the aim of replicating existing masterpieces as faithfully as possible, writing a single character constitutes one task. The shared structure underneath is the knowledge of how a character is composed of multiple strokes and how the brush deforms under the dexterous manipulation of a human hand, in addition to how an inky brush interacts with absorbent paper to leave footprints on it.
From a different perspective, the problem can be reframed as hierarchical RL, with a lower-level policy in charge of writing basic strokes (e.g. ㇐㇑㇒㇏丶), or a bit more complex ones (e.g.㇁㇢㇂乛㇕㇆㇗㇘𠄌亅𠃋㇙𡿨㇅𠃑乚㇠㇈㇉㇊ㄣ㇇乁⺄㇌㇋㇎𠄎), and a higher-level policy providing location, scale, and shape information for the lower-level policy.
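This two-level decomposition can be made concrete with a minimal sketch. Everything below — the class names, the goal-dictionary format, and the trivial single-stroke "plan" — is hypothetical placeholder logic, not a proposed implementation:

```python
import numpy as np

class LowLevelStrokePolicy:
    """Writes one basic stroke given a local goal (position, scale, shape)."""
    def act(self, brush_state, stroke_goal):
        # Placeholder: steer the brush tip toward the stroke's end point.
        direction = np.asarray(stroke_goal["end"]) - np.asarray(brush_state["tip_xy"])
        norm = np.linalg.norm(direction)
        step = direction / norm if norm > 1e-8 else np.zeros(2)
        return {"dxy": step * 0.01, "dz": -0.001}  # small planar move, slight press

class HighLevelLayoutPolicy:
    """Decomposes a character into ordered stroke goals (location, scale, shape)."""
    def plan(self, character_image):
        # Placeholder plan: a single horizontal stroke (㇐) across the canvas.
        return [{"shape": "㇐", "start": (0.2, 0.5), "end": (0.8, 0.5)}]

high, low = HighLevelLayoutPolicy(), LowLevelStrokePolicy()
goals = high.plan(character_image=None)
action = low.act({"tip_xy": (0.2, 0.5)}, goals[0])
```

The key design point is the interface: the higher level communicates only where and how a stroke should look, while the lower level owns the brush dynamics.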
Considering the difficulty of devising a proper reward function for RL, the problem can also be naturally viewed as learning to reach goals, i.e. goal-conditioned RL, in which each image of a different character in a different style is a goal.
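A minimal sketch of such a goal-conditioned setup, using negative mean pixel discrepancy against the goal image as the reward (the toy "character" below is just a horizontal bar, and the reward choice is one illustrative assumption among many):

```python
import numpy as np

def goal_conditioned_reward(canvas, goal_image):
    """Reward = negative mean pixel discrepancy between the written canvas
    and the goal image (one character rendered in one style)."""
    return -float(np.mean(np.abs(canvas.astype(float) - goal_image.astype(float))))

# Toy goal: a horizontal bar standing in for a rendered character image.
goal = np.zeros((64, 64))
goal[30:34, 10:54] = 1.0

blank_reward = goal_conditioned_reward(np.zeros((64, 64)), goal)   # negative
perfect_reward = goal_conditioned_reward(goal, goal)               # 0.0, the maximum
```

Because the goal is an image rather than a scalar objective, the same policy network can be conditioned on any character in any style without redesigning the reward.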
The process of human calligraphy learning suggests using a curriculum to facilitate learning.
For example, one usually starts by writing simple strokes, aiming to copy the shapes of basic strokes in classic masterpieces.
After becoming familiar with manipulating the brush to replicate basic strokes, one learns to write whole characters, usually by looking at an ancient masterpiece and trying to imitate it, during which one has to parse the stroke order of each character, write each stroke on paper in the proper position, and determine suitable brush movements so that the shape of the written character best resembles the one in the masterpiece.
After acquiring the ability to write any character in the style of one masterpiece, one would then try to reproduce other masterpieces in multiple different styles, usually by a different calligrapher or by the same calligrapher at different stages of life.
There are multiple input/output relationships with different levels of complexity, and they should be treated differently.
- Input:
  - a calligraphy/painting synthesizing program
  - real calligraphy samples / written images in a single style
- Output: a brush-control policy that produces trajectories to write any character in the training set with minimal reconstruction error
This I/O relationship yields a template fitting problem. Here are some limitations:
- The agent has no notion of the temporal relation of strokes and can only learn to minimize the discrepancy between its writing results and those in the training set; consequently, it cannot write any unseen character. We can mitigate this by providing the agent with existing stroke-order knowledge.
- The quality of character images written by the agent is bounded by the fidelity of the synthesizing program, so there is a huge sim2real gap when deploying the control policy on a real robot manipulator. One amenable remedy is to use real-world calligraphy-writing videos (optionally plus captured movement data) to learn a realistic brush dynamics model, i.e. one predicting how the brush deforms w.r.t. an action — which is nontrivial because of the challenges in learning the following:
- the 3D shape of the deformable brush tuft is not regular, esp. when it is rotated
- forking of brush-head hairs, esp. when there is little ink
- frictional effect with different pressing force and moving velocity
- the diffusion of ink in the paper
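To make the brush-dynamics-model idea concrete, here is a deliberately crude toy sketch. The `toy_footprint` function is a hypothetical stand-in for footprints extracted from real writing videos, and a linear least-squares fit stands in for the deep network one would actually train; none of the state/action conventions below come from a real system:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_footprint(state, action):
    """Hypothetical ground truth: pressing deeper leaves a wider, ink-weighted
    square patch (a crude stand-in for real brush deformation physics)."""
    depth = max(state[0] + action[2], 0.0)   # press depth after the move
    size = min(int(round(depth * 4)), 8)     # deeper press -> wider footprint
    patch = np.zeros((8, 8))
    if size > 0:
        lo, hi = 4 - size // 2, 4 + (size + 1) // 2
        patch[lo:hi, lo:hi] = state[2]       # intensity scales with ink amount
    return patch

# Collect (state, action) -> footprint pairs, as if extracted from videos.
X = rng.uniform(0, 1, size=(256, 6))         # [press, tilt, ink | dx, dy, dz]
Y = np.stack([toy_footprint(x[:3], x[3:]) for x in X]).reshape(256, -1)

# Fit a linear brush model; a real system would use a deep network instead.
W, *_ = np.linalg.lstsq(np.c_[X, np.ones(256)], Y, rcond=None)

def predict_footprint(state, action):
    feats = np.r_[state, action, 1.0]
    return (feats @ W).reshape(8, 8)
```

A linear model obviously cannot capture tuft irregularity, hair forking, friction, or ink diffusion — that is exactly why those four items above make the learning problem hard.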
It is pragmatic to learn a control policy in a simulator with an immature brush model and then fine-tune it with real-world data, because it would be too difficult for neural net models to approximate such complex physics from scratch, considering the difficulties mentioned above. In other words, we inject inductive bias by using a simulator to inform the agent of how the brush deforms and how it leaves an ink footprint on the paper.
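The pretrain-then-fine-tune recipe can be illustrated on a toy regression problem: a linear "policy" is first fit on plentiful simulator data, then adapted with a small amount of real-world data whose dynamics differ slightly. All numbers here are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_fit(w, X, y, lr, steps):
    """Plain gradient descent on mean squared error."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w = w - lr * grad
    return w

w = np.zeros(3)  # toy linear policy parameters

# Stage 1: pretrain on plentiful simulator rollouts (crude brush model).
X_sim = rng.normal(size=(500, 3))
y_sim = X_sim @ np.array([1.0, -0.5, 0.2])
w = sgd_fit(w, X_sim, y_sim, lr=0.1, steps=200)

# Stage 2: fine-tune on scarce real-world data with slightly shifted dynamics.
X_real = rng.normal(size=(40, 3))
y_real = X_real @ np.array([1.1, -0.4, 0.25])
w = sgd_fit(w, X_real, y_real, lr=0.05, steps=50)
```

The point of the sketch: because the simulator solution already sits near the real-world optimum, a small real dataset and a few gradient steps suffice, whereas learning the real dynamics from scratch with 40 samples would be hopeless for a high-capacity model.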
- Input:
  - a calligraphy synthesizing program (same as in Scenario I)
  - calligraphy images in one or multiple style(s)
  - font file(s)
- Output: a policy that can write any character supported by the font, in any style in the training dataset
This I/O relationship makes the resulting policy more useful. For example, because we can obtain the images of all characters in an encoding (e.g. GB 2312), we can easily build a customized font, alleviating the tedious job of designing thousands of Chinese characters.
Or, as a replacement for printers, we can use a robot arm to write in one's own handwriting style in an indefatigable manner. (As an aside, this would be really useful when a teacher penalizes a student by making them copy out a number of paragraphs or passages, which is quite common in Chinese primary and secondary schools.)
Starting from a coarse trajectory extracted by skeletonizing the font-rendered image of any character, we can optimize the trajectory so that its execution on a robot yields a written character image that matches a designated style. With these expert trajectories, in addition to data collected using a random policy, we can apply an offline RL algorithm to obtain an expert policy.
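As a toy illustration of extracting a coarse trajectory from a rendered glyph — using a per-column ink centroid as a crude stand-in for true skeletonization, on a fake "glyph" that is just a horizontal bar:

```python
import numpy as np

def coarse_trajectory(glyph):
    """Extract a coarse left-to-right trajectory from a binary glyph image by
    taking the vertical centroid of ink pixels in each column (a crude
    stand-in for proper skeletonization)."""
    traj = []
    for x in range(glyph.shape[1]):
        ys = np.nonzero(glyph[:, x])[0]
        if ys.size:
            traj.append((x, float(ys.mean())))
    return traj

# Toy glyph: a horizontal bar, as if rendered from a font file.
glyph = np.zeros((32, 32), dtype=bool)
glyph[14:18, 4:28] = True
traj = coarse_trajectory(glyph)  # ordered (x, y) waypoints along the stroke
```

This coarse waypoint sequence is only the initialization; the subsequent trajectory optimization against the styled target image is what the offline RL stage would refine.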
It has been shown that a large convolutional neural net can mimic the behavior of a parameterized painting program (such as libmypaint), as in Stylized Neural Painting. That is, given a tuple of stroke parameters including position and color, a neural net can output an image that resembles the rendered one. So if we collect scene images and the corresponding movement data, there is also a chance to learn a neural brush model, which could have far better image quality and accuracy than the hand-crafted brush models proposed in multiple previous works [1, 2].
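A sketch of the interface such a neural brush model would expose — here a tiny randomly initialized MLP whose weights would, in practice, be trained on (stroke parameters, rendered image) pairs; the parameterization `(x0, y0, x1, y1, thickness)` and all sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-in for a neural brush renderer:
# stroke parameters (x0, y0, x1, y1, thickness) -> 16x16 grayscale footprint.
# In practice the weights are learned to mimic a painting program's output.
W1 = rng.normal(0, 0.5, (5, 64));   b1 = np.zeros(64)
W2 = rng.normal(0, 0.5, (64, 256)); b2 = np.zeros(256)

def neural_render(stroke_params):
    h = np.tanh(stroke_params @ W1 + b1)
    img = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid keeps pixels in [0, 1]
    return img.reshape(16, 16)

img = neural_render(np.array([0.1, 0.5, 0.9, 0.5, 0.05]))
```

Because the renderer is differentiable end-to-end, stroke parameters (and hence trajectories) can be optimized by gradient descent against a target image — the main practical advantage over a hand-crafted, non-differentiable brush model.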