- Background
- Related work (this part is far from complete now)
- Problem analysis
- Methods
- See also: Experimental results
(This post has not been finished yet! If you are also interested, you can leave a comment or email me.)
In this project, we study enabling a robot to write Chinese calligraphy. The goal is to obtain an intelligent brush-control agent powered by RL with strong generalization capability, i.e. able to write any character in any style. To make it truly useful in the real world, we aim to deploy the controller on a robot arm whose end effector holds a calligraphy brush dipped in ink; when commanded with a character or a sequence of characters, it should write them out faithfully in a chosen style on a piece of paper.
Background
Writing (characters on a piece of paper, not essays) is one of the defining characteristics that distinguish us from other species, and it is non-trivial: students have to practice a lot in order to write beautifully. Writing can be even harder in some languages such as Chinese, given that there are thousands of characters and some of them are quite complex (e.g. 齉).
Writing requires different functionalities of our brain to work together: memorizing coarse trajectories so that others can recognize what we write, MPC-like manipulation of the writing tool with a simulation running in the brain, and aesthetic judgment of the final character image.
Chinese calligraphy is the writing of Chinese characters with a specialized brush (somewhat like a watercolor brush) in an aesthetically meaningful way. It usually takes a practitioner years of training to dexterously manipulate an inky soft brush tuft against water-absorbing paper and create an image of a specific character in an aesthetically valuable way (one has to attend to both the overall style and the local fine-grained details). It is a challenging control task due to hard-to-predict 3D soft-body dynamics, friction, adhesion, and the fluid dynamics of ink in the paper.
In recent years, we have seen tremendous advances in AI content generation across various domains, including text, audio, image, and video. Chinese calligraphy, with a history of over 2,000 years, is an integral part of traditional Chinese art. Given how difficult calligraphy is to master, and with the advent of deep reinforcement learning and robotics, it is tempting to ask: can we make robots learn to write calligraphy at a human level?
Related work (this part is far from complete now)
Generating digital Chinese calligraphy has been studied extensively since the beginning of the information age.
hand-crafted rules
There were many works on handwriting generation before deep learning became prevalent.
RNN
These methods lack the ability to generate images with highly varying stroke boldness and fine-grained texture, which gives calligraphy its unique artistic meaning compared to normal handwriting.
generative models with neural nets
There are many works using VAEs or GANs to generate calligraphy of a specific style. However, these methods use complex neural networks to directly output image pixels, which is very unnatural considering how calligraphy is actually created by humans. Our method trains a controller that outputs brush motion sequences and thus bears more resemblance to the natural handwriting process than these methods.
style transfer using pix2pix-like methods
trajectory optimization / optimal control
These methods perform character-wise optimization and lack the generalization ability of ML algorithms.
GA Tech: “using vector-based character database, which provides a quick and accurate way to extract strokes, as well as stroke order”
TODO: use the variational model from VMAIL; use a learned model instead of system identification; use SysID + learning as in SimGAN. “The dynamic virtual brush model has two components: a drawing component, and a dynamic update component. The drawing component describes how the brush leaves a mark on paper depending on its parameters. The updating component then describes how the brush parameters are updated due to deformations when executing an open-loop trajectory [x(t), y(t), z(t)].”
extract strokes and run optimization based on individual strokes
slow end-effector velocity to prevent excessive jerk and vibrations.
accurately predicting the state of the brush and finding a feasible control trajectory is difficult and unreliable. To avoid accumulating prediction error as more strokes are written, we have the robot dip ink after a stroke is written, which restores the brush to a predictable state. We handcrafted a control algorithm to accomplish this: given a circular inkstone, the brush is pushed down heavily at first to make the tip flat, and we then slowly move it to the edge of the inkstone in different directions with a gradually smaller extent.
DRL robotics calligraphy
Only considers writing a single low-granularity parameterized stroke (interpolation between adjacent trajectory points, 6 points in total); uses the cosine distance between two images as the objective; low-resolution training image samples (28x28).
Problem analysis
General analysis
In essence, robot calligraphy manipulation can be framed as a multi-task learning problem. Considering the process by which humans learn calligraphy, with the aim of replicating existing masterpieces as faithfully as possible, writing a single character is one task. The shared structure underneath is the knowledge of how a character is composed of multiple strokes, how the brush deforms under the dexterous manipulation of a human hand, and how the inky brush interacts with the absorbent paper to leave footprints on it.
From a different perspective, the problem can be framed as hierarchical RL, with a lower-level policy in charge of writing basic strokes (e.g. ㇐㇑㇒㇏丶乛㇕㇆㇗㇘𠄌亅𠃋㇙𡿨㇅𠃑乚㇠㇁㇢㇂㇈㇉㇊ㄣ㇇乁⺄㇌㇋㇎𠄎) and a higher-level policy providing location, scale, and shape information to the lower-level policy.
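As a minimal sketch of this two-level decomposition (all class and method names here are hypothetical assumptions, not an existing API):

```python
import numpy as np

# Hypothetical interface between the two levels: the higher-level policy
# decides which basic stroke to write next and where, and the lower-level
# policy turns that sub-goal into continuous brush actions.

class HighLevelPolicy:
    def plan_stroke(self, target_image: np.ndarray, strokes_written: int):
        """Return (stroke_id, position, scale, shape_params) for the next stroke."""
        raise NotImplementedError  # e.g. a CNN over the target character image

class LowLevelPolicy:
    def act(self, brush_state: np.ndarray, stroke_id: int,
            position: np.ndarray, scale: float, shape_params: np.ndarray):
        """Return one brush action (dx, dy, dz) advancing the current stroke."""
        raise NotImplementedError  # e.g. an MLP conditioned on the sub-goal
```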
The problem can also be naturally viewed as goal-conditioned RL (learning to reach goals), in which each image of a character in a given style is a goal, considering how difficult it is to specify a suitable reward before the agent is about to reach the goal.
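A minimal sketch of such a sparse goal-conditioned reward (the pixel-space discrepancy measure and the threshold are placeholder assumptions):

```python
import numpy as np

# Sparse goal-conditioned reward: the goal is the target character image
# itself; reward is only given once the written canvas is close enough.
# Both images are assumed to be float arrays normalized to [0, 1].

def goal_conditioned_reward(canvas: np.ndarray, goal_image: np.ndarray,
                            threshold: float = 0.05) -> float:
    discrepancy = float(np.mean(np.abs(canvas - goal_image)))
    return 1.0 if discrepancy < threshold else 0.0
```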
The process of human calligraphy learning suggests using a curriculum to facilitate learning. One usually starts by writing simple strokes, aiming to copy the shapes of basic strokes in classic masterpieces. After becoming familiar with manipulating the brush to replicate basic strokes, one learns to write whole characters, usually by looking at an ancient masterpiece and trying to imitate it; during this stage, one has to parse the stroke order of each character, write each stroke on paper in the proper position, and determine the hand movements so that the shape of the written character best resembles the one in the masterpiece. After acquiring the ability to write any character in the style of one masterpiece, one would then try to write in multiple different styles, usually from masterpieces by a different calligrapher or by the same calligrapher at different stages of life.
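One possible encoding of this curriculum (the stage contents and the advancement threshold are illustrative assumptions):

```python
# Stages mirror how humans learn: strokes first, then whole characters
# in one style, then multiple styles.

CURRICULUM = [
    {"stage": "basic strokes",    "targets": "isolated strokes from one masterpiece"},
    {"stage": "whole characters", "targets": "full characters from the same masterpiece"},
    {"stage": "multiple styles",  "targets": "characters from masterpieces in other styles"},
]

def current_stage(success_rate: float, stage_index: int) -> int:
    """Advance to the next stage once the agent masters the current one."""
    if success_rate > 0.9 and stage_index < len(CURRICULUM) - 1:
        return stage_index + 1
    return stage_index
```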
I/O analysis
Considering the complexity of the problem, there are multiple scenarios with different input/output relationships, and they should be treated differently.
Scenario I
- Input:
- a calligraphy/painting synthesizing program
- sample real calligraphy / written images in one single style
- Output: a brush-control policy that produces trajectories to write any character in the training set with minimal reconstruction error
This I/O relationship yields a template-fitting problem. Here are some limitations:
- The agent has no idea of the temporal relations of strokes and can only learn to minimize the discrepancy between its writing results and those in the training set; it cannot write any unseen character. We can mitigate this by providing the agent with existing stroke-order knowledge.
- The quality of the character images written by the agent is highly limited by the fidelity of the synthesizing program, and there is a huge sim2real gap in deploying the control policy on a real robot manipulator. One amenable way to narrow it is to use real-world calligraphy writing videos (optionally plus captured movement data) to learn a realistic brush dynamics model, i.e. one predicting how the brush deforms w.r.t. an action (see the sketch after this list). This is nontrivial because of the challenges in modeling the following:
- the 3D shape of the deformable brush tuft is not regular, esp. when it is rotated
- forking of brush-head hairs, esp. when there is little ink
- frictional effect with different pressing force and moving velocity
- the diffusion of ink in the paper
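One possible realization of such a learned brush dynamics model, sketched with placeholder state/action dimensions; training tuples (state, action, next state, footprint) could come from the writing videos and captured movement data mentioned above:

```python
import torch
import torch.nn as nn

# Sketch of a learned brush dynamics model: given the current brush state
# and an action, predict the next brush state and the ink footprint left
# on the paper. All dimensions are placeholder assumptions.

class BrushDynamicsModel(nn.Module):
    def __init__(self, state_dim=32, action_dim=6, footprint_size=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.next_state_head = nn.Linear(256, state_dim)
        # Flattened grayscale patch of new ink deposited around the brush tip.
        self.footprint_head = nn.Linear(256, footprint_size * footprint_size)

    def forward(self, state, action):
        h = self.trunk(torch.cat([state, action], dim=-1))
        next_state = self.next_state_head(h)
        footprint = torch.sigmoid(self.footprint_head(h))
        return next_state, footprint
```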
It is pragmatic to learn a control policy in a simulator with an imperfect brush model and then fine-tune it with real-world data, because it would be too difficult for neural net models to approximate the complex physics from scratch given the difficulties mentioned above. In other words, we inject inductive bias by using a simulator to inform the agent of how the brush deforms and how it leaves an ink footprint on the paper.
Scenario II
- Input:
- a calligraphy synthesizing program (also required by [Scenario I](#scenario-i))
- calligraphy images in one or multiple style(s)
- font file(s)
- Output: a policy that can write any character supported by the font, with any style in the training dataset
This I/O relationship makes the resulting policy more useful. For example, because we can obtain images of all characters in an encoding (e.g. GB 2312), we can easily build a customized font, alleviating the tedious job of designing thousands of Chinese characters.
Or, as a replacement for printers, we can use a robot arm to write in one's own handwriting style in an indefatigable manner. (As an aside, this would be really useful when a teacher penalizes a student by making them copy out paragraphs or passages, which is quite common in Chinese primary and secondary schools.)
Starting from a coarse trajectory extracted by skeletonizing the font-rendered image of any character, we can optimize the trajectory so that its execution on a robot yields a written character image that matches a designated style. With those trajectories, in addition to ones collected using a random policy, we can utilize an offline RL algorithm to obtain an expert policy.
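A sketch of the skeletonization step (the font path is a placeholder, and ordering the skeleton pixels into per-stroke trajectories is a separate, nontrivial step):

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont
from skimage.morphology import skeletonize

# Render a character with a font, binarize it, and extract a
# one-pixel-wide skeleton as the coarse writing trajectory support.

def coarse_skeleton(char: str, font_path: str = "some_font.ttf", size: int = 256):
    font = ImageFont.truetype(font_path, size)
    img = Image.new("L", (size, size), 0)
    ImageDraw.Draw(img).text((0, 0), char, fill=255, font=font)
    binary = np.asarray(img) > 127   # binarize the rendered glyph
    return skeletonize(binary)       # bool array marking skeleton pixels
```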
Methods
Neural Brush Model
It has been shown that a large convolutional neural net can mimic the behavior of a parameterized painting program (such as libmypaint), as demonstrated in Stylized Neural Painting. That is to say, given a tuple of stroke parameters including position and color, a neural net can output an image that resembles the rendered image. So if we collect scene images and the corresponding locomotion data, there is also a chance to learn a neural brush model, which could have far better image quality and accuracy than the hand-crafted brush models proposed in multiple previous works [1, 2].
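A sketch of what such a neural brush model could look like (the stroke-parameter layout and the network sizes are assumptions, loosely following the neural renderers used in Stylized Neural Painting):

```python
import torch
import torch.nn as nn

# Map a stroke parameter vector (positions, widths, ink amount, etc.;
# the 12-dim layout is a placeholder) to a rendered stroke image.

class NeuralBrushRenderer(nn.Module):
    def __init__(self, param_dim=12):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Linear(param_dim, 256), nn.ReLU(),
            nn.Linear(256, 64 * 8 * 8), nn.ReLU(),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),              # 32 -> 64
        )

    def forward(self, stroke_params):             # (B, param_dim)
        h = self.decode(stroke_params).view(-1, 64, 8, 8)
        return torch.sigmoid(self.deconv(h))      # (B, 1, 64, 64) stroke image
```

Such a renderer is differentiable, so stroke parameters can be optimized by gradient descent against a target image, which is what makes this approach attractive for the planning method below.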
Trajectory optimization for a single character (planning)
[TBD]
Learning a calligraphy agent(policy learning)
[TBD]