LQY's Blog
Share anything

(This post has not been finished yet! If you are also interested, you can leave a comment or email me.)

In this project, we study enabling a robot to write Chinese calligraphy. The goal is to obtain an intelligent brush-control agent powered by RL with strong generalization capability, i.e. able to write any character in any style. To make it truly useful in the real world, we aim to deploy the controller on a robot arm whose end effector holds a calligraphy brush dipped in ink; when commanded with a character or a sequence of characters, it should write them out faithfully in a chosen style on a piece of paper.


Writing (characters on a piece of paper, not writing essays) is one of the defining characteristics that distinguish us from other species, and it is non-trivial: students have to practice a lot in order to write beautifully. Writing can be even harder for some groups such as the Chinese, considering that there are thousands of characters and some can be quite complex (e.g. 齉).

Writing requires different functions of the brain to work together: memorizing coarse trajectories so that others can recognize what we write, MPC-like manipulation of the writing tool with a simulation running in the brain, and aesthetic judgment of the final character image.

Chinese calligraphy is the writing of Chinese characters with a specialized brush (somewhat like a watercolor brush) in an aesthetically meaningful way. It usually takes a practitioner years of training to dexterously manipulate an inky soft brush tuft against water-absorbing paper and create an image of a specific character with aesthetic value (one has to pay attention to both the overall style and local fine-grained details). It is a challenging control task due to hard-to-predict 3D soft-body dynamics, friction, adhesion, and the fluid dynamics of ink in the paper.

In recent years, we have seen tremendous advances in AI content generation across domains, including text, audio, image, and video. Chinese calligraphy, with a history of over 2,000 years, is an integral part of traditional Chinese art. Considering how hard calligraphy is to master, and with the advent of deep reinforcement learning and robotics, it is tempting to ask: can we make robots learn to write calligraphy at human level?

Related work (this part is far from complete)

Generating digital Chinese calligraphy has been studied extensively since the dawn of the information age.

hand-crafted rules

There were many works on handwriting generation before deep learning became prevalent.


These methods lack the ability to generate images with highly varying stroke boldness and fine-grained texture, which gives calligraphy its unique artistic meaning compared to normal handwriting.

generative models with neural nets

There are many works using VAEs or GANs to generate calligraphy of a specific style. However, these methods use complex neural networks to output image pixels directly, which is unnatural considering how calligraphy is actually created by humans. Our method trains a controller that outputs brush-motion sequences and bears more resemblance to the natural handwriting process than these approaches do.

style transfer using pix2pix-like methods


trajectory optimization / optimal control

These methods perform character-wise optimization and lack the generalization ability of ML methods.

  • GA Tech: "using vector-based character database, which provides a quick and accurate way to extract strokes, as well as stroke order"
  • TODO: use the variational model in VMAIL; use a learned model instead of system identification; use SysID + learning as in SimGAN. "The dynamic virtual brush model has two components: a drawing component, and a dynamic update component. The drawing component describes how the brush leaves a mark on paper depending on its parameters. The updating component then describes how the brush parameters are updated due to deformations when executing an open-loop trajectory [x(t), y(t), z(t)]."
  • extract strokes and run optimization on individual strokes
  • slow end-effector velocity to prevent excessive jerk and vibrations

Accurately predicting the state of the brush and finding a feasible control trajectory is difficult and unreliable. To avoid accumulating prediction error as more strokes are written, we have the robot dip ink after each stroke is written, which restores the brush to a predictable state. We handcrafted a control algorithm to accomplish this: given a circular inkstone, the brush is first pushed down heavily to flatten the tip, and then slowly moved to the edge of the inkstone in different directions with gradually smaller extent.

DRL robotics calligraphy

These works only care about writing a single low-granularity parameterized stroke (interpolation between adjacent trajectory points, 6 points in total), use a cosine distance between two images, and train on low-resolution image samples (28×28).

Problem analysis

General analysis

In essence, robot calligraphy manipulation can be reframed as a multi-task learning problem. Considering the process of humans learning calligraphy, with the aim of replicating existing masterpieces as faithfully as possible, writing a single character is one task. The common underlying structure is the knowledge of how a character is composed of multiple strokes, how the brush deforms under dexterous manipulation of a human hand, and how the inky brush interacts with the absorbent paper to leave footprints on it.

From a different perspective, the problem can be reframed as hierarchical RL, with a lower-level policy in charge of writing basic strokes (e.g. ㇐㇑㇒㇏丶乛㇕㇆㇗㇘𠄌亅𠃋㇙𡿨㇅𠃑乚㇠㇁㇢㇂㇈㇉㇊ㄣ㇇乁⺄㇌㇋㇎𠄎) and a higher-level policy providing location, scale, and shape information to the lower-level policy.

The problem can also be naturally viewed as learning to reach goals, or goal-conditioned RL, in which each image of a character in a particular style is a goal, considering how difficult it is to specify a suitable reward before the agent is close to reaching the goal.

The process of human calligraphy learning suggests utilizing a curriculum to facilitate learning. For example, one usually starts by writing simple strokes, aiming to copy the shape of basic strokes in classic masterpieces. After becoming familiar with manipulating the brush to replicate basic strokes, one learns to write whole characters, usually by looking at an ancient masterpiece and trying to imitate it; during this stage, one has to parse the stroke order of each character, write each stroke on paper in the proper position, and determine suitable hand motion so that the shape of the written character best resembles the one in the masterpiece. After acquiring the ability to write any character in the style of one masterpiece, one would then try to write other masterpieces in multiple different styles, usually by a different calligrapher or by the same calligrapher at different stages of life.

I/O analysis

Considering the complexity of the problem, there are multiple situations with different input/output relationship, which should be treated differently.

Scenario I

  • Input:
    • a calligraphy/painting synthesizing program
    • sample real calligraphy / written images in one single style
  • Output: a brush-control policy that produces trajectories to write any character in the training set with minimal reconstruction error

This I/O relationship yields a template fitting problem. Here are some limitations:

  1. the agent has no idea of the temporal relation of strokes and can only learn to minimize the discrepancy between writing results and those in the training set. The agent cannot write any unseen character. We can mitigate this by providing the agent with existing stroke order knowledge.

  2. The quality of character images written by the agent is highly limited by the fidelity of the synthesizing program. There is a huge sim2real gap in deploying the control policy on a real robot manipulator. One amenable way is to use real-world calligraphy writing videos (optionally plus captured movement data) to learn a realistic brush dynamics model, i.e. predicting how the brush deforms w.r.t. an action, which is nontrivial because of the challenges in learning the following models:

  • the 3D shape of the deformable brush tuft is not regular, esp. when it is rotated
  • forking of brush-head hairs, esp. when there is little ink
  • frictional effect with different pressing force and moving velocity
  • the diffusion of ink in the paper

It is pragmatic to learn a control policy in a simulator with an imperfect brush model and then fine-tune it with real-world data, because it would be too difficult for neural-net models to approximate the complex physics from scratch given the difficulties mentioned above. In other words, we inject inductive bias by using a simulator to inform the agent of how the brush deforms and how it leaves an ink footprint on the paper.

Scenario II

  • Input:
    • a calligraphy synthesizing program (also required by Scenario I)
    • calligraphy images in one or multiple style(s)
    • font file(s)
  • Output: a policy that can write any character supported by the font, with any style in the training dataset

This I/O relationship makes the resulting policy more useful. For example, because we can obtain images of all characters in an encoding (e.g. GB 2312), we can easily build a customized font, alleviating the tedious job of designing thousands of Chinese characters.
Or, as a replacement for a printer, we can use a robot arm to write in one's own handwriting style in an indefatigable manner. (As an aside, this is really useful when a teacher penalizes a student by making them copy out paragraphs or passages, which is common in Chinese primary and high schools.)

Starting from a coarse trajectory extracted by skeletonizing the font-rendered image of a character, we can optimize the trajectory so that its execution on a robot yields a written character image matching a designated style. With those trajectory data, in addition to data collected using a random policy, we can use an offline RL algorithm to obtain an expert policy.
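The optimization step can be sketched as black-box refinement of waypoints against an image loss. Everything below is an illustrative stand-in: `render` is a toy disk-stamping renderer rather than a real brush simulator, and the search is plain random perturbation rather than a serious trajectory optimizer.

```python
import numpy as np

def render(traj, size=64, radius=2):
    """Toy renderer: stamps a disk at each trajectory point.
    Stands in for the (simulated or real) brush-writing process."""
    canvas = np.zeros((size, size))
    yy, xx = np.mgrid[0:size, 0:size]
    for x, y in traj:
        canvas[(xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2] = 1.0
    return canvas

def refine(traj, target, iters=200, sigma=1.0, seed=0):
    """Random-search refinement of waypoints to match a target image.
    Accepts a perturbed candidate only if it lowers the image MSE."""
    rng = np.random.default_rng(seed)
    best = traj
    best_loss = np.mean((render(best) - target) ** 2)
    for _ in range(iters):
        cand = best + rng.normal(0.0, sigma, best.shape)
        loss = np.mean((render(cand) - target) ** 2)
        if loss < best_loss:
            best, best_loss = cand, loss
    return best, best_loss
```

For example, starting from a skeleton-derived trajectory shifted a few pixels off the target, `refine` monotonically reduces the reconstruction loss.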


Neural Brush Model

It has been shown that a large convolutional neural net can mimic the behavior of a parameterized painting program (such as libmypaint), as demonstrated in Stylized Neural Painting. That is, given a tuple of stroke parameters including position and color, a neural net can output an image that resembles the rendered image. So if we collect scene images and corresponding motion data, there is a chance we can learn a neural brush model, which could have far better image quality and accuracy than the hand-crafted brush models proposed in multiple previous works [1, 2].

Trajectory optimization for a single character(planning)


Learning a calligraphy agent(policy learning)


See also: Experimental results

2023-05-14 | robot | Miscellaneous

git clone https://github.com/pytorch/pytorch hangs

After adding verbose switch: git clone https://github.com/pytorch/pytorch -v, we get:

Cloning into 'pytorch'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (chunked)

According to StackOverflow:

This is a bug in Git; when using HTTPS it will use chunked encoding for uploads above a certain size. Those do not work. A trivial fix is to tell git to not chunk until some ridiculously large size value...

Add this to ~/.gitconfig (the setting lives in the [http] section):

    [http]
        postBuffer = 157286400
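Equivalently, the same setting can be written with a standard git command instead of editing the file by hand:

```shell
# Raise the HTTP post buffer to 150 MiB so large pushes/fetches are not chunked
git config --global http.postBuffer 157286400
```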

See also: Main article

Problem guided experiments

joint learning of a trajectory-optimization model and a brush model.

assume coarse median trajectories (stroke-median sequences) are fixed, known knowledge (they can be generated in batch, see this); different styles in the training images correspond to different trajectory-skew models and brush models

the goal is to fine-tune the brush neural-net model of one style using images of another style

the stroke-median action sequence should also be fine-tuned in terms of the control points of its Bezier curves and its width

at test time, we feed in an unseen character's stroke-median sequence

  • drawback: the number of strokes is fixed.
  • Imperfect mitigation measure: add interpolated points to represent median curves in a more fine-grained manner
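A stroke median of this kind can be sketched as a cubic Bezier curve paired with an interpolated width. This is a toy parameterization for illustration, not the project's actual data format.

```python
import numpy as np

def bezier_points(ctrl, n=20):
    """Sample n points on a cubic Bezier curve given 4 control points (shape (4, 2))."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * ctrl[0] + 3 * (1 - t) ** 2 * t * ctrl[1]
            + 3 * (1 - t) * t ** 2 * ctrl[2] + t ** 3 * ctrl[3])

def stroke_median(ctrl, w0, w1, n=20):
    """A stroke median: curve samples plus a linearly interpolated width profile.
    Fine-tuning would adjust both ctrl and (w0, w1)."""
    return bezier_points(ctrl, n), np.linspace(w0, w1, n)
```

Adding interpolated points (larger `n`) is exactly the mitigation above: it represents the median curve at a finer granularity without changing the number of strokes.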

Can neural nets approximate the dynamics of the calligraphy-writing environment? How should we model the 3D brush? Is that necessary / viable?

Unlike a parametric painting program, which has a stateless parameters→image mapping independent of previous steps (we can simply superpose the previous-step canvas and the current-step stroke to obtain the current canvas), the calligraphy-writing environment has brush internal state (tuft shape) affected by previous steps.
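A toy environment illustrating this statefulness (every name and dynamic here is a made-up stand-in, not a real brush simulator): the footprint of an action depends not only on the current press but also on "flatness" accumulated from earlier presses.

```python
import numpy as np

class ToyBrushEnv:
    """Minimal stateful writing environment: the brush carries internal
    state (a scalar 'flatness') that previous steps affect, unlike a
    stateless parameters-to-image painting program."""

    def __init__(self, size=32):
        self.size = size
        self.reset()

    def reset(self):
        self.canvas = np.zeros((self.size, self.size))
        self.flatness = 0.0  # tuft deformation carried across steps
        return self.canvas.copy()

    def step(self, x, y, pressure):
        # footprint radius depends on the current press AND accumulated flatness
        radius = 1.0 + 2.0 * pressure + self.flatness
        # pressing deforms the tuft; deformation decays but accumulates
        self.flatness = 0.9 * self.flatness + 0.5 * pressure
        yy, xx = np.mgrid[0:self.size, 0:self.size]
        self.canvas[(xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2] = 1.0
        return self.canvas.copy()
```

Executing the same action twice leaves different marks, which is exactly why a stateless parameters→image model is insufficient here.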

Using a simplified brush model, an actor-critic algorithm, and a reward function based on mean squared error (MSE):

$$\operatorname{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_{i}-\hat{Y}_{i}\right)^{2}$$
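A minimal numpy sketch of such an MSE-based reward, treating the canvas and the training sample as grayscale arrays (names are illustrative):

```python
import numpy as np

def mse_reward(canvas, target):
    """Negative MSE between the written canvas and the target character
    image, so that a closer match yields a higher reward."""
    return -np.mean((canvas - target) ** 2)
```

A perfect reproduction gets reward 0, and any pixel discrepancy makes the reward negative.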
The figure below shows that although the agent does learn to do inverse graphics, the quality is low and the strokes are out of order and different from how humans write them.
<img src="https://lqy.me/usr/uploads/2023/04/2116756494.webp"
width="500" height="auto">
<figcaption>writing 仍 (from left to right: final canvas, training sample, canvas after 1st-step, canvas after 2nd-step, ...)</figcaption>

How to inject knowledge of coarse trajectory (in the form of a sequence of vectors)? Can we utilize language models? Can we feed in videos/animations?

See Dibya Ghosh et al. Reinforcement Learning from Passive Data via Latent Intentions

How to do visual track following? Can we train an image-based policy guided by a location-aware (cheating) policy?

Is hierarchical RL viable? Can we make it multi-episode (every stroke corresponds to an episode)?

the agent learns in a continual non-episodic setting without relying on extrinsic interventions for training

Can we integrate a GAN over the local perception field to achieve finer texture?

Can we flip the problem around and study the inverse problem of wiping ink off?

Considering there are plenty of similarities among writing similar strokes, can we leverage the common structure and use meta RL?

<iframe width="560" height="315" src="https://www.youtube.com/embed/zg6Tip6s_mA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

Key to the success of large-scale ML systems:

  • big models
  • large and high-capacity datasets


| prediction (ideal, assumptions) | decision making (real-world deployment of ML systems has feedback issues) |
| --- | --- |
| i.i.d. data | each decision can change future inputs |
| ground-truth supervision | high-level supervision (e.g. a goal) |
| objective is to predict the right label | objective is to accomplish the task |

use RL as a universal approach to ML

RL can consume data in a fundamentally different way from conventional maximum-likelihood / supervised learning systems

cheap, uncurated data (e.g. from past interaction, from the Internet) -> dynamics
limited amount of human supervision -> task / reward function

train the best possible initial model

RL is

  • a framework for learning-based decision making
  • an active, online learning algorithm for control

Problem: real-world decision-making problems are difficult to make fully active and online

use large datasets of diverse (but possibly low-quality) behavior for offline (data-driven) RL pretraining

  • human-defined skills
  • goal-conditioned RL
  • self-supervised skill discovery
  • learning downstream task very efficiently
    • online RL fine-tuning
    • offline RL
    • supervised learning

Efficient online RL with offline pretraining

Distributional shift: discrepancies between the states and actions seen in the training data and those encountered in the real world

$Q(s,a) \leftarrow \underbrace{r(s,a) + \mathbb{E}_{a^\prime\sim\pi(a^\prime \mid s^\prime)}{[Q(s^\prime,a^\prime)]}}_{y(s,a)}$


expect good accuracy when $\pi_{\beta}{(a \mid s)}=\pi_{new}{(a \mid s)}$

it is even worse when
$\pi_{new} = \arg\max_{\pi}{\mathbb{E}_{a\sim\pi(a \mid s)}{[Q(s,a)]}}$
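The danger of this argmax can be checked numerically: with zero-mean noise in the Q estimates, the value of the greedy action is biased upward, much like an adversarial example optimized against the network. A toy numpy illustration (all names hypothetical):

```python
import numpy as np

def greedy_overestimate(n_actions=10, n_states=1000, noise=1.0, seed=0):
    """True Q is 0 for every action; estimates carry zero-mean noise.
    Returns the average estimated value of the greedy action, which is
    systematically positive: the max picks out favorable noise."""
    rng = np.random.default_rng(seed)
    q_est = rng.normal(0.0, noise, size=(n_states, n_actions))
    return q_est.max(axis=1).mean()
```

With 10 actions the bias is roughly 1.5 noise standard deviations, and bootstrapping then propagates this optimism through the value function.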

Adversarial examples: optimizing the input of a neural net w.r.t. its output can fool the network

Conservative Q-learning: push down the places where the learned Q-function overestimates:

$$\hat{Q}^\pi = \arg\min_{Q}\,\max_{\mu}\ \alpha\left(\mathbb{E}_{s \sim \mathcal{D},\, a \sim \mu(a \mid s)}\left[Q(s,a)\right] - \mathbb{E}_{s \sim \mathcal{D},\, a \sim \hat{\pi}_{\beta}(a \mid s)}\left[Q(s,a)\right]\right) + \frac{1}{2}\,\mathbb{E}_{s,a,s' \sim \mathcal{D}}\left[\left(Q(s,a) - \hat{\mathcal{B}}^{\pi_k}\hat{Q}^{k}(s,a)\right)^2\right] + \mathcal{R}(\mu) \quad \left(\text{CQL}(\mathcal{R})\right)$$
can show $\hat{Q}^\pi \le Q^\pi $ for large enough $\alpha$
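For intuition, the conservative term can be sketched for discrete actions in the CQL(H) form, where the inner maximization over $\mu$ has a closed-form logsumexp solution. This is a sketch of the penalty for a single state, not the authors' implementation:

```python
import numpy as np

def cql_conservative_term(q_values, data_action, alpha=1.0):
    """alpha * (logsumexp_a Q(s,a) - Q(s, a_data)) for one state.
    Minimizing this pushes Q down on out-of-distribution actions
    relative to the action actually seen in the dataset."""
    m = q_values.max()
    lse = m + np.log(np.exp(q_values - m).sum())  # stable logsumexp
    return alpha * (lse - q_values[data_action])
```

Since logsumexp upper-bounds every individual Q-value, the term is always positive, which is why CQL systematically pushes estimates down and can underestimate.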

CQL performance drops sharply when fine-tuning starts:
the value function underestimates too much during offline training, then wastes lots of effort recalibrating once online fine-tuning begins

Cal-QL: calibrated offline RL pre-training for efficient online fine-tuning (2023)

With a one-line change to CQL, it achieves provably efficient online fine-tuning from an offline initialization

Offline pretraining without actions (and/or rewards; only passive, static data)

representational learning



  • Login requires scanning a QR code with a phone; it tries to replace the traditional username/password with a phone-app OTP.
  • No message synchronization: e.g. a freshly installed desktop client cannot show the history messages visible on the mobile device; after switching to a new device, groups from the old device are gone.
  • It has monopolized the Chinese IM market, leading many people to misunderstand the Internet; it tries to replace traditional websites with "official accounts", e.g. built on web technology but not supporting web standards (accessible only through the proprietary client, the WeChat app, and not indexable by search services), and its pages lazy-load rich media, making the browsing experience terrible.
  • People force WeChat and the associated Tencent apps into many scenarios where they are ill-suited, e.g. people unfamiliar with RSS feeds, progress trackers such as Jira, or mailing lists create all kinds of "groups" for everything and dump messages of every priority into them.
  • Bloated functionality: it tries to replace APKs with "mini programs", building a system within the system; bundling payment with IM and many other functions makes users' digital lives highly dependent on Tencent, with a single-point-of-failure risk.
  • Data is stored in a proprietary format and manner, with reports of abnormally huge storage usage, and users cannot export it as structured data for viewing, backup, or other management tasks (taking screenshots degrades structured text into pixel information).


  • Inexplicable keyword detection that is never explained to users
  • Evil ad targeting