LQY's Blog
Share anything
2023-05-14 |robot

git clone https://github.com/pytorch/pytorch hangs

After adding the verbose switch (git clone https://github.com/pytorch/pytorch -v), we get:

Cloning into 'pytorch'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (chunked)

According to StackOverflow:

This is a bug in Git; when using HTTPS it will use chunked encoding for uploads above a certain size. Those do not work. A trivial fix is to tell git to not chunk until some ridiculously large size value...

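Add this to ~/.gitconfig (postBuffer belongs in the [http] section; the value is in bytes, i.e. 150 MiB):

[http]
        postBuffer = 157286400

Or, equivalently: git config --global http.postBuffer 157286400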


Problem-guided experiments

Joint learning of a trajectory optimization model and a brush model.

Assume the coarse median trajectories (stroke-median sequences) are fixed and known (they can be generated in batch, see this); different styles in the training images then correspond to different trajectory-skew models and brush models.

The goal is to fine-tune the brush neural-net model of one style using images of another style.

The stroke-median action sequence should also be fine-tuned, in terms of the Bézier-curve control points and the stroke width.

At test time, we feed in the stroke-median sequence of an unseen character.

  • Drawback: the number of strokes is fixed.
  • Imperfect mitigation: add interpolated points to represent the median curves at a finer granularity.
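A minimal sketch of the stroke parameterization above, assuming each stroke is a quadratic Bézier curve (three control points) with a start and an end width. This parameterization and the names `bezier_point` and `sample_stroke` are illustrative assumptions, not the project's actual action space. Sampling more points per curve is one way to realize the "interpolated points" mitigation:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bezier_point(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """Evaluate a quadratic Bezier curve with control points p0, p1, p2 at t in [0, 1]."""
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return (x, y)


def sample_stroke(p0: Point, p1: Point, p2: Point,
                  w0: float, w1: float, n: int) -> List[Tuple[float, float, float]]:
    """Return n samples (x, y, width), evenly spaced in t, along one stroke.

    The width is linearly interpolated from w0 to w1 (an assumption made for
    illustration). A larger n represents the median curve more finely.
    """
    if n < 2:
        raise ValueError("need at least two samples per stroke")
    samples = []
    for i in range(n):
        t = i / (n - 1)
        x, y = bezier_point(p0, p1, p2, t)
        samples.append((x, y, w0 + (w1 - w0) * t))
    return samples


# Hypothetical stroke: an arc from (0, 0) to (1, 0) that thins out.
coarse = sample_stroke((0.0, 0.0), (0.5, 0.5), (1.0, 0.0), 0.2, 0.05, n=3)
fine = sample_stroke((0.0, 0.0), (0.5, 0.5), (1.0, 0.0), 0.2, 0.05, n=9)
print(len(coarse), len(fine))  # 3 9
print(coarse[1])               # arc midpoint: x = 0.5, y = 0.25, width ≈ 0.125
```

Under this assumed parameterization, fine-tuning the action sequence means adjusting the control points and w0, w1 per stroke, while n only changes how densely the same curve is represented; it does not change the fixed number of strokes.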

Can neural nets approximate the dynamics of the calligraphy-writing environment? How should a 3D brush be modeled? Is that necessary, or even viable?

Unlike a parametric painting program, which has a stateless parameters→image mapping that is independent of the previous step's parameters (the current-step canvas is simply the previous-step canvas superposed with the current-step stroke), the calligraphy-writing environment has an internal brush state (the tuft shape) that is affected by previous steps.
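To make the stateless/stateful distinction concrete, here is a toy sketch on a 1-D canvas. The stateless painter superposes each stroke with a pixel-wise max (an assumption; alpha blending would work similarly), while the stateful brush carries a hypothetical `ink` level standing in for the tuft state, so the same action produces a different mark depending on history. All names and the ink-depletion rule are illustrative assumptions, not a model of a real brush:

```python
from typing import List, Tuple

Canvas = List[float]          # 1-D canvas of ink intensities in [0, 1]
Action = Tuple[int, int]      # hypothetical action: (start, end) pixel range


def render_stroke(action: Action, size: int, darkness: float) -> Canvas:
    """Rasterize a stroke covering pixels start..end-1 at the given darkness."""
    start, end = action
    return [darkness if start <= i < end else 0.0 for i in range(size)]


def stateless_step(canvas: Canvas, action: Action) -> Canvas:
    """Painting program: the next canvas depends only on (canvas, action)."""
    stroke = render_stroke(action, len(canvas), darkness=1.0)
    return [max(c, s) for c, s in zip(canvas, stroke)]  # superpose the stroke


class StatefulBrush:
    """Calligraphy-like env: a hidden brush state persists across steps.

    Remaining ink stands in for the tuft shape; the linear depletion rule
    is a made-up assumption for illustration.
    """

    def __init__(self, ink: float = 1.0, use_per_pixel: float = 0.1) -> None:
        self.ink = ink
        self.use_per_pixel = use_per_pixel

    def step(self, canvas: Canvas, action: Action) -> Canvas:
        stroke = render_stroke(action, len(canvas), darkness=self.ink)
        start, end = action
        # Writing changes the brush state as a side effect.
        self.ink = max(0.0, self.ink - self.use_per_pixel * (end - start))
        return [max(c, s) for c, s in zip(canvas, stroke)]


blank = [0.0] * 8
a = (0, 3)
# Stateless: the same (canvas, action) always gives the same result.
print(stateless_step(blank, a) == stateless_step(blank, a))  # True
# Stateful: the same action on the same blank canvas leaves a lighter mark
# the second time, because the first step changed the brush state.
brush = StatefulBrush()
first = brush.step(blank, a)
second = brush.step(blank, a)
print(first[0], round(second[0], 2))  # 1.0 0.7
```

A network approximating the calligraphy environment therefore cannot be a pure (canvas, action) → canvas map; it must also observe or infer the brush state.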

Setup: a simplified brush model, an actor-critic algorithm, and a reward function based on the mean squared error (MSE):

$$\displaystyle \operatorname {MSE} ={\frac {1}{n}}\sum _{i=1}^{n}\left(Y_{i}-{\hat {Y_{i}}}\right)^{2}$$
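A minimal sketch of an MSE-based reward. It assumes, as several stroke-based painting agents do (not necessarily this experiment), that the per-step reward is the decrease in MSE between canvas and target caused by the current stroke, so the rewards telescope to the total improvement over the episode. Canvases are flattened lists of pixel intensities, and the function names are hypothetical:

```python
from typing import Sequence


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared error: (1/n) * sum_i (Y_i - Yhat_i)^2."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("inputs must be non-empty and of the same length")
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def step_reward(target: Sequence[float], prev_canvas: Sequence[float],
                curr_canvas: Sequence[float]) -> float:
    """Reward = how much the current stroke reduced the MSE to the target."""
    return mse(target, prev_canvas) - mse(target, curr_canvas)


target = [1.0, 1.0, 0.0, 0.0]
c0 = [0.0, 0.0, 0.0, 0.0]   # blank canvas
c1 = [1.0, 0.0, 0.0, 0.0]   # first stroke covers one target pixel
c2 = [1.0, 1.0, 0.0, 1.0]   # second covers the other but also smears ink
print(step_reward(target, c0, c1), step_reward(target, c1, c2))  # 0.25 0.0
```

The second stroke earns nothing because the ink it adds in the wrong place cancels out the pixel it gets right; a pixel-wise MSE says nothing about stroke order, which is consistent with the out-of-order strokes observed below.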
The figure below shows that although the agent does learn to do inverse graphics, the quality is low, all strokes are out of order, and they differ from how humans write them.
<img src="https://lqy.me/usr/uploads/2023/04/2116756494.webp"
width="500" height="auto">
<figcaption>writing 仍 (from left to right: final canvas, training sample, canvas after 1st-step, canvas after 2nd-step, ...)</figcaption>

How can we inject knowledge of the coarse trajectory (in the form of a sequence of vectors)? Can we utilize language models? Can we feed in videos or animations?

See Dibya Ghosh et al., Reinforcement Learning from Passive Data via Latent Intentions.

How can we do visual track following? Can we train an image-based policy guided by a location-aware (cheating) policy?

Is hierarchical RL viable? Can we split the task into multiple episodes (each stroke corresponding to one episode)?

The agent learns in a continual, non-episodic setting without relying on extrinsic interventions for training.

Can we integrate a GAN over a local receptive field to achieve finer texture?

Can we flip the perspective and study the inverse problem as wiping ink off?

Since writing similar strokes involves many shared patterns, can we leverage this common structure with meta-RL?

<iframe width="560" height="315" src="https://www.youtube.com/embed/zg6Tip6s_mA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

Keys to the success of large-scale ML systems:

  • big models
  • large and high-capacity datasets


<table>
<tr><th>Prediction (ideal assumptions)</th><th>Decision making (real-world deployments of ML systems have feedback issues)</th></tr>
<tr><td>i.i.d. data</td><td>each decision can change future inputs</td></tr>
<tr><td>ground-truth supervision</td><td>high-level supervision (e.g. a goal)</td></tr>
<tr><td>objective is to predict the right label</td><td>objective is to accomplish the task</td></tr>
</table>

Use RL as a universal approach to ML.

RL can consume data in a fundamentally different way from conventional maximum-likelihood / supervised learning systems:

cheap, uncurated data (e.g. from past interaction, from the Internet) -> dynamics
a limited amount of human supervision -> task / reward function

Train the best possible initial model.

RL is

  • a framework for learning-based decision making
  • an active, online learning algorithm for control

Problem: real-world decision-making problems are difficult to tackle in a fully active and online manner.

Use a large dataset of diverse (but possibly low-quality) behavior for offline (data-driven) RL pretraining:

  • human-defined skills
  • goal-conditioned RL
  • self-supervised skill discovery
  • learning downstream tasks very efficiently
    • online RL fine-tuning
    • offline RL
    • supervised learning

Efficient online RL with offline pretraining

Distributional shift: discrepancies between the states and actions seen in the training data and those encountered in the real world.

$Q(s,a) \leftarrow \underbrace{r(s,a) + \mathbb{E}_{a'\sim\pi(a' \mid s')}\left[Q(s',a')\right]}_{y(s,a)}$
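A tabular sketch of this backup, using a hypothetical three-transition dataset and evaluation policy. The equation omits a discount factor; the sketch adds an optional `gamma` (an assumption) and treats states absent from `pi` as terminal:

```python
from typing import Dict, List, Tuple

State = str
Action = str
Transition = Tuple[State, Action, float, State]  # (s, a, r, s')
QTable = Dict[Tuple[State, Action], float]


def backup(Q: QTable, data: List[Transition],
           pi: Dict[State, Dict[Action, float]], gamma: float = 1.0) -> QTable:
    """One synchronous sweep of Q(s,a) <- r(s,a) + gamma * E_{a'~pi(a'|s')}[Q(s',a')].

    States missing from pi are treated as terminal (expected next value 0).
    Only (s, a) pairs that appear in the dataset are ever updated.
    """
    new_Q = dict(Q)
    for s, a, r, s_next in data:
        expected_next = sum(p * Q.get((s_next, a_next), 0.0)
                            for a_next, p in pi.get(s_next, {}).items())
        new_Q[(s, a)] = r + gamma * expected_next  # the target y(s, a)
    return new_Q


# Hypothetical dataset and evaluation policy; "end" is terminal.
data = [("s0", "left", 0.0, "s1"),
        ("s0", "right", 1.0, "end"),
        ("s1", "left", 2.0, "end")]
pi = {"s0": {"left": 0.5, "right": 0.5}, "s1": {"left": 1.0}}

Q: QTable = {}
for _ in range(2):
    Q = backup(Q, data, pi, gamma=0.9)
print(Q)  # {('s0', 'left'): 1.8, ('s0', 'right'): 1.0, ('s1', 'left'): 2.0}
```

Note the asymmetry: updates only touch (s, a) pairs from the data, but the target queries Q at a' drawn from π, which may be actions the data never covers. Any Q-value that is never trained but still queried (it defaults to 0.0 here) flows straight into the targets; that is where distributional shift enters.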


Because Q is fit only on (s, a) pairs collected by the behavior policy $\pi_{\beta}$, we expect good accuracy when $\pi_{\beta}(a \mid s)=\pi_{\text{new}}(a \mid s)$.

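It is even worse when
$\pi_{\text{new}} = \arg\max_{\pi}\mathbb{E}_{a\sim\pi(a \mid s)}[Q(s,a)]$: the new policy then actively seeks out actions whose Q-values are erroneously overestimated.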

This is akin to adversarial examples: optimizing a neural net's input with respect to its output can fool the network.
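A toy illustration of why the argmax makes things worse, under made-up assumptions: the true Q-values of 50 discrete actions in a single state are all 0, the behavior policy only ever tried 5 of them, and the learned Q-function is exact on those 5 but has random errors (Gaussian here, an arbitrary choice) on the actions it never saw. Greedy maximization then picks an unseen action whose value is overestimated:

```python
import random

random.seed(0)
N_ACTIONS = 50
SEEN = set(range(5))            # actions covered by pi_beta (hypothetical)
true_q = [0.0] * N_ACTIONS      # in reality every action is equally good

# Learned Q: exact on seen actions, noisy extrapolation on unseen ones.
learned_q = [true_q[a] if a in SEEN else true_q[a] + random.gauss(0.0, 1.0)
             for a in range(N_ACTIONS)]

# pi_new = argmax_a Q(s, a) for this single state s.
greedy = max(range(N_ACTIONS), key=lambda a: learned_q[a])
overestimate = learned_q[greedy] - true_q[greedy]
print(greedy in SEEN, overestimate > 0)  # almost surely: False True
```

The errors have zero mean, yet the maximum over 45 noisy estimates is positively biased; the policy is in effect "attacking" the Q-function, just as an adversarial input attacks a classifier.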

Conservative Q-learning (CQL): push down the Q-values where the learned function overestimates (actions sampled from μ), while pushing up the Q-values of dataset actions:
$$\begin{aligned}
\hat{Q}^\pi = \arg\min_{Q} \max_{\mu}\; & \alpha \left(\mathbb{E}_{s \sim \mathcal{D},\, a \sim \mu(a \mid s)}\left[Q(s, a)\right] - \mathbb{E}_{s \sim \mathcal{D},\, a \sim \hat{\pi}_{\beta}(a \mid s)}\left[Q(s, a)\right] \right) \\
& + \frac{1}{2}\, \mathbb{E}_{s, a, s' \sim \mathcal{D}}\left[\left(Q(s, a) - \hat{\mathcal{B}}^{\pi_k} \hat{Q}^{k}(s, a) \right)^2 \right] + \mathcal{R}(\mu) \qquad \left(\text{CQL}(\mathcal{R})\right)
\end{aligned}$$
One can show that $\hat{Q}^\pi \le Q^\pi$ for large enough $\alpha$.
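For discrete actions, a common instantiation is CQL(H): choosing $\mathcal{R}(\mu)$ as the entropy of μ scaled by α makes the inner maximum over μ collapse to α times a log-sum-exp over actions. Below is a minimal per-sample sketch of that loss in plain Python with no autodiff. The `target` argument stands in for $\hat{\mathcal{B}}^{\pi_k}\hat{Q}^k(s,a)$ and is assumed to be computed elsewhere; the function names are hypothetical:

```python
import math
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cql_h_loss(q_s: Sequence[float], a_data: int, target: float,
               alpha: float) -> float:
    """Per-sample CQL(H) loss for one transition (s, a, s') from the dataset D.

    q_s:    Q(s, .) for every discrete action
    a_data: index of the action actually taken in D
    target: Bellman target for Q(s, a_data), assumed precomputed
    """
    push_down = logsumexp(q_s)   # soft maximum over all actions
    push_up = q_s[a_data]        # value of the dataset action
    bellman_error = 0.5 * (q_s[a_data] - target) ** 2
    return alpha * (push_down - push_up) + bellman_error


# An unseen action with a large Q-value makes the regularizer large:
print(cql_h_loss([0.0, 3.0], a_data=0, target=0.0, alpha=1.0))
```

Minimizing this with respect to Q lowers the soft maximum over all actions while raising Q(s, a_data); since the log-sum-exp is never below Q(s, a_data), the regularizer is non-negative. In practice the gradients would come from an autodiff framework rather than from this scalar function.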

CQL performance drops sharply when fine-tuning starts:
the Q-function underestimates too much during offline training, and then a lot of effort is wasted recalibrating the value function once online fine-tuning begins.

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (2023)

With a one-line change to CQL, it achieves provably efficient online fine-tuning from an offline initialization.
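A minimal sketch of that one-line change, as described in the Cal-QL paper: the Q-values being pushed down are clipped from below by a reference value $V^\mu(s)$ (e.g. Monte-Carlo returns of the behavior policy computed from the dataset), so conservatism cannot drive them below this calibrated floor. It is written against a discrete-action, CQL(H)-style loss; `v_ref` and the function names are assumptions for illustration:

```python
import math
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cal_ql_loss(q_s: Sequence[float], a_data: int, target: float,
                alpha: float, v_ref: float) -> float:
    """CQL(H)-style per-sample loss with Cal-QL's lower-bound clipping.

    q_s:    Q(s, .) for every discrete action
    a_data: index of the dataset action
    target: Bellman target for Q(s, a_data), assumed precomputed
    v_ref:  reference value V^mu(s), e.g. a Monte-Carlo return from the data
    """
    # The "one-line change": Q-values in the push-down term are clipped from
    # below at v_ref, so they get no push-down gradient once below it.
    clipped = [max(q, v_ref) for q in q_s]
    push_down = logsumexp(clipped)
    push_up = q_s[a_data]
    bellman_error = 0.5 * (q_s[a_data] - target) ** 2
    return alpha * (push_down - push_up) + bellman_error


# With a very low reference value nothing is clipped: this is plain CQL(H).
print(cal_ql_loss([0.0, 0.0], a_data=0, target=0.0, alpha=1.0, v_ref=-1e9))
```

The clipped entries are constants with respect to Q, so actions whose Q-values already sit below $V^\mu(s)$ stop being pushed down. That keeps the pre-trained values on a calibrated scale, avoiding the recalibration phase at the start of online fine-tuning.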

Offline pretraining without actions (and/or rewards; only passive, static data)

representation learning

2023-04-15 |robot

resounding /rɪˈzaʊndɪŋ/
sampling sam∙pling /ˈsɑːmplɪŋ; NAmE ˈsæm-/


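  • Login only works by scanning a QR code with the phone; it tries to replace the traditional username and password with a phone-app OTP.
  • No message sync: e.g. after newly installing the desktop client, the message history visible on the mobile device cannot be shown. After switching to a new device, the groups from the old device are gone.
  • It monopolizes the Chinese IMS market, causing many people to misunderstand the Internet, and tries to replace traditional websites with "Official Accounts" (公众号): e.g. they use web technology but do not support web standards (accessible only through the dedicated client, i.e. the WeChat app, and not indexed by search services), and the pages lazy-load rich media, making for a terrible browsing experience.
  • People force WeChat and related Tencent apps into many situations where they are unsuitable: e.g. people in China don't know about RSS subscriptions, progress-tracking software such as Jira, or mailing lists; they create all kinds of "groups" for everything and post messages of every priority and urgency into those groups.
  • Feature bloat: it tries to replace APKs with Mini Programs, building a system within the system, and bundles payments, IMS and many other features together, making users' digital lives highly dependent on Tencent, with a single-point-of-failure risk.
  • Data is stored in proprietary formats and ways, there are reports of abnormally huge storage usage, and users cannot export their data in structured form for viewing, backup or other management tasks (taking screenshots degrades structured text into pixels).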


  • Baffling keyword detection that is never explained to users
  • Evil ad delivery