2023-05-14 | robot

git clone https://github.com/pytorch/pytorch hangs

After adding the verbose switch (git clone https://github.com/pytorch/pytorch -v), we get:

Cloning into 'pytorch'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (chunked)

According to StackOverflow:

This is a bug in Git; when using HTTPS it will use chunked encoding for uploads above a certain size. Those do not work. A trivial fix is to tell git to not chunk until some ridiculously large size value...

Add this to ~/.gitconfig:

        [http]
            postBuffer = 157286400
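Equivalently, the same setting can be applied from the command line with `git config --global http.postBuffer 157286400` (assuming the global config is the right place for your setup).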

See also: Main article

Problem-guided experiments

Joint learning of a trajectory-optimization model and a brush model.

Assume coarse median trajectories (stroke-median sequences) are fixed, known knowledge (they can be generated in batch, see this); different styles in the training images imply different trajectory-skew models and brush models.

The goal is to fine-tune the brush neural-net model of one style using images of another style.

The stroke-median action sequence should also be fine-tuned in terms of Bezier-curve control points and stroke width.

At test time, we feed in an unseen character's stroke-median sequence.

  • Drawback: the number of strokes is fixed.
  • Imperfect mitigation: add interpolated points to represent the median curves in a more fine-grained manner (see the sketch below).
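A minimal sketch of that mitigation, assuming each stroke median is a quadratic Bezier curve with widths given at the endpoints; the point count and all names are illustrative, not from the actual pipeline:

```python
import numpy as np

def sample_median(p0, p1, p2, w0, w2, n=16):
    """Sample n interpolated points (x, y, width) along a quadratic Bezier stroke median.

    p0, p1, p2: 2D control points; w0, w2: widths at the two endpoints
    (linearly interpolated in between). All parameters are illustrative.
    """
    t = np.linspace(0.0, 1.0, n)[:, None]                          # (n, 1)
    pts = (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2   # Bernstein form of the curve
    widths = (1 - t) * w0 + t * w2                                  # crude linear width profile
    return np.concatenate([pts, widths], axis=1)                    # (n, 3) fine-grained median

# Example: refine one horizontal stroke into 16 sub-points.
median = sample_median(np.array([0.1, 0.5]), np.array([0.5, 0.55]),
                       np.array([0.9, 0.5]), w0=0.06, w2=0.03)
```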

Can neural nets approximate the dynamics of calligraphy-writing environment? How to model 3D brush? Is it necessary / viable?

Unlike a parametric painting program, which has a stateless parameters→image mapping independent of the parameters in previous steps (we can simply superpose the previous-step canvas and the current-step stroke to obtain the current-step canvas), the calligraphy-writing environment has a brush inner state (the tuft shape) that is affected by previous steps.

Using a simplified brush model, an actor-critic algorithm, and a reward function based on the mean squared error (MSE):

$$\operatorname{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_{i}-\hat{Y}_{i}\right)^{2}$$
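To make the setup concrete, here is a minimal sketch (not the actual environment used) of a stateful brush environment whose reward is the negative MSE between the canvas and the training sample; the tuft state carried across steps is what distinguishes it from a stateless painting renderer. Names and dynamics are made up:

```python
import numpy as np

class ToyBrushEnv:
    """Illustrative stateful calligraphy environment (toy dynamics).

    Unlike a stateless parametric renderer, step() also updates an internal
    tuft state, so the same action can leave different ink depending on history.
    """
    def __init__(self, target):
        self.target = target                          # target glyph image in [0, 1], square (H, H)
        self.size = target.shape[0]
        self.reset()

    def reset(self):
        self.canvas = np.zeros_like(self.target)
        self.tuft = np.array([0.05, 1.0])             # toy brush state: [spread, ink load]
        return self.canvas.copy()

    def step(self, action):
        x, y, pressure = action                       # all in [0, 1]; placeholder dynamics
        self.tuft[0] = 0.9 * self.tuft[0] + 0.1 * pressure       # tuft spreads under pressure
        self.tuft[1] = max(0.0, self.tuft[1] - 0.02 * pressure)  # ink gets consumed
        self._dab(x, y, radius=self.tuft[0], ink=self.tuft[1])
        reward = -np.mean((self.canvas - self.target) ** 2)      # negative MSE, per the formula above
        return self.canvas.copy(), reward

    def _dab(self, x, y, radius, ink):
        yy, xx = np.mgrid[0:self.size, 0:self.size] / self.size
        mask = (xx - x) ** 2 + (yy - y) ** 2 < radius ** 2
        self.canvas[mask] = np.clip(self.canvas[mask] + ink, 0.0, 1.0)

# e.g. env = ToyBrushEnv(target=np.zeros((64, 64))); canvas, r = env.step((0.5, 0.5, 0.8))
```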
The figure below shows that although the agent does learn to do inverse graphics, the quality is low and the strokes are out of order, different from how humans write them.
<img src="https://lqy.me/usr/uploads/2023/04/2116756494.webp"
width="500" height="auto">
<figcaption>writing 仍 (from left to right: final canvas, training sample, canvas after 1st-step, canvas after 2nd-step, ...)</figcaption>

How to inject knowledge of the coarse trajectory (in the form of a sequence of vectors)? Can we utilize language models? Can we feed in videos/animations?

See Dibya Ghosh et al. Reinforcement Learning from Passive Data via Latent Intentions

How to do visual track following? Can we train an image-based policy guided by a location-aware (cheating) policy?
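One plausible recipe here (my hedged sketch, not something from the post) is privileged teacher-student distillation: a "cheating" policy trained with ground-truth locations labels the states visited by an image-based student, DAgger-style. All module shapes and names below are illustrative:

```python
import torch
import torch.nn as nn

# Hypothetical policies: the teacher sees privileged state (e.g. pen location and
# the next median point), the student only sees the canvas image.
teacher = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
student = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU(), nn.Linear(256, 3))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(priv_state, image):
    """One DAgger-style update: the student imitates the privileged teacher."""
    with torch.no_grad():
        target_action = teacher(priv_state)           # teacher labels the visited states
    loss = nn.functional.mse_loss(student(image), target_action)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# e.g. distill_step(torch.randn(32, 4), torch.rand(32, 1, 64, 64))
```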

Is hierarchical RL viable? Can we make it multi-episode (every stroke corresponds to an episode)?

the agent learns in a continual non-episodic setting without relying on extrinsic interventions for training

Can we integrate a GAN over the local perception field to achieve finer texture?

Can we change perspective and study the inverse problem as wiping ink off?

Considering there are plenty of similarities in writing similar strokes, can we leverage the common structure and use meta-RL?

<iframe width="560" height="315" src="https://www.youtube.com/embed/zg6Tip6s_mA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

Keys to the success of large-scale ML systems:

  • big models
  • large and high-capacity datasets


| prediction (ideal, assumptions) | decision making (real-world deployments of ML systems have feedback issues) |
| --- | --- |
| i.i.d. data | each decision can change future inputs |
| ground-truth supervision | high-level supervision (e.g. a goal) |
| objective is to predict the right label | objective is to accomplish the task |

use RL as a universal approach to ML

RL can consume data in a fundamentally different way from conventional maximum-likelihood / supervised learning systems:

cheap, uncurated data (e.g. from past interaction, from the Internet) -> dynamics
limited amount of human supervision -> task / reward function

train the best possible initial model

RL is

  • a framework for learning-based decision making
  • an active, online learning algorithm for control

Problem: real-world decision-making problems are difficult to tackle in a fully active, online manner.

Use a large dataset of diverse (but possibly low-quality) behavior for offline (data-driven) RL pretraining:

  • human-defined skills
  • goal-conditioned RL
  • self-supervised skill discovery
  • learning downstream tasks very efficiently
    • online RL fine-tuning
    • offline RL
    • supervised L

Efficient online RL with offline pretraining

Distributional shift: discrepancies between the states and actions seen in the training data and those encountered in the real world.

$Q(s,a) \leftarrow \underbrace{r(s,a) + \mathbb{E}_{a^\prime\sim\pi(a^\prime \mid s^\prime)}\left[Q(s^\prime,a^\prime)\right]}_{y(s,a)}$


expect good accuracy when $\pi_{\beta}{(a \mid s)}=\pi_{new}{(a \mid s)}$

even worse when $\pi_{new} = \arg\max_{\pi}\,\mathbb{E}_{a\sim\pi(a \mid s)}\left[Q(s,a)\right]$
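As a toy illustration of why this hurts (illustrative code, not from the talk): the bootstrapped target evaluates Q at actions drawn from the new policy, and when $\pi_{new}$ was obtained by maximizing Q, those actions land exactly where Q overestimates, so the errors get copied into the targets.

```python
import torch
import torch.nn as nn

# Toy, self-contained illustration; shapes and the network are made up.
obs_dim, act_dim = 8, 2
q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def td_target(r, s_next, new_policy_actions, gamma=0.99):
    """y(s,a) = r + gamma * E_{a' ~ pi_new(.|s')}[Q(s', a')], estimated by sampling.

    new_policy_actions: (n_samples, batch, act_dim) actions drawn from pi_new.
    If pi_new approximately maximizes Q, these actions sit where Q is
    (erroneously) largest, so overestimation propagates into the targets.
    """
    n, b, _ = new_policy_actions.shape
    s_rep = s_next.unsqueeze(0).expand(n, -1, -1)                   # (n, batch, obs_dim)
    q_vals = q_net(torch.cat([s_rep, new_policy_actions], dim=-1))  # (n, batch, 1)
    return r + gamma * q_vals.mean(dim=0).squeeze(-1)               # (batch,)

# e.g. td_target(torch.zeros(32), torch.randn(32, obs_dim), torch.randn(10, 32, act_dim))
```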

Adversarial examples: optimizing the input of a neural net w.r.t. its output can fool the network; maximizing Q over actions does the same thing to the Q-function.

Conservative Q-learning (CQL): push the Q-function down where the learned function overestimates.
$$\hat{Q}^\pi = \arg\min_{Q}\,\max_{\mu}\; \alpha\left(\mathbb{E}_{s\sim\mathcal{D},\,a\sim\mu(a\mid s)}\left[Q(s,a)\right] - \mathbb{E}_{s\sim\mathcal{D},\,a\sim\hat{\pi}_{\beta}(a\mid s)}\left[Q(s,a)\right]\right) + \frac{1}{2}\,\mathbb{E}_{s,a,s'\sim\mathcal{D}}\left[\left(Q(s,a) - \hat{\mathcal{B}}^{\pi_k}\hat{Q}^{k}(s,a)\right)^{2}\right] + \mathcal{R}(\mu) \qquad \left(\text{CQL}(\mathcal{R})\right)$$
can show $\hat{Q}^\pi \le Q^\pi $ for large enough $\alpha$
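For intuition, the practical discrete-action form (CQL($\mathcal{H}$), where the inner maximization over $\mu$ with an entropy regularizer reduces to a log-sum-exp over actions) looks roughly like this; tensor shapes and names are illustrative:

```python
import torch

def cql_loss(q_values, data_actions, td_targets, alpha=1.0):
    """Conservative Q-learning loss for discrete actions (CQL(H) flavor, sketch).

    q_values:     (batch, n_actions) current Q(s, .)
    data_actions: (batch,) actions actually taken in the dataset
    td_targets:   (batch,) Bellman backup targets for those actions
    """
    q_data = q_values.gather(1, data_actions.unsqueeze(1)).squeeze(1)   # Q(s, a) on dataset actions
    # Push down a soft maximum over all actions, push up dataset actions:
    conservative = (torch.logsumexp(q_values, dim=1) - q_data).mean()
    bellman = 0.5 * ((q_data - td_targets) ** 2).mean()                 # standard TD error term
    return alpha * conservative + bellman

# e.g. cql_loss(torch.randn(32, 5), torch.randint(0, 5, (32,)), torch.randn(32))
```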

CQL performance drops sharply when fine-tuning starts: the offline phase underestimates too much, so a lot of effort is wasted recalibrating the value function once online fine-tuning begins.

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning (2023)

with a one-line change to CQL: provably efficient online fine-tuning from an offline initialization
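My reading of that one-line change (a hedged sketch, not the official implementation): the conservative push-down term is clipped against a reference value estimate, e.g. the Monte-Carlo return of the behavior policy, so offline Q-values stay calibrated instead of being driven arbitrarily low. All names below are illustrative:

```python
import torch

def calql_regularizer(q_pi, q_data, reference_values, alpha=1.0):
    """Conservative regularizer with a Cal-QL-style calibration (sketch).

    q_pi:             (batch,) Q(s, a') for actions a' from the current policy
    q_data:           (batch,) Q(s, a) for dataset actions
    reference_values: (batch,) reference returns, e.g. Monte-Carlo returns of the behavior policy
    """
    # Plain CQL would push q_pi down directly; the calibration clips it against
    # the reference so values are never pushed below that lower bound.
    calibrated_q_pi = torch.maximum(q_pi, reference_values)
    return alpha * (calibrated_q_pi - q_data).mean()
```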

Offline pretraining without actions (and/or rewards; only passive, static data)

representational learning