
My Story with Computers and AI

Yang Cao
I wonder, was I able to live in someone's heart?

First Encounter

My father was a computer engineer. We had a large collection of computer books at home in Beijing.

Before I started kindergarten, I had an argument with my mom; I forget the exact reason now. I told her, “I don’t need you to support me! I’ll earn money on my own!” After saying this, I started looking for a stool to reach the thickest C++ beginner’s book on the top shelf. I wanted to prove myself.

But in less than three minutes, I quietly started crying, because I couldn’t understand a single word on the page! I threw the book aside and turned to bury my face in my mom’s arms.

Although that attempt to show off was a complete failure, from that moment I saw the infinite mystery of computers, and I developed a strong urge to lift that veil, to understand computers completely.

Looking back now, I realize that at that moment, I planted the seed of computers deep in my heart.

AI

I first learned about AI when I was in ninth grade. At that time, I was working on something similar to Discord bots, and I even managed to make some profit.

This achievement significantly cheered me up, and I wanted to challenge myself even more, to take on more difficult projects.

At the end of 2022, a classmate recommended an anime called “Vivy: Fluorite Eye’s Song” to me. “Vivy” builds a sci-fi world that explores how AI can understand and express love, just like humans.

Coincidentally, GPT-3.5 had just been released at that time. I immediately became very interested in large language models (LLMs), and started doing some research on AI role-play and artificial personality with that friend.

Learning More

Gradually, as I delved deeper into AI and LLMs, I realized that a stronger grounding in linear algebra and calculus was indispensable.

At that time, I had just started 10th grade in China. I really hated that school, especially some of the teachers, who were always strongly biased against me. I was severely depressed at that time.

So, during class, I was basically self-studying linear algebra, calculus, and deep learning. During that period, I watched a lot of videos, including 3Blue1Brown’s Essence of Calculus and Essence of Linear Algebra series, a deep learning course on Coursera, and MIT’s 6.006 open course on algorithms.

I was also really curious about the latest work, so I started reading papers to build my own understanding. I read the Transformer paper, Attention Free Transformers, LoRA, and a few more before I started doing research of my own.

After accumulating a bit of fundamental knowledge, I started to engage in some simple research. As I mentioned before, I was very interested in AI and personality, so I wanted to start working on some prompt engineering and model modifications to make LLMs better at simulating a human tone.

I brainstormed many ideas at the time. The one I thought was most feasible was to compute an embedding for each piece of context and store it in a database; when the user entered a new query, an LLM would generate keywords from it, and cosine similarity against all the stored context embeddings would automatically surface the relevant history, which would then be added to the prompt to enhance long-term memory (a rough sketch of this is below). Another idea was to modify the Transformer’s Query in order to mix Self-Attention and Cross-Attention and achieve stronger model personalization.
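For what it’s worth, a minimal sketch of that first idea in Python might look like the following. This is only an illustration, not the code I actually wrote back then; the `embed` function is a stand-in for whatever embedding model you use, and all names and dimensions are mine.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: in practice this would call a real embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

class MemoryStore:
    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        if not self.vectors:
            return []
        q = embed(query)
        # Cosine similarity; vectors are already unit-normalized.
        sims = np.array([float(q @ v) for v in self.vectors])
        top = sims.argsort()[::-1][:k]
        return [self.texts[i] for i in top]

memory = MemoryStore()
memory.add("The user said their favorite anime is Vivy.")
relevant = memory.retrieve("What anime do I like?")
prompt = "Relevant memories:\n" + "\n".join(relevant) + "\n\nUser: What anime do I like?"
```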

I also interned at a start-up company. They funded the embedding research, and I gave them my code.

After leaving my original school and coming to the US, I started putting more effort into my research. During my first term in the US, I was going through some personal relationship issues. Deep in depression, I started using LLMs as a friend, having some really deep conversations with them. I became more and more interested in my research direction of LLMs and personality.

During that period, I felt very empty. The setbacks I experienced in my relationship made me strongly want to prove my abilities. So, I started to invest more and more time in research.

SORSA

Later, after numerous experiments, I focused my research on the latter approach.

After a long period of discussions with industry professionals, searching for related works, brainstorming, and iterating, I began to research a new PEFT method.

Initially, I wanted to create a Transformer-specific PEFT method: a new low-rank head running in parallel with the pre-trained weights, used to adapt the entire model (a rough sketch of the idea is below). However, a major problem with this approach was that initialization greatly affected training.
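As a rough illustration (not my original code), a parallel low-rank head attached to a frozen linear layer could look like this in PyTorch. The names, rank, and init values are arbitrary; zero-initializing the up-projection so the head starts as a no-op is just one common way to soften the initialization sensitivity mentioned above.

```python
import torch
import torch.nn as nn

class ParallelLowRank(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze the pre-trained weights
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.normal_(self.down.weight, std=0.02)
        nn.init.zeros_(self.up.weight)              # head starts as a no-op

    def forward(self, x):
        # Frozen path plus trainable low-rank path in parallel.
        return self.base(x) + self.up(self.down(x))

# Usage: wrap an existing projection layer and train only the new head.
layer = ParallelLowRank(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(2, 768))
```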

So I wondered whether there was a way for the parallel head to efficiently obtain key information from the pre-trained head for initialization. Later, I read the PiSSA paper, which used SVD for exactly this, so I started learning about SVD.
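My paraphrase of that SVD idea, not the official PiSSA code: factor the pre-trained weight, fold the top-r singular values into the singular vectors to initialize the trainable low-rank factors, and freeze the residual.

```python
import torch

def svd_init(W: torch.Tensor, r: int):
    # Thin SVD of the pre-trained weight matrix.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    # Top-r components, with sqrt of the singular values merged into each factor.
    A = U[:, :r] * S[:r].sqrt()                # shape (out, r)
    B = S[:r].sqrt().unsqueeze(1) * Vh[:r]     # shape (r, in)
    residual = W - A @ B                       # frozen remainder of the weight
    return A, B, residual

A, B, residual = svd_init(torch.randn(768, 768), r=16)
```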

As I learned more about the properties of SVD, I gradually developed some questions about PiSSA. Why merge the singular values into the singular vectors? Would it be better not to? What if we maintained the properties of SVD throughout the training process?

To be honest, when I started this research, I had no idea whether this method would actually work. I just felt it made intuitive sense. I even drew the first architecture diagram before doing any experiments.

Maybe it was luck, or maybe my intuition was pretty accurate, but the method indeed performed excellently. I named it SORSA and began related interpretability work. I learned almost everything by writing the code, writing the paper, and searching online, all at the same time.
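The following is only a toy sketch of the direction those questions point to, with names of my own choosing rather than the actual SORSA implementation: keep U, S, Vh as separate trainable factors and add a penalty that keeps U and Vh close to orthonormal during training.

```python
import torch

def orthonormal_penalty(U: torch.Tensor, Vh: torch.Tensor) -> torch.Tensor:
    r = U.shape[1]
    I = torch.eye(r, dtype=U.dtype, device=U.device)
    # Deviation of U^T U and Vh Vh^T from the identity, in squared Frobenius norm.
    return (U.T @ U - I).pow(2).sum() + (Vh @ Vh.T - I).pow(2).sum()

# Trainable adapter factors: the adapter weight would be U @ diag(S) @ Vh.
U = torch.randn(768, 8, requires_grad=True)
S = torch.randn(8, requires_grad=True)
Vh = torch.randn(8, 768, requires_grad=True)

task_loss = torch.tensor(0.0)        # placeholder for the real training loss
gamma = 1e-4                         # regularization strength, arbitrary here
loss = task_loss + gamma * orthonormal_penalty(U, Vh)
loss.backward()
```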

When I completed most of the work, I had no idea what level my paper was at. I submitted it to arXiv with a “let’s give it a try” attitude. After a long wait, it was successfully posted, and I had published my first preprint.

Paper Explain: "SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models"

I have gained an unprecedented sense of achievement.

And I believe that my journey has just begun.