OpenAI just released a 34-page practical guide to building agents.
Here are 10 things it teaches us:
1➜ agents are different from workflows: they are fully autonomous systems that perform tasks on your behalf. many applications use LLMs in workflows, but that alone doesn't make them agents.
2➜ use them for tricky stuff: complex decision making, dynamic rules, unstructured data
3➜ core recipe: each agent has three main components: Model (the brain), Tools, and Instructions on how to behave (see the minimal sketch after this list)
4➜ choose the right brain: set up evals to get a performance baseline, use the smartest model to see what's possible, then gradually downgrade to smaller models for cost and speed
5➜ tools are key: use well-defined, well-tested tools. an agent needs tools to retrieve data and context, and to take actions.
6➜ instructions matter A LOT: be super clear when telling the agent its goals, steps, and rules. Vague instructions = unpredictable agent. Be explicit.
7➜ start simple, then scale: often a single agent with several tools is ok. don't jump to complex multi-agent systems immediately.
8➜ if you go multi-agent: you can have a "manager" agent directing traffic to specialist agents, or have agents hand off tasks to each other.
9➜ guardrails are a MUST: check user input for weird stuff, make sure the agent isn't about to do something risky, filter out private info, block harmful content. Don't let it run wild.
10➜ build iteratively and plan for humans: start small, test, improve. always have a path for human intervention when the agent gets stuck or is about to do something high-risk.
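To make points 3, 5, 8 and 9 concrete, here is a minimal sketch (my own, not taken from the guide) of a tiny support agent built with the OpenAI Agents SDK. The lookup_order tool, the refund wording, and the crude pre-check are all made up for illustration:

```python
# pip install openai-agents
from agents import Agent, Runner, function_tool

@function_tool
def lookup_order(order_id: str) -> str:
    """Fetch an order's status from a (hypothetical) backend."""
    return f"Order {order_id}: shipped"

# a specialist agent that the manager can hand off to (point 8)
refund_agent = Agent(
    name="Refund specialist",
    instructions="Handle refund requests. Ask for an order id, then explain the refund policy.",
)

# model + tools + instructions: the core recipe (points 3, 5, 6)
support_agent = Agent(
    name="Support manager",
    instructions=(
        "You answer customer-support questions. "
        "Use lookup_order for order status. "
        "Hand off refund requests to the refund specialist. "
        "Never promise anything outside the stated policy."
    ),
    model="gpt-4.1",  # start with a smart model, downgrade once evals pass (point 4)
    tools=[lookup_order],
    handoffs=[refund_agent],
)

def run(user_input: str) -> str:
    # crude guardrail in the spirit of point 9: refuse obviously risky input
    if "credit card number" in user_input.lower():
        return "Sorry, I can't handle raw card numbers."
    result = Runner.run_sync(support_agent, user_input)
    return result.final_output

print(run("Where is order 1234?"))
```

The same structure scales: keep the single agent until its tool list or instructions get unwieldy, then split into specialists behind the manager (point 7).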
GPT-4.1 dropped this week - and it puts OpenAI back in the race for coding & agentic leadership.
⚙️ API only - no ChatGPT toggle for this.
💻 Coding performance is back on par with Claude 3.7 Sonnet & Gemini 2.5 Pro (though Gemini still leads).
💸 Pricing:
• Full: $3.50 / 1M tokens
• Mini: $0.70 / 1M
• Nano: $0.17 / 1M
👉 Gemini 2.5 Pro = best price/perf ($3.44 / 1M)
😵 Claude 3.5 Sonnet = $6 / 1M (!)
🧠 Not a "thinking" model.
📊 Mini shines on general reasoning tasks (e.g. GPQA), but only the full model holds up on SWE-bench Verified (GitHub issue solving).
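Since it's API-only, a minimal sketch of calling it from Python (assumes the official openai package and an OPENAI_API_KEY in the environment; swap the model string for the mini or nano variants to trade quality for price):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4.1",  # or "gpt-4.1-mini" / "gpt-4.1-nano"
    input="Write a Python function that parses an ISO 8601 date string.",
)
print(response.output_text)
```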
A team from NUS and Microsoft just released an agent that can act on any UI (Desktop, Android, Web) without needing additional text information. It works extremely well: they applied their method to a tiny Qwen2-VL-2B and managed to beat methods that use much more powerful vision models (like GPT-4V), without relying on any additional info (e.g. the DOM of a webpage) like previous methods did! 👏👏
They started from the idea that most existing methods rely heavily on text, which makes them less generalizable, while setting aside the rich visual structure that users actually rely on when navigating these interfaces.
⚙️ They put several good ideas to work:
💡 Simplify screenshots to the max: They aggressively prune the heavy visual content of UI screenshots by removing cloned image patches (e.g. any large area of a single color gets reduced to a few patches, while keeping positional embeddings), then group patches from the same GUI element together to simplify even further (a toy sketch of the idea appears after this post)
💡 Build a truly generalist dataset: To train a general UI agent, you need trajectories from every kind of UI, expressed in a common language. The authors merge datasets like OmniACT for desktop, Mind2Web for websites, and AMEX for Android to create a high-quality, diverse dataset.
➡️ Nice results ensued: They fine-tune a tiny Qwen2-VL-2B with their method, and it reaches SOTA on several tasks (element identification, web navigation), even beating methods that either use additional info from the DOM or use much bigger VLMs like GPT-4V! 🏆
And performance could certainly jump with a slightly bigger vision model. Let's hope the community builds this soon! 🚀
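The paper's actual token selection is more sophisticated than this, but as a toy illustration (entirely my own, in numpy) of the "drop cloned patches, keep positions" idea from the first point above: split the screenshot into patches, keep one representative per run of identical flat-color patches, and remember each kept patch's grid coordinates so positional embeddings still apply.

```python
# Toy sketch (not the paper's code): prune near-duplicate flat patches from a
# screenshot while keeping each surviving patch's grid position.
import numpy as np

def prune_patches(image: np.ndarray, patch: int = 14, tol: float = 1e-3):
    """Split an HxWx3 image into patch x patch tiles and drop repeated flat tiles.

    Returns the kept tiles and their (row, col) grid positions, so a model could
    still attach the original positional embeddings to each one.
    """
    h, w, _ = image.shape
    rows, cols = h // patch, w // patch
    kept, positions, seen_flat = [], [], set()
    for r in range(rows):
        for c in range(cols):
            tile = image[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            if tile.std() < tol:  # near-uniform tile, e.g. blank background
                key = tuple(np.round(tile.mean(axis=(0, 1)), 2))
                if key in seen_flat:
                    continue  # clone of a flat tile we already kept once
                seen_flat.add(key)
            kept.append(tile)
            positions.append((r, c))
    return kept, positions

# A fake 224x224 screenshot: blank white page with a 28px "toolbar" of real content.
screenshot = np.ones((224, 224, 3), dtype=np.float32)
screenshot[:28, :, :] = np.random.rand(28, 224, 3)
tiles, positions = prune_patches(screenshot)
print(f"kept {len(tiles)} of {(224 // 14) ** 2} patches")  # 33 of 256
```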
Details
I am still rigorously testing different hyperparameters and comparing the impact of each one to find the best workflow. So far I have done 16 different full trainings and am completing 8 more at the moment. I am using my poor, overfit 15-image dataset for experimentation (4th image). I have already proven that when I use a better dataset it becomes many times better and generates expressions perfectly. Here is an example case: https://www.reddit.com/r/FluxAI/comments/1ffz9uc/tried_expressions_with_flux_lora_training_with_my/

Conclusions
When the results are analyzed, fine-tuning is far less overfit, more generalized, and better quality. In the first 2 images, it is able to change hair color and add a beard much better, meaning less overfit. In the third image, you will notice that the armor is much better, thus less overfit. I noticed that the environment and clothing are much less overfit and better quality.

Disadvantages
Kohya still doesn't have FP8 training, so 24 GB GPUs get a huge speed drop. Moreover, 48 GB GPUs have to use the Fused Back Pass optimization, so they also take some speed drop. 16 GB GPUs get a far more aggressive speed drop due to the lack of FP8. Clip-L and T5 training are still not supported.

Speeds
Rank 1 Fast Config — uses 27.5 GB VRAM, 6.28 seconds / it (LoRA is 4.85 seconds / it)
Rank 1 Slower Config — uses 23.1 GB VRAM, 14.12 seconds / it (LoRA is 4.85 seconds / it)
Rank 1 Slowest Config — uses 15.5 GB VRAM, 39 seconds / it (LoRA is 6.05 seconds / it)

Final Info
Saved checkpoints are FP16 and thus 23.8 GB (no Clip-L or T5 trained). According to Kohya, the applied optimizations don't change quality, so all configs are ranked as Rank 1 at the moment. I am still testing whether these optimizations make any impact on quality or not.