👩🏽‍💻 OpenAI's Sora, Large World Models, Google AI's Genie, and MoodCaptureAI

Plus: prompt of the week, image upscaling, and this week’s best tools and AI news.

February 28, 2024

Welcome back to AI Logs! Last week, Sora (OpenAI’s upcoming text-to-video model, which is taking the world by storm ahead of launch) only got a mention. This week I want to break it down further, because there’s a lot to unpack beyond the incredible studio-quality videos. Additionally, Stable Diffusion 3 was just revealed, and its outputs are impressive enough to fuel speculation that, with more GPUs, StableVideo could arrive soon and rival the quality Sora is already showing in the countless demos surfacing everywhere.

But what makes Sora so remarkable is the kind of model it is built on: a Large World Model (LWM). This is different from the LLMs (large language models) behind ChatGPT, Gemini, Mistral, Perplexity, Pi, etc., and it is a potentially huge step for AI and humanity. In last week’s “Tools of the Week” I highlighted a paper by Runway introducing the concept of Large World Models, which are trained on physics, reasoning, nature, and much more. One application is building simulations to train robots; another is sensor reading. Even self-driving vehicles stand to benefit greatly from an AI that “understands” the world around it and how it works.

In addition to LWMs, the handheld Rabbit R1, the darling of CES, is built on another new kind of model, the Large Action Model, which is intended to understand user intent. Multimodal models can process multiple types of input, and when we think about combining Large World Models with both Large Action Models and Large Language Models, the implications are staggering.

The second thing is Groq, a new AI company (not to be confused with Twitter’s Grok, or with Elon’s ex’s AI toy that hopes to read kids’ minds, also called Grok), and particularly its new technology: LPUs (Language Processing Units). These are a new type of chip, an alternative to GPUs, that is incredibly quick, engineered and fine-tuned for AI, and could disrupt the industry.

NEWS

🤖 Move aside Optimus, Figure AI’s Figure 01 is here
Check out Figure AI’s Figure 01 humanoid worker robot doing its thing around a warehouse.
💸 Mistral AI: Microsoft invests $16 million in OpenAI’s French doppelganger
The Microsoft-Mistral agreement draws the scrutiny of the European Commission.
😒 Study: AI-powered MoodCapture app scans facial expressions to detect depression
But you’ll have to wait 5 years for this app.

MUST READ

Google’s AI Genie, the first to be trained exclusively from Internet videos, crafts games

Tim Rocktäschel from Google DeepMind’s Open-Endedness Team announced the development of another AI system: Genie.

Genie is the first generative interactive AI application to be trained exclusively on 200,000 hours of internet videos. According to the announcement, the model can generate an endless variety of action-controllable 2D worlds from image prompts. This marks a significant leap in the world of AI.

OTHER IMPORTANT UPDATES

🧌 Using AI, political parties are bringing back their dead leaders
Over 60 nations are heading to the polls this year.
🔘 Samsung unveils ‘world’s fastest’ data processing AI chip to date
Samsung doubles down in HBM race with largest memory.

PROMPT OF THE WEEK

This week, I’ll do things a little differently and feature a tool that can help you prompt better: Say What You See by Google, which I’ll also feature in “Tools of the Week.” It is part of the Google Labs suite, available via the Google website or mobile app.

It shows you AI-generated images it created (despite Google not currently generating images after the debacle over “too diverse,” historically inaccurate depictions of humans; see “AI Picture of the Week” for more) and has you write prompts to try to recreate them. It scores you on a number of criteria, and it is gamified, so the challenges get harder and harder to solve.

I consider myself a pretty good prompt engineer, but it can be challenging to get images that are exactly what I’m going for, and this tool is already helping me level up significantly.

So, while this week’s prompt of the week isn’t a prompt itself, it’s intended to help you create superior prompts all on your own. Enjoy!

AI PICTURE OF THE WEEK

Ariel Zilber on Google Gemini, “generate a picture of one of the founding fathers of the USA.”

TUTORIAL

I’ll keep it short and sweet this week: how to dig into Google’s suite of AI tools.

The easiest way is via the Google mobile app. At the top left corner is a beaker icon that opens Google Labs (featured above in “Tools of the Week”) when you tap it. At the top middle of the app is a Google “G Search” with a red diamond next to it. Tap that red diamond to switch over to Gemini, Google’s competitor to ChatGPT. Pro tip: sign up for two free months of “Google One” and you get Gemini Advanced, their best version and comparable to GPT-4, for free as well.

To access these from web/desktop, go to https://labs.google and https://gemini.google.com.