👁️ Meta's V-JEPA predictive vision model, AI race, Open AI's Sora and Eleven Labs

Plus: the prompt of the week, AI image upscaling, and this week’s best tools and AI news.

A lot has happened in the world of AI since last week, so let’s jump right into it! Meta has raised several eyebrows. It revealed its V-JEPA predictive vision model and teased a wristband that is said to “sense intentions” and let the wearer access AI with their mind (like a less invasive Neuralink). Meanwhile, Zuck tried out the Apple Vision Pro and came away saying he believes the Quest is the superior product, regardless of price!

It seems as though Meta, and even Apple, are finally ready to start unveiling more of their AI programs and products as the “race” between OpenAI/Microsoft and Google keeps heating up. To quote Nelly, “it’s getting hot in here”. We are also seeing serious advancements in autonomous AI agents, which I suspect OpenAI, Google, and everyone else are either building right now or eyeing for acquisition. The Google/OpenAI race was evident this past week when Google announced Gemini 1.5, and right afterward OpenAI revealed Sora, its new text-to-video and image-to-video generation tool. At least judging by the demo videos, Sora looks significantly better than every other generative AI video product on the market. Oh, and OpenAI also announced that ChatGPT is getting “memory” in the near future, meaning it will remember things for users across multiple threads and sessions.

This past week we also saw Eleven Labs launch a feature that lets users monetize their voices: train Eleven Labs to speak in your voice, then let others use it to say whatever they want and pay you for it. Many other AI tools seem hyper-focused on letting users monetize their generations and work as well.


IE+ SUPPORT INTERESTING ENGINEERING
Invest In Science And Engineering

Enjoy exclusive access to the forefront of AI content, highlighting trends and news that shape the future. Join a community passionate about AI, delve into the latest AI breakthroughs, and be informed with our AI-focused weekly premium newsletters. With IE+, AI reporting goes beyond the ordinary - and it is Ad-Free.

NEWS

MUST READ

According to The South China Morning Post (SCMP), China’s power installations in the Gobi and other western deserts, mainly solar and wind, now add up to half the power capacity of the US. Northwestern China’s installed capacity nears 500 GW, and it reaches 600 GW when the Gobi Desert is included.

Despite the intermittency of wind and solar, over half of the region’s energy facilities rely on renewables, with a reported 95 percent efficiency. Northwest China, which spans 1.16 million square miles and includes Xinjiang, was historically underdeveloped because of its harsh terrain; it now exemplifies China’s renewable energy success.

OTHER IMPORTANT UPDATES

PROMPT OF THE WEEK

I try to share prompts that are mostly platform/LLM agnostic, and in the past I’ve shared “Professor Synapse” here through several lenses as a “super prompt” that can effectively turn ChatGPT, Gemini, Perplexity, etc. into “agents”. Today, though, I want to share a prompt hack (not strictly a prompt) that is, at least for now, unique to Perplexity. It’s new, it’s powerful, and it fits the theme of agents I brought up in the intro and will feature in the ‘Tools of the Week’ section. It is… Perplexity Collections (Perplexity’s foray into agents). To create a Collection:

Library >> Collections plus sign (+)

Title >> Enter helpful text

Icon >> Add (resonant with prompt)

Description >> Enter helpful text:

* Prompt purpose reminder

* Prompt-prompt copy/paste solution

* How-to instructions

AI Prompt >> Enter your desired prompt

Privacy >> As desired

Create >> Select

AI PICTURE OF THE WEEK

What if emojis could be enhanced by the power of AI? This is the challenge Dogan Ural took on with eight animal emojis yesterday, with impressive results. Earlier in the week, he also used Magnific.ai to enhance face emojis, but some of them were frankly disturbing. Animals are far cuter.

TUTORIAL

Here is a tutorial idea Google Gemini came up with for real-time immersive story generation:

Idea: Real-Time Immersive Story Generation

Imagine a world where you can be a co-creator of a story that unfolds in a virtual environment right before your eyes.

How it works

The Seed: You provide a short prompt to Gemini. Example: "A lone astronaut stranded on Mars struggles to survive."

The Web of Imagination: Gemini generates a captivating narrative, including:

Detailed plot development with unexpected twists

Vivid character descriptions

Dialogue that feels emotionally authentic

Visual Translation: Gemini collaborates with a visual AI engine (like DALL-E 2 or Midjourney) to translate textual descriptions into real-time generated 3D scenes.

The Immersive Experience: You're immersed in the story using VR technology. Scenes change around you based on Gemini's evolving narrative.

Dynamic Co-creation: You influence the direction of the story with decisions or new prompts for Gemini. The narrative and the visuals adjust accordingly.
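The collaborative loop described above (seed, narrative, visual translation, user input, repeat) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a working integration: `generate_narrative` and `render_scene` are hypothetical placeholders standing in for calls to Gemini and a visual AI engine (no real API is called), and `StoryState` and `story_loop` are names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StoryState:
    """Running state of a co-created story (illustrative structure)."""
    seed: str
    chapters: List[str] = field(default_factory=list)
    scenes: List[str] = field(default_factory=list)


def generate_narrative(state: StoryState, user_input: Optional[str]) -> str:
    """HYPOTHETICAL placeholder for a language-model call (e.g. to Gemini).

    Returns canned text so the loop runs without any API access.
    """
    step = len(state.chapters) + 1
    if user_input:
        return f"Chapter {step}: the story continues, steered by '{user_input}'."
    return f"Chapter {step}: {state.seed}"


def render_scene(chapter: str) -> str:
    """HYPOTHETICAL placeholder for a visual AI engine that turns text into a scene."""
    return f"[scene for: {chapter}]"


def story_loop(
    seed: str,
    user_turns: List[str],
    narrate: Callable[[StoryState, Optional[str]], str] = generate_narrative,
    render: Callable[[str], str] = render_scene,
) -> StoryState:
    """Run the seed -> narrative -> visuals -> user input loop."""
    state = StoryState(seed=seed)
    # The first pass uses only the seed; every later pass is steered by one user turn.
    for user_input in [None, *user_turns]:
        chapter = narrate(state, user_input)  # the model extends the narrative
        state.chapters.append(chapter)
        state.scenes.append(render(chapter))  # each chapter is translated into a scene
    return state


if __name__ == "__main__":
    result = story_loop(
        "A lone astronaut stranded on Mars struggles to survive.",
        ["Find an abandoned rover", "Send a distress signal"],
    )
    for chapter, scene in zip(result.chapters, result.scenes):
        print(chapter)
        print(scene)
```

Passing the narrator and renderer in as parameters keeps the loop itself independent of any particular model provider; swapping the two placeholders for real model calls would be the bulk of the actual work.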

Tutorial Outline:

1. Introduction

* The Power of Gemini: Explain Gemini's capability in advanced language modeling, multi-modal understanding, and generative reasoning.

* Visual AI: Concepts of text-to-image generation and 3D rendering.

2. Setting the Stage

* Tools: VR headset, Gemini interface, visual AI engine access

* Prompting Gemini: Tips for open-endedness and fostering creativity.

3. The Collaborative Loop

* Step 1: The Initial Prompt (Demonstration with the Mars example)

* Step 2: Visual AI Translation (Show the results in VR)

* Step 3: Gemini Extends the Narrative (Explain how new text is generated)

* Step 4: Your Influence (Provide examples of how a user can give input)

* Step 5: Loop Back to Step 2 (Show the dynamic updates)

4. Beyond the Basics

* Multi-character stories

* Incorporating sound design generated by AI

* Exploring different genres (fantasy, mystery, etc.)

Why this will blow minds

Merges language-based storytelling with a real-time visual experience

Showcases Gemini's adaptability and collaborative nature

Opens doors for revolutionary new types of interactive entertainment

Highlights how AI can empower human creativity

How did Gemini do? Please let us know at (our contact email).

Written by

Cory Warfield

LinkedIn Top Voice/Influencer in AI

WHAT ELSE?

🚨 For IE’s daily engineering, science & tech bulletin, subscribe to The Blueprint

🧑🏻‍🔧 For expert advice on engineering careers, subscribe to Engineer Pros

🔷 For all the week’s top engineering stories, subscribe to the Vital Component

⚙️ To explore the wonders of mechanical engineering, get your Mechanical

🎬 For a weekly round-up of our best science, tech & engineering videos, subscribe to IE Originals

For our weekly premium newsletter and an ad-free experience, sign up for IE+
