Why the Future of AI Runs on Data

Recorded: Sept. 9, 2025 Duration: 1:01:39
Space Recording

Short Summary

In a dynamic discussion on the intersection of AI and blockchain, experts highlight the critical role of decentralized data infrastructure in shaping the future of AI, emphasizing growth opportunities for Filecoin and the need for verifiable data solutions.

Full Transcription

Thank you. Thank you. Thank you. Hello, hello.
Test, test, test.
One, two, three.
I can hear that.
Excellent.
Am I coming in clearly?
Hi, Vuk. Can you hear us okay?
Yeah, yeah. I think we are all here.
And Aaron, I'm going to hand it over to you, as you are the best moderator here, so we can start now.
Okay, cool.
Okay, thank you.
You want to give a couple minutes for folks to file in or should we just start now?
I think we can start now.
Start now?
All right.
Yeah, well, I suppose we'll take a few minutes to introduce our subject, introduce our panelists, etc.
And then as folks file in, we can dive into the conversation.
Cool. Well, I appreciate the opportunity to be here, and thanks to everyone for showing up, in whatever part of the world you're in, whether it's morning or evening. But yeah, so today's space is, and I apologize, I'm having a bit of a respiratory issue, kind of a nasty cough. It's peak dry season where I am right now, and if I have a weird coughing outburst or something, hopefully I'll be okay. But I apologize in advance. But yeah, so today's space is called
Why the Future of AI Runs on Data, right?
And this is a really exciting topic
that is very relevant to the Filecoin ecosystem.
It's something we've been talking about,
well, for many years, I guess.
And I think it's one of these areas where,
you know, folks in Filecoin were talking about this
long before all the normie people were talking
about it, right? This is something that we've been talking about since well before ChatGPT
came along and everyone started paying attention to this. So we're going to kind of dive into some
of the intersections of what Filecoin is doing and also why this is relevant for AI and talk about
some of the other use cases, talk about some of the other developments where there's overlap, and perhaps why decentralized data infrastructure
is such an important concept when we're talking about the future of AI. So as I mentioned,
I'm Aaron Stanley. I'll be hosting the conversation today. I do appreciate the invitation to be here.
And maybe, I mean, maybe we can just start off with kind of a quick round of intros.
So our panelists here today, we have Vuk Vukoye of, I'm sorry, Vuk, I can never pronounce your name like 100% correctly. But Vuk is, well, everyone knows Vuk, right? You can go by Vuk. So yeah, Vuk of Ramo, Web3Mine,
which is really focused on bringing together people, hardware, and capital to build a more open internet.
VOOC's been kind of a major player
in the Filecoin world for a long time
in a variety of different roles,
which maybe you can give us some context on that,
kind of your journey in Filecoin land.
And then we also have Carson Farmer of Recall Labs, which was formerly Textile.
Some of you probably recognize the name Textile.
They've also been kind of mainstays of the Filecoin ecosystem for quite some time.
And Recall Labs is the team behind RecallNet, which is a decentralized network for agentic
intelligence.
So maybe, Vuk, maybe to kick us off, why don't you give us an introduction of yourself and
talk about your current project with Ramo, Web3Mine, and maybe give us some more context on just what you've been doing in the Filecoin world and maybe beyond over the last couple of years.
So I'll try to be super concise. All right, so I did start doing a lot of mining back in 2013.
This was before Ethereum, so mined Litecoin a bunch.
Made a company around that, sold that company.
Then in 2017, I started building dev tools in the Ethereum ecosystem,
so I built a lot of EVM stuff. Then I led smart contracts for Cardano, and then ultimately I joined PL, where I led a bunch of the stuff around miners and basic infrastructure. Today,
what we do with Ramo is we build a protocol that basically abstracts away the complexity of doing Filecoin, whether that is in the context of locking liquidity or in the context of storing data on Filecoin.
We abstract this complexity with different products,
and we basically allow users to either store their massive amounts of data, like hundreds of petabytes,
or stake liquidity without having
to actually do the mining, or basically provide infrastructure without having to do the sealing
and so on.
Ultimately, we are trying to reduce the complexity of Filecoin and allow, like, more people to
do Filecoin.
And then, Carson, maybe we'll turn it over to you.
You've got a super interesting background. You've got an academic background as well. So I'd love for you to tell us a bit about yourself and what you guys are working on. Give us some background maybe on textile and Recall Labs as well, if you would.
Carson. I am a co-founder and CTO at Recall Labs. And so Recall is the on-chain arena for
evaluating, ranking, and rewarding agents. That's our tagline. I can go into what all that actually
means later. So yeah, in a past life, I was a university professor. I was at a couple different
places, but more recently, the University of Colorado Boulder.
And I left that way back in 2017, around when we joined the Filecoin and IPFS ecosystem.
And back then we were doing on-device machine learning stuff way before it was cool. And nowadays I lead a lot of the like R&D and engineering at Recall where we're sort of
designing and building the core systems to try and make AI evaluation systems more transparent and
rigorous and community powered. So we spend a lot of time thinking about how do we measure and govern AI systems and how do we capture and create audit trails to make sure that humans and AI are staying aligned.
We spent a lot of time thinking about that.
In a separate past life, yes, I was with Textile, or still am with Textile. We just rebranded a bit. And so we've been in the Filecoin and IPFS ecosystem, or the PL network, for a long time, since just before the Filecoin ICO. And we've been building various different developer infrastructure. You may remember things like Powergate or the Filecoin Bridge or Bidbot and all sorts of different developer tooling that eventually kind of ends up getting rolled into Filecoin in some way, shape, or form. And then most recently, we were engaging a fair bit with IPC and some of the Filecoin scaling technology.
And so, yeah, happy to be here chatting about data.
It's always kind of a nice topic and we've got a lot to talk about.
So looking forward to it.
Cool. Thanks for that.
Isabella, did you want to introduce yourself and the Filecoin TLDR team at all? Since you're hosting this, I always want to give you the chance to plug yourself if you so desire.
Sure, I'll just share a bit of context about what Filecoin is doing and about Filecoin TLDR. So as everybody here knows, Filecoin is doing storage for all the layers for AI and DePIN. And also, after we launched the FVM, we can bridge cross-chain with multiple other ecosystems, so that's super great. As for the Filecoin TLDR channel, everyone can just follow Filecoin TLDR to get all the news and ecosystem updates from Filecoin. You can also take it as an investor view: you can read all the data metrics from the Filecoin chain and ecosystem, plus all the third-party research reports, which you can use from your investor perspective to make your decisions, and to get customer insights across the different Filecoin ecosystems. That's all, but hi, everyone. I'm glad to join today's AI panel space, and I'm looking forward to Carson and Aaron and Vuk sharing all your insights about AI with Filecoin.
Thank you. Cool. Yeah, thanks, Isabella. Thank you for that information and thank you for the
opportunity to be here and for setting this up. And yeah, definitely check out the Filecoin TLDR handle and website.
They've got a lot of really good information. They kind of like dive a bit deeper into Filecoin,
some of the kind of more complex problems. And they do a really good job of kind of explaining
some of this stuff in plain English, let's say. Filecoin sometimes can be a little bit complicated
and maybe intimidating for folks who aren't super technical,
but they do a pretty good job of breaking all that down.
And then maybe I'll introduce myself really quickly last,
but probably least,
because I'm probably much less knowledgeable
on the subject than you guys.
But so my name is Aaron Stanley.
I am currently editorial director at
Filecoin Foundation, where I host our DWeb Decoded podcast, which is a podcast that really tries to
explore not just Filecoin, but we also talk about a lot of things that are kind of what we call
Filecoin adjacent, things that are not Filecoin specifically, but are very relevant to Filecoin
or where there's overlap with Filecoin. And I think this subject today is definitely one of those areas for sure.
Before this, I worked at CoinDesk for five years.
I was a reporter, editor.
I produced the consensus conference for a few years.
And then I made the jump over to Filecoin a couple of years ago.
Filecoin Foundation, I should say.
And then before that, I was doing mainstream media,
political reporting in Washington, DC. So I've sort of moved on from that into crypto, which is
maybe... Anyway, it's been an interesting transition, I guess we'll say.
But anyway, on to the subject at hand. I think the subject that we're trying to tackle
today is really a core challenge in AI, which is data.
And we all know, or I'm assuming all of you, most of you know that by now that AI models
and agents are really only as good as the data they're trained on, right?
And right now there are a lot of questions around accessibility and verifiability of data, for example data that's kind of locked up in some of these big tech silos, creating risks around things like transparency, reliability, bias, barriers to entry, all this kind of stuff. And I think in the Filecoin world, we like to envision a future where these AI systems do involve decentralized data infrastructure in some capacity, where things like storage, provenance, and access are more open, more verifiable, and more resilient due to the decentralized nature.
And Filecoin network obviously offers this decentralized and verifiable data storage infrastructure.
So maybe let's kind of dive in.
Maybe I'll turn it over to Vuk to kick off. Maybe just talk from a high level, from your vantage point and from the Ramo team's vantage point: how do you guys
see kind of this overlap between, like, why is there a need for something like what Filecoin is
building in the context of AI? Let's just maybe we'll frame it like that. Like, why is this
decentralized data infrastructure, storage infrastructure, such an important component
of the future of AI moving forward? We'll start at kind of a high level, and maybe we'll drill down a bit.
Sure. I would say, generally, very few organizations, networks, or communities have had the chance of getting to exabyte scale, let alone tens of exabytes, which has been shown in the context of Filecoin in a provable way, right?
Like you have PowerApp that is basically showing us
there is like 10, 20 exabytes of storage
that was allocated towards Filecoin. Now, historically, Filecoin has had difficulties leveraging this capacity.
So a lot of this capacity is basically, let's say, more inactive
because it's just committed capacity.
But ultimately, the point is that we were able to really show
that it's possible to aggregate tens of exabytes of storage.
And that has not actually been such an easy task
in the centralized context.
So if you think about a particular data center,
for example, data centers that the big AI labs are building today,
even those data centers will struggle to actually do 10 exabytes of capacity.
So in a way, what we're seeing is that scaling in a centralized context is becoming impractical.
And this is mainly caused by the fact that it's really impractical to get massive amounts of energy in one particular place, let alone for storage, which has its own physical constraints: it takes a lot of space, and it's super heavy. Hard drives are really heavy; if you put them in racks, these things weigh tons, and ultimately you need to put them somewhere. It's really hard to stack them on multiple floors, so very often you need a very wide area where these racks are spread around. So that's one thing: we basically showed that it's possible to create incentives that allow communities to bring together a lot of storage.
Well, on the other hand, we are seeing that putting more compute towards training or reinforcement is not actually linearly benefiting the quality of the new LLMs that are being created.
And ultimately, the next thing is going to be like,
how do we actually find more high-quality data
that we can actually use for training better and better models?
Or how do we actually get higher density of data, in the sense that you could have a video that is 480p and the same video that is 4K, and there is so much more information in the 4K video? How do we get to a point where, instead of always having 480p, we always have 4K? And how do we actually make sure that we can store more of this? If you look at all the data generated today, it's massive, but most of this data gets trashed. This is everything from logs of programs that run, to security cameras; most of the security cameras, after a while, basically delete the videos. And many others. So most of the content is actually getting deleted.
And what we've seen with Filecoin is that with the incentives that it has created, we
were able to reduce the cost of storage by an order of magnitude, which basically allows
anyone to just keep storing data without thinking as hard
as they had to think before,
like if this was on AWS or something like that.
But yeah, ultimately we showed that it's possible
to scale storage in a very horizontal way.
And on the other hand, there is a big need for actually storing more data than is being stored today, because we've kind of used up all the data there is, and now we need to figure out how to change the way we collect data, possibly collecting data that is not being stored today. I'll just
pause there.
Yeah, well, a quick follow-up there. I mean, it's interesting that you mentioned a point about how just adding more compute, just throwing more GPUs at these LLMs, doesn't necessarily improve the outcome, if I understood the point you were making correctly. I mean, I guess you're hitting diminishing returns in that regard?
Yes, yes, yes.
So it's not just, so if I'm
kind of reading between the lines of your point here,
and I'd like for you to maybe elaborate on this,
but like the solution here isn't just like throwing more GPUs at the problem,
but the solution here is like, how do you get more data?
How do you get better quality data ready or available
and ready to be actually used in these models?
Is that kind of the point you were making or is that what you were implying?
So there are basically two functions.
One is the GPU compute,
which has two parts.
One is the interactive one,
which is basically just in time.
So that's the reasoning part
that needs to be done for every query.
And then you have the other part,
which is basically the training, which is kind of shared by all the users, because ultimately one model gets used
many times. Now, what we're seeing is that the training, we're kind of hitting a point where
we are definitely not linearly getting more benefits. What's happening now is that we
are throwing more compute on the reasoning side of things just to increase the quality, but that
doesn't scale super well because you need to put more reasoning into each basically query. So like
you need to do that for each user. At the same time, on the data side, we have yet to really scratch the surface, and there is definitely a linear benefit. And it's also easier, I mean, not really easier, but it's something that has more impact if we focus on it, instead of just throwing more GPU at the problem and hoping for the best.
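The two compute functions Vuk describes can be sketched with hypothetical numbers (the FLOP counts below are made up; only the shape of the argument matters): training compute is amortized across every query a model ever serves, while inference-time "reasoning" compute is paid again on each query, so it scales with usage.

```python
# Illustrative cost sketch with made-up FLOP counts: training compute is
# amortized over all queries, reasoning compute is paid per query.

def cost_per_query(training_flops: float, reasoning_flops_per_query: float,
                   total_queries: int) -> float:
    """Average compute per query: amortized training plus per-query reasoning."""
    return training_flops / total_queries + reasoning_flops_per_query

# As usage grows, the amortized training term shrinks toward zero...
few_queries = cost_per_query(1e24, 1e12, total_queries=10**6)
many_queries = cost_per_query(1e24, 1e12, total_queries=10**12)
assert many_queries < few_queries

# ...but spending 10x more on per-query reasoning raises the cost for
# every single user, which is why that approach "doesn't scale super well".
more_reasoning = cost_per_query(1e24, 1e13, total_queries=10**12)
assert more_reasoning > many_queries
```

The asymmetry is the point: one model's training bill is shared by everyone, but extra reasoning has to be bought per query, per user.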
Got it. Maybe Carson, let's turn it over to you. Would love maybe your thoughts on the kind of the
big picture question I posed earlier, just kind of what do you see as the overlap between maybe
Filecoin, Filecoin's mission, and then building with recall. And then if you want to react to
anything that Vuk was saying,
we'd love some of your thoughts on that as well.
Yeah, cool.
Thanks for the prompt.
I mean, I think I can come at it from a slightly different perspective,
which is useful in the context of a discussion,
which is our team and our research is focused less on sort of like the raw data that goes to like actually drive
and build up these foundation models and more on the sort of like other side of the equation,
which is dealing with all of the outputs from these models, whether it's like reasoning outputs
or like actual like the raw text and tool calls and engagements and things that agentic systems are producing.
Because you can think about it like in general,
these foundation models are the models that are,
that's like a broad class of LLMs and multimodal models that are trained on just general corpora. This is like your GPTs and your Claudes and your Llamas. And these ones are designed to be
a sort of like raw intelligence of the system.
And as Vuk mentioned, like we're getting to the point
where we're starting to see diminishing returns on compute.
And in a lot of senses,
those foundation models are like effectively
starting to commoditize.
They're competing on price and they're competing on just general usability and utility.
And so then a lot of the interesting innovations need to start happening elsewhere
because, like Vuk mentioned, there are just sort of diminishing returns. Data collection is a hard problem. It's always been a hard problem. Data quality is just hard, because it's easier to do a general crawl and get as much data as possible and then dump it in and hope for the best. It's a lot harder to curate and then even manage that curated data set.
So the foundations, because of that, are sort of arguably commoditizing. So from our perspective, the interesting thing starts to be: okay, well, then why does a system like Filecoin need to exist in a world where we want to think about capturing the outputs of these models?
Well, part of the reason for that is, as Vuk also mentioned, we want to capture more of the data
that we're creating. And by and large, you know, from this point forward, most of the data that is being created, at least in terms of human and computer interactions, is being created by these models and the agentic systems built on top of them.
And so harnessing that is really useful,
but it's not just useful in terms of capturing the data
and being like, oh good, we got that.
Let's think about what we can do with that later.
Systems like Filecoin are useful
and other similar systems
because not only do we have the volume
to capture all of the data,
but we can also do things like record the provenance
and the structure and things like that of the data.
So we can actually verifiably say,
okay, this model was run at this time with this prompt
and this set of tool calls and blah, blah, blah, and it produced this output. And whilst that
particular piece of information maybe isn't useful in the context of that one interaction
with the underlying LLM, in the future, it's going to become increasingly useful. And I think a big part of it is like, you know, yesterday's AI kind of produced just raw text.
Tomorrow's AI systems are producing transactions and predictions and making decisions that flow through real markets and supply chains and governance systems and all these things.
And if we want to be able to sort of like track these decisions, we need some way in order to
like actually capture in a verifiable way those outputs. And that's been the perspective that
we've been taking a lot is like, look, my team isn't going to have a direct impact on foundation models.
A lot of this stuff is happening in labs.
A lot of the data is proprietary because data quality is so important.
We can try to have an influence on things, and we can work with decentralized training systems.
Prime Intellect is an awesome example. There's a lot of like DAOs and protocols and
teams that are working to collect, you know, high quality data in a way that's, excuse me,
verifiable and for and of the community. But by and large, the big foundation models are happening
in fairly siloed systems. But once we unleash those models, it's very useful
to be able to capture their outputs and do something useful with them. And so that's the
kind of perspective we're coming from. It's like most of the data of the future will be produced
by these models. And so building infrastructure and systems to capture that in a verifiable way is useful.
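Carson's "this model was run at this time with this prompt and this set of tool calls, and it produced this output" can be sketched as a hash-committed record. This is an editorial illustration with hypothetical field names, not Recall's actual schema or any on-chain format:

```python
# A minimal sketch of a verifiable provenance record for one model
# invocation. Field names are hypothetical illustrations.
import hashlib
import json
import time

def provenance_record(model: str, prompt: str, tool_calls: list,
                      output: str, prev_hash: str = "") -> dict:
    """Build a hash-committed record of a single model run."""
    body = {
        "model": model,
        "timestamp": time.time(),
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "tool_calls": tool_calls,
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        # Chaining each record to the previous one yields a tamper-evident
        # audit trail over a whole agent session.
        "prev": prev_hash,
    }
    body["record_sha256"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    return body

rec = provenance_record("example-llm", "What should I order for dinner?",
                        ["search_restaurants"], "Pizza.")
# Anyone who later holds the raw output can recompute its hash and check
# it against the stored record.
assert rec["output_sha256"] == hashlib.sha256(b"Pizza.").hexdigest()
```

Stored on a network like Filecoin, records like this are what make an individual interaction, useless on its own, auditable later at scale.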
We see this in other contexts as well.
The outputs are more realistic to capture than inputs.
Our team works a lot with agentic AI builders.
In particular, we've been working with a slew of developers who build
trading bots or trading agents that are actually trying to optimize
P&L profit and loss and things like that over different timescales.
In a lot of cases,
their inputs are pretty proprietary information because they're trying to
build a business around it and they don't necessarily want you to have access
to either the training data that they're using
to fine-tune their own models
or the actual price feeds and data feeds and data sources
that they then actually run through the models at test time.
And so that sort of information they keep close to the chest.
They may be leveraging decentralized storage
as a backup in an encrypted way.
But in terms of open network storage,
they're not leveraging it too much in practice.
But the outputs of those systems are often either on-chain actions, so it's right there, easy to capture and see, or interactions with clients and things like that, which is instantly out of their hands.
And so they don't have any, you know, they're not pretending to have control necessarily over those outputs.
So capturing that and leveraging that in a useful way is helpful.
And then furthermore, capturing that and storing it
and then leveraging it later to help with evals and fine-tuning
and even in some cases in-context learning is actually super useful.
And so just to finalize that: evals, evaluations, a simplified explanation is that this is a way to do unit testing on model outputs, because LLMs are probabilistic systems. Unlike more traditional code, where ideally for a given input you get a given output, with LLM-based models it's probabilistic: a given input will produce similar output, but not necessarily the same. And so we have to change our testing framework a little bit, and the testing framework that we use for that is called evals.
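A toy eval in this "unit tests for probabilistic outputs" sense might look like the sketch below. Because a model can answer the same prompt differently on every run, we assert that a property holds at a high enough rate across many samples rather than demanding one exact output. `fake_model` is a stand-in for a real LLM call, not any actual API:

```python
# Evals as probabilistic unit tests: sample many outputs, assert a
# property holds at a sufficient rate. `fake_model` is a stand-in.
import random

EM_DASH = "\u2014"

def fake_model(prompt: str) -> str:
    # Stand-in for a model: mostly well-behaved, occasionally dash-happy.
    outputs = ["A clean, dash-free answer."] * 9
    outputs.append(f"An answer {EM_DASH} with {EM_DASH} too many dashes.")
    return random.choice(outputs)

def eval_no_dash_overuse(model, prompt: str, runs: int = 200,
                         threshold: float = 0.8) -> bool:
    """Pass if at least `threshold` of the sampled outputs avoid em dashes."""
    passes = sum(EM_DASH not in model(prompt) for _ in range(runs))
    return passes / runs >= threshold

random.seed(0)
print(eval_no_dash_overuse(fake_model, "Write me some tweet copy."))
```

The key difference from a traditional unit test is the `threshold`: the assertion is about a distribution of outputs, not a single deterministic one.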
No, thanks for that. There's a lot of really interesting information in what you just mentioned
there. And I was trying to take notes, but there's like too much good stuff. And I kind of
ran out, I kind of lost track here. But one point that you made that was really interesting to me, and maybe it's kind of an overall point you're making, is that your focus is really on the outputs, on making sure that we can track these model outputs, and that the provenance of these outputs, et cetera, is going to be very important in the future.
And it's also much more kind of realistic to track these rather than trying to track the inputs into the models just because most of these things are being used under proprietary systems, right?
So people don't want to necessarily say how they're training their models, et cetera, right?
Maybe in some instances they do, but if it's a proprietary business, they probably don't, right?
But we can track the output.
It's a lofty goal, and it's a good goal at both ends, right? Like, really what I want is an LLM trained on open data that I can know and in theory inspect. And that is the ideal. But from a practical perspective, it's not always the case that it's easy on the input side.
And then, but I want to double click on the provenance question because I thought that
was really interesting what you raised there in that, you know, right now, like, yeah,
we're using AI systems for, you know, it's like we ask ChatGPT, like, okay, what should
I order for dinner tonight?
Or like, make me a cat photo, whatever.
Or like, we have these kind of, these like, you know, agentic reply bots on Twitter or
whatever. These things that aren't like necessarily of, you know,
really great consequence.
But in the future, as you were saying, you know,
if these things are gonna be doing transactions,
if there's going to be major decisions that are being made based up by these
agents, there needs to be some way of really like tracking these outputs with,
with consistency, with verifiability, just for obviously future learning, but also like having,
you know, if something goes wrong, we actually can kind of look back and know like what went
wrong there, right? Maybe I'll punt it over to Vuk, but I'd love your thoughts on this.
Love your reaction to like, I guess, anything that stood out from Carson's remarks there,
but also, you know, how are you guys thinking about this question of provenance and why this is so important?
So basically, Filecoin by default provides a way to uniquely identify a particular piece
of content.
So like basically even the proofs that are being sent on chain for a particular sector,
every 24 hours are basically saying, okay, this piece of data is actually here.
There is another piece which is basically connecting the dots between a particular data set and the set of sectors that are being onboarded to the network with a particular CID.
And yeah, ultimately we are really trying to rely a lot on all these content-addressed pieces, because what we're seeing often is that our clients are basically asking, first, to just get a sense of whether the data is actually there, but also, in some contexts, to prove to their users that they used a particular piece of data, and that that piece of data was, for example, in a particular jurisdiction, and so on.
But yeah, basically by default, the Filecoin network provides this abstraction, and we are definitely relying on it.
Although, to be honest, we are focusing a lot on just making the use cases work first, and then adding the benefits of provenance and other ones, instead of making that the main feature that we're offering on our infrastructure.
So yeah, TLDR is like,
we're focused more on trying to push it
to tens or hundreds of petabytes.
And then we'll enrich these features
that allow our customers to take more advantage of the features
that the network provides.
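The content-addressing idea Vuk leans on can be sketched simply. Real Filecoin and IPFS CIDs use multihash/multicodec encodings and chunked DAG structures, so the bare SHA-256 digest below is only a stand-in, but it shows why a content address lets a client verify exactly what a storage provider returned:

```python
# Simplified content addressing: the identifier is derived from the bytes
# themselves, so retrieval can be verified by recomputation. (A stand-in
# for real CIDs, which use multihash/multicodec and DAG chunking.)
import hashlib

def content_address(data: bytes) -> str:
    # The address is a function of the content, not of its location.
    return hashlib.sha256(data).hexdigest()

def verify_retrieval(expected_cid: str, retrieved: bytes) -> bool:
    # Anyone can recompute the address and check the provider returned
    # the committed data, bit for bit.
    return content_address(retrieved) == expected_cid

dataset = b"training shard: sensor logs, 2025-09"
cid = content_address(dataset)
assert verify_retrieval(cid, dataset)            # honest retrieval passes
assert not verify_retrieval(cid, b"tampered!")   # any alteration is detected
```

This is the primitive behind both "is the data actually there?" and "prove you used this particular piece of data": the claim is checkable by anyone holding the CID.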
Cool, cool.
And then maybe it's a good point just to mention
that we will have some time for Q&A at the end.
So if folks have questions or comments
or anything that they want to raise,
feel free to have a think about those at the moment.
And then we'll open it up maybe in like 10 to 15 minutes
or so for questions from the audience.
So folks, so have a think.
Maybe Carson, let's turn it back to you.
And I'd love to kind of loop all this
of what you're talking about back in
with what you guys are building with recall.
And you kind of gave the high level elevator pitch of recall in your intro remarks, but
it'd be great if you could maybe tell us a bit more about like, what is this sort of
agentic intelligence concept mean?
How are you guys deploying that?
And then maybe kind of looping that in with some of the other topics we've been discussing
Yeah, yeah, sure. So, I mean, I'll try to keep it focused on the AI-and-data category, but by and large, Recall is an agent arena where we test a lot of capabilities on agents, and we have a couple of ways of doing that. But the broad-strokes framework is: imagine a world in which people could decide, okay, great, a new foundation model has come out, but I want to know if it's actually good at not overusing em dashes, right? Like, we've all tried it: you try to get it to write you some copy for a tweet, and it just sticks all these dashes in there.
So I don't want that to happen ever again. And I have tests.
I have ways to test whether this model's outputs are overusing dashes. But I can't possibly write and sort of control enough of it myself.
So I want to leverage the ecosystem and communities to help me build AI systems that never overuse em dashes. And so I want to deploy this particular test, or set of tests, to the network and start running competitions against all sorts of different LLMs and agents, testing them on inputs and outputs, producing those outputs, and then evaluating whether they're overusing em dashes.
By the way, I'm using this em-dash example
because it's a bit silly, but it's easy to think about.
It's not exactly the most important test in the world,
but it's one you can wrap your head around.
So we deploy these tests,
we run them against lots of inputs and outputs.
Users and AIs evaluate those models and they determine,
yep, it wasn't overusing or no, it was overusing.
We build up a scoring system and
a ranking system that actually ranks these different systems.
It turns out that the latest model, GPT-5, is actually not that great. It kind of overuses em dashes. But you know what model doesn't overuse em dashes?
A recent coding model, which kind of makes sense, because it's optimized for coding, so it doesn't have a lot of examples of em dashes
in its training set probably.
So if you ask it to write prose,
it actually does a pretty good job because it's
trained on technical content and code and data.
So you can start to build up
this intuition over time of which models are really
good at particular things by deploying it to a network of
users and having them engage with
it and rate it and rank it. The most successful of these that we've done so far is trading P&L.
And there's a couple of really important reasons why we started there. One, Web3 people kind of
dig it, so it's like a fun example to think about. But two, one of the problems with a lot of the foundation models is that they are trained against these static benchmarks.
And the benchmarks are awesome, right?
Like we need these benchmarks to help us understand how good is this model at a particular concept or is it good at abstract math?
Is it good at multi-step reasoning?
We can craft these benchmarks that we run
against these models to try and test,
okay, this one is fractionally better than
this one at that particular skill that we're measuring.
That's great. The problem is,
these are static benchmarks.
What happens is you start to test these,
and the models end up being trained on the benchmarks themselves.
It would be silly if they didn't do this, right? A benchmark is a bunch of data. And we were
talking about how data is actually the hard problem, right? Curated data is the hard problem. And
benchmarks are like perfectly curated data. So obviously the models are going to be trained
against them. And so this is really similar to a teacher basically saying,
OK, students, I'm going to test you later,
but here are all of the answers and questions ahead of time.
Go ahead and study those and then let's see how well you do.
And so obviously the students are going to do a lot better
if they've got all of the questions and answers ahead of time.
So what we really need is we need dynamic benchmarks, benchmarks that change all the time.
And so we started with a really intuitive and simple one, which is trading P&L.
So how good are these agents at determining stops and losses or trading on an open market?
This is obviously dynamic because it's very difficult to predict market movements.
And frankly, if they could predict market movements ahead of time,
they would just quietly stop competing and go off and start making a swag ton of money.
So, you know, it's pretty easy to be certain that they're not going to be able to optimize ahead of time
for a particular trading scenario. So this is a good way to test. It's very dynamic. It's very objective in terms
of how we measure it. And so it's a great example of a dynamic benchmark that we can start to build
up and test and then score and create rankings for. And so that's what we've been putting a lot
of effort into, and that's what Recall is building up.
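The trading P&L idea can be sketched in a few lines. This is an illustrative toy, not the actual benchmark: the price path stands in for live, unpredictable market data, which is what makes the benchmark dynamic:

```python
def pnl(trades, prices):
    """Mark-to-market P&L for a list of position decisions.

    trades[i] is the position (+1 long, -1 short, 0 flat) held from
    prices[i] to prices[i + 1]; P&L is the sum of position * price change.
    """
    return sum(pos * (prices[i + 1] - prices[i]) for i, pos in enumerate(trades))

def rank_agents(decisions, prices):
    """Rank agents by realized P&L on the same (previously unseen) price path."""
    scored = {name: pnl(t, prices) for name, t in decisions.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

prices = [100.0, 101.0, 99.5, 100.5]  # realized market path, known only afterward
decisions = {
    "agent-long": [1, 1, 1],          # hypothetical agent: always long
    "agent-flat": [0, 0, 0],          # hypothetical agent: never trades
}
print(rank_agents(decisions, prices))  # [('agent-long', 0.5), ('agent-flat', 0.0)]
```

Because the price path only exists after the decisions are made, an agent can't be pre-trained on the answers the way it can with a static benchmark.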
Now, this is pretty useful, because we now have a ranking and scoring system that's fairly objective, which we can use to track how good particular agents or LLMs are at a bunch of things. Trading is one we've done, but we've also done the em dash one, and we've done how well these models deliver bad news in an empathetic way. We've done lots of different subjective and objective tests, and you start to build up this ranking.
That ranking system is also something you want to be available to other systems,
because it's useful to be able to explore: oh, if I'm building an agentic coding system,
what are the best models right now for focusing on, I don't know, Solidity?
Or if I'm building an application that helps doctors be more empathetic,
what underlying model should I leverage to help doctors come up with ways to deliver bad news?
What about a model that's good at delivering good news? And all of these rankings help these
system builders better understand which tools are good at the things that actual people care about.
Because I like to see which models are good at abstract math, but it doesn't really
help me in my day-to-day usage of these models. What really helps me in my day-to-day usage is: does this model produce textual output that doesn't sound like a robot?
Can it handle multi-step tasks for writing code, and much more practical things?
So yeah, we want to be able to build up benchmarks
that are testing very practical things,
and that's what we're doing with Recall.
And so the inputs and outputs of all of these systems
are really important sources and sinks of data.
And you can do further training on that data.
So recent research has shown that even if a model is already trained,
if you actually feed it feedback in the form of like ranking or scoring,
it can actually produce better responses based purely on its in-context data.
And so, for people who aren't familiar with in-context learning:
So basically you train a model,
and you have this large context window
within which you can feed it input tokens.
So input text, right?
Prompts, as everybody knows it.
And we also know that if you do a good job of prompting a model,
you will get a better output.
And it turns out sometimes it's very hard
to come up with a really good prompt.
But what you can do is you kind of train in that context,
in that prompt window.
And you can ask it the same question multiple times,
rank its responses, and then ask it one more time
and you'll get a better answer.
And you ask it another time,
you'll get a slightly better answer.
And so you can do a lot of this sort of training stuff in context,
and then you build up a data set (this is called test-time training)
that you can then actually leverage in real live systems,
so you don't have to wait for the foundation models to be updated.
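The ask-rank-ask-again loop described here can be sketched roughly as follows. Everything in this snippet is an assumption for illustration: `model_complete` is a stand-in for any chat-style LLM API, and the length-based ranking is a naive stand-in for human or AI feedback:

```python
import random

def refine_in_context(model_complete, question, rounds=3, samples=3):
    """Ask the same question several times, rank the answers, and feed
    the ranking back as extra context before asking again."""
    context = [{"role": "user", "content": question}]
    best = None
    for _ in range(rounds):
        answers = [model_complete(context) for _ in range(samples)]
        ranked = sorted(answers, key=len, reverse=True)  # toy ranking: longest wins
        best = ranked[0]
        feedback = "Ranked answers, best first:\n" + "\n".join(ranked)
        context.append({"role": "user",
                        "content": feedback + "\nImprove on the best answer."})
    return best

def toy_model(messages):
    # Toy stand-in for an LLM: richer context nudges out longer answers.
    return "answer " + "detail " * (len(messages) + random.randint(0, 1))

random.seed(0)
print(refine_in_context(toy_model, "Explain em dashes.", rounds=2, samples=2))
```

With a real model, the ranking signal would come from users or a judge model rather than answer length, but the loop structure, refine within the context window without retraining, is the same.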
So there's lots of ways we can actually rank, test, and capture the data these systems are producing in the real world to improve our agentic systems, like trading and writing and all of these things.
That was a bit rambly, but the take-home point is there's a lot we can learn and do from capturing the outputs of these models, scoring and ranking them, and understanding if they're actually solving the tasks we need them to solve.
I really like this mission, because you're really focused. If I'm hearing you correctly,
it seems like the focus is really on just trying to make these things more
reliable and usable and useful for average, real-world use cases, right?
And I think we've all probably experienced ChatGPT hallucinations, or
hallucinations from other AI models. Just last weekend I was going
out to dinner with my wife, and I asked ChatGPT what's a good restaurant we can go to. It suggested a place.
We went there and it was closed; the place didn't even exist anymore.
You know, you kind of have to train yourself to double-check these things, but in that moment I was being
kind of lazy. I was like, okay, it's probably correct, let's just go. And the place didn't exist.
And obviously, we're not to the point where we're outsourcing major decisions to AIs, right?
We're talking about, you know, help me with this line of thinking, help me create a prompt,
help me create the text for an article, help me create some code for this app.
It's like a sandbox we're playing around with, right?
So I really like how you're thinking about this in terms of how can you make
the outputs of this more easily trackable, and which models are better
at different things, more reliable at different use cases,
different types of prompting, et cetera.
So I think that's a really, really key thing.
So, you know, thank you for explaining all that.
Vuk, I'd love your reaction to any of that,
if there's anything that stuck out, anything you want to chime in on.
And how does that fit in at all with what you're building with Ramo?
Yeah, I mean, ultimately, we are always trying to think about it from the fundamentals.
And yeah, there are a few stages that you always want to go through.
One is the one that we mentioned, which is basically just collecting the data, like in whatever shape or form.
And then you have a few steps where you would clean this data, like you would label it in a particular way.
And then like you would input that into like either training or reinforcement for a particular model.
So, yeah, we are thinking about how do we actually use the tech that both the Filecoin ecosystem has enabled for us, but also the ZK innovations that we've seen in
the past couple of years, to allow use cases that normally would have been in large
data centers, all centralized in one particular location. How do we allow this to happen in a more decentralized context,
maybe between multiple data centers?
Initially, maybe these data centers would need to be connected in the same area,
for example, if this is Northern Virginia, the data centers there.
But how do we get to a point where you can scale all of these to multiple data centers
and allow larger models to be created, or more data to be harnessed, for actually improving these models?
Cool, cool. Now I want to touch on an earlier point that we addressed, which is kind of on the input, the training side.
And I think when I was first starting to research this whole subject a couple of years ago, researching the overlap between Filecoin and AI, LLM models, et cetera,
I was really interested in this idea of, wow, I think
Filecoin really has maybe a product-market fit here, because in the future you're
going to want to make sure that you're training your models off of very pristine data.
Like, if I'm training a model, I don't want this to just be junk off the internet.
I want to know that this is verifiable, real data that hasn't been tampered with,
that it's not just synthetic junk data or something. And Filecoin gives you the ability to basically
guarantee that, okay, this data being stored in this place is cryptographically
sealed. It's secure. It hasn't been tampered with, hasn't moved, hasn't been
altered in any way.
And so my initial thinking was, wow, that could be a very valuable thing for folks looking to train models in ways where they can basically be sure that, okay, this is going to give me the results I'm intending.
It sounds like that path, from what we were discussing with Carson earlier,
has maybe been a bit more difficult; it sounds better in theory
than it works in practice, perhaps. And given the dominance of the big
players in this space, and kind of the big silos and whatnot, it feels like this is probably
not a utopia that's going to come to pass anytime soon.
But I do feel like at some point, in some way, shape, or form,
there's a future where there are going to be clients out there, folks out there,
who will want this level of pristine data, this kind of guarantee that the
data is pristine. And I'm just wondering: what would it take for us to get to a point where
there's a premium on data that's stored on Filecoin,
because it's been sealed, we know it's true, we know it's verifiable, et cetera?
I'd love to maybe pose that hypothetical.
Maybe I'll punt it to Carson, and then Vuk, if you want to chime in as well.
Well, I don't know.
I don't know when we will be able to say like, yep, cool.
We did it.
We're there now uh i i have a feeling the answer is
somewhat political and fairly far and it's going to be a pretty contentious debate and discussion
you know it seems clear that we're going to have like sovereign AI and we're going
to have like open source AI and we're going to have closed source hyper optimized AI and we're
going to have these systems and there's going to be fights and arguments between groups on
how it needs to be done. You know, and actually Filecoin stands to benefit from either side of these discussions.
If it can provide verifiable storage of the underlying data within specific geographic
or socioeconomic regions, that's going to be useful for sovereign AI.
If it can store it in open, verifiable ways, that's
going to be useful for open-source AI. And if corporate entities can ensure that their
data is compliant and secure and backed up in many jurisdictions or whatever, then that's
going to be useful as well.
So I think if the Filecoin ecosystem can prove that it's providing the real specific value that each approach to underlying data in the LLM or the models is trying to get to, then I think it wins.
I think it's probably not really going to win until we get to a scenario where we can co-locate compute with the data.
Because if you're doing a model training run on exabytes of data,
and you have to move that data anywhere,
that is a major cost in terms of literal infrastructure cost and money,
but also in terms of time, right? Every meter further that the data has to transfer over
the wire translates into some amount of time, because physics. So I think the
winning combination is being able to verifiably address different jurisdictional requirements for these different models, and then being able to co-locate the data with the compute.
But maybe that's my hot take for today's discussion.
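To put the data-movement cost in perspective, a quick back-of-envelope calculation; the link speed here is an illustrative assumption, not a claim about any real deployment:

```python
# Back-of-envelope: moving one exabyte of training data over a fast link.
exabyte_bits = 1e18 * 8      # 1 EB expressed in bits
link_bps = 100e9             # a dedicated 100 Gbit/s link (illustrative)
seconds = exabyte_bits / link_bps
print(f"{seconds / 86400:.0f} days")  # roughly 926 days of pure wire time
```

Even ignoring protocol overhead, that's years of transfer time, which is why co-locating compute with the data matters at this scale.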
Vuk, do you want to chime in there?
Yeah, I mean, I feel that we need to focus first on the base primitives
and scale them to the point where they become relevant in the context of Web2 and where Big Tech is today.
And then we want to basically make sure
that we build the tooling to enable a particular set
of use cases.
I do think we are still in the first phase
where we're basically trying to make it work well
for, like, 10, 100, maybe one exabyte scales. From there, amplifying the
value that we've created with tooling is an easier problem to solve.
But yeah, that's something that needs to happen now.
It's worth mentioning that there is a gap generally in the AI space right now,
which is basically everything around data prepping. We've seen acquisitions from Meta
of, like, Scale AI and a few others, which basically centralized the data-prepping side of things, and that kind of left a vacuum of particular companies that are not able to use,
of course, tools from competitors.
So yeah, I do think there is space and we are well positioned to actually attack some of those.
But there is also a lot of work to do on the infrastructural side, which is basically
making the tech work at the scale that is relevant for these use cases.
Well, maybe, yeah, we'll open it up for questions if anybody has them.
And then while we wait for folks to raise their hands,
maybe I'd just like to turn it back to both of you
if you have any final thoughts
or like other points you'd like to make
that are germane, relevant to this conversation.
I guess we've got a couple minutes left.
So maybe, Vuk, we'll turn it back to you
if there's any final thoughts you wanted to add.
Yeah, I'm super optimistic.
I feel that we are in a very rare time in history
when things are changing very fast.
It kind of reminds me of the times
when Hadoop was invented,
and a few things that Google open-sourced
back in the day, during what
I think back then
was called Big Data
or something like that.
But yeah, ultimately, these are rare moments in history when like we can actually define
like how the tech gets shaped.
So I'm super excited about that.
I think there is a big like potential for Filecoin to solve a big part of these problems.
Yeah, I would say likewise.
I think I actually recently had an internal discussion with my team,
and we were kind of saying it can be a little scary in the post-AI world,
but honestly, it's never been a better time to be a builder.
The leverage that you get right now
is just so much greater than I think ever before.
And so we're at a pretty interesting point in time
where you can have a greater impact
on the future of software and digital experiences
and all this stuff with a much smaller team
with access to the right tools.
And for sure, data is going to be the new,
you know, sort of like coordination layer for all of this.
Models are going to come and go.
But, you know, if we can help build the systems
that ensure sort of open, fair, transparent,
and persistent access to data,
that's definitely going to define the next decade of AI.
And so we're at a pretty interesting crossroads or like intersection here
where we've got like real incentive alignment mechanisms that we can leverage.
We've got like real data and real access to large volumes of data and capture tools for that data.
And then we have the same access to the same models
as just about everybody else.
So that's a pretty powerful combination
and I'm pretty excited to see what we unlock with that.
Very cool.
Well, if there's no questions from the audience,
maybe this is a good place to wrap it up.
We're about two minutes shy of the hour here.
But yeah, I just want to thank everybody for spending the hour
with us here.
It's a really interesting conversation.
I mean, definitely just scratching the surface, I think, of what the implications are.
And it's really great to hear from two folks who are really kind of building at the frontier of this whole kind of data meets AI decentralization realm here.
So we want to give a shout out to Isabella
and the Filecoin TLDR team for hosting this and arranging this.
So big thanks for that and for the invitations.
And thanks everyone for listening to us here today.
And be sure to give us a follow on X if you wouldn't mind;
that would be appreciated and helpful.
But yeah, maybe Isabella, I'll hand it back to you if you want to close us out.
Okay, thank you, Aaron.
And thank you, Carson and Vuk, for today's AI space. It was a very awesome space,
with a lot of insights shared from you guys.
That's super awesome.
And I think we can wrap up for today,
and we can look forward to the next episode
of our Filecoin Beyond Storage series
of spaces.
Thank you, everyone.
Thanks, Aaron. Bye. Thank you everyone. Thanks everyone.
Bye everyone. Thank you.