Hey everyone.
Yeah, welcome back, everyone, to another episode of DeSci Mic. This week, we are revisiting an important and recurring topic around DeSci,
the intersection of AI and DeSci, specifically how AI is essential to advancing DeSci.
We'll be hearing about the Macrocosmos AI protein folding subnet on BitTensor, which is really exciting.
We're also joined by ZeroX Mikey F from WaterBearSci, who is going to be joining us to talk about their specific approach to AI.
And we have also got a fan favorite on DeSci Mic returning.
That's Science Stanley, who works at Stanford and also Lilypad Tech. So we'll be hearing from him
too. As usual, I'm joined by my co-host Erin McGinnis, who's actually behind the DeSci Mic
account this week. Crypto Shrimp PhD is away this week. So yeah, Erin, did you want to kick it off?
Yeah, I'm super excited for this conversation. It's really leading off of some of the discussion
last week, which Michael from WaterBear was a part of and really kind of prompted deeper discussion on how AI is an essential piece to advancing DeSci and what the future of science looks like.
So to kick off this conversation, I would love to pick up where we left off last week.
And Michael, if you can kind of maybe recap some of that conversation
and then tie into just the broader topic of using AI to advance DeSci.
Yeah, I think last week I was just saying that, like,
I don't think that humans are gonna be doing research.
What I think is gonna happen is, like, AI is going to do,
like there's a couple of questions here.
One is like, what is AI able to do
and what is AI not able to do both now
and then like maybe like 50 years from now. So I say like in the
near future, AI is going to do all the stuff that traditional scientists do. So the role of humans
will be to contribute data and then all of the analysis will be done using AI. So you'll just like upload data sets
and then it will just be like,
oh yeah, beep, beep, beep.
Here's 10 questions that I've generated
that are interesting based on that data set.
And also, could you go fetch
these other hundred rows of data?
That'll help me complete my analysis.
And like initially what will happen
is like you'll be able to sort of move that AI
But I think at some point down the road,
it'll just sort of tell you what you need to know.
So that's like the near term stuff.
And then I think in the longer term,
if synthetic data works and if like,
like more advanced modeling works of physical systems.
I don't even know if you're going to need humans to collect the data anymore.
I think it'll probably be able to be done all in silico.
I don't know how close that one is, though.
That could be many, many years out, and we'll find out.
So yeah, it was just like a broad, broad topic that I think like AI is going to be doing a lot of science in the future.
And part of this is also decentralization, being able to allow some of this collaboration or interoperability with that as well. Can you
touch on that and maybe loop some different trends or involvements you might have?
Wait, say that again. I didn't quite understand. Say it again.
Yeah. So that piece of it was pretty heavily focused on AI and science.
How do some of like blockchain or decentralization approaches fit into this whole equation?
Yeah. I mean, I think that in the future, people will be rewarded based on the data they contribute.
So basically it'll work by like you upload a data set.
I don't know of any technology to do this today,
but it's certainly possible, and you're rewarded
based on how much information is in your data.
So you contribute a really good piece of data,
like it's really valuable,
then you'll be rewarded more heavily versus say everyone
contributes like shades of blue.
And then all of a sudden you contribute like red and everyone's
like, wow, red is a really different color, this is super new. Then basically what will happen is
you'll be rewarded more because your data was more useful, and that'll be used to train the AI.
And then anyone who does inference on that model will basically pay a little bit of money, and then
it goes back to the people who contributed the data. And to track
that, I think it's probably easiest to do it with the blockchain. If you were to develop a system
that tracks where data came from,
you could do it in a database or you could do it on the blockchain.
You could do it either way,
but I think the blockchain is the more natural place
to do it because then you can track where it came from
and the payment rails are already built in
and all that sort of stuff.
So that's like the sort of interaction between like AI
and the blockchain is like for incentives
and for rewarding people.
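As a rough illustration of the incentive loop described here (novel data earns a larger share of inference fees), a minimal Python sketch might look like the following. Everything in it is an assumption made for illustration: the contributor names, the idea of reducing each contribution to a single number, and the use of nearest-neighbour distance as a stand-in for "how much information is in your data". It is not a description of any existing system, and it leaves out the on-chain tracking entirely.

```python
def novelty_scores(contributions):
    """Score each contribution by its distance to the nearest other one.

    Hypothetical stand-in for "information content": a value far from
    everything already contributed scores high, a near-duplicate scores low.
    Needs at least two contributions.
    """
    scores = {}
    for name, value in contributions.items():
        others = [v for other, v in contributions.items() if other != name]
        scores[name] = min(abs(value - v) for v in others)
    return scores


def split_inference_fee(fee, scores):
    """Split one inference fee among contributors in proportion to score."""
    total = sum(scores.values())
    if total == 0:  # all contributions identical: split evenly
        return {name: fee / len(scores) for name in scores}
    return {name: fee * score / total for name, score in scores.items()}


# Three people contribute shades of blue, one contributes red
# (dominant wavelength in nanometres, made-up numbers).
contributions = {"alice": 450, "bob": 460, "carol": 470, "dana": 650}
scores = novelty_scores(contributions)
payouts = split_inference_fee(21.0, scores)
print(payouts)
```

In this toy run the red contribution receives most of the fee because it sits far from the cluster of blues. A real measure of incremental information would need to be much harder to game than a simple distance.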
Completely. Can you maybe talk a little bit about the work you're doing in sleep as maybe a tangible
example of what this could end up looking like or how people might be able to interact with
everything you're talking about? Yeah, so we're trying to develop the world's best sleep data set.
So the way we're doing that is we started this thing called WaterBear.
And initially, WaterBear started off as a sleep data set
where people upload their Oura Ring data and then sort of launch.
Then they can have the ability to enroll in different research projects. So you can say
like, oh, when I sleep east-west, I sleep better. When I sleep north-south, I sleep better. When I
sleep on an Eight Sleep bed, I sleep better or worse. When I sleep with or without a person,
I sleep better or worse. When I take this supplement, I sleep better or worse. And all
these things are just like very complicated. And we wanted to collect a large data set both of the conditions going in and then the final
results as measured by the Oura Ring data and so we're tracking all that data and
then we developed like a platform for people to basically analyze their own
data sets using, like, a combination of ChatGPT and pandas
that we built, and a little bit of UI magic as well,
that basically writes code to give you information
about statistical significance
and prompts you for the next question.
So it'll just look at your data set and be like,
oh, here's three questions that I thought of based
on this data set that you provided me. And you can look at your own data, but you can also
look at the group's data, and you can say, like, are there any outliers here? What is the R-squared
value? Is this statistically significant? And you can begin to, like, interactively do science on
your sleep data set, sort of with the help
of the AI. And what's really cool about this is that
you don't need to be a sleep expert, you don't need to have a PhD in sleep or
bio to be able to do this, because the AI gets you up to speed very quickly, and
it can both write the code to do the analysis, but also it has all that sort of background information
on what are the different interactions
between these things that cause,
all the biology behind sleep.
And as ChatGPT gets better,
I think this will improve and improve.
We want to fine-tune a specific model so that people could get more information
around sleep as it relates to them.
And all that information would be coming from this data set that we've collected around
all these different interventions and all this private data, basically.
And so we would hope people would pay for it, and then all that money would sort of go back to
the people that contributed to the data set.
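The three questions mentioned earlier in this answer (are there outliers, what is the R-squared value, is this statistically significant) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the platform is described as generating pandas code, whereas this sketch sticks to the standard library so it runs anywhere, and the function names, sample scores, and choice of a permutation test are all hypothetical rather than taken from WaterBear.

```python
import random
import statistics


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


def iqr_outliers(values):
    """Values more than 1.5 interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - 1.5 * spread or v > q3 + 1.5 * spread]


# Made-up nightly sleep scores for one hypothetical intervention.
with_supplement = [82, 85, 88, 90, 86, 84]
without_supplement = [70, 72, 68, 75, 71, 73]

p_value = permutation_p_value(with_supplement, without_supplement)
fit = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
outliers = iqr_outliers(without_supplement + [30])
print(p_value, fit, outliers)
```

A permutation test is used here because it needs no distribution tables; a t-test from a statistics library would be the more usual choice in generated pandas code.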
So then we also sort of broadened the platform, so that we're not just doing sleep now.
We're doing ones on psychedelics, we're doing ones on microplastics,
we're doing ones on nutrition and the Blueprint diet. So we're sort of using this
platform, and anyone can create a project on it as well and use these tools to mesh together AI,
blockchain, and science.
Those are super compelling examples of how the intersection of AI and DeSci can be used to advance science as well as one's own personal health.
intersection that might be either building off of some of the projects you're working on or
adjacent to while you're focused on building all of this out?
Hmm. Yeah. I mean, I think the area of data analysis
and AI is a big topic that will be interesting.
I think measuring the value of information that's used for training a model will be very interesting,
if someone figured out how to say, like, what is the incremental information that is able to be extracted from this data.
Those would be my two areas
that I think are quite interesting.
And also like robot process automation,
just like automating science collection,
automating data collection using a cloud lab
or a robot would be really cool.
Definitely, there are a lot of really exciting opportunities across this whole intersection of how we can use AI and more decentralized technologies or
more privacy and security focused approaches for advancing science and health. And Macrocosmos, I know you're taking kind of another approach to
what the future of AI and DeSci at this intersection point can look like. So I might want to start by
having you share a bit about what Macrocosmos is to begin with, and then we'll start getting into the subnet you recently launched.
Yeah, so thanks, Erin, and thanks for having us.
So I'm Oscar Zanis here at Macrocosmos.
We are an AI research lab that spun out of the OpenTensor Foundation,
who are the original builders of BitTensor.
And as of today, we're about 20 engineers, AI researchers, PhDs,
working on how we can build the world's most powerful incentive mechanisms for the provision
of intelligence. And BitTensor is a decentralized AI, decentralized compute protocol that is about four or five years old now.
It was conceived by Const and others. I guess, you know, the community to write rules and incentive mechanisms for people to create,
I guess, problems that can be solved. And Stefan will articulate this a little bit more, but
we are basically game makers for trying to create incentive systems and games for people to provision
compute and intelligence resources to solve tricky problems. And our bread and butter in this space has been
the creation of, I guess, what a lot of people see today as the very sexy elements of artificial
intelligence. So we were the original builders of the first subnet that proved you could do
decentralized inference at scale. What that looks like is a thousand different large language
models all competing to give the best response in a computationally efficient and effective manner.
We've since expanded to broader topics, in that we're currently
hosting a pre-training competition on BitTensor where the team developing
foundational pre-trained models at 7 billion beat the efficiency
of Falcon 7 billion by 50 times, in that it
took 50 times less resources
for a decentralized, incentivized, competitive team
to train a foundational model.
And last week we launched our first protein folding subnet.
And Stefan and I both have research backgrounds
And for us, this was saying,
can we create mechanisms and ways for
decentralized communities to organize compute around problems that are not
just sexy and interesting from an AI perspective but are meaningfully able to
push forward the boundaries of humanity and solve a lot of humanity's tricky
problems and we thought protein folding was a really elegant expression of how can people organize
resources to solve these problems.
And there's a lot of reasons for that.
I guess most obviously Google obviously released AlphaFold3, which is a sort of very, very
high profile AI product that is not an LLM.
sort of the first decentralized citizen science project
was arguably Folding@home.
So we see protein folding
as being a really beautiful expression of BitTensor's ability
to organize people and resources around a problem,
to help see if decentralized communities can do things that are a bit more
altruistic and a bit more interesting. And we see protein folding as the first subnet on BitTensor
that is doing something that is ultimately a little bit more altruistic in a broader sense.
But more importantly, we're really interested to see how much a decentralized community can decide how important those things are in a distributed manner.
Because in traditional sort of capitalist systems, often it's the thing that makes the most money that gets the most attention.
Whereas our thesis is in decentralized systems, a lot of people actively care about these things.
And we think
there's a lot of ability to organize resources to solve these problems. For now it's protein
folding, but in the future it might span anything from particle physics to perhaps solving
some of the challenges around climate science or climate change. I'll pause for breath there and
hand over to Crux, my co-founder, who probably has a little bit more to say about protein folding specifically.
So I guess just to zoom out a little bit and motivate a little bit more what we're here to do.
There's clearly an opportunity for sort of massive collaboration at scale through incentivized decentralized systems.
There's an enormous amount of latent compute in the world that can be harnessed and ideally
oriented towards humanitarian goals, let's say. However, one of the unique challenges that's
presented with that is if it's a truly decentralized system, and that typically means it's anonymized as
well so the participants are not necessarily identifiable, what typically comes with this as a result
is it can be quite adversarial.
What this means is because it's effectively an economic vehicle, people compete to make
the most of the token within the context of that subnet or that environment.
But this does not always lead to the results that one would like.
I often use the phrase, well, borrow the phrase, following the letter of the law, but not following
the spirit of the law. What this typically means in decentralized systems that are anonymized
is people will find ways to game the system. So imagine Folding@home.
This is great because everyone in that community wanted to participate.
People were willingly participating.
And most important, it was voluntary.
This more or less aligns everyone's effort vector towards sort of a common good scenario.
What's different here is when you add the economic layer to this as sort of a blockchain-based system,
what can quite quickly happen
is sort of a race to the bottom, where people find ways to subvert the rules or perform
the calculations differently or sort of tinker with the outputs.
And the result of this can be a very rapid degradation of the quality of the compute
that's being carried out.
And so what Macrocosmos is here to do, what we're doing with the protein folding subnet
and what we're doing at large
is trying to sort of really click into place.
How does one build an adversarially robust algorithm
that allows people to cooperate at scale,
even when, again, it's anonymized, it's highly adversarial.
And that is a very non-trivial problem.
And in fact, it's often case specific.
So solving this in the context of
evaluating the quality of natural language, which is inherently quite fuzzy and subjective,
presents one challenge that we've been working on for over a year. Whereas on the other hand,
protein folding actually maps quite gracefully onto a decentralized system because it's very objective.
A protein that has been folded in a more biologically appropriate way has, let's say, thermodynamic properties like its energy and so on and so forth.
And these things are sort of, it's impossible to game that system.
Nature, you cannot game it.
And this is a fantastic, I guess, match for a decentralized system.
People are competing to provide basically the most protein folding molecular dynamics simulation steps per second.
And that's kind of the axis that everyone's racing towards.
And you better believe the race to the bottom is going to happen anyway.
But why not make it something which is focused towards basically just cranking out as
many biologically interesting proteins, rather than once again kind of going off the rails and people
are instead competing along the wrong axes? I guess the last point I'll make on this as
well, and something that is really remarkable about Folding@home, which of course inspired
us to actually take on this current endeavor, is it has often been said that Bitcoin itself
is the largest supercomputer in the world.
And this is kind of a really great soundbite to emphasize how much potential there is for
decentralization and for the blockchain to harness all of that. Similarly, Folding@home,
not many people know this, but with sort of the beginning of COVID, as the world collectively
focused a lot more on trying to contribute in any way we could towards the acceleration of
treatments and vaccines and so on and so forth, Folding@home adoption actually rose so much
at the beginning of 2020 that it became the world's first exaflop compute system.
So this goes to show that an enormous amount of progress was made in molecular biology research because Folding@home was an appropriate vehicle for people's voluntary efforts to be aligned towards humanitarian goals.
And we're trying to replicate that. We know that the opportunity is large
and with well-crafted systems,
we believe that the results will speak for themselves.
If people are listening in right now
and they're excited about what you're building
and maybe an opportunity to contribute towards this more
collaborative and decentralized approach to exploring other types of ways to fold proteins.
How might they be able to get involved? Is the subnet usable for them?
What are different opportunities for people?
Yeah, great question, Erin.
So really where we're getting to right now
is we've sort of released this a week ago.
We're stabilizing the network,
which is to say we're sort of fleshing out
any of those early bugs that can happen
when you take a system like this into production.
And trust me, there's plenty. What we've got our sights firmly set on here is inviting
researchers to come in for fully funded grants, a completely free research experience from their
point of view, because it's kind of a mutually beneficial symbiosis here. We want to demonstrate
that BitTensor can bring real utility to researchers, especially people that want to carry out high
caliber, high impact research. And for whatever reason, those researchers don't always necessarily
get first choice when it comes to HPCs and clusters and traditional avenues for them to
access compute at scale. What we want is to invite them in to work with us so that we can show that
BitTensor can also address these really imperative problems.
And of course, they benefit from unlimited access to compute.
They can basically steer the tank, for lack of a better phrase.
So they have this massive computational system that they can use on demand, oriented towards the specific parameters for their research.
And that's our dream here.
We are accepting applications for people that want to start using the subnet as a research
And yeah, I guess the only thing I would add there as well is protein folding is our first
excursion into the space.
But as Will mentioned, we're both very passionate about research and we're trying to think about
how we can come up with a sort of more general set of principles
to translate traditional research problems into ones that can become adversarially resistant
distributed problems, benefiting from the massive parallelization opportunity, but also making sure
that they can't be subverted and they're basically exploit resistant. So it's going to be moving
really fast. Protein folding is going to be
our first endeavor, but we hope not the last.
Amazing. That's super exciting. I know in some of our conversations, there's particular passion
amongst the team on physics. Do you see, I know there's some mention of like particle physics exploration. Are there
any other kind of hopes within the DeSci ecosystem at large or other visions kind of
along those lines? Absolutely. Well, what we've been trying to identify here is what are the biggest lever arms that we can provide?
We can solve very specific problems like protein folding, but there's certain fundamental operations, let's say, that are much more powerful because they're so general and they're so agnostic to the specific context of certain scientific problems.
And I believe one example of this is matrix diagonalization, right?
This is kind of like the spine which holds up the body of so many different disciplines.
In fact, it goes from modern machine learning with singular value decomposition all the way
Of course, a plethora of physics problems all depend on diagonalizing huge wave function
matrices and so on and so forth. So we think that strategically, we actually
want to double down on the most primitive problems that
have the most sort of access opportunity,
and then translating that into a specific use case,
whether it's particle physics, condensed matter physics.
That's simply a matter of switching out
the sort of the math back end, so to speak,
and having access to our subnets to power that.
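To make the "matrix diagonalization" primitive mentioned above concrete, here is a small Python sketch using the classical Jacobi rotation method for real symmetric matrices. It is an illustration only, not anything Macrocosmos has described building: the function name is hypothetical, the method is one textbook choice among many, and the matrices are tiny compared with the wave function matrices referred to above. It also shows the link to singular value decomposition: the singular values of a matrix M are the square roots of the eigenvalues of MᵀM.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        # Size of everything off the diagonal; stop once it is negligible.
        off = math.sqrt(sum(a[i][j] ** 2
                            for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so the new a[p][q] becomes zero.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q (A times J)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q (J transposed times A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# A symmetric 3x3 matrix whose eigenvalues are 1, 2 and 11.
eigs = jacobi_eigenvalues([[2.0, 0.0, 0.0],
                           [0.0, 3.0, 4.0],
                           [0.0, 4.0, 9.0]])

# Singular values of M = [[3, 0], [4, 5]] from the eigenvalues of M^T M.
mtm = [[25.0, 20.0],
       [20.0, 25.0]]
singular_values = [math.sqrt(v) for v in jacobi_eigenvalues(mtm)]
print(eigs, singular_values)
```

Each rotation touches only two rows and two columns, which is one reason methods in this family are often discussed for parallel hardware.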
That's super exciting. I'm really eager to see how as more researchers engage with what
you've built out and just more scientists across the board engage with some of these
different subnets and networks and compute capabilities, what might be possible as we
keep advancing science overall. And Stanley, I know you're also involved in the AI compute space with some of your work at Lilypad. Can you share with us how Lilypad is
approaching DeSci and maybe looking at it from another angle, such as discovering different drugs?
Hey, yeah, absolutely. And also, good morning. So happy to be here. Wanted to also
just shout out WaterBear for such an incredible project. As a person who is drinking coffee as
quickly as possible this morning because he struggles with sleep hygiene sometimes.
I think it's just hard to overstate the importance that sleep plays in health.
So like, yeah, doing God's work over there.
Yeah, we like you as well.
Oh, well, listen, mutual love fest and that there'll be more love as I drink coffee too.
But a hundred percent,
let's talk some proteins. And, you know, all of these kind of infrastructure questions of
storing, accessing, scheduling compute via the blockchain are so interesting.
But at Lilypad, I'm kind of focused on the science, and it has been so fun.
And it kind of actually is an interesting configuration that came as a result of my kind of serial tendency to come into Twitter spaces and complain that researchers don't have access to compute.
The good folks at Lilypad reached out, and they said, what could we be doing with this compute?
And man, is it so cool to hear about other projects in the space and folding.
I'm a person who came to kind of molecular engineering as a passion through Folding@home. Me and my friends would go up on Sand Hill Road and dumpster dive for
computers that we could Frankenstein into some truly ridiculous clusters in the chase for points.
But yeah, then kind of maybe some of the specific stuff that I'm looking at would be,
you know, maybe the process that's directly downstream from folding. I'm currently doing work to set up docking as a system that could run on decentralized infrastructure. And then
docking is kind of interesting. That's where, you know, you fold proteins into these shapes, but
you care about the shapes because the shapes determine how they interact with other objects,
how they dock with other objects.
So the docking simulation kind of starts maybe once you've folded,
and then you're sort of simulating the kind of particulars
of how these kind of lock and key configurations happen
between a protein and other types of molecules.
So yeah, that would be like kind of the main thing
I'm up to at the moment at Lilypad.
I think my internet is a tad bit spotty. So Stanley, definitely please keep elaborating on
some of kind of even more downstream from this, but it looks like when it might get to the stage
of interacting with people and some of the other involvements you have,
especially looking at different rare diseases. Oh, yeah, totally. And man, I have so much more
to say about proteins and molecular engineering. And hopefully, you know, we'll be able to kick
that ball back and forth a little bit. Because I have to say, on the topic of sleep hygiene, since AlphaFold3
came out, it's been a pretty epic moment. I really think we're living through history.
So, you know, I think the race to understand what we can do with a tool like that, but also,
you know, I think there's a beautiful decentralized science story playing out where a number of
groups are kind of rushing to reproduce AlphaFold3 in the open source.
So yeah, maybe we can loop back around to that.
But yeah, 100% got to shout out, you know, something we're doing on a different side of DeSci AI,
which is the Stanford Rare Disease AI Hackathon.
We'll try to post a link to the site for that. But high level, you know, we're
just hoping to create high quality models and validation sets for, you know, the medical and
clinical practice around rare genetic diseases. So we're doing this through this really cool
collaborative intelligence process, aka a hackathon. We have, you know, a couple dozen teams from all over the
world, each working on different specific problems, training different specific models.
And then we're having those models interacted with and validated by a network of experts in
rare genetic disease. And this is just really interesting because in some ways, when you think of like all the different knowledge communities that exist in the world, I think rare disease communities have some of the deepest wisdom.
You know, you have these people who, you know, because they've had to become experts on their condition, but also incredibly decentralized,
fragmented knowledge. And, you know, I think it's always an acknowledged challenge that it's hard to
connect with the right communities of support when you do have a rare illness. So yeah, it's
really interesting to kind of look at the process of training an intelligence as, you know,
something that can happen collaboratively
that can kind of mobilize these communities and that can, in a really powerful way, hopefully
distill some of the wisdom that's present and then put it to work for patients. So yeah, that's
been really exciting. And yeah, in about a month, we'll be doing a kind of demo day where
teams will be presenting models and we'll be doing some kind of collaborative validation with our medical mentors.
Amazing. That's super exciting. And I know some of the work you're doing here under like the
umbrella of research to the people is in collaboration with Stanford. Are there other either larger academic or university institutions or private sector as well who are leaning into either leveraging AI heavily in some of these investigations or a more decentralized approach to solving some of the problems that they're really passionate about?
And I think we're seeing a lot Genes that kind of acts as a research
intermediary for a lot of institutions that do genetic medicine. And yeah, it was just kind of
so cool because the contributors were very high level machine learning engineers. The projects
were very high quality and they were being done with data that, you know, had never
really been analyzed before at all. And so it was really something effective and beautiful that
happened and kind of specifically because you put, you know, just the right problem in front
of the right people. So yeah, I am definitely seeing more of that happening, but I think we need even more.
And I have to say, for the Macrocosmos folding net, man, I'm going to be keeping that in mind.
Because resources like that and access to engineering talent like that,
I think there's a lot of people working on little niche areas of science that don't even know it's available. And so I really also would love to shout out the organizers of this space
for just having the conversation because, you know, I think the more we're talking about
what's increasingly possible in silico and, you know, how many engineers and builders are out
there who really like want to pitch in for good causes. Yeah, I think that we're going to connect the dots, get a really beautiful picture drawn.
I'm not sure if you'll be able to dumpster dive the hardware you need to compete on our subnet,
but we invite you to come join us anytime.
Like things that, you know, we saw happening in microcosm long ago are
happening at a worldwide scale in macrocosm, you could say. Ah, listen, you
caught me, that's just where I was going, sir. And I have to say, 100%, like I
said, want to get some, hopefully, proteins folding over there. And what we found
super interesting is the pace of the community. We launched an
initial CPU miner when we initially launched the subnet. There's already guys who've built GPU
miners that are 10 times faster than the base miner. And we would love to see some of those
open source AlphaFold3s you were talking about, Stanley, getting into play on the subnet. So super exciting.
And, you know, maybe one way or another we can connect after this.
And if nothing else, you guys, my original training was in HPC for particle physics. So it sounds like we might have some stuff to nerd out on.
But, yeah, you know, diffusion-based systems like AlphaFold3, like learners
I think they're just going to get more and more powerful, obviously effective
I was excited before when someone mentioned kind of like engineering of thermostability
and that's been something I've been working on a lot.
And, and I think that like, you know, when you think about the total space of
all possible proteins, which is, you know, something we can think about mathematically
to some degree, we've only really scratched the surface of what proteins can exist. And so I think
that even beyond medicine, like when we really have control and powerful tools for like precision
engineering of proteins and enzymes, I think it's going to rearrange the whole economy.
It's very exciting stuff.
But it's going to take a lot of compute.
All of these different spaces are making really exciting progress.
Were there any other additional points
that you wanted to hop back to, whether related to Lilypad or some of the conversation on protein
folding? Stanley? No, I think I kind of covered the main stuff. You know, one small nerdy
thing that I'm really excited about is actually protein language models. So I'm sort of just
finishing up a small paper about using environmental data sets to fine-tune protein
language models. But yeah, to me, tools like AlphaFold, tools like, you know,
DiffDock and docking simulation, and then protein language models, these are just tools that are
going to add up to some real magic in the short term. If I may, you know, there was a tidbit from
earlier in the conversation that I would like to circle back on.
You know, it's been mentioned that AI, whether it's
large language model based or some other approach, is increasingly better and competitive with humans at doing repeatable, rote tasks, like,
for example, data entry
or, you know, this type of thing.
And I think it's also been demonstrated that AI is also capable of
inference, you know, generating hypotheses and testing hypotheses and so forth, right?
So, I mean, how does this change the relationship of human researchers and scientists to the endeavor of science, right?
Can we believe that at some point, whether it's soon or in the future, AI will be able to steer scientific discovery in terms of what are the important goals to pursue?
Or will that be sort of like a uniquely human role for the foreseeable future?
I'd happily take a first pass at this one.
I would personally delineate it into sort of
different time horizons so that we can speak with perhaps a little bit more confidence.
What I think has already been demonstrated with some success is sort of embedding physics priors into AI models. So sort of finding a way to get a model that respects things like the conservation of energy, momentum and time and so on and so forth.
And the value of that is they are much more effective at researching interesting scenarios and they become much more attuned to what might be novel and interesting.
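As a rough illustration of what embedding a physics prior can look like, here is a minimal Python sketch that adds an energy-conservation penalty to an ordinary data-fitting loss. Everything in it is an assumption made for illustration and does not come from the conversation: the unit harmonic oscillator, the function names `energy`, `conservation_penalty` and `physics_informed_loss`, and the weight `lam`. It shows one common soft-constraint pattern, not the speaker's actual method.

```python
import math


def energy(x, v, k=1.0, m=1.0):
    """Total energy (kinetic plus potential) of a harmonic oscillator state."""
    return 0.5 * m * v * v + 0.5 * k * x * x


def data_loss(pred, target):
    """Mean squared error between predicted and observed (x, v) states."""
    return sum((px - tx) ** 2 + (pv - tv) ** 2
               for (px, pv), (tx, tv) in zip(pred, target)) / len(pred)


def conservation_penalty(pred):
    """Mean squared drift of energy relative to the first predicted state."""
    e0 = energy(*pred[0])
    return sum((energy(x, v) - e0) ** 2 for x, v in pred) / len(pred)


def physics_informed_loss(pred, target, lam=1.0):
    """Data term plus a weighted penalty for violating energy conservation."""
    return data_loss(pred, target) + lam * conservation_penalty(pred)


# Reference trajectory x = cos(t), v = -sin(t), which conserves energy.
times = [0.1 * i for i in range(50)]
exact = [(math.cos(t), -math.sin(t)) for t in times]

# Hypothetical model output whose amplitude slowly grows, so energy drifts.
drifting = [((1 + 0.01 * i) * x, (1 + 0.01 * i) * v)
            for i, (x, v) in enumerate(exact)]

print(conservation_penalty(exact))
print(conservation_penalty(drifting))
print(physics_informed_loss(drifting, exact, lam=1.0))
```

A model trained against a loss like this is nudged away from outputs that break the conservation law even when they fit the data, which is the sense in which the prior narrows the search.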
What you're basically wanting to do is create a novelty detection system, which is what scientific discovery is effectively doing. It's just a massive novelty search in a way. Longer term, it's much harder to know how autonomous those systems
would be, in my opinion. I know that's not the most fun and quotable response, but that's the
academic honesty in me not really wanting to go out on a limb there. I think that the systems,
first of all, need to understand the sort of constraints that reality imposes upon their experiments and the kind of reasoning that they're able to perform.
And I think that we've had promising results so far.
I think it's only going to get better as we are able to embed that more and more deeply.
And look, I'll be watching with bated breath to see how far this gets.
But think of it more like a coding copilot that's going to become more and more competent. And it's a continuum. I don't see
this as zero or one. It's going to be a productivity boost in the interim, and maybe longer term it's
going to be minimal direction. Something like a Jarvis from Iron Man that's kind of like your
helpful assistant that's also an absolute badass researcher, but doesn't necessarily need to become entirely autonomous
to bring that 95% of value.
So that's my honest take on that.
Anyone else have any other additions to that question?
Awesome. Well, we have some time now for any questions from the audience as well. So, if audience members have questions, feel free to request the mic
to dive deeper into this conversation on the intersection between AI and decentralization
to advance science and DeSci. Or you can leave a comment on this space and we can get to that and
kind of bring that into the voice conversation as well.
Given that, speakers, anyone up here, if you'd like to reiterate a point as well,
there's opportunity for us to dive into that right now too.
In the meantime, we have one person who's come up, but Michael, go. Oh, I was just going to say,
like, I wonder at what point it will just be able to calculate the need for data. Like,
will we ever get to a point where data is completely not needed because we're
able to calculate everything from, like, first principles? That would be like a
And like, if synthetic data is able to actually work,
then there'd be sort of no limit on how fast science could go.
Because right now science is still sort of bottlenecked by data capture.
But if synthetic data is going to work, like then there's really no sort of limit
on what is possible with science.
But I think that's still a big question
and given a different time horizon as well.
I see Sterling's got his hand up.
I'd like to give him a chance first, maybe.
I have a novel situation right now where I'm doing an AI project through a DeSci Catalyst Molecule model.
captured from patients who have Alzheimer's and dementia. And then that data can be analyzed by
an AI model. And then that, we're hoping, can be kind of licensed out
to be used in further treatment down the line to kind of advance that area.
I'm curious, does anybody... Where do you kind of delineate the line between, so you have data, right? So we're using like, let's just say, WAV files or OGG files. That's the data. Then there's AI code, which is basically Python right now.
And then there's a model.
Where do you guys kind of see the delineation?
Where is it DeSci, and where is it not?
Is there a good framing to look at this
in terms of from people who are a little bit further on
in using AI in these kinds of contexts?
This would be my first time incorporating AI
But yeah, does anybody have any thoughts on how to kind of talk about it?
Where is it DeSci? Where is it not?
Or any novel kind of insights into kind of how to navigate that conversation around utilizing AI in this kind of,
is like a real example to the Catalyst model.
If anybody has any thoughts, I'd love to hear any feedback or something like that.
If you can, like, separate
what you're trying to do into chunks that you can give people.
I don't know if we've ever talked.
I'd say if you can give people chunks of tasks,
so maybe it's some analysis
or some sort of data collection
or some sort of finding related work,
that would be a chunk.
That would be something that I would use to distinguish
something as a DeSci project:
if it's not just one person doing everything,
but you're sort of rewarding people with token rewards
and sort of changing the incentives around.
Yeah, good to interact with you as well.
That's how we've interacted previously.
Cool, yeah. Yeah. So yeah, I think
that's a good way to look at it for me. And that's even actually in the research contract
that was signed when the minting happened was that there's a delineation between,
I think I listed it as five potential pieces of the equation, which is there's the data, there's the model, there's sort of the
intangible intellectual property about how the data is kind of collected and stored even.
You know, there's like a question of like the methodology of how it's going to be encrypted
and abstracted so that it's protecting patient data. There's also a piece of that too.
So yeah, I think I like that.
That's probably the word I'm looking for
is kind of the chunking of different components,
all pointed to by that one singular token.
So I like that kind of thing, that framing.
I guess to answer an earlier question about synthetic data, I believe this can be a natural
sort of fit with what I was saying about baking in the laws of physics.
There's multiple ways to do that.
You can enforce it at the level of an architecture and, for example, a convolutional neural network
is a classic example of something that embeds sort of spatial invariance into a model's understanding of the world.
And there are ways that you can demonstrably do the same with perhaps deeper physics principles or scientific principles.
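The architectural route mentioned above can be illustrated with a toy check in plain Python. Strictly speaking, a convolution layer is shift-equivariant (shifting the input shifts the output by the same amount), and invariance appears once the output is pooled. The sketch below is an assumption-laden illustration, not anyone's production code: the helper names `circular_conv` and `shift`, the wrap-around boundary, and the sample signal and kernel are all hypothetical.

```python
def circular_conv(signal, kernel):
    """1D cross-correlation with wrap-around boundaries (as used in CNN layers)."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
            for i in range(n)]


def shift(signal, k):
    """Cyclically shift a list k positions to the right."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]


signal = [0, 0, 3, 7, 2, 0, 0, 1]
kernel = [1, -2, 1]  # a simple second-difference filter

# Equivariance: convolving a shifted input equals shifting the convolved output.
conv_then_shift = shift(circular_conv(signal, kernel), 3)
shift_then_conv = circular_conv(shift(signal, 3), kernel)
print(conv_then_shift == shift_then_conv)

# Invariance after global max pooling: the pooled response ignores the shift.
print(max(circular_conv(signal, kernel)) == max(shift_then_conv))
```

The property holds because the same kernel weights are reused at every position, so the structure of the layer, rather than the training data, is what carries the prior.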
But another way you can do it as well is you can just give it loads and loads of samples of data, synthetic data, experimental data, doesn't really matter.
And of course, eventually it will condense that representation somewhere within the model's
weights or whatever the architecture you're using is. And in fact, this is something I did for
several years before working with Will on sort of decentralized intelligence. I was
basically building AI models for state-of-the-art semiconductor manufacturing.
And what we found is, for these atomic-level precise manufacturing processes,
where you're basically depositing a layer of atoms at a time onto a crystalline
structure to be used as a III-V wafer, to use a little bit of the technical jargon,
we found that we could actually just show it lots and lots and lots of examples
of various thermodynamic phenomena and a machine learning model would learn.
So I think that synthetic data is a lot more convenient
and a lot less invasive, and frankly, it's a lot easier.
It's a lot more passive than designing bespoke architectures
that capture the particular laws
or whatever it is that you need.
So I'm pro-synthetic data, let it be known.
Yeah, I mean, I think it works up until a point,
and we're not at that point yet to know where it sort of cuts off. At some point, you have to...
a cesium atom or something.
Something super, super...
Like, what is the basic build?
I don't know that we have the knowledge to figure out, like,
what is the basic building block enough to build up everything else.
But, I mean, it is possible.
But this is sort of an engineering...
A very, very, very difficult engineering question now.
I think you'll probably slowly ascend the terms in the Taylor series expansion.
That's what I think your synthetic data is going to do. You're going to slowly get some model of
the first term, and then the second, and then the third. But I mean, sure, it's only as rich as the
data that you teach it with. And that's under the assumption that the model is sort of infinitely
responsive to the data it's given, which is not always true either. There's always biases in there.
But look, I'm very pragmatic with these things. I'm not a theorist, if you can't tell. And so
for that reason, I just sort of need to specify how accurate my model needs to be to be useful.
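The idea of fixing a target accuracy first and then keeping only as many series terms as that accuracy requires can be sketched in a few lines of Python. This is a toy stand-in, not the speaker's workflow: the exponential function is used only because its Taylor series is simple, and the names `taylor_exp` and `terms_needed`, along with the tolerances, are assumptions for illustration.

```python
import math


def taylor_exp(x, n_terms):
    """Sum of the first n_terms of the Taylor series of exp(x) about 0."""
    total, term = 0.0, 1.0
    for n in range(n_terms):
        total += term
        term *= x / (n + 1)  # next term: x**(n + 1) / (n + 1)!
    return total


def terms_needed(x, tolerance, max_terms=100):
    """Smallest number of terms whose absolute error is within tolerance."""
    target = math.exp(x)
    for n_terms in range(1, max_terms + 1):
        if abs(taylor_exp(x, n_terms) - target) <= tolerance:
            return n_terms
    raise ValueError("tolerance not reached within max_terms")


# A looser accuracy requirement is met with fewer terms than a tighter one.
loose = terms_needed(0.5, 1e-2)
tight = terms_needed(0.5, 1e-6)
print(loose, tight)
```

The pragmatic point carries over directly: the usefulness threshold decides how far up the expansion the model has to climb, and anything beyond that is effort with no practical return.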
And I'm quite happy to just gather that many terms in perturbation theory and then get on
with my day job.
Guys, man, really interesting stuff. And I just love the point too that
there really are quite a few layers, from the data to the model to the processes and operations around,
you know, getting and working with the data. And I always myself like to remember that
the DeSci movement is kind of a movement defined or pursued relative to what exists currently.
I mean, there always have been decentralized projects in science to different degrees,
and decentralization can happen on a spectrum. And so I think that for
me the heart of the movement is a recognition that science has gotten to a place where it
errs on the side of centralization too often, and that, yeah, the core heart of DeSci is sort of like
being a counterweight to kind of help us, you know, find the right configuration for the right problem. And Sterling, I do just love that you're working on kind of cognitive health, because, like,
A, I think that's such an important and under addressed area, particularly in terms of patient
impact. But also, I think some of the most interesting decentralized data work is being
done in cognitive science. I recently have been doing
a little bit of multimodal training with the Human Connectome data set. And the Human Connectome
data set is something you might be able to say more about than me, but it's sort of
translationally integrated brain scans with the kind of cognitive and behavioral inventories for
a number of patients over a time series.
And so you can actually kind of start to connect the dots, or the pixels, so to speak,
between what's happening physically in the brain and what's happening cognitively and behaviorally
in the patient. And again, just sort of no silver bullet, no single answer for, like, what is DeSci
doing or what should it do.
But kind of an interesting example, right, that the Human Connectome Project does have labs from all over the world working together to pool this data.
But I would maybe mention that they're all doing it with a very rigid schema and, you know, with the very sort of like similar cognitive and behavioral metrics.
And so, you know, even in a decentralized science project, like there's maybe some room to open it up to researchers to say like, hey, what kind of data would be valuable for you guys?
What kind of metrics would be most valuable to integrate?
But anyway, that's just my two cents. And high level, though, just very grateful for the work you're doing, Sterling.
And, you know, I think it's remarkable, too, how much of our cognition is carried in our voice.
I think maybe you could even tell us some more about this, but I think some of the most reliable early warning indicators for dementia are ones you can hear in the voice, right?
My computer's acting up. Yeah. So yeah, voice is like a really good biomarker for a lot of the development of neurocognitive disorders. And it's actually surprising that it's really not even
measured in studies.
And I think the reason is because it's been too subjective.
There are doctors, really good doctors can listen to a patient on the phone and immediately determine there's something, there's early warning signs for dementia.
But it's been impossible to standardize that until now AI has gotten so good that you can actually develop a
model that can learn how a doctor is able to hear the problems. And so that's what our objective is
with this particular study, is to have authenticated dementia patients speaking
into a standard recording device, probably an iPhone,
and then really running hardcore AI models on that and, you know,
really developing that into what could become a really good biomarker
that's used for years to come. So that's the hope with that study.
Oh, man, I just love it. And then you imagine with these new multimodal models that are coming out,
like if you had, you know, features from audio, from these behavioral and clinical inventories,
and then also the patient's brain scans. And then again, you kind of go like, oh, well,
this is where DeSci could be
doing really interesting stuff, right? Because if, in an imagined future, every experimental
protocol and methodology, including the Human Connectome Project, is on chain, then it's like,
you know, oh, pull requests to add an audio data layer to the contribution spec. Oh, anyway, man,
that's just so exciting. And I hope you'll kind of,
you know, share information about that work as you go, because I think that could be incredibly
high impact. Yeah, I want to kind of keep my project as much of an open book as possible.
So I do try to share as many of the milestones as we hit as we go, because I'm learning this as much as every one of you are as well.
And I think I don't want to be the only person who does something novel.
I think academia is ripe for disruption, big time.
So I'm here with all of you. So, yeah.
Completely.
I'm a big fan of Sterling.
One quick thing I'll say is it's funny because when I was a kid, my mom stole me away from school to go chase whales in Canada to go listen to them with a hydrophone.
So she was obsessed with trying to understand how whales communicate. And I was like, why?
I was like, I'd rather be playing PlayStation, Twisted Metal with my friends. And now as I'm
older, I'm like, oh my God, like I got, like what a gift that I got to go do that. And now
I get to use that in my work today. It's pretty, it feels right for me to do this kind of.
It's so funny you say that.
And I know we're just hitting the time,
but over at New Atlantis Labs,
just started working with a really incredible
machine learning engineer,
who is actually where I know most of the stuff
about audio diagnosis of cognitive function from.
And then, yeah, we've recently been applying some of the same tools that he used to featurize audio data on whale song.
And I did some work maybe three or four years ago building transformers around crow vocalizations.
And we had some really good results. And I think that the whale language
problem is several orders of magnitude more complicated, because even just understanding
the features is hard, you know, because they're so acoustically precise. Like, these
are creatures that can kind of sense the exact dimensions
of a Coke bottle from 300 yards with sonar. The sort of intricacy of their voices is
mathematically just hard to even wrap your head around from a spectrographic perspective,
like kind of just looking at the graph of the audio wave. But yeah, I actually do have a feeling
that sort of within the next couple years,
we actually will get some serious advancement in understanding whale language. It's very exciting.
Amazing. So many exciting advances across all of science and all of the incredible work everyone here is doing, and within the broader
DeSci, scientific AI, and compute landscapes as well. Wanted to say thank you so much to the
Macrocosmos team for joining in on this space today and sharing a bit about the protein folding
subnet they recently launched on BitTensor, to Michael and
the work he's doing at WaterBear specifically on sleep data and making that more accessible
and understandable to the individual person, as well as allowing understanding at a more
collective level. And then Stanley, of course, with all of the work
you're doing across rare diseases, the hackathon, from a compute perspective at LilyPad,
and just other involvements across protein space as well. So thank you to everyone for tuning in.
Sterling, we would love to dive into some of the work you're doing in future spaces too.
Personally, I would love to have one on neuro-related things and the brain.
So I think that's definitely in store.
With that, we have a space every Wednesday at 4 p.m. UTC.
So looking forward to seeing everyone back next week and invite your friends.
If you have a topic you would like to be a theme for a week or you know someone else who would be quite interesting, please have them reach out as well.
Erin McGinnis, Merrick, or the DeSci Mic account are all great ways to get connected.
Thanks for joining us, everyone.