Mapping of CSRs in Linux

Benjamin Herrenschmidt

Jun 14, 2020, 12:23:09 AM
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley
Hi folks !

I would like to revive the discussion and hopefully get to a consensus
about how CSRs are mapped/used in Linux.

I've looked at what's in the current Linux ports for LiteX here:

https://github.com/litex-hub/linux/commits/litex-vexriscv-rebase

And I've noticed a few things, please correct my understanding if I got
something wrong :-)

- The CSR accessors litex_{get,set}_reg are in include/linux/litex.h
added by "LiteX: add SoC controller". However, there are drivers
*before* that one in the tree that use them (ie. non-bisectable
series). Some like UART or LiteEth don't even use the accessors.

- LiteEth is broken in that it mixes up the LiteEth MMIO region with
its CSRs and so pokes at the wrong register to reset the PHY.

- Generally speaking, the CSRs are directly accessed assuming a 32-bit
wide direct memory map. More specifically what I mean here is that each
individual device will find its CSRs as one of its memory resources in
the device-tree and poke at it via the accessors. While I (very much)
like this (:-) it goes against the will of some to support the same
driver on top of different busses such as USB etc... Do we really care
? If yes, see below.

- I don't understand the bit manipulation/size business done by
litex_{get,set}_reg. Care to explain ? It feels ... wrong. But I might
be missing something. Generally speaking, the two main reasons for
having the accessors at all that I can see are 1) native endian and 2)
those bit manipulations. Is there a reason why we couldn't get rid of
both and just have the drivers directly access the CSRs using standard
accessors in a fixed (LE ?) mode ? Otherwise, I would prefer mimicking
what regmap-mmio.c does and just have an ifdef LITTLE_ENDIAN/BIG_ENDIAN
that picks the appropriate "be" vs "le" accessor rather than re-invent it
with hand-sprinkled barriers.
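
As a rough illustration of the compile-time selection mentioned in the last point, here is a user-space sketch. All names (load_le32, load_be32, csr_read32_native) are hypothetical, and plain byte loads stand in for the kernel's MMIO accessors; it only shows the shape of "pick le or be once at build time", not the actual regmap-mmio code.

```c
#include <stdint.h>

/* Decode four bytes as a little-endian 32-bit value. */
static inline uint32_t load_le32(const volatile uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode four bytes as a big-endian 32-bit value. */
static inline uint32_t load_be32(const volatile uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * "Native endian" CSR read: the flavour is chosen once, at compile time,
 * from the compiler's predefined byte-order macros (GCC/Clang).
 */
static inline uint32_t csr_read32_native(const volatile uint8_t *p)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return load_le32(p);
#else
	return load_be32(p);
#endif
}
```

With a fixed-endian CSR bus the wrapper disappears entirely and drivers call the one matching accessor directly.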

Now some more general questions...

- First a bit of bike shedding ... are we wedded to the names litex_{
get,set}_reg ? These are specific to *CSRs* which aren't all LiteX
registers (Eth uses its own "registers" that aren't CSRs, it's my
understanding that SD will soon as well etc...). Should we be more
explicit and call them litex_{get,set}_csr ?

- Since CSRs aren't supposed to be a hot path, should we bite the
bullet then and just use regmap ? It will cost us one or two
indirections per access, but it will handle going through different
transport busses if ever needed, handles native endian, handles
selecting fields of a register etc.. etc... I am not *super familiar*
with it but I'm going to have a closer look.

- If we are going to stick with direct MMIO, any reason not to define
once and for all a fixed endian for LiteX devices (hint: LE) and let
drivers directly use readl/writel ? If we still want to deal with the
special case of "larger-than-32-bit" CSRs we could have accessors
specifically for that but generally I thought Florent wanted to get rid
of them (or did I misunderstand?)
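
For the "larger-than-32-bit" special case in the last point, a dedicated helper could be as small as the sketch below. This is a user-space model with hypothetical names; array accesses stand in for readl/writel on a fixed-endian region. The word order (upper half at the lower word index) is an assumption, since the layout of multi-word CSRs is not settled in this discussion, and the two accesses are not atomic.

```c
#include <stdint.h>

/* Stand-ins for readl()/writel() on an already-mapped CSR region. */
static inline uint32_t csr_read32(const volatile uint32_t *base,
				  unsigned int word)
{
	return base[word];
}

static inline void csr_write32(volatile uint32_t *base, unsigned int word,
			       uint32_t val)
{
	base[word] = val;
}

/*
 * 64-bit CSR spanning two consecutive 32-bit words.
 * ASSUMPTION: the upper 32 bits live at the lower word index.
 * The two accesses are separate bus cycles, so this is not atomic.
 */
static inline uint64_t csr_read64(const volatile uint32_t *base,
				  unsigned int word)
{
	uint64_t hi = csr_read32(base, word);
	uint64_t lo = csr_read32(base, word + 1);

	return (hi << 32) | lo;
}

static inline void csr_write64(volatile uint32_t *base, unsigned int word,
			       uint64_t val)
{
	csr_write32(base, word, (uint32_t)(val >> 32));
	csr_write32(base, word + 1, (uint32_t)val);
}
```

Ordinary 32-bit CSRs would not go through any such helper; only the few wide ones would.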

Finally as soon as we decide, I'll try to get Linux going in Microwatt
and fixup liteeth to do MDIO etc...

Cheers,
Ben.



Gabriel L. Somlo

Jun 14, 2020, 10:39:57 AM
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley, Mateusz Holenko
Hi Ben,

As an interested party, and occasional drive-by contributor to LiteX,
my main interest is in having a fully Free/Open Linux-capable computer
that can self-host not just its software stack, but also the
"hardware", at least all the way down to FPGA bitstream (shameless
plug: http://www.contrib.andrew.cmu.edu/~somlo/BTCP).

On Sun, Jun 14, 2020 at 02:22:52PM +1000, Benjamin Herrenschmidt wrote:
> I've looked at what's in the current Linux ports for LiteX here:
>
> https://github.com/litex-hub/linux/commits/litex-vexriscv-rebase
>
> And I've noticed a few things, please correct my understanding if I got
> something wrong :-)

The "litex-vexriscv-rebase" repo you referenced contains most of the
existing Linux drivers, of which a large part was contributed by
Antmicro (added Mateusz to the cc list). However, they're mainly
targeting 32-bit CPUs (vexriscv primarily), and 8-bit CSR data width.

For a (hopefully backward compatible) update to 64-bit (Rocket) and
32-bit CSR width, see
https://github.com/litex-hub/linux/tree/litex-rocket-rebase
which is meant to keep backward compatibility to 32-bit CPUs and 8-bit
CSRs (the main difference is accessors in litex.h are similar to the
infamous litex/soc/software/include/hw/common.h, and drivers I'm
interested in -- liteeth, mmc-spi, etc. -- have been updated to use
these "universal" accessors).

> - The CSR accessors litex_{get,set}_reg are in include/linux/litex.h
> added by "LiteX: add SoC controller". However, there are drivers
> *before* that one in the tree that use them (ie. non-bisectable
> series). Some like UART or LiteEth don't even use the accessors.

The accessors (and the soc controller) were added (I think) in an
attempt to prevent proliferation of open-coded scatter/gather 8-bit
read/write and shift CSR accesses throughout the rest of the codebase.

Obviously, when upstreaming into linux proper, that should be the first
thing that is submitted (and it is, see:
http://lkml.iu.edu/hypermail/linux/kernel/2006.0/04479.html).

It may be debatable whether the proposed approach (start with 32-bit CPU
and 8-bit CSR support, then update for the rest) vs. something else
(figure out everything *before* upstreaming into Linux) is best, but I
just wanted to make sure we're all on the same page :)

My personal interest is in getting a Rocket-based open-gateware system
running with upstream linux, and I'm happy to help make that happen once
consensus is reached on the CSR architecture portion of the conversation,
which I'm going to defer to those more qualified and/or with stronger
opinions on the subject :)

Cheers,
--Gabriel

Gabriel L. Somlo

Jun 14, 2020, 2:07:24 PM
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley, Mateusz Holenko
On Sun, Jun 14, 2020 at 10:39:56AM -0400, Gabriel L. Somlo wrote:
> My personal interest is in getting a Rocket-based open-gateware system
> running with upstream linux, and I'm happy to help make that happen once
> consensus is reached on the CSR architecture portion of the conversation,
> which I'm going to defer on to those more qualified and/or with stronger
> opinions on the subject :)

That said, I'd love to see predictable endianness and up to 64-bit MMIO
register width available in LiteX (essentially, teaching the eventual
incarnation of the CSR bus about byte strobes, so any access width a
CPU would use, up to 64 bits, would work seamlessly).
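
To make the byte-strobe idea concrete: on a 64-bit bus each write carries an 8-bit strobe mask, and the target updates only the byte lanes whose strobe bit is set. The following user-space model is a sketch under assumptions (hypothetical names, lane 0 is bits 7:0, accesses are naturally aligned); it is not LiteX gateware or an existing API.

```c
#include <stdint.h>

/*
 * Merge 'data' into 'old': byte lane i (bits 8*i+7 .. 8*i) is taken from
 * 'data' only if bit i of 'strobe' is set, otherwise 'old' is kept.
 */
static inline uint64_t apply_byte_strobes(uint64_t old, uint64_t data,
					  uint8_t strobe)
{
	uint64_t mask = 0;

	for (unsigned int lane = 0; lane < 8; lane++)
		if (strobe & (1u << lane))
			mask |= (uint64_t)0xff << (8 * lane);

	return (old & ~mask) | (data & mask);
}

/*
 * Strobe mask for a naturally aligned access of 'size' bytes (1, 2, 4, 8)
 * starting at byte 'offset' within the 64-bit word. Unaligned or
 * out-of-range combinations are not handled in this sketch.
 */
static inline uint8_t strobe_for_access(unsigned int offset,
					unsigned int size)
{
	return (uint8_t)(((1u << size) - 1u) << offset);
}
```

A 32-bit store to the upper half of a 64-bit word would thus use strobe 0xf0 and leave the lower half untouched.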

I know that would make Linux upstreaming much smoother. Not sure about
all the non-linux use cases of LiteX, and their tradeoff costs and
comfort levels with the above...

Best,
--Gabriel

Benjamin Herrenschmidt

Jun 14, 2020, 11:28:31 PM
to linux...@googlegroups.com, Joel Stanley, Florent Kermarrec, Stafford Horne, Mateusz Holenko
(apologies for the semi-dodgy reply, I had issues with google groups)

> The "litex-vexriscv-rebase" repo you referenced contains most of the
> existing Linux drivers, of which a large part was contributed by
> Antmicro (added Mateusz to the cc list). However, they're mainly
> targeting 32-bit CPUs (vexriscv primarily), and 8-bit CSR data width.
>
> For a (hopefully backward compatible) update to 64-bit (Rocket) and
> 32-bit CSR width, see
> https://github.com/litex-hub/linux/tree/litex-rocket-rebase
> which is meant to keep backward compatibility to 32-bit CPUs and 8-bit
> CSRs (the main difference is accessors in litex.h are similar to the
> infamous litex/soc/software/include/hw/common.h, and drivers I'm
> interested in -- liteeth, mmc-spi, etc. -- have been updated to use
> these "universal" accessors).

All right, sooo... there's a bunch of bits and pieces here that
could/will be problematic for upstreaming but I'm happy to help,
I could even maintain a "for upstream" tree with my own cleanups
if you guys want.

That said, let's go back to the basics: Is there *any* reason why we
would want to continue supporting the 8-bit CSR case in Linux for
upstream ?

Because it would be a LOT simpler to just get rid of the accessors
altogether. There's general resistance upstream to adding more of
these things... Honestly I'd rather see the drivers directly access
their CSRs as if they were normal registers.

> The accessors (and the soc controller) were added (I think) in an
> attempt to prevent proliferation of open-coded scatter/gather 8-bit
> read/write and shift CSR accesses throughout the rest of the codebase.
>
> Obviously, when upstreaming into linux proper, that should be the first
> thing that is submitted (and it is, see:
> http://lkml.iu.edu/hypermail/linux/kernel/2006.0/04479.html).
>
> It may be debatable whether the proposed approach (start with 32-bit CPU
> and 8-bit CSR support, then update for the rest) vs. something else
> (figure out everything *before* upstreaming into Linux) is best, but I
> just wanted to make sure we're all on the same page :)

Well it depends. If there are strong feelings for keeping the 8-bit support
or even getting into "CSR over USB" as proposed by Tim, then we probably
should indeed keep the accessors. But I would make them complete and fully
functional *before* all the drivers that use them. There's no point in
having a patch adding 8-bit only, then a patch fixing that up etc...

That said, if we really want to go all the way toward supporting non-MMIO
mapped CSRs, then we probably want regmap, despite the increased cost.

> My personal interest is in getting a Rocket-based open-gateware system
> running with upstream linux, and I'm happy to help make that happen once
> consensus is reached on the CSR architecture portion of the conversation,
> which I'm going to defer on to those more qualified and/or with stronger
> opinions on the subject :)

My personal interest is microwatt but I'm happy to help with the big picture :-)

I can also help smooth things out with upstream maintainers and anticipate
the kind of reaction they'll have as I've been one myself.

The implementation details are easy frankly once we agree how we want to
proceed.

Fundamentally here are the options that we need to make a decision about
as I see things:

1- Keep 8-bit support or not for "Linux supported platforms".

2- Stick to supporting only direct MMIO vs adding indirections to support
more funky setups such as LiteX devices over USB etc...

3- [Florent] Add an option to the gateware (and LiteX firmware support)
to force CSRs to be a fixed endian on Linux supported configurations. That
has a (hopefully small) cost on BE platforms if we decide for the CSRs to
be LE, which I'm in favor of paying, but others might disagree. This is
the only option so far that requires gateware changes.

4- [WARNING: Bikeshedding] name CSR accessors (if they still exist) something_csr
rather than something_reg

Can we agree on these one way or another ?

My personal preferences are 1: NO, 2: YES (but... see below) 3: YES

There are more details to iron out, but less key. For example there's an
ongoing discussion on the layout of the CSRs that span more than one
"word" and whether the bits are left or right justified etc... but let's
solve that separately once we've agreed on the basics (also it might require
gateware changes)

As for my preferences for the choices above: I don't see the point in
keeping support for 8-bit-over-32-bit for Linux. It simplifies everything
to drop it.

For 2) well... not supporting means the simple choice, direct MMIO from devices.

If we go the regmap route instead, which would support all sorts of crazy setups,
then we need to add plumbing to create the regmaps etc.. it complicates things and
it adds I think at least 2 indirect calls on the path to every IO.

We could try to add a "layer" that can do either, inlined ... i.e., create some kind
of (pseudo-code). [Note: I don't actually *like* this, but it's an option...]

	struct litex_csr_region {
		enum { LITEX_IO_DIRECT, LITEX_IO_REGMAP } model;
		union {
			void __iomem *direct_addr;
			struct regmap *regmap;
		} arg;
	};

And have some helpers to set that up from the DT, which could initially be simple
and only support the direct model, and accessors do something like:

	static inline void litex_write_csr(u32 val, struct litex_csr_region *r,
					   unsigned int offset)
	{
		if (r->model == LITEX_IO_DIRECT)
			writel(val, r->arg.direct_addr + offset);
		else
			regmap_stuff(.....);
	}

And a helper drivers can use to create such region objects from the DT.
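
To get a feel for the per-access cost of such a layer, the pseudo-code can be modelled in user space as below. This is only a sketch: the csr_indirect_ops table and the mem_backend_* functions are hypothetical stand-ins for whatever indirect transport (regmap or otherwise) would sit behind the second branch, and plain memory stands in for MMIO. Offsets are in bytes and assumed 4-byte aligned.

```c
#include <stdint.h>

/* Hypothetical stand-in for an indirect transport (regmap, a bridge, ...). */
struct csr_indirect_ops {
	uint32_t (*read)(void *ctx, unsigned int offset);
	void (*write)(void *ctx, unsigned int offset, uint32_t val);
	void *ctx;
};

struct litex_csr_region {
	enum { LITEX_IO_DIRECT, LITEX_IO_INDIRECT } model;
	union {
		volatile uint32_t *direct_addr;
		const struct csr_indirect_ops *ops;
	} arg;
};

/* One predictable branch in the direct case, one indirect call otherwise. */
static inline void litex_write_csr(uint32_t val, struct litex_csr_region *r,
				   unsigned int offset)
{
	if (r->model == LITEX_IO_DIRECT)
		r->arg.direct_addr[offset / 4] = val;
	else
		r->arg.ops->write(r->arg.ops->ctx, offset, val);
}

static inline uint32_t litex_read_csr(struct litex_csr_region *r,
				      unsigned int offset)
{
	if (r->model == LITEX_IO_DIRECT)
		return r->arg.direct_addr[offset / 4];
	return r->arg.ops->read(r->arg.ops->ctx, offset);
}

/* Toy backend: an in-memory register file reached through the ops table. */
static uint32_t mem_backend_read(void *ctx, unsigned int offset)
{
	return ((uint32_t *)ctx)[offset / 4];
}

static void mem_backend_write(void *ctx, unsigned int offset, uint32_t val)
{
	((uint32_t *)ctx)[offset / 4] = val;
}
```

Drivers would see only litex_read_csr/litex_write_csr; which branch is taken is decided once, when the region object is set up.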

Cheers,
Ben.


Florent Kermarrec

Jun 15, 2020, 3:18:02 AM
to Benjamin Herrenschmidt, linux...@googlegroups.com, Joel Stanley, Stafford Horne, Mateusz Holenko
Hi,

Restricting the CSR bus to a 32-bit data width on Linux-capable SoCs does indeed seem the right choice, since it makes things a lot easier. As we demonstrated, using 32-bit vs 8-bit does not impact resource usage/timings on the FPGA, and it also scales well with 64-bit CPUs, where strobes are handled in the 64-bit to 32-bit converter. Given the complexity it adds for Linux, I don't see a need for keeping 8-bit support (in Linux-capable SoCs); if we want to plug in cores/devices that have been designed for an 8-bit CSR bus, it's probably that core/device that needs to be adapted.

For the endianness, benh, can you describe the advantages of a fixed endianness for the Linux drivers vs using the native endianness of the CPU? The changes in the gateware are easy, but I would like to understand, since it will also impact other software like the BIOS and second-stage bootloader support. We should also probably include in the discussion all MMAP accesses to peripherals (so both CSR and Wishbone/AXI accesses).

Best,

Florent

Mateusz Holenko

Jun 15, 2020, 4:13:46 AM
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley
Hi Benjamin,

On Sun, Jun 14, 2020 at 6:23 AM Benjamin Herrenschmidt
<be...@kernel.crashing.org> wrote:
>
> Hi folks !
>
> I would like to revive the discussion and hopefully get to a consensus
> about how CSRs are mapped/used in Linux.
>
> I've looked at what's in the current Linux ports for LiteX here:
>
> https://github.com/litex-hub/linux/commits/litex-vexriscv-rebase

Just to let you know, there are currently many branches on this repo.
They eventually should merge into a single one, but ATM each project
seems to use its own branch - e.g. linux-on-litex-vexriscv points to
https://github.com/litex-hub/linux/commits/litex-vexriscv--linux5.0_sd_card.

They all, however, are some version of linux with applied patches from
linux-on-litex-vexriscv
plus some additional tweaks, so they do not differ *that* much.

>
> And I've noticed a few things, please correct my understanding if I got
> something wrong :-)
>
> - The CSR accessors litex_{get,set}_reg are in include/linux/litex.h
> added by "LiteX: add SoC controller". However, there are drivers
> *before* that one in the tree that use them (ie. non-bisectable
> series). Some like UART or LiteEth don't even use the accessors.

That's because the accessors were added later (when developing the SoC
controller, if I remember correctly)
and some drivers were back-ported to use them. The ordering of commits
on the litex-vexriscv-rebase
repo corresponds to the chronological order in which drivers were
developed, hence the inconsistency.

As Gabriel Somlo mentioned, the accessors are included in the first
patchset to the mainline kernel, together with
the UART driver (that is slightly reworked to use those accessors), to
avoid the ordering problems.

> - LiteEth is broken in that it mixes up the LiteEth MMIO region with
> its CSRs and so pokes at the wrong register to reset the PHY.

> - Generally speaking, the CSRs are directly accessed assuming a 32-bit
> wide direct memory map. More specifically what I mean here is that each
> individual device will find its CSRs as one of its memory resources in
> the device-tree and poke at it via the accessors. While I (very much)
> like this (:-) it goes against the will of some to support the same
> driver on top of different busses such as USB etc... Do we really care
> ? If yes, see below.
>
> - I don't understand the bit manipulation/size business done by
> litex_{get,set}_reg. Care to explain ? It feels ... wrong. But I might
> be missing something. Generally speaking, the two main reasons for
> having the accessors at all that I can see are 1) native endian and 2)
> those bit manipulations. Is there a reason why we couldn't get rid of
> both and just have the drivers directly access the CSRs using standard
> accessors in a fixed (LE ?) mode ?

As already covered, this is caused by two factors:
(a) the LiteX CSRs are in CPU native endianness and eliminating this
requires HW/LiteX changes [that's currently under discussion],
(b) the fact we started with the support for 8-bit CSR data width that
requires the scatter/gather logic.

Having a 32-bit CSR data width and a fixed endianness would make
accessing LiteX CSRs the same
as "normal" memory-mapped registers on a 32-bit CPU, I guess.
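
For readers unfamiliar with factor (b), the scatter/gather logic boils down to something like the following user-space sketch. It assumes an 8-bit CSR data width where each byte of a wider CSR occupies its own 32-bit-aligned slot, and it assumes the most significant byte sits at the lowest address; the function names are hypothetical and plain memory stands in for MMIO.

```c
#include <stdint.h>

/*
 * Gather a 32-bit CSR value from four consecutive 32-bit slots, each of
 * which carries only 8 valid bits.
 * ASSUMPTION: slot 0 holds the most significant byte.
 */
static inline uint32_t csr8_gather32(const volatile uint32_t *slots)
{
	uint32_t val = 0;

	for (unsigned int i = 0; i < 4; i++)
		val = (val << 8) | (slots[i] & 0xff);

	return val;
}

/* Scatter a 32-bit value over four slots, most significant byte first. */
static inline void csr8_scatter32(volatile uint32_t *slots, uint32_t val)
{
	for (unsigned int i = 0; i < 4; i++)
		slots[i] = (val >> (8 * (3 - i))) & 0xff;
}
```

With a 32-bit CSR data width both loops collapse into a single word access, which is the simplification being discussed.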

How about 64-bit CPU (e.g. Rocket) - what CSR data width it has by
default / we plan to support in Linux?
Having accessors could be handy if we plan to support both 32/64-bit
CSR data widths (as we would still need the scatter/gather logic).

> Otherwise, I would prefer mimicking
> what regmap-mmio.c does and just have an ifdef LITTLE_ENDIAN/BIG_ENDIAN
> that picks the appropriate "be" vs "le" accessor rather than re-invent it
> with hand-sprinkled barriers.

In fact using ifdefs and ioread32le/be was our initial approach, but it wasn't
warmly welcomed on LKML. We were told that:

"Almost no kernel code should EVER be testing __LITTLE_ENDIAN, don't add
to it as it is not a good idea."

and that was why we started re-implementing the accessors.

> Now some more general questions...
>
> - First a bit of bike shedding ... are we wedded to the names litex_{
> get,set}_reg ? These are specific to *CSRs* which aren't all LiteX
> registers (Eth uses its own "registers" that aren't CSRs, it's my
> understanding that SD will soon as well etc...). Should we be more
> explicit and call them litex_{get,set}_csr ?
>
> - Since CSRs aren't supposed to be a hot path, should we bite the
> bullet then and just use regmap ? It will cost us one or two
> indirections per access, but it will handle going through different
> transport busses if ever needed, handles native endian, handles
> selecting fields of a register etc.. etc... I am not *super familiar*
> with it but I'm going to have a closer look.
>
> - If we are going to stick with direct MMIO, any reason not to define
> once and for all a fixed endian for LiteX devices (hint: LE) and let
> drivers directly use readl/writel ? If we still want to deal with the
> special case of "larger-than-32-bit" CSRs we could have accessors
> specifically for that but generally I thought Florent wanted to get rid
> of them (or did I misunderstand?)
>
> Finally as soon as we decide, I'll try to get Linux going in Microwatt
> and fixup liteeth to do MDIO etc...
>
> Cheers,
> Ben.
>
>
>
> --
> You received this message because you are subscribed to the Google Groups "Linux for LiteX FPGA SoC" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to linux-litex...@googlegroups.com.
> To view this discussion on the web, visit https://groups.google.com/d/msgid/linux-litex/b7cdbaaa26610e0a99921a78a08bf6d0c8731952.camel%40kernel.crashing.org.

Best,
Mateusz

--
Mateusz Holenko
Antmicro Ltd | www.antmicro.com
Roosevelta 22, 60-829 Poznan, Poland

Tim 'mithro' Ansell

Jun 15, 2020, 4:26:19 AM
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley
If you want to share gateware between the widest range of firmware, 8bit CSRs is the correct current option.

Both emulation in QEMU (RISC-V, or1k and lm32) and Renode upstream only support 8-bit CSR widths at the moment.

This is also the mode that both Zephyr upstream and NuttX upstream use.

Linux only supporting 32bit width CSRs will require all the above to be adapted too.

Having a common peripheral bus shared between a core running Linux and a core running Zephyr (which is a known configuration we want to support) would also become impossible.

There seem to be a lot of downsides to dropping 8bit CSR support in Linux for what seems like a minor decrease in complexity?

Tim 'mithro' Ansell



Benjamin Herrenschmidt

Jun 15, 2020, 4:38:25 AM
to linux...@googlegroups.com, Joel Stanley, Stafford Horne, Mateusz Holenko
On Mon, 2020-06-15 at 09:17 +0200, Florent Kermarrec wrote:
>
> For the endianness, benh, can you describe the advantages of a fixed
> endianness for the Linux drivers vs using native endianness of the
> CPU? The changes in the gateware are easy, but i would like to
> understand since it will also impact other software like the BIOS and
> second stage bootloaders support. We should also probably include in
> the discussion all MMAP accesses to peripherals (so both CSR and
> Wishbone/AXI accesses).

I'll start with the end .. yes it includes all mmap accesses :-)

There are three main reasons off the top of my head (it's something the
Linux community has been nagging HW vendors about).

1- The least important one... no need to create funky accessors in
Linux. Linux does not have "native endian" accessors somewhat on
purpose, so if your registers are native endian, you have to jump
through hoops such as the folks have been doing in the current Linux
ports. It's messy and could be hard to get past upstream maintainers.

2- Tim mentioned one might want to have LiteX devices on a pluggable
device such as a PCIe card. In which case what does "native endian"
mean ? Worse, a given system might have built-in LiteX devices of one
endian and plugged-in devices of another... You *really* don't want
drivers to do runtime endian checks on register accesses. Been there,
done that (the USB OHCI/EHCI drivers in Linux do that because some SoC
vendors wanted to be too smart :-)

3- The concept of "native endian" might not even be particularly
meaningful. For example some CPUs can operate in both endians just
fine, your "native endian" becomes the endian in which you compiled
your OS. You can run a POWER system with a BE or an LE OS ... what does
"native endian" mean for devices in that case ? I think ARM can do the
same, not sure about RISC-V
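
To illustrate the runtime check that point 2 warns against, here is a hypothetical user-space sketch of a driver that must carry a per-device flag and test it on every register access. The struct and function names are invented for this example and are not taken from any existing driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Swap the byte order of a 32-bit value. */
static inline uint32_t swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v >> 8) & 0x0000ff00u) | (v >> 24);
}

/* Hypothetical device whose register endianness is only known at probe. */
struct fake_dev {
	volatile uint32_t *regs;
	bool swap;	/* set when device and CPU byte order differ */
};

/* Every single access pays for a load of 'swap' and a conditional. */
static inline uint32_t dev_read32(const struct fake_dev *d, unsigned int idx)
{
	uint32_t v = d->regs[idx];

	return d->swap ? swab32(v) : v;
}
```

With a fixed register endianness the flag and the branch go away, and the same driver binary works for built-in and plugged-in instances alike.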

That said, I don't personally care *that much*, as in, none of this is
a problem for Microwatt which is LE anyways :-) But these things might
get in the way of a quick upstreaming.

Cheers,
Ben.


Benjamin Herrenschmidt

Jun 15, 2020, 4:46:16 AM
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley
On Mon, 2020-06-15 at 10:13 +0200, Mateusz Holenko wrote:
> Hi Benjamin,
>
> On Sun, Jun 14, 2020 at 6:23 AM Benjamin Herrenschmidt
> <be...@kernel.crashing.org> wrote:
> >
> > Hi folks !
> >
> > I would like to revive the discussion and hopefully get to a consensus
> > about how CSRs are mapped/used in Linux.
> >
> > I've looked at what's in the current Linux ports for LiteX here:
> >
> > https://github.com/litex-hub/linux/commits/litex-vexriscv-rebase
>
> Just to let you know, there are currently many branches on this repo.
> They eventually should merge into a single one, but ATM each project
> seems to use its own branch - e.g. linux-on-litex-vexriscv points to
> https://github.com/litex-hub/linux/commits/litex-vexriscv--linux5.0_sd_card.
>
> They all, however, are some version of linux with applied patches from
> linux-on-litex-vexriscv
> plus some additional tweaks, so they do not differ *that* much.

Yes, I've noticed :-) I've made my own locally for microwatt but I'm
hoping we'll converge as a result of this discussion. I'm happy to
volunteer to maintain a converged branch if nobody else wants to do it,
but my spare time might become fairly limited.

> > And I've noticed a few things, please correct my understanding if I got
> > something wrong :-)
> >
> > - The CSR accessors litex_{get,set}_reg are in include/linux/litex.h
> > added by "LiteX: add SoC controller". However, there are drivers
> > *before* that one in the tree that use them (ie. non-bisectable
> > series). Some like UART or LiteEth don't even use the accessors.
>
> That's because the accessors were added later (when developing the SoC
> controller, if I remember correctly)
> and some drivers were back-ported to use them. The ordering of commits
> on the litex-vexriscv-rebase
> repo corresponds to the chronological order in which drivers were
> developed, hence the inconsistency.
>
> As Gabriel Somlo mentioned, the accessors are included in the first
> patchset to the mainline kernel, together with
> the UART driver (that is slightly reworked to use those accessors), to
> avoid the ordering problems.

Right. We should maintain a converged branch that mirrors what we
intend to upstream. What happened to the first patch set ? I remember
commenting on lkml a while back but never heard back..

Should we sort out the questions at hand first and then restart the
process ?

I have a long experience with upstream having been an arch maintainer
myself and can definitely help get things into a potentially easier-to-
accept shape, and help argue with the relevant maintainers.

.../...

> How about 64-bit CPU (e.g. Rocket) - what CSR data width it has by
> default / we plan to support in Linux?

32-bit. Let's stick to that for now, there is no fundamental need for
something bigger at this stage and we have a perfectly functional
downconverter now.

> Having accessors could be handy if we plan to support both 32/64-bit
> CSR data widths (as we would still need the scatter/gather logic).

I'd rather we didn't. We had this discussion on a github issue earlier
and we couldn't find a good reason why we would imperatively need atomic
access of 64-bit quantities in the foreseeable future.

> > Otherwise, I would prefer mimicking
> > what regmap-mmio.c does and just have an ifdef LITTLE_ENDIAN/BIG_ENDIAN
> > that picks the appropriate "be" vs "le" accessor rather than re-invent it
> > with hand-sprinkled barriers.
>
> In fact using ifdefs and ioread32le/be was our initial approach, but it wasn't
> warmly welcomed on LKML. We were told that:
>
> "Almost no kernel code should EVER be testing __LITTLE_ENDIAN, don't add
> to it as it is not a good idea."

Yes I saw it, it was Greg right ?

> and that was why we started re-implementing the accessors.

I know, I wish I had been in that discussion in the first place because
the end result is worse imho, and I might have been able to convince
Greg of that.

In fact testing __LITTLE_ENDIAN is exactly what regmap (upstream) does
today.

That said, see the other conversation, that problem would go away if we
just fixed the CSR endianness once and for all (there are good reasons
to do that anyways).

I'd rather we got to a point where no accessors are needed at all.

The current litex_accessor_ok() for example completely breaks if the
LiteX IP blocks are used outside of a LiteX SoC since there might not
be scratch registers to play with or there might be several sets of
them in different places :-)

For example "standalone" Microwatt uses litedram and liteeth (if you
pick my branch that adds it) generated separately using standalone
generators, they thus have independent CSR busses.

Cheers,
Ben.


Benjamin Herrenschmidt

Jun 15, 2020, 4:52:50 AM
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley
On Mon, 2020-06-15 at 01:26 -0700, Tim 'mithro' Ansell wrote:
> If you want to share gateware between the widest range of firmware,
> 8bit CSRs is the correct current option.

We are talking about Linux-supported setups where the CSRs are exposed
to the kernel. There is no reason we can think of (I think Florent and I
agree here) why 8-bit CSRs would have any value.

> Both emulation in QEMU (RISC-V, or1k and lm32) and Renode upstream
> only support 8-bit CSR widths at the moment.

That's easily fixed.

> This is also the mode that both Zephyr upstream and NuttX upstream
> use.

Is there a LiteX driver upstream already ? Can you point me to it ?

> Linux only supporting 32bit width CSRs will require all the above to
> be adapted too.
>
> Having a common peripheral bus shared between a core running Linux
> and a core running Zephyr (which is a known configuration we want to
> support) would also become impossible.

All right, I have no idea what Zephyr is, so I'll need to do some
digging, but keep in mind that supporting multiple CSR sizes forces the
Linux drivers to go from trivial direct-mmio "lean" stuff that's well
understood and easy to upstream, to a big mess of common accessors
doing horrid things and hard to upstream, so you have to be very very
certain of the benefit here.

> There seems to be a lot of downsides to dropping 8bit CSR support in
> Linux for what seems like a minor decrease in complexity?

So far I haven't seen any but I missed the model you talk about above.

The key thing here is that I would like to completely remove the
concept of accessors and make LiteX devices just ordinary devices
accessed via "normal" MMIO registers from their drivers.

Anything else will require layers of "stuff"... see my other email.

I proposed in another email a kind of "in between" solution involving
inline accessors that would support direct access vs. regmap, we could
then throw all the "complicated cases" under regmap, at the cost of a
test per access.

Cheers,
Ben.

Mateusz Holenko

Jun 15, 2020, 5:41:39 AM
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley
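On Mon, Jun 15, 2020 at 10:46 AM Benjamin Herrenschmidt wrote:
> What happened to the first patch set ? I remember
> commenting on lkml a while back but never heard back..
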
Both Gabriel Somlo and I responded to your comments on LKML
with some explanations, but perhaps it got lost in tons of other mails.

The patchset is still "maintained" - we address comments and resubmit
it to LKML. Currently there is V7:
https://lore.kernel.org/patchwork/patch/1252358/

This, however, targets the 32-bit CPU/8-bit CSR data width configuration.

> Should we sort out the questions at hand first and then restart the
> process ?

It would be great if we come to the point where everyone
is happy with the shape of the overall solution. Once it's done we can
refactor the patchset and resubmit it again.

> I have a long experience with upstream having been an arch maintainer
> myself and can definitely help get things into a potentially easier-to-
> accept shape, and help argue with the relevant maintainers.
>
> .../...
>
> > How about 64-bit CPU (e.g. Rocket) - what CSR data width it has by
> > default / we plan to support in Linux?
>
> 32-bit. Let's stick to that for now, there is no fundamental need for
> something bigger at this stage and we have a perfectly functional
> downconverter now.
>
> > Having accessors could be handy if we plan to support both 32/64-bit
> > CSR data widths (as we would still need the scatter/gather logic).
>
> I'd rather we didn't. We had this discussion on a github issue earlier
> and we couldn't find a good reason why we would imperatively need atomic
> access of 64-bit quantities in the foreseeable future.

That's great. It should allow us to simplify things.

> > > Otherwise, I would prefer mimicking
> > > what regmap-mmio.c does and just have an ifdef LITTLE_ENDIAN/BIG_ENDIAN
> > > that picks the appropriate "be" vs "le" accessor rather than re-invent it
> > > with hand-sprinkled barriers.
> >
> > In fact using ifdefs and ioread32le/be was our initial approach, but it wasn't
> > warmly welcomed on LKML. We were told that:
> >
> > "Almost no kernel code should EVER be testing __LITTLE_ENDIAN, don't add
> > to it as it is not a good idea."
>
> Yes I saw it, it was Greg right ?

Correct.

> > and that was why we started re-implementing the accessors.
>
> I know, I wish I had been in that discussion in the first place because
> the end result is worse imho, and I might have been able to convince
> Greg of that.
>
> In fact testing __LITTLE_ENDIAN is exactly what regmap (upstream) does
> today.
>
> That said, see the other conversation, that problem would go away if we
> just fixed the CSR endianness once and for all (there are good reasons
> to do that anyways).
>
> I'd rather we got to a point where no accessors are needed at all.
>
> The current litex_accessor_ok() for example completely breaks if the
> LiteX IP blocks are used outside of a LiteX SoC since there might not
> be scratch registers to play with or there might be several sets of
> them in different places :-)

The point of testing litex_accessor_ok() and returning -EPROBE_DEFER
in the driver is just to make debugging easier, i.e., to report an
error on *the first* access to a LiteX CSR, when testing the scratch
register.

Calling it is, however, not necessary for a LiteX driver to function
properly (assuming the LiteX design is fine).
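
As a rough illustration of the kind of scratch-register check being discussed, here is a user-space sketch in C. The reset pattern 0x12345678, the test pattern 0xdeadbeef, the name csr_scratch_check() and the use of -EINVAL (standing in for the kernel-only -EPROBE_DEFER) are assumptions made for this example; it is not the code from the patch set.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SCRATCH_RESET_VALUE 0x12345678u
#define SCRATCH_TEST_VALUE  0xdeadbeefu

/*
 * Returns 0 when the scratch register reads back as expected,
 * -EINVAL otherwise (a kernel driver might return -EPROBE_DEFER here).
 */
static int csr_scratch_check(volatile uint32_t *scratch)
{
    /* A wrong reset value hints at a bad mapping, width or byte order. */
    if (*scratch != SCRATCH_RESET_VALUE)
        return -EINVAL;

    /* Check that writes reach the register as well. */
    *scratch = SCRATCH_TEST_VALUE;
    if (*scratch != SCRATCH_TEST_VALUE)
        return -EINVAL;

    /* Restore the reset pattern so a later check still passes. */
    *scratch = SCRATCH_RESET_VALUE;
    return 0;
}
```

A check like this only works when a scratch register exists at a known place, which is exactly the limitation raised for LiteX blocks used outside of a LiteX SoC.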

> For example "standalone" Microwatt uses litedram and liteeth (if you
> pick my branch that adds it) generated separately using standalone
> generators, they thus have independent CSR busses.

I guess this is a scenario we didn't pay much attention to ;) In this context
the call to litex_accessor_ok() is indeed problematic.

Since, as mentioned above, litex_accessor_ok() serves as a debug helper,
we might consider removing it to make drivers more versatile and useful in
"out-of-LiteX" scenarios.

> Cheers,
> Ben.

Benjamin Herrenschmidt

Jun 15, 2020, 5:54:42 AM6/15/20
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley
On Mon, 2020-06-15 at 11:41 +0200, Mateusz Holenko wrote:
> > Right. We should maintain a converged branch that mirrors what we
> > intend to upstream. What happened to the first patch set ? I remember
> > commenting on lkml a while back but never heard back..
>
> Both Gabriel Somlo and I responded to your comments on LKML
> with some explanations, but perhaps it got lost in tons of other mails.

Ah I did miss your replies, sorry about that. I'll try to dig them out,
my apologies.

> The patchset is still "maintained" - we address comments and resubmit
> it to LKML. Currently there is V7:
> https://lore.kernel.org/patchwork/patch/1252358/
>
> This, however, targets the 32-bit CPU/8-bit CSR data width configuration.

Ok.

The biggest issue with the patch set as-is or whatever was tweaked in
the various git repos is that if we are going to support 8-bit CSRs
*and* 32-bit CSRs, this should probably not be a #define or CONFIG
option but a runtime thing picked from the device-tree as LiteX devices
can exist outside of a "LiteX SoC" and heterogeneous configurations can
exist if LiteX devices are put on PCIe adapters for example.

Again it's an *if* we decide we stay down that path, which Tim is very
keen on.

In that case, I would advocate for having a device-node representing
the CSR bridge (sort-of your "SoC controller" but a CSR bridge would be
more useful in the context of non-SoC uses) which contains the
appropriate properties, and a phandle in the DT pointing to it from the
individual devices.

Then, we could provide a litex_map_csr_bank(...) helper that devices
use, which would return an opaque object that can be passed back to the
accessors. That object could contain for example an internal pointer
to the CSR bridge with the necessary shift/masks or even a regmap
pointer if we ever decide to support LiteX devices over USB, SPI, SDIO,
etc...

I can whip up something if people prefer that approach.
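
To make the idea of a per-bridge handle more concrete, here is a user-space sketch in C. Everything in it is hypothetical: the names csr_bridge, csr_bank, csr_map_bank(), csr_bank_read32() and csr_bank_write32() are invented for the example, plain memory stands in for MMIO, and the 8-bit layout (one byte per 32-bit bus word, most significant byte first) is an assumption about the existing CSR layout rather than something stated in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bridge description, e.g. filled from device-tree properties. */
struct csr_bridge {
    unsigned int data_width;    /* CSR bus data width in bits: 8 or 32 */
};

/* Handle a driver would get back from the map helper and pass to accessors. */
struct csr_bank {
    const struct csr_bridge *bridge;
    volatile uint32_t *base;    /* start of this device's CSR window */
};

static struct csr_bank csr_map_bank(const struct csr_bridge *bridge,
                                    volatile uint32_t *base)
{
    struct csr_bank bank = { .bridge = bridge, .base = base };
    return bank;
}

/* 'index' counts 32-bit bus words from the start of the bank. */
static uint32_t csr_bank_read32(const struct csr_bank *bank, size_t index)
{
    uint32_t val = 0;

    if (bank->bridge->data_width == 32)
        return bank->base[index];

    /* 8-bit bus: gather four words, each carrying one byte (MSB first, assumed). */
    for (size_t i = 0; i < 4; i++)
        val = (val << 8) | (bank->base[index + i] & 0xffu);
    return val;
}

static void csr_bank_write32(const struct csr_bank *bank, size_t index,
                             uint32_t val)
{
    if (bank->bridge->data_width == 32) {
        bank->base[index] = val;
        return;
    }

    /* 8-bit bus: scatter the value over four words, MSB first (assumed). */
    for (size_t i = 0; i < 4; i++)
        bank->base[index + i] = (val >> (8 * (3 - i))) & 0xffu;
}
```

The point of the sketch is only that the width becomes a runtime property of the bridge the device hangs off, so two LiteX devices behind bridges with different widths can coexist in one kernel.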

> > Should we sort out the questions at hand first and then restart the
> > process ?
>
> It would be great if we come to the point where everyone
> is happy with the shape of the overall solution. Once it's done we can
> refactor the patchset and resubmit it again.

Yup. The actual coding is the easy part :)

> > I have a long experience with upstream having been an arch maintainer
> > myself and can definitely help get things into a potentially easier-to-
> > accept shape, and help argue with the relevant maintainers.
> >
> > .../...
> >
> > > How about 64-bit CPU (e.g. Rocket) - what CSR data width it has by
> > > default / we plan to support in Linux?
> >
> > 32-bit. Let's stick to that for now, there is no fundamental need for
> > something bigger at this stage and we have a perfectly functional
> > downconverter now.
> >
> > > Having accessors could be handy if we plan to support both 32/64-bit
> > > CSR data widths (as we would still need the scatter/gather logic).
> >
> > I'd rather we didn't. We had this discussion on a github issue earlier
> > and we couldn't find good reason why we would imperatively need atomic
> > access of 64-bit quantities in the foreseeable future.
>
> That's great. It should allow us to simplify things.

Yup.
So with the approach I proposed above, we could keep that test as part
of the "CSR bridge" implementation and have the test be folded inside
the litex_map_csr_bank() helper.

Gabriel L. Somlo

Jun 15, 2020, 10:59:11 AM6/15/20
to linux...@googlegroups.com, Benjamin Herrenschmidt, Florent Kermarrec, Stafford Horne, Joel Stanley
My $0.02:


Re. Linux drivers: I think the idea of a rebasing, clean,
"for-upstream" repo containing a current representation of what
we think is the "Right Thing (TM)" to work against at any given time
is a great one. Also, bonus points if @benh is offering to curate,
given his inside knowledge into what's likely to be most palatable
to (and encounter least resistance from) upstream.


Re. CSR "architecture", my utopian end result would be
registers I can directly access via load/store operations
(straightforward MMIO), in a predictable (LE) endianness.

Also, let's not lose track of large (> 64 bit) registers, registers
that aren't a multiple of 4 or 8 bytes in size. Maybe having CSRs
that are known to represent numbers (and therefore are laid out in
LE order) vs. CSRs that represent byte strings (which can be memcpy'd)
would be worth a bit of additional gateware hassle?

I'm ok with 32-bits being the "atomic" transfer width. However, I'd
also like to see 64-bit-and-larger CSRs aligned on a 64-bit
address boundary, so a 64-bit access can be made by the CPU and
transparently handled by the 64-to-32 down converter.
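
A small user-space sketch in C of what such a 64-bit access amounts to once it is split into two 32-bit transfers. The function name csr_read64_le() is hypothetical, plain memory stands in for the CSR window, and the low-word-first order is the little-endian layout argued for above, not a description of the current gateware.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read a 64-bit little-endian CSR as two 32-bit accesses, low word first.
 * 'byte_offset' is the register's offset from the start of the CSR window.
 * The two halves are read separately, so the result is not atomic.
 */
static uint64_t csr_read64_le(const volatile uint32_t *csr_base,
                              uint32_t byte_offset)
{
    const volatile uint32_t *p;
    uint32_t lo, hi;

    /* The proposed layout rule: 64-bit CSRs sit on a 64-bit boundary. */
    assert((byte_offset & 7u) == 0);

    p = csr_base + byte_offset / 4;
    lo = p[0];
    hi = p[1];
    return ((uint64_t)hi << 32) | lo;
}
```

With the alignment rule in place, a 64-bit CPU could issue a single 64-bit load instead and let the down converter produce the same two transfers.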


Any additional features (and complexity) are optional as far as I'm
concerned -- I'm down with whatever as long as there's a well defined
interface drivers can use that covers all the cases: up-to-64-bit LE
numeric registers, arbitrary-length byte-string (memcpy) registers,
and whatever I've missed because *I* personally haven't hit my head
against it yet... :)

(in the current accessor set -- common.h in the litex bios, litex.h in
the litex-rocket-rebase kernel branch -- I also had arrays of uint16
and uint32 embedded inside larger -- 64 or 128 bit -- registers, but
I suppose if each driver rolls its own for *that* corner case it
should be acceptable, and rare).

Thanks,
--Gabriel

Tim 'mithro' Ansell

Jun 15, 2020, 12:12:21 PM6/15/20
to linux...@googlegroups.com, Benjamin Herrenschmidt, Florent Kermarrec, Stafford Horne, Joel Stanley
Hi all,

A case I think a lot about is the NeTV2 board, which can be configured in multiple different ways:
 (a) As a desktop computer expansion card connected via PCIe.
 (b) As a laptop computer expansion board connected via USB3.
 (c) As a RPi hat connected via SPI bus.
 (d) As a stand alone computer running Linux.

In *all* these cases if you want video output of the NeTV2 you need to do things like read the EDID using bitbanging CSR registers, set up the MMCM clocking configuration using CSR registers, etc.

In (a) and (d) the CSR registers are directly memory mapped.
In (b) and (c) the CSR registers can only be accessed via another bus.

The "holy grail" would be for *one* driver to be able to support all these different setups.

Ben has also pointed out that a system could have multiple LiteX CSR busses in them. You could even potentially have a NeTV2 card which is in mode (d) connected to an NeTV2 card in mode (a)!

It sounds like we might be converging on the following;
 * LiteX CSR busses should appear as a device tree node with the CSR configuration.
 * For 32bit CSR width, the CSR accesses should devolve to be just MMIO read/writes.
 * For all other CSR types, using something like regmap might work.

We can then have a Linux configuration option for LiteX CSR bus like "LITEX_CSR_SUPPORT_REGMAP_ACCESS_MODE" which enables more than just 32bit MMIO access modes? Ben -- does that make sense?

Thoughts?

Tim 'mithro' Ansell



Florent Kermarrec

Jun 15, 2020, 6:37:50 PM6/15/20
to Tim 'mithro' Ansell, linux...@googlegroups.com, Benjamin Herrenschmidt, Stafford Horne, Joel Stanley
In fact we should probably try to allow MMIO in all cases. For this, we only have to:
- 1) make sure a CSR doesn't have gaps in the address space (which is what now causes the trouble we have with 8-bit CSR data-width)
- 2) make sure the ordering of CSR simples has the right endianness (i.e. a 128-bit register would be accessed the same way MMAP memory would be).

This would still allow supporting 8-bit CSR data-width and any bridge combination.

The changes for this would be:
- connect the 8-bit CSR bus from a down-converted 8-bit Wishbone bus, so that a 32-bit register, instead of being spread over addresses x to x+0xc (with only 8 bits populated at each), would sit only at address x.
- verify the endianness (ordering) of CSR simples and, if needed, change the order.
- use a CSR alignment that is at least the CSR data width (to avoid two registers landing in the same CSR bus access).

With this, the CPU would no longer need to have a notion of CSR data-width and would just access the CSR bus as any other MMAP peripheral.
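
The address arithmetic behind the two layouts can be sketched in C as follows. The helper names are hypothetical, and the model assumes that in the current layout every sub-register occupies one 32-bit slot of the CPU address space, while in the proposed layout registers are contiguous and padded to the CSR data width.

```c
#include <stdint.h>

/* Number of CSR bus accesses (sub-registers) needed for one register. */
static uint32_t csr_subregs(uint32_t reg_bits, uint32_t csr_bits)
{
    return (reg_bits + csr_bits - 1) / csr_bits;
}

/* Current layout: byte offset of the last sub-register, one 32-bit slot each. */
static uint32_t csr_last_offset_gapped(uint32_t reg_bits, uint32_t csr_bits)
{
    return 4u * (csr_subregs(reg_bits, csr_bits) - 1);
}

/* Current layout: bytes of address space taken by the whole register. */
static uint32_t csr_span_gapped(uint32_t reg_bits, uint32_t csr_bits)
{
    return 4u * csr_subregs(reg_bits, csr_bits);
}

/* Proposed layout: contiguous bytes, rounded up to the CSR data width. */
static uint32_t csr_span_packed(uint32_t reg_bits, uint32_t csr_bits)
{
    uint32_t align = csr_bits / 8;
    uint32_t bytes = (reg_bits + 7) / 8;

    return (bytes + align - 1) / align * align;
}
```

For a 32-bit register on an 8-bit CSR bus this gives a last sub-register at offset 0xc and a 16-byte footprint in the current layout, against 4 contiguous bytes in the packed one, which is what lets the CPU treat the register as ordinary memory-mapped data.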

This will break compatibility (addressing for 8-bit CSR data width and maybe order of CSR simples) but it could make our life a lot easier in the future (and avoid such discussion :)), what do you think?

Florent

Benjamin Herrenschmidt

Jun 15, 2020, 6:55:09 PM6/15/20
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley
On Mon, 2020-06-15 at 10:59 -0400, Gabriel L. Somlo wrote:
> My $0.02:
>
>
> Re. Linux drivers: I think the idea of a rebasing, clean,
> "for-upstream" repo containing a current representation of what
> we think is the "Right Thing (TM)" to work against at any given time
> is a great one. Also, bonus points if @benh is offering to curate,
> given his inside knowledge into what's likely to be most palatable
> to (and encounter least resistance from) upstream.

Let's see, I don't want to walk on the toes of others. The Antmicro
folks seem to be maintaining the patch series for upstream at the
moment, so if they want to continue owning that process I won't get in
the way :-)

> Re. CSR "architecture", my utopian end result would be
> registers I can directly access via load/store operations
> (straightforward MMIO), in a predictable (LE) endianness.

I would love for CSRs to just be ordinary registers that a driver
directly accesses with readl/writel indeed :-)

That said, I don't want to be the one to dismiss the use cases
described by Tim that require a more complex mechanism to cope with
8-bit "transport" or even non-MMIO transports such as USB etc..

> Also, let's not lose track of large (> 64 bit) registers, registers
> that aren't a multiple of 4 or 8 bytes in size.

Those will never be accessed atomically, I think we agree on this. So
in this regard, they are effectively just multiple CSRs containing
"related" values. We have an open github issue on this, I think you and
I agree that the current layout of these things is sub-optimal as they
don't end up looking like a "byte stream" to the OS and they should.

> Maybe having CSRs
> that are known to represent numbers (and therefore are laid out in
> LE order) vs. CSRs that represent byte strings (which can be
> memcpy'd) would be worth a bit of additional gateware hassle?

It's not a significant hassle at all HW side I believe (am I wrong here
?) But yes, such CSRs are like FIFOs, i.e. they convey byte streams
rather than numbers, thus they need to be laid out in what is
traditionally called "byte address invariant" mode, where the address
of individual bytes is what matters.

The typical problem with endianness is that people (esp CPU designers
:-) tend to confuse the transport with the data that is transported.

Ideally a bus shouldn't have an endianness. It should have a "byte
order" which defines which byte represents the first address and which
byte represents the last address in ascending order (ie, which bit
lanes since bits don't have to be numbered in a descending way... see
old school IBM stuff which numbers bits "backwards").

Endianness per-se is a property of a "value" or more specifically a
larger-than-a-byte "value" (this in itself is a simplification of the
general concept that assumes that a byte is an octet, ie 8 bits, but
let's not go down that specific rabbit hole :-)

A byte stream (FIFO or sequence of registers holding a complex piece of
data) does not have an endianness per se. Fields inside that byte stream
might but it's completely unrelated to how the stream is transported.

So for things like CSRs, in essence, what matters is whether they
contain a "value" in which case they do have endianness and need to
present the parts of that value onto the appropriate byte lanes for a
given endian, or they represent a higher level structure (or byte
stream which may have no structure at all) in which case they should
represent the bytes in ascending address order, which ensures that
memcpy works.

You may notice that both are fundamentally equivalent: one should be
able to memcpy a whole bunch of CSRs and obtain an identical image
in memory that can be "interpreted" with the appropriate endian
accessors.
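
A user-space sketch in C of that equivalence: copy a CSR window byte for byte, then interpret fields from the copy. The 12-byte window size and the names csr_image, csr_snapshot() and get_le32() are made up for the example; in a kernel driver the copy would more likely be done with memcpy_fromio() and the field decoded with an existing little-endian helper.

```c
#include <stdint.h>
#include <string.h>

/*
 * Interpret 4 bytes at ascending addresses as a little-endian value.
 * Built from individual bytes, so the result does not depend on host
 * endianness.
 */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Hypothetical 12-byte window: a LE number followed by a byte string. */
struct csr_image {
    uint8_t bytes[12];
};

/* Byte-address-invariant copy: byte N of the window lands at bytes[N]. */
static void csr_snapshot(struct csr_image *dst, const uint8_t *csr_window)
{
    memcpy(dst->bytes, csr_window, sizeof(dst->bytes));
}
```

Numeric fields go through get_le32() and string-like fields are used as-is from the same image, so the transport never needs to know which kind of data it carries.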

> I'm ok with 32-bits being the "atomic" transfer width. However, I'd
> also like to see 64-bit-and-larger CSRs aligned on a 64-bit
> address boundary, so a 64-bit access can be made by the CPU and
> transparently handled by the 64-to-32 down converter.

Fair enough, that's just a matter of layout. We probably want to "fix" CSR
addresses anyways (another discussion and another github issue), so
that could be part of the "rules" and CSRStorage could even enforce it.

As far as I'm concerned, a detail :-)

> Any additional features (and complexity) are optional as far as I'm
> concerned -- I'm down with whatever as long as there's a well defined
> interface drivers can use that covers all the cases: up-to-64-bit LE
> numeric registers, arbitrary-length byte-string (memcpy) registers,
> and whatever I've missed because *I* personally haven't hit my head
> against it yet... :)

Right, so in Linux terms: readl/writel/readq/writeq for the former,
memcpy_to/from_io for the latter. There's also the FIFO case which is a
variant of the "byte stream" case, for which there is the "s" accessors
(readsl etc..). This is the standard driver toolbox.

> (in the current accessor set -- common.h in the litex bios, litex.h in
> the litex-rocket-rebase kernel branch -- I also had arrays of uint16
> and uint32 embedded inside larger -- 64 or 128 bit -- registers, but
> I suppose if each driver rolls its own for *that* corner case it
> should be acceptable, and rare).

This can be handled locally in drivers.

Cheers,
Ben.

> Thanks,
> --Gabriel
>

Benjamin Herrenschmidt

Jun 15, 2020, 6:56:24 PM6/15/20
to Tim 'mithro' Ansell, linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley
On Mon, 2020-06-15 at 09:12 -0700, Tim 'mithro' Ansell wrote:
> We can then have a Linux configuration option for LiteX CSR bus like
> "LITEX_CSR_SUPPORT_REGMAP_ACCESS_MODE" which enables more than just
> 32bit MMIO access modes? Ben -- does that make sense?

I wouldn't even make it a config option.

I put some pseudo-code in one of my emails, let me know what you think.

Cheers,
Ben.


Benjamin Herrenschmidt

Jun 15, 2020, 7:02:02 PM6/15/20
to Florent Kermarrec, Tim 'mithro' Ansell, linux...@googlegroups.com, Stafford Horne, Joel Stanley
On Tue, 2020-06-16 at 00:37 +0200, Florent Kermarrec wrote:
> In fact we should probably try to allow MMIO in all cases. For this,
> we only have to:
> - 1) make sure a CSR doesn't have gaps in the address space (which is
> what now causes the trouble we have with 8-bit CSR data-width)

This goes against Tim's stated goal of having 8-bit CSRs be a "thing"
for really small setups that don't want to pay the cost of a real bus
down-converter and just wire the low 8-bits of the bus :-)

That said, I would definitely not mind it myself :)

> - 2) make sure the ordering of CSR simples has the right endianness
> (i.e. a 128-bit register would be accessed the same way MMAP memory
> would be).

Yup.

> This would still allow supporting 8-bit CSR data-width and any bridge
> combination.
>
> The changes for this would be:
> - connect the 8-bit CSR bus from a down-converted 8-bit Wishbone
> bus, so that a 32-bit register, instead of being spread over addresses
> x to x+0xc (with only 8 bits populated at each), would sit only at
> address x.
> - verify the endianness (ordering) of CSR simples and, if needed,
> change the order.
> - use a CSR alignment that is at least the CSR data width (to avoid
> two registers landing in the same CSR bus access).
>
> With this, the CPU would no longer need to have a notion of CSR data-
> width and would just access the CSR bus as any other MMAP peripheral.

Sounds good.

Tim 'mithro' Ansell

Jun 15, 2020, 7:29:22 PM6/15/20
to linux...@googlegroups.com, Benjamin Herrenschmidt, Stafford Horne, Joel Stanley
On Mon, 15 Jun 2020 at 15:37, Florent Kermarrec <f.ker...@gmail.com> wrote:
This will break compatibility (addressing for 8-bit CSR data width and maybe order of CSR simples) but it could make our life a lot easier in the future (and avoid such discussion :)), what do you think?

The 8bit CSR on 32bit mode is already supported in multiple external projects, so it would be bad to break all that existing software.

I would suggest any break in compatibility should be done with the gradual creation of a new "CSR2" bus or similar approach (it might be Python API compatible but could be CPU interface incompatible)?

Tim 'mithro' Ansell

 

Tim 'mithro' Ansell

Jun 15, 2020, 7:40:01 PM6/15/20
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley
On Mon, 15 Jun 2020 at 01:52, Benjamin Herrenschmidt <be...@kernel.crashing.org> wrote:
On Mon, 2020-06-15 at 01:26 -0700, Tim 'mithro' Ansell wrote:
> If you want to share gateware between the widest range of firmware,
> 8bit CSRs is the correct current option.

We are talking about Linux-supported setups where the CSRs are exposed
to the kernel. There is no case we can think of (I think Florent and I
agree here) where 8-bit CSRs would have any value.

> Both emulation in QEMU (RISC-V, or1k and lm32) and Renode upstream
> only support 8-bit CSR widths at the moment.

That's easily fixed.

> This is also the mode that both Zephyr upstream and NuttX upstream
> use.

Is there a LiteX driver upstream already ? Can you point me to it ?

 
> Linux only supporting 32bit width CSRs will require all the above to
> be adapted too.
>
> Having a common peripheral bus shared between a core running Linux
> and a core running Zephyr (which is a known configuration we want to
> support) would also become impossible.

All right, I have no idea what Zephyr is,

Zephyr is an RTOS that can be described as "I wanted Linux but only had 64kbytes of RAM".

Tim 'mithro' Ansell

Florent Kermarrec

Jun 15, 2020, 7:41:53 PM6/15/20
to linux...@googlegroups.com, Benjamin Herrenschmidt, Stafford Horne, Joel Stanley
But since this new CSR2 bus would be the one used by Linux and would not support 8-bit (since it is no longer recommended/useful), why not just keep the current CSR, keep 8-bit support for bare-metal/existing software, but just use the 32-bit data-width for Linux SoCs? This would mean Linux drivers could use full MMIO and we would just have to rework the existing Linux software support, which should be doable and less painful than having to use/support accessors.
Florent


Tim 'mithro' Ansell

Jun 15, 2020, 7:50:01 PM6/15/20
to linux...@googlegroups.com, Joel Stanley, Florent Kermarrec, Stafford Horne, Mateusz Holenko
On Sun, 14 Jun 2020 at 20:28, Benjamin Herrenschmidt <be...@kernel.crashing.org> wrote:
For 2) well... not supporting means the simple choice, direct MMIO from devices.

If we go the regmap route instead, which would support all sorts of crazy setups,
then we need to add plumbing to create the regmaps etc.. it complicates things and
it adds I think at least 2 indirect calls on the path to every IO.

We could try to add a "layer" that can do either inlined ... ie, create some kind
of (pseudo-code). [Note: I don't actually *like* this, but it's an option...]

  struct litex_csr_region {
          enum { LITEX_IO_DIRECT, LITEX_IO_REGMAP } model;
          union {
                  void __iomem  *direct_addr;
                  struct regmap *regmap;
          } arg;
  };

And have some helpers to set that up from the DT, which could initially be simple
and only support the direct model, and accessors do something like:

  static inline void litex_write_csr(u32 val, struct litex_csr_region *r,
                                     unsigned int offset)
  {
          if (r->model == LITEX_IO_DIRECT)
                  writel(val, r->arg.direct_addr + offset);
          else
                  regmap_stuff(.....);
  }

And a helper drivers can use to create such region objects from the DT.

How do other drivers handle things like this?

This was what I was thinking when I said a configuration option;
-----

#ifdef LITEX_SUPPORT_REGMAP
  struct litex_csr_region {
          enum { LITEX_IO_DIRECT, LITEX_IO_REGMAP } model;
          union {
                  void __iomem  *direct_addr;
                  struct regmap *regmap;
          } arg;
  };
#else
  struct litex_csr_region {
          struct {
                  void __iomem  *direct_addr;
          } arg;  /* same member path as the regmap build, so r->arg.direct_addr compiles */
  };
#endif

----

  static inline void litex_write_csr(u32 val, struct litex_csr_region *r,
                                     unsigned int offset)
  {
#ifdef LITEX_SUPPORT_REGMAP
          if (r->model == LITEX_IO_REGMAP) {
                  regmap_stuff(.....);
                  return;
          }
#endif
          writel(val, r->arg.direct_addr + offset);
  }

-----
 
But that might be a bad idea.....
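
For readers who want to see the dispatch idea run outside the kernel, here is a self-contained user-space sketch in C. It is not the litex_csr_region code above: plain memory replaces writel(), a small hypothetical csr_transport_ops table plays the role regmap would play, and mock_bus/mock_write are invented stand-ins for a USB or SPI transport.

```c
#include <stdint.h>

/* Hypothetical indirect transport, standing in for what regmap would provide. */
struct csr_transport_ops {
    int (*write)(void *ctx, unsigned int offset, uint32_t val);
};

enum csr_io_model { CSR_IO_DIRECT, CSR_IO_INDIRECT };

struct csr_region {
    enum csr_io_model model;
    union {
        volatile uint32_t *direct_base;     /* memory-mapped window */
        struct {
            const struct csr_transport_ops *ops;
            void *ctx;
        } indirect;                         /* e.g. USB, SPI, SDIO */
    } arg;
};

/* 'offset' is in bytes and assumed to be 32-bit aligned. */
static inline int csr_region_write(struct csr_region *r, unsigned int offset,
                                   uint32_t val)
{
    if (r->model == CSR_IO_DIRECT) {
        /* Fast path: a plain store, no indirect call. */
        r->arg.direct_base[offset / 4] = val;
        return 0;
    }
    /* Slow path: one indirect call into the transport. */
    return r->arg.indirect.ops->write(r->arg.indirect.ctx, offset, val);
}

/* Mock transport that only records the last write it was asked to do. */
struct mock_bus {
    unsigned int writes;
    unsigned int last_offset;
    uint32_t last_val;
};

static int mock_write(void *ctx, unsigned int offset, uint32_t val)
{
    struct mock_bus *bus = ctx;

    bus->writes++;
    bus->last_offset = offset;
    bus->last_val = val;
    return 0;
}

static const struct csr_transport_ops mock_ops = { .write = mock_write };
```

The direct case costs one predictable branch per access, which is the trade-off being weighed against always going through regmap.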

Tim 'mithro' Ansell

Tim 'mithro' Ansell

Jun 15, 2020, 8:03:08 PM6/15/20
to linux...@googlegroups.com, Benjamin Herrenschmidt, Stafford Horne, Joel Stanley
On Mon, 15 Jun 2020 at 16:41, Florent Kermarrec <f.ker...@gmail.com> wrote:
But since this new CSR2 bus would be the one used by Linux and would not support 8-bit (since it is no longer recommended/useful), why not just keep the current CSR, keep 8-bit support for bare-metal/existing software, but just use the 32-bit data-width for Linux SoCs? This would mean Linux drivers could use full MMIO and we would just have to rework the existing Linux software support, which should be doable and less painful than having to use/support accessors.

If we want Linux + Zephyr to share peripherals in an Asymmetric MultiProcessing setup, this might be problematic? The solution might be to expand the Zephyr support (as it is easier to get things merged into Zephyr than into the Linux kernel).

See the following for more information about how AMP systems work;

Thoughts?

Tim 'mithro' Ansell

Benjamin Herrenschmidt

Jun 15, 2020, 8:13:48 PM6/15/20
to Florent Kermarrec, linux...@googlegroups.com, Stafford Horne, Joel Stanley
On Tue, 2020-06-16 at 01:41 +0200, Florent Kermarrec wrote:
> But since this new CSR2 bus would be the one used by Linux and would
> not support 8-bit (since no longer recommended/useful), why not just
> keep the current CSR, keep 8-bit support for Baremetal/existing
> software, but just use the 32-bit data-width for Linux SoCs? This
> would mean Linux drivers could use full MMIO and we would just have
> to rework the existing Linux software support, which should be doable
> and less painful than having to use/support accessors.

It would all work out if we come up with accessors that default to
an MMIO fast path but do have a fallback to something like regmap to
handle all the crazy configurations.

It's doable, might be a bit harder to get upstream because Greg might
just ask "why not just regmap always", to which I would respond
something along the lines of "it's much slower on some setups..." (with
appropriate details) but you get the gist of it.

I'll try to find time this week to whip up something based on the
existing patch series and see where that goes, with the assumption that
the "fast path" case has CSRs always LE.

Note: Florent we still need to sort out the layout of "larger than 32-
bit" CSRs, see the other thread of discussion with Gabriel.

Cheers,
Ben.

Florent Kermarrec

Jun 15, 2020, 8:45:56 PM6/15/20
to Benjamin Herrenschmidt, linux...@googlegroups.com, Stafford Horne, Joel Stanley
For the AMP system with Linux and Zephyr, we'll probably want 32-bit CSRs for performance on Linux, so we would need to add support for them in Zephyr for mixed systems.

Also, the current 8-bit CSR bus has not been designed for interoperability between different systems and will cause issues because of the gaps in the address map. Since it was not designed for this, I'm not sure it's good to encourage interoperability on it. It's probably better to add a requirement that the 8-bit CSR bus is not supported when creating such a system (since it's not possible to downconvert it correctly from the main bus) than to allow every configuration and have to maintain it.
Florent

Benjamin Herrenschmidt

Jun 15, 2020, 8:48:42 PM6/15/20
to Florent Kermarrec, linux...@googlegroups.com, Stafford Horne, Joel Stanley
On Tue, 2020-06-16 at 02:45 +0200, Florent Kermarrec wrote:
> For the AMP system with Linux and Zephyr, we'll probably want 32-bit
> CSR for performance for Linux, so would need to add support for it in
> Zephyr for mixed systems. Also the current 8-bit CSR bus has not been
> designed for interoperability of different systems and will cause
> issues because of the gap in the address map. Since this was not
> designed to do so, I'm not sure it's good to encourage
> interoperability on this and it's probably better to add a
> requirement that for creating such a system 8-bit CSR bus is not
> supported (since it's not possible to downconvert it correctly from
> the main bus) than allowing every configuration and having to
> maintain it.

Ok. The main reason for regmap in my mind wasn't so much the 8-bit
stuff, but Tim's idea of having LiteX devices behind non-memory busses
such as USB or SDIO..

The problem with just using regmap always is that it's cumbersome and
slow, as afaik it adds at least 2 function pointer calls per MMIO.

Cheers,
Ben.


Florent Kermarrec

Jun 16, 2020, 4:01:17 AM6/16/20
to Benjamin Herrenschmidt, linux...@googlegroups.com, Stafford Horne, Joel Stanley
For the NeTV2 example, as soon as PCIe is used, the CSR data-width is automatically restricted to 32-bit (PCIe has a high latency and LitePCIe has been designed and only tested with 32-bit CSR data-width to avoid inefficient accesses from the Host).

So 8-bit CSR data-width will not be possible on several systems and if we want interoperability we should probably not encourage its use and no longer spend too much effort on it.
We could still allow it and support it for specific and delimited cases but should probably focus our efforts supporting 32-bit which will give simplicity, performance and interoperability.

Regarding the multiple CSR buses in a system, that's probably a different topic where multiple SoCs are connected together (through Wishbone or AXI), where we'll have several layers of buses and will want to create a global mapping of the system from the local ones. It's probably a bit early to discuss this here since things still need to be experimented with on the gateware side. On such complex systems, not having a notion of the CSR bus from the top level bus would also be a lot easier: a top level master would just do an MMAP access on any region of the system and this access would be passed/translated/down-up-converted to the sub-SoCs automatically.

Ben: I'll start having a look at the ordering of CSRs for registers > 32 bits, to be sure we can access them as traditional MMAP memory.

Best,

Florent

Benjamin Herrenschmidt

Jun 16, 2020, 8:52:56 AM6/16/20
to Florent Kermarrec, linux...@googlegroups.com, Stafford Horne, Joel Stanley
On Tue, 2020-06-16 at 10:01 +0200, Florent Kermarrec wrote:
> Regarding the multiple CSR buses in a system, that's probably a
> different topic where multiple SoCs are connected together (through
> Wishbone or AXI), where we'll have several layers of buses and will
> want to create a global mapping of the system from the local ones.

Actually in standalone microwatt I have multiple CSR busses already :-)
One per LiteX "IO" I import (today one for litedram and one for
liteeth). But it's no big deal on the Linux side as long as they are
just memory mapped.

> That's probably a bit early to discuss this here since things still
> need to be experimented on the gateware side. On such complex
> systems, not having notion of the CSR bus from the top level bus
> would also be a lot easier: a top level master would just do a MMAP
> access on any region of the system and this access would be
> passed/translated/down-up-converted to the sub-SoCs automatically.

Right, which is why I personally lean toward the option of not having a
centralised notion of a "CSR bus" but just pass the CSR region like any
other MMIO region to the individual devices.

However, if we want to make it easier to one day support the usage
model where LiteX IO blocks are behind non-mapped busses such as USB,
SDIO etc..., then there is room for some accessors that could possibly
wrap that. I have ideas on how to do that in a way that doesn't
preclude the direct map in the normal case.

> Ben: I'll start having a look at the ordering of CSR for registers >
> 32 bit to be sure we can access them as traditional MMAP memory.

Thanks !

Cheers,
Ben.


Mateusz Holenko

Jun 16, 2020, 10:29:26 AM6/16/20
to linux...@googlegroups.com, Florent Kermarrec, Stafford Horne, Joel Stanley
On Tue, Jun 16, 2020 at 12:55 AM Benjamin Herrenschmidt
<be...@kernel.crashing.org> wrote:
>
> On Mon, 2020-06-15 at 10:59 -0400, Gabriel L. Somlo wrote:
> > My $0.02:
> >
> >
> > Re. Linux drivers: I think the idea of a rebasing, clean,
> > "for-upstream" repo containing a current representation of what
> > we think is the "Right Thing (TM)" to work against at any given time
> > is a great one. Also, bonus points if @benh is offering to curate,
> > given his inside knowledge into what's likely to be most palatable
> > to (and encounter least resistance from) upstream.
>
> Let's see, I don't want to walk on the toes of others. The Antmicro
> folks seem to be maintaining the patch series for upstream at the
> moment, so if they want to continue owning that process I won't get in
> the way :-)

Our main goal is to get to a situation where LiteX drivers are in
the mainline kernel, because we believe it would help make LiteX more
popular and could attract new users.

In this context we don't insist on maintaining the patch series ourselves.
In fact, we would be glad to hand it over to Ben if it increased the
chances of getting it upstream.

We plan to contribute to and extend the LiteX support in both Zephyr and
Linux in the future, regardless of who is currently curating it :)

Tim 'mithro' Ansell

Jun 16, 2020, 2:46:25 PM6/16/20
to linux...@googlegroups.com, Benjamin Herrenschmidt, Stafford Horne, Joel Stanley
I'm okay with the 8bit CSR mode being a "legacy mode maintained for compatibility with existing software".

I do think that we want the CSR bus to exist even if it ends up "just" being MMIO in many cases. As I mentioned previously, there are multiple cases where the CSR bus won't end up being MMIO accessible from Linux.
 - The most common will be a device like a RPi or Beaglebone "managing" an FPGA SoC via an SPI or I2C bus.
 - A more theoretical case would be managing an FPGA SoC on a peripheral via USB; things like xobs' wishbone-tool have already shown how powerful that abstraction can be.

I think this means LiteX peripherals should support:
 1) "32bit MMIO CSRs" fast path
 2) any other configuration (such as non-MMIO CSRs, non-32bit CSRs, etc) will only be supported via regmap

Could this be implemented as two "instances" of the same module (liteeth-mmio, liteeth-regmap) which use different sets of accessors so there isn't even a runtime cost here? There are multiple ways this could be implemented -- off the top of my head:
liteeth_base -----
uses LITEX_REG_WRITE
uses LITEX_REG_READ
----------------------

liteeth_mmio ----
#define LITEX_REG_WRITE iowrite
#define LITEX_REG_READ ioread

#include "liteeth_base.c"
----------------------

liteeth_regmap ----
#define LITEX_REG_WRITE regmap_write
#define LITEX_REG_READ regmap_read

#include "liteeth_base.c"
----------------------

I don't know if something like this would be acceptable upstream?
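
To make the idea a bit more concrete, here is a userspace sketch of the compile-time selection, folded into a single file for illustration. Everything in it is hypothetical: `struct fake_csr`, the `mmio_*`/`slow_*` helpers, the `LITEX_USE_SLOW_BUS` switch and the `SCRATCH_OFF` offset are stand-ins, not LiteX or kernel names. In a real driver the MMIO flavour would presumably sit on `iowrite32()`/`ioread32()` and the other one on `regmap_write()`/`regmap_read()`; those take different arguments (and `regmap_read()` hands the value back through a pointer), so thin wrappers like the ones below would be needed rather than a bare `#define`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a block of CSRs. */
struct fake_csr {
    uint32_t regs[8];
    unsigned slow_accesses;   /* counts accesses made through the slow path */
};

/* "MMIO" flavour: plain loads and stores. */
static inline void mmio_write32(struct fake_csr *c, unsigned off, uint32_t v)
{
    assert(off / 4 < 8);
    c->regs[off / 4] = v;
}

static inline uint32_t mmio_read32(struct fake_csr *c, unsigned off)
{
    assert(off / 4 < 8);
    return c->regs[off / 4];
}

/* "Non-MMIO" flavour: same result, but pretends to cross a slow bus. */
static inline void slow_write32(struct fake_csr *c, unsigned off, uint32_t v)
{
    assert(off / 4 < 8);
    c->slow_accesses++;
    c->regs[off / 4] = v;
}

static inline uint32_t slow_read32(struct fake_csr *c, unsigned off)
{
    assert(off / 4 < 8);
    c->slow_accesses++;
    return c->regs[off / 4];
}

/* What liteeth_mmio.c and liteeth_regmap.c would each define before
 * including the shared body. */
#ifdef LITEX_USE_SLOW_BUS
#define LITEX_REG_WRITE slow_write32
#define LITEX_REG_READ  slow_read32
#else
#define LITEX_REG_WRITE mmio_write32
#define LITEX_REG_READ  mmio_read32
#endif

/* What liteeth_base.c would contain: only the macros, no bus knowledge. */
#define SCRATCH_OFF 0x04u   /* hypothetical register offset */

static uint32_t liteeth_base_scratch_test(struct fake_csr *c, uint32_t pattern)
{
    LITEX_REG_WRITE(c, SCRATCH_OFF, pattern);
    return LITEX_REG_READ(c, SCRATCH_OFF);
}
```

Built without `LITEX_USE_SLOW_BUS`, the shared function compiles down to direct loads and stores with no per-access test, which is the point of the exercise.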

Tim 'mithro' Ansell


Benjamin Herrenschmidt

Jun 16, 2020, 10:10:35 PM6/16/20
to linux...@googlegroups.com, Stafford Horne, Joel Stanley
On Tue, 2020-06-16 at 11:46 -0700, Tim 'mithro' Ansell wrote:
> I'm okay with the 8bit CSR mode being a "legacy mode maintained for
> compatibility with existing software".
>
> I do think that we want the CSR bus to exist even if it ends up
> "just" being MMIO in many cases. As I mentioned previously, there are
> multiple cases where the CSR bus won't end up being MMIO accessible
> from Linux.
> - The most common will be a device like a RPi or Beaglebone
> "managing" an FPGA SoC via a SPI or I2C bus.
> - A more theoretical case would be managing an FPGA SoC on a
> peripheral via USB, things like xobs' wishbone-tool have already
> shown how powerful that abstraction can be.
>
> I think this means LiteX peripherals should support;
> 1) "32bit MMIO CSRs" fast path
> 2) any other configuration (such as non-MMIO CSRs, non-32bit CSRs,
> etc) will only be supported via regmap
>
> Could this be implemented as two "instances" of the same module
> (liteeth-mmio, liteeth-regmap) which use different sets of accessors
> so there isn't even a runtime cost here? There are multiple ways this
> could be implemented -- off the top of my head;

It's easier to do the accessor trick I mentioned earlier.

The main thing to figure out then is what does the DT representation of
a non-MMIO device look like. An MMIO one logically lives below the SoC
bus with the standard address ranges mapping for "reg" properties.

Now, if we create the concept of a "CSR bus", do we move LiteX below it
? That would cause weird representation issues for things like LiteEth
that have both CSRs and non-CSR memory ranges.

I'm a bit busy today but give me a few days to try to whip up something
that I think makes sense and we'll see where that goes.

Cheers,
Ben.


Tim 'mithro' Ansell

Jun 16, 2020, 11:23:57 PM6/16/20
to linux...@googlegroups.com, Stafford Horne, Joel Stanley
> > Could this be implemented as two "instances" of the same module
> > (liteeth-mmio, liteeth-regmap) which use different sets of accessors
> > so there isn't even a runtime cost here? There are multiple ways this
> > could be implemented -- off the top of my head;
>
> It's easier to do the accessor trick I mentioned earlier.

That has a runtime cost of a check for every accessor? If we had two separate modules which only get bound to the correct instances (dependent on device tree properties) then there would be no runtime cost, right?

> The main thing to figure out then is what does the DT representation of
> a non-MMIO device look like. An MMIO one logically lives below the SoC
> bus with the standard address ranges mapping for "reg" properties.
>
> Now, if we create the concept of a "CSR bus", do we move LiteX below it
> ? That would cause weird representation issues for things like LiteEth
> that have both CSRs and non-CSR memory ranges.

If I understand device tree correctly, you seem to be able to reference stuff at other locations? I think that is what things like pinctrl and spigpio seem to be doing?

> I'm a bit busy today but give me a few days to try to whip up something
> that I think makes sense and we'll see where that goes.

Looking forward to seeing it...

Tim 'mithro' Ansell
 

Benjamin Herrenschmidt

Jun 17, 2020, 12:24:17 AM6/17/20
to linux...@googlegroups.com, Stafford Horne, Joel Stanley
On Tue, 2020-06-16 at 20:23 -0700, Tim 'mithro' Ansell wrote:
> > > Could this be implemented as two "instances" of the same module
> > > (liteeth-mmio, liteeth-regmap) which use different sets of
> > > accessors so there isn't even a runtime cost here? There are
> > > multiple ways this could be implemented -- off the top of my head;
> >
> > It's easier to do the accessor trick I mentioned earlier.
>
> That has a runtime cost of a check for every accessor? If we had two
> separate modules which only get bound to the correct instances
> (dependent on device tree properties) then there would be no runtime
> cost right?

Good luck getting that #include trick upstream...

I think the cost of a simple test is small enough for a system running
Linux. Better than the two indirect function calls in regmap.
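
For reference, one way to read the "simple test" approach is sketched here in plain userspace C. This is only a guess at the shape of such an accessor, not code from any LiteX tree: `struct litex_csr`, `struct csr_bridge_ops`, `litex_csr_read()`/`litex_csr_write()` and the toy bridge are all hypothetical names. The assumption is that a device carries either a direct pointer to its memory-mapped CSRs or a set of callbacks for a bridge (SPI, USB, ...), and the accessor picks between them with a single pointer test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operations for CSRs that are not memory mapped. */
struct csr_bridge_ops {
    uint32_t (*read)(void *ctx, uint32_t off);
    void (*write)(void *ctx, uint32_t off, uint32_t val);
};

/* Hypothetical per-device CSR handle. */
struct litex_csr {
    volatile uint32_t *mmio;          /* non-NULL: CSRs are directly mapped */
    const struct csr_bridge_ops *ops; /* used only when mmio is NULL */
    void *ctx;                        /* bridge-private data */
};

static inline uint32_t litex_csr_read(const struct litex_csr *csr, uint32_t off)
{
    if (csr->mmio)                    /* the "simple test": one pointer check */
        return csr->mmio[off / 4];
    assert(csr->ops && csr->ops->read);
    return csr->ops->read(csr->ctx, off);
}

static inline void litex_csr_write(const struct litex_csr *csr, uint32_t off,
                                   uint32_t val)
{
    if (csr->mmio) {
        csr->mmio[off / 4] = val;
        return;
    }
    assert(csr->ops && csr->ops->write);
    csr->ops->write(csr->ctx, off, val);
}

/* Toy bridge backend: an array plus a transaction counter. */
struct toy_bridge {
    uint32_t regs[8];
    unsigned transactions;
};

static uint32_t toy_read(void *ctx, uint32_t off)
{
    struct toy_bridge *b = ctx;
    b->transactions++;
    return b->regs[off / 4];
}

static void toy_write(void *ctx, uint32_t off, uint32_t val)
{
    struct toy_bridge *b = ctx;
    b->transactions++;
    b->regs[off / 4] = val;
}

static const struct csr_bridge_ops toy_ops = {
    .read = toy_read,
    .write = toy_write,
};
```

In the direct-mapped case the only overhead is the NULL check; the indirect calls are paid only by devices that actually sit behind a bridge.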

> If I understand device tree correctly, you seem to be able to
> reference stuff at other locations? I think that is what things like
> pinctrl and spigpio seem to be doing?

Sure, you can always have phandles pointing to other nodes.

(Hint: I came up with the whole flat device-tree thing a while ago...
:-)

The question is more where the LiteX devices themselves reside,
since that tends to drive the translation mechanism for MMIO addresses
and the unit-address (the @xxxx part of the name), which is used among
other things for disambiguation.

It's not completely trivial to derive MMIO mappings from two bus
paths... we could have stub nodes linked to the main device under
the CSR bus, but then what happens to the unit-address of a CSR-only
device ?

Overall my thinking is to stick to making the CSRs normal "reg" maps
of the device under the SoC "bus" which is the main address space, and
maybe (optionally ?) point with a phandle toward a "CSR bridge" which
can provide "alternate" CSR access methods.

Cheers,
Ben.


Florent Kermarrec

Jun 17, 2020, 2:27:24 AM6/17/20
to linux...@googlegroups.com, Stafford Horne, Joel Stanley
Hi,

just a clarification on the accesses done through the bridges: it's not really the CSR bus that is bridged but the main bus of the SoC, with the full address space. I don't know if there is a term for this translation mechanism in Linux, but it's something more generic than bridging CSR buses, so we should probably avoid CSR in the naming of this.

Best,

Florent


Martin Peres

Jun 17, 2020, 3:29:00 AM6/17/20
to Linux for LiteX FPGA SoC
On Tuesday, 16 June 2020 21:46:25 UTC+3, Tim 'mithro' Ansell wrote:
> I'm okay with the 8bit CSR mode being a "legacy mode maintained for compatibility with existing software".
>
> I do think that we want the CSR bus to exist even if it ends up "just" being MMIO in many cases. As I mentioned previously, there are multiple cases where the CSR bus won't end up being MMIO accessible from Linux.
>  - The most common will be a device like a RPi or Beaglebone "managing" an FPGA SoC via an SPI or I2C bus.
>  - A more theoretical case would be managing an FPGA SoC on a peripheral via USB; things like xobs' wishbone-tool have already shown how powerful that abstraction can be.
>
> I think this means LiteX peripherals should support:
>  1) "32bit MMIO CSRs" fast path
>  2) any other configuration (such as non-MMIO CSRs, non-32bit CSRs, etc) will only be supported via regmap
>
> Could this be implemented as two "instances" of the same module (liteeth-mmio, liteeth-regmap) which use different sets of accessors so there isn't even a runtime cost here? There are multiple ways this could be implemented -- off the top of my head:
> liteeth_base -----
> uses LITEX_REG_WRITE
> uses LITEX_REG_READ
> ----------------------
>
> liteeth_mmio ----
> #define LITEX_REG_WRITE iowrite
> #define LITEX_REG_READ ioread
>
> #include "liteeth_base.c"
> ----------------------
>
> liteeth_regmap ----
> #define LITEX_REG_WRITE regmap_write
> #define LITEX_REG_READ regmap_read
>
> #include "liteeth_base.c"
> ----------------------
>
> I don't know if something like this would be acceptable upstream?

As a kernel developer for the past 10+ years, IMO the problem with upstream will not be the oddities of accessing the CSRs, but instead how to enumerate the list of blocks available on the device. I wrote up my thoughts on this in a blog article: https://mupuf.org/blog/2020/06/09/FPGA-why-so-few-drivers/

What we really need to nail is a very good interface / ABI. Fixed hardware can get away with whatever interface in the Linux kernel because there is a guarantee that it will not change for a particular version (since it already ships). We thus only need to make sure that the interfaces for each module are *clearly* defined AND *versioned*.
So far, I have only seen the plan to use the device tree to expose the base addresses of each module, but this only really works for executing Linux on the soft CPU. Don't get me wrong, that would already be a huge step forward, but it won't solve the problem of peripherals because the modules won't be discoverable...

To address the problem statement from above, I think we need to co-design the driver/module and have them both as part of the same project, which led to creating: https://gitlab.freedesktop.org/mupuf/litedip. Basically, this is like the device tree, but exposed through the CSR itself, see for yourself: https://gitlab.freedesktop.org/mupuf/litedip/-/blob/fan_wip/litedip/core/__init__.py#L110

The best example about how much I care about making great interfaces would be the sensor module which exposes any sort of sensor without having to know anything about XADC, or any future FPGA we will support: https://gitlab.freedesktop.org/mupuf/litedip/-/blob/fan_wip/litedip/sensors/__init__.py#L30
This is still WIP, as I would like modules themselves to be able to expose sensors, and I would like to create hierarchies per block, which would be a better solution than adding a name to a sensor and hoping the user or software can map each sensor to a function. But at least you can see how far I am willing to go to provide a simple interface to the kernel while not constraining anything on the gateware side.

The problem with all this? Exposing a lot of modules or sensors will end up costing a lot of gates, but we just need to strike a good balance and try to optimize the resource usage by using memory rather than LUTs when possible. On the kernel side, I am trying to make a driver that would be runnable both in userspace and in Linux, but this is a touchy topic in the kernel community, with AFAIK only one driver that managed to pull it off (Nouveau) while all others failed (AMD, <undisclosed list>). I will however abandon any such plans if they jeopardize the ideal of having an upstream kernel driver. Help and criticism are welcome on all sides!

PS: I am new to the hobby, and I have to thank Tim, Florent, and Bunnie for getting me hooked on it! I come from the kernel development / GFX / reverse engineering world, so I definitely have a different perspective on things but I want to learn from you guys!
PPS: I am posting from the googlegroup web interface while trying to subscribe with my normal email address, so don't be surprised when it changes

Benjamin Herrenschmidt

Jun 17, 2020, 5:16:09 AM6/17/20
to linux...@googlegroups.com, Stafford Horne, Joel Stanley
On Wed, 2020-06-17 at 08:27 +0200, Florent Kermarrec wrote:
> just a precision on the accesses done through the bridges: that's not
> really the CSR bus that is bridged but the main bus of the SoC with
> the full address space. I don't know if there is a term for this
> translation mechanism in linux, but that's something more generic
> than bridging CSR buses, so we should probably avoid CSR in the
> naming of this.

What I meant is that the main bus of the SoC is bridged to the CSR bus.

I.e. in the device-tree, the "root" of the tree tends to be the main bus
of the SoC. There's often a soc@ subnode that tends to be used to
gather all the IO together, often as a way to separate the main memory
bus from IOs, but it's not strictly mandatory.

On that bus, we could have a "CSR bridge" node that represents the
main bus -> CSR bridge and contains information about the access modes.

Individual devices could then have a pointer to it.

That said, one could find it weird that the devices don't sit as children
of the CSR bridge itself, with the bridge providing the ranges mapping.
But that would make things very awkward for devices that have both CSR
and pure MMIO.

Cheers,
Ben.


Florent Kermarrec

Jun 18, 2020, 3:10:54 AM6/18/20
to linux...@googlegroups.com, Stafford Horne, Joel Stanley
Hi Ben,

the clarification was just about the previous message, not necessarily yours. Just to make it clear that when an external bridge is used, it's the main bus that is bridged.

I suggest we continue in this direction (try to get CSRs working as MMIO with a 32-bit data width, simplify things, etc.) and we'll figure out the other details while working on it.

Cheers,

Florent


Florent Kermarrec

Jun 18, 2020, 6:30:55 AM6/18/20
to linux...@googlegroups.com
Hi Martin,

thanks for posting an explanation of your work here (I had found and read it a few days before you posted).

That's a really interesting analysis and I really understand the direction you want to go and the potential.

I however don't fully agree on the reasons why only a few drivers related to FPGA hardware are present in the Linux kernel. The main reason is probably that open-hardware is still in its early days compared to open-software and, until a few years ago, FPGA development was mostly done with proprietary tools and for proprietary projects. This is evolving as new languages/tools arise, allowing hardware to be created more easily, and we are clearly seeing a wider adoption of FPGAs in open-hardware projects, so we could expect more contributions to open-source software projects like Linux in the next few years.

Linux running on FPGA with open-hardware is not very new (OpenRISC CPU implementations like Mor1kx have been running Linux since 2014 IIRC) but this was still developed with traditional flows, so it only targeted a few FPGA boards, with a few developers who probably didn't have the time to upstream the drivers that were created.

What has changed since then with projects like LiteX is that it's now easier to target a wide variety of FPGAs with different configurations and different underlying hardware while still using the exact same software API, and we think it could make sense to improve and upstream the Linux drivers for the LiteX cores since this is already a nice step.

LiteX's philosophy is to use the flexibility of the new languages/tools to easily target a wide variety of FPGAs and easily create complex systems, but it's still using an approach very close to embedded systems: it generates non-discoverable hardware. Device trees were created for embedded systems, so using them with LiteX is the natural choice. We are of course working on the interface of the cores to make sure that the flexibility provided by LiteX to generate hardware will not compromise the software API stability, but introducing discoverability would not be easily applicable to the project without revisiting the whole ecosystem of cores, and would not make things very clear on the generated hardware: parts of the hardware would work as non-discoverable hardware, others as discoverable hardware.

So your research introduces some very interesting concepts and potential, but I'm not sure we are ready to switch to something similar for LiteX (at least for now). We probably first need to demonstrate that we can generate systems with FPGAs that are already very capable and very similar to traditional embedded systems already well supported by Linux (while still having only one driver per functionality and not per hardware). If an FPGA SoC builder project goes in the direction of discoverable hardware, it probably should extend the concept even further and fully work as traditional discoverable hardware does. This is probably the next step for FPGAs, but I'm not sure we are ready yet :) (I would however be happy to continue discussing the concepts/ideas with you)

Regards,

Florent


Benjamin Herrenschmidt

Jun 18, 2020, 9:17:23 AM6/18/20
to linux...@googlegroups.com
On Thu, 2020-06-18 at 12:30 +0200, Florent Kermarrec wrote:
> So your research introduces some very interesting concepts and
> potential, but i'm not sure we are ready to switch to something
> similar for LiteX (at least for now). We probably first need to
> demonstrate that we can generate systems with FPGA that are already
> very capable and very similar to traditional embedded systems already
> well supported by Linux (while still having only one driver per
> functionality and not per hardware). If a FPGA SoC builder project
> goes in the direction of discoverable hardware, it probably should
> extend the concept even further and fully work as traditional
> discoverable hardware does. This is probably the next step for FPGAs,
> but i'm not sure we are ready yet :) (i would however be happy to
> continue discussing the concepts/ideas with you)

One approach is to just pack a device-tree blob in a "ROM" of the FPGA
gateware and have some kind of standard interface to retrieve it.
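
As a rough illustration of the retrieval side, assuming the blob simply sits at the start of a memory-mapped ROM window, a probe could check the flattened device tree header before trusting anything else. The function name `dtb_rom_probe()` and the ROM layout are made up for this sketch; the magic value and the big-endian `totalsize` field in the second word come from the flattened device tree format itself. Real code would more likely hand the pointer to libfdt's `fdt_check_header()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC       0xd00dfeedu  /* first word of a flattened device tree */
#define FDT_HEADER_SIZE 40u          /* ten 32-bit fields in a version-17 header */

/* FDT header fields are stored big-endian regardless of CPU endianness. */
static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Look for a device-tree blob at the start of a (hypothetical) ROM window.
 * Returns the blob's total size in bytes, or 0 if no plausible header is
 * found there.
 */
static size_t dtb_rom_probe(const uint8_t *rom, size_t rom_len)
{
    assert(rom != NULL);

    if (rom_len < FDT_HEADER_SIZE)
        return 0;
    if (be32_at(rom) != FDT_MAGIC)
        return 0;

    uint32_t totalsize = be32_at(rom + 4);   /* second header word */
    if (totalsize < FDT_HEADER_SIZE || totalsize > rom_len)
        return 0;

    return totalsize;
}
```

A caller would then copy `dtb_rom_probe()` bytes out of the ROM and pass them on to whatever consumes the blob.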

Linux has mechanisms to inject a new DT fragment at runtime, which I
think were developed for partially reconfigurable FPGAs at runtime.

I'm not super familiar with those mechanisms though. The device-tree
has long outgrown its creators :-) (mind you, my claim to fame is only
the flat representation and its use in Linux for embedded; the concept
itself goes back to Open Firmware, aka IEEE 1275).

Cheers,
Ben.


Ewen McNeill

Jun 19, 2020, 12:15:38 AM6/19/20
to linux...@googlegroups.com
On 2020-06-19 01:15, Benjamin Herrenschmidt wrote:
> Linux has mechanisms to inject a new DT fragment at runtime, which I
> think were developed for partially reconfigurable FPGAs at runtime.
>
> I'm not super familiar with those mechanisms though. The device-tree
> has long outgrown its creators :-)

Modern Device Tree on Linux (eg, ARM based boards like the BeagleBone
series, Raspberry Pi, etc) definitely makes extensive use of Device Tree
Overlays, which are fragments loaded and merged into the Device Tree
(with basically executable style loader fixups to resolve symbols that
reference bits of the already-loaded Device Tree). The binary
representation of those Device Tree Overlays changed a few times in the
last 5 years, but AFAICT has been fairly stable in the last couple of
years (eg, Linux 5.x kernels onwards; but also a bit before that).

While historically there were runtime loadable overlays (eg the
BeagleBone had a "cape" overlay loading mechanism -- the "cape" being a
BeagleBone add on board), as of recent kernels (last year or two?) that
seems to have been removed in favour of only boot time overlays (loaded
by, eg, uBoot). So, eg, for a BeagleBone Device Tree update project
I've been working on for someone, I've got:

uboot_overlay_addr0=/lib/firmware/BB-I2C1-FAST-00A0.dtbo
uboot_overlay_addr1=/lib/firmware/BB-I2C2-FAST-00A0.dtbo
uboot_overlay_addr2=/lib/firmware/bone_eqep1-00A0.dtbo
uboot_overlay_addr3=/lib/firmware/bone_eqep2-00A0.dtbo
uboot_overlay_addr4=/lib/firmware/BB-PWM1-00A0.dtbo
uboot_overlay_addr5=/lib/firmware/BB-CAN1-00A0.dtbo
uboot_overlay_pru=/lib/firmware/AM335X-PRU-UIO-00A0.dtbo

in uBoot's uEnv.txt, plus an entry for a .dtbo for the custom things for
their hardware (all of the above are standard overlays --
https://github.com/beagleboard/bb.org-overlays/).

All of those overlays are merged into a *base* Device Tree, which in the
case of the BeagleBone series is by default determined by the model burned
into some ID ROM, with some uBoot configuration logic.

I believe the Raspberry Pi does something similar, but I've only glanced
at that; see eg
https://www.raspberrypi.org/documentation/configuration/device-tree.md
(At a glance it seems much the same, but with a different boot loader
configuration.)

For LiteX, I think it'd be easy to follow that sort of approach: have a
fairly generic base Device Tree that had "common to every single LiteX
environment" things in it, and then individual Device Tree Overlays for
each component that was enabled (Wishbone buses, CSR buses, RAM buses,
UART(s), liteeth, etc). And have the boot loader load them all in
individually. I *think* uBoot is basically just loading them all into a
chunk of RAM and handing the Linux kernel a descriptor table that says
"here's the DTB and a bunch of DTBOs", and that it's Linux that is doing
the "loader" symbol resolution. (But I've not had to look into that in
detail.)

Device Tree Overlays are definitely the obvious way to handle "hardware
that may or may not be present at runtime" (which is why the BeagleBone
uses them for "capes" and the Raspberry Pi for "HATs", etc). The
BeagleBone examples (link above) give a pretty good idea of how to
structure this approach.

Ewen

Florent Kermarrec

Jun 19, 2020, 3:10:43 AM6/19/20
to linux...@googlegroups.com
Interesting, Device Tree Overlays would be less invasive for LiteX (it could be added without revisiting the cores).

If we go in this direction, I would at first see it used the same way it seems to be used on BeagleBone or Raspberry Pi: only for pluggable elements of the SoC, so for extensions (on connectors, standard SPI/I2C buses, etc.) or for the case of the bridged SoCs we were discussing before: each SoC would provide its DTB(O) as a ROM attached to the main bus, and the first thing we would do when accessing a bridge would be to retrieve the DTB(O) of the external SoC. But the granularity would need to be discussed, to see if this should be attached to a SoC or to each peripheral: attaching it to a SoC makes sense currently since that's the granularity provided by LiteX, but for future use cases like dynamic reconfiguration, it could make sense to have it attached to peripherals.

Best,

Florent


Drew Fustini

Jun 19, 2020, 5:22:49 AM6/19/20
to linux...@googlegroups.com
One thing to keep in mind with device tree overlays is that they have to
be done in the bootloader (which is u-boot for us and almost every arm
board other than rpi).

The concept of device tree overlays was introduced as part of the
launch of the original BeagleBone. While the basic tooling for device
tree overlays has landed upstream, we were never able to land a
runtime mechanism to load and unload overlays. We had an out-of-tree
driver called the cape manager, which we abandoned when bootlin (formerly
free electrons) added overlay support to u-boot.

While we have u-boot overlays working for us, the two main issues for
our users are:

1) If there is an issue with an overlay that causes uboot to fail, then
the user will never see any output unless they are on the console serial
port (which is not the case for the majority of users, since our
out-of-the-box experience is based on USB gadget devices).

2) Any change to overlays requires rebooting. While this seems simple,
some users complain that it is annoying and breaks focus.

So basically all of us making arm boards have given up on run-time
overlays. It would be quite exciting if the FPGA use case were able to
champion a userspace-facing, run-time mechanism for overlays :)
For this to happen, Frank Rowand's concerns would need to be addressed
[1].


thanks,
drew
beagleboard.org

[1] https://elinux.org/Frank%27s_Evolving_Overlay_Thoughts

Olof Kindgren

Jun 19, 2020, 7:04:29 AM6/19/20
to linux...@googlegroups.com
The OHWR group at CERN has been experimenting with self-describing buses since 2010 or so. I took part in the initial specifications but haven't looked at it actively since then. I think much of this has been in upstream Linux for many years, but I also have a vague memory that they realized over time that they got the abstraction level a bit wrong, so it wasn't that useful in the end. Still, I encourage you to look at that and reach out if you're looking to do something similar. They can probably save you a lot of time, especially on the parts that didn't work out.

//Olof

