Re: new language features for "5G" and "edge" use cases

hemant@mnkcg.com
Wed, Jun 16, 2021 11:36 PM

If a packet is not stored in a register, define a new BaaS control which has
one arg as packet_in. I think P4 would need to define a new data struct to
store whole packets.

Hemant

From: Mihai Budiu mbudiu@vmware.com
Sent: Wednesday, June 16, 2021 3:13 PM
To: Gergely Pongracz Gergely.Pongracz@ericsson.com; hemant@mnkcg.com;
jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use
cases

Shouldn't we have this conversation on the design mailing list? More people
may want to weigh in.

We have been making moves towards factoring psa.p4 and pna.p4 into a set of
common libraries, e.g., common externs (like registers). Other libraries
will be useful, like standard protocol header definitions. So I think that
in the long term there will be a set of useful libraries, e.g., timers,
buffers, etc., and a target architecture may include several of them,
signaling in this way what is available.

You cannot store the payload of a packet in a register, only the headers.

Mihai

From: Gergely Pongracz <Gergely.Pongracz@ericsson.com>
Sent: Wednesday, June 16, 2021 2:31 AM
To: Mihai Budiu <mbudiu@vmware.com>; hemant@mnkcg.com;
jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use
cases

Hi,

I think it would make sense to consider whether such functionality could be
implemented on non-CPU systems or not. If not (always), then I fully agree to
have them as externs. But in my view it would of course be better to have
them as part of PSA or PNA or some future, widely adopted architecture. Of
course these calls will use "extern" functions in the underlying system, but
having them as part of a few generic architectures would be good because
that would mean better portability of P4 code.

Having buffers in a Tofino-like device doesn't seem to be impossible: as
Hemant pointed out below, one could use register arrays for that purpose. Of
course the size is limited, but the functionality seems to be there.

I guess the issue is more how we implement callbacks on a Tofino-like
device.

One workaround could be to have a register array with the packets and
timeout events, just as Hemant described below, and on each (or each Nth)
packet arrival we'd check the arrival time and compare it with the timer in
the first entry of the register array. If we exceeded the timer, we'd clone
the packet and do the timer-based event on the copy (basically dropping
the original packet and getting the buffered one to work on). This could fly
for one specific timeout (e.g. a fixed 100 msec resend timer), in which case
the register array contains events as an ordered list (well, better to say a
ring buffer, but the timestamps are still constantly ascending). Of course
this is nasty and should be hidden under some nice API; I just wanted to
check whether it could work or not.
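
The workaround described above can be modelled in a few lines of Python to check the logic. This is only a sketch under assumptions, not P4 and not target code: the names TimeoutRing, buffer and on_arrival are made up for illustration, the deque stands in for the register array, and the bytes value stands in for whatever an entry could actually hold on a given target. It relies on the single fixed timeout, which keeps deadlines in ascending order so that only the first entry needs checking.

```python
from collections import deque
from typing import Optional

TIMEOUT_MS = 100  # the fixed resend timeout used as the example above


class TimeoutRing:
    """Fixed-capacity FIFO of (deadline, packet) entries.

    Because every entry uses the same timeout, deadlines are ascending,
    so the oldest entry is always the next one to expire.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = deque()  # stand-in for the register array

    def buffer(self, now_ms: int, packet: bytes) -> bool:
        """Store a packet with its deadline; return False if the ring is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((now_ms + TIMEOUT_MS, packet))
        return True

    def on_arrival(self, now_ms: int) -> Optional[bytes]:
        """Run on each arriving packet: check only the first entry.

        Returns the buffered packet if its deadline has passed (the caller
        would then work on it instead of the arriving packet), else None.
        At most one entry is released per arrival.
        """
        if self.entries and self.entries[0][0] <= now_ms:
            _, packet = self.entries.popleft()
            return packet
        return None
```

Note that in this model a timeout is only noticed when some packet arrives, and at most one expired entry is handled per arrival, so on a quiet link the resend can be late by an arbitrary amount.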

Gergely

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Monday, June 14, 2021 7:55 PM
To: hemant@mnkcg.com; Gergely Pongracz <Gergely.Pongracz@ericsson.com>;
jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use
cases

I don't know of any programming language where timers are a built-in
construct; they are always library functions, i.e., externs for us. So I
don't see a choice. Perhaps the question is "which architecture/library file
should the timers be a part of?"

Mihai

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Friday, June 11, 2021 4:11 PM
To: hemant@mnkcg.com; Gergely.Pongracz@ericsson.com;
jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

I don't see a way out unless start and stop timer are added as new externs.

Gergely doesn't like externs because the code is not portable.  But the new
externs can be added for all software p4c backends. If externs are not used,
what other choice do we have?  Maybe we can discuss this question in the
June 14th LDWG meeting.

Thanks,

Hemant

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Thursday, June 10, 2021 11:52 AM
To: Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Here is strawman P4 events code to implement BaaS. shared_register is an
extern used in the Ibanez paper.

enum Events {
    set_timer,
    del_timer,
    exp_timer
}

struct metadata_t {
    Events    ev;
    .
}

The parser sets events for set_timer and del_timer. When set_timer is used,
a callback is registered which sets exp_timer event.

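    // Register array holding the buffered entry for each timer handle.
    shared_register<bit<32>>(NUM_REGS) bufSize_reg;

    bit<64> expire_time = 2;

    if (meta.ev == Events.set_timer) {
        // Start a timer and buffer the header under its handle.
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    } else if (meta.ev == Events.del_timer) {
        // Cancel the timer and clear the buffered entry.
        timer_stop(meta.handle);
        bufSize_reg.write(meta.handle, 0);
    } else if (meta.ev == Events.exp_timer) {
        // Timer expired: resend, then re-arm the timer and buffer again.
        resend(hdr);
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    }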

We will need timer_start() and timer_stop() services from a Timer block
similar to the Traffic Manager in an architecture. The code can be extended to
support multiple timers. Right now, the code uses one timer and thus a
single meta.handle is used.
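
To make the intended event flow concrete, here is a small Python model of the strawman. It is a sketch under assumptions, not an implementation of any real extern: TimerBlock, Pipeline, pop_expired and tick are hypothetical names, the dict stands in for the register array, and the explicit `now` argument is an assumption of the model (the strawman's timer_start() takes only the expiry time). Unlike the strawman, it hands out a fresh handle per timer.

```python
import heapq
from enum import Enum, auto
from itertools import count


class Event(Enum):
    SET_TIMER = auto()
    DEL_TIMER = auto()
    EXP_TIMER = auto()


class TimerBlock:
    """Hypothetical timer service: hands out handles, reports expirations."""

    def __init__(self):
        self._heap = []        # (deadline, handle), earliest deadline first
        self._live = set()     # handles that were started and not stopped
        self._next = count(1)

    def timer_start(self, now, expire_time):
        handle = next(self._next)
        heapq.heappush(self._heap, (now + expire_time, handle))
        self._live.add(handle)
        return handle

    def timer_stop(self, handle):
        # Lazy deletion: the heap entry is skipped when it surfaces.
        self._live.discard(handle)

    def pop_expired(self, now):
        """Return handles of live timers whose deadline has passed."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, handle = heapq.heappop(self._heap)
            if handle in self._live:
                self._live.remove(handle)
                fired.append(handle)
        return fired


class Pipeline:
    EXPIRE_TIME = 2

    def __init__(self):
        self.timers = TimerBlock()
        self.buf = {}      # handle -> buffered header (the register array)
        self.resent = []   # headers that were resent, in order

    def process(self, now, ev, hdr=None, handle=None):
        if ev is Event.SET_TIMER:
            handle = self.timers.timer_start(now, self.EXPIRE_TIME)
            self.buf[handle] = hdr
        elif ev is Event.DEL_TIMER:
            self.timers.timer_stop(handle)
            self.buf.pop(handle, None)
        elif ev is Event.EXP_TIMER:
            # Resend the buffered header, then re-arm under a new handle.
            hdr = self.buf.pop(handle)
            self.resent.append(hdr)
            handle = self.timers.timer_start(now, self.EXPIRE_TIME)
            self.buf[handle] = hdr
        return handle

    def tick(self, now):
        """The 'callback': turn each expired timer into an exp_timer event."""
        for handle in self.timers.pop_expired(now):
            self.process(now, Event.EXP_TIMER, handle=handle)
```

For example, after `p = Pipeline()` and `p.process(0, Event.SET_TIMER, hdr="pkt1")`, calling `p.tick(2)` resends "pkt1" and re-arms the timer; a later DEL_TIMER with the new handle stops further resends.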

Hemant

From: Gergely Pongracz via P4-design <p4-design@lists.p4.org>
Sent: Friday, June 04, 2021 5:14 AM
To: Nate Foster <jnfoster@cs.cornell.edu>
Cc: p4-design <p4-design@lists.p4.org>
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Hi Nate,

Sorry for the long delay. I uploaded our example code here:
https://github.com/P4ELTE/use_cases/tree/master/p4-16/bst

The main code is cpf_ran.p4. In theory it supports TNA, PSA and V1 - we
used Tofino's compiler and t4p4s (basically p4c with a DPDK backend) for
compiling and running.

Buffering would be executed in the RANDownlink() control block. For now we
add a special header and send the packet towards the service IP of the BaaS
(buffer-as-a-service), which currently runs as a Kubernetes service. We could
just as well clone the packet and send it directly to the downlink path
while sending the copy to the buffer, but for now it is sent to the buffer
and, on successful buffering, the BaaS service returns the packet - this way
we know that the original packet is buffered and the timeout counter is
started.

Regarding your questions:

  1. You are right, maybe it could be solved by an extern, similarly to how
    we solve it with a non-P4 component. On the other hand, I don't
    particularly like having too many architectures around, as that kills one
    of the main advantages of P4 (to my knowledge), which is portability. So
    I'd rather go for a language change here - for me the only reason not to
    do that would be if the task were impossible to support on some hardware
    targets. You know the language much better, but I'd say buffering a few
    packets could be similar to having a few more registers. So buffering
    itself doesn't seem a huge issue to me. Running timers and assigning
    events to them, on the other hand, might be a bigger change, as
    potentially there would be a large number of parallel timers - and of
    course there are good data structures for that, but are they hardware
    friendly enough? Ibanez's presentation suggests it can be done fairly
    simply on FPGA.
  2. According to the presentation I think the proposed solution -
    especially if all proposed primitives on slide 6 were implemented - is a
    superset of what we'd need (I'd say for us enqueue, dequeue and timer
    expiration would be enough). So if Ibanez's proposal were part of the
    language, we wouldn't need more (at least for now).
  3. Yes, if you have a look at the code you'll see that we already use
    control blocks for modularizing the code. With Tofino it's sometimes not
    straightforward, as the compiler tends to use more stages in this case
    than when you use fewer control blocks (this issue was also mentioned in
    the uP4 talk). As I understood, Lyra is a higher-layer solution for
    portability over multiple DSLs, so I guess that would be handy if
    portability were still an issue even in the long term. I think Lyra's
    composition part could deal with composing multiple modules / programs on
    a single switch - I guess you referred to this feature, but I don't think
    we'd need a Lyra-like engine in the long run.

So my only remaining question: are the proposals from Ibanez & co. already
being considered by some of the working groups, e.g. LDWG? If yes, I'll go
through the details as that is quite likely a good solution for us too.

Thanks!

Gergely

From: Nate Foster <jnfoster@cs.cornell.edu>
Sent: Tuesday, June 1, 2021 3:51 PM
To: Gergely Pongracz <Gergely.Pongracz@ericsson.com>
Cc: p4-design <p4-design@lists.p4.org>
Subject: Re: [P4-design] new language features for "5G" and "edge" use cases

Hi Gergely,

The best way to get involved is to attend the monthly P4 LDWG meeting. For
bringing proposals, we have a process discussed on the README.md for the
p4-spec repository (https://github.com/p4lang/p4-spec). Basically we are
happy to entertain a high-level proposal, resulting in a thumbs up or thumbs
down. For detailed proposals, we expect to see a number of things fully
worked out: specification language changes, prototype implementation, and
example programs. Suffice it to say, this is a lot of work, so it's best to
either be certain you want to pursue it and are able to follow through, or
be able to convince others to help you out.

Responding to these topics:

  1. Does this need a language change or could these buffers be modeled with
    an extern? If so, then no language change is needed, just an architecture
    that supports buffering.

  2. Is the idea the same as Ibanez et al.'s notion of events
    (https://dl.acm.org/doi/10.1145/3365609.3365848)? If not, what language
    changes would you propose?

  3. Simple forms of modularity can be accomplished by treating controls as
    composable units. Note that they support constructors with
    statically-determined parameters. Otherwise, have you looked at Lyra and
    MicroP4 from SIGCOMM '20? (Minor: we have shifted to inclusive technology,
    so I recommend using the name "primary" and not the one you used.)

-N

On Tue, Jun 1, 2021 at 8:56 AM Gergely Pongracz via P4-design
<p4-design@lists.p4.org> wrote:

Hi,

On the P4 WS we presented a use case that is I admit not very unique these
days (implementing 5G network functions with P4), but while we were
implementing these NFs we came across a few limitations in P4 and started to
wonder whether some new features solving these issues could be part of P4 in
the future.

There are a few slides on these:
https://opennetworking.org/wp-content/uploads/2021/05/Gergely-Pongracz-Slides.pdf

Basically there are the following 3 cases:

  1. Short-term (~1 RTT) buffering: this would be handy for
    segmentation-reassembly and retransmit loop use cases. Basically some
    "buffer" and "remove" actions would be needed, preferably with timer
    support.
  2. More generic buffering and time-based events: for programmable
    traffic management, packet scheduling or keepalive messages we could use a
    more generic method. Possibly the problem is very similar to the previous
    one at the API level, but in these cases there are probably fewer limits
    on the buffer size, which could make this a bit more tricky from a
    hardware development perspective.
  3. Modular pipelines: this would mean that we could specify multiple
    pipelines and a "master" pipeline that would call the underlying ones as
    subroutines. Perhaps more than 2 layers could be allowed, e.g. a "master"
    can also be re-used as a module by a higher layer pipeline. Probably this
    could be solved entirely at the compiler level.

We also described a few less important cases (e.g. re-using tables,
registers, etc.), but we could easily find workarounds to overcome those
limitations, so let's focus on the ones above.

I'm quite new to this community although we've been using P4 for a while
now. So I don't really know what is the best way to start discussing these
issues and if you find these useful, how to start working on (some of)
these.

Thanks for any hints and help.

BR,

Gergely


P4-design mailing list -- p4-design@lists.p4.org
To unsubscribe send an email to p4-design-leave@lists.p4.org

hemant@mnkcg.com
Thu, Jun 17, 2021 3:24 PM

To store packets, I would need an array of Packet_in in P4 which is a new
construct for P4.

Hemant

From: Hemant Singh via P4-design p4-design@lists.p4.org
Sent: Wednesday, June 16, 2021 7:37 PM
To: mbudiu@vmware.com; Gergely.Pongracz@ericsson.com;
jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

If a packet is not stored in a register, define a new BaaS control which has
one arg as packet_in. I think P4 would need to define a new data struct to
store whole packets.

Hemant

From: Mihai Budiu <mbudiu@vmware.com mailto:mbudiu@vmware.com >
Sent: Wednesday, June 16, 2021 3:13 PM
To: Gergely Pongracz <Gergely.Pongracz@ericsson.com
mailto:Gergely.Pongracz@ericsson.com >; hemant@mnkcg.com
mailto:hemant@mnkcg.com ; jnfoster@cs.cornell.edu
mailto:jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use
cases

Shouldn't we have this conversation on the design mailing list? More people
may want to weigh in.

We have been making moves towards factoring psa.p4 and pna.p4 into a set of
common libraries, e.g., common externs (like registers). Other libraries
will be useful, like standard protocol header definitions. So I think that
in the long term there will be a set of useful libraries, e.g., timers,
buffers, etc., and a target architecture may include several of them,
signaling in this way what is available.

You cannot store the payload of a packet in a register, only the headers.

Mihai

From: Gergely Pongracz <Gergely.Pongracz@ericsson.com
mailto:Gergely.Pongracz@ericsson.com >
Sent: Wednesday, June 16, 2021 2:31 AM
To: Mihai Budiu <mbudiu@vmware.com mailto:mbudiu@vmware.com >;
hemant@mnkcg.com mailto:hemant@mnkcg.com ; jnfoster@cs.cornell.edu
mailto:jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use
cases

Hi,

I think it would make sense to think whether such functionality could be
implemented on non-CPU systems or not. If not (always) then I fully agree to
have them as externs. But it would be of course in my view better to have
them as part of PSA or PNA or some future, widely adopted architecture. Of
course these calls will use "extern" functions in the underlying system, but
having them as part of a few generic architectures would be good because
that would mean better portability of the P4 codes.

Having buffers in a Tofino-like device doesn't seem to be impossible, as
Hemant pointed out below one could use register arrays for that purpose. Of
course the size is limited, but the functionality seems to be there.

I guess the issue is more how we implement callbacks on a Tofino-like
device.

One workaround could be to have a register array with the packets and
timeout events just how Hemant described below and on each (or each Nth)
packet arrival we'd check the arrival time and compare it with the timer in
the first entry of the reg array. If we exceeded the timer we'd clone the
packet and would do the timer based event on the copy (basically dropping
the original packet and getting the buffered one to work on). This could fly
for one specific timeout (e.g. a fix 100 msec resend timer), in which case
the register array contains events as an ordered list (well, better say a
ring buffer, but still the timestamps are constantly ascending). Of course
this is nasty and should be hidden under some nice API, I just wanted to
check whether it could work or not.

Gergely

From: Mihai Budiu <mbudiu@vmware.com mailto:mbudiu@vmware.com >
Sent: Monday, June 14, 2021 7:55 PM
To: hemant@mnkcg.com mailto:hemant@mnkcg.com ; Gergely Pongracz
<Gergely.Pongracz@ericsson.com mailto:Gergely.Pongracz@ericsson.com >;
jnfoster@cs.cornell.edu mailto:jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use
cases

I don't know of any programming language where timers are a built-in
construct, they are always library functions, i.e., externs for us. So I
don't see a choice. Perhaps the question is "which architecture/library file
should the timers be a part of?"

Mihai

From: Hemant Singh via P4-design <p4-design@lists.p4.org
mailto:p4-design@lists.p4.org >
Sent: Friday, June 11, 2021 4:11 PM
To: hemant@mnkcg.com mailto:hemant@mnkcg.com ;
Gergely.Pongracz@ericsson.com mailto:Gergely.Pongracz@ericsson.com ;
jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org mailto:p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

I don't see a way out unless start and stop timer are added as new externs.

Gergely doesn't like externs because the code is not portable.  But the new
externs can be added for all software p4c backends. If externs are not used,
what other choice do we have?  Maybe we can discuss this question in the
June 14th LDWG meeting.

Thanks,

Hemant

From: Hemant Singh via P4-design <p4-design@lists.p4.org
mailto:p4-design@lists.p4.org >
Sent: Thursday, June 10, 2021 11:52 AM
To: Gergely.Pongracz@ericsson.com mailto:Gergely.Pongracz@ericsson.com ;
jnfoster@cs.cornell.edu mailto:jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org mailto:p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Here is strawman  P4 events code to implement BaaS. shared_register is an
extern used by the Ibanez paper

enum Events {
    set_timer,
    del_timer,
    exp_timer
}

struct metadata_t {
    Events    ev;
    ...
}

The parser sets events for set_timer and del_timer. When set_timer is used,
a callback is registered which sets exp_timer event.

    shared_register<bit<32>>(NUM_REGS) bufSize_reg;
    bit<64> expire_time = 2;

    if (meta.ev == Events.set_timer) {
        // Start a timer and remember what to resend when it expires.
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    } else if (meta.ev == Events.del_timer) {
        // Cancel the timer and clear the buffered entry.
        timer_stop(meta.handle);
        bufSize_reg.write(meta.handle, 0);
    } else if (meta.ev == Events.exp_timer) {
        // Timer expired: resend, then re-arm the timer.
        resend(hdr);
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    }

We will need timer_start() and timer_stop() services from a Timer block,
similar to the Traffic Manager in an architecture. The code can be extended
to support multiple timers. Right now, the code uses one timer, so a single
meta.handle is used.

Hemant

From: Gergely Pongracz via P4-design <p4-design@lists.p4.org>
Sent: Friday, June 04, 2021 5:14 AM
To: Nate Foster <jnfoster@cs.cornell.edu>
Cc: p4-design <p4-design@lists.p4.org>
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Hi Nate,

Sorry for the long delay. I uploaded our example code here:
https://github.com/P4ELTE/use_cases/tree/master/p4-16/bst

The main code is cpf_ran.p4. In theory it supports TNA, PSA and V1 - we
used Tofino's compiler and t4p4s (basically p4c with a DPDK backend) for
compiling and running.

Buffering would be executed in the RANDownlink() control block. For now we
add a special header and send the packet towards the service IP of the BaaS
(buffer-as-a-service), which currently runs as a Kubernetes service. We could
just as well clone the packet, sending the original directly to the downlink
path and the copy to the buffer. Instead, the packet is sent to the buffer,
and on successful buffering the BaaS service returns it - this way we know
that the original packet is buffered and the timeout counter is started.

Regarding your questions:

  1. You are right, maybe it could be solved by an extern, similar to how we
    solve it now with a non-P4 component. On the other hand, I don't
    particularly like having too many architectures around, as that kills one
    of the main advantages of P4 (to my knowledge), which is portability. So
    I'd rather go for a language change here - for me the only reason not to
    do that would be if the task were impossible to support on some hardware
    targets. You know the language much better, but I'd say buffering a few
    packets could be similar to having a few more registers, so buffering
    itself doesn't seem a huge issue to me. Running timers and assigning
    events to them, on the other hand, might be a bigger change, as
    potentially there would be a large number of parallel timers - and of
    course there are good data structures for that, but are they hardware
    friendly enough? Ibanez's presentation suggests it can be done fairly
    simply on an FPGA.
  2. According to the presentation, I think the proposed solution -
    especially if all the primitives proposed on slide 6 were implemented -
    is a superset of what we'd need (I'd say for us enqueue, dequeue and
    timer expiration would be enough). So if Ibanez's proposal became part of
    the language, we wouldn't need more (at least for now).
  3. Yes, if you have a look at the code you'll see that we already use
    control blocks for modularizing the code. With Tofino it's sometimes not
    straightforward, as the compiler tends to use more stages in this case
    than when you use fewer control blocks (this issue was also mentioned in
    the uP4 talk). As I understood it, Lyra is a higher-layer solution for
    portability over multiple DSLs, so I guess it would be handy if
    portability remained an issue even in the long term. I think Lyra's
    composition part could deal with composing multiple modules / programs on
    a single switch - I guess you referred to this feature, but I don't think
    we'd need a Lyra-like engine in the long run.

So my only remaining question: are the proposals from Ibanez & co. already
being considered by some of the working groups, e.g. LDWG? If yes, I'll go
through the details, as that is quite likely a good solution for us too.

Thanks!

Gergely

From: Nate Foster <jnfoster@cs.cornell.edu>
Sent: Tuesday, June 1, 2021 3:51 PM
To: Gergely Pongracz <Gergely.Pongracz@ericsson.com>
Cc: p4-design <p4-design@lists.p4.org>
Subject: Re: [P4-design] new language features for "5G" and "edge" use cases

Hi Gergely,

The best way to get involved is to attend the monthly P4 LDWG meeting. For
bringing proposals, we have a process discussed in the README.md for the
p4-spec repository (https://github.com/p4lang/p4-spec). Basically we are
happy to entertain a high-level proposal, resulting in a thumbs up or thumbs
down. For detailed proposals, we expect to see a number of things fully
worked out: specification language changes, prototype implementation, and
example programs. Suffice it to say, this is a lot of work, so it's great to
either be certain you want to pursue it and are able to follow through, or
to convince others to help you out.

Responding to these topics:

  1. Does this need a language change or could these buffers be modeled with
    an extern? If so, then no language change is needed, just an architecture
    that supports buffering.

  2. Is the idea the same as Ibanez et al.'s notion of events
    (https://dl.acm.org/doi/10.1145/3365609.3365848)? If not, what language
    changes would you propose?

  3. Simple forms of modularity can be accomplished by treating controls as
    composable units. Note that they support constructors with
    statically-determined parameters. Otherwise, have you looked at Lyra and
    MicroP4 from SIGCOMM '20? (Minor: we have shifted to inclusive terminology,
    so I recommend using the name "primary" and not the one you used.)

-N

On Tue, Jun 1, 2021 at 8:56 AM Gergely Pongracz via P4-design
<p4-design@lists.p4.org> wrote:

Hi,

On the P4 WS we presented a use case that is I admit not very unique these
days (implementing 5G network functions with P4), but while we were
implementing these NFs we came across a few limitations in P4 and started to
wonder whether some new features solving these issues could be part of P4 in
the future.

There are a few slides on these:
https://opennetworking.org/wp-content/uploads/2021/05/Gergely-Pongracz-Slides.pdf

Basically there are the following 3 cases:

  1. Short-term (~1 RTT) buffering: this would be handy for
    segmentation-reassembly and retransmit loop use cases. Basically some
    "buffer" and "remove" actions would be needed, preferably with timer
    support.
  2. More generic buffering and time-based events: for programmable
    traffic management, packet scheduling or keepalive messages we could use a
    more generic method. The problem is possibly very similar to the previous
    one at the API level, but in these cases there are probably fewer
    limitations on the buffer size, which could make this a bit more tricky
    from a hardware development perspective.
  3. Modular pipelines: this would mean that we could specify multiple
    pipelines and a "master" pipeline that would call the underlying ones as
    subroutines. Perhaps more than 2 layers could be allowed, e.g. a "master"
    could also be re-used as a module by a higher-layer pipeline. Probably
    this could be solved entirely at the compiler level.

We also described a few less important cases (e.g. re-using tables,
registers, etc.), but we could easily find workarounds to overcome those
limitations, so let's focus on the ones above.

I'm quite new to this community although we've been using P4 for a while
now. So I don't really know what is the best way to start discussing these
issues and if you find these useful, how to start working on (some of)
these.

Thanks for any hints and help.

BR,

Gergely


P4-design mailing list -- p4-design@lists.p4.org
To unsubscribe send an email to p4-design-leave@lists.p4.org

MB
Mihai Budiu
Fri, Jun 18, 2021 12:17 AM

Packet_in is an extern. There are no operations on externs except instantiation and method calls.
In particular, there is no assignment between externs (except compile-time binding as parameters).
If you want to do something like this, you will probably have to invent a new extern to represent a dynamic array.
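
A minimal sketch of what such an extern might look like, assuming a fixed-capacity, slot-indexed store rather than a truly dynamic array. PacketBuffer and its methods are hypothetical names chosen for illustration; nothing like this exists in core.p4, PSA or PNA today.

```p4
#include <core.p4>

// Hypothetical extern, not part of core.p4, PSA or PNA.
// It models a fixed-capacity store of whole packets, addressed by slot index.
// The packet itself never appears as a P4 value; the program only sees slots.
extern PacketBuffer {
    // capacity: number of packet slots, fixed at compile time.
    PacketBuffer(bit<32> capacity);

    // Copy the packet currently being processed into the given slot.
    void store(in bit<32> slot);

    // Schedule the packet held in the slot for (re)transmission.
    void resend(in bit<32> slot);

    // Free the slot so it can be reused.
    void release(in bit<32> slot);
}
```

Because only instantiation and method calls are involved, a declaration of this shape stays within what the language already allows for externs; whether a given target can implement it is a separate question.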

Mihai

From: hemant@mnkcg.com
Sent: Thursday, June 17, 2021 8:24 AM
To: hemant@mnkcg.com; Mihai Budiu <mbudiu@vmware.com>; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

To store packets, I would need an array of Packet_in in P4 which is a new construct for P4.

Hemant

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Wednesday, June 16, 2021 7:37 PM
To: mbudiu@vmware.com; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

If a packet is not stored in a register, define a new BaaS control which has one arg as packet_in. I think P4 would need to define a new data struct to store whole packets.

Hemant

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Wednesday, June 16, 2021 3:13 PM
To: Gergely Pongracz <Gergely.Pongracz@ericsson.com>; hemant@mnkcg.com; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

Shouldn't we have this conversation on the design mailing list? More people may want to weigh in.

We have been making moves towards factoring psa.p4 and pna.p4 into a set of common libraries, e.g., common externs (like registers). Other libraries will be useful, like standard protocol header definitions. So I think that in the long term there will be a set of useful libraries, e.g., timers, buffers, etc., and a target architecture may include several of them, signaling in this way what is available.

You cannot store the payload of a packet in a register, only the headers.

Mihai

From: Gergely Pongracz <Gergely.Pongracz@ericsson.com>
Sent: Wednesday, June 16, 2021 2:31 AM
To: Mihai Budiu <mbudiu@vmware.com>; hemant@mnkcg.com; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

Hi,

I think it would make sense to consider whether such functionality could be implemented on non-CPU systems or not. If not (always), then I fully agree to have them as externs. But in my view it would of course be better to have them as part of PSA or PNA or some future, widely adopted architecture. Of course these calls will use "extern" functions in the underlying system, but having them as part of a few generic architectures would be good because that would mean better portability of P4 code.

Having buffers in a Tofino-like device doesn't seem to be impossible; as Hemant pointed out below, one could use register arrays for that purpose. Of course the size is limited, but the functionality seems to be there.

I guess the issue is more how we implement callbacks on a Tofino-like device.

One workaround could be to have a register array with the packets and timeout events, just as Hemant described below, and on each (or each Nth) packet arrival we'd check the arrival time and compare it with the timer in the first entry of the register array. If we exceeded the timer, we'd clone the packet and do the timer-based event on the copy (basically dropping the original packet and getting the buffered one to work on). This could fly for one specific timeout (e.g. a fixed 100 msec resend timer), in which case the register array contains events as an ordered list (well, more precisely a ring buffer, but the timestamps are still constantly ascending). Of course this is nasty and should be hidden under some nice API; I just wanted to check whether it could work or not.
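
The check described above could be sketched roughly as follows. The sketch is written against v1model only because its register extern and ingress_global_timestamp field are publicly available; a Tofino-like target would use its own register API. RING_SIZE, the head register, the convention that a deadline of 0 marks an empty slot, and the timer_expired flag are all assumptions made for illustration. It is a fragment (no parser or main package) and covers only the expiry check; storing the packet, filling the ring and the clone/resend step are left out, and timestamp wraparound is ignored.

```p4
#include <core.p4>
#include <v1model.p4>

// Assumed ring capacity; not taken from the thread.
const bit<32> RING_SIZE = 1024;

struct headers_t { }

struct metadata_t {
    bit<1> timer_expired;   // set when the oldest buffered entry has timed out
}

control CheckOldestTimer(inout headers_t hdr,
                         inout metadata_t meta,
                         inout standard_metadata_t std_meta) {
    // Absolute expiry time of each buffered entry, in arrival order.
    // Assumption: 0 means "slot empty" (registers are assumed to start at 0).
    register<bit<48>>(RING_SIZE) deadlines;
    // Single-cell register holding the index of the oldest entry.
    register<bit<32>>(1) head;

    apply {
        bit<32> head_idx;
        bit<48> deadline;

        head.read(head_idx, 0);
        deadlines.read(deadline, head_idx);

        meta.timer_expired = 0;
        // With one fixed timeout the deadlines ascend around the ring,
        // so only the oldest entry needs to be compared with "now".
        if (deadline != 0 && std_meta.ingress_global_timestamp >= deadline) {
            meta.timer_expired = 1;
            deadlines.write(head_idx, 0);
            // Advance the head, wrapping at the end of the ring.
            if (head_idx == RING_SIZE - 1) {
                head.write(0, 0);
            } else {
                head.write(0, head_idx + 1);
            }
        }
    }
}
```

The surrounding pipeline would then test meta.timer_expired and carry out the timer-based event on a cloned packet, as described in the paragraph above.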

Gergely

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Monday, June 14, 2021 7:55 PM
To: hemant@mnkcg.com; Gergely Pongracz <Gergely.Pongracz@ericsson.com>; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

I don't know of any programming language where timers are a built-in construct; they are always library functions, i.e., externs for us. So I don't see a choice. Perhaps the question is "which architecture/library file should the timers be a part of?"

Mihai

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Friday, June 11, 2021 4:11 PM
To: hemant@mnkcg.com; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

I don't see a way out unless start and stop timer are added as new externs.
Gergely doesn't like externs because the code is not portable.  But the new externs can be added for all software p4c backends. If externs are not used, what other choice do we have?  Maybe we can discuss this question in the June 14th LDWG meeting.

Thanks,

Hemant

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Thursday, June 10, 2021 11:52 AM
To: Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Here is strawman P4 events code to implement BaaS. shared_register is an extern used in the Ibanez paper.

enum Events {
    set_timer,
    del_timer,
    exp_timer
}

struct metadata_t {
    Events    ev;
    ...
}

The parser sets events for set_timer and del_timer. When set_timer is used, a callback is registered which sets exp_timer event.

    shared_register<bit<32>>(NUM_REGS) bufSize_reg;
    bit<64> expire_time = 2;

    if (meta.ev == Events.set_timer) {
        // Start a timer and remember what to resend when it expires.
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    } else if (meta.ev == Events.del_timer) {
        // Cancel the timer and clear the buffered entry.
        timer_stop(meta.handle);
        bufSize_reg.write(meta.handle, 0);
    } else if (meta.ev == Events.exp_timer) {
        // Timer expired: resend, then re-arm the timer.
        resend(hdr);
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    }

We will need timer_start() and timer_stop() services from a Timer block, similar to the Traffic Manager in an architecture. The code can be extended to support multiple timers. Right now, the code uses one timer, so a single meta.handle is used.

Hemant

From: Gergely Pongracz via P4-design <p4-design@lists.p4.org>
Sent: Friday, June 04, 2021 5:14 AM
To: Nate Foster <jnfoster@cs.cornell.edu>
Cc: p4-design <p4-design@lists.p4.org>
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Hi Nate,

Sorry for the long delay. I uploaded our example code here: https://github.com/P4ELTE/use_cases/tree/master/p4-16/bst
The main code is cpf_ran.p4. In theory it supports both TNA, PSA and V1 - we used Tofino's compiler and t4p4s (basically p4c with DPDK backend) for compiling and running.

Buffering would be executed in the RANDownlink() control block, now we add a special header and send out the packet towards the service IP of the BaaS (buffer-as-a-service) which runs as a Kubernetes service for now. We could clone the packet just as well and send it directly to the downlink path while sending the copy to the buffer, but now it is sent to the buffer and on successful buffering the BaaS service returns the packet - this way we know that the original packet is buffered and timeout counter is started.

Regarding to your questions:

  1. You are right, maybe it could be solved by an extern, similar to how we solve it now with a non-P4 component. On the other hand, I don't particularly like having too many architectures around, as that kills one of the main advantages of P4 (to my knowledge), which is portability. So I'd rather go for a language change here; for me the only reason not to do that would be if the task were impossible to support on some hardware targets. You know the language much better, but I'd say buffering a few packets could be similar to having a few more registers, so buffering itself doesn't seem a huge issue to me. Running timers and assigning events to them, on the other hand, might be a bigger change, as potentially there would be a large number of parallel timers. Of course there are good data structures for that, but are they hardware-friendly enough? Ibanez's presentation suggests it can be done fairly simply on an FPGA.
  2. According to the presentation, I think the proposed solution, especially if all the primitives proposed on slide 6 were implemented, is a superset of what we'd need (I'd say enqueue, dequeue and timer expiration would be enough for us). So if Ibanez's proposal became part of the language, we wouldn't need more (at least for now).
  3. Yes, if you have a look at the code you'll see that we already use control blocks to modularize it. With Tofino this is sometimes not straightforward, as the compiler tends to use more stages than when you use fewer control blocks (this issue was also mentioned in the uP4 talk). As I understood it, Lyra is a higher-layer solution for portability across multiple DSLs, so I guess it would be handy if portability remained an issue even in the long term. I think Lyra's composition part could deal with composing multiple modules / programs on a single switch. I guess you were referring to this feature, but I don't think we'd need a Lyra-like engine in the long run.

So my only remaining question: are the proposals from Ibanez & co. already being considered by any of the working groups, e.g. LDWG? If yes, I'll go through the details, as that is quite likely a good solution for us too.
Thanks!

Gergely

From: Nate Foster <jnfoster@cs.cornell.edu>
Sent: Tuesday, June 1, 2021 3:51 PM
To: Gergely Pongracz <Gergely.Pongracz@ericsson.com>
Cc: p4-design <p4-design@lists.p4.org>
Subject: Re: [P4-design] new language features for "5G" and "edge" use cases

Hi Gergely,

The best way to get involved is to attend the monthly P4 LDWG meeting. For bringing proposals, we have a process described in the README.md of the p4-spec repository (https://github.com/p4lang/p4-spec). Basically, we are happy to entertain a high-level proposal, resulting in a thumbs up or thumbs down. For detailed proposals, we expect to see a number of things fully worked out: specification language changes, a prototype implementation, and example programs. Suffice it to say, this is a lot of work, so it's best either to be certain you want to pursue it and are able to follow through, or to convince others to help you out.

Responding to these topics:

  1. Does this need a language change or could these buffers be modeled with an extern? If so, then no language change is needed, just an architecture that supports buffering.
  2. Is the idea the same as Ibanez et al.'s notion of events (https://dl.acm.org/doi/10.1145/3365609.3365848)? If not, what language changes would you propose?
  3. Simple forms of modularity can be accomplished by treating controls as composable units. Note that they support constructors with statically-determined parameters. Otherwise, have you looked at Lyra and MicroP4 from SIGCOMM '20? (Minor: we have shifted to inclusive technology, so I recommend using the name "primary" and not the one you used.)

-N

On Tue, Jun 1, 2021 at 8:56 AM Gergely Pongracz via P4-design <p4-design@lists.p4.org> wrote:
Hi,

At the P4 WS we presented a use case that is, I admit, not very unique these days (implementing 5G network functions with P4), but while implementing these NFs we came across a few limitations in P4 and started to wonder whether some new features solving these issues could become part of P4 in the future.

There are a few slides on these: https://opennetworking.org/wp-content/uploads/2021/05/Gergely-Pongracz-Slides.pdf

Basically there are the following 3 cases:

  1. Short-term (~1 RTT) buffering: this would be handy for segmentation-reassembly and retransmit-loop use cases. Basically some "buffer" and "remove" actions would be needed, preferably with timer support.
  2. More generic buffering and time-based events: for programmable traffic management, packet scheduling or keepalive messages we could use a more generic method. The problem is possibly very similar to the previous one at the API level, but in these cases there are probably fewer limits on the buffer size, which could make this a bit more tricky from a hardware development perspective.
  3. Modular pipelines: this would mean that we could specify multiple pipelines and a "master" pipeline that would call the underlying ones as subroutines. Perhaps more than 2 layers could be allowed, e.g. a "master" could also be re-used as a module by a higher-layer pipeline. Probably this could be solved entirely at the compiler level.

We also described a few less important cases (e.g. re-using tables, registers, etc.), but we could easily find workarounds to overcome those limitations, so let's focus on the ones above.

I'm quite new to this community, although we've been using P4 for a while now. So I don't really know what the best way is to start discussing these issues and, if you find them useful, how to start working on (some of) them.
Thanks for any hints and help.
BR,

Gergely


P4-design mailing list -- p4-design@lists.p4.org
To unsubscribe send an email to p4-design-leave@lists.p4.org

GP
Gergely Pongracz
Fri, Jun 18, 2021 9:05 AM

I guess the “only headers can be stored” restriction could work as long as a header can be 1.5k bytes long. 😉
Unfortunately I guess this is not the case, so my proposed workaround doesn’t really work in general.

Then, as Hemant said, we’d need something (an extern or a language construct) to be able to work on the payload. Or we’d need to increase the size of the header structure to 1.5k bytes (jumbo frames would still cause problems, but the majority of the use cases would work). Actually I think a payload buffer would be simpler, as that would require less functionality (basically store, send, delete), but if we could extend the header, that would open up interesting possibilities, e.g. HTTP/SIP parsing, proxies, DPI.
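
To make the "store, send, delete" interface concrete, here is a minimal Python model of such a payload buffer. It is a sketch only: the class name `PayloadBuffer`, the integer handles, the fixed capacity and the 1500-byte limit are assumptions for illustration, not an existing P4 extern or a proposed signature.

```python
from typing import Dict, Optional

MAX_PAYLOAD = 1500  # bytes; assumed per-slot limit, so jumbo frames do not fit


class PayloadBuffer:
    """Hypothetical payload buffer exposing only store, send and delete."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._slots: Dict[int, bytes] = {}
        self._next_handle = 0

    def store(self, payload: bytes) -> Optional[int]:
        """Keep a copy of the payload; return a handle, or None if it does not fit."""
        if len(payload) > MAX_PAYLOAD or len(self._slots) >= self.capacity:
            return None
        handle = self._next_handle
        self._next_handle += 1
        self._slots[handle] = payload
        return handle

    def send(self, handle: int) -> Optional[bytes]:
        """Return the stored payload for (re)transmission; the copy stays buffered."""
        return self._slots.get(handle)

    def delete(self, handle: int) -> bool:
        """Release the slot, e.g. once the packet has been acknowledged."""
        return self._slots.pop(handle, None) is not None
```

The point of keeping the interface this small is that the program never inspects the payload; it only holds a handle, which is what distinguishes this option from extending the header structure.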

BR,

Gergely

From: Mihai Budiu mbudiu@vmware.com
Sent: Friday, June 18, 2021 2:17 AM
To: hemant@mnkcg.com; Gergely Pongracz Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

Packet_in is an extern. There are no operations on extern except instantiation and method calls.
In particular, there is no assignment between externs (except compile-time binding as parameters).
If you want to do something like this you will probably have to invent a new extern to represent a dynamic array.

Mihai

From: hemant@mnkcg.com <hemant@mnkcg.com>
Sent: Thursday, June 17, 2021 8:24 AM
To: hemant@mnkcg.com; Mihai Budiu <mbudiu@vmware.com>; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

To store packets, I would need an array of Packet_in in P4, which would be a new construct for P4.

Hemant

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Wednesday, June 16, 2021 7:37 PM
To: mbudiu@vmware.com; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

If a packet is not stored in a register, define a new BaaS control which has one arg as packet_in. I think P4 would need to define a new data struct to store whole packets.

Hemant

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Wednesday, June 16, 2021 3:13 PM
To: Gergely Pongracz <Gergely.Pongracz@ericsson.com>; hemant@mnkcg.com; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

Shouldn’t we have this conversation on the design mailing list? More people may want to weigh in.

We have been making moves towards factoring psa.p4 and pna.p4 into a set of common libraries, e.g., common externs (like registers). Other libraries will be useful, like standard protocol header definitions. So I think that in the long term there will be a set of useful libraries, e.g., timers, buffers, etc., and a target architecture may include several of them, signaling in this way what is available.

You cannot store the payload of a packet in a register, only the headers.

Mihai

From: Gergely Pongracz <Gergely.Pongracz@ericsson.com>
Sent: Wednesday, June 16, 2021 2:31 AM
To: Mihai Budiu <mbudiu@vmware.com>; hemant@mnkcg.com; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

Hi,

I think it would make sense to consider whether such functionality could be implemented on non-CPU systems. If not (or not always), then I fully agree with having them as externs. But in my view it would of course be better to have them as part of PSA or PNA or some future, widely adopted architecture. Of course these calls will use “extern” functions in the underlying system, but having them as part of a few generic architectures would be good because that would mean better portability of P4 code.

Having buffers in a Tofino-like device doesn’t seem to be impossible: as Hemant pointed out below, one could use register arrays for that purpose. Of course the size is limited, but the functionality seems to be there.

I guess the issue is more how we implement callbacks on a Tofino-like device.

One workaround could be to have a register array with the packets and timeout events, just as Hemant described below, and on each (or each Nth) packet arrival we’d check the arrival time and compare it with the timer in the first entry of the register array. If we exceeded the timer, we’d clone the packet and do the timer-based event on the copy (basically dropping the original packet and getting the buffered one to work on). This could fly for one specific timeout (e.g. a fixed 100 ms resend timer), in which case the register array contains events as an ordered list (well, better to say a ring buffer, but still the timestamps are constantly ascending). Of course this is nasty and should be hidden under some nice API; I just wanted to check whether it could work or not.
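
The polling workaround described above can be sketched in P4 as follows. This is a minimal sketch under assumptions, not a tested design: it uses the v1model architecture (for its register extern and the ingress_global_timestamp field), it tracks only the deadlines and not the buffered packets themselves, and the names CheckOldestTimer, timer_meta_t, RING_SIZE and TIMEOUT_US are made up for illustration. Timestamp wrap-around is ignored, and whether this access pattern fits a Tofino-like pipeline is not established here.

```p4
#include <core.p4>
#include <v1model.p4>

const bit<32> RING_SIZE  = 1024;     // assumed capacity, must be a power of two
const bit<48> TIMEOUT_US = 100000;   // the fixed 100 ms timeout, in microseconds

struct timer_meta_t {
    bit<1>  buffer_this;  // input: the current packet is being buffered
    bit<1>  expired;      // output: the oldest buffered entry has timed out
    bit<32> slot;         // output: ring slot of the expired entry
}

// Because the timeout is fixed, deadlines are stored in ascending order,
// so only the oldest ring entry has to be checked on each packet arrival.
control CheckOldestTimer(inout timer_meta_t tmeta,
                         in standard_metadata_t std_meta) {
    register<bit<48>>(RING_SIZE) deadline_reg;  // expiry time per slot
    register<bit<32>>(1) head_reg;              // index of the oldest entry
    register<bit<32>>(1) count_reg;             // number of occupied slots

    apply {
        bit<32> head;
        bit<32> count;
        bit<48> deadline;
        bit<48> now = std_meta.ingress_global_timestamp;

        head_reg.read(head, 0);
        count_reg.read(count, 0);
        deadline_reg.read(deadline, head);

        tmeta.expired = 0;
        tmeta.slot = 0;
        if (count != 0 && now >= deadline) {
            // The oldest entry timed out: report its slot and pop it.
            tmeta.expired = 1;
            tmeta.slot = head;
            head = (head + 1) & (RING_SIZE - 1);
            count = count - 1;
        }
        if (tmeta.buffer_this == 1 && count < RING_SIZE) {
            // Append a new deadline at the tail of the ring.
            bit<32> tail = (head + count) & (RING_SIZE - 1);
            deadline_reg.write(tail, now + TIMEOUT_US);
            count = count + 1;
        }
        head_reg.write(0, head);
        count_reg.write(0, count);
    }
}
```

The caller would act on tmeta.expired, for example by cloning the packet and processing the buffered entry on the copy, as the paragraph above suggests.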

Gergely

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Monday, June 14, 2021 7:55 PM
To: hemant@mnkcg.com; Gergely Pongracz <Gergely.Pongracz@ericsson.com>; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

I don’t know of any programming language where timers are a built-in construct; they are always library functions, i.e., externs for us. So I don’t see a choice. Perhaps the question is “which architecture/library file should the timers be a part of?”

Mihai

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Friday, June 11, 2021 4:11 PM
To: hemant@mnkcg.com; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

I don’t see a way out unless timer start and stop are added as new externs.
Gergely doesn’t like externs because the code is not portable. But the new externs can be added for all software p4c backends. If externs are not used, what other choice do we have? Maybe we can discuss this question in the June 14th LDWG meeting.

Thanks,

Hemant

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Thursday, June 10, 2021 11:52 AM
To: Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Here is strawman P4 events code to implement BaaS. shared_register is an extern used by the Ibanez paper.

enum Events {
    set_timer,
    del_timer,
    exp_timer
}

struct metadata_t {
    Events    ev;
    …
}

The parser sets events for set_timer and del_timer. When set_timer is used, a callback is registered which sets exp_timer event.

    shared_register<bit<32>>(NUM_REGS) bufSize_reg;
    bit<64> expire_time = 2;
    if (meta.ev == Events.set_timer) {
        // Arm the timer and store the header under the timer's handle.
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    } else if (meta.ev == Events.del_timer) {
        // Cancel the timer and clear the stored entry.
        timer_stop(meta.handle);
        bufSize_reg.write(meta.handle, 0);
    } else if (meta.ev == Events.exp_timer) {
        // Timer expired: resend, then re-arm and store again.
        resend(hdr);
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    }

We will need timer_start() and timer_stop() services from a Timer block, similar to the Traffic Manager, in one architecture. The code can be extended to support multiple timers. Right now, the code uses one timer, and thus the single meta.handle is used.
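
For the strawman to type-check, the timer services would have to be declared somewhere. A minimal sketch of what such declarations might look like, assuming extern functions named as in the strawman; the signatures, the 32-bit handle, the handle field in metadata_t and the TimerEvents control are assumptions for illustration, and no existing architecture file provides them. The unit of the timeout value is left unspecified, as in the strawman.

```p4
#include <core.p4>

// Hypothetical timer service: start returns a handle that stop accepts.
extern bit<32> timer_start(in bit<64> timeout);
extern void timer_stop(in bit<32> handle);

enum Events {
    set_timer,
    del_timer,
    exp_timer
}

struct metadata_t {
    Events  ev;
    bit<32> handle;   // assumed field: handle of the single active timer
}

// Only the timer handling of the strawman; buffering and resending are omitted.
control TimerEvents(inout metadata_t meta) {
    apply {
        if (meta.ev == Events.set_timer || meta.ev == Events.exp_timer) {
            // Arm the timer, or re-arm it after expiry.
            meta.handle = timer_start(64w2);
        } else if (meta.ev == Events.del_timer) {
            timer_stop(meta.handle);
        }
    }
}
```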

Hemant

From: Gergely Pongracz via P4-design <p4-design@lists.p4.org>
Sent: Friday, June 04, 2021 5:14 AM
To: Nate Foster <jnfoster@cs.cornell.edu>
Cc: p4-design <p4-design@lists.p4.org>
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Hi Nate,

Sorry for the long delay. I uploaded our example code here: https://github.com/P4ELTE/use_cases/tree/master/p4-16/bst
The main code is cpf_ran.p4. In theory it supports TNA, PSA and V1 – we used Tofino’s compiler and t4p4s (basically p4c with DPDK backend) for compiling and running.

Buffering would be executed in the RANDownlink() control block. For now we add a special header and send the packet to the service IP of the BaaS (buffer-as-a-service), which currently runs as a Kubernetes service. We could just as well clone the packet, sending the original directly to the downlink path and the copy to the buffer, but for now the packet is sent to the buffer, and on successful buffering the BaaS service returns it – this way we know that the original packet is buffered and the timeout counter has started.

Regarding your questions:

  1. You are right, maybe it could be solved by an extern, similar to how we solve it now with a non-P4 component. On the other hand, I don’t particularly like having too many architectures around, as that kills one of the main advantages of P4 (to my knowledge), which is portability. So I’d rather go for a language change here – for me the only reason not to do that would be if the task were impossible to support on some hardware targets. You know the language much better, but I’d say buffering a few packets could be similar to having a few more registers. So buffering itself doesn’t seem a huge issue to me. Running timers and assigning events to them, on the other hand, might be a bigger change, as potentially there would be a large number of parallel timers – and of course there are good data structures for that, but are they hardware friendly enough? Ibanez’s presentation suggests it can be done fairly simply on FPGA.
  2. According to the presentation I think the proposed solution – especially if all proposed primitives on slide 6 were implemented – is a superset of what we’d need (I’d say for us enqueue, dequeue and timer expiration would be enough). So if Ibanez’s proposal became part of the language, we wouldn’t need more (at least for now).
  3. Yes, if you have a look at the code you’ll see that we already use control blocks for modularizing the code. With Tofino it’s sometimes not straightforward, as the compiler tends to use more stages in this case than when you use fewer control blocks (this issue was also mentioned in the uP4 talk). As I understand it, Lyra is a higher-layer solution for portability over multiple DSLs, so I guess it would be handy if portability remained an issue even in the long term. I think Lyra’s composition part could deal with composing multiple modules / programs on a single switch – I guess you referred to this feature, but I don’t think we’d need a Lyra-like engine in the long run.

So my only remaining question: are the proposals from Ibanez & co. already being considered by any of the working groups, e.g. LDWG? If yes, I’ll go through the details, as that is quite likely a good solution for us too.
Thanks!

Gergely

From: Nate Foster <jnfoster@cs.cornell.edu>
Sent: Tuesday, June 1, 2021 3:51 PM
To: Gergely Pongracz <Gergely.Pongracz@ericsson.com>
Cc: p4-design <p4-design@lists.p4.org>
Subject: Re: [P4-design] new language features for "5G" and "edge" use cases

Hi Gergely,

The best way to get involved is to attend the monthly P4 LDWG meeting. For bringing proposals, we have a process discussed on the README.md for the p4-spec repository (https://github.com/p4lang/p4-spec). Basically we are happy to entertain a high-level proposal, resulting in a thumbs up or thumbs down.
For detailed proposals, we expect to see a number of things fully worked out: specification language changes, prototype implementation, and example programs. Suffice to say, this is a lot of work, so it's great to either be certain you want to pursue it and are able to follow through, or you can convince others to help you out.

Responding to these topics:

  1. Does this need a language change or could these buffers be modeled with an extern? If so, then no language change is needed, just an architecture that supports buffering.
  2. Is the idea the same as Ibanez et al.'s notion of events (https://dl.acm.org/doi/10.1145/3365609.3365848)? If not, what language changes would you propose?
  3. Simple forms of modularity can be accomplished by treating controls as composable units. Note that they support constructors with statically-determined parameters. Otherwise, have you looked at Lyra and MicroP4 from SIGCOMM '20? (Minor: we have shifted to inclusive technology, so I recommend using the name "primary" and not the one you used.)
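
The point about controls as composable units with constructor parameters can be illustrated with a small sketch. The names SizeCheck, Pipeline and meta_t, and the limits 1500 and 9000, are invented for this example and do not come from the code under discussion.

```p4
#include <core.p4>

struct meta_t {
    bit<16> len;        // packet length, assumed to be filled in earlier
    bit<1>  oversized;  // result of the check
}

// Reusable unit. `limit` is a constructor parameter, so its value is fixed
// at compile time for each instance.
control SizeCheck(inout meta_t m)(bit<16> limit) {
    apply {
        if (m.len > limit) {
            m.oversized = 1;
        }
    }
}

// Top-level control that instantiates the unit twice and calls the
// instances like subroutines.
control Pipeline(inout meta_t m, in bit<1> jumbo_port) {
    SizeCheck(1500) standard_check;
    SizeCheck(9000) jumbo_check;
    apply {
        m.oversized = 0;
        if (jumbo_port == 1) {
            jumbo_check.apply(m);
        } else {
            standard_check.apply(m);
        }
    }
}
```

Pipeline could itself be instantiated inside a larger control in the same way, which gives more than two layers of nesting.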

-N

On Tue, Jun 1, 2021 at 8:56 AM Gergely Pongracz via P4-design <p4-design@lists.p4.org> wrote:
Hi,

At the P4 Workshop we presented a use case that is, I admit, not very unique these days (implementing 5G network functions with P4), but while implementing these NFs we came across a few limitations in P4 and started to wonder whether new features solving these issues could be part of P4 in the future.

There are a few slides on these: https://opennetworking.org/wp-content/uploads/2021/05/Gergely-Pongracz-Slides.pdf

Basically there are the following 3 cases:

  1. Short-term (~1 RTT) buffering: this would be handy for segmentation-reassembly and retransmit loop use cases. Basically some “buffer” and “remove” actions would be needed, preferably with timer support.
  2. More generic buffering and time-based events: for programmable traffic management, packet scheduling or keepalive messages we could use a more generic method. The problem is possibly very similar to the previous one at the API level, but in these cases there are probably fewer limitations on the buffer size, which could make this a bit more tricky from a hardware development perspective.
  3. Modular pipelines: this would mean that we could specify multiple pipelines and a “master” pipeline that would call the underlying ones as subroutines. Perhaps more than 2 layers could be allowed, e.g. a “master” can also be re-used as a module by a higher layer pipeline. Probably this could be solved entirely at the compiler level.

We also described a few less important cases (e.g. re-using tables, registers, etc.), but we could easily find workarounds to overcome those limitations, so let’s focus on the ones above.

I’m quite new to this community although we’ve been using P4 for a while now. So I don’t really know what is the best way to start discussing these issues and if you find these useful, how to start working on (some of) these.
Thanks for any hints and help.
BR,

Gergely


P4-design mailing list -- p4-design@lists.p4.org
To unsubscribe send an email to p4-design-leave@lists.p4.org

I guess the “only headers can be stored” could work as long as that header can be 1,5k byte long. 😉 Unfortunately I guess this is not the case, so my proposed workaround doesn’t really work generally. Then as Hemant said: we’d need something (extern, language construct) to be able to work on the payload. Or we’d need to increase the size of the header structure to 1,5k (this way jumbo frames will still cause problems, but the majority of the use cases would work). Actually I think payload buffer would be simpler, as that would require less functionality (basically store, send, delete), but if we could extend the header that would open up interesting possibilities, e.g. http/sip parsing, proxies, DPI.

BR,

Gergely

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Friday, June 18, 2021 2:17 AM
To: hemant@mnkcg.com; Gergely Pongracz <Gergely.Pongracz@ericsson.com>; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

Packet_in is an extern. There are no operations on extern except instantiation and method calls. In particular, there is no assignment between externs (except compile-time binding as parameters). If you want to do something like this you will probably have to invent a new extern to represent a dynamic array.

Mihai

From: hemant@mnkcg.com
Sent: Thursday, June 17, 2021 8:24 AM
To: hemant@mnkcg.com; Mihai Budiu <mbudiu@vmware.com>; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

To store packets, I would need an array of Packet_in in P4 which is a new construct for P4.
00%26sdata%3Dz8XsgbD%252FmlSHZ7XiNxGee%252F9bDDWGq5E9toh8Q9fNlcw%253D%26reserved%3D0>)? If not, what language changes would you propose? 3. Simple forms of modularity can be accomplished by treating controls as composable units. Note that they support constructors with statically-determined parameters. Otherwise, have you looked at Lyra and MicroP4 from SIGCOMM '20? (Minor: we have shifted to inclusive technology, so I recommend using the name "primary" and not the one you used.) -N On Tue, Jun 1, 2021 at 8:56 AM Gergely Pongracz via P4-design <p4-design@lists.p4.org<mailto:p4-design@lists.p4.org>> wrote: Hi, On the P4 WS we presented a use case that is I admit not very unique these days (implementing 5G network functions with P4), but while we were implementing these NFs we came across a few limitations in P4 and started to wonder whether some new features solving these issues could be part of P4 in the future. There are a few slides on these: https://opennetworking.org/wp-content/uploads/2021/05/Gergely-Pongracz-Slides.pdf<https://protect2.fireeye.com/v1/url?k=317f2388-6ee41aa5-317f6313-8692dc8284cb-38bbdc8e91a3703a&q=1&e=6446927f-d577-4cff-a434-29e5fccda319&u=https%3A%2F%2Fnam04.safelinks.protection.outlook.com%2F%3Furl%3Dhttps%253A%252F%252Fprotect2.fireeye.com%252Fv1%252Furl%253Fk%253D0d05633c-529e5a3c-0d0523a7-86fc6812c361-7cd5a710ca94385e%2526q%253D1%2526e%253D59368853-ae85-41d3-b2f8-ae81acdbed03%2526u%253Dhttps%25253A%25252F%25252Fnam04.safelinks.protection.outlook.com%25252F%25253Furl%25253Dhttps%2525253A%2525252F%2525252Fprotect2.fireeye.com%2525252Fv1%2525252Furl%2525253Fk%2525253Dcb6a12ef-94f12bed-cb6a5274-869a14f4b08c-1cb642ccaa4f8b6f%25252526q%2525253D1%25252526e%2525253D904f2a87-34c8-4d7b-a5d3-e68eda85728c%25252526u%2525253Dhttps%252525253A%252525252F%252525252Fopennetworking.org%252525252Fwp-content%252525252Fuploads%252525252F2021%252525252F05%252525252FGergely-Pongracz-Slides.pdf%252526data%25253D04%2525257C01%2525257Cmbudiu%25252540vmware.com%2525
257C6660e11d35c548622f1408d92d2e07e8%2525257Cb39138ca3cee4b4aa4d6cd83d9dd62f0%2525257C0%2525257C1%2525257C637590498011656674%2525257CUnknown%2525257CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%2525253D%2525257C1000%252526sdata%25253DmDv72Q4LaMCZY2CXUh4K7DqxFM6Uo%2525252B5A9qFK2YlWA%2525252Fk%2525253D%252526reserved%25253D0%26data%3D04%257C01%257Cmbudiu%2540vmware.com%257Cc36a67bb44594dbe4d2608d930a97806%257Cb39138ca3cee4b4aa4d6cd83d9dd62f0%257C0%257C1%257C637594326714009697%257CUnknown%257CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%253D%257C1000%26sdata%3DzkIXSfeyQXgxwXdePyXQwoRiZFVCCyVqpuCBc9mHDxA%253D%26reserved%3D0> Basically there are the following 3 cases: 1. Short-term (~1 RTT) buffering: this would be handy for segmentation-reassembly and retransmit loop use cases. Basically some “buffer”, and “remove” actions would be needed preferably with timer support 2. More generic buffering and time-based events: for programmable traffic management, packet scheduling or keepalive messages we could use a more generic method. Possibly the problem is very similar to the previous on the API level, but in these cases there is probably less limitations on the buffer size, which could make this a bit more tricky from hardware development perspective 3. Modular pipelines: this would mean that we could specify multiple pipelines and a “master” pipeline that would call the underlying ones as subroutines. Perhaps more than 2 layers could be allowed, e.g. a “master” can also be re-used as a module by a higher layer pipeline. Probably this could be solved entirely on the compiler level. We also described a few less important cases (e.g. re-using tables, registers, etc.), but we could easily find workarounds to overcome those limitations, so let’s focus on the ones above. I’m quite new to this community although we’ve been using P4 for a while now. 
So I don’t really know what is the best way to start discussing these issues and if you find these useful, how to start working on (some of) these. Thanks for any hints and help. BR, Gergely _______________________________________________ P4-design mailing list -- p4-design@lists.p4.org<mailto:p4-design@lists.p4.org> To unsubscribe send an email to p4-design-leave@lists.p4.org<mailto:p4-design-leave@lists.p4.org>
MB
Mihai Budiu
Fri, Jun 18, 2021 8:35 PM

A header is defined by a P4 parser. If you are willing to parse an entire packet then you won’t have a payload. But in general, you can’t write parsers for arbitrary-size packets, even using varbits – in the extract call you have to know the header size that you want to parse.
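
As an illustration of the varbit point, here is a minimal Python model (not P4) of a parser that extracts an IPv4 header with options. The size of the variable part is computed from the IHL field before the second extraction, which mirrors the way the two-argument extract call needs the size up front. The function name `parse_ipv4` and the sample packet are assumptions made for this sketch.

```python
def parse_ipv4(packet: bytes):
    """Split an IPv4 packet into (fixed header, options, payload)."""
    FIXED_LEN = 20  # bytes in the fixed part of an IPv4 header
    if len(packet) < FIXED_LEN:
        raise ValueError("packet too short for a fixed IPv4 header")
    ihl_words = packet[0] & 0x0F       # IHL: header length in 32-bit words
    header_len = ihl_words * 4
    if header_len < FIXED_LEN or header_len > len(packet):
        raise ValueError("invalid IHL")
    fixed = packet[:FIXED_LEN]
    # The options size is only known after the IHL field has been read.
    options = packet[FIXED_LEN:header_len]
    # Everything else is never parsed and stays an opaque payload.
    payload = packet[header_len:]
    return fixed, options, payload


# Hypothetical packet: IHL = 6 (24-byte header), 4 option bytes, then payload.
pkt = bytes([0x46]) + bytes(19) + b"\x01\x01\x01\x01" + b"payload"
fixed, options, payload = parse_ipv4(pkt)
```

The payload length never enters the parser here, which is why the payload cannot simply be treated as one more header.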

Moreover, hardware devices like Tofino have a limit on both the number of parser transitions that they can execute for a packet and the number of bytes parsed per state. These limits are very natural: if you expect N packets/second/parser and you need P parser transitions per packet, then you need to perform P * N transitions/second to parse the packets (a device like Tofino has fewer parsers than input ports), and that rate has to fit within the parser's clock rate. This puts a bound on P if you want to sustain this bandwidth without dropping packets (or buffering them). You can statically bound P for the worst case and reject programs whose parsers are too complex, or assume something about the expected value of P for the mix of packets you are getting (P is not a constant; it depends on the packet) and hope that you have enough parser bandwidth to cope with all packets in practice.

For a NIC, N is much smaller than for a router, so it may be feasible to do complex parsing, but for a high-speed switch you really don’t have enough time to do deep parsing at line rate.
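
The budget argument above can be put into numbers with a small Python sketch. It assumes, purely for illustration, that a parser performs one transition per clock cycle; the clock and packet rates below are made-up values, not figures for Tofino or any real device.

```python
def max_parser_transitions(clock_hz: int, packets_per_second: int) -> int:
    """Largest P such that P * N transitions/second fit in the clock budget.

    Assumption for this sketch: one parser transition per clock cycle.
    """
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    return clock_hz // packets_per_second


# Illustrative numbers only: a 1 GHz parser clock in both cases.
switch_budget = max_parser_transitions(1_000_000_000, 100_000_000)  # high N
nic_budget = max_parser_transitions(1_000_000_000, 10_000_000)      # lower N
```

With these assumed rates the switch parser can afford 10 transitions per packet and the NIC parser 100, which is the gap the paragraph above describes.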

Mihai

From: Gergely Pongracz Gergely.Pongracz@ericsson.com
Sent: Friday, June 18, 2021 2:06 AM
To: Mihai Budiu mbudiu@vmware.com; hemant@mnkcg.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

I guess the “only headers can be stored” approach could work as long as that header can be 1.5 kB long. 😉
Unfortunately I guess this is not the case, so my proposed workaround doesn’t really work in general.

Then, as Hemant said, we’d need something (extern, language construct) to be able to work on the payload. Or we’d need to increase the size of the header structure to 1.5 kB (this way jumbo frames would still cause problems, but the majority of the use cases would work). Actually I think a payload buffer would be simpler, as that would require less functionality (basically store, send, delete), but if we could extend the header that would open up interesting possibilities, e.g. HTTP/SIP parsing, proxies, DPI.

BR,

Gergely

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Friday, June 18, 2021 2:17 AM
To: hemant@mnkcg.com; Gergely Pongracz <Gergely.Pongracz@ericsson.com>; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

packet_in is an extern. There are no operations on externs except instantiation and method calls.
In particular, there is no assignment between externs (except compile-time binding as parameters).
If you want to do something like this, you will probably have to invent a new extern to represent a dynamic array.

Mihai

From: hemant@mnkcg.com <hemant@mnkcg.com>
Sent: Thursday, June 17, 2021 8:24 AM
To: hemant@mnkcg.com; Mihai Budiu <mbudiu@vmware.com>; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

To store packets, I would need an array of packet_in in P4, which would be a new construct for P4.

Hemant

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Wednesday, June 16, 2021 7:37 PM
To: mbudiu@vmware.com; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

If a packet is not stored in a register, define a new BaaS control which has one arg as packet_in. I think P4 would need to define a new data struct to store whole packets.

Hemant

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Wednesday, June 16, 2021 3:13 PM
To: Gergely Pongracz <Gergely.Pongracz@ericsson.com>; hemant@mnkcg.com; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

Shouldn’t we have this conversation on the design mailing list? More people may want to weigh in.

We have been making moves towards factoring psa.p4 and pna.p4 into a set of common libraries, e.g., common externs (like registers). Other libraries will be useful, like standard protocol header definitions. So I think that in the long term there will be a set of useful libraries, e.g., timers, buffers, etc., and a target architecture may include several of them, signaling in this way what is available.

You cannot store the payload of a packet in a register, only the headers.

Mihai

From: Gergely Pongracz <Gergely.Pongracz@ericsson.com>
Sent: Wednesday, June 16, 2021 2:31 AM
To: Mihai Budiu <mbudiu@vmware.com>; hemant@mnkcg.com; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

Hi,

I think it would make sense to consider whether such functionality could be implemented on non-CPU systems or not. If not (always), then I fully agree to have them as externs. But in my view it would of course be better to have them as part of PSA or PNA or some future, widely adopted architecture. Of course these calls will use “extern” functions in the underlying system, but having them as part of a few generic architectures would be good because that would mean better portability of P4 code.

Having buffers in a Tofino-like device doesn’t seem to be impossible: as Hemant pointed out below, one could use register arrays for that purpose. Of course the size is limited, but the functionality seems to be there.

I guess the issue is more how we implement callbacks on a Tofino-like device.

One workaround could be to have a register array with the packets and timeout events, just as Hemant described below, and on each (or each Nth) packet arrival we’d check the arrival time and compare it with the timer in the first entry of the register array. If we exceeded the timer, we’d clone the packet and handle the timer-based event on the copy (basically dropping the original packet and getting the buffered one to work on). This could fly for one specific timeout (e.g. a fixed 100 ms resend timer), in which case the register array contains events as an ordered list (or rather a ring buffer, but the timestamps are still constantly ascending). Of course this is nasty and should be hidden under some nice API; I just wanted to check whether it could work or not.
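
A rough Python model of this workaround, to check that the logic holds together. It is only a sketch under stated assumptions: the class `ResendBuffer`, its method names and the millisecond timestamps are invented for illustration, and a real target would use a register array rather than a Python deque.

```python
from collections import deque


class ResendBuffer:
    """FIFO of (deadline_ms, packet). One fixed timeout keeps deadlines ascending."""

    def __init__(self, timeout_ms: int, capacity: int):
        self.timeout_ms = timeout_ms
        self.capacity = capacity      # the register array is size-limited
        self.entries = deque()

    def store(self, now_ms: int, packet: bytes) -> bool:
        """Buffer a packet; return False when the buffer is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((now_ms + self.timeout_ms, packet))
        return True

    def on_packet_arrival(self, now_ms: int):
        """Check only the first entry; return its packet if the timer expired."""
        if self.entries and self.entries[0][0] <= now_ms:
            return self.entries.popleft()[1]
        return None


buf = ResendBuffer(timeout_ms=100, capacity=4)
buf.store(0, b"pkt-A")
buf.store(30, b"pkt-B")
early = buf.on_packet_arrival(50)    # nothing has expired yet
first = buf.on_packet_arrival(120)   # pkt-A's deadline (100 ms) has passed
second = buf.on_packet_arrival(125)  # pkt-B's deadline (130 ms) has not
```

Because the check runs only on packet arrival, an expired entry waits until the next packet shows up, and at most one expiry is handled per arrival. That is the main limitation of the trick.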

Gergely

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Monday, June 14, 2021 7:55 PM
To: hemant@mnkcg.com; Gergely Pongracz <Gergely.Pongracz@ericsson.com>; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

I don’t know of any programming language where timers are a built-in construct; they are always library functions, i.e., externs for us. So I don’t see a choice. Perhaps the question is “which architecture/library file should the timers be a part of?”

Mihai

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Friday, June 11, 2021 4:11 PM
To: hemant@mnkcg.com; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

I don’t see a way out unless start and stop timer are added as new externs.
Gergely doesn’t like externs because the code is not portable. But the new externs can be added for all software p4c backends. If externs are not used, what other choice do we have? Maybe we can discuss this question in the June 14th LDWG meeting.

Thanks,

Hemant

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Thursday, June 10, 2021 11:52 AM
To: Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Here is strawman P4 events code to implement BaaS. shared_register is an extern used in the Ibanez paper.

enum Events {
    set_timer,
    del_timer,
    exp_timer
}

struct metadata_t {
    Events  ev;
    bit<32> handle;  // timer handle used below; the width is an assumption
}

The parser sets events for set_timer and del_timer. When set_timer is used, a callback is registered which sets exp_timer event.

    shared_register<bit<32>>(NUM_REGS) bufSize_reg;
    bit<64> expire_time = 2;

    if (meta.ev == Events.set_timer) {
        // Start a timer and buffer the header under the returned handle.
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    } else if (meta.ev == Events.del_timer) {
        // Cancel the timer and clear the buffered entry.
        timer_stop(meta.handle);
        bufSize_reg.write(meta.handle, 0);
    } else if (meta.ev == Events.exp_timer) {
        // Timer expired: resend, then re-arm the timer and buffer again.
        resend(hdr);
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    }

We will need timer_start() and timer_stop() services from a Timer block, similar to the Traffic Manager, in one architecture. The code can be extended to support multiple timers. Right now, the code uses one timer and thus a single meta.handle is used.
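
To check the control flow of the strawman outside a P4 target, the three branches can be modelled in a few lines of Python. Everything here is hypothetical: `TimerBlock`, `handle_event` and the dictionary standing in for the register are invented names, and retiring the old handle on expiry is an assumption the strawman does not spell out.

```python
from enum import Enum, auto


class Event(Enum):
    SET_TIMER = auto()
    DEL_TIMER = auto()
    EXP_TIMER = auto()


class TimerBlock:
    """Hypothetical timer service that hands out integer handles."""

    def __init__(self):
        self.next_handle = 0
        self.active = {}  # handle -> expiry interval

    def start(self, expire_time: int) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.active[handle] = expire_time
        return handle

    def stop(self, handle: int) -> None:
        self.active.pop(handle, None)


def handle_event(event, meta, hdr, timers, buffer, resent, expire_time=2):
    """Mirror the three branches of the strawman for one event."""
    if event is Event.SET_TIMER:
        meta["handle"] = timers.start(expire_time)
        buffer[meta["handle"]] = hdr
    elif event is Event.DEL_TIMER:
        timers.stop(meta["handle"])
        buffer.pop(meta["handle"], None)
    elif event is Event.EXP_TIMER:
        resent.append(hdr)  # stands in for resend(hdr)
        # Assumption: the expired handle is retired before re-arming.
        timers.stop(meta["handle"])
        buffer.pop(meta["handle"], None)
        meta["handle"] = timers.start(expire_time)
        buffer[meta["handle"]] = hdr


timers, buffer, resent, meta = TimerBlock(), {}, [], {}
handle_event(Event.SET_TIMER, meta, b"hdr-1", timers, buffer, resent)
handle_event(Event.EXP_TIMER, meta, b"hdr-1", timers, buffer, resent)
handle_event(Event.DEL_TIMER, meta, b"hdr-1", timers, buffer, resent)
```

After this sequence the header has been resent once and both the buffer and the set of active timers are empty again. With a single meta entry, only one timer can be outstanding at a time, as noted above.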

Hemant

From: Gergely Pongracz via P4-design <p4-design@lists.p4.org>
Sent: Friday, June 04, 2021 5:14 AM
To: Nate Foster <jnfoster@cs.cornell.edu>
Cc: p4-design <p4-design@lists.p4.org>
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Hi Nate,

Sorry for the long delay. I uploaded our example code here: https://github.com/P4ELTE/use_cases/tree/master/p4-16/bst
The main code is cpf_ran.p4. In theory it supports TNA, PSA and V1; we used Tofino’s compiler and t4p4s (basically p4c with a DPDK backend) for compiling and running.

Buffering would be executed in the RANDownlink() control block, now we add a special header and send out the packet towards the service IP of the BaaS (buffer-as-a-service) which runs as a Kubernetes service for now. We could clone the packet just as well and send it directly to the downlink path while sending the copy to the buffer, but now it is sent to the buffer and on successful buffering the BaaS service returns the packet – this way we know that the original packet is buffered and timeout counter is started.

Regarding your questions:

  1. You are right, maybe it could be solved by an extern, similarly to how we solve it with a non-P4 component. On the other hand, I don’t particularly like having too many architectures around, as that kills one of the main advantages of P4 (to my knowledge), which is portability. So I’d rather go for a language change here; for me the only reason not to do that would be if the task were impossible to support on some hardware targets. You know the language much better, but I’d say buffering a few packets could be similar to having a few more registers. So buffering itself doesn’t seem a huge issue to me. Running timers and assigning events to them, on the other hand, might be a bigger change, as potentially there would be a large number of parallel timers. Of course there are good data structures for that, but are they hardware friendly enough? Ibanez’s presentation suggests it can be done fairly simply on FPGA.
  2. According to the presentation, I think the proposed solution (especially if all proposed primitives on slide 6 were implemented) is a superset of what we’d need (I’d say for us enqueue, dequeue and timer expiration would be enough). So if Ibanez’s proposal were part of the language, we wouldn’t need more (at least for now).
  3. Yes, if you have a look at the code you’ll see that we already use control blocks for modularizing the code. With Tofino it’s sometimes not straightforward, as the compiler tends to use more stages in this case than when you use fewer control blocks (this issue was also mentioned in the uP4 talk). As I understood, Lyra is a higher-layer solution for portability over multiple DSLs, so I guess that would be handy if portability remained an issue even in the long term. I think Lyra’s composition part could deal with composing multiple modules / programs on a single switch. I guess you referred to this feature, but I don’t think we’d need a Lyra-like engine in the long run.

So my only remaining question: are the proposals from Ibanez & co. already being considered by some of the working groups, e.g. LDWG? If yes, I’ll go through the details, as that is quite likely a good solution for us too.
Thanks!

Gergely

From: Nate Foster <jnfoster@cs.cornell.edu>
Sent: Tuesday, June 1, 2021 3:51 PM
To: Gergely Pongracz <Gergely.Pongracz@ericsson.com>
Cc: p4-design <p4-design@lists.p4.org>
Subject: Re: [P4-design] new language features for "5G" and "edge" use cases

Hi Gergely,

The best way to get involved is to attend the monthly P4 LDWG meeting. For bringing proposals, we have a process discussed on the README.md for the p4-spec repository (https://github.com/p4lang/p4-spec). Basically we are happy to entertain a high-level proposal, resulting in a thumbs up or thumbs down. For detailed proposals, we expect to see a number of things fully worked out: specification language changes, prototype implementation, and example programs. Suffice to say, this is a lot of work, so it's great to either be certain you want to pursue it and are able to follow through, or you can convince others to help you out.

Responding to these topics:

  1. Does this need a language change or could these buffers be modeled with an extern? If so, then no language change is needed, just an architecture that supports buffering.
  2. Is the idea the same as Ibanez et al.'s notion of events (https://dl.acm.org/doi/10.1145/3365609.3365848)? If not, what language changes would you propose?
  3. Simple forms of modularity can be accomplished by treating controls as composable units. Note that they support constructors with statically-determined parameters. Otherwise, have you looked at Lyra and MicroP4 from SIGCOMM '20? (Minor: we have shifted to inclusive technology, so I recommend using the name "primary" and not the one you used.)

-N

On Tue, Jun 1, 2021 at 8:56 AM Gergely Pongracz via P4-design <p4-design@lists.p4.org> wrote:
Hi,

At the P4 WS we presented a use case that is, I admit, not very unique these days (implementing 5G network functions with P4), but while we were implementing these NFs we came across a few limitations in P4 and started to wonder whether some new features solving these issues could be part of P4 in the future.

There are a few slides on these: https://opennetworking.org/wp-content/uploads/2021/05/Gergely-Pongracz-Slides.pdf

Basically there are the following 3 cases:

  1. Short-term (~1 RTT) buffering: this would be handy for segmentation-reassembly and retransmit loop use cases. Basically some “buffer” and “remove” actions would be needed, preferably with timer support.
  2. More generic buffering and time-based events: for programmable traffic management, packet scheduling or keepalive messages we could use a more generic method. Possibly the problem is very similar to the previous one on the API level, but in these cases there are probably fewer limitations on the buffer size, which could make this a bit more tricky from a hardware development perspective.
  3. Modular pipelines: this would mean that we could specify multiple pipelines and a “master” pipeline that would call the underlying ones as subroutines. Perhaps more than 2 layers could be allowed, e.g. a “master” can also be re-used as a module by a higher layer pipeline. Probably this could be solved entirely on the compiler level.

We also described a few less important cases (e.g. re-using tables, registers, etc.), but we could easily find workarounds to overcome those limitations, so let’s focus on the ones above.

I’m quite new to this community although we’ve been using P4 for a while now. So I don’t really know what is the best way to start discussing these issues and if you find these useful, how to start working on (some of) these.
Thanks for any hints and help.
BR,

Gergely


P4-design mailing list -- p4-design@lists.p4.org
To unsubscribe send an email to p4-design-leave@lists.p4.org

A header is defined by a P4 parser. If you are willing to parse an entire packet then you won’t have a payload. But in general, you can’t write parsers for arbitrary-size packets, even using varbits – in the extract call you have to *know* the header size that you want to parse. Moreover, hardware devices like Tofino have a limit on both the number of parser transitions that they can execute for a packet and the number of bytes parsed per state. These are very natural, if you expect N packets/second/parser and you need P parser transitions per packet, then you need to perform P * N * clock period transitions/second to parse the packets (a device like Tofino has fewer parsers than input ports). This puts a bound on P if you want to sustain this bandwidth without dropping packets (or buffering them). You can statically bound P for the worst case, and reject programs that have parsers that are too complex, or assume something about the expected duration P for the mix of packets you are getting (P is not a constant, it depends on the packet) and hope that you have enough parser bandwidth to cope with all packets in practice. For a NIC N is much smaller than for a router, so it may be feasible to do complex parsing, but for a high speed switch you really don’t have enough time to do deep parsing at line rate. Mihai From: Gergely Pongracz <Gergely.Pongracz@ericsson.com> Sent: Friday, June 18, 2021 2:06 AM To: Mihai Budiu <mbudiu@vmware.com>; hemant@mnkcg.com; jnfoster@cs.cornell.edu Cc: p4-design@lists.p4.org Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases I guess the “only headers can be stored” could work as long as that header can be 1,5k byte long. 😉 Unfortunately I guess this is not the case, so my proposed workaround doesn’t really work generally. Then as Hemant said: we’d need something (extern, language construct) to be able to work on the payload. 
Or we’d need to increase the size of the header structure to 1.5 kB (this way jumbo frames will still cause problems, but the majority of the use cases would work). Actually I think a payload buffer would be simpler, as that would require less functionality (basically store, send, delete), but if we could extend the header that would open up interesting possibilities, e.g. HTTP/SIP parsing, proxies, DPI.

BR,
Gergely

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Friday, June 18, 2021 2:17 AM
To: hemant@mnkcg.com; Gergely Pongracz <Gergely.Pongracz@ericsson.com>; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

Packet_in is an extern. There are no operations on externs except instantiation and method calls. In particular, there is no assignment between externs (except compile-time binding as parameters). If you want to do something like this you will probably have to invent a new extern to represent a dynamic array.

Mihai

From: hemant@mnkcg.com <hemant@mnkcg.com>
Sent: Thursday, June 17, 2021 8:24 AM
To: hemant@mnkcg.com; Mihai Budiu <mbudiu@vmware.com>; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

To store packets, I would need an array of Packet_in in P4 which is a new construct for P4.
Hemant

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Wednesday, June 16, 2021 7:37 PM
To: mbudiu@vmware.com; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

If a packet is not stored in a register, define a new BaaS control which has one arg as packet_in. I think P4 would need to define a new data struct to store whole packets.

Hemant

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Wednesday, June 16, 2021 3:13 PM
To: Gergely Pongracz <Gergely.Pongracz@ericsson.com>; hemant@mnkcg.com; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

Shouldn’t we have this conversation on the design mailing list? More people may want to weigh in.

We have been making moves towards factoring psa.p4 and pna.p4 into a set of common libraries, e.g., common externs (like registers). Other libraries will be useful, like standard protocol header definitions. So I think that in the long term there will be a set of useful libraries, e.g., timers, buffers, etc., and a target architecture may include several of them, signaling in this way what is available.

You cannot store the payload of a packet in a register, only the headers.
Mihai

From: Gergely Pongracz <Gergely.Pongracz@ericsson.com>
Sent: Wednesday, June 16, 2021 2:31 AM
To: Mihai Budiu <mbudiu@vmware.com>; hemant@mnkcg.com; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

Hi,

I think it would make sense to think about whether such functionality could be implemented on non-CPU systems or not. If not (always) then I fully agree to have them as externs. But it would of course be better, in my view, to have them as part of PSA or PNA or some future, widely adopted architecture. Of course these calls will use “extern” functions in the underlying system, but having them as part of a few generic architectures would be good because that would mean better portability of P4 code.

Having buffers in a Tofino-like device doesn’t seem to be impossible: as Hemant pointed out below, one could use register arrays for that purpose. Of course the size is limited, but the functionality seems to be there.

I guess the issue is more how we implement callbacks on a Tofino-like device. One workaround could be to have a register array with the packets and timeout events, just as Hemant described below, and on each (or each Nth) packet arrival we’d check the arrival time and compare it with the timer in the first entry of the register array. If we exceeded the timer we’d clone the packet and do the timer-based event on the copy (basically dropping the original packet and getting the buffered one to work on). This could fly for one specific timeout (e.g. a fixed 100 msec resend timer), in which case the register array contains events as an ordered list (well, better to say a ring buffer, but still the timestamps are constantly ascending). Of course this is nasty and should be hidden under some nice API; I just wanted to check whether it could work or not.
Gergely

From: Mihai Budiu <mbudiu@vmware.com>
Sent: Monday, June 14, 2021 7:55 PM
To: hemant@mnkcg.com; Gergely Pongracz <Gergely.Pongracz@ericsson.com>; jnfoster@cs.cornell.edu
Subject: RE: [P4-design] Re: new language features for "5G" and "edge" use cases

I don’t know of any programming language where timers are a built-in construct; they are always library functions, i.e., externs for us. So I don’t see a choice. Perhaps the question is “which architecture/library file should the timers be a part of?”

Mihai

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Friday, June 11, 2021 4:11 PM
To: hemant@mnkcg.com; Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

I don’t see a way out unless start and stop timer are added as new externs. Gergely doesn’t like externs because the code is not portable. But the new externs can be added for all software p4c backends. If externs are not used, what other choice do we have? Maybe we can discuss this question in the June 14th LDWG meeting.

Thanks,
Hemant

From: Hemant Singh via P4-design <p4-design@lists.p4.org>
Sent: Thursday, June 10, 2021 11:52 AM
To: Gergely.Pongracz@ericsson.com; jnfoster@cs.cornell.edu
Cc: p4-design@lists.p4.org
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Here is strawman P4 events code to implement BaaS.
shared_register is an extern used by the Ibanez paper.

    enum Events {
        set_timer,
        del_timer,
        exp_timer
    }

    struct metadata_t {
        Events ev;
        …
    }

The parser sets events for set_timer and del_timer. When set_timer is used, a callback is registered which sets the exp_timer event.

    shared_register<bit<32>>(NUM_REGS) bufSize_reg;
    bit<64> expire_time = 2;

    if (meta.ev == Events.set_timer) {
        // arm a timer and buffer the header under the returned handle
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    } else if (meta.ev == Events.del_timer) {
        // cancel the timer and clear the buffered entry
        timer_stop(meta.handle);
        bufSize_reg.write(meta.handle, 0);
    } else if (meta.ev == Events.exp_timer) {
        // timer expired: resend, then re-arm and buffer again
        resend(hdr);
        meta.handle = timer_start(expire_time);
        bufSize_reg.write(meta.handle, hdr);
    }

We will need timer_start() and timer_stop() services from a Timer block, similar to the Traffic Manager in an architecture. The code can be extended to support multiple timers. Right now, the code uses one timer and thus a single meta.handle is used.

Hemant

From: Gergely Pongracz via P4-design <p4-design@lists.p4.org>
Sent: Friday, June 04, 2021 5:14 AM
To: Nate Foster <jnfoster@cs.cornell.edu>
Cc: p4-design <p4-design@lists.p4.org>
Subject: [P4-design] Re: new language features for "5G" and "edge" use cases

Hi Nate,

Sorry for the long delay.
I uploaded our example code here: https://github.com/P4ELTE/use_cases/tree/master/p4-16/bst
The main code is cpf_ran.p4. In theory it supports TNA, PSA and V1 – we used Tofino’s compiler and t4p4s (basically p4c with DPDK backend) for compiling and running. Buffering would be executed in the RANDownlink() control block; for now we add a special header and send the packet towards the service IP of the BaaS (buffer-as-a-service), which currently runs as a Kubernetes service. We could just as well clone the packet and send it directly to the downlink path while sending the copy to the buffer, but currently it is sent to the buffer and on successful buffering the BaaS service returns the packet – this way we know that the original packet is buffered and the timeout counter is started.

Regarding your questions:

1. You are right, maybe it could be solved by an extern, similarly to how we solve it with a non-P4 component. On the other hand I don’t particularly like having too many architectures around, as that kills one of the main advantages of P4 (to my knowledge), which is portability. So I’d rather go for a language change with this – for me the only reason not to do that would be if the task were impossible to support on some hardware targets. You know the language much better, but I’d say buffering a few packets could be similar to having a few more registers. So buffering itself doesn’t seem a huge issue to me. Running timers and assigning events to them, on the other hand, might be a bigger change, as potentially there would be a large number of parallel timers – and of course there are good data structures for that, but are they hardware friendly enough? Ibanez’s presentation suggests it can be done fairly simply on FPGA.

2. According to the presentation I think the proposed solution – especially if all proposed primitives on slide 6 were implemented – is a superset of what we’d need (I’d say for us enqueue, dequeue and timer expiration would be enough). So if Ibanez’s proposal were part of the language, we wouldn’t need more (at least for now).

3. Yes, if you have a look at the code you’ll see that we already use control blocks for modularizing the code. With Tofino sometimes it’s not straightforward, as the compiler tends to use more stages in this case than when you use fewer control blocks (this issue was also mentioned in the uP4 talk). As I understood, Lyra is a higher-layer solution for portability over multiple DSLs, so I guess that would be handy if portability remained an issue even in the long term. I think Lyra’s composition part could deal with composing multiple modules / programs on a single switch – I guess you referred to this feature, but I don’t think we’d need a Lyra-like engine in the long run.

So my only remaining question: are the proposals from Ibanez & co. already considered by some of the working groups, e.g. LDWG? If yes, I’ll go through the details, as that is quite likely a good solution for us too.

Thanks!
Gergely

From: Nate Foster <jnfoster@cs.cornell.edu>
Sent: Tuesday, June 1, 2021 3:51 PM
To: Gergely Pongracz <Gergely.Pongracz@ericsson.com>
Cc: p4-design <p4-design@lists.p4.org>
Subject: Re: [P4-design] new language features for "5G" and "edge" use cases

Hi Gergely,

The best way to get involved is to attend the monthly P4 LDWG meeting.
For bringing proposals, we have a process discussed in the README.md for the p4-spec repository (https://github.com/p4lang/p4-spec). Basically we are happy to entertain a high-level proposal, resulting in a thumbs up or thumbs down. For detailed proposals, we expect to see a number of things fully worked out: specification language changes, prototype implementation, and example programs. Suffice to say, this is a lot of work, so it's best either to be certain you want to pursue it and are able to follow through, or to convince others to help you out.

Responding to these topics:

1. Does this need a language change or could these buffers be modeled with an extern? If so, then no language change is needed, just an architecture that supports buffering.

2. Is the idea the same as Ibanez et al.'s notion of events (https://dl.acm.org/doi/10.1145/3365609.3365848)? If not, what language changes would you propose?

3. Simple forms of modularity can be accomplished by treating controls as composable units. Note that they support constructors with statically-determined parameters. Otherwise, have you looked at Lyra and MicroP4 from SIGCOMM '20? (Minor: we have shifted to inclusive terminology, so I recommend using the name "primary" and not the one you used.)

-N

On Tue, Jun 1, 2021 at 8:56 AM Gergely Pongracz via P4-design <p4-design@lists.p4.org> wrote:

Hi,

At the P4 Workshop we presented a use case that is, I admit, not very unique these days (implementing 5G network functions with P4), but while we were implementing these NFs we came across a few limitations in P4 and started to wonder whether some new features solving these issues could be part of P4 in the future.
There are a few slides on these: https://opennetworking.org/wp-content/uploads/2021/05/Gergely-Pongracz-Slides.pdf

Basically there are the following 3 cases:

1. Short-term (~1 RTT) buffering: this would be handy for segmentation-reassembly and retransmit loop use cases. Basically some “buffer” and “remove” actions would be needed, preferably with timer support.

2. More generic buffering and time-based events: for programmable traffic management, packet scheduling or keepalive messages we could use a more generic method. Possibly the problem is very similar to the previous one on the API level, but in these cases there are probably fewer limitations on the buffer size, which could make this a bit more tricky from a hardware development perspective.

3. Modular pipelines: this would mean that we could specify multiple pipelines and a “master” pipeline that would call the underlying ones as subroutines. Perhaps more than 2 layers could be allowed, e.g. a “master” can also be re-used as a module by a higher-layer pipeline. Probably this could be solved entirely on the compiler level.

We also described a few less important cases (e.g. re-using tables, registers, etc.), but we could easily find workarounds to overcome those limitations, so let’s focus on the ones above.

I’m quite new to this community, although we’ve been using P4 for a while now. So I don’t really know what the best way is to start discussing these issues and, if you find these useful, how to start working on (some of) these.

Thanks for any hints and help.

BR,
Gergely

_______________________________________________
P4-design mailing list -- p4-design@lists.p4.org
To unsubscribe send an email to p4-design-leave@lists.p4.org
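
Editorial sketch: the register-array workaround Gergely describes in this thread (one fixed timeout, entries kept in arrival order, head entry checked on each packet arrival) can be modeled on the host side roughly as follows. This is plain Python, not P4, and is only meant to illustrate the logic; the class name FixedTimeoutBuffer, its methods, and the millisecond timestamps are hypothetical and do not come from the thread or from any P4 architecture. Cancelling a timer (the del_timer event in Hemant's strawman) is not modeled.

```python
from collections import deque


class FixedTimeoutBuffer:
    """Hypothetical model of a register array used as a ring buffer of timer events.

    Assumes a single fixed timeout and monotonically increasing timestamps,
    so stored expiry times are ascending and only the head needs checking.
    """

    def __init__(self, capacity, timeout_ms):
        self.capacity = capacity      # models the limited register array size
        self.timeout_ms = timeout_ms  # e.g. a fixed 100 ms resend timer
        self.ring = deque()           # (expiry_ms, packet), oldest entry first

    def store(self, now_ms, packet):
        """Buffer a packet and arm its timer; return False if the ring is full."""
        if len(self.ring) >= self.capacity:
            return False
        self.ring.append((now_ms + self.timeout_ms, packet))
        return True

    def on_arrival(self, now_ms):
        """Run on each packet arrival: pop and return the head entry if its
        timer has expired, otherwise return None."""
        if self.ring and self.ring[0][0] <= now_ms:
            _, packet = self.ring.popleft()
            return packet
        return None


buf = FixedTimeoutBuffer(capacity=4, timeout_ms=100)
buf.store(0, "pkt-A")
buf.store(30, "pkt-B")
print(buf.on_arrival(50))   # None: head expires at t=100
print(buf.on_arrival(100))  # pkt-A
print(buf.on_arrival(120))  # None: new head expires at t=130
print(buf.on_arrival(130))  # pkt-B
```

As in the description, at most one expired entry is handled per arriving packet, so timer resolution depends on the packet arrival rate; an idle link would delay the timeout event.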