[P4-arch] First round of comments and questions on draft PSA

Vladimir Gurevich vag at barefootnetworks.com
Sun Jun 25 19:16:22 EDT 2017

Adding to Andy's comments... I wonder if we should make it an issue on
Github for easier tracking?

In no particular order:

   1. It would be really nice if the authors could standardize on the
   naming conventions. Why there types, named PortId_t, entry_key (without _t)
   and psa_parser_input_metadata_t? if everything is PSA-specific, then let's
   use psa_ everywhere. If we want to use "_t" for types, let's be consistent.
   Also, it would be nice if we can establish some conventions with regards to
   case CamelCase, underscores, etc.
   2. "Write-only" methods. This particular choice will force most programs
   (except for the trivial ones) to keep separate metadata to keep track of
   the port/multicast_group/drop decisions and use these primitives only at
   the very end of the ingress control. For example, very often an ACL might
   want to match on an egress port for a unicast packet. Or, sometimes, an ACL
   might want to reverse a drop() decision. Neither is possible with the
   current set of PRE methods without having additional metadata carried
   3. What happens if neither send_to_port(), nor multicast(), nor drop()
   are called?
   4. What happens if port, specified in send_to_port doesn't exist on a
   given device?
   5. The relationship between psa_ingress_output_metadata_t field
   egress_port  and PRE is quite unclear. Perhaps, it's an oversight and
   should have been removed
   6. Semantic of EgressInstance_t and the corresponding field in
   psa_egress_input_metadata_t is not defined. For example, what's the value
   for a non-multiacst packet or for a clone?
   7. The semantics of port argument in clone() and resubmit() operations
   is not clear
   8. BQE raises the same questions as PRE (see 2-4)
   9. Given that PRE is not passed to the egress control, it is totally not
   clear how egress cloning or recirculation is going to work
   10. What does "send to CPU" mean? (CPU_PORT semantics)
   11. It would be nice if the behavior of getHash() could be specified
   more formally. What is "base", what is "max", how are they used? The usage
   example (BTW, is it really allowed to instantiate hash engine in the
   parser?) does not match the definition of the extern. Also, it is not clear
   how a 56-bit hash can be generated using crc16 algorithm.
   12. Indirect counters, meters and registers. What happens if index is
   13. Meters and registers: what's the initial state?
   14. Section 11: Packet Generation should probably be renamed to "Digest
   generation" or something like that.
   15. ParserStatus_t -- how exactly it relates to the predefined "error"
   16. What is ParserErrorLocation_t?
   17. Based on the definition, checksum errors need to be passed to the
   ingress control independently, suing a mechanism, separate from
   ParserStatus_t. Is that correct?

There are, I am sure more issues, but that might be enough for our first
meeting tomorrow.


On Sun, Jun 25, 2017 at 9:24 AM, Andy Fingerhut <andy.fingerhut at gmail.com>

> These comments are all for the version published here, as of 2017-Jun-25:
> http://p4lang.github.io/p4-spec/docs/PSA.html
> The HTML had the date "May 15, 2017" in the title page.
> Andy
> Typos:
> "pipleline" -> "pipeline"
> "templetized" -> "templatized"  (root English word is "template")
> "ingres," -> "ingress,"
> "bit<56>" -> "bit<16>" in example usage of hash function. A crc16 has
>     a 16-bit output, not a 56-bit output.
> "ivoked" -> "invoked"
> This is not correct P4_16 syntax, according to the latest p4c-bm2-ss
> as of 2017-Jun-16:
>     typedef enum { NORMALINSTANCE_, CLONE_INSTANCE } InstanceType_t;
> Perhaps use this instead?  It allows values to be defined later with
> type "InstanceType_t".
>     enum InstanceType_t { NORMALINSTANCE_, CLONE_INSTANCE }
> What does this statement mean?
>     "The PSA offers flexibility in defining the user metadata,
>     however, one may chose to use a common metadata types for subsets
>     of the objects."
> "It is an error to instantiate the PRE multiple times."  From later
> text, it appears that PRE is an extern object with no constructor, so
> it cannot be instantiated by the user's P4 program.  It seems much
> clearer to instead say: "The PRE extern object has no constructor, and
> thus it cannot be instantiated in the user's P4 program.  The
> architecture instantiates it exactly once, without requiring the
> user's P4 program to instantiate it."
> The same comments of the previous paragraph apply to the BQE.
> What does the paragraph below mean?  Does it have any effect on how a
> user writes P4_16 programs using psa.p4, or is it only intended to be
> a note to implementers?
>     Note: some of these operations may not be implemented as primitive
>     operations on a certain target. However, All of the operations can
>     be implemented as a combination of other operations. Applications
>     that rely on non-primitive operations may incur significant
>     performance penalties, however, they should be functionally
>     correct.
> For the text below defining the behavior of the drop operation, if the
> intent is that any side effects from invoking other extern method
> calls (e.g. counter updates, meter updates, register writes) that are
> in the ingress control block after the drop operation is invoked still
> occur, then it would be best to say that.  Also good to specify that a
> packet dropped in ingress will never execute code in the egress
> pipeline (if that is the intent).
>     "While the drop operation can be invoked anywhere in the ingress
>     pipeline, the semantics supported by the PSA is that the drop will
>     be at the end of the pipeline (ingress or egress)."
> The sentence below describing the truncate operation should specify
> the units of the length, e.g. bits?  bytes?
>     "The length parameter represents the packet length."
> Why is there a send_to_port() operation defined for the BQE?  I would
> have guessed that for the PSA, the egress port is determined during
> ingress (via an ingress send_to_port() call for unicast packets, or
> for multicast packets, each copy's output port is determined by the
> PRE and how its multicast groups have been configured by the control
> plane).  The packet scheduling algorithms that select which packet to
> send to the egress pipeline next can be thrown off if the egress
> processing can change the output port from what ingress determined,
> yes?  Also, there is no 'out' metadata from the Egress control block
> in Section 4.
> Regarding this question: "are random hash algorithms useful?"  It is
> not clear to me what is meant by a random hash algorithm.  If you mean
> "a random value that is independent of the content of the packet and
> its metadata", then that sounds like a pseudo-random number generator,
> which is useful for things like implementing RED and similar
> algorithms with randomized choices.  If you mean a hash algorithm that
> takes as input some kind of "seed" value that is randomly generated,
> plus some header fields, but the result is a deterministic function of
> the seed and the header/metadata values, then that can be described as
> a normal deterministic hash function with some input bits that were
> randomly generated, and I would recommend it be implemented that way.
> If something else is meant, better to add more detail to the question.
> Section on "Hash function".  There is no description of what the 'max'
> parameter of getHash does.
> If the intent is for hashes and checksums to share the type
> 'HashAlgorithm', then HashAlgorithm should have the 16-bit 1's
> complement checksum used by the IPv4 header to its choices.
> The spec might want to note that 'remove' will only work for some hash
> functions, and not others, e.g. it is easy to remove some data from a
> 1's complement checksum calculator, but much more complex, and
> position-dependent, to remove it from a CRC calculation.
> For counters that have a number of counters that is an exact power of
> 2, e.g. 1024, a single type S that is bit<10> is just right for the
> 'index' parameter to 'count', but it is 1 bit too small for the
> 'n_counters' parameter of the constructor.  Might be good to have a
> separate type for the 'n_counters' paremeter?
> For a 'bytes' or 'packets_and_bytes' counter type, count() having a
> second parameter 'increment' that specifies how much to add to the
> byte counter is a very good thing.  It is best if the P4 program has
> the flexibility to choose the length in bytes it wants to use for the
> packet, e.g. in case it is increasing or decreasing the received
> packet's size before transmitting it.
> For counters that have type 'packets', presumably every call to
> 'count' will add exactly 1, yes?  In that case, having a separate
> method call that only takes an 'index' parameter would be best, since
> the 'increment' parameter value would be ignored, anyway.
> The behavior of the control plane API calls should be documented.
> What is the intended meaning of 'sync_read'?
> What do control plane calls 'start' and 'stop' do?
> Suggested alternate names, if I understand the meanings correctly:
> 'write' instead of 'set'.  'clear' instead of 'reset'.  A
> 'read_and_clear' atomic operation is very common for counters, and
> should be considered for addition, unless one of the other operations
> is intended to do that.
> If direct counters with a byte count are intended to be supported, it
> should be specified what byte count is used for a packet.  Note:
> Someone will be unhappy with the choice in at least some situations,
> no matter what you choose.  You could recommend using simply packet
> counters if they don't like the specified choice, or use an explicit
> counter that is not direct, where they get to choose the length
> themselves.
> For meters, the length of the packet used for meters with type 'bytes'
> must be specified.  Even better, add a 'length' parameter to the
> 'execute' call that lets the P4 program author choose what length to
> use.  That will lead to the same issue for direct meters as was
> mentioned for direct counters above, with the same solution -- don't
> use direct meters when you want to choose a custom length.
> "PRNG" as a choice for "RandomDistribution" doesn't mean anything to
> me.  "Uniform" has a well-defined meaning in pseudo-random number
> generation, and seems a better name to use, if that is what "PRNG" was
> intended to mean.  I have commented before elsewhere that supporting
> _only_ uniform seems quite reasonable.  If a P4 program writer wants a
> non-uniform distribution, there are multiple ways to achieve that,
> such as doing arithmetic on the uniformly generated value to make the
> results non-uniform, or also performing a table lookup using the
> uniform random value as the search key, so the control plane can
> create whatever distribution it wishes.
> In the section on 'Programmable blocks', it would be best if the
> following things were specified explicitly.
> + The original value of fields inside 'user_meta' when the Parser is
>   invoked.  Are they all undefined?
> + Is every field of 'hdr' and 'user_meta' given as input to
>   VerifyChecksum supposed to be identical to the result output by
>   Parser?  If so, best to say it explicitly.  Similarly for the rest
>   of the pipeline.
> + An implementation will likely have limits on how much data it can
>   pass through the packet buffer in the PRE.  Should the PSA spec say
>   any words about this, and how it affects P4 program writers?  For
>   example, it could affect portability of programs quite significantly
>   if some programs expect they can carry 1000 bits of user-defined
>   metadata with values written in the ingress control block, and have
>   them all available in the egress block for use.
> _______________________________________________
> P4-arch mailing list
> P4-arch at lists.p4.org
> http://lists.p4.org/mailman/listinfo/p4-arch_lists.p4.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.p4.org/pipermail/p4-arch_lists.p4.org/attachments/20170625/d4d46842/attachment-0002.html>

More information about the P4-arch mailing list