[P4-design] Maybe too much text describing issues around P4 table and parser value set sizes

Vladimir Gurevich vag at barefootnetworks.com
Mon Mar 26 13:06:54 EDT 2018


Hi Mihai,

I think that this is a very good observation (that tables are more like
interfaces and that the size might be better left to a constructor). Having
said that, I think we've already noticed that more often than not in P4 we
use each type/interface only once and thus it makes sense to have
accommodations in the language to allow people to define both inside the
same construct.

Ultimately, what I think P4 programmers really need is a language that is
easy to read and write that will also allow automatic generation of the
control plane APIs.

Thanks,
Vladimir

On Mon, Mar 26, 2018 at 8:32 AM, Mihai Budiu <mbudiu at vmware.com> wrote:

> This happens all the time.  For example, java hashset has a constructor
> with initial capacity. The meaning is not very clear, but it is the only
> way to convey information about the resources to allocate for that data
> structure. The problem with tables is that they are more like an interface
> (think map<k,v>) than classes, so the size is logically a constructor
> argument for a not very precisely specified implementation of this
> interface.
>
>
>
> Mihai
>
>
> ------------------------------
> *From:* P4-design <p4-design-bounces at lists.p4.org> on behalf of Nate
> Foster <jnfoster at cs.cornell.edu>
> *Sent:* Monday, March 26, 2018 7:07:45 AM
> *To:* Vladimir Gurevich
> *Cc:* p4-design
>
> *Subject:* Re: [P4-design] Maybe too much text describing issues around
> P4 table and parser value set sizes
>
> I'm wading into this conversation late. How does an upper bound on
> resources help the programmer? I can understand the utility of a
> programming abstraction that said "this table can hold at least N
> elements." But I'm struggling to understand how one would effectively use
> "up to N." Maybe it's just a compiler hint that is not actually useful to
> the programmer?
>
> -N
>
> On Sun, Mar 25, 2018 at 10:55 PM, Vladimir Gurevich <
> vag at barefootnetworks.com> wrote:
>
>> Hi Andy,
>>
>> Yes, I think this recommendation is reasonable, although I'd probably
>> reword it a little bit, because the word "guarantees" worries me a little.
>> How about something like that:
>>
>> “If a table or parser value set has a size attribute specified for it
>> with value N, it is recommended that a compiler should choose a data plane
>> implementation that can store (hold)  N exact, lpm, and/or ternary
>> entries.  It does not guarantee though that an arbitrary set of N entries
>> can always be inserted in such a table".
>>
>> Note that deliberate difference between being able to hold N entries and
>> being able to insert arbitrary N entries
>>
>> Thanks,
>> Vladimir
>>
>> On Sun, Mar 25, 2018 at 7:47 PM, Andy Fingerhut (jafinger) <
>> jafinger at cisco.com> wrote:
>>
>>> When you say the phrase “up to N”, do you mean the same thing as the
>>> following sentences, or do you mean something different?
>>>
>>>
>>>
>>> “If a table or parser value set has a size attribute specified for it
>>> with value N, it is recommended that if a compiler can choose a data plane
>>> implementation that guarantees that an arbitrary set of N exact, lpm,
>>> and/or ternary entries can be added, at a reasonable cost, it should do
>>> so.  However, a compiler is allowed to choose data plane implementation
>>> that can fail to add a new entry even if significantly fewer than N entries
>>> are currently installed.”
>>>
>>>
>>>
>>> In other words, nothing is guaranteed, but there is a specific kind of
>>> recommendation given, and simultaneously a warning to people using the
>>> ‘size’ attribute that they are not guaranteed they can add N entries to
>>> such an object, unless they find those guarantees outside of the language
>>> spec (e.g. in their P4 vendor’s custom attributes/compiler options).
>>>
>>>
>>>
>>> Andy
>>>
>>>
>>>
>>>
>>>
>>> *From: *Vladimir Gurevich <vag at barefootnetworks.com>
>>> *Date: *Sunday, March 25, 2018 at 7:37 PM
>>>
>>> *To: *"Andy Fingerhut (jafinger)" <jafinger at cisco.com>
>>> *Cc: *Mihai Budiu <mbudiu at vmware.com>, p4-design <p4-design at lists.p4.org
>>> >
>>> *Subject: *Re: [P4-design] Maybe too much text describing issues around
>>> P4 table and parser value set sizes
>>>
>>>
>>>
>>> Hi Andy,
>>>
>>>
>>>
>>> I think that your article is a great explanation and with "up to N"
>>> clause I think it provides a totally usable definition that is
>>> understandable by most people. I was responding to Mihai's suggestion that
>>> because we can't have a "formal" definition of "size" it's better to leave
>>> it alone.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Vladimir
>>>
>>>
>>>
>>> On Sun, Mar 25, 2018 at 7:34 PM, Andy Fingerhut (jafinger) <
>>> jafinger at cisco.com> wrote:
>>>
>>> Vladimir, do you have a specific recommendation on what you think the
>>> language spec should say about the ‘size’ table attribute?  That is what I
>>> was going for in part of my most recent message, and I don’t think it was
>>> trying to tie the hands of implementers much at all.
>>>
>>>
>>>
>>> Andy
>>>
>>>
>>>
>>> *From: *Vladimir Gurevich <vag at barefootnetworks.com>
>>> *Date: *Sunday, March 25, 2018 at 7:31 PM
>>> *To: *"Andy Fingerhut (jafinger)" <jafinger at cisco.com>
>>> *Cc: *Mihai Budiu <mbudiu at vmware.com>, p4-design <p4-design at lists.p4.org
>>> >
>>>
>>>
>>> *Subject: *Re: [P4-design] Maybe too much text describing issues around
>>> P4 table and parser value set sizes
>>>
>>>
>>>
>>> Hello,
>>>
>>>
>>>
>>> I think we need to re-state the basic goals of our activities.
>>> Specifically
>>>
>>>    1. Are we trying to define a super-formal language that will allow
>>>    one to create not only super-portable programs, but even super-portable
>>>    APIs that control those programs to the extent that you can swap targets
>>>    (ranging from high-speed ASICs to fully SW implementations and anything
>>>    in-between) as easily as one swaps one pair of gloves for another
>>>    *or*
>>>    2. Are we trying to define a practical language that would allow
>>>    people to easily create data plane programs on target device of their
>>>    choice and generate a corresponding control plane API?
>>>
>>> I think that goal #2 is quite achievable, but the goal #1 is,
>>> unfortunately, not. Moreover, trying to get closer to goal #1 will probably
>>> take us further away from being able to define a practical, usable language
>>> that gives data plane designers the ability to get things done.
>>>
>>>
>>>
>>> I totally agree that it is impossible to formally define what "size" is.
>>> The question is: should that preclude us from moving forward and having
>>> this in the language? Even in Euclidian geometry there are terms that are
>>> not defined, e.g a point and a straight line. That's ok. Having "size" and,
>>> perhaps, other similar attributes in the language makes programs generally
>>> more readable and understandable. In fact, it even makes them more
>>> target-independent and more suitable for optimization, because it would not
>>> require people to use the externs, requiring a very specific implementation
>>> (as you suggested).
>>>
>>>
>>>
>>> I can certainly see the value of implementation externs, especially if
>>> they would allow us to use fewer annotations, but unfortunately they will
>>> not, because any real implementation(e.g. for hash-tables) will require you
>>> to specify anywhere from one to dozens of parameters per table, most of of
>>> them optional -- something externs are not very good at (not to mention
>>> that it would be nice if they were named).
>>>
>>>
>>>
>>> I urge everyone to keep P4 a practical language and sometimes that means
>>> that we might need to have things there that we might not be able to
>>> formally define. That's OK.
>>>
>>>
>>>
>>> Thanks,
>>> Vladiimir
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Mar 25, 2018 at 5:55 PM, Andy Fingerhut (jafinger) <
>>> jafinger at cisco.com> wrote:
>>>
>>> If ‘size’ were an integer that meant, ‘the compiler will implement this
>>> table with something that can in some cases install up to N entries, but
>>> how likely being able to reach N entries is implementation- and
>>> table-specific’, then the value of N could be used as the “raw capacity” of
>>> a hash table, for example.
>>>
>>>
>>>
>>> This might be what Vladimir means when he says “up to N”.  It seems a
>>> bit odd to me for an implementation to use, say, a size of 16384 for a hash
>>> table if then takes that value and uses it as the raw theoretical maximum
>>> capacity of a hash table, where in practice that could mean that adding a
>>> set of 12,000 keys causes the 12,000th one to fail to add because of
>>> too many collisions with earlier added entries.  75% to 95% typical usable
>>> capacity from a medium to large hash table is about the best I’ve seen
>>> (where the actual fraction depends on the raw size, how many hash
>>> functions, overflow TCAM size used, if any, etc.).
>>>
>>>
>>>
>>> I think it would be reasonable to define ‘size’ to take an integer type
>>> value in the language spec, and leave the meaning of how that value is used
>>> up to the implementation, with just a general guideline of “it is
>>> recommended if you can make a table / parser value set that guarantees that
>>> many exact, lpm, and/or ternary entries can be added, but an implementation
>>> is allowed to use implementations that can fail to add entries at lower
>>> number of entries than the given size’.
>>>
>>>
>>>
>>> Andy
>>>
>>>
>>>
>>> *From: *Mihai Budiu <mbudiu at vmware.com>
>>> *Date: *Sunday, March 25, 2018 at 5:17 PM
>>> *To: *"Andy Fingerhut (jafinger)" <jafinger at cisco.com>, Vladimir
>>> Gurevich <vag at barefootnetworks.com>
>>> *Cc: *p4-design <p4-design at lists.p4.org>
>>> *Subject: *RE: [P4-design] Maybe too much text describing issues around
>>> P4 table and parser value set sizes
>>>
>>>
>>>
>>> The conclusion is that it is impossible to give a simple description of
>>> the meaning of “size”.
>>>
>>>
>>>
>>> There are two possibilities going forward:
>>>
>>>    - We specify in the spec that the tables and value_sets can have a
>>>    size integer parameter, but the meaning is target-dependent
>>>    - We do not standardize these at all. Tables can still have a size
>>>    parameter (it is not mandated or forbidden by the spec, as it is today),
>>>    but each target can choose a different way to specify sizes. An example is
>>>    given below. For value_sets we have to use an annotation.
>>>
>>>
>>>
>>> For tables I will submit again the example in ebpf_model, which is
>>> perfect for this particular target:
>>>
>>>
>>>
>>> /*
>>>
>>> Each table must have an implementation property which is either an
>>> array_table
>>>
>>> or a hash_table.
>>>
>>> */
>>>
>>>
>>>
>>> /**
>>>
>>> Implementation property for tables indicating that tables must be
>>> implemented
>>>
>>> using EBPF array map.
>>>
>>> */
>>>
>>> extern array_table {
>>>
>>>     /// @param size: maximum number of entries in table
>>>
>>>     array_table(bit<32> size);
>>>
>>> }
>>>
>>>
>>>
>>> /**
>>>
>>> Implementation property for tables indicating that tables must be
>>> implemented
>>>
>>> using EBPF hash map.
>>>
>>> */
>>>
>>> extern hash_table {
>>>
>>>     /// @param size: maximum number of entries in table
>>>
>>>     hash_table(bit<32> size);
>>>
>>> }
>>>
>>>
>>>
>>> table t {
>>>
>>>    keys = …
>>>
>>>    implementation = hash_table(32);
>>>
>>> }
>>>
>>>
>>>
>>> The benefit of this approach is that an architecture can define whatever
>>> parameters it deems necessary for describing the implementation. If
>>> Vladimir likes max_size, min_size, average_size and other parameters, he
>>> can define an extern which has all these as constructor arguments for his
>>> architecture.
>>>
>>>
>>>
>>> The downside is that this is certainly target-specific, and thus
>>> non-portable.
>>>
>>>
>>>
>>> But it looks from Andy’s description that the portability of making size
>>> a standard feature is illusory.
>>>
>>>
>>>
>>> Mihai
>>>
>>>
>>>
>>> *From:* P4-design [mailto:p4-design-bounces at lists.p4.org] *On Behalf Of
>>> *Andy Fingerhut (jafinger)
>>> *Sent:* Saturday, March 24, 2018 5:10 PM
>>> *To:* Vladimir Gurevich <vag at barefootnetworks.com>
>>> *Cc:* p4-design <p4-design at lists.p4.org>
>>> *Subject:* Re: [P4-design] Maybe too much text describing issues around
>>> P4 table and parser value set sizes
>>>
>>>
>>>
>>> If by “up to N” you mean “up to N with high probability”, or “up to N
>>> for most practical sets of keys you probably want to install”, yes that is
>>> what I was trying to convey in the article for hash tables.  I wasn’t
>>> trying to say that ATCAMs are any worse than hash tables in that regard
>>> (although some variations I have worked with _*are*_), just that they
>>> are no better in them, either.
>>>
>>>
>>>
>>> Andy
>>>
>>>
>>>
>>> *From: *Vladimir Gurevich <vag at barefootnetworks.com>
>>> *Date: *Saturday, March 24, 2018 at 4:36 PM
>>> *To: *"Andy Fingerhut (jafinger)" <jafinger at cisco.com>
>>> *Cc: *p4-design <p4-design at lists.p4.org>
>>> *Subject: *Re: [P4-design] Maybe too much text describing issues around
>>> P4 table and parser value set sizes
>>>
>>>
>>>
>>> Hi Andy,
>>>
>>>
>>>
>>> I totally agree that the practical ATCAM capacity from the control plane
>>> point of view (e.g. how many routes you can put into a ATCAM with N entries
>>> divided into M partitions) is difficult to predict, but "up to N" statement
>>> can still be holding some water :) I think that this is an inherent ATCAM
>>> property, much like the fact that you can't generally guarantee any
>>> specific occupancy for a hashed table (as you clearly outlined in the
>>> article).
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Vladmir
>>>
>>>
>>>
>>> On Sat, Mar 24, 2018 at 10:40 AM, Andy Fingerhut (jafinger) <
>>> jafinger at cisco.com> wrote:
>>>
>>> Thanks for the comments, Vladimir.  I agree with just about every word,
>>> and would be happy to incorporate your comments into it, perhaps in another
>>> week or so after I get some other things done.
>>>
>>>
>>>
>>> One quick response here.  The article does not say that algorithmic
>>> TCAMs are not “fast”.  It says: ‘All techniques I am aware of here either
>>> cannot go "fast" as defined above, or have capacity that is dependent upon
>>> the contents of the table entries.’.  If you know of an algorithmic TCAM
>>> method that is both fast _and_ has capacity independent of the contents of
>>> the table entries, then I really want to hear about it.  I’ve spent way too
>>> much of my life dealing with the non-deterministic capacity of algorithmic
>>> TCAMs, for Cisco platforms where they wish they could claim the capacity
>>> was a fixed number N.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Andy
>>>
>>>
>>>
>>> *From: *Vladimir Gurevich <vag at barefootnetworks.com>
>>> *Date: *Saturday, March 24, 2018 at 9:06 AM
>>> *To: *"Andy Fingerhut (jafinger)" <jafinger at cisco.com>
>>> *Cc: *p4-design <p4-design at lists.p4.org>
>>> *Subject: *Re: [P4-design] Maybe too much text describing issues around
>>> P4 table and parser value set sizes
>>>
>>>
>>>
>>> Hi Andy,
>>>
>>>
>>>
>>> Great stuff, worthy of a book chapter or an encyclopedia article.
>>>
>>>
>>>
>>> If anything, I'd just say that some of the assumptions are a little
>>> "rosy" meaning that they cover the "classic" behaviors, but do not account
>>> for a variety of idiosyncratic behaviors that are often encountered in real
>>> life. Call them bugs, features or both, but they do have an effect on table
>>> capacity. One big topic you sort of omitted is the question of how to count
>>> default actions: are they entries or not? do they need to be included in
>>> total entry count or not (i.e. should the "natural" size be 2^N, 2^N-1 or
>>> 2^N+1)? Even in "good" cases, such as TCAMs or indexed tables, I am sure
>>> that different implementations will answer this question differently. In
>>> other cases control plane might need to reserve additional entries for a
>>> variety of technical/efficiency reasons (faster entry moves in TCAMs,
>>> atomic updates, garbage collection (in action profiles), etc.).
>>>
>>>
>>>
>>> Therefore, people usually advertise their table capacities as being "up
>>> to N entries", with that "up to" being the key, that allows customers to
>>> reasonably judge the device capacity, while freeing the vendors from
>>> frivolous lawsuits :)
>>>
>>>
>>>
>>> I know, there are different opinions in this area (mostly guided by (a)
>>> the n-to-1 case you described and (b) the fact that SW implementations
>>> might not need size indication at all), but coming from the ASIC
>>> background, I firmly believe that the basic "up to" size attribute is vital
>>> for both tables and value_sets and is preferred to an annotation, because
>>> it expresses a basic intent of the data plane program designer that the
>>> compiler cannot reasonably guess (as opposed to many technical allocation
>>> parameters (e.g. the number of ways in a hash table) that the compilers can
>>> determine automatically, often even better than humans). P4_14 even has
>>> min_size and max_size attributes in addition to size and I think that was
>>> not a bad idea (although I am not sure if they were implemented anywhere).
>>>
>>>
>>>
>>> Last, but not least, I am not quite sure what to make of the comment at
>>> the end of the article that assert that ATCAMs are not "fast". I think
>>> everything depends on the implementation, but if you can implement a
>>> multi-way hash table with O(1) lookup performance, there is no reason why
>>> you can't implement an ATCAM with the same performance either.
>>>
>>>
>>>
>>> Happy hacking,
>>>
>>> Vladimir
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Mar 23, 2018 at 9:49 PM, Andy Fingerhut (jafinger) <
>>> jafinger at cisco.com> wrote:
>>>
>>> In the last P4 language design WG meeting, I volunteered to write up
>>> some notes on P4 table sizes, and what “size” might mean.
>>>
>>>
>>>
>>> I have written up a discussion of the issues around P4 table and parser
>>> value set sizes here.  It is more a tutorial style directed at people who
>>> might be unfamiliar with issues around high speed implementations of
>>> P4-style matching, and what their consequences are for specifying table
>>> sizes, especially ones where the actual capacity of a table is not really
>>> predictable.
>>>
>>>
>>>
>>> https://github.com/jafingerhut/p4-guide/blob/master/docs/p4-
>>> table-and-parser-value-set-sizes.md
>>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_jafingerhut_p4-2Dguide_blob_master_docs_p4-2Dtable-2Dand-2Dparser-2Dvalue-2Dset-2Dsizes.md&d=DwMGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=tGW6TKXajnoXSyy1S1P4DHGPe8sj54GGvw-b21n7aWg&m=igHNDkIsLYagXgGCPpIvkY56e6q_4mGo3TN4g8WOQYk&s=Mm_IWg6WCXnjrb1Vh5lONc_7WE2uO3BgmENj5HnM_BQ&e=>
>>>
>>>
>>>
>>> Corrections/questions/comments welcome.
>>>
>>>
>>>
>>> Andy
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> P4-design mailing list
>>> P4-design at lists.p4.org
>>> http://lists.p4.org/mailman/listinfo/p4-design_lists.p4.org
>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.p4.org_mailman_listinfo_p4-2Ddesign-5Flists.p4.org&d=DwMGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=tGW6TKXajnoXSyy1S1P4DHGPe8sj54GGvw-b21n7aWg&m=igHNDkIsLYagXgGCPpIvkY56e6q_4mGo3TN4g8WOQYk&s=Avv9VPJx5NI3SRUQy9aN7lk_rDLuHqX0_IwMBoDIA2E&e=>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> _______________________________________________
>> P4-design mailing list
>> P4-design at lists.p4.org
>> http://lists.p4.org/mailman/listinfo/p4-design_lists.p4.org
>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.p4.org_mailman_listinfo_p4-2Ddesign-5Flists.p4.org&d=DwMFaQ&c=uilaK90D4TOVoH58JNXRgQ&r=tGW6TKXajnoXSyy1S1P4DHGPe8sj54GGvw-b21n7aWg&m=HtEMc144kjVLI8UOiRbK2DysOG9cW8NmpObH4JrKNQ0&s=mwuKF7A23WXDnIpLriKu7XwnHTVJV5pM4PHPbv0B6Sk&e=>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.p4.org/pipermail/p4-design_lists.p4.org/attachments/20180326/a2cd626f/attachment-0001.html>


More information about the P4-design mailing list