[P4-dev] General question on P4

Andy Fingerhut andy.fingerhut at gmail.com
Sat Apr 8 15:39:01 EDT 2017


I haven't seen published packets/sec numbers for Tofino, either, but they
do publish 65 x 100GE interfaces on their fastest device, and they
published that there are 4 independent pipelines, each most likely handling
1/4 of those ports.  I know that their architecture can achieve 1 packet
per clock cycle throughput, and clock rates in the 1 GHz to 1.5 GHz range
are definitely achievable, if not even higher.  Conservatively that puts
them at 4 billion packets per second, which at 65 x 100GE ports would
handle an average packet size of about 200 bytes at line rate.  6 billion
packets per second would drive down the average packet size to about 135
bytes (6.5 Tbit/s divided by 6 billion packets/sec is about 1083 bits).
Their fastest device is likely somewhere in that 4 to 6 billion packets/sec
range.
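
As a rough cross-check of this arithmetic, here is a minimal Python sketch.
The function names are made up for illustration.  It assumes 1 packet per
clock cycle per pipeline and ignores per-frame Ethernet overhead (preamble
and inter-frame gap), so the byte counts are approximate.  The last function
redoes the device-count comparison from the quoted ASIC-versus-NPU message
further down.

```python
import math


def aggregate_pps(pipelines: int, clock_hz: float,
                  packets_per_cycle: float = 1.0) -> float:
    """Total packets/sec if every pipeline forwards packets_per_cycle per clock.

    Assumption: pipelines are independent and fully utilized.
    """
    return pipelines * clock_hz * packets_per_cycle


def min_avg_packet_bytes(ports: int, port_bps: float, pps: float) -> float:
    """Smallest average packet size (bytes) that the device can still
    forward at line rate on all ports, given its packets/sec budget.

    Assumption: ignores preamble and inter-frame gap on the wire.
    """
    total_bytes_per_sec = ports * port_bps / 8
    return total_bytes_per_sec / pps


def devices_needed(required_pps: float, per_device_pps: float) -> int:
    """Number of devices needed to reach required_pps of forwarding capacity."""
    return math.ceil(required_pps / per_device_pps)


if __name__ == "__main__":
    low = aggregate_pps(pipelines=4, clock_hz=1.0e9)    # 4 billion pps
    high = aggregate_pps(pipelines=4, clock_hz=1.5e9)   # 6 billion pps
    print(round(min_avg_packet_bytes(65, 100e9, low)))   # 203
    print(round(min_avg_packet_bytes(65, 100e9, high)))  # 135
    print(devices_needed(200e9, 2e9))                    # 100 ASICs
    print(devices_needed(200e9, 200e6))                  # 1000 NPUs
```

Accounting for the roughly 20 bytes of preamble and inter-frame gap per
Ethernet frame would lower these average packet sizes somewhat.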

Andy

On Sat, Apr 8, 2017 at 11:48 AM, Michael Borokhovich <michaelbor at gmail.com>
wrote:

> Hi Andy,
>
> Thank you for the insight. If this is the performance difference, then of
> course the advantage of P4 ASIC (e.g., Tofino) is obvious. I see that
> EZchip NP5 supports 300 million packets per second. But I didn't find a
> similar spec for Tofino. Also, this comparison should be done for
> comparable programs since each additional piece of functionality
> (parsing/modifying an additional header field or doing an additional table
> search) affects this pps metric.
>
> But again, if Tofino indeed achieves ~10 times more pps than, e.g., EZchip
> NP5 for the same program, then I clearly see the benefit and the novelty.
>
> Michael.
>
>
> On Fri, Apr 7, 2017 at 5:52 PM, Andy Fingerhut <andy.fingerhut at gmail.com>
> wrote:
>
>> In case it isn't obvious, max packet rate that you can achieve in an ASIC
>> turns into a significant difference in cost when buying the equipment and
>> paying the power bill for a network.
>>
>> Suppose you have a choice of a programmable ASIC that goes at 2 billion
>> packets per second, and an NPU that goes up to 200 million packets per
>> second, and they both cost roughly the same amount and consume the same
>> power.
>>
>> You have some part of a data center connecting a bunch of hosts together
>> where you decide that kind of programmability is important.  You do some
>> calculations to determine those hosts need 200 billion packets per second
>> of forwarding capacity between them.
>>
>> Do you want to buy and provide power for 200/2 = 100 fast programmable
>> ASICs, or 200/0.2 = 1,000 programmable NPUs?
>>
>> Andy
>>
>> On Fri, Apr 7, 2017 at 2:37 PM, Andy Fingerhut <andy.fingerhut at gmail.com>
>> wrote:
>>
>>> I don't have experience with all NPUs, but many I have seen top out on
>>> the order of hundreds of millions of packets per second with current
>>> technology.
>>>
>>> With the same current technology, it is possible to design fixed
>>> function ASICs, and programmable ASICs like Barefoot's Tofino, that achieve
>>> billions of packets per second.
>>>
>>> The main difference that I am aware of is that many NPUs are based on
>>> parallel arrays of 32-bit or 64-bit processor cores, and each core requires
>>> many cycles for things like constructing table search keys and performing
>>> side effects on the 'packet vector' (state maintained while forwarding the
>>> packet about that packet only).  If you want to go at billions of packets
>>> per second, the only way I know to get there is to have fixed or
>>> configurable hardware that can do those things in 1 or 2 clock cycles per
>>> packet.
>>>
>>> You can write a compiler that compiles a P4 program to run on an NPU as
>>> described above, and it will achieve portability of the P4 program, but it
>>> won't make that NPU able to go at billions of packets per second.  It is
>>> limited in performance by its hardware architecture.
>>>
>>> There are proprietary methods for programming some ASICs that can go at
>>> billions of packets per second, but all that I know of are lower level than
>>> P4 and non-portable.
>>>
>>> Andy
>>>
>>> On Thu, Apr 6, 2017 at 6:37 PM, Michael Borokhovich <
>>> michaelbor at gmail.com> wrote:
>>>
>>>> Hi Remy,
>>>>
>>>> I'm not confusing hardware with the language... What I mean is that P4
>>>> + an ASIC that supports it claims to give us a programmable data plane,
>>>> and this is claimed to be the innovation. But that is exactly the purpose
>>>> of NPUs: to give us a programmable data plane, and NPUs have been around
>>>> for a very long time. So maybe I'm missing the point of the innovation
>>>> that P4 + an ASIC that supports it gives. As Nate said, and I agree, one
>>>> big advantage is portability and the other is the ability to do
>>>> verification.
>>>> So, P4 brings a kind of open standard for programmable ASICs, which is
>>>> analogous to a programming language (e.g., C) for regular CPUs, while each
>>>> NPU currently has its own language and programming style.
>>>>
>>>> What do you think?
>>>>
>>>> Thanks,
>>>> Michael.
>>>>
>>>>
>>>> On Thu, Apr 6, 2017 at 2:07 PM, Remy Chang <remy at barefootnetworks.com>
>>>> wrote:
>>>>
>>>>> Hi Michael,
>>>>> It seems you're conflating hardware with language.  NPU, programmable
>>>>> ASIC, general purpose CPU, and even GPU can all potentially execute P4
>>>>> code.
>>>>>
>>>>> Regards,
>>>>> Remy
>>>>>
>>>>>
>>>>>
>>>>> On Apr 6, 2017 10:57, "Michael Borokhovich" <michaelbor at gmail.com>
>>>>> wrote:
>>>>>
>>>>> Thanks for the reply Nate!
>>>>>
>>>>> So, to summarize, the benefits of the P4 approach are: portability and
>>>>> performance. Other than that you probably can achieve the same (if not
>>>>> better) flexibility/programmability with an NPU. Is this correct?
>>>>>
>>>>>
>>>>> On Thu, Apr 6, 2017 at 1:01 AM, Nate Foster <jnfoster at cs.cornell.edu>
>>>>> wrote:
>>>>>
>>>>>> Your question seems to be more about the relative merits of various
>>>>>> architectures than the P4 language. But yes an ASIC is generally more
>>>>>> efficient than an NPU, at least at scale.
>>>>>>
>>>>>> Beyond efficiency there are other benefits to expressing a data plane
>>>>>> algorithm in an open framework like P4. For example, a P4 program should
>>>>>> be relatively easy to port to a different target. The same is unlikely to
>>>>>> be true for C programs written against closed SDKs.
>>>>>>
>>>>>> -N
>>>>>>
>>>>>> On Wed, Apr 5, 2017 at 6:59 PM, Michael Borokhovich <
>>>>>> michaelbor at gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> P4 allows for a configurable data plane, e.g., we can easily support
>>>>>>> new custom protocols. However, the same functionality may be achieved by
>>>>>>> using a network processor, e.g., EZchip (the one I had experience with).
>>>>>>>
>>>>>>> As I understand it, the advantages of a programmable ASIC/FPGA that
>>>>>>> supports P4 are better performance and a lower price than a network
>>>>>>> processor?
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Michael.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> P4-dev mailing list
>>>>>>> P4-dev at lists.p4.org
>>>>>>> http://lists.p4.org/mailman/listinfo/p4-dev_lists.p4.org
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>