Opening Plenary, Monday 11 May, 2015, at 2 p.m.:
HANS PETTER HOLEN: Hello everybody. Good afternoon everybody.
(Applause)
We are just past 2:00 and this is the opening of the RIPE meeting. My name is Hans Petter Holen, I am the chairman of RIPE and have been so since RIPE 68. For those of you who don't know, Rob Blokzijl was the chairman before me and he is still here in the audience so he is not letting us off on our own yet. This is RIPE meeting number 70. So, back 70 meetings ago, a group of people met to sort out all the problems or issues with Internet in Europe and that would just take a couple of meetings. We are still here.
So, so far, we have 671 registered for this meeting, only 364 have showed up, so later in the week you will probably have to squeeze into the middle to let every chair be in use because it's pretty full here, and this is really great to see the community is still growing.
So, as I said, this is a meeting. You are not here for a conference to be entertained, you are here to participate in a meeting. And this meeting is open to everyone and we like it to stay like that, and this means that when we bring together people from different backgrounds, cultures, nationalities, beliefs and genders, we need to make sure that this stays safe, supportive and respectful. And if you, for any reason, feel that that is not the case, we have two trusted contacts, Nick, are you here, Mirjam? She is at the back, so if you feel you need to discuss this with them, then they are your trusted contacts and you can have confidential talks with them.
So we have a meeting plan and if you haven't noticed that is within your badge so you can open it up and look at it here to figure out what you ‑‑ what is of interest to you. For those of you who are following the policy arena, there is a presentation tomorrow on the results of the CRISP team working at what is going to happen when the US Government steps out of IANA. There is also going to be a Cooperation Working Group for that and there is going to be something on Friday for that. So if you are interested in politics ‑‑ or sorry, policy issues, that is where to go. I guess that the technical talks, you will easily find on your own.
On Wednesday and Thursday, you have all the Working Groups, and thanks to all the Working Group chairs that you can see up here now. There is actually a new Working Group chair for MAT Working Group, she just joined before this meeting, and there is one secret Working Group Chair here, Peter Koch, and if you are interested in some of these topics feel free to grab them and talk to them, that is what they are there for.
As I said, this is a meeting and we would like you to contribute so after every talk there is time set aside for questions and in the Working Groups even more time for discussions, there are microphones here between the seats, so please go to the microphone and please state your name clearly, so even if you have been here for 20 years, not everybody knows who you are so please state your name and affiliation so that everybody and especially our scribes can take down who said what. And as you can see on the screen here, we have a live transcript of what we are saying and sometimes they even type out what we should have been saying rather than what we are saying so that is very good.
As, you know, all of this is also webcasted so your friends at home can see you when you are here.
The content of the meeting is all put together by a Programme Committee and you can see all their faces here. Filiz is going to take over after me, she is chairing the Programme Committee. All of these are elected by you, apart from three of them; they are selected by the MENOG, the ENOG and RIPE Working Group, and there is going to be an election this week, so there are nominations up and there is going to be an election during this meeting.
Now, if ‑‑ when you have done all your technical or policy meetings and everything else, there are of course breaks here with coffee and so on. And in the evenings you will even find socials. Today you can meet the RIPE NCC Executive Board and there are welcome drinks, just after that. If you don't want to meet the board but just have a drink. Tomorrow, there is a party. That is at the south beach. On Thursday, there is the RIPE dinner. Wednesday is, as you can see, there is no social so that is the do it yourself day, you can hook up with the friends and colleagues that you have met at this meeting and figure out what to do yourself in Amsterdam. And you can find all of this on the website.
So, all of this is possible thanks to, of course, the RIPE NCC, who is putting all of this together for us, and the other sponsors that picks up the bill for the socials and other things here.
So that is everything I had to say now and I will hand over the microphone to Filiz and say welcome to this RIPE 70. Thank you.
(Applause)
FILIZ YILMAZ: Thank you. Thanks everyone. This is a great crowd and I have to extend my thanks to you further on. RIPE Programme Committee, which I am chairing the group of, who Hans Petter Holen showed the faces, we are responsible for the Monday and Tuesday plenary. Those agendas are built up on the submissions we receive and this meeting, it was great because we received 58 submissions out of which 81 of them came all in time before the first deadline. So, 13 slots were already filled in before the first end date or deadline, we call it. So thanks, thanks a lot. This is great, it helps a lot to be organised and only fill in few slots in the last few minutes.
So, with that, I will just make another note for the elections, if you are interested, the nominations will close tomorrow at 15:30, so you have today and tomorrow to come and talk to us if you need more information and also see if you will be interested and send your CVs and biographies and photos to us so they can be put in process. Greg, I will call our first speaker who will talk about Ethernet speed trends. So let's get on with that, thank you.
GREG HANKINS: Good afternoon, thanks for having me back to talk about Ethernet speeds. I showed this a couple of years ago but I like to open with this. This is a diagram of the original Ethernet, drawn about 40 years ago, and I think it's cool because it turned out they got a lot of things right and Ethernet is definitely the predominant networking technology that we have across wireline and wireless infrastructure today, so I think that is pretty amazing that they drew this 40 years ago. There is a lot of stuff going on with Ethernet so I wanted to give you an update on all of the different speeds that we are developing and talk a little bit about the market development first. That is a roadmap that was developed by the Ethernet Alliance and if you follow the link they have a bunch of cool PDFs and graphics there. There are new market requirements that we are seeing in terms of speed, distance and cost and these are driving requirements for a variety of different new speeds and I will tell you about all these in detail, but primarily we have wireless access points that are driving a need for two‑and‑a‑half and 5 gig, we have servers driving a need for 25 gig and we have core networks driving a need for 400 gig and higher. And the interesting thing is that we have the possibility over the next five to six years to have about 6 new Ethernet speeds available on the market and this is as many speeds as we have had in the past 30 years, so there is definitely a whole lot of development going on around Ethernet.
So the target applications, 10 gig obviously is pretty widely deployed and we see that everywhere in the network. The needs for two‑and‑a‑half are driven largely by wireless and a very large installed base of cat 5e and cat 6. 25 gig is the new speed designed for servers and data centre aggregation. There is more interfaces for the 40 and 100 gig and we are also working on 400 gig as a new core technology.
I will talk about each of these in a bit of detail. For two‑and‑a‑half gig and 5 gig there are two drivers. The main one is high speed wireless: we have wireless speeds now that exceed the wired speeds, so ‑‑ we had about 600 megs on the wireless side, now, and possibly with 802.11ax, we are seeing speeds in excess of 7 gig and possibly four times faster. The rule of thumb is that you need about 75% of the radio speed on the wired side, so that translates pretty directly into a requirement for two‑and‑a‑half gig and five gig on the access side. Higher wired speeds are available but, as I will show you on the next slide, they also require a different type of cabling and may or may not support PoE.
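The rule of thumb quoted above can be sketched as a small calculation. This is only an illustration and not something shown in the talk: the 0.75 ratio is the figure as read from the transcript, while the function names, the list of candidate access speeds and the example radio rates are assumptions made for the sketch.

```python
# Rule of thumb as read from the talk: the wired uplink needs about 75% of the radio rate.
WIRED_TO_RADIO_RATIO = 0.75

# Candidate access speeds in Gbit/s (an illustrative list, not taken from the talk).
ETHERNET_SPEEDS_GBPS = [1.0, 2.5, 5.0, 10.0]


def required_uplink_gbps(radio_gbps):
    """Wired capacity suggested by the rule of thumb for a given radio rate."""
    return radio_gbps * WIRED_TO_RADIO_RATIO


def smallest_sufficient_speed(radio_gbps):
    """Smallest candidate speed that covers the required uplink, or None if none does."""
    need = required_uplink_gbps(radio_gbps)
    for speed in ETHERNET_SPEEDS_GBPS:
        if speed >= need:
            return speed
    return None


# Hypothetical radio rates in Gbit/s, chosen only to exercise the calculation.
for radio in (0.6, 3.0, 6.5):
    print(f"radio {radio} Gbit/s -> need {required_uplink_gbps(radio):.2f} Gbit/s, "
          f"pick {smallest_sufficient_speed(radio)} Gbit/s")
```

With these assumed inputs, radio rates of a few gigabits per second land on the two‑and‑a‑half and 5 gig steps rather than forcing a jump from 1 gig straight to 10 gig.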
In terms of the other drivers, there is the large installed cabling base. I started pulling Cat 5 I think in the early 2000s, so even before this study was done, but the IEEE used a study that was done from 2003 onwards to 2014 and this study alone cites there were 58 million metres of Cat 5e and Cat 6 installed, 3 million outlets, so you can see that is a huge amount. If you think of every building that exists in every city in the world, every hotel and office building and apartment building, every school, you know, they just have a huge installed base and we want to use those for higher speeds. There is technology available for 25G and 40G but they require different types of cabling and they don't have the distance to go up to 100 metres on Cat 5. There are other applications: wireless is the primary driver, IEEE small cell, high‑def security cameras, so pretty much anything that can use a higher access speed than a gig.
That leads us to the IEEE task force. It was just started last March, so a couple of months ago, and they have their first meeting next week, so there is not a whole lot to report yet. They have defined three speeds, two‑and‑a‑half gig over 100 metre Cat 5 and also 5 gig over 100 metre Cat 5e and ‑‑ PoE support; this is especially important when you are trying to get really high power, up to 60 watts, on PoE. And you can see the task force web page at the bottom if you want any more information. And we expect the standard to be done pretty soon. The lower speed standards are going pretty fast because it's reuse of existing technology, so we expect the standard in 2016 and interfaces in 2016.
There are two industry groups also behind the two‑and‑a‑half and five gig Ethernet efforts: the MGBASE-T Alliance, which is, as far as I can tell, driven primarily by Broadcom, and the NBASE-T Alliance, which is everyone except Broadcom. So it's interesting to see how these two groups will influence the standards. Broadcom has published a proprietary spec so I am sure that will be used to influence how the standard works.
Moving on to 25 gig. The market driver for this is really to provide two things: one, a server connection speed that is really optimised for cost, throughput and efficiency, and also to maximise the efficiency of the server-to-switch interconnect, and I have some math that will show you that. Using single 25 gig gives us a lot of advantages and we are using 25 gig all over the place: for 100 gig, for the CAUI-4 signalling, which is the signalling between the optical modules and the ASIC. We use it in the optical modules themselves. The four times break out is very popular. So 25 is kind of the new ten, if we look at the electrical signalling that we have to work with, and we expect to see quite a few developments around 25 gig technology in the future. What about 40 gig? Wasn't there this whole thing where we were going to do 100 and we added 40 and no one liked that and there was a bit of confusion about that? Yes, that is true. 40 gig in comparison is by now also 8 years old, counting from when we started working on the technology, and it was standardised 6 years ago. It uses inefficient four times 10 gig signalling in comparison and also has a higher cost and a larger QSFP+ optic in comparison to the QSFP28, and the bottom line again is that there are just different market requirements, there is one for 40 gig and one for 25 gig.
The interesting math is that there is going to be a bunch of 3.2 terabit chips coming out on the market this year, that will be shipped by vendors. So if you do some math, just using 3.2 terabits, you can see that if you use a single lane of 25 gig you get much better efficiency in terms of the port density and the fabric utilisation than if we used 40 gig with four times 10 gig signalling. In this case it's 28 servers and 400 gig port uplinks, and we get about 50% utilisation. Now the interesting thing is that if you are building a very large hyperscale data centre with 100,000 servers and you try and connect all of these, you can see you need about 2,500 fewer switches if you use 25 gig versus 40 gig, so that is a pretty substantial difference in capex and opex. And the other table just shows some straight math if you have a number of ports with different lanes of signalling, so you can see that with 25 gig you can actually get 128 usable ports on a 3.2 terabit ASIC with 100 percent utilisation. That is really what is driving the math behind that.
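The "straight math" on ports and lanes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reproduction of the slide: it assumes the 3.2 terabit ASIC exposes 128 lanes that can each run at up to 25 gig, which is consistent with the 128 ports quoted above but is not stated in the talk. The function name and the port configurations listed are likewise chosen for illustration.

```python
# Assumption: a 3.2 Tbit/s ASIC modelled as 128 lanes, each able to run at up to 25 Gbit/s.
ASIC_LANES = 128
MAX_LANE_GBPS = 25
ASIC_CAPACITY_GBPS = ASIC_LANES * MAX_LANE_GBPS  # 3200 Gbit/s


def port_math(port_gbps, lanes_per_port, lane_gbps):
    """Return (usable ports, fraction of ASIC capacity used) for one port type."""
    if lanes_per_port * lane_gbps != port_gbps:
        raise ValueError("lanes_per_port * lane_gbps must equal port_gbps")
    if lane_gbps > MAX_LANE_GBPS:
        raise ValueError("lane rate exceeds what the modelled ASIC supports")
    ports = ASIC_LANES // lanes_per_port   # every port consumes whole lanes
    used_gbps = ports * port_gbps          # bandwidth actually delivered
    return ports, used_gbps / ASIC_CAPACITY_GBPS


# Illustrative port types: (name, port speed, lanes per port, lane speed) in Gbit/s.
for name, speed, lanes, lane_speed in [
    ("10G  (1 x 10)", 10, 1, 10),
    ("25G  (1 x 25)", 25, 1, 25),
    ("40G  (4 x 10)", 40, 4, 10),
    ("100G (4 x 25)", 100, 4, 25),
]:
    ports, utilisation = port_math(speed, lanes, lane_speed)
    print(f"{name}: {ports} ports, {utilisation:.0%} of ASIC capacity")
```

Under this model a lane running at 10 gig leaves most of its 25 gig capability unused, which is why the 4 x 10 gig port type uses far less of the chip than single-lane 25 gig.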
In terms of developments there is a lot going on. The 25GBASE-T study group combined their meetings with the existing 40GBASE-T task force so there is going to be one standard, they have slid those two together. There is no change in the schedule or anything like that, so 802.3bq will have a standard for 25 gig and 40 gig. And there is also the Ethernet task force which is working on four different objectives: a backplane Ethernet, 3 and 5 metre copper cable Ethernet and a 100 metre multi‑mode interface. They are well underway. There is the first draft generated and, like we saw, we are going to see the standard sometime next year, also with interfaces sometime next year.
There is the 25 gig Ethernet Consortium, this is an alliance that was founded by a few key market players and they are developing two standards outside of the IEEE, a 25 gig Ethernet and a 50 gig standard. The specification is written kind of strangely in that it's only for backplane and twinax copper but doesn't address or preclude active cabling or some sort of fibre interface. The standard unfortunately is only available to members so I can't tell you what is in it. We are probably going to see some fibre interfaces that are longer than 100 metres that will go on single mode for 25 gig. Here is a technology reference slide. I tend not to read them at conferences; although they are very, very, very interesting, if I read them to you they are also very boring, so I won't put the audience to sleep. We have this new thing called the SFP28, the same size as the SFP+ for 1 and 10 gig, and that is what everyone is using for 25 gig. I will skip over the technology references in the next sections.
Moving on to 40 gig. 40 gig is pretty good as far as the market is concerned, we have the QSFP+, a bunch of interfaces, it's very popular for four times ten aggregation in data centres, a ‑‑ if you want to do native 40 gig we have a bunch of different standards available now, the one that was just developed earlier this year is the ER4 so we can get up to 40 kilometres and that is about all that is going on with 40 gig. As I mentioned there is also the 40GBASE-T standard which is combined with 25 and we will see that sometime next year. And that is about all that is going on with that, it's pretty much done and usable and deployable now. Here is a reference table that I will skip over.
And moving on to 100 gig. So this is a slide that is worth spending some time on because we are at kind of the turning point in the technology adoption. If you look at market adoption of a technology it turns out, and this is the one thing I remember from my statistics class, everything turns out to be a bell curve. And if you look at how technology is adopted, it's adopted as a bell curve. So this is anything, vacuum cleaners or iPhones, new sweat shirts or something, but there are the innovators and then it moves into the majority and then the people who adopt it last. In terms of 100 gig, we are crossing the chasm, that is between the early adopters and the mass market, so we are about to cross the chasm with 100 gig. I thought we were going to do it last year but I was wrong. This year I think it's going to be the key trend. There are two things driving that: the smaller optics, the QSFP28 and CFP4, and also the ASIC that I mentioned, that can now support up to 3,200 gigs on a single line card. So that is driving a lot of adoption. Next year, a market analysis group projects we are going to ship 100 million Ethernet ports. Does anyone know how much we shipped last year? 17,000. So that is quite a big difference. And then in the next couple of years we will see, as any market develops, as we saw with 1 gig and 10 gig, a bunch of different options come along as we have the technology. I put something like a theoretical ‑‑ SFP; we don't have that yet, but over the next couple of years something like that will come along and give us much higher density commodity first. First generation 100 gig: some of you may have seen this, but the fundamental challenge we have is that the first generation 100 gig uses a lot of 10 gig signalling, and that is because of the technology that we had to do that. 
As 25 gig technology develops, we'd like to incorporate that into 100 gig to make it faster and cheaper and higher density, so a lot of the 10 gig components are moving to 25 on the electrical and optical side. In terms of developments, there is not a whole lot going on. There is a short reach interface that was developed; the important thing really is the electrical signalling, so that is the CAUI-4 using 4 by 25 gig. There were two objectives that were in the standard that didn't make it: the 20 metre multi‑mode interface, where we already had the 100 metre, so there wasn't a whole lot of difference or advantage over that. And the one that we really needed but that didn't make it was a medium range single mode interface. Right now for 100 we don't have anything between ‑‑ we have a two kilometre standard but not anything for short reach that is optimised for 25 gig signalling. So that unfortunately was taken out of the IEEE due to lack of consensus and I will show you how that affected things.
In terms of module evolution, if you remember, the first CFP was kind of large and clunky and expensive in terms of power and also in terms of cost, they were massively expensive. Sometimes as expensive as a line card. But over the evolution of the generations, just as we had for 10 gig ‑‑ 10 gig started out as a very large 300 pin MSA fixed port back in 2002 when it was developed and we went to XENPAK and XFP ‑‑ the same thing is happening with the 100 gig market. Now we are moving to second generation. The important ones are QSFP28 and CFP4. Those give us the really high density that we need in order to do high density line cards, combined with the ASIC technology, so that gives us the focal port capacity. Up to 22 to 44 ports ‑‑ 14 to 32 ports if you put them back to back with the CFP4.
Module evolution, here is just a different graphic for you, and especially it shows them in relation to an iPhone. I think this is probably one of the more important slides. Everyone wants to know how big is this thing. You can see the CFP is quite large, larger than an iPhone, CFP2 is about the size of an iPhone, getting smaller. And the others will be about the size of an XFP, not quite ‑‑ definitely the size of an XFP, so it's getting smaller. This is what happens when we don't reach consensus. So an MSA, that means multi‑source agreement, is a bunch of people getting together and deciding to do a standard. There is nothing wrong with that. The IEEE Ethernet standards are written so other standards can be developed alongside of them. The problem, if you remember VHS versus Betamax, is that now we have 4 different VHSes and not just two. When 100 gig first came out we had the ten by ten MSA standard and that was to provide an alternative because the IEEE didn't have a short reach single mode interface; two kilometres was very popular, and ten by ten was popular because it was cheaper. But now for the new generation we have four MSAs that are doing a short reach single mode interface and in fact two of them are doing the same thing, but they don't interoperate and they are all driven by different competing vendors and different competing standards and different competing objectives. So, it's going to be interesting to see what plays out in this market but we definitely can't sustain the IEEE as well as all these four MSAs, so someone is going to have to lose unfortunately and the market is going to have to decide that.
Here is a technology reference slide. So moving on to 400 gig and this is where it gets really hard. So, for 10 gig basically all we are doing is blinking a light on and off faster than 1 gig, but that is basically it. 100 gig and especially 400 gig, this gets into really complicated DSP, digital signal processing, you have to do a lot of prioritiry modulation and a bunch of complicated stuff so we can get this technology to work. In the IEEE we realised that Ethernet at terabit speeds is not very practical from a technical and economical perspective, but we know they are needed soon in core networks. So we would have loved to do terabit Ethernet, it's not feasible. If we make something that no one can buy we have failed. If we make something that people buy and it doesn't work, we have failed too. So the economics are really difficult here and we have a bunch of different things to consider in terms of the electrical and optical signalling technology that we have and the optics form factor, balanced versus all the market requirements. So there are different market requirements we have to take into consideration, and the IEEE provides the open forum and ‑‑ we couldn't come to an agreement on the 100 gig standard that I mentioned for 500 metres because there is no obvious good solution and we had to pick one and we couldn't pick any. So the next higher speed is going to be 400 gig, there is a link to the IEEE bandwidth assessment ad hoc which studied the market and determined that 400 gig sounds like a good objective, just based on the technology and the speed requirements that we have.
So the task force has been going for a couple of years, they are working on four interfaces and an electrical specification. The first one is defined, 400GBASE ‑‑ 16 times 25 gig over 16 parallel multi‑mode fibre links, and then the longer reach objectives get much harder and I will show you that on the next slide. We have also defined the important electrical interfaces, there is a 25 and a 50 gig interface. As 40 gig and probably 100 gig will be very popular we also want to support the four times 100 gig break out, that is a very useful application. So we see the standard and interfaces probably available in about two years from now.
This is the problem with the single mode interface: there is no real clear technology winner, and everyone has a different proposal and everyone in the IEEE has their preference on technology. So there is a ‑‑ for the short reach interface, like I said, it's pretty much done, 16 times 25 over multi‑mode. NRZ is not really feasible for higher speeds any more, and so we get into different modulation speeds. We can do this a number of different ways. There are four ways to go faster: you can just go faster, you can change your modulation, change the number of lambdas or the number of fibres, or some sort of combination of that, and that is the tricky thing, to figure out which combination of speed, modulation, fibres and lambdas gives us the best economics and the technical feasibility. So this is the big discussion in the IEEE and I expect there will be a lot of debate next week in the meeting about it.
If you look at the module evolution, we will see the same thing as we did before. First generations are already pretty well defined in terms of the CDFP Style 3 and also the CFP2, the same module you are using for 100 gig; we can run on 50 so that gives us 200. As the technology becomes available we will see some sort of evolution in the next couple of years into some sort of theoretical ‑‑ those modules don't exist but we know that they are coming. And that is really when Ethernet at terabit speeds becomes feasible, when we have this 100 gig electrical signalling.
So, if we look into the future, we know that Ethernet is continuing to evolve and meet new markets and different speeds are required for new applications. And we kind of have to get our heads around the fact that the 10X, 3X model isn't going to work any more. We had 10 ‑‑ it wasn't 3X the cost initially, obviously, but the new technology came down to 3X the cost eventually, we went from 10 meg to ‑‑ but that doesn't work any more. It turns out that the best increments or the building blocks in the speed multiples now are really four X the highest lane rate, or 8, so that is why we get to 400 gig and to 4 by 25 for 100 gig, and then in the future, as we have gone from 10 gig electrical to 25, 50 gig is the next speed that we will see and that could be the basis for some new technologies based on multiples of 50 gig, so it's likely we will see a 50 gig Ethernet and some sort of 200 gig Ethernet sometime after that.
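The "four X or eight X the lane rate" arithmetic can be written out as a short sketch. This is purely illustrative: the lane rates are the electrical speeds mentioned in the talk, the function name is hypothetical, and the products are just arithmetic candidates. Not every result, for example 80 gig, corresponds to a speed the talk mentions.

```python
# Electrical lane rates in Gbit/s mentioned in the talk.
LANE_RATES_GBPS = [10, 25, 50, 100]

# Lane-count multiples named in the talk as the natural building blocks.
MULTIPLES = [4, 8]


def candidate_speeds(lane_rates=LANE_RATES_GBPS, multiples=MULTIPLES):
    """Map each lane rate to the port speeds reachable with the given multiples."""
    return {lane: [lane * m for m in multiples] for lane in lane_rates}


for lane, speeds in candidate_speeds().items():
    print(f"{lane}G lanes -> " + ", ".join(f"{s}G" for s in speeds))
```

Read this way, 100 gig is four lanes of 25, and 400 gig appears both as eight lanes of 50 and as four lanes of 100.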
So if we want to sum it all up, here is basically a summary: we have two‑and‑a‑half and five gig coming for high speed wireless and Cat 5, 10 gig is very widely deployed. 25 is coming soon for servers and 40 gig is pretty popular in data centres now. We are squarely in the middle of second generation technology with 100 gig and that is the CFP2 and CFP4 and QSFP28. We are still a ways away ‑‑ that will come slowly. We are working on 400 gig Ethernet and that will use a lot of the 100 gig technology as a building block, and Ethernet at terabit speeds is still not feasible in the next couple of years but we will get there eventually and it's probably ‑‑ it's likely to be some sort of multiple of 400 gig, so 1.6 terabit comes to mind as that ‑‑ using that four X multiple math.
For more information go to the IEEE web pages or optics manufacturer web pages. That is about it. I have a ton of reference slides that are next so feel free to browse those in your spare time. I don't have time to go through them all. That is it. A lot, I know. Any questions?
FILIZ YILMAZ: Any questions for Greg? Is that all very clear? Any suggestions from the operational side, how these may come into life?
SPEAKER: Good afternoon, Martin Levy. This is all good but it's still 1,500 bytes for every frame. Can you talk about this?
GREG HANKINS: Thanks, Martin. So I think you said that to me at APRICOT last year and I don't have a good answer. So, I will just repeat the answer that I gave: there is very little to no interest in the IEEE in standardising a larger frame size, and that is because of the massive ‑‑ you saw, over 40 years ‑‑ deployment of Ethernet technology that could possibly be inter ‑‑ uninteroperable or could have inter‑op issues if we increase the standard frame size. So every IEEE specification always starts out with preserving the 1,500 byte frame size and I don't see that changing, to be honest. But I agree that it's a problem.
RUEDIGER VOLK: Deutsche Telekom. Thanks for starting with that question, Martin. And I would have phrased that like, well, you were talking about the IEEE is seeing market demand for something and do I understand you right that the IEEE is not seeing a market demand for larger frame size? OK. Fine.
GREG HANKINS: Yes.
RUEDIGER VOLK: The other remark as you were referring to the 40 years of deployment. Unfortunately, I am too young to have been ‑‑ to have enjoyed 40 years ago. I enjoyed about 30 years ago installation of yellow wire, if you know what that is? How much of the technical characteristics of Ethernet of 40 or 30 years ago actually remains in what we are doing today?
GREG HANKINS: So, that is a good point. By that I meant more the intellectual property, and that everyone has ASICs that are tuned around these 1,500 bytes, and people are very hesitant to change that because it's been there for so long.
RUEDIGER VOLK: Well, OK. The nice answer to my question could have been, yes, we learned, we learned that datagram ‑‑ runs over circuit. That would have been a good answer. The answer that everybody has optimised their ASICs for 1,500 ‑‑ the large routers I am working with seem to do very well with larger and I don't think that the vendors of that equipment count as nobody.
GREG HANKINS: I agree because I work for one of those vendors so I definitely agree. So I think the ‑‑ my point would be, if this is important to you then talk to your vendors and let them know that jumbo frames are important, because the message is not getting through between the users and the people that make the standards.
GEOFF HUSTON: APNIC. What is the minimum MTU size for this 400 gig or terabit specification?
GREG HANKINS: The minimum frame size, it would be 64 bytes.
GEOFF HUSTON: So that means that a maximum rate you are going to get a packet every 4.7 nanoseconds ‑‑ .67 nanoseconds, yes? Yes.
GREG HANKINS: For 400 gig, no, it was 1 point something nanoseconds.
GEOFF HUSTON: It's .67 for terabit. What is the fastest memory you have?
GREG HANKINS: Good question, yes. So that is a whole other presentation. In fact, I just gave one about two weeks ago about memory speed. So yes, and of course I saw your presentation at APRICOT, so good question. The challenge really is that the memory speeds have by far lagged behind the packet rates; 100 gig is ‑‑ 7 .72 nanoseconds, 400 gig it's whatever it is divided by four. The random memory read rates are in the single nanosecond range so we can get down to a couple of nanoseconds but the packets are coming in faster. We have to buffer, we have to do a bunch of tricks because we can't look up the destinations faster than ‑‑ or as fast as the packets are coming in.
GEOFF HUSTON: And you have got no other answer at this point?
GREG HANKINS: That is in my other presentation. But yeah, good point, the point I think is that we need ‑‑ there is a bunch of different things in play here, there is obviously the ASIC capacity, there is the line rate, the memory capacity, and all those speeds have to increase at the same time in order for us to do faster Ethernet rates.
BENEDIKT STOCKEBRAND: It's not just the memory, it's also the TCAM or whatever because you transfer the problem into layer 3 where it's really painful. Basically, means you can get all the line speed you want, but if you can't route it it's completely useless so we have a problem there.
GREG HANKINS: You are right. By memory I meant lookup memory, buffer memory, any memory that is used in a router to forward packets has to be able to keep up with the line rate coming in. It gets harder because in a lot of cases we have to do multiple lookups per packet. It's actually multiple cycles of nanoseconds that we need to look up a packet. But fortunately, the buffer memory is keeping up with it so we can store the packets long enough while we are trying to look them up.
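The packet arrival intervals quoted in this exchange follow from a short calculation, sketched below. The 64 byte minimum frame is the figure given in the answer above. The 8 bytes of preamble and start-of-frame delimiter and the 12 byte inter-frame gap are standard Ethernet overheads that the speakers do not spell out, so they are stated here as assumptions of the sketch, and the function name is hypothetical.

```python
MIN_FRAME_BYTES = 64   # minimum Ethernet frame size, as stated in the Q&A
PREAMBLE_BYTES = 8     # preamble plus start-of-frame delimiter (assumed overhead)
IFG_BYTES = 12         # minimum inter-frame gap (assumed overhead)


def packet_interval_ns(line_rate_gbps, frame_bytes=MIN_FRAME_BYTES):
    """Time between back-to-back frames at line rate, in nanoseconds."""
    bits_on_wire = (frame_bytes + PREAMBLE_BYTES + IFG_BYTES) * 8
    # bits divided by Gbit/s gives nanoseconds
    return bits_on_wire / line_rate_gbps


for rate in (100, 400, 1000):
    print(f"{rate} Gbit/s: one minimum-size packet every {packet_interval_ns(rate):.2f} ns")
```

This gives about 0.67 ns at terabit speed and 1.68 ns at 400 gig, in line with the figures exchanged at the microphone. The 100 gig result, 6.72 ns, differs slightly from the figure recorded in the transcript, which may be a transcription slip.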
FILIZ YILMAZ: Any other questions? Well, thank you, Greg. I think we just recruited another talk for the next RIPE meeting as we speak. Thank you.
(Applause)
Next in line we have Greg Shepherd, he is going to talk about BIER. I don't know if you like that sort, but it is to put you in the mood for the social activities this evening.
GREG SHEPHERD: Afternoon. Great to be talking about something new and to be act two of the Greg show this afternoon. If there are any other Gregs in the audience, come on up after me. I don't get to choose them. I am the last Greg.
So, I am going to talk about BIER, but before I dive into it, let me talk about what we are trying to do with it. Anyone here remember multicast? Anyone currently have multicast deployed today? Excellent. A fraction. So what is the first thing that comes to mind, without your hands up, when you hear multicast? Shout. Shite, horror, excellent. That is what I wanted to hear. We have been listening. We have the scars that come with all that experience as well. So the intention with BIER is to learn from these experiences and see if we can find something that actually addresses those problems today.
So let's take a quick history lesson. This was done by Deering back in 86, and what he tried to do with multicast is considerably different from what we do with it today, and I am not going to go into that in great detail. What he was trying to do was create a Layer 2 broadcast domain on top of Layer 3. A noble cause, I suppose, but it was really part of this intention of an Internet‑wide solution, part of the IP stack, which is good and bad. It's good in that it becomes an embedded part of our solution set. It's bad in that it's difficult to change it, to evolve it over time, because it's an embedded part of the solution. But my point is that what he was trying to do was a global, Internet‑wide solution. Today, our deployments are about our networks, what we are trying to solve for ourselves. This list comes up all the time. The reality is there are use cases for multicast that make a lot of sense. This list is more of a wish list of things that have been out there. The one we see the most, and with some of your hands up: financial networks? Any financial networks? All right. That is bread and butter. IPTV, like video deployment? Excellent. MVPN? No MVPN deployments? Wow. There is one. I see a hand. Deering mentioned nothing about one‑to‑many applications in his original draft. It was all about Layer 2 applications on the Stanford campus that were broken when they stuck a router in the middle of the network, really collaborative groupware stuff, and what he talked about in his original draft was a collaborative solution: many‑to‑many applications, dynamics, receivers coming and going. We have been listening to that, and there are solutions coming down the pipe to address some of those problems. But what he built was this Internet‑wide solution that was an explicit join model, which is nice in that now, with PIM, we only send the data where it's requested and don't flood it everywhere, but what this means is that now we have tree state for every flow, and it's relatively unbounded, right?
Every new flow gets a new entry, and as these trees grow we negatively impact convergence time. If there is any sort of network failure, any node that is affected has to recalculate RPF and send all those joins, packed up to the MTU, to its next RPF neighbour. And because these trees are receiver driven, the path of the data is defined by the receiver's view of its path towards the source. Once the state is built, the source sends the traffic down that path, which may be different from the path Unicast would take, because Unicast follows reachability towards the receiver, whereas the tree follows the receiver's view towards the source. It causes challenges in trying to train people up and in managing the network: you build a multicast network out and it's stable and you forget about it, because you are moving on to your other problems of the day. When something goes bad you have to find that guy in the closet, and it's a challenge if it's not part of your day‑to‑day operations.
So, with this state explosion there is actually no real way to aggregate that state either. We have some solutions, like the MVPN case. But with that comes this trade‑off between flooding traffic again where it's unwanted or just re‑exposing the state back into the core. In addition, we have data‑driven events, which are unfriendly to the network, to say the least. And so, listening to these things, the special skill sets and the challenges, we are trying to come up with something completely different. So the benefits of multipoint are well known. We have an entire industry based on multipoint services, and we have failed to bring them to IP. Broadcast television, it's slowly making its way there: sending one signal out, and my costs are linear. We know there is an effective business model that multicast can address if we had a technical way to solve it. Financials of course: this is all market data sent over multicast over walled‑garden networks. So, those people who don't have this day‑to‑day operational requirement often look at multicast, do this cost‑benefit analysis and see it just isn't worth the effort. We have seen a lot of examples where customers go ahead and write their own Unicast solution, to do Unicast or host‑based replication, because they don't want to incur all those challenges of a multicast environment within their network. Only those networks that can benefit and dedicate people to it are the ones that deploy it. Having been involved in multicast for way too long, and you probably recognise "here he comes again", it's frustrating. We would like to do something better, so can we?
So stepping back, as I mentioned before, there is this end‑to‑end global solution that they were after. You run a network; what you are most concerned about is your network: get the traffic in, get it to the right end points, and I am done. Most of what we run are overlay topologies, and all we need to know is where the end receivers are. If I can encode the packet to identify those and use my Unicast reachability to get them there, I might have a solution. If I take the smallest identifier for an end node, a bit, and I advertise that bit position associated with the router ID, I can use the IGP to flood this information and build a bit index forwarding table using the SPF calculation, defining those end points in the packet itself. And we call this BIER, Bit Index Explicit Replication, and it helps break the ice: it's rare that someone comes to the mic and says, I don't like beer. So that helps us a bit.
We actually have some adoption in the IETF. We had our first BoF last November, with five vendors co‑authoring and a number of operators who are contributing to OAM and the use cases draft, and we are presenting here today to get more operator feedback. We don't want to do any of this in a vacuum; we want to continue that with the talk today. We have got some traction. We had our first meeting in Dallas last March, and to commemorate that we had a beer commissioned and brewed for the event; it's the BIER beer, and we do have a handful of them here today for the right questions or consideration.
A list of drafts already adopted by the Working Group: the names have been adopted, you can see the general architecture, I encourage you to read that, talking about how the pieces come together, and MPLS, though there really are no rules about this. We have adopted that, what the requirements are, what can be deployed out there and what the current use cases are. The draft is being fleshed out as we go. The kind of feedback from the operator community ‑‑ this is an open draft and it's evolving. The MVPN draft is going by the case. That is ‑‑ large providers have a lot of it rolled out; I saw one hand here today, which seems a little ‑‑ not representative of what my experience has been, but that is all right, I am always here to learn. And we know the challenges with that, so we say BIER is a good plug‑in for that, and because we are encoding in the IGP, extensions to OSPF and ‑‑ in the IETF to progress the work. So what does the solution look like?
You have got your network; it's a BIER domain. You take a bit string, take one unique bit position from that string and assign it to each of the end nodes of your network, just the edges, and now with a new opaque LSA you flood that information, which is the bit position in the string and its association with the unique IP address of that node. With that information flooded throughout the IGP, every intermediate router can now take the RIB and this bit position and build a BIER forwarding table, which is a set of bits reachable per neighbour by the SPF calculation. So now you don't have per‑flow state any longer, or a control plane any longer. It's basic topology information carried in my IGP, and so at ingress all I need to do is encapsulate with the correct bits set for all the members, and using Unicast reachability I will forward and replicate down the shortest path to each one of the leaf nodes.
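As an illustration of the table build just described, here is a minimal Python sketch. It is not taken from the drafts: the link list is an assumption reconstructed from the example in the talk, the bit position for A is assumed, and a unit‑cost breadth‑first search stands in for the IGP's SPF calculation. The names `LINKS`, `BIT_POSITION`, `first_hops` and `build_bift` are hypothetical.

```python
from collections import deque

# Assumed topology (not confirmed by the talk): A-B, B-C, B-E, C-D, C-F.
LINKS = [("A", "B"), ("B", "C"), ("B", "E"), ("C", "D"), ("C", "F")]

# Bit positions of the edge routers. D=1, F=2, E=3 follow the talk;
# A=4 is an assumption made for this sketch.
BIT_POSITION = {"D": 1, "F": 2, "E": 3, "A": 4}


def neighbours(links):
    """Turn a list of bidirectional links into an adjacency map."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def first_hops(adj, root):
    """Unit-cost SPF (breadth-first search): destination -> first-hop neighbour."""
    hop = {}
    queue = deque()
    for n in sorted(adj[root]):
        hop[n] = n
        queue.append(n)
    while queue:
        node = queue.popleft()
        for n in sorted(adj[node]):
            if n != root and n not in hop:
                hop[n] = hop[node]  # inherit the first hop of the parent
                queue.append(n)
    return hop


def build_bift(adj, router, bit_position):
    """Per neighbour, OR together the bits of all edges reached via it."""
    table = {}
    for dest, nbr in first_hops(adj, router).items():
        if dest in bit_position:
            table[nbr] = table.get(nbr, 0) | (1 << (bit_position[dest] - 1))
    return table


if __name__ == "__main__":
    adj = neighbours(LINKS)
    for nbr, bits in sorted(build_bift(adj, "B", BIT_POSITION).items()):
        print(f"B: via {nbr} -> {bits:04b}")
```

Under these assumptions router B ends up with one entry per neighbour and no per‑flow state, which is the property the talk emphasises.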
So what does forwarding look like? Here is the forwarding table ‑‑ so I might have to read this side. So, what I am showing here is a topology with the leaf edges being D, F, E and A, and at C, B, E and A we have the BIER forwarding table itself. So the interesting case here would be looking at router B, and you can see router D is bit position 1, F is bit position 2 and E is bit position 3. And so from router B's perspective, bit positions 1 and 2 are shortest‑path reachable via neighbour C, and bit position 3 is shortest via neighbour E. So this is extracted from the RIB ‑‑ every router builds this BIER forwarding table. So in forwarding you do need some signalling, and BIER itself doesn't define signalling, but we do have a specification in the IETF to show how we use BGP to do this, and it's the same BGP for ‑‑ all the information is already in the protocol. So when we have a receiver at edge router D, he sends the membership information over the top, whatever the signalling is, in this case for example BGP. Router A gets it and it's associated with flow blue. So ‑‑ whatever state he has, he has got a single bit now, and when a packet arrives matching that flow he encapsulates it with this new mask in the header. He looks it up in his table, he ANDs and matches and forwards, and in this case it's very simple: it will be hop by hop all the way down to D. But now when we have got another joiner at node E, he sends membership to blue with bit position 3. The ingress router ORs those two together, so we have a mask of 0101, so bits 3 and 1 are set. When A looks this up, it sees that the match for both bits is neighbour B, so it encapsulates and forwards. At router B we have the interesting case now where bit position 1 is reachable via C and bit position 3 via E, and we have to replicate. What we do at this point is we actually AND the incoming mask with the forwarding entry, so the results are only the bits that are reachable by that neighbour.
And this is an important piece of logic that has to take place at forwarding to ensure we don't duplicate packets in the network. This is incredibly simple; it's just something we haven't built into routers. So, if we had done this 20 years ago when Deering was still evolving the solution, we would be miles ahead. There is enough pain in the network that it's reasonable to see we have got five vendors working together to say, how do we move this forward? We get to E, position 3 matches; C does the same thing. And you can see the mask as it is forwarded here, because both bits matched. At router B, neighbour C only matched bit position 1, and the AND function resulted in an outgoing packet whose encapsulation now carries only that bit, the result of ANDing the incoming mask with the table entry. And towards E the same way, only bit 3 matched from the incoming mask. And now we don't loop the packets through.
And in the final case, we have a joiner at F. Now we have combined all three bits together here, and the same process takes place hop by hop, where we do the matching and encapsulate only the bits reachable. C does the same here as well, and you can see that even though bits 1 and 2 are set, only bit position 1 is reachable via D, so the AND forwards only bit 1 to D, and then C encapsulates only bit 2 towards F.
Now the logic, this is an important part of the process to ensure that we don't duplicate packets anywhere in the network, and this next slide will show you how that would be the case. So, if we weren't changing the mask to match only those bits reachable, at router E and at C we would have basically bits D and F still reachable via C, and we would duplicate the packets and send them down the network. So because this is a match function and an AND function we can compress the logic a bit. I cut some slides on the forwarding logic for time, but these slides are available in the IETF archives as well, and I encourage you to read the drafts, join the BIER list and give us some operator feedback, and pull me aside. One of my colleagues is here as well; he has been implementing this and is the author of the majority of these drafts. So if you have questions or ideas on how to contribute, pull us aside.
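The forwarding walk‑through can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per‑router tables are hard‑coded from an assumed topology (A‑B, B‑C, B‑E, C‑D, C‑F, with A on an assumed bit position 4), the `hop_limit` guard is an addition of this sketch rather than part of BIER, and the names `BIFT`, `OWN_BIT` and `forward` are hypothetical.

```python
# Assumed forwarding tables: router -> {neighbour: bits reachable via it}.
# D=1, F=2, E=3 follow the talk; A=4 and the links are assumptions.
BIFT = {
    "A": {"B": 0b0111},
    "B": {"A": 0b1000, "C": 0b0011, "E": 0b0100},
    "C": {"B": 0b1100, "D": 0b0001, "F": 0b0010},
    "D": {"C": 0b1110},
    "E": {"B": 0b1011},
    "F": {"C": 0b1101},
}
OWN_BIT = {"D": 0b0001, "F": 0b0010, "E": 0b0100, "A": 0b1000}


def forward(router, mask, mask_outgoing=True, deliveries=None, hop_limit=8):
    """Replicate a packet hop by hop; return the list of edge deliveries.

    mask_outgoing=True  : AND the incoming mask with each table entry, so a
                          copy only carries the bits reachable via that hop.
    mask_outgoing=False : forward the incoming mask unchanged, to show why
                          the AND step is needed.
    """
    if deliveries is None:
        deliveries = []
    if hop_limit == 0:  # safety net for the unmasked variant, not part of BIER
        return deliveries
    own = OWN_BIT.get(router, 0)
    if mask & own:
        deliveries.append(router)  # this edge router is a receiver
        mask &= ~own
    for nbr, reachable in BIFT[router].items():
        if mask & reachable:
            out = (mask & reachable) if mask_outgoing else mask
            forward(nbr, out, mask_outgoing, deliveries, hop_limit - 1)
    return deliveries


if __name__ == "__main__":
    print(forward("A", 0b0101))  # receivers at D and E
    print(forward("A", 0b0101, mask_outgoing=False).count("E"))  # duplicates
```

With the AND step each receiver gets exactly one copy. Without it, copies bounce back towards neighbours that still match a set bit and the same receiver is hit repeatedly; how often depends on the hop limit chosen here.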
We have the solution; where do I put the bits and how many of them do I put? That is clearly the challenge. I can only put so many bits in the header, and that scopes how many end leaves I can define. We are trying not to make this an Internet‑wide solution, just my network, so we can scope this to something reasonable. So, the length is actually dependent a lot on the encapsulation type, where I can stick the bits, and what platforms support today. We came up with five different solutions and we are focusing on two right now, MPLS and v6, and that is really because that is where the interest is in the IETF. The IETF is driven by who is going to participate. We don't do this in a vacuum; in fact the ADs pushed back a lot on our original idea, and we evolved the solution towards something that we would take towards a standard and not just make work.
So in the MPLS case, the vendors have all confirmed that ‑‑ so that is what the Working Group is going to ‑‑ 256 bits of mask space, and with MPLS encapsulation the BIER header is after ‑‑ well, the BIER label here, after the last label. You can have a VPN label for the MVPN case, so that is ‑‑ at the egress PE, and following that is the payload. Actually, I realise now I don't like this slide, it makes the payload look tiny here. We are trying to figure where the mask. The BIER header portion itself is pretty straightforward: protocol, length, we have an ‑‑ you can hash the flow information and carry that in the packets for equal‑cost multipath, and in the case of MVPN we actually have the BFR‑ID, which is the ingress router, which can be defined here in the header as well.
MVPN over BIER. Currently the solutions we have today are based on the Rosen model, or MLDP; there is even RSVP‑TE, which is explicit trees or dynamic trees as well, but with a lot of control plane and complexity. We have again that trade‑off between flooding the traffic to unwanted nodes or exposing the customer state back into the core; neither is a good solution. With BIER, what we have is this multipoint topology information in the IGP. I don't have to have all these different explicit signals for default MDTs and data‑driven events or any of that. All that machinery goes away. The current solutions have these various modes and profiles, and we are disbanding that entirely, and we can use the existing BGP for MVPNs just at the edges, so that signalling stays the same; we just change the transport in the core.
So as I mentioned, the BGP control planes are used ‑‑ transferred in the BGP updates either through a route reflector or meshed over the top. We can maintain that by carrying the service label as well as the BFR‑ID. Also there is no tree per VPN, there is no state in the core, no decision to make on how often or how many states you provide. In fact a lot of the MVPN providers have to cap the customers in terms of how much state they are going to allocate at their PEs, because that is where the state is overloaded anyway, and so they have to define that in some way to minimise the impact on their network and charge accordingly to help alleviate some of that stress.
Now, 256 bits is in some cases not enough. From our interaction with operators, the distribution in network size is nonlinear. 256 seems to address 80 to 90% of the customers we have spoken with, and those who need more than that don't need 257, they need thousands. It's nonlinear, but a handful need that. How can we address this in some way, still using the concept of BIER? What we have envisioned here is the idea of sets: we have 256 bits in an individual set, but the ability to build BIER forwarding tables for various sets that are different functions, and we can overlap them as needed. The point is that a single packet can then only be addressed to a single set, and if you need to address more than 256 receivers, you have to send two packets into the network. This is ingress replication, I understand, but again this is not ingress replication per receiver, it's per 256 receivers in a set. And depending on your topology, you could intelligently distribute your bit assignments to prevent duplicate packets on the same network. In this case we see here they are following the same link; set ID 1 and 2 have the same bit positions in each mask set but a different context entirely, because they are different sets. At some point they are going to diverge, because in this case set one and two are geographically diverse, and now I only have single packets being replicated out to those end nodes.
In the extreme case where you mix these, that replicated packet is going to take those paths. We can see possibly some opportunity here to create some sort of operator intelligence to define the bits in a way that prevents that replication as much as possible ‑‑ A multicast flow that has multiple receivers in different sets means that the packet has to be sent to each set within the network. Is this a problem? Not necessarily; it's not an insurmountable problem, it's a technical limitation, and our feedback so far from operators has not been negative in this regard, and we have hopefully addressed this in some way. Give us some feedback. Having the set identifier as part of the packet ensures that the correct BIER forwarding table is being looked up on a per‑node basis, so it does require more BIER forwarding table entries, one per set. We can implement that in the label itself.
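To make the set idea concrete, here is a hedged Python sketch of how an ingress router might split a receiver list into one bit mask per set. The numbering convention (receiver ids starting at 1, set id = (id − 1) div 256) is an assumption made for illustration, and `SET_SIZE` and `masks_per_set` are hypothetical names.

```python
SET_SIZE = 256  # bits per set, the figure quoted in the talk


def masks_per_set(receiver_ids, set_size=SET_SIZE):
    """Group 1-based receiver ids into {set_id: bit mask}.

    One packet has to be sent into the network per key of the result,
    so the ingress replicates per set, not per receiver.
    """
    masks = {}
    for rid in receiver_ids:
        set_id, bit = divmod(rid - 1, set_size)  # assumed numbering scheme
        masks[set_id] = masks.get(set_id, 0) | (1 << bit)
    return masks


if __name__ == "__main__":
    receivers = [1, 3, 256, 257, 600]
    for set_id, mask in sorted(masks_per_set(receivers).items()):
        print(f"set {set_id}: {bin(mask).count('1')} receiver(s)")
```

Five receivers spread over three sets cost three packets at ingress, whereas 256 receivers in one set would cost one, which is why grouping bit assignments by topology matters.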
You can also split BIER areas as well and this is really no different than any sort of inter‑domain interaction. It does require to expose the state here in an ABR. There is really no way to carry BIER inside a BIER in most cases, because it's multi point, and there is no way to know what that mask is going to be required at each of those end bit nodes. So, it does require to decapsulate, look at the IP state, then look up at the mask for that next set or next area and encapsulate and forward.
In conclusion: packets for multipoint services now follow the Unicast path. It's SPF, using your current IGP. That means there is no multicast control plane, and convergence time is as fast as Unicast. No complex protocols, no training, no ‑‑ RP assignments and any of that nonsense we had to deal with in the past. And because this is just edge definitions, it's a really nice plug‑in potentially for SDN; a route reflector can be there, but centrally controlled policy is one possibility. And it really creates this many‑to‑many reachability within the IGP; it's just a new attribute in the topology, and minimising complexity is really what we are after.
So, in all fairness, what are the disadvantages? There is an upper bound, 256 bits is all we get right now. We can extend that based on whatever platforms can evolve but today we are only seeing the traction right now. On the low end boxes that is going to be a challenge because it's all cut in stone right now and going to be an evolutionary process to push this forward. Push the vendors and the commodity chip vendors to start implementing this in a way that is deployable in your networks. We can use sets to increase that but it does mean potentially additional forwarding tables in your intermediate nodes. You can use areas but your ABR gets exposed to the state and the low end platform problem is biting us a bit today.
So, questions...
JAN ZORZ: Are there any questions? I am Jan and I will continue with the housekeeping of this session.
SPEAKER: Some questions from the ‑‑ the CO Cloud global friends wants to know if user infrastructure supports this technology ‑‑
GREG SHEPHERD: Can you repeat ‑‑
SPEAKER: Will the user infrastructure support this technology?
GREG SHEPHERD: Does what?
SPEAKER: Users technology.
JAN ZORZ: Will the user infrastructure support this technology?
GREG SHEPHERD: User infrastructure? Anyone shine a light on that?
SPEAKER: Probably the equipment at their house.
GREG SHEPHERD: Interesting point. So, the IETF had its first hackathon in Dallas, and Homenet actually jumped on this as a solution. They have a multipoint requirement and they have been overloading it in some sort of convoluted way. BIER can do this for resource discovery; I tried to avoid that because it was kind of a red herring, but we actually had this guy fall into our lap and jump in and implement it and get it forward, and he won the hackathon at the IETF with BIER over v6.
SPEAKER: Will there be conflicts in the communications with other networks if this technology is not deployed?
GREG SHEPHERD: Will ‑‑
SPEAKER: If one network has this technology and another does not have it.
GREG SHEPHERD: ABR between them, it's how you get the traffic in your network get it across and send it off to the edge.
JAN ZORZ: We have five minutes left for questions. And Leslie was first.
LESLIE CARR: Is this going to support ECMP or some sort of load balancing?
GREG SHEPHERD: Yes. If you look at the BIER header I showed, there is an entropy field, so you can calculate a hash on the ‑‑ carry that throughout the network, and ECMP paths can be calculated along the way.
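A rough Python sketch of the entropy idea mentioned in this answer: the ingress hashes the flow once, carries the value in the header, and each hop uses it to pick among equal‑cost neighbours. The 20‑bit field width, the choice of CRC32 as the hash and the names `entropy_for_flow` and `pick_path` are all assumptions for illustration, not details given in the talk.

```python
import zlib

ENTROPY_BITS = 20  # assumed field width, for illustration only


def entropy_for_flow(src, dst, src_port, dst_port):
    """Hash the flow identifiers once at ingress into a small entropy value."""
    key = f"{src}|{dst}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) & ((1 << ENTROPY_BITS) - 1)


def pick_path(entropy, equal_cost_neighbours):
    """Each hop picks one of its equal-cost neighbours from the carried value."""
    return equal_cost_neighbours[entropy % len(equal_cost_neighbours)]


if __name__ == "__main__":
    e = entropy_for_flow("192.0.2.1", "232.1.1.1", 5000, 6000)
    print(pick_path(e, ["C1", "C2"]))
```

Because the value travels with the packet, every packet of a flow takes the same path without transit routers having to parse the payload.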
SHANE KERR: BII. So what is the expected management model? It seems like there is a lot of configuration that needs to be done on all these end points and things like that. What are the thoughts on that?
GREG SHEPHERD: Clearly it's a challenge. It's not a technical challenge, because this is not logically sophisticated; in fact, in talking to chip vendors, logic is free, it's state that is the cost in building a chip, and this solution pulls all that state off the chip, so it can actually be a lower‑cost solution. But there is an evolution requirement, a two‑year run rate to get the new chip out there. So the vendors we have been speaking with and working with, and in all fairness I am from Cisco as well, so embedded in this process, are targeting the programmable platforms: something that doesn't have BIER awareness but has got some programmability, so we can embed this solution in software on the right platform. That is why the MVPN case is the low‑hanging fruit right now, because it's large providers, big boxes and an easier migration path.
SPEAKER: From IAJ. One question. How do you handle path MTU discovery?
GREG SHEPHERD: It's a good question.
JAN ZORZ: How does IPv6 even work over this?
GREG SHEPHERD: That is just the payload. This is a multipoint transport inside the network, like MPLS. And the payload is v4 or v6, and it's agnostic at that point. So turn it up to 9K.
SPEAKER: ICANN. So you say you want to have 256 bits into your bit mask
GREG SHEPHERD: Correct.
SPEAKER: It's a good compromise for MVPN. If you are looking at more consumer type deployment it may prove to be small, we don't really ‑‑ do you have any idea if it's going to match fail or not? And really the concern; if you start to burn this number into ASIC and create ‑‑ exactly that long, and you are wrong in the future, could that be a problem?
GREG SHEPHERD: I am having trouble catching any of this. So...
GREG SHEPHERD: Right. If you start burning it into the ASIC, that is an issue, that is going to ‑‑ really, it depends on what that chip is targeted for, which platform. Even the home, that is fairly disjointed. Some people look at Homenet and think a dozen or a couple of hundred devices; no, the home is going to be more complicated than a provider network, thousands if not tens of thousands of devices, and if that is the case it is a whole different consideration. But that could be a couple of generations out. So, right now, the IETF is targeting 256, and I think vendors have a choice to go beyond that; the length is actually in the header, and we have a way to extend that dynamically with sets, so the sequence of sets 0, 1, 2 are extensible sequences of 256 bits, and a 512 mask can be represented as two sets of 256.
SPEAKER: I was thinking more about a million.
GREG SHEPHERD: The million case, that is an MTU problem, that is not an ASIC problem.
JAN ZORZ: Are there any more questions? Last question. No. OK. Thank you very much.
GREG SHEPHERD: Thank you.
(Applause)
JAN ZORZ: Now, since we love lightning talks so much, we put the usual tomorrow session in today's Opening Plenary, so now we have lightning talks, two of them, and two Dutch people. I would like to remind you to rate the talks throughout all the sessions today and tomorrow. I think there is a prize for people who rate the talks. I don't know. Maybe. But rate the talks, anyway. So, now we have Martijn Hoogesteger from the University of Twente. We have ten minutes, and if you want to leave some room for questions, you need to include this in your ten minutes.
MARTIJN HOOGESTEGER: Thank you, I will try to fit some minutes of questions in there. Thank you. I am a master's student from the University of Twente. I am also a RACI attendee, so the other presentations are tomorrow, please go to them as well. I am here to tell you something about the Internet Traffic Statistics Archive, which we have been working on at the university. And I would like to start off with a couple of questions.
It's a little game of have you ever.... I am wondering if you have ever assumed HTTP accounts for most of the Internet traffic. You have probably assumed it and made decisions on what to focus on, but is it actually true for all of the networks? Maybe not. You can probably answer this for yourself. And there is no plot twist in my presentation, don't worry, HTTP probably is the most important protocol. Or maybe you have wondered if HTTPS is slowly taking over from HTTP. You probably need to know if it's going to be more important in your traffic. Maybe this is true for your country, but is it true all over the world? How applicable is your solution?
Or maybe you have been interested in the differences in Internet characteristics between countries or continents. Traffic has some characteristics in your country, but it might be different in other countries. I have actually seen some differences in the ratio of TCP and UDP traffic between countries. Or maybe you have simply wondered, does the Internet grow? This is a very basic question that even a journalist or an economist could ask. How important is the Internet to us?
These questions you could probably answer, because you are doing research in this, or you are probably even operating a network. But you can't answer this for other networks, and if you are a journalist, how will you answer these questions?
Well, there was a solution to finding this kind of information. Internet2 in America ran the Abilene network and they did weekly reports. The weekly reports were public for that network. You see HTTP is indeed an important protocol, almost 37% of their traffic, but HTTPS ‑‑ you can see the TCP traffic is the most important, etc. But this was only on their network, the Abilene network, and they stopped this in 2010. The website is off‑line right now. It was cited a lot by journalists and economists, and it was used in network research as well, to say things like HTTP is the most important, that is why we are focusing on this. This is down; there is no real source of information right now.
So we figured a new source of information is needed, which brings me to the project that we have been working on, which is called ITSA, the Internet Traffic Statistics Archive, and it consists of a couple of phases. Firstly, we collect the data, which is based on NetFlow data, and put this through a couple of scripts, which generate a JSON report file, which is a very aggregated, small file with information on the data. This reduces like hundreds of gigabytes of NetFlow data to maybe a file of 100 megabytes. This JSON report file is sent to us, stored in a local database and published publicly for anyone to access. Let me go through a couple of the phases. First off, the data providers: of course we want to provide a service where you can see data from all around the world. We set up the system with cooperation from DEIC and CESNET, and you can download reports from them for every week of the past year. But of course we want to expand this and show you what the Internet looks like everywhere. So we are talking with a couple of other network operators to have their data included as well, and of course, if you know anybody, or if you would like to cooperate on this, please let me know.
And hopefully we will show more and more what the Internet looks like in general. The data is just stored at the network operators. We don't need to have it. They run our script and only have to send us a 100 megabyte file, maybe 200, but not in the range of hundreds of gigabytes. The script generates a JSON report file with some statistics, and we are expanding on this as well; the project is in development, so if you want to give me some tips on what you think is interesting to see, let me know.
The report file is just JSON, it's very readable so, anybody could use it to perform their own statistics on it.
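As a hedged illustration of doing your own statistics on such a report, the following Python sketch computes per‑application traffic shares. The report layout (`provider`, `week`, `bytes_by_application`) and all the numbers are invented for this example; the talk does not describe the real ITSA report format.

```python
import json

# Made-up report in an assumed layout; not the real ITSA schema or real data.
SAMPLE_REPORT = """
{"provider": "example-nren", "week": "2015-W19",
 "bytes_by_application": {"http": 600, "https": 300, "dns": 20, "other": 80}}
"""


def application_shares(report_text):
    """Return each application's share of total bytes, in percent."""
    report = json.loads(report_text)
    counts = report["bytes_by_application"]
    total = sum(counts.values())
    return {app: 100.0 * n / total for app, n in counts.items()}


if __name__ == "__main__":
    shares = application_shares(SAMPLE_REPORT)
    for app, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{app:6s} {pct:5.1f}%")
```

The same function could be run over a series of weekly files to get the kind of historical or comparative view discussed in the talk, assuming the files share one layout.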
But we are showing a couple of examples on a website. So the report file is sent to us, it's stored, and it's presented publicly; how we do that, of course, is a website. We have some nice Google charts and graphs on it to show you a couple of examples of what you could do with the data, what you could represent. Here we see last week for the NREN in Brazil: HTTP is the most used protocol there, followed by HTTPS, which is almost a third of the web traffic there. So on this website you can download the JSON files and see some information on the provider: what their network looks like, what router the data comes from, and of course, since we are talking about NetFlow data, it's also listed what their sampling rate is. This is the system right now. People can access it and view it; it's on stats.simpleweb.org. We are looking to expand it with more data sources; as I mentioned before, we want to show you what the Internet looks like in general, so it would be nice to have data from all of the continents. If there is anybody here who has some data from Antarctica, I would love it, maybe one packet in a year; it would be interesting. We are looking to include IPv6 of course; it's done a lot, but it would be nice to include it here as well, to see how the IPv6 adoption is in each network. That is something we are including; if you have any other ideas, again, let me know. And we are expanding the website with historical views, so you can see how HTTPS is increasing over time, or whether TCP is increasing. This was interesting in the Internet2 example I gave: you could see in 2004 HTP was into the very important protocol, it was bill board protocols of course. And you can see HTTP increasing over the years. And as I showed before, HTTPS on their network in 2010 was only 3% of traffic, and it was last week 17%.
We want to include comparative views so you can compare Brazil and Europe, or an aggregated view where you can see what the whole world looks like.
That is quickly the project that we are doing. Again, if you have more questions, send me an e‑mail or come up to me. Ricardo is advising me on this. Visit our website. If you have any questions now, I think we have one‑and‑a‑half minutes left.
JAN ZORZ: Are there any questions?
SPEAKER: We run the ‑‑ domain name. We host one of the name servers for AQ, which is Antarctica, so we might be able to give you some stats ‑‑ that is a joke. Did the university develop the platform you use to publish the data?
MARTIJN HOOGESTEGER: Yes, we have been working on this project for a few years and we have developed everything, yes.
SPEAKER: Is there a reason why you didn't go with CKAN, for example?
MARTIJN HOOGESTEGER: This is very specific to NetFlow data and what we want to accomplish with it, and it's a very decentralised system, so it's just an original idea, so ‑‑
SPEAKER: It's not for generic data sets, it's for a very specific data set?
MARTIJN HOOGESTEGER: Yes.
SPEAKER: Do you have API support or API keys or stuff like that? Access to the data through an API?
MARTIJN HOOGESTEGER: No, you can download the JSON files and do anything with the data you want. So we don't have an API, you can just download all the data if you want to.
SPEAKER: We can talk about that during the break.
JAN ZORZ: We have 15 seconds left for any kind of question. Thank you very much.
(Applause)
And now we have Remco, he will explain to us how the thousand dollar IXP can be built. Dollars or euros?
REMCO VAN MOOK: It's dollars, actually, which is almost the same as euros. And while I was ‑‑ I had already uploaded this deck and then I found out that the thousand dollar bill that is on the front of my presentation is actually worth 2,500 dollars these days. So, before we start, this is a work in progress, this is not finished and I am fully expecting to be completely shot down by about half of the people in this audience. That is a design goal for this presentation.
There is a big difference between setting up a new exchange from the ground up and parachuting in a satellite for an existing exchange. The latter I am not going to cover because there is all sorts of reasons for designs and decisions that these people make. My goal is to do this presentation and look to come to something like a full deck, and a blueprint after the summer, and I want your help.
So, this morning, at the BGP tutorial, there was this slide and it said an Internet exchange has to have a big Ethernet switch, because Internet Exchanges are complicated beasts that require huge equipment. Fine. So, here is how Internet Exchanges started. I don't know if you recognise it: the one on the right is the original LINX switch. It's called ‑‑ it's got eight 10 meg ports as far as I can recall, but Michael can probably tell you all about it, and the one on the left is one of the original boxes used by AMS‑IX. And this is what it looks like today; at least, I have just picked some samples from some equipment vendors. So, if you are a new IXP setting up in some part of the world, what do you do? If you talk to the established IXPs, I mean, some of them are friendly and will give you a pile of old boxes they no longer use because they have grown out of them; they will even offer to run them for you if you want. The ITU has developed an interest in Internet Exchanges and they will come over in a nice suit and tell you setting up an Internet Exchange will cost you at least 2 million dollars, and your local network geek wants a big box with lots of shiny lights and he can't afford one himself. That is probably not what you want to do. But what you really want to do if you are a new IXP is do some market research and build a community. And I know that sounds incredibly dull, so I have done you all a service: I have done the first bit for you, so you just have to build communities from now on. Awesome.
So, market research. I went and collected some data from Euro‑IX covering 204 from all over the world. It takes a while for traffic data to be collected, and December was a trade‑off between how complete it was for all the exchanges and how recent it was. Some of the exchanges are dead, struggling, dying, defunct, as happens. So the question really is, on the market research: how much traffic should you anticipate on a new Internet Exchange? What should you be designing for? And that is actually a little bit of a shocker. So here is ‑‑ and yes, it's unreadable, but that is by design, because there is absolutely no way I can fit 200 tags on a single PowerPoint slide. This is the breakdown of peak traffic of all IXPs around the world, and over on the far right side is DE‑CIX, of course, but somehow PowerPoint didn't want to print that one out. And just to be very clear, this is a logarithmic scale, because otherwise it would end up all weird and awkward and you couldn't make any sense out of it. So this is 20 gigs. And 20 gigs in peak traffic is, well, what you could ‑‑ let's say commodity hardware. If you look ‑‑ 66% of all exchanges around the world have less than 20 gigs in peak traffic. And that includes all of the massive European exchanges. If you actually exclude Europe from this data and go back to the 20 gig line, 78% of all Internet Exchanges around the world have less than 20 gigs of peak traffic. So, the quick and easy conclusion of the market research is: if you manage to get 20 gigs of peak traffic, it's time to go to the pub, and when you are sober again you need to start thinking about how you expand this thing. Until then, first you need to grow in order to scale and then you need to scale in order to grow, and not the other way around. And so actually, if you want to help out new IXPs, you should sponsor meeting rooms and facilitate local community building and give them your international contacts.
So, looking at the ‑‑ OK, what are the requirements I should build and design an IXP against, a start‑up one? So here we go. So, 20 gigs of peak capacity, which usually averages out to about 8 gigs of average traffic; 20, 30 ports; since in newer developing areas you are going to get some demand for 10 meg, 100 meg, maybe multiple one gigs; and you want to have a portal website, quarantine VLAN, route server, arpwatch, the usual. What do you need for that? Well, it's this: this is actually the contents of my basement. It's a 24 port office switch, a low‑end Supermicro server, and some cabling and SFPs. The switch is 279.95 dollars on Amazon right now. The server you can get for 545.96 over at Newegg, and you have some cabling. Which gives you about 25 dollars to buy six-packs and get your first community meeting done.
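The arithmetic behind the thousand dollar figure can be checked in a few lines. This is a rough sketch using only the prices quoted in the talk; the cabling and SFP cost is not stated, so it is derived here from the "about 25 dollars" said to be left over and should be read as an implied figure, not a quoted one.

```python
from decimal import Decimal

# Figures quoted in the talk (US dollars).
BUDGET = Decimal("1000.00")
SWITCH = Decimal("279.95")           # 24 port office switch
SERVER = Decimal("545.96")           # low-end server
LEFTOVER_QUOTED = Decimal("25.00")   # "about 25 dollars" left over

# What remains once the two quoted items are paid for.
remaining_after_hardware = BUDGET - SWITCH - SERVER

# Assumption: whatever is not left over went on cabling and SFPs.
implied_cabling = remaining_after_hardware - LEFTOVER_QUOTED

# Design targets mentioned in the talk: 20 gigs peak, about 8 gigs average.
PEAK_GBPS = 20
AVERAGE_GBPS = 8
average_to_peak = AVERAGE_GBPS / PEAK_GBPS

print(f"Left after switch and server: {remaining_after_hardware}")
print(f"Implied cabling and SFP spend: {implied_cabling}")
print(f"Average-to-peak traffic ratio: {average_to_peak:.0%}")
```

`Decimal` is used instead of floats so the cent values subtract exactly.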
So, that is ‑‑ I mean, that is the start of the presentation. What I am ‑‑ what this really is about, is don't waste your time, if you want to set up an Internet Exchange with your friends in some part of the world, don't waste your time agonising over designs and buying big massive boxes because there is no point, you are just wasting money and you are making the discussion a lot more complicated because there is lots of money involved. And you should really be spending that time on successfully building that community and actually getting traffic going and once you have the traffic, then you start worrying about what you should do. And how you should design that. And that is really it. And I have two minutes left.
JAN ZORZ: Thank you, Remco. Are there comments, questions? We have a few minutes.
NIGEL TITLEY: One of the founders of LINX. Your LINX slide was slightly inaccurate. That was the second LINX switch. The first one was a five port hub. Does anyone remember those?
REMCO VAN MOOK: Thank you.
AARON HUGHES: Limelight Networks. I was going to say the same thing: before the first LINX switch there was a hub; Philip Smith may even have it. If you want to add 10 gig ports to that exchange you do need to buy a second switch because of the way the queuing strategies work, that is assuming somebody comes along and says they want a 10 gig port. If it's a small Internet Exchange you may not need it. But I can't join the exchange ‑‑ it's probably only got 1 gig ports.
REMCO VAN MOOK: You should probably reconsider your strategy how you deploy and where you deploy.
JIM REID: Speaking for myself. That is a great idea and I think you are on the right lines as far as the technical aspects of setting up a small‑scale or introductory Internet Exchange, but I think the problems are not about pricing or hardware. I think the problems are layer 9 and above: regulatory regimes, the ways in which incumbent telcos will invent restrictive practices, problems with cross‑connection. You know that story better than I do, and I think that is the area we need to focus on.
REMCO VAN MOOK: I completely agree. At the same time, I am Dutch, which means I am frugal, some would say cheap. I don't just worry about spending my own money, I worry about people spending theirs, and that is where this came from. I have seen some evidence and I heard a couple of anecdotes today while talking to people about newly set up exchanges in ‑‑ remote parts of the world where the first thing done was to buy ‑‑ a fully loaded chassis with ports nobody could connect to or support because that technology wasn't available in the country. So there we go.
SPEAKER: I am not speaking on behalf of the people who gave me this T‑shirt, just to be clear. So, I was wondering, lots of Internet Exchanges are actually just one page marketing materials for small data centres in remote cities, right, so you shouldn't take them seriously, so I think probably, if you look at anything that is really an Internet Exchange in the real world then it's probably ‑‑ they do use more than your 20 gigs so I think your numbers are probably a bit ‑‑
REMCO VAN MOOK: You would be very surprised/disappointed at what is actually in those 200 exchanges.
SPEAKER: Give me example of 20 gigabit ‑‑
REMCO VAN MOOK: NaMeX in Rome is actually right at the 20 gig mark.
SPEAKER: Right, about 300 dollar switch from Amazon so that is interesting, I assume those are ‑‑ sorry, gigabit ports?
REMCO VAN MOOK: Yes.
SPEAKER: Can you bundle them and say I want to bundle them, that is pretty cool.
REMCO VAN MOOK: So the switch, I mean, I am not working for any of the vendors. That is a shared design between the world renowned switch makers 3Com, HP and H3C; they all sell it under their own label, 1910‑24G. It's a mouthful, but ‑‑ I needed something for my basement and when I read this research: oh, hang on, I actually have this. That is my ten minutes well and truly gone.
JAN ZORZ: The mics are now closed. Or is it a really, really quick question?
SPEAKER: If you use model based CD Us you can bring your computer down to about 200 dollars or 100 dollars for three gigabit ports.
REMCO VAN MOOK: If you use secondhand hardware you can probably get it down to 500 or 300, or if you buy a used switch; this is all new equipment. This actually has ‑‑ the switch has a lifetime warranty: if it breaks, you ship back the remains and you get a new one.
JAN ZORZ: Remco, thank you.
(Applause)
So before we go for a break, I would like to mention two things: This is now the first time that we are experimentally running the IETF help desk out there, if anyone wants to know anything about the IETF and how things work, please stop by. And second thing, now we have a break, come back in half an hour. There is another plenary session full of very good topics, and after that at 6:00 we have our best current operational practices task force meeting here in this room. Thank you.
LIVE CAPTIONING BY AOIFE DOWNES RPR
DOYLE COURT REPORTERS LTD, DUBLIN IRELAND.
WWW.DCR.IE