Startup web application serving 101: a few tips

This post grew out of the many emails on Barcamp and other mailing lists about ‘choosing the right host’ for startup web infrastructure. There is a general impression that serving an application on production infrastructure requires substantial investment, and that startups need to begin with a ‘big single box’ architecture. Both are wrong ways to go about getting a popular application to serve a growing user base at low cost.

I plan to get a couple of experts to write about building scalable, high-performing infrastructure for your startup that won’t cost a bomb. Those posts are coming. Before they do, here is a novice’s view (mine, of course) on how to go about it, followed by a few case studies.

  • Architecture: this is very important. Before you design for scale, decide what performance you need (response times, resource usage, throughput), because the system has to keep meeting those targets as it grows, and scale and performance often trade off against each other. Cache where you can, partition your data (using shards), think about the file system you use to store content, and consider clouds like Amazon Web Services.
  • Design: do this carefully. Schema design, the choice of storage engines, data types and indexes, and the handling of concurrent database connections all deserve careful thought and careful execution.
  • Configuration: from choosing your web and database servers (e.g. lighttpd vs Apache, Berkeley DB vs MySQL) to tuning RAM usage, output buffers, data compression and keep-alive connections, this is probably more about making the right choices than about skill.
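To make the caching and sharding advice concrete, here is a minimal Python sketch of a cache-aside read path in front of hash-partitioned (sharded) storage. The shards and the cache are plain in-memory dictionaries standing in for separate database servers and a cache server, and `save_profile`, `load_profile` and `NUM_SHARDS` are hypothetical names used only for illustration.

```python
import hashlib

# Hypothetical in-memory stand-ins: in production each shard would be a
# separate database server and the cache a dedicated cache server.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]
cache = {}


def shard_for(user_id: str) -> int:
    """Map a key to a shard with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def save_profile(user_id: str, profile: dict) -> None:
    """Write to the owning shard, then invalidate the cached copy."""
    shards[shard_for(user_id)][user_id] = profile
    cache.pop(user_id, None)  # the next read will fetch fresh data


def load_profile(user_id: str):
    """Cache-aside read: try the cache, fall back to the owning shard on a miss."""
    if user_id in cache:
        return cache[user_id]
    profile = shards[shard_for(user_id)].get(user_id)
    if profile is not None:
        cache[user_id] = profile  # populate the cache for later reads
    return profile


if __name__ == "__main__":
    save_profile("alice", {"name": "Alice"})
    print(shard_for("alice"), load_profile("alice"))
    print("alice" in cache)  # True: the first read populated the cache
```

Simple modulo hashing remaps most keys whenever `NUM_SHARDS` changes; schemes such as consistent hashing or a lookup directory reduce that reshuffling at the cost of extra bookkeeping.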

Whatever you do, benchmark and profile your systems as you go.

Startups to Scale (SlideShare presentation)


    PlentyOfFish (apparently a dating website 🙂) is a good case study of a one-man startup that makes $10 million a year from advertising. The PoF traffic figures below come from the excellent HighScalability.com; the server specifications and costs are our own, for our application Lukup, and match PoF's high-level specifications.

    1. Performance

    • 1.2 billion page views per month, 30 million+ hits per day (500 pages per second), 64k simultaneous connections, 2 million page views per hour
    • 500k unique logins per day, 45 million visitors per month
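It helps to turn these totals into per-second rates, since that is what the servers actually have to sustain. The sketch below assumes a 30-day month and computes plain averages; real traffic is bursty, so peaks will be higher than these numbers.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # assumption: a 30-day month


def per_second(count: float, window_seconds: int) -> float:
    """Average rate over a window; real traffic peaks well above this."""
    return count / window_seconds


rates = {
    "1.2 billion page views / month": per_second(1.2e9, SECONDS_PER_MONTH),
    "30 million hits / day": per_second(30e6, SECONDS_PER_DAY),
    "2 million page views / hour": per_second(2e6, SECONDS_PER_HOUR),
}

# Prints roughly 463, 347 and 556 per second respectively.
for label, rate in rates.items():
    print(f"{label:32s} ~ {rate:6.0f} per second")
```

All three land in the few-hundred-per-second range around the quoted 500 pages per second. Provision for the busiest hour rather than the monthly mean.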

    2. Infrastructure

    • 2 load-balanced web servers: Dell PowerEdge 1U (SATA), Intel Xeon 3210 quad-core processors, 2 x 250 GB drives, 8 GB RAM
    • 2 database servers: Dell PowerEdge 2950 2U (SATA), Intel Xeon 5335 quad-core processors, 7 x 250 GB drives, RAID10, 4 GB RAM
    • 1 load balancer, plus 1 server each for cache (16 GB RAM) and content
    • private rack with 10U server capacity, Cisco 24-port 1000 Mbps switch
    • IP KVM switch
    • 2500-3000 GB bandwidth transfer on all servers
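The post does not say which algorithm the load balancer uses. As an assumption, here is a sketch of round robin, the simplest common policy: it hands requests to the web servers in turn and skips any backend marked unhealthy. `RoundRobinBalancer`, `web1` and `web2` are hypothetical names; in a setup like the one above this job is done by the dedicated load balancer, not by application code.

```python
import itertools


class RoundRobinBalancer:
    """Hand each request to the next healthy backend in turn."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def pick(self):
        # Try at most one full lap so we never loop forever when all are down.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web1", "web2"])
    print([lb.pick() for _ in range(4)])  # ['web1', 'web2', 'web1', 'web2']
    lb.mark_down("web2")
    print([lb.pick() for _ in range(2)])  # ['web1', 'web1']
```

Round robin assumes the servers are roughly identical, which holds for the two matching web servers listed here; with uneven hardware, a weighted or least-connections policy is the usual alternative.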

    3. Costs

    • The total cost from ThePlanet hosting is Rs 1.2 lakh ($3000) per month; Rackspace quoted $7000 per month, and local India-based datacenters cost a bomb (Rs 5 lakh+ per month)
    • Of course, we bargained hard 🙂
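As a rough yardstick, the quoted monthly prices can be expressed per million page views. This sketch assumes a setup at these prices carried PoF-level traffic (1.2 billion page views a month); that is an illustration, not a measured figure for Lukup.

```python
PAGE_VIEWS_PER_MONTH = 1.2e9  # PoF figure from the Performance section

monthly_quotes_usd = {
    "ThePlanet": 3000,
    "Rackspace": 7000,
}


def cost_per_million_views(monthly_usd: float, views: float = PAGE_VIEWS_PER_MONTH) -> float:
    """Monthly hosting cost divided by monthly page views, in millions."""
    return monthly_usd / (views / 1e6)


for host, usd in monthly_quotes_usd.items():
    print(f"{host:10s} ${cost_per_million_views(usd):.2f} per million page views")
# ThePlanet  $2.50 per million page views
# Rackspace  $5.83 per million page views
```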

    Even $3000 is high for a bootstrapped startup, but you can get a web and database server for as little as $350 per month and scale up as usage grows. What this all means is that you do not need huge funding to set up a sprawling infrastructure to serve millions of users; all you really need is to avoid the most common infrastructure gotchas and to build a good application that appeals to lots of people.
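Knowing when to scale up from a $350 box means measuring, in line with the advice to benchmark and profile as you go. A crude starting point is to time a request handler many times and look at latency percentiles rather than the average. `handle_request` is a hypothetical stand-in for real application work; for whole-system numbers you would load-test the deployed stack instead.

```python
import statistics
import time


def handle_request(n: int = 10_000) -> int:
    """Hypothetical stand-in for a request handler; replace with real work."""
    return sum(i * i for i in range(n))


def benchmark(fn, runs: int = 200) -> dict:
    """Time fn repeatedly and report latency percentiles in milliseconds."""
    fn()  # warm-up call, excluded from the measurements
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    # 99 cut points; "inclusive" keeps every percentile within the observed range.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98], "max": max(samples)}


if __name__ == "__main__":
    for name, ms in benchmark(handle_request).items():
        print(f"{name}: {ms:.3f} ms")
```

Watching p95 and p99 as traffic grows shows when the box is saturating well before the average moves much.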

    MySQL 2007: Tech at Digg v3 (SlideShare presentation)


    The first presentation was Perl-centric and the Digg one dealt with a PHP and MySQL application; here is a good Ruby on Rails case study on scaling Twitter.

    Scaling Twitter (SlideShare presentation)

