When your server’s CPU load sits at 97% for 30–60 minutes, that’s not good, especially if it happens several times a week. That is what happened last week. I started to monitor the situation more closely and saw some very weird user actions. I had no idea what was going on. My first thought was that somebody was doing it with malicious intent and that it was a kind of DoS attack. I contacted the user, and she explained what she wanted to achieve. I don’t want to get into the details, but it turned out to be a valid, though very rare and weird, user scenario. She was doing the steps in the wrong order, so the unusual load was caused by user error. In the end I advised her how to do the steps in the proper order to achieve her goal without putting an extreme load on the server. Even after this, things didn’t go smoothly: it turned out that there was a bug in the “Find Duplicates” feature. There were bookmarks with empty URLs among the user’s imported bookmarks, and they caused an exception. Today I fixed the bug.
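To illustrate the kind of fix involved, here is a minimal Python sketch of a duplicate finder that skips bookmarks with a missing or empty URL instead of failing on them. It is not the app's actual code: the dictionary-based bookmark shape, the `url` field name, and the functions `normalize_url` and `find_duplicates` are all assumptions made up for this example.

```python
from collections import defaultdict


def normalize_url(url):
    """Return a comparison key for a URL, or None if it is missing or blank.

    Hypothetical helper: it only trims surrounding whitespace.
    """
    if url is None:
        return None
    key = url.strip()
    return key or None


def find_duplicates(bookmarks):
    """Group bookmarks by URL and return only the groups with 2+ entries.

    Assumes each bookmark is a dict that may or may not have a "url" key.
    Bookmarks without a usable URL are skipped rather than raising.
    """
    groups = defaultdict(list)
    for bookmark in bookmarks:
        key = normalize_url(bookmark.get("url"))
        if key is None:
            # Empty or missing URL: nothing to compare against, so skip it.
            continue
        groups[key].append(bookmark)
    return {key: items for key, items in groups.items() if len(items) > 1}
```

The point is simply that imported data can't be trusted to have a URL on every bookmark, so the check happens before the URL is used for anything.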
So I was lucky it wasn’t a DoS attack. But what I learned from this is that I need to be prepared to handle one. I hadn’t thought about it until now. This week my number one priority is to develop and implement some defense mechanism against DoS attacks.
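I haven't settled on a mechanism yet, but one common building block is per-client rate limiting. The following is only a sketch of that idea in Python using a token bucket, not what the app does today. The class names `TokenBucket` and `RateLimiter`, the `client_id` key (which could be a user ID or an IP address), and the rate and capacity values are all assumptions chosen for illustration.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill according to the elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one token bucket per client (hypothetical in-memory store)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()


if __name__ == "__main__":
    limiter = RateLimiter(rate=5, capacity=10)  # illustrative numbers only
    for i in range(12):
        print(i, limiter.allow("user-42"))
```

A request handler would call `limiter.allow(client_id)` and reject the request (for example with HTTP 429) when it returns `False`. Two caveats: this in-memory dictionary grows with the number of clients and is not shared between server processes, and application-level limiting like this helps against one user hammering an expensive feature but does not stop a large network-level flood.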
Another thing that came to mind was auto scaling. I don’t think the nature of this web app requires auto scaling, but it is something I might consider in the future. AWS has a pretty good dynamic auto scaling service.