Studying data

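Outliers: An outlier is an observation that lies far from the overall pattern of the data, that is, a value that deviates abnormally from the distribution of a variable. Outliers not only shift the mean but also inflate the variance (standard deviation), which makes the data unstable and can distort its summary statistics and distribution. Because outliers matter this much, we should always think about how to handle them during data preprocessing (for example, whether to keep them in the analysis or remove them). Moreover, when building a predictive model, removing outliers from the training dataset beforehand can improve predictive performance, so handling outliers well is essential. Types of outliers: outliers created by mistake when the data was generated (non-natu..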
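To make the effect on the mean and standard deviation concrete, here is a minimal sketch using Python's standard `statistics` module. The sample values and the `summarize` helper are made up for illustration and do not come from the original post.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical measurements: six ordinary values plus one extreme value.
clean = [10, 12, 11, 13, 12, 11]
with_outlier = clean + [95]

mean_clean, std_clean = summarize(clean)
mean_out, std_out = summarize(with_outlier)

# A single extreme observation pulls the mean upward and inflates the spread.
print(f"without outlier: mean={mean_clean:.2f}, std={std_clean:.2f}")
print(f"with outlier:    mean={mean_out:.2f}, std={std_out:.2f}")
```

Comparing the two printed lines shows how much one observation can move both statistics, which is why the handling decision (keep or remove) should be made deliberately.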

You did such a great job helping Julia with her last coding contest challenge that she wants you to work on this one, too! The total score of a hacker is the sum of their maximum scores for all of the challenges. Write a query to print the hacker_id, name, and total score of the hackers ordered by the descending score. If more than one hacker achieved the same total score, then sort the result b..

Julia asked her students to create some coding challenges. Write a query to print the hacker_id, name, and the total number of challenges created by each student. Sort your results by the total number of challenges in descending order. If more than one student created the same number of challenges, then sort the result by hacker_id. If more than one student created the same number of challenges ..

Harry Potter and his friends are at Ollivander's with Ron, finally replacing Charlie's old broken wand. Hermione decides the best way to choose is by determining the minimum number of gold galleons needed to buy each non-evil wand of high power and age. Write a query to print the id, age, coins_needed, and power of the wands that Ron's interested in, sorted in order of descending power. If more ..

Julia just finished conducting a coding contest, and she needs your help assembling the leaderboard! Write a query to print the respective hacker_id and name of hackers who achieved full scores for more than one challenge. Order your output in descending order by the total number of challenges in which the hacker earned a full score. If more than one hacker received full scores in same number of..

You are given two tables: Students and Grades. Students contains three columns ID, Name and Marks. Grades contains the following data: Ketty gives Eve a task to generate a report containing three columns: Name, Grade and Mark. Ketty doesn't want the NAMES of those students who received a grade lower than 8. The report must be in descending order by grade -- i.e. higher grades are entered first. ..

Given the CITY and COUNTRY tables, query the names of all the continents (COUNTRY.Continent) and their respective average city populations (CITY.Population) rounded down to the nearest integer. Note: CITY.CountryCode and COUNTRY.Code are matching key columns. Input Format The CITY and COUNTRY tables are described as follows: [MySQL Solution] SELECT COUNTRY.CONTINENT, FLOOR(AVG(CITY.POPULATION)) ..

Given the CITY and COUNTRY tables, query the sum of the populations of all cities where the CONTINENT is 'Asia'. Note: CITY.CountryCode and COUNTRY.Code are matching key columns. Input Format The CITY and COUNTRY tables are described as follows: [MySQL Solution 1] SELECT SUM(CITY.POPULATION) FROM CITY, COUNTRY WHERE CITY.COUNTRYCODE = COUNTRY.CODE AND COUNTRY.CONTINENT = 'Asia'; [MySQL Solution ..
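As a rough way to try the query from [MySQL Solution 1] without a MySQL server, the sketch below runs it against an in-memory SQLite database through Python's `sqlite3` module. SQLite is swapped in for MySQL here, the tables are reduced to a few columns, and the rows and population figures are invented purely for illustration.

```python
import sqlite3

# In-memory database with simplified, hypothetical versions of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE COUNTRY (CODE TEXT PRIMARY KEY, NAME TEXT, CONTINENT TEXT);
    CREATE TABLE CITY (ID INTEGER PRIMARY KEY, NAME TEXT,
                       COUNTRYCODE TEXT, POPULATION INTEGER);
    """
)
conn.executemany(
    "INSERT INTO COUNTRY VALUES (?, ?, ?)",
    [("KOR", "South Korea", "Asia"),
     ("JPN", "Japan", "Asia"),
     ("FRA", "France", "Europe")],
)
conn.executemany(
    "INSERT INTO CITY VALUES (?, ?, ?, ?)",
    [(1, "Seoul", "KOR", 100),   # made-up populations
     (2, "Tokyo", "JPN", 200),
     (3, "Paris", "FRA", 50)],
)

# The query text from Solution 1: match on the key columns, keep only Asia.
query = """
    SELECT SUM(CITY.POPULATION)
    FROM CITY, COUNTRY
    WHERE CITY.COUNTRYCODE = COUNTRY.CODE
      AND COUNTRY.CONTINENT = 'Asia';
"""
asia_total = conn.execute(query).fetchone()[0]
print(asia_total)  # 300 for the toy rows above (100 + 200)
conn.close()
```

Only the two Asian cities contribute to the sum; the Paris row is filtered out by the `CONTINENT` condition after the key columns are matched.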