How to be a Programmer
  • How to be a Programmer: Community Version
  • LANGS
  • How to be a Programmer: Community Version
    • Appendix A - Bibliography/Websiteography
    • Appendix B - History
    • Contributions
    • Glossary
    • Creative Commons Attribution Share-Alike
    • Summary
    • 1. Beginner
      • Personal-Skills
        • Learn to Debug
        • How to Debug by Splitting the Problem Space
        • How to Remove an Error
        • How to Debug Using a Log
        • How to Understand Performance Problems
        • How to Fix Performance Problems
        • How to Optimize Loops
        • How to Deal with I/O Expense
        • How to Manage Memory
        • How to Deal with Intermittent Bugs
        • How to Learn Design Skills
        • How to Conduct Experiments
      • Team-Skills
        • Why Estimation is Important
        • How to Estimate Programming Time
        • How to Find Out Information
        • How to Utilize People as Information Sources
        • How to Document Wisely
        • How to Work with Poor Code
        • How to Use Source Code Control
        • How to Unit Test
        • Take Breaks when Stumped
        • How to Recognize When to Go Home
        • How to Deal with Difficult People
    • 2. Intermediate
      • Judgment
        • How to Tradeoff Quality Against Development Time
        • How to Manage Software System Dependence
        • How to Decide if Software is Too Immature
        • How to Make a Buy vs. Build Decision
        • How to Grow Professionally
        • How to Evaluate Interviewees
        • How to Know When to Apply Fancy Computer Science
        • How to Talk to Non-Engineers
      • Mentoring
        • How to Be Mentored
        • How to Mentor Others
      • Personal-Skills
        • How to Stay Motivated
        • How to be Widely Trusted
        • How to Tradeoff Time vs. Space
        • How to Stress Test
        • How to Balance Brevity and Abstraction
        • How to Learn New Skills
        • Learn to Type
        • How to Do Integration Testing
        • Communication Languages
        • Heavy Tools
        • How to analyze data
      • Team-Skills
        • How to Manage Development Time
        • How to Manage Third-Party Software Risks
        • How to Manage Consultants
        • How to Communicate the Right Amount
        • How to Disagree Honestly and Get Away with It
    • 3. Advanced
      • Compromising-Wisely
        • How to Fight Schedule Pressure
        • How to Understand the User
        • How to Get a Promotion
      • Serving-Your-Team
        • How to Develop Talent
        • How to Choose What to Work On
        • How to Get the Most From Your Team-mates
        • How to Divide Problems Up
        • How to Handle Boring Tasks
        • How to Gather Support for a Project
        • How to Grow a System
        • How to Communicate Well
        • How to Tell People Things They Don't Want to Hear
        • How to Deal with Managerial Myths
        • How to Deal with Organizational Chaos
      • Technical-Judgment
        • How to Tell the Hard From the Impossible
        • How to Utilize Embedded Languages
        • Choosing Languages
  • Cómo ser un Programador: Versión Comunitaria
    • Apéndice A - Bibliografía/Sitografía
    • Apéndice B - Historia
    • Contribuciones
    • Glossary
    • Licencia Creative Commons Atribución-CompartirIgual
    • Resumen
    • 1. Principiante
      • Personal-Skills
        • Aprende a depurar
        • ¿Cómo depurar dividiendo el espacio del problema?
        • ¿Cómo eliminar un error?
        • ¿Cómo depurar utilizando un registro (Log)?
        • ¿Cómo entender problemas de rendimiento?
        • ¿Cómo solucionar problemas de rendimiento?
        • ¿Cómo optimizar bucles?
        • ¿Cómo manejar el costo de la entrada/salida (E/S)?
        • ¿Cómo gestionar la memoria?
        • ¿Cómo manejar errores intermitentes?
        • ¿Cómo aprender habilidades de diseño?
        • ¿Cómo realizar experimentos?
      • Team-Skills
        • ¿Por qué es importante la estimación?
        • ¿Cómo estimar el tiempo de programación?
        • ¿Cómo encontrar información?
        • ¿Cómo utilizar a las personas como fuentes de información?
        • ¿Cómo documentar de manera inteligente?
        • ¿Cómo trabajar con un código deficiente?
        • ¿Cómo Utilizar el Control de Código Fuente?
        • ¿Cómo realizar pruebas unitarias?
        • Tomarse descansos cuando te sientes bloqueado
        • ¿Cómo reconocer cuándo es hora de ir a casa?
        • ¿Cómo lidiar con personas difíciles?
    • 2. Intermedio
      • Judgment
        • ¿Cómo equilibrar la calidad contra el tiempo de desarrollo?
        • ¿Cómo gestionar la dependencia del sistema de software?
        • ¿Cómo decidir si el software es demasiado inmaduro?
        • ¿Cómo tomar una decisión de compra frente a desarrollo interno?
        • ¿Cómo crecer profesionalmente?
        • ¿Cómo evaluar a los candidatos en una entrevista?
        • ¿Cómo saber cuándo aplicar conceptos avanzados de ciencias de la computación?
        • ¿Cómo hablar con personas no ingenieras?
      • Personal-Skills
        • ¿Cómo mantenerse motivado?
        • ¿Cómo ser ampliamente confiado?
        • ¿Cómo hacer equilibrio entre tiempo y espacio?
        • ¿Cómo realizar pruebas de resistencia?
        • ¿Cómo equilibrar brevedad y abstracción?
        • ¿Cómo aprender nuevas habilidades?
        • Aprender a escribir
        • ¿Cómo hacer pruebas de integración?
        • Idiomas de comunicación
        • Herramientas pesadas
        • ¿Cómo analizar datos?
      • Team-Skills
        • ¿Cómo gestionar el tiempo de desarrollo?
        • ¿Cómo gestionar los riesgos del software de terceros?
        • ¿Cómo gestionar a los consultores?
        • ¿Cómo comunicar la cantidad adecuada?
        • ¿Cómo disentir honradamente y salir airosos?
    • 3. Avanzado
      • Compromising-Wisely
        • ¿Cómo Combatir la Presión del Cronograma?
        • ¿Cómo Entender al Usuario?
        • ¿Cómo Obtener un Ascenso?
      • Serving-Your-Team
        • ¿Cómo Desarrollar el Talento?
        • ¿Cómo Elegir en Qué Trabajar?
        • ¿Cómo Obtener lo Mejor de tus Compañeros de Equipo?
        • ¿Cómo Dividir Problemas?
        • ¿Cómo Manejar Tareas Aburridas?
        • ¿Cómo Obtener Apoyo para un Proyecto?
        • ¿Cómo Hacer Crecer un Sistema?
        • ¿Cómo Comunicarse Bien?
        • ¿Cómo Decir Cosas que la Gente no Quiere Escuchar?
        • ¿Cómo Lidiar con Mitos Gerenciales?
        • ¿Cómo Lidiar con el Caos Organizacional?
      • Technical-Judgment
        • ¿Cómo Distinguir lo Difícil de lo Imposible?
        • ¿Cómo Utilizar Lenguajes Incorporados?
        • Elección de Lenguajes
  • How to be a Programmer: Community Version
    • Appendix A - Bibliography/Websiteography
    • Appendix B - History
    • Contributions
    • Glossary
    • Creative Commons Attribution Share-Alike
    • Summary
    • 1. Beginner
      • Personal-Skills
        • Learn to Debug
        • How to Debug by Splitting the Problem Space
        • How to Remove an Error
        • How to Debug Using a Log
        • How to Understand Performance Problems
        • How to Fix Performance Problems
        • How to Optimize Loops
        • How to Deal with I/O Expense
        • How to Manage Memory
        • How to Deal with Intermittent Bugs
        • How to Learn Design Skills
        • How to Conduct Experiments
      • Team-Skills
        • Why Estimation is Important
        • How to Estimate Programming Time
        • How to Find Out Information
        • How to Utilize People as Information Sources
        • How to Document Wisely
        • How to Work with Poor Code
        • How to Use Source Code Control
        • How to Unit Test
        • Take Breaks when Stumped
        • How to Recognize When to Go Home
        • How to Deal with Difficult People
    • 2. Intermediate
      • Judgment
        • How to Tradeoff Quality Against Development Time
        • How to Manage Software System Dependence
        • How to Decide if Software is Too Immature
        • How to Make a Buy vs. Build Decision
        • How to Grow Professionally
        • How to Evaluate Interviewees
        • How to Know When to Apply Fancy Computer Science
        • How to Talk to Non-Engineers
      • Personal-Skills
        • How to Stay Motivated
        • How to be Widely Trusted
        • How to Tradeoff Time vs. Space
        • How to Stress Test
        • How to Balance Brevity and Abstraction
        • How to Learn New Skills
        • Learn to Type
        • How to Do Integration Testing
        • Communication Languages
        • Heavy Tools
        • How to analyze data
      • Team-Skills
        • How to Manage Development Time
        • How to Manage Third-Party Software Risks
        • How to Manage Consultants
        • How to Communicate the Right Amount
        • How to Disagree Honestly and Get Away with It
    • 3. Advanced
      • Compromising-Wisely
        • How to Fight Schedule Pressure
        • How to Understand the User
        • How to Get a Promotion
      • Serving-Your-Team
        • How to Develop Talent
        • How to Choose What to Work On
        • How to Get the Most From Your Team-mates
        • How to Divide Problems Up
        • How to Handle Boring Tasks
        • How to Gather Support for a Project
        • How to Grow a System
        • How to Communicate Well
        • How to Tell People Things They Don't Want to Hear
        • How to Tell People Things They Don't Want to Hear
        • How to Deal with Managerial Myths
        • How to Deal with Organizational Chaos
      • Technical-Judgment
        • How to Tell the Hard From the Impossible
        • How to Utilize Embedded Languages
        • Choosing Languages
  • Как быть программистом: Community Version
    • Приложение A - Библиография/Список сайтов
    • Приложение B - История
    • Участие в проекте
    • Глоссарий
    • Creative Commons Attribution Share-Alike
    • Содержание
    • 1. Начинающий программист
      • Personal-Skills
        • Научитесь отлаживать
        • Как отлаживать, разделяя пространство проблемы
        • Как устранять баги
        • Как отлаживать, используя логи
        • Как определять проблемы производительности
        • Как устранять проблемы производительности
        • Как оптимизировать циклы
        • Как справиться с расходами на операции чтения и записи
        • Как управлять памятью
        • Как устранять плавающие баги
        • Как научиться проектировать программы
        • Как экспериментировать
      • Team-Skills
        • Почему важно оценивать задачи
        • Как оценивать время на разработку
        • Как искать информацию
        • Как спрашивать людей
        • Как документировать правильно
        • Как работать с плохим кодом
        • Как использовать системы контроля версий
        • Как писать юнит-тесты
        • Делайте перерывы, когда вы в тупике
        • Как понять, когда идти домой
        • Как вести себя с трудными людьми
    • 2. Программист среднего уровня
      • Judgment
        • Как балансировать качество и время разработки
        • Как управлять зависимостями
        • Как оценивать стороннее программное обеспечение
        • Как решить: покупать программу или писать свою
        • Как расти профессионально
        • Как проводить собеседования
        • Как понять, когда применять высокие технологии
        • Как разговаривать с неинженерами
      • Personal-Skills
        • Как сохранять мотивацию
        • Как заслужить доверие
        • Как балансировать процессорное время и память
        • Как проводить стресс-тестирование
        • Как балансировать краткость и абстракцию
        • Как осваивать новые навыки
        • Научитесь печатать вслепую
        • Как проводить интеграционное тестирование
        • Языки взаимодействия систем
        • Стандартные технологии
        • Как анализировать данные
      • Team-Skills
        • Как управлять временем разработки
        • Как управлять рисками, связанными со сторонним программным обеспечением
        • Как руководить консультантами
        • Как соизмерять количество общения
        • Как честно выражать несогласие
    • 3. Продвинутый программист
      • Compromising-Wisely
        • Как справляться с давлением графика
        • Как понять пользователя
        • Как получить повышение
      • Serving-Your-Team
        • Как развивать таланты
        • Как выбрать, над чем работать
        • Как получить наибольшую отдачу от коллег
        • Как разделять задачи
        • Как распределять скучные задания
        • Как получить поддержку для проекта
        • Как развивать систему
        • Как качественно взаимодействовать
        • Как сообщать неприятное
        • Как справляться с менеджерскими мифами
        • Как справляться с организационным хаосом
      • Technical-Judgment
        • Как отличить сложное от невозможного
        • Как использовать встроенные языки
        • Выбор языка программирования
  • How to be a Programmer 中文版
    • 词汇表
    • 附录 A - 书目/网站目录
    • 附录 B - 历史
    • Contributions
    • Creative Commons Attribution Share-Alike
    • How to be a Programmer 正體中文版
    • 1. 入门
      • Personal-Skills
        • 學習除錯
        • 如何通过分割问题 Debug
        • 如何移除一个错误
        • 如何使用日志调试
        • 如何理解性能问题
        • 如何修复性能问题
        • 如何优化循环
        • 如何处理I/O代价
        • 如何管理内存
        • 如何处理偶现的 Bugs
        • 如何学习设计技能
        • 如何进行实验
      • Team-Skills
        • 为什么评估很重要
        • 如何评估编程时间
        • 如何发现信息
        • 如何把人们作为信息源
        • 如何睿智地写文档
        • 如何在糟糕的代码上工作
        • 如何使用源代码控制
        • 如何进行单元测试
        • 毫无头绪?,休息一下
        • 如何识别下班时间
        • 如何与不好相处的人相处
    • 2. 进阶
      • Judgment
        • 如何在开发质量与开发时间权衡
        • 如何管理软件系统依赖
        • 如何判断软件是否太不成熟了
        • 如何做购买还是构建的决定
        • 如何专业地成长
        • 如何评估面试者
        • 如何决定什么时候使用奇妙的计算机科学
        • 如何与非工程师交谈
      • Personal-Skills
        • 如何保持活力
        • 如何被广泛信任
        • 如何在时间与空间权衡
        • 如何进行压力测试
        • 如何在简洁与抽象间平衡
        • 如何学习新技能
        • 学会打字
        • 如何做集成测试
        • 交流语言
        • 重型工具
        • 如何分析数据
      • Team-Skills
        • 如何管理开发时间
        • 如何管理第三方软件危机
        • 如何管理咨询师
        • 如何适量交流
        • 如何直言异议以及如何避免
    • 3. 高级
      • Compromising-Wisely
        • 如何与时间压力做斗争
        • 如何理解用户
        • 如何获得晋升
      • Serving-Your-Team
        • 如何发展才能
        • 如何选择工作的内容
        • 如何让你队友的价值最大化
        • 如何划分问题
        • 如何处理无聊的任务
        • 如何为工程获取支持
        • 如何发展一个系统
        • 如何有效地沟通
        • 如何告诉人们他们不想听的东西
        • 如何处理管理神话
        • 如何处理组织混乱
      • Technical-Judgment
        • 如何从不可能中找到困难的部分
        • 如何使用嵌入型语言
        • 选择语言
  • How to be a Programmer 中文版
    • 词汇表
    • 附录 A - 书目/网站目录
    • 附录 B - 历史
    • Contributions
    • Creative Commons Attribution Share-Alike
    • How to be a Programmer 中文版
    • 1. 入门
      • Personal-Skills
        • 学会 Debug
        • 如何通过分割问题 Debug
        • 如何移除一个错误
        • 如何使用日志调试
        • 如何理解性能问题
        • 如何修复性能问题
        • 如何优化循环
        • 如何处理I/O代价
        • 如何管理内存
        • 如何处理偶现的 Bugs
        • 如何学习设计技能
        • 如何进行实验
      • Team-Skills
        • 为什么评估很重要
        • 如何评估编程时间
        • 如何发现信息
        • 如何把人们作为信息源
        • 如何睿智地写文档
        • 如何在糟糕的代码上工作
        • 如何使用源代码控制
        • 如何进行单元测试
        • 毫无头绪?,休息一下
        • 如何识别下班时间
        • 如何与不好相处的人相处
    • 2. 进阶
      • Judgment
        • 如何在开发质量与开发时间权衡
        • 如何管理软件系统依赖
        • 如何判断软件是否太不成熟了
        • 如何做购买还是构建的决定
        • 如何专业地成长
        • 如何评估面试者
        • 如何决定什么时候使用奇妙的计算机科学
        • 如何与非工程师交谈
      • Personal-Skills
        • 如何保持活力
        • 如何被广泛信任
        • 如何在时间与空间权衡
        • 如何进行压力测试
        • 如何在简洁与抽象间平衡
        • 如何学习新技能
        • 学会打字
        • 如何做集成测试
        • 交流语言
        • 重型工具
        • 如何分析数据
      • Team-Skills
        • 如何管理开发时间
        • 如何管理第三方软件危机
        • 如何管理咨询师
        • 如何适量交流
        • 如何直言异议以及如何避免
    • 3. 高级
      • Compromising-Wisely
        • 如何与时间压力做斗争
        • 如何理解用户
        • 如何获得晋升
      • Serving-Your-Team
        • 如何发展才能
        • 如何选择工作的内容
        • 如何让你队友的价值最大化
        • 如何划分问题
        • 如何处理无聊的任务
        • 如何为工程获取支持
        • 如何发展一个系统
        • 如何有效地沟通
        • 如何告诉人们他们不想听的东西
        • 如何处理管理神话
        • 如何处理组织混乱
      • Technical-Judgment
        • 如何从不可能中找到困难的部分
        • 如何使用嵌入型语言
        • 选择语言
  • 2-Intermediate
    • Judgment
      • Design Patterns
Powered by GitBook
On this page

Was this helpful?

  1. How to be a Programmer: Community Version
  2. 2. Intermediate
  3. Personal-Skills

How to analyze data

PreviousHeavy ToolsNextTeam-Skills

Last updated 5 years ago

Was this helpful?

Data analysis is a process in the early stages of software development, when you examine a business activity and find the requirements to convert it into a software application. This is a formal definition, which may lead you to believe that data analysis is an action that you should better leave to the systems analysts, while you, the programmer, should focus on coding what somebody else has designed. If we follow strictly the software engineering paradigm, it may be correct. Experienced programmers become designers and the sharpest designers become business analysts, thus being entitled to think about all the data requirements and give you a well defined task to carry out. This is not entirely accurate, because data is the core value of every programming activity. Whatever you do in your programs, you are either moving around or modifying data. The business analyst is analysing the needs in a larger scale, and the software designer is further squeezing such scale so that, when the problem lands on your desk, it seems that all you need to do is to apply clever algorithms and start moving existing data.

Not so.

No matter at which stage you start looking at it, data is the main concern of a well designed application. If you look closely at how a business analyst gets the requirements out of the customer's requests, you'll realize that data plays a fundamental role. The analyst creates so called Data Flow Diagrams, where all data sources are identified and the flow of information is shaped. Having clearly defined which data should be part of the system, the designer will shape up the data sources, in terms of database relations, data exchange protocols, and file formats, so that the task is ready to be passed down to the programmer. However, the process is not over yet, because you (the programmer) even after this thorough process of data refinement, are required to analyze data to perform the task in the best possible way. The bottom line of your task is the core message of Niklaus Wirth, the father of several languages. "Algorithms + Data Structures = Programs." There is never an algorithm standing alone, doing something to itself. Every algorithm is supposed to do something to at least one piece of data.

Therefore, since algorithms don't spin their wheels in a vacuum, you need to analyze both the data that somebody else has identified for you and the data that is necessary to write down your code. A trivial example will make the matter clearer. You are implementing a search routine for a library. According to your specifications, the user can select books by a combination of genre, author, title, publisher, printing year, and number of pages. The ultimate goal of your routine is to produce a legal SQL statement to search the back-end database. Based on these requirements, you have several choices: check each control in turn, using a "switch" statement, or several "if" ones; make an array of data controls, checking each element to see if it is set; create (or use) an abstract control object from which to inherit all your specific controls, and connect them to an event-driven engine. If your requirements include also tuning up the query performance, by making sure that the items are checked in a specific order, you may consider using a tree of components to build your SQL statement. As you can see, the choice of the algorithm depends on the data you decide to use, or to create. Such decisions can make all the difference between an efficient algorithm and a disastrous one. However, efficiency is not the only concern. You may use a dozen named variables in your code and make it as efficient as it can ever be. But such a piece of code might not be easily maintainable. Perhaps choosing an appropriate container for your variables could keep the same speed and in addition allow your colleagues to understand the code better when they look at it next year. Furthermore, choosing a well defined data structure may allow them to extend the functionality of your code without rewriting it. In the long run, your choices of data determines how long your code will survive after you are finished with it. Let me give you another example, just some more food for thought. Let's suppose that your task is to find all the words in a dictionary with more than three anagrams, where an anagram must be another word in the same dictionary. If you think of it as a computational task, you will end up with an endless effort, trying to work out all the combinations of each word and then comparing it to the other words in the list. However, if you analyze the data at hand, you'll realize that each word may be represented by a record containing the word itself and a sorted array of its letters as ID. Armed with such knowledge, finding anagrams means just sorting the list on the additional field and picking up the ones that share the same ID. The brute force algorithm may take several days to run, while the smart one is just a matter of a few seconds. Remember this example the next time you are facing an intractable problem.

Next

Team Skills - How to Manage Development Time